CN110557639A - Application of interleaved prediction - Google Patents

Application of interleaved prediction

Info

Publication number
CN110557639A
CN110557639A
Authority
CN
China
Prior art keywords
block
prediction
sub-blocks
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910468418.8A
Other languages
Chinese (zh)
Other versions
CN110557639B (en)
Inventor
张凯
张莉
刘鸿彬
王悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc
Publication of CN110557639A
Application granted
Publication of CN110557639B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/176 Coding characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/186 Coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/42 Coding characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/573 Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Abstract

A method of processing a video block includes: determining to apply interleaved prediction to the block when the block satisfies a condition, determining a prediction block based on a first inter-prediction block and a second inter-prediction block, and generating an encoded or decoded representation of the block using the prediction block. The first inter-prediction block is generated from a first set of sub-blocks obtained by partitioning the block according to a first partitioning pattern, and the second inter-prediction block is generated from a second set of sub-blocks obtained by partitioning the block according to a second partitioning pattern. At least one sub-block in the second set has a size different from that of the sub-blocks in the first set.

Description

Application of interleaved prediction
Cross Reference to Related Applications
Under the applicable patent law and/or the rules pursuant to the Paris Convention, this application timely claims the priority and benefit of (1) a prior application, International Patent Application No. PCT/CN2018/089242, filed on May 31, 2018, and (2) a prior application, International Patent Application No. PCT/CN2019/070058, filed on January 2, 2019, which were subsequently abandoned after filing. The entire disclosures of International Patent Applications No. PCT/CN2018/089242 and No. PCT/CN2019/070058 are incorporated herein by reference as part of the present disclosure, in accordance with U.S. law.
Technical Field
This document relates to video coding techniques, devices and systems.
Background
Motion Compensation (MC) is a technique in video processing that predicts frames in video by considering the motion of the camera and/or objects in the video, given previous and/or future frames. Motion compensation may be used in the encoding of video data to achieve video compression.
Disclosure of Invention
This document discloses methods, systems, and devices related to sub-block based motion prediction in video motion compensation.
In one exemplary aspect, a method of processing a video block is disclosed. The method includes determining to apply interleaved prediction to the block when the block satisfies a condition, determining a prediction block based on a first inter-prediction block and a second inter-prediction block, and generating an encoded or decoded representation of the block using the prediction block. The first inter-prediction block is generated from a first set of sub-blocks obtained by partitioning the block according to a first partitioning pattern, and the second inter-prediction block is generated from a second set of sub-blocks obtained by partitioning the block according to a second partitioning pattern. At least one sub-block in the second set has a size different from that of the sub-blocks in the first set.
In another exemplary aspect, an apparatus includes: a processor configured to implement the methods described herein.
In yet another exemplary aspect, the various techniques described herein may be implemented as a computer program product, stored on a non-transitory computer-readable medium, comprising program code for implementing the methods described herein.
In yet another exemplary aspect, a video decoding apparatus may implement the methods described herein.
The details of one or more embodiments are set forth in the accompanying drawings, the description, and the claims.
Drawings
Fig. 1 is a diagram illustrating an example of sub-block based prediction.
Fig. 2 shows an example of an affine motion field of a block described by two control point motion vectors.
Fig. 3 shows an example of an affine motion vector field for each sub-block of a block.
Fig. 4 shows an example of motion vector prediction of a block 400 in AF_INTER mode.
Fig. 5A shows an example of the selection order of candidate blocks of a current Coding Unit (CU).
Fig. 5B shows another example of a candidate block of a current CU in AF_MERGE mode.
Fig. 6 shows an example of the alternative temporal motion vector prediction (ATMVP) motion prediction process for a CU.
Fig. 7 shows an example of one CU and neighboring blocks with four sub-blocks.
Fig. 8 is a flow diagram of an example method of video processing.
Fig. 9 shows an example of a functional block diagram of a video encoder or video decoder.
Fig. 10 shows an example of bidirectional matching used in a Frame Rate Up Conversion (FRUC) method.
Fig. 11 shows an example of template matching used in the FRUC method.
Fig. 12 shows an example of uni-directional Motion Estimation (ME) in the FRUC method.
Fig. 13 illustrates an example of interleaved prediction with two partition modes in accordance with the disclosed techniques.
Fig. 14A illustrates an example partitioning pattern in which a block is partitioned into 4 × 4 sub-blocks in accordance with the disclosed technique.
Fig. 14B illustrates an example partitioning pattern in which a block is partitioned into 8 × 8 sub-blocks in accordance with the disclosed technique.
Fig. 14C illustrates an example partitioning pattern in which a block is partitioned into 4 × 8 sub-blocks in accordance with the disclosed technique.
Fig. 14D illustrates an example partitioning pattern in which a block is partitioned into 8 × 4 sub-blocks in accordance with the disclosed technique.
Fig. 14E illustrates an example partitioning pattern in which a block is partitioned into non-uniform sub-blocks in accordance with the disclosed techniques.
Fig. 14F illustrates another example partitioning pattern in which a block is partitioned into non-uniform sub-blocks in accordance with the disclosed techniques.
Fig. 14G illustrates yet another example partitioning pattern in which a block is partitioned into non-uniform sub-blocks in accordance with the disclosed techniques.
Fig. 15A is an example flow diagram of a method of improving bandwidth usage and prediction accuracy of a block-based motion prediction video system in accordance with the disclosed techniques.
Fig. 15B is another example flow diagram of a method of improving bandwidth usage and prediction accuracy of a block-based motion prediction video system in accordance with the disclosed techniques.
FIG. 16 is a schematic diagram illustrating an example of an architecture of a computer system or other control device that may be used to implement portions of the disclosed technology.
FIG. 17 illustrates a block diagram of an example embodiment of a mobile device that may be used to implement portions of the disclosed technology.
Detailed Description
Global motion compensation is one of the variants of motion compensation techniques in video compression and can be used to predict the motion of the camera. However, moving objects within the frames of a video are not adequately represented by the various implementations of global motion compensation. Local motion estimation, such as block motion compensation, in which a frame is divided into blocks of pixels for performing motion prediction, can be used to account for objects that move within a frame.
Sub-block based prediction, which developed from block motion compensation, was first introduced into video coding standards by Annex I (3D-HEVC) of the High Efficiency Video Coding (HEVC) standard.
Fig. 1 is a schematic diagram illustrating an example of sub-block based prediction. With sub-block based prediction, a block 100, such as a Coding Unit (CU) or a Prediction Unit (PU), is divided into several non-overlapping sub-blocks 101. Different sub-blocks may be assigned different motion information, such as reference indices or Motion Vectors (MVs). Motion compensation is then performed separately for each sub-block.
To explore future video coding technologies beyond HEVC, the Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG) jointly founded the Joint Video Exploration Team (JVET) in 2015. JVET has adopted a number of methods and added them to reference software named the Joint Exploration Model (JEM). In JEM, sub-block based prediction is employed in a variety of coding techniques, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), and frame rate up-conversion (FRUC), which are discussed in detail below.
Affine prediction
In HEVC, only the translational motion model is applied to Motion Compensated Prediction (MCP). However, the camera and the object may have a variety of motions, such as zoom in/out, rotation, perspective motion, and/or other irregular motions. JEM, on the other hand, applies simplified affine transform motion compensated prediction.
Fig. 2 shows an example of an affine motion field of block 200 described by two control point motion vectors V0 and V1. The Motion Vector Field (MVF) of block 200 may be described by the following equation:
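The equation itself did not survive the text extraction. The following is a plausible reconstruction of Equation (1), the 4-parameter affine motion model as described in JEM; the exact form printed in the patent is not available here.

```latex
% Plausible reconstruction of Equation (1): the JEM 4-parameter affine motion model.
% (v_{0x}, v_{0y}) and (v_{1x}, v_{1y}) are the top-left and top-right control point
% motion vectors, and w is the width of the block.
\[
  v_x = \frac{(v_{1x}-v_{0x})}{w}\,x - \frac{(v_{1y}-v_{0y})}{w}\,y + v_{0x},
  \qquad
  v_y = \frac{(v_{1y}-v_{0y})}{w}\,x + \frac{(v_{1x}-v_{0x})}{w}\,y + v_{0y}
\]
```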
As shown in Fig. 2, (v0x, v0y) is the motion vector of the top-left corner control point, and (v1x, v1y) is the motion vector of the top-right corner control point. To simplify motion compensated prediction, sub-block based affine transform prediction may be applied. The sub-block size M × N is derived as follows:
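The derivation formula is likewise missing from the extracted text; a plausible reconstruction of Equation (2), following the JEM description of affine sub-block size derivation, is:

```latex
% Plausible reconstruction of Equation (2): sub-block size derivation in JEM.
% w and h are the width and height of the current block, MvPre is the motion vector
% fractional precision (1/16 in JEM), and (v_{2x}, v_{2y}) is the bottom-left control point MV.
\[
  M = \mathrm{clip3}\!\left(4,\ w,\ \frac{w \cdot \mathrm{MvPre}}{\max\left(\lvert v_{1x}-v_{0x}\rvert,\ \lvert v_{1y}-v_{0y}\rvert\right)}\right),
  \qquad
  N = \mathrm{clip3}\!\left(4,\ h,\ \frac{h \cdot \mathrm{MvPre}}{\max\left(\lvert v_{2x}-v_{0x}\rvert,\ \lvert v_{2y}-v_{0y}\rvert\right)}\right)
\]
```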
Here, MvPre is the motion vector fractional precision (e.g., 1/16 in JEM), and (v2x, v2y) is the motion vector of the bottom-left control point, which is calculated according to equation (1). If needed, M and N can be adjusted downward so that they are divisors of w and h, respectively.
Fig. 3 shows an example of affine MVF for each sub-block of block 300. To derive the motion vector for each M × N sub-block, the motion vector for the center sample of each sub-block may be calculated according to equation (1) and rounded to motion vector fractional precision (e.g., 1/16 in JEM). A motion compensated interpolation filter may then be applied to generate a prediction for each sub-block using the derived motion vectors. After MCP, the high precision motion vector of each sub-block is rounded and saved to the same precision as the normal motion vector.
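As an illustration of the sub-block MV derivation described above, the following minimal Python sketch (not taken from the patent; all names are illustrative) evaluates the two-control-point affine model at the center sample of each M × N sub-block and rounds the result to 1/16-pel precision:

```python
def affine_subblock_mvs(v0, v1, w, h, M, N):
    """Derive one motion vector per M x N sub-block of a w x h block.

    v0 = (v0x, v0y): top-left control point MV.
    v1 = (v1x, v1y): top-right control point MV.
    Returns a dict mapping each sub-block's top-left position to its MV,
    evaluated at the sub-block center and rounded to 1/16-pel precision.
    """
    v0x, v0y = v0
    v1x, v1y = v1
    a = (v1x - v0x) / w          # horizontal gradient of vx
    b = (v1y - v0y) / w          # horizontal gradient of vy
    mvs = {}
    for y in range(0, h, N):
        for x in range(0, w, M):
            cx, cy = x + M / 2.0, y + N / 2.0           # center sample of the sub-block
            vx = a * cx - b * cy + v0x                  # 4-parameter affine model
            vy = b * cx + a * cy + v0y
            mvs[(x, y)] = (round(vx * 16) / 16.0,       # round to 1/16-pel precision
                           round(vy * 16) / 16.0)
    return mvs

# Example: a 16x16 block split into 4x4 sub-blocks, with control point MVs (1, 0) and (2, 1).
print(affine_subblock_mvs((1, 0), (2, 1), 16, 16, 4, 4)[(0, 0)])
```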
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, the AF_INTER mode may be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether the AF_INTER mode is used. In AF_INTER mode, a candidate list with motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks.
Fig. 4 shows an example of Motion Vector Prediction (MVP) of a block 400 in AF_INTER mode. As shown in Fig. 4, v0 is selected from the motion vectors of block A, B, or C. The motion vectors of the neighboring blocks may be scaled according to the reference list. The motion vector may also be scaled according to the relationship among the Picture Order Count (POC) of the reference of the neighboring block, the POC of the reference of the current CU, and the POC of the current CU. The approach to selecting v1 from the neighboring blocks D and E is similar. When the number of candidates in the list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates may first be sorted according to the neighboring motion vectors (e.g., based on the similarity of the two motion vectors in a pair of candidates). In some implementations, the first two candidates are kept. In some embodiments, a Rate Distortion (RD) cost check is used to determine which motion vector pair candidate is selected as the Control Point Motion Vector Predictor (CPMVP) of the current CU. An index indicating the position of the CPMVP in the candidate list may be signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the Control Point Motion Vectors (CPMVs) are found. The differences between the CPMVs and the CPMVP are then signaled in the bitstream.
When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. Fig. 5A shows an example of the selection order of candidate blocks for the current CU 500. As shown in Fig. 5A, the selection order may be from left (501), up (502), top right (503), bottom left (504) to top left (505) of the current CU 500. Fig. 5B shows another example of a candidate block of the current CU 500 in AF_MERGE mode. If the neighboring bottom-left block 501 is coded in affine mode, as shown in Fig. 5B, the motion vectors v2, v3, and v4 of the top-left, top-right, and bottom-left corners of the CU containing block 501 are derived. The motion vector v0 of the top-left corner of the current CU 500 is calculated based on v2, v3, and v4. The motion vector v1 of the top-right corner of the current CU can be calculated accordingly.
After the CPMVs v0 and v1 of the current CU are computed according to the affine motion model in equation (1), the MVF of the current CU can be generated. To identify whether the current CU is coded with AF_MERGE mode, an affine flag may be signaled in the bitstream when there is at least one neighboring block coded in affine mode.
Alternative temporal motion vector prediction (ATMVP)
In the ATMVP method, a Temporal Motion Vector Prediction (TMVP) method is modified by extracting multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than a current CU.
Fig. 6 shows an example of the ATMVP motion prediction process for CU 600. The ATMVP method predicts the motion vector of sub-CU 601 within CU 600 in two steps. The first step is to identify the corresponding block 651 in the reference picture 650 with a temporal vector. The reference picture 650 is also referred to as a motion source picture. The second step is to divide the current CU 600 into sub-CUs 601 and obtain the motion vector and reference index of each sub-CU from the block corresponding to each sub-CU.
In the first step, the reference picture 650 and the corresponding block are determined from the motion information of the spatially neighboring blocks of the current CU 600. To avoid a repetitive scanning process over the neighboring blocks, the first MERGE candidate in the MERGE candidate list of the current CU 600 is used. The first available motion vector and its associated reference index are set to be the temporal vector and the index of the motion source picture. This way, the corresponding block can be identified more accurately than with TMVP, in which the corresponding block (sometimes called a collocated block) is always in the bottom-right or center position relative to the current CU.
In the second step, the corresponding block of sub-CU 651 is identified by the temporal vector in the motion source picture 650, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (e.g., the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of the corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU in the same way as the TMVP of HEVC, in which motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (e.g., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (e.g., the motion vector corresponding to reference picture list X) to predict motion vector MVy (e.g., with X being equal to 0 or 1 and Y being equal to 1−X) for each sub-CU.
Spatial-temporal motion vector prediction (STMVP)
In the STMVP method, the motion vectors of sub-CUs are recursively derived in raster scan order. Fig. 7 shows an example of one CU and neighboring blocks with four sub-blocks. Consider an 8 × 8 CU 700, which includes four 4 × 4 sub-CUs a (701), B (702), C (703), and D (704). The neighboring 4 × 4 blocks in the current frame are labeled a (711), b (712), c (713), and d (714).
The motion derivation of sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A 701 (block c 713). If this block c (713) is not available or is intra coded, the other N×N blocks above sub-CU A (701) are checked (from left to right, starting at block c 713). The second neighbor is the block to the left of sub-CU A 701 (block b 712). If block b (712) is not available or is intra coded, the other blocks to the left of sub-CU A 701 are checked (from top to bottom, starting at block b 712). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of the given list. Next, the temporal motion vector prediction (TMVP) of sub-block A 701 is derived following the same procedure as the TMVP specified in HEVC: the motion information of the collocated block at position D 704 is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
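The combination step described above amounts to a simple average of the available candidates. The following Python sketch (illustrative only; not from the patent) shows this for one sub-CU:

```python
def stmvp_motion(above_mv, left_mv, tmvp_mv):
    """Illustrative sketch of the STMVP combination step for one sub-CU.

    Each argument is either None (neighbor unavailable / intra coded) or an
    (mvx, mvy) tuple already scaled to the first reference frame of the list.
    The available motion vectors are simply averaged, per reference list.
    """
    available = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not available:
        return None
    avg_x = sum(mv[0] for mv in available) / len(available)
    avg_y = sum(mv[1] for mv in available) / len(available)
    return (avg_x, avg_y)

# Example for sub-CU A: above neighbor c, left neighbor b, and no usable TMVP.
print(stmvp_motion((2.0, 1.0), (1.0, 1.0), None))  # -> (1.5, 1.0)
```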
Frame Rate Up Conversion (FRUC)
For a CU, the FRUC flag may be signaled when its MERGE flag is true. When the FRUC flag is false, the MERGE index may be signaled and the normal MERGE mode used. When the FRUC flag is true, another FRUC mode flag may be signaled to indicate which method (e.g., two-way matching or template matching) will be used to derive the motion information for the block.
At the encoder side, a decision is made whether to use FRUC MERGE mode for the CU based on the RD cost choices made for normal MERGE candidates. For example, a plurality of matching patterns (e.g., bi-directional matching and template matching) of the CU are checked by using RD cost selection. The mode that results in the lowest cost is further compared to other CU modes. If the FRUC matching pattern is the most efficient pattern, then the FRUC flag is set to true for the CU and the associated matching pattern is used.
In general, the motion derivation process in FRUC MERGE mode has two steps: a CU-level motion search is performed first, followed by sub-CU-level motion refinement. At the CU level, an initial motion vector for the entire CU is derived based on bi-directional matching or template matching. First, a list of MV candidates is generated, and the candidate that results in the lowest matching cost is selected as the starting point for further CU-level refinement. A local search based on two-way matching or template matching is then performed around the starting point. The MV that results in the minimum matching cost is taken as the MV for the entire CU. Subsequently, the motion information is further refined at the sub-CU level, starting from the derived CU motion vector.
For example, the following derivation process is performed for W × H CU motion information derivation. In the first stage, the MV of the entire W × H CU is derived. In the second stage, the CU is further divided into M × M sub-CUs. The value of M is calculated as follows, where D is a predefined division depth that is set to 3 by default in JEM. The MV of each sub-CU is then derived.
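The formula for M did not survive the extraction; a plausible reconstruction, following the JEM description of FRUC, is:

```latex
% Plausible reconstruction of the FRUC sub-CU size, where W and H are the CU width and
% height and D is the predefined splitting depth (3 by default in JEM).
\[
  M = \max\!\left(4,\ \frac{\min(W,\,H)}{2^{D}}\right)
\]
```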
Fig. 10 shows an example of bidirectional matching used in a Frame Rate Up Conversion (FRUC) method. Bi-directional matching is used to obtain motion information of a current CU by finding the closest match between two blocks along the motion trajectory of the current CU (1000) in two different reference pictures (1010, 1011). Under the continuous motion trajectory assumption, the motion vectors MV0(1001) and MV1(1002) pointing to two reference blocks are proportional to the temporal distance between the current picture and the two reference pictures (e.g., TD0(1003) and TD1 (1004)). In some embodiments, when the current picture 1000 is temporally located between two reference pictures (1010, 1011) and the temporal distances of the current picture to the two reference pictures are the same, the bi-directional match becomes a mirror-based bi-directional MV.
Fig. 11 illustrates an example of template matching used in the FRUC method. Template matching may be used to obtain motion information for the current CU 1100 by finding the closest match between the template in the current picture (e.g., the top and/or left neighboring blocks of the current CU) and the block in the reference picture 1110 (e.g., the same size as the template). Template matching may also be applied to AMVP mode, in addition to FRUC MERGE mode described above. In both JEM and HEVC, AMVP has two candidates. By the template matching method, new candidates can be derived. If the newly derived candidate by template matching is different from the first existing AMVP candidate, it is inserted at the very beginning of the AMVP candidate list and then the list size is set to 2 (e.g., by deleting the second existing AMVP candidate). When applied to AMVP mode, only CU level search is applied.
The MV candidate set at the CU level may include the following: (1) the original AMVP candidates, if the current CU is in AMVP mode, (2) all MERGE candidates, (3) several MVs in the interpolated MV field (described later), and (4) the top and left neighboring motion vectors.
When bi-directional matching is used, each valid MV of a MERGE candidate may be used as an input to generate an MV pair under the assumption of bi-directional matching. For example, one valid MV of a MERGE candidate is (MVa, refa) in reference list A. Then, the reference picture refb of its paired bi-directional MV is found in the other reference list B, such that refa and refb are temporally on different sides of the current picture. If such a refb is not available in reference list B, refb is determined as a reference that is different from refa and whose temporal distance to the current picture is the minimal one in list B. After refb is determined, MVb is derived by scaling MVa based on the temporal distances between the current picture and refa and refb, respectively.
In some implementations, four MVs from the interpolated MV field may also be added to the CU level candidate list. More specifically, the interpolated MVs at positions (0, 0), (W/2, 0), (0, H/2) and (W/2, H/2) of the current CU are added. When FRUC is applied in AMVP mode, the original AMVP candidates are also added to the CU-level MV candidate set. In some implementations, at the CU level, 15 MVs of an AMVP CU and 13 MVs of a MERGE CU may be added to the candidate list.
The MV candidate set at the sub-CU level includes (1) the MVs determined from the CU-level search, (2) the top, left, top-left, and top-right neighboring MVs, (3) scaled versions of collocated MVs from the reference pictures, (4) one or more ATMVP candidates (e.g., up to four), and (5) one or more STMVP candidates (e.g., up to four). The scaled MVs from the reference pictures are derived as follows: the reference pictures in both lists are traversed, and the MVs at the collocated position of the sub-CU in a reference picture are scaled to the reference of the starting CU-level MV. The ATMVP and STMVP candidates may be limited to the first four. At the sub-CU level, one or more MVs (e.g., up to 17) are added to the candidate list.
Generation of interpolated MV fields
Before encoding a frame, an interpolated motion field for the entire picture is generated based on a one-way ME. This motion field can then be used subsequently as MV candidates at the CU level or sub-CU level.
In some embodiments, the motion field for each reference picture in the two reference lists is traversed at the 4x4 block level. Fig. 12 shows an example of uni-directional Motion Estimation (ME)1200 in the FRUC method. For each 4x4 block, if the motion associated with the block passes through a 4x4 block in the current picture and the block is not assigned any interpolation motion, the motion of the reference block is scaled to the current picture according to temporal distances TD0 and TD1 (in the same way as MV scaling of TMVP in HEVC) and the scaled motion is assigned to the block in the current frame. If no scaled MV is assigned to a 4x4 block, the motion of the block is marked as unavailable in the interpolated motion field.
Interpolation and matching costs
When the motion vector points to a fractional sample position, motion compensated interpolation is required. To reduce complexity, bilinear interpolation is used for both bi-directional matching and template matching instead of the conventional 8-tap HEVC interpolation.
The computation of the matching cost is somewhat different at different steps. When selecting a candidate from the candidate set at the CU level, the matching cost may be the sum of absolute differences (SAD) of bi-directional matching or template matching. After the starting MV is determined, the matching cost C of the bi-directional matching search at the sub-CU level is calculated as follows:
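The expression for C did not survive the extraction; a plausible reconstruction, following the JEM description, is:

```latex
% Plausible reconstruction of the sub-CU-level bi-directional matching cost.
% (MV_x, MV_y) is the current MV, (MV_x^s, MV_y^s) is the starting MV, and w is a weight.
\[
  C = \mathrm{SAD} + w \left( \lvert MV_x - MV_x^{s} \rvert + \lvert MV_y - MV_y^{s} \rvert \right)
\]
```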
Here, w is a weighting factor. In some embodiments, w may be empirically set to 4. MV and MV^s indicate the current MV and the starting MV, respectively. SAD may still be used as the matching cost of template matching in the sub-CU-level search.
In FRUC mode, the MV is derived by using only the luminance (luma) samples. The derived motion will then be used for both luma and chrominance (chroma) in MC inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma.
MV refinement is a pattern-based MV search with the criterion of bi-directional matching cost or template matching cost. In JEM, two search patterns are supported: the unrestricted center-biased diamond search (UCBDS) and the adaptive cross search, for MV refinement at the CU level and the sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is searched directly at quarter-luma-sample MV precision, followed by one-eighth-luma-sample MV refinement. The search range of MV refinement for the CU step and the sub-CU step is set to 8 luma samples.
In the bi-directional matching MERGE mode, bi-prediction is applied because the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. In the template matching MERGE mode, the encoder can choose, for a CU, among uni-directional prediction from list 0, uni-directional prediction from list 1, and bi-directional prediction. The selection may be based on the template matching cost, as follows:
If costBi <= factor × min(cost0, cost1),
then bi-directional prediction is used;
otherwise, if cost0 <= cost1,
then uni-directional prediction from list 0 is used;
otherwise,
uni-directional prediction from list 1 is used.
Here, cost0 is the SAD of the list 0 template matching, cost1 is the SAD of the list 1 template matching, and costBi is the SAD of the bi-directional template matching. For example, when the value of factor is equal to 1.25, the selection process is biased toward bi-prediction. The inter prediction direction selection may be applied to the CU-level template matching process.
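For clarity, the selection rule above can be written as the following small sketch (illustrative only; the function name is invented and the default value of factor is taken from the example in the text):

```python
def select_prediction_direction(cost0, cost1, cost_bi, factor=1.25):
    """Choose the inter prediction direction from template matching costs.

    cost0, cost1: SADs of the list 0 and list 1 template matches.
    cost_bi:      SAD of the bi-directional template match.
    factor > 1 biases the selection toward bi-prediction.
    """
    if cost_bi <= factor * min(cost0, cost1):
        return "bi-prediction"
    elif cost0 <= cost1:
        return "uni-prediction, list 0"
    else:
        return "uni-prediction, list 1"

print(select_prediction_direction(cost0=100, cost1=120, cost_bi=118))  # -> bi-prediction
```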
The sub-block based prediction techniques discussed above may be used to obtain more accurate motion information for each sub-block when the sub-block size is small. However, smaller sub-blocks impose higher bandwidth requirements in motion compensation. On the other hand, for smaller sub-blocks, the derived motion information may not be accurate, especially when there is some noise in the block. Therefore, having a fixed sub-block size within a block may be sub-optimal.
Techniques are described herein that may be used in various embodiments to address bandwidth and accuracy issues introduced by fixed sub-block sizes using non-uniform and/or variable sub-block sizes. These techniques, also known as interleaved prediction, use different methods of partitioning blocks in order to more reliably obtain motion information without increasing bandwidth consumption.
With the interleaved prediction technique, a block is divided into sub-blocks according to one or more division patterns. A division pattern denotes the way of dividing a block into sub-blocks, including the size of the sub-blocks and the positions of the sub-blocks. For each division pattern, a corresponding prediction block may be generated by deriving the motion information of each sub-block based on that division pattern. Thus, in some embodiments, multiple prediction blocks may be generated by multiple division patterns, even for one prediction direction. In some embodiments, only one division pattern may be applied for each prediction direction.
Fig. 13 illustrates an example of interleaved prediction with two division patterns in accordance with the disclosed techniques. The current block 1300 may be divided according to multiple patterns. For example, as shown in Fig. 13, the current block is divided according to both pattern 0 (1301) and pattern 1 (1302), and two prediction blocks, P0 (1303) and P1 (1304), are generated. The final prediction block P (1305) of the current block 1300 may then be generated by computing a weighted sum of P0 (1303) and P1 (1304).
In general, given X division patterns, X prediction blocks of the current block, denoted P0, P1, ..., PX-1, may be generated by sub-block based prediction with the X division patterns. The final prediction of the current block, denoted P, may be generated as:
Here, (x, y) is the coordinate of a pixel in the block, and wi(x, y) is the weighting value of Pi. By way of example, and not limitation, the weights may be expressed as:
Here, N is a non-negative value. Alternatively, the bit-shift operation in equation (8) may also be expressed as:
The sum of weights is a power of 2 and by performing a shift operation instead of floating-point division, the weighted sum P can be calculated more efficiently.
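As an illustration of the weighted combination described above, the following sketch (illustrative only, not the patent's reference implementation; the rounding offset is an assumption) combines X per-pixel prediction blocks with integer weights that sum to 1 << N, replacing the division by a right shift:

```python
def interleave_combine(pred_blocks, weight_maps, shift):
    """Combine X prediction blocks into the final prediction P.

    pred_blocks : list of X blocks, each a 2-D list of predicted sample values.
    weight_maps : list of X weight maps w_i(x, y); for every pixel the weights
                  across the X maps are assumed to sum to 1 << shift.
    shift       : N, so that the division becomes a right shift.
    """
    h, w = len(pred_blocks[0]), len(pred_blocks[0][0])
    offset = 1 << (shift - 1) if shift > 0 else 0   # rounding offset (an assumption)
    final = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = sum(p[y][x] * wm[y][x] for p, wm in zip(pred_blocks, weight_maps))
            final[y][x] = (acc + offset) >> shift
    return final

# Two 2x2 prediction blocks combined with equal weights (N = 1, weights 1 + 1 = 2).
P0 = [[100, 102], [104, 106]]
P1 = [[110, 112], [114, 116]]
W  = [[1, 1], [1, 1]]
print(interleave_combine([P0, P1], [W, W], shift=1))  # -> [[105, 107], [109, 111]]
```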
The division patterns may have different sub-block shapes, sizes, or positions. In some embodiments, a division pattern may include irregular sub-block sizes. Figs. 14A-14G show examples of several division patterns for a 16 × 16 block. In Fig. 14A, a block is divided into 4 × 4 sub-blocks in accordance with the disclosed technique; this pattern is also used in JEM. Fig. 14B illustrates an example of a pattern that divides a block into 8 × 8 sub-blocks in accordance with the disclosed technique. Fig. 14C illustrates an example of a pattern that divides a block into 8 × 4 sub-blocks in accordance with the disclosed technique. Fig. 14D illustrates an example of a pattern that divides a block into 4 × 8 sub-blocks in accordance with the disclosed technique. In Fig. 14E, a portion of the block is divided into 4 × 4 sub-blocks in accordance with the disclosed technique, while the pixels at the block boundaries are divided into smaller sub-blocks with sizes such as 2 × 4, 4 × 2, or 2 × 2. Some of the sub-blocks may be merged to form larger sub-blocks; Fig. 14F shows an example in which adjacent sub-blocks (e.g., 4 × 4 and 2 × 4 sub-blocks) are merged to form larger sub-blocks with sizes such as 6 × 4, 4 × 6, or 6 × 6. In Fig. 14G, a portion of the block is divided into 8 × 8 sub-blocks, while the pixels at the block boundaries are divided into smaller sub-blocks such as 8 × 4, 4 × 8, or 4 × 4.
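The following sketch (illustrative; the sizes and the boundary-handling rule are assumptions in the spirit of Figs. 14A-14G) enumerates the sub-block rectangles produced by a uniform division pattern, optionally offset so that the pixels at the block boundaries fall into smaller sub-blocks:

```python
def division_pattern(block_w, block_h, sub_w, sub_h, offset_x=0, offset_y=0):
    """Enumerate sub-block rectangles (x, y, w, h) for one division pattern.

    With offset_x/offset_y = 0 this yields a uniform grid (Figs. 14A-14D style);
    a non-zero offset shifts the grid so that the block boundaries are covered
    by smaller sub-blocks (Figs. 14E/14G style).
    """
    # Grid lines start at the offset (or at sub_w/sub_h for a plain uniform grid).
    xs = sorted(set([0] + list(range(offset_x or sub_w, block_w, sub_w)) + [block_w]))
    ys = sorted(set([0] + list(range(offset_y or sub_h, block_h, sub_h)) + [block_h]))
    return [(x0, y0, x1 - x0, y1 - y0)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]

# Pattern 0: uniform 4x4 grid of a 16x16 block (Fig. 14A style).
# Pattern 1: 8x8 grid offset by (4, 4), giving smaller sub-blocks at the borders (Fig. 14G style).
pattern0 = division_pattern(16, 16, 4, 4)
pattern1 = division_pattern(16, 16, 8, 8, offset_x=4, offset_y=4)
print(len(pattern0), len(pattern1))  # -> 16 9
```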
In sub-block based prediction, the shape and size of a sub-block may be determined based on the shape and/or size of the coding block and/or coded block information. The coded block information may include the coding algorithm used on the block and/or sub-block, such as whether the motion compensated prediction is (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatial-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame rate up-conversion method. For example, in some embodiments, when the size of the current block is M × N, the size of the sub-blocks is 4 × N (or 8 × N, etc.), i.e., the sub-blocks have the same height as the current block. In some embodiments, when the size of the current block is M × N, the size of the sub-blocks is M × 4 (or M × 8, etc.), i.e., the sub-blocks have the same width as the current block. In some embodiments, when the size of the current block is M × N (where M > N), the size of the sub-blocks is A × B, where A > B (e.g., 8 × 4). Alternatively, the size of the sub-blocks is B × A (e.g., 4 × 8).
In some embodiments, the size of the current block is M × N. When M × N <= T (or min(M, N) <= T, or max(M, N) <= T, etc.), the size of the sub-blocks is A × B; when M × N > T (or min(M, N) > T, or max(M, N) > T, etc.), the size of the sub-blocks is C × D, where A <= C and B <= D. For example, if M × N <= 256, the size of the sub-blocks may be 4 × 4. In some implementations, the size of the sub-blocks is 8 × 8.
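A minimal sketch of this size-dependent rule follows (illustrative only; the threshold T = 256 and the candidate sizes come from the example above, and the use of the area test rather than the min/max variants is an assumption):

```python
def choose_subblock_size(block_w, block_h, threshold=256,
                         small=(4, 4), large=(8, 8)):
    """Pick the sub-block size based on the current block size.

    Uses the area test M*N <= T from the example in the text; the min(M, N)
    or max(M, N) variants mentioned there could be substituted instead.
    """
    return small if block_w * block_h <= threshold else large

print(choose_subblock_size(16, 16))  # 256 <= 256 -> (4, 4)
print(choose_subblock_size(32, 16))  # 512  > 256 -> (8, 8)
```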
In some embodiments, whether to apply the interleaved prediction may be determined based on a direction of the inter prediction. The direction indicates whether the first or second inter prediction block is backward predicted in time or forward predicted in time. For example, in some embodiments, interleaved prediction may be suitable for bi-directional prediction, but not for uni-directional prediction.
As another example, when multiple hypotheses are applied, i.e., when there is more than one reference block, the interleaved prediction may be applied to one prediction direction. Multiple hypotheses indicate that multiple video frames are used to make the prediction block. When a prediction block is made using multiple video frames, the interleaved prediction may be applied to one prediction direction. The prediction direction may be a forward or a backward prediction direction. The forward prediction direction refers to a plurality of video frames that occur before the prediction block in the video sequence, and the backward direction refers to a plurality of reference frames that occur after the prediction block in the video sequence.
In some embodiments, how the interleaved prediction is applied may also be determined based on the inter prediction direction. In some embodiments, a bi-prediction block with sub-block based prediction is divided into sub-blocks with two different division patterns for the two different reference lists. In bi-directional prediction, a first reference list and a second reference list may be derived, where the first reference list and the second reference list represent frames that are temporally forward and backward of the prediction block. More specifically, the first reference list may include a first group of blocks in a first direction relative to the prediction block in the video sequence, and the first group of blocks may be used to create the prediction block. The second reference list may include a second group of blocks in a second direction relative to the prediction block in the video sequence, and the second group of blocks may be used to create the prediction block. The first and second directions may be opposite, i.e., one direction may be temporally forward and the other temporally backward of the prediction block. For the prediction from the first reference list, the block may be partitioned into a first set of sub-blocks according to a first division pattern. For the prediction from the second reference list, the block may be partitioned into a second set of sub-blocks according to a second division pattern, where the first pattern and the second pattern are different.
For example, when predicted from the reference list 0(L0), the bi-prediction block is divided into 4 × 8 sub-blocks as shown in fig. 14D. When predicted from reference list 1(L1), the same block is divided into 8 × 4 sub-blocks as shown in fig. 14C. The final prediction P is calculated as:
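The equation does not survive the extraction; given the weight normalization stated immediately below, it plausibly has the following form (with the shift denoting an arithmetic right shift by N):

```latex
% Plausible reconstruction of the bi-directional interleaved combination, with
% w_0(x, y) + w_1(x, y) = 1 << N; "\gg" denotes an arithmetic right shift by N.
\[
  P(x, y) = \bigl( w_0(x, y)\, P_0(x, y) + w_1(x, y)\, P_1(x, y) \bigr) \gg N
\]
```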
Here, P0 and P1 are the predictions from L0 and L1, respectively, and w0 and w1 are the weighting values for L0 and L1, respectively. As shown in equation (16), the weighting values may be determined as w0(x, y) + w1(x, y) = 1 << N (where N is a non-negative integer value). Because fewer sub-blocks are used for the prediction in each direction (e.g., 4 × 8 sub-blocks as opposed to 8 × 8 sub-blocks), the computation requires less bandwidth than existing sub-block based methods. By using larger sub-blocks, the prediction results are also less susceptible to noise interference.
In some embodiments, a uni-directional prediction block with sub-block based prediction is divided into sub-blocks with two or more different division patterns for the same reference list. For example, the prediction for list L (L = 0 or 1), denoted PL, is calculated as follows:
Here, XL is the number of division patterns for list L, PiL is the prediction generated with the i-th division pattern, and wiL is the weighting value of PiL. For example, when XL is 2, two division patterns are applied for list L: in the first division pattern, the block is divided into 4 × 8 sub-blocks as shown in Fig. 14D, and in the second division pattern, the block is divided into 8 × 4 sub-blocks as shown in Fig. 14C.
In some embodiments, a bi-prediction block based on sub-block prediction is considered as a combination of two uni-directional prediction blocks from L0 and L1, respectively. The prediction from each list can be derived as described in the example above. The final prediction P can be calculated as:
Here, the parameters a and b are two additional weights applied to the two uni-directional prediction blocks. In this particular example, both a and b may be set to 1. Similar to the example above, because the prediction in each direction uses fewer sub-blocks (e.g., 4 × 8 sub-blocks as opposed to 8 × 8 sub-blocks), the bandwidth usage is better than, or the same as, that of the existing sub-block based methods. At the same time, the prediction results can be improved by using larger sub-blocks.
In some embodiments, a separate non-uniform mode may be used in each unidirectional prediction block. For example, for each list L (e.g., L0 or L1), the blocks are divided into different patterns (e.g., as shown in fig. 14E or fig. 14F). Using a smaller number of sub-blocks reduces the bandwidth requirements. The non-uniformity of the sub-blocks also increases the robustness of the prediction results.
In some embodiments, for a multi-hypothesis coding block, there may be multiple prediction blocks generated by different partition modes for each prediction direction (or reference picture list). Multiple prediction blocks may be used and additional weights applied to generate the final prediction. For example, the additional weight may be set to 1/M, where M is the total number of generated prediction blocks.
In some embodiments, the encoder may determine whether and how to apply the interleaved prediction. The encoder may then transmit information corresponding to the determination to the decoder at the sequence level, picture level, view level, slice level, Coding Tree Unit (CTU) (also referred to as Largest Coding Unit, LCU) level, CU level, PU level, Tree Unit (TU) level, slice group level, or region level (which may include multiple CUs/PUs/TUs/LCUs). This information may be signaled in a Sequence Parameter Set (SPS), a View Parameter Set (VPS), a Picture Parameter Set (PPS), a Slice Header (SH), a picture header, a sequence header, at the slice level or slice group level, or in the first block of a CTU/LCU, CU, PU, TU, or region.
In some implementations, the interleaved prediction is applicable to existing sub-block methods, such as affine prediction, ATMVP, STMVP, FRUC, or BIO. In this case, no additional signaling cost is required. In some implementations, the new sub-block MERGE candidates generated by the interleaved prediction may be inserted into a MERGE list, such as interleaved prediction + ATMVP, interleaved prediction + STMVP, interleaved prediction + FRUC, and so on.
In some embodiments, a flag may be signaled to indicate whether to use interleaved prediction. Signaling the flag may include encoding the flag in video information. In one example, if the current block is affine inter coded, a flag a may be signaled to indicate whether to use interleaved prediction. In another example, if the current block is affine MERGE encoded and unidirectional prediction is applied, the flag may be signaled to indicate whether interleaved prediction is used. In a third example, if the current block is affine MERGE encoded, the flag may be signaled to indicate whether to use interleaved prediction.
in some embodiments, interleaved prediction may always be used if the current block is affine MERGE encoded and unidirectional prediction is applied. In some embodiments, if the current block is affine MERGE encoded, interleaved prediction may always be used.
In some embodiments, the flag indicating whether to use interleaved prediction may be inherited without being signaled. In one example, inheritance may be used if the current block is affine MERGE coded. In another example, the flag may be inherited from the flag of the neighboring block from which the affine model is inherited. In a third example, the flag is inherited from a predefined neighboring block, such as the left or upper neighboring block. In a fourth example, the flag may be inherited from the first encountered affine-coded neighboring block; if no neighboring block is affine coded, the flag may be inferred to be zero, i.e., no interleaved prediction is applied. In a fifth example, the flag can only be inherited if the current block applies uni-directional prediction. In a sixth example, the flag can only be inherited if the current block and the neighboring block from which it is inherited are located in the same CTU. In a seventh example, the flag can only be inherited if the current block and the neighboring block from which it is inherited are located in the same CTU row. In an eighth example, when the affine model is derived from a temporal neighboring block, the flag cannot be inherited from the flag of that neighboring block. In a ninth example, the flag cannot be inherited from the flags of neighboring blocks that are not in the same LCU, LCU row, or video data processing unit (e.g., 64 × 64 or 128 × 128) as the current block. How the flag is signaled and/or derived may depend on the block dimensions and/or the coding information of the current block, where the coding information includes information coded in the video.
In some embodiments, if the reference picture is a current picture, then no interleaved prediction is applied. In one example, if the reference frame is a current frame containing a prediction block, the flag indicating whether to use interleaved prediction is not signaled. The reference frame is used as the basis for predicting the prediction block.
In some embodiments, the partition mode to be used by the current block may be derived based on information from spatial and/or temporal neighboring blocks. For example, rather than relying on the encoder to send the relevant information, both the encoder and decoder may employ a set of predetermined rules to obtain a partitioning pattern based on temporal adjacency (e.g., a previously used partitioning pattern for the same block) or spatial adjacency (e.g., a partitioning pattern used by a neighboring block).
In some embodiments, the weighting value w may be fixed. For example, all division patterns may be weighted equally: wi(x, y) = 1. In some embodiments, the weighting values may be determined based on positions within the block and the division pattern used; for example, wi(x, y) may be different for different (x, y). In some embodiments, the weighting values may further depend on the sub-block prediction based coding technique (e.g., affine or ATMVP) and/or other coded information (e.g., skip or non-skip mode, and/or MV information).
In some embodiments, the encoder may determine the weighting values and send these values to the decoder at the sequence level, picture level, slice level, CTU/LCU level, CU level, PU level, or region level (possibly including multiple CUs/PUs/TUs/LCUs). The weighting values may be signaled in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a Slice Header (SH), a CTU/LCU, a CU, a PU, or a first block of a region. In some embodiments, the weighting values may be derived from weighting values of spatially and/or temporally adjacent blocks.
It should be noted that the interleaved prediction techniques disclosed herein may be applied to one, some, or all of the coding techniques based on sub-block prediction. For example, the interleaved prediction technique may be applied to affine prediction, while other sub-block prediction based coding techniques (e.g., ATMVP, STMVP, FRUC, or BIO) do not use interleaved prediction. As another example, all affine, ATMVP, and STMVP apply the interleaved prediction techniques disclosed herein.
Fig. 15A is an example flow diagram of a method 1500 of improving motion prediction in a video system in accordance with the disclosed techniques. The method 1500 includes, at 1502, selecting a set of pixels from a video frame to form a block. The method 1500 includes, at 1504, partitioning a block into a first set of sub-blocks according to a first pattern. The method 1500 includes, at 1506, generating a first inter-prediction block based on the first set of sub-blocks. The method 1500 includes dividing the block into a second set of sub-blocks according to a second pattern at 1508. At least one sub-block in the second group has a size different from a size of one sub-block in the first group. The method 1500 includes generating a second inter prediction block based on the second set of sub-blocks, at 1510. The method 1500 also includes determining a prediction block based on the first inter prediction block and the second inter prediction block, at 1512.
In some embodiments, the first inter prediction block or the second inter prediction block is generated using at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction method, (3) a spatio-temporal motion vector prediction method, (4) a bi-directional optical flow method, or (5) a frame rate up-conversion method.
In some embodiments, the sub-blocks in the first group or the second group have a rectangular shape. In some embodiments, the sub-blocks in the first set of sub-blocks have non-uniform shapes. In some embodiments, the sub-blocks in the second set of sub-blocks have non-uniform shapes.
In some embodiments, the method includes determining the first mode or the second mode based on a size of the block. In some embodiments, the method comprises determining the first mode or the second mode based on information from a second block that is temporally or spatially adjacent to the block.
In some embodiments, for motion prediction of a block in a first direction, partitioning the block into a first set of sub-blocks is performed. In some embodiments, for motion prediction of the block in the second direction, partitioning the block into a second set of sub-blocks is performed.
In some embodiments, for motion prediction of a block in a first direction, partitioning the block into a first set of sub-blocks and partitioning the block into a second set of sub-blocks is performed. In some embodiments, the method further comprises: performing motion prediction on the block in the second direction by dividing the block into a third group of sub-blocks according to a third mode; generating a third intermediate prediction block based on the third group of sub-blocks; dividing the block into a fourth group of sub-blocks according to a fourth pattern, wherein at least one sub-block in the fourth group is a different size than the sub-blocks in the third group; generating a fourth intermediate prediction block based on the fourth group of sub-blocks; determining a second prediction block based on the third inter prediction block and the fourth inter prediction block; and determining a third prediction block based on the prediction block and the second prediction block.
In some embodiments, the method includes sending information of the first pattern and the second pattern for partitioning the block to an encoding device in a block-based motion prediction video system. In some embodiments, transmitting the information of the first pattern and the second pattern is performed at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit, (6) a maximum coding unit level, (7) a coding unit level, (8) a prediction unit level, (9) a tree unit level, or (10) a region level.
In some embodiments, determining the prediction block comprises: applying a first set of weights to the first inter prediction block to obtain a first weighted prediction block; applying a second set of weights to the second inter prediction block to obtain a second weighted prediction block; and calculating a weighted sum of the first weighted prediction block and the second weighted prediction block to obtain the prediction block.
In some embodiments, the first set of weights or the second set of weights comprises fixed weight values. In some embodiments, the first set of weights or the second set of weights is determined based on information from another block that is temporally or spatially adjacent to the block. In some embodiments, the first set of weights or the second set of weights is determined using a coding algorithm used to generate the first inter prediction block or the second inter prediction block. In some implementations, at least one value of the first set of weights is different from another value of the first set of weights. In some implementations, at least one value of the second set of weights is different from another value of the second set of weights. In some implementations, the sum of the weights is equal to a power of two.
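As a rough illustration of this weighted combination, and of why a power-of-two weight sum is convenient, the sketch below uses an assumed 3:1/1:3 weight pattern whose sum is 4, so that normalization reduces to a rounded right shift; the particular weight values are not taken from the text.

```python
import numpy as np

def combine(p0, p1, w0, w1):
    """Weighted sum of two intermediate prediction blocks.

    w0 and w1 are per-sample integer weights; their sum must be a constant
    power of two so the normalization can be a rounded right shift."""
    total = int((w0 + w1).flat[0])
    assert np.all(w0 + w1 == total) and total & (total - 1) == 0
    shift = total.bit_length() - 1
    rounding = (1 << shift) >> 1
    acc = p0.astype(np.int64) * w0 + p1.astype(np.int64) * w1
    return (acc + rounding) >> shift

if __name__ == "__main__":
    p0 = np.full((4, 4), 100, dtype=np.int64)
    p1 = np.full((4, 4), 120, dtype=np.int64)
    # Weight the first pattern more strongly away from its sub-block boundaries.
    w0 = np.array([[1, 1, 1, 1],
                   [1, 3, 3, 1],
                   [1, 3, 3, 1],
                   [1, 1, 1, 1]], dtype=np.int64)
    w1 = 4 - w0
    print(combine(p0, p1, w0, w1))
```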
In some embodiments, the method includes transmitting the weights to an encoding device in a block-based motion prediction video system. In some embodiments, transmitting the weights is performed at one of: (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit, (6) a maximum coding unit level, (7) a coding unit level, (8) a prediction unit level, (9) a tree unit level, or (10) a region level.
Fig. 15B is an example flow diagram of a method 1550 of improving block-based motion prediction in a video system in accordance with the disclosed techniques. The method 1550 includes, at 1552, selecting a set of pixels from a video frame to form a block. The method 1550 includes, at 1554, dividing the block into a plurality of sub-blocks based on a size of the block or on information of another block spatially or temporally adjacent to the block, where at least one of the plurality of sub-blocks has a size different from that of the other sub-blocks. The method 1550 further includes, at 1556, generating a motion vector prediction by applying an encoding algorithm to the plurality of sub-blocks. In some embodiments, the encoding algorithm includes at least one of (1) an affine prediction method, (2) an alternative temporal motion vector prediction (ATMVP) method, (3) a spatial-temporal motion vector prediction (STMVP) method, (4) a bi-directional optical flow (BIO) method, or (5) a frame rate up-conversion (FRUC) method.
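A minimal sketch of the adaptive partitioning in method 1550 follows; the size thresholds, the neighbor-based rule, and the policy of letting trailing sub-blocks be smaller are all illustrative assumptions rather than requirements of the text.

```python
def choose_subblock_size(block_w, block_h, neighbour_is_affine=False):
    """Pick a base sub-block size from the block dimensions (illustrative policy)."""
    if min(block_w, block_h) <= 8:
        return 2                      # small blocks: finer sub-blocks
    return 8 if neighbour_is_affine else 4

def partition_non_uniform(block_w, block_h, sub):
    """Split the block into sub-blocks; trailing ones may be smaller."""
    subs = []
    for y in range(0, block_h, sub):
        for x in range(0, block_w, sub):
            subs.append((y, x, min(sub, block_h - y), min(sub, block_w - x)))
    return subs

if __name__ == "__main__":
    sub = choose_subblock_size(14, 10)
    blocks = partition_non_uniform(14, 10, sub)
    print(sub, len(blocks), blocks[-1])  # 4 12 (8, 12, 2, 2)
```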
As further described herein, the encoding process may avoid checking the affine mode for blocks split from a parent block when the parent block itself is encoded with a mode other than the affine mode.
table 1 illustrates example performance results using conventional 2x2 affine prediction for Random Access (RA) configurations.
Table 1 Example test results of 2x2 affine prediction
Class Y U V EncT DecT
Class A1 -0.11% -0.18% -0.09% 139% 111%
Class A2 -0.9% -0.85% -0.68% 142% 125%
Class B -0.58% -0.51% -0.67% 136% 114%
Class C -0.26% -0.24% -0.24% 129% 108%
Class D -0.54% -0.52% -0.53% 130% 118%
Class F -0.89% -1.02% -0.97% 125% 108%
Overall -0.47% -0.44% -0.44% 136% 114%
Table 2 illustrates example performance results from applying interleaved prediction to unidirectional prediction in accordance with embodiments of the present technique. Table 3 illustrates example performance results from applying interleaved prediction to bi-directional prediction in accordance with embodiments of the present technique.
Table 2 Example test results of interleaved prediction in unidirectional prediction
Class Y U V EncT DecT
Class A1 -0.05% -0.14% -0.02% 101% 100%
Class A2 -0.55% -0.17% -0.11% 102% 101%
Class B -0.33% -0.17% -0.20% 101% 101%
Class C -0.15% -0.16% -0.04% 100% 100%
Class D -0.21% -0.09% -0.02% 106% 106%
Class F -0.39% -0.40% -0.39% 102% 102%
Overall -0.27% -0.16% -0.11% 101% 101%
Table 3 Example test results of interleaved prediction in bi-directional prediction
Class Y U V EncT DecT
Class A1 -0.09% -0.18% -0.12% 103% 102%
Class A2 -0.74% -0.40% -0.28% 104% 104%
Class B -0.37% -0.39% -0.35% 103% 102%
Class C -0.22% -0.19% -0.13% 102% 102%
Class D -0.42% -0.28% -0.32% 103% 102%
Class F -0.60% -0.64% -0.62% 102% 102%
Overall -0.38% -0.30% -0.23% 103% 102%
As shown in Tables 2 and 3, interleaved prediction achieves most of the coding gain of conventional 2x2 affine prediction at much lower complexity. In particular, interleaved prediction applied to bi-directional prediction obtains a coding gain of 0.38%, compared to 0.47% for the 2x2 affine method, while the encoding and decoding times are 103% and 102%, respectively, compared to 136% and 114% for the 2x2 affine method.
FIG. 16 is a schematic diagram illustrating an example of a structure of a computer system or other control device 1600 that may be used to implement portions of the disclosed technology. In FIG. 16, the computer system 1600 includes one or more processors 1605 and memory 1610 connected by an interconnect 1625. The interconnect 1625 may represent any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers. Thus, the interconnect 1625 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or Industry Standard Architecture (ISA) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as "Firewire").
The processor 1605 may include a Central Processing Unit (CPU) to control overall operation of the host, for example. In some embodiments, the processor 1605 accomplishes this by executing software or firmware stored in the memory 1610. The processor 1605 may be or include one or more programmable general or special purpose microprocessors, Digital Signal Processors (DSPs), programmable controllers, Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), and the like, or a combination of such devices.
The memory 1610 may be or include the main memory of a computer system. Memory 1610 represents any suitable form of Random Access Memory (RAM), Read Only Memory (ROM), flash memory, etc., or combination of these devices. In use, memory 1610 may contain, among other things, a set of machine instructions that, when executed by processor 1605, cause processor 1605 to perform operations to implement embodiments of the disclosed technology.
Also connected to the processor 1605 by an interconnect 1625 is an (optional) network adapter 1615. Network adapter 1615 provides computer system 1600 with the ability to communicate with remote devices, such as storage clients and/or other storage servers, and may be, for example, an ethernet adapter or a fibre channel adapter.
FIG. 17 illustrates a block diagram of an example embodiment of a mobile device 1700 that may be used to implement portions of the disclosed technology. Mobile device 1700 may be a laptop, smartphone, tablet, camera, or other device capable of processing video. The mobile device 1700 includes a processor or controller 1701 to process data and a memory 1702 in communication with the processor 1701 to store and/or buffer data. For example, the processor 1701 may include a Central Processing Unit (CPU) or a microcontroller unit (MCU). In some implementations, the processor 1701 may include a Field Programmable Gate Array (FPGA). In some implementations, the mobile device 1700 includes or communicates with a Graphics Processing Unit (GPU), a Video Processing Unit (VPU), and/or a wireless communication unit to implement various visual and/or communication data processing functions of the smartphone device. For example, the memory 1702 may include and store processor-executable code that, when executed by the processor 1701, configures the mobile device 1700 to perform various operations, such as receiving information, commands and/or data, processing information and data, and transmitting or providing processed information/data to another data device, such as an actuator or external display. To support various functions of the mobile device 1700, the memory 1702 may store information and data, such as instructions, software, values, images, and other data processed or referenced by the processor 1701. For example, the storage functionality of memory 1702 may be implemented using various types of Random Access Memory (RAM) devices, Read Only Memory (ROM) devices, flash memory devices, and other suitable storage media. In some implementations, the mobile device 1700 includes an input/output (I/O) unit 1703 to interface the processor 1701 and/or memory 1702 with other modules, units, or devices. For example, the I/O unit 1703 may interface with the processor 1701 and the memory 1702 to utilize various wireless interfaces compatible with typical data communication standards, for example, between one or more computers and user equipment in the cloud. In some implementations, mobile device 1700 may interface with other devices through I/O unit 1703 using a wired connection. Mobile device 1700 may also be connected to other external interfaces (e.g., data storage) and/or visual or audio display devices 1704 to retrieve and transmit data and information which may be processed by a processor, stored by a memory, or displayed on display device 1704 or an output unit of an external device. For example, the display device 1704 may display a video frame modified based on MVP (e.g., a video frame including the prediction block 1305 as shown in fig. 13) in accordance with the disclosed techniques.
In some embodiments, a video decoder device may implement a video decoding method in which video decoding is performed using the improved block-based motion prediction described herein. The method may include forming a video block using a set of pixels from a video frame. The block may be partitioned into a first set of sub-blocks according to a first pattern, and a first intermediate prediction block may be generated based on the first set of sub-blocks. The block may also be partitioned into a second set of sub-blocks according to a second pattern, where at least one sub-block in the second set has a size different from a size of a sub-block in the first set. The method may then determine a prediction block based on the first intermediate prediction block and a second intermediate prediction block generated from the second set of sub-blocks. Other features of the method may be similar to the method 1500 described above.
In some embodiments, a decoder-side method of video decoding may improve the predicted video quality by using block-based motion prediction on blocks of a video frame, where a block corresponds to a set of pixels. A block may be divided into a plurality of sub-blocks based on the size of the block or on information from another block spatially or temporally adjacent to the block, wherein at least one sub-block of the plurality of sub-blocks has a size different from that of the other sub-blocks. The decoder may then use a motion vector prediction generated by applying an encoding algorithm to the plurality of sub-blocks. Other features of the method are described with reference to Fig. 15B and the corresponding description.
In some embodiments, the video decoding method may be implemented using a decoding apparatus deployed on a hardware platform as described with respect to Fig. 16 and Fig. 17.
Fig. 8 is a flow diagram of an example method 800 of video encoding or decoding. The method 800 includes determining (802) that interleaved prediction is to be applied to a block because the block satisfies a condition. The method 800 includes determining (804) a prediction block based on a first intermediate prediction block and a second intermediate prediction block. The method 800 includes generating (806) an encoded or decoded representation of the block using the prediction block. For example, a video encoder or transcoder may perform the encoding at 806, and a video decoder may generate the decoded representation at 806. The first intermediate prediction block is generated from a first group of sub-blocks obtained by partitioning the block according to a first pattern, the second intermediate prediction block is generated from a second group of sub-blocks obtained by partitioning the block according to a second pattern, and at least one sub-block in the second group has a size different from that of the sub-blocks in the first group.
In some embodiments, the condition satisfied by the block is that the block is encoded using bi-directional predictive coding. In some embodiments, the condition satisfied by the block is that the block is predicted using multi-hypothesis prediction, and the interleaved prediction is applied to the prediction directions available for multiple reference blocks. In some embodiments, the block is partitioned into the first group of sub-blocks and into the second group of sub-blocks for motion prediction of the block in a first direction. In some embodiments, the encoded representation may include information of the first pattern and the second pattern.
In some embodiments, the condition includes encoding the block using bi-directional prediction instead of uni-directional prediction, wherein bi-directional prediction is based on previous and subsequent video frames and uni-directional prediction is based only on a previous or a subsequent video frame. In some embodiments, the condition is that the block is bi-directionally predicted, and the first intermediate prediction block is generated from the first group of sub-blocks using a first reference list of the block while the second intermediate prediction block is generated from the second group of sub-blocks using a second reference list of the block. In some embodiments, the condition is that the block is uni-directionally predicted, and the first intermediate prediction block is generated from the first group of sub-blocks using a reference list L0 or L1 of the block while the second intermediate prediction block is generated from the second group of sub-blocks using a reference list L0 or L1 of the block. In the bi-directionally predicted case, the method may also include generating one or more third intermediate prediction blocks from one or more third groups of sub-blocks of the block using the first reference list, and generating one or more fourth intermediate prediction blocks from one or more fourth groups of sub-blocks of the block using the second reference list, wherein the prediction block is determined based on the one or more third intermediate prediction blocks and/or the one or more fourth intermediate prediction blocks.
In some embodiments, the condition is that the block is a multi-hypothesis coded block. The method may also include generating one or more additional intermediate prediction blocks from one or more additional groups of sub-blocks of the block using the reference list used by the block, wherein the prediction block is determined based on the one or more additional intermediate prediction blocks.
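The condition-gated application described for method 800 might look roughly like the sketch below, in which interleaved prediction is applied only to bi-predicted or multi-hypothesis blocks, and each reference list is interleaved with its own pair of partition patterns; the data-structure fields, the stubbed prediction function, and the simple averaging are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Block:
    ref_lists: List[int] = field(default_factory=lambda: [0])  # e.g. [0], [1], or [0, 1]
    hypotheses: int = 1                                         # >2 for multi-hypothesis

def interleaving_allowed(blk: Block) -> bool:
    """Condition check: the block is bi-predicted or multi-hypothesis predicted."""
    return len(blk.ref_lists) == 2 or blk.hypotheses > 2

def predict_with_pattern(ref_list: int, pattern: int, h=8, w=8) -> np.ndarray:
    """Stub: one intermediate prediction block from one reference list and one pattern."""
    return np.full((h, w), 100.0 + 10 * ref_list + pattern)

def predict_block(blk: Block) -> np.ndarray:
    if not interleaving_allowed(blk):
        # Fall back to ordinary sub-block prediction with a single pattern.
        return predict_with_pattern(blk.ref_lists[0], pattern=0)
    per_list = []
    for idx, ref_list in enumerate(blk.ref_lists):
        # One pair of partition patterns per reference list.
        p0 = predict_with_pattern(ref_list, 2 * idx)
        p1 = predict_with_pattern(ref_list, 2 * idx + 1)
        per_list.append((p0 + p1) / 2)
    return sum(per_list) / len(per_list)

if __name__ == "__main__":
    print(predict_block(Block(ref_lists=[0])).mean())     # uni-predicted: no interleaving
    print(predict_block(Block(ref_lists=[0, 1])).mean())  # bi-predicted: interleaved per list
```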
Other features and variations of method 800 may be similar to those described with reference to fig. 15A and 15B.
FIG. 19 shows a functional block diagram of an example apparatus 1900 that implements the interleaved prediction techniques disclosed herein. For example, the apparatus 1900 may be a video encoder or transcoder that receives the video 1902. The received video 1902 may be in compressed or uncompressed form, and may be received over a network interface or from a storage device. The video 1902 (in either form) may correspond to video frames of a certain size. The apparatus 1900 may perform pre-processing 1904 operations on the video 1902. The pre-processing 1904 is optional and may include operations such as decryption, color space conversion, quality enhancement filtering, and the like. The encoder 1906 can convert the video 1902 into an encoded representation that can be selectively post-processed by the post-processing block 1910 to produce the output video. For example, the encoder 1906 may perform interleaved prediction on blocks of the video 1902. A block may represent a video region of any size, but is typically selected to have fixed horizontal and vertical dimensions in pixels (e.g., 128x128 or 16x16). In some cases, a block may represent a coding unit. Optional post-processing may include filtering, encryption, packaging, and so on. The output video may be stored on a storage device or transmitted over a network interface.
From the foregoing, it will be appreciated that specific embodiments of the disclosed technology have been described herein for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the disclosed technology is not limited except as by the appended claims.
The embodiments, modules, and functional operations disclosed herein and otherwise described may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed herein and structural equivalents thereof, or in combinations of one or more of them. The disclosed embodiments and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or groups of computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. Propagated signals are artificially generated signals, e.g., machine-generated electrical, optical, or electromagnetic signals, that are generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks. However, a computer does not necessarily have such devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM discs. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (34)

1. A method of processing a video block using interleaved prediction, the method comprising:
determining a prediction block for a block based on a first inter prediction block and a second inter prediction block as the block satisfies one or more conditions; and
Generating an encoded or decoded representation of the block using the prediction block;
Wherein the first inter-prediction block is generated from a first set of sub-blocks into which the block is partitioned according to a first partition mode, and the second inter-prediction block is generated from a second set of sub-blocks into which the block is partitioned according to a second partition mode,
Wherein the first partitioning pattern is different from the second partitioning pattern.
2. The method of claim 1, wherein the condition is that the block is uni-directionally predicted, and wherein the first inter prediction block is generated from the first set of sub-blocks and the second inter prediction block is generated from the second set of sub-blocks from a same reference picture.
3. The method of claim 1, wherein the condition that the block satisfies is that the block uses bi-predictive coding.
4. The method of claim 1, wherein the condition comprises encoding the block using bi-prediction instead of uni-prediction, wherein the bi-prediction is based on reference list L0 and reference list L1, and the uni-prediction is based only on reference list L0 or reference list L1.
5. The method of claim 1, wherein the condition is that the block is bi-predicted, and wherein the first inter-prediction block is generated from the first set of sub-blocks using a first reference list for the block, and the second inter-prediction block is generated from the second set of sub-blocks using a second reference list for the block.
6. The method of claim 1, wherein the condition satisfied by the block is that the block is predicted using multi-hypothesis prediction, and wherein the interleaved prediction is applied to prediction directions available for multiple reference blocks.
7. The method of claim 1, wherein the condition is that the block is bi-predicted, and wherein the first inter prediction block of the block is generated from the first set of sub-blocks using a first reference list and the second inter prediction block of the block is generated from the second set of sub-blocks using a second reference list, the method further comprising:
Generating one or more third inter-prediction blocks from one or more third groups of sub-blocks of the block using the first reference list;
Generating one or more fourth inter-prediction blocks from one or more fourth groups of sub-blocks of the block using the second reference list;
Wherein the prediction block is determined based on the one or more third inter prediction blocks and/or the one or more fourth inter prediction blocks.
8. The method of claim 1, wherein the condition is that the block is a multi-hypothesis coded block, the method further comprising:
Generating one or more additional inter-prediction blocks from one or more additional sets of sub-blocks of the block using a reference list used by the block; and is
Wherein the prediction block is determined based on the one or more additional inter-prediction blocks.
9. The method of claim 8, wherein the prediction block is determined as an equally weighted sum of the first inter prediction block, the second inter prediction block, and the one or more additional inter prediction blocks.
10. The method of claim 1, wherein partitioning the block into a first set of sub-blocks and partitioning the block into a second set of sub-blocks are performed for motion prediction of the block in a first direction.
11. The method of any of claims 1 to 10, further comprising:
including information of the first partition mode and the second partition mode in the encoded representation.
12. The method of claim 11, wherein the information of the first partition mode and the second partition mode is included at one of (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a coding tree unit, (6) a maximum coding unit level, (7) a coding unit level, (8) a prediction unit level, (9) a tree unit level, or (10) a region level.
13. The method of any of claims 1-12, wherein the encoded representation includes information related to the interleaved prediction at one of (1) a sequence level, (2) a picture level, (3) a view level, (4) a slice level, (5) a Coding Tree Unit (CTU), (6) a Largest Coding Unit (LCU) level, (7) a Coding Unit (CU) level, (8) a Prediction Unit (PU) level, (9) a Tree Unit (TU) level, (10) a slice level, (11) a slice group level, or (12) a region level that may include multiple CUs/PUs/TUs/LCUs.
14. The method of any of claims 1-13, wherein the encoded representation includes information regarding whether and how to apply the interleaved prediction in an initial block of a Sequence Parameter Set (SPS), a View Parameter Set (VPS), a Picture Parameter Set (PPS), a Slice Header (SH), a picture header, a sequence header, a slice level, a slice group level, or a region, wherein the information comprises a flag that is selectively included based on the condition of the block.
15. The method of claim 14, wherein the flag is ignored when encoding the block using a sub-block prediction method.
16. The method of claim 15, wherein the sub-block prediction method is an alternative temporal motion vector prediction method.
17. The method of claim 15, wherein the sub-block prediction method is an affine prediction method.
18. The method of claim 15, wherein the sub-block prediction method is a frame rate up-conversion method.
19. The method of claim 15, wherein the sub-block prediction method is a bi-directional optical flow method.
20. The method of claim 15, wherein the sub-block prediction method is a space-time motion vector prediction method.
21. The method of claim 14, comprising: signaling a flag indicating whether to use the interleaved prediction if the block is affine coded.
22. The method of claim 14, wherein a flag indicating whether to use the interleaved prediction is not signaled.
23. The method of any of claims 1-22, comprising: inheriting a flag indicating whether to use the interleaved prediction for the block from previous coding information in the coded representation.
24. The method of claim 23, comprising: inheriting the flag from a neighboring block from which the affine model is inherited.
25. The method of claim 24, comprising: inheriting the flag of a predefined neighboring block, the predefined neighboring block comprising a left neighboring block or an above neighboring block.
26. The method of claim 23, comprising: inheriting the flag from the first encountered affine-coded neighboring block.
27. The method of claim 23, comprising: inferring that the interleaved prediction is not used when no neighboring block is affine coded.
28. The method of claim 23, comprising: inheriting the flag when the block uses uni-directional prediction.
29. The method of claim 23, comprising: inheriting the flag when the block and a neighboring block from which the flag is inherited are located in the same CTU.
30. The method of claim 23, comprising: inheriting the flag when the current block and the neighboring block from which the flag is inherited are located in the same CTU row.
31. The method of any of claims 1-30, wherein the conditions include a width and a height of the block.
32. The method of claim 1, wherein the condition comprises encoding another block of the video frame without using the block.
33. A video processing apparatus comprising a processor configured to implement the method of any one or more of claims 1 to 32.
34. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any of claims 1-32.
CN201910468418.8A 2018-05-31 2019-05-31 Application of interleaved prediction Active CN110557639B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2018089242 2018-05-31
CNPCT/CN2018/089242 2018-05-31
CNPCT/CN2019/070058 2019-01-02
CN2019070058 2019-01-02

Publications (2)

Publication Number Publication Date
CN110557639A true CN110557639A (en) 2019-12-10
CN110557639B CN110557639B (en) 2022-09-02

Family

ID=67145838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910468418.8A Active CN110557639B (en) 2018-05-31 2019-05-31 Application of interleaved prediction

Country Status (3)

Country Link
CN (1) CN110557639B (en)
TW (1) TW202005388A (en)
WO (1) WO2019229682A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11025951B2 (en) * 2019-01-13 2021-06-01 Tencent America LLC Method and apparatus for video coding

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6807231B1 (en) * 1997-09-12 2004-10-19 8×8, Inc. Multi-hypothesis motion-compensated video image predictor
CN101626505A (en) * 2008-07-11 2010-01-13 浙江大学 Mode processing method and device thereof of duplex-prediction
CN105103554A (en) * 2013-03-28 2015-11-25 华为技术有限公司 Method for protecting video frame sequence against packet loss
CN108028933A (en) * 2015-09-10 2018-05-11 三星电子株式会社 Video coding and decoding method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FLIERL: "Multihypothesis Pictures for H.26L", International Conference on Image Processing *

Also Published As

Publication number Publication date
WO2019229682A1 (en) 2019-12-05
CN110557639B (en) 2022-09-02
TW202005388A (en) 2020-01-16

Similar Documents

Publication Publication Date Title
CN110557640B (en) Weighted interleaved prediction
CN110581997B (en) Motion vector precision refinement
CN112913249A (en) Simplified coding and decoding of generalized bi-directional prediction index
CN110740321B (en) Motion prediction based on updated motion vectors
CN112997493A (en) Construction method for single-type motion candidate list
CN110944183B (en) Prediction using non-sub-block spatial motion vectors in inter mode
CN113906759A (en) Syntax-based motion candidate derivation in sub-block Merge mode
CN110662076B (en) Boundary enhancement of sub-blocks
WO2020039407A1 (en) Overlapped block motion compensation using spatial neighbors
CN111131822A (en) Overlapped block motion compensation with motion information derived from neighborhood
CN110876063B (en) Fast coding method for interleaving prediction
US11949874B2 (en) Image encoding/decoding method and device for performing prof, and method for transmitting bitstream
CN110557639B (en) Application of interleaved prediction
CN110662073B (en) Boundary filtering of sub-blocks
CN110876064A (en) Partially interleaved prediction
KR102513585B1 (en) Inter prediction method and apparatus in video processing system
US20230388484A1 (en) Method and apparatus for asymmetric blending of predictions of partitioned pictures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant