CN113557720A - Adaptive weights in multi-hypothesis prediction in video coding - Google Patents

Adaptive weights in multi-hypothesis prediction in video coding

Info

Publication number
CN113557720A
CN113557720A
Authority
CN
China
Prior art keywords
block
weights
intra
inter
prediction
Prior art date
Legal status
Pending
Application number
CN202080019945.1A
Other languages
Chinese (zh)
Inventor
朱维佳
许继征
张莉
张凯
刘鸿彬
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc
Publication of CN113557720A

Classifications

    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/463 Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Adaptive weights in multi-hypothesis prediction in video coding are disclosed. In an example implementation, for a conversion between a first block of video and a bitstream representation of the first block of video, it is determined whether use of updated weights in a Combined Intra and Inter Prediction (CIIP) mode is enabled or disabled for the conversion; in response to determining that updating of the weights in CIIP mode is enabled, the weights applied to a portion of the pixels of the first block in CIIP mode are updated; and the conversion is performed based on the updated weights and the non-updated weights.

Description

Adaptive weights in multi-hypothesis prediction in video coding
Cross Reference to Related Applications
The present application claims priority to and the benefit of international patent application No. PCT/CN2019/077838, filed on March 12, 2019, in accordance with applicable patent laws and/or the Paris Convention. The entire disclosure of international patent application No. PCT/CN2019/077838 is incorporated herein by reference as part of the disclosure of the present application.
Technical Field
This patent application document relates to video encoding and decoding techniques, devices and systems.
Background
Despite advances in video compression, digital video still accounts for the largest share of bandwidth usage on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the demand for bandwidth for digital video usage is expected to continue to increase.
Disclosure of Invention
This document describes various embodiments and techniques for using multi-hypothesis prediction in video coding.
In one example aspect, various methods for using multi-hypothesis prediction in video coding are disclosed.
In another example aspect, a method of video processing is disclosed. The method includes updating weights used for the combined inter and intra prediction mode during a conversion between a current video block and a bitstream representation of the current video block, and performing the conversion based on the updating.
In another example aspect, a method of video processing is disclosed. The method includes: determining, for a conversion between a first block of video and a bitstream representation of the first block of video, whether use of updated weights in a Combined Intra and Inter Prediction (CIIP) mode is enabled or disabled for the conversion; in response to determining that updating of the weights in the CIIP mode is enabled, updating the weights applied to a portion of the pixels of the first block in the CIIP mode; and performing the conversion based on the updated weights and the non-updated weights.
In another exemplary aspect, a method of video processing is disclosed. The method includes: determining, for a conversion between a first block of video and a bitstream representation of the first block of video, a set of weights from multiple sets of weights to be used in a Combined Intra and Inter Prediction (CIIP) mode, the determining depending on a message present at at least one of: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU); applying the CIIP mode based on the determined set of weights to generate a final prediction for the first block; and performing the conversion based on the final prediction; wherein the final prediction of the first block is generated based on a weighted sum of an intermediate intra prediction and an intermediate merge inter prediction of the first block.
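For illustration only (this is not the claimed method itself), the following minimal Python sketch shows how a CIIP-style final prediction can be formed as a weighted sum of an intermediate intra prediction and an intermediate merge inter prediction; the function name, the particular weights, the precision shift and the 10-bit clipping are illustrative assumptions rather than values taken from this disclosure.

import numpy as np

def ciip_blend(intra_pred, inter_pred, w_intra, w_inter, shift=2):
    # Blend intermediate intra and inter predictions with integer weights:
    # final = (w_intra * intra + w_inter * inter + rounding) >> shift,
    # assuming w_intra + w_inter == (1 << shift). The actual weight sets and
    # any per-pixel update rule are defined by the embodiments in this document.
    assert intra_pred.shape == inter_pred.shape
    assert w_intra + w_inter == (1 << shift)
    rounding = 1 << (shift - 1)
    blended = (w_intra * intra_pred.astype(np.int32)
               + w_inter * inter_pred.astype(np.int32) + rounding) >> shift
    return np.clip(blended, 0, 1023)  # assuming 10-bit samples

# Example: equal weights over a 4x4 region
intra = np.full((4, 4), 512)
inter = np.full((4, 4), 256)
print(ciip_blend(intra, inter, w_intra=2, w_inter=2))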
In yet another example aspect, a video codec device configured to implement one of the above methods is disclosed.
In yet another example aspect, a video decoding apparatus configured to implement one of the above methods is disclosed.
In yet another aspect, a computer-readable medium is disclosed. The computer-readable medium has processor-executable code stored thereon for implementing one of the above methods.
These and other aspects are described herein.
Drawings
Fig. 1 shows an example derivation process for merge candidate list construction.
Fig. 2 shows an example of the positions of spatial merge candidates.
Fig. 3 shows an example of candidate pairs considered for redundancy checking of spatial merge candidates.
Fig. 4 shows an example of the location of the second PU for an N × 2N partition and a 2N × N partition.
Fig. 5 is an exemplary illustration of motion vector scaling of a temporal merge candidate.
Fig. 6 shows examples C0 and C1 of candidate positions of the time-domain merge candidate.
Fig. 7 shows an example of combining bi-directionally predicted merge candidates.
Fig. 8 summarizes the derivation of motion vector prediction candidates.
Fig. 9 shows an illustration of motion vector scaling of spatial motion vector candidates.
Fig. 10 shows neighboring samples used to derive IC parameters.
Fig. 11 shows a simplified affine motion model for (a) 4-parameter affine and (b) 6-parameter affine.
Fig. 12 is an example of affine MVF per sub-block.
Fig. 13 shows a 4-parameter affine model (a) and a 6-parameter affine model (b).
Fig. 14 shows MVPs for AF_INTER for inherited affine candidates.
Fig. 15 shows MVPs for AF_INTER for constructed affine candidates.
Fig. 16 shows candidates for AF_MERGE.
Fig. 17 shows candidate positions for affine merge mode.
Fig. 18 shows an example of the MMVD search process.
Fig. 19 shows an example of MMVD search points.
Fig. 20 shows DMVR based on two-sided template matching.
Fig. 21 is an example of MVDs (0,1) mirrored between list 0 and list 1 in a DMVR.
Fig. 22 shows MVs that can be examined in one iteration.
Fig. 23 shows an example of required reference samples with padding.
FIG. 24 illustrates an example apparatus for implementing the techniques described in this document.
FIG. 25 is a flow diagram of an example method of video processing.
FIG. 26 is a flow diagram of an example method of video processing.
FIG. 27 is a flow diagram of an example method of video processing.
In the description of the drawings, (a) and (b) refer to the left-hand and right-hand sides of the corresponding drawings.
Detailed Description
This document provides various techniques that may be used by a decoder of an image or video bitstream to improve the quality of decompressed or decoded digital video or images. For the sake of brevity, the term "video" is used herein to include both a sequence of pictures (conventionally referred to as video) and individual images. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
Section headings are used in this document to ease understanding and do not limit embodiments and techniques to corresponding sections. As such, embodiments from one section may be combined with embodiments from other sections.
1. Brief introduction
This patent document relates to video encoding and decoding techniques. In particular, it relates to candidate list construction in video coding. It may be applied to existing video coding standards such as HEVC, or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video coding standards or video codecs.
2. Preliminary discussion
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, many new methods have been adopted by JVET and put into a reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 Inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The use of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
When a CU is coded in skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta and no reference picture index. A merge mode is specified whereby the motion parameters of the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where the motion vector (more precisely, the Motion Vector Difference (MVD) compared to a motion vector predictor), the corresponding reference picture index for each reference picture list, and the reference picture list usage are signaled explicitly per PU. Such a mode is referred to in this disclosure as Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one block of samples. This is referred to as "uni-prediction". Uni-prediction is available for both P slices and B slices.
When the signaling indicates that both reference picture lists are to be used, the PU is generated from two blocks of samples. This is referred to as "bi-prediction". Bi-prediction is available only for B slices.
Details regarding the inter prediction modes specified in HEVC are provided below. The description starts with the merge mode.
2.1.1 reference Picture List
In HEVC, the term inter prediction is used to denote a prediction derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the currently decoded picture. As in H.264/AVC, a picture can be predicted from multiple reference pictures. The reference pictures used for inter prediction are organized in one or more reference picture lists. The reference index identifies which of the reference pictures in the list should be used to create the prediction signal.
A single reference picture List, List 0, is used for P slices, and two reference picture lists, List 0 and List 1, are used for B slices. It should be noted that the reference pictures contained in the List 0/1 may be from past pictures and future pictures in the capture/display order.
2.1.2 Merge mode in HEVC
2.1.2.1 merge mode candidate derivation
When predicting a PU using merge mode, an index pointing to an entry in the merge candidate list is parsed from the bitstream, and motion information is retrieved using the index. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial domain candidate derivation
Step 1.2: redundancy check for null field candidates
Step 1.3: time domain candidate derivation
Step 2: additional candidate insertions
Step 2.1: creation of bi-directional prediction candidates
Step 2.2: insertion of zero motion candidates
A schematic illustration of these steps is also given in fig. 1. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates located at five different positions. For temporal merge candidate derivation, at most one merge candidate is selected from two candidates. Since a constant number of candidates per PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is coded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.
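For illustration, the following Python sketch mirrors the construction order summarized above; the candidate-derivation helpers passed in are hypothetical placeholders, and the redundancy check is simplified to a full membership test, whereas the actual HEVC check only compares the pairs linked in fig. 3.

def build_merge_list(max_num_merge_cand, spatial_candidates, temporal_candidate,
                     make_combined_bipred, make_zero_candidate):
    # Sketch of the merge list construction order: spatial candidates with a
    # redundancy check, at most one temporal candidate, combined bi-predictive
    # candidates (B slices only), then zero motion candidates.
    merge_list = []
    # Step 1.1/1.2: spatial candidates (at most four are kept)
    for cand in spatial_candidates:
        if cand is not None and cand not in merge_list and len(merge_list) < 4:
            merge_list.append(cand)
    # Step 1.3: at most one temporal candidate
    if temporal_candidate is not None and len(merge_list) < max_num_merge_cand:
        merge_list.append(temporal_candidate)
    # Step 2.1: combined bi-predictive candidates
    while len(merge_list) < max_num_merge_cand:
        cand = make_combined_bipred(merge_list)
        if cand is None:
            break
        merge_list.append(cand)
    # Step 2.2: zero motion candidates fill the remaining entries
    while len(merge_list) < max_num_merge_cand:
        merge_list.append(make_zero_candidate(len(merge_list)))
    return merge_list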
Hereinafter, operations associated with the foregoing steps will be described in detail.
2.1.2.2 spatial domain candidate derivation
In the derivation of the spatial merge candidates, a maximum of four merge candidates are selected among candidates located at the positions shown in fig. 2. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU at position A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with arrows in fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, fig. 4 depicts the second PU for the N×2N and 2N×N cases, respectively. When the current PU is partitioned into N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would result in two prediction units having the same motion information, which is redundant to having just one PU within the coding unit. Similarly, position B1 is not considered when the current PU is partitioned into 2N×N.
2.1.2.3 time-domain candidate derivation
In this step, only one candidate is added to the list. In particular, in the derivation of the temporal merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture that has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the collocated PU is explicitly signaled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in fig. 5: it is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B slice, two motion vectors are obtained (one for reference picture list 0 and the other for reference picture list 1) and combined to make the bi-predictive merge candidate.
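The POC-distance scaling described above can be sketched as follows; the fixed-point form mirrors an HEVC-style implementation with constants quoted from memory, so it should be read as illustrative rather than normative.

def scale_temporal_mv(mv, tb, td):
    # Scale the collocated PU's MV by the POC distance ratio tb/td.
    # Conceptually scaled_mv = mv * tb / td; positive POC distances are
    # assumed here for simplicity.
    def clip3(lo, hi, v):
        return max(lo, min(hi, v))

    tx = (16384 + (abs(td) >> 1)) // td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    scaled = []
    for comp in mv:  # (mvx, mvy) in quarter-pel units
        v = dist_scale * comp
        sign = -1 if v < 0 else 1
        scaled.append(clip3(-32768, 32767, sign * ((abs(v) + 127) >> 8)))
    return tuple(scaled)

# Example: tb = 4 (current picture to its reference), td = 16 (collocated
# picture to its reference): the collocated MV (64, -32) scales to about (16, -8)
print(scale_temporal_mv((64, -32), tb=4, td=16))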
As shown in fig. 6, the position of the temporal candidate is selected between candidates C0 and C1 in the collocated PU (Y) belonging to the reference frame. If the PU at position C0 is not available, is intra coded, or is outside the current coding tree unit (CTU, also known as LCU, largest coding unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
Temporal motion vector prediction is also referred to as "TMVP".
2.1.2.4 additional candidate insertions
In addition to spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidates and zero merge candidates. Combined bi-predictive merge candidates are generated by utilizing the spatial and temporal merge candidates, and are used for B slices only. A combined bi-predictive candidate is generated by combining the first-reference-picture-list motion parameters of an initial candidate with the second-reference-picture-list motion parameters of another. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. As an example, fig. 7 depicts the case where two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate added to the final list (on the right). There are numerous rules regarding the combinations that are considered to generate these additional merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the merge candidate list and thus reach the MaxNumMergeCand capacity. These candidates have zero spatial displacement and reference picture indices that start at zero and increase each time a new zero motion candidate is added to the list. Finally, no redundancy check is performed on these candidates.
2.1.3 AMVP
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, and is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left and above spatially neighboring PU positions and of the temporal neighboring positions, removing redundant candidates, and adding zero vectors so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary code. The maximum value to be encoded in this case is 2 (see fig. 8). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
2.1.3.1 derivation of AMVP candidates
Fig. 8 shows an example derivation process of a motion vector prediction candidate.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions as shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different collocated positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
2.1.3.2 spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions shown in fig. 2; those positions are the same as those of the motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates: two cases that do not require spatial scaling and two cases that use spatial scaling. The four different cases are summarized as follows:
no spatial domain scaling
(1) Same reference picture list, and same reference picture index (same POC)
(2) Different reference picture lists, but the same reference picture (same POC)
Spatial domain scaling
(3) Same reference picture list, but different reference picture indices (different POCs)
(4) Different reference picture lists, and different reference pictures (different POCs)
The cases with no spatial scaling are checked first, followed by the cases with spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
In the spatial scaling process, the motion vectors of neighboring PUs are scaled in a similar manner as the temporal scaling, as shown in fig. 9. The main difference is that the reference picture list and index of the current PU are given as input; the actual scaling procedure is the same as the time domain scaling procedure.
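The four cases above can be condensed into a small classification helper; the argument names are hypothetical, and the POC values stand for the actual reference pictures.

def spatial_candidate_case(neigh_list, neigh_ref_poc, cur_list, cur_ref_poc):
    # Returns (case_number, needs_spatial_scaling) following the numbering above:
    # cases 1 and 2 point to the same reference picture (same POC, no scaling),
    # cases 3 and 4 point to a different reference picture (scaling required).
    same_list = (neigh_list == cur_list)
    same_poc = (neigh_ref_poc == cur_ref_poc)
    if same_poc:
        return (1, False) if same_list else (2, False)
    return (3, True) if same_list else (4, True)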
2.1.3.3 temporal motion vector candidates
All the procedures of derivation of temporal merge candidates are the same as those of derivation of spatial motion vector candidates (see fig. 6) except for reference picture index derivation. The reference picture index is signaled to the decoder.
2.2 Local Illumination Compensation in JEM
Local Illumination Compensation (LIC) is based on a linear model for illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded Coding Unit (CU).
Fig. 10 shows neighboring samples used to derive LIC parameters.
When LIC applies to a CU, a least-squares error method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as illustrated in fig. 10, sub-sampled (2:1 sub-sampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used.
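A minimal sketch of the least-squares derivation of a and b follows; it assumes the 2:1 sub-sampled neighboring samples described above and uses a floating-point closed form, whereas JEM's actual derivation uses an integer approximation of this solution.

import numpy as np

def derive_lic_params(cur_neighbors, ref_neighbors):
    # Least-squares fit of the LIC linear model: cur ~ a * ref + b
    x = np.asarray(ref_neighbors, dtype=np.float64).ravel()
    y = np.asarray(cur_neighbors, dtype=np.float64).ravel()
    n = x.size
    denom = n * np.dot(x, x) - x.sum() ** 2
    if denom == 0:
        return 1.0, 0.0  # fall back to the identity model
    a = (n * np.dot(x, y) - x.sum() * y.sum()) / denom
    b = (y.sum() - a * x.sum()) / n
    return a, b

def apply_lic(pred_block, a, b):
    # Apply the illumination compensation model to a prediction block
    return np.clip(a * np.asarray(pred_block, dtype=np.float64) + b, 0, 1023)  # 10-bit assumed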
2.2.1 derivation of prediction blocks
LIC parameters are derived and applied separately for each prediction direction. For each prediction direction, a first prediction block is generated with decoded motion information, after which a provisional prediction block is obtained by applying the LIC model. After that, a final prediction block is derived using the two provisional prediction blocks.
When a CU is coded in merge mode, the LIC flag is copied from neighboring blocks in a manner similar to motion information copying in merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC applies.
When LIC is enabled for a picture, an additional CU-level RD check is needed to determine if LIC is applied to a CU. When LIC is enabled for a CU, the sum of absolute differences with mean removed (MR-SAD) and the sum of absolute Hadamard transform differences with mean removed (MR-SATD) are used for integer pixel motion refinement and fractional pixel motion refinement, respectively, instead of SAD and SATD.
To reduce the coding complexity, the following coding scheme is applied in JEM.
● When there is no obvious illumination change between the current picture and its reference pictures, LIC is disabled for the entire picture. To identify this situation, histograms of the current picture and of every reference picture of the current picture are calculated at the encoder. If the histogram difference between the current picture and every reference picture of the current picture is smaller than a given threshold, LIC is disabled for the current picture; otherwise, LIC is enabled for the current picture.
2.3 Inter prediction methods in VVC
There are several new codec tools for inter-prediction improvement, such as adaptive motion vector difference resolution (AMVR) for signaling MVD, affine prediction mode, Triangle Prediction Mode (TPM), advanced TMVP (ATMVP, also known as SbTMVP), generalized bi-directional prediction (GBI), bi-directional optical flow (BIO).
2.3.1 Coding block structure in VVC
In VVC, a quadtree/binary tree/ternary tree (QT/BT/TT) structure is adopted to divide a picture into square or rectangular blocks.
Besides QT/BT/TT, a separate tree (also known as a dual coding tree) is also adopted in VVC for I frames. With the separate tree, the coding block structure is signaled separately for the luma and chroma components.
2.3.2 adaptive motion vector difference resolution
In HEVC, a Motion Vector Difference (MVD) (between the motion vector of a PU and the predicted motion vector) is signaled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. In VVC, a locally Adaptive Motion Vector Resolution (AMVR) is introduced. In VVC, the MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples (i.e., 1/4-pel, 1-pel, 4-pel). The MVD resolution is controlled at the Coding Unit (CU) level, and an MVD resolution flag is conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma sample MV precision is not used, another flag is signaled to indicate whether full-luma sample MV precision or four-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero, or when the flag is not coded for the CU (meaning that all MVDs within the CU are zero), a quarter-luma sample MV resolution is used for the CU. When the CU uses the full luma sample MV precision or the four luma sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
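A sketch of the MVP rounding step follows, assuming quarter-luma-sample MV storage; the round-half-away-from-zero rule shown is an illustrative choice, not the exact normative rounding.

def round_mv_to_precision(mv, shift):
    # shift: 0 for quarter-pel, 2 for integer-pel, 4 for four-pel MVD resolution,
    # with the MV components given in quarter-luma-sample units.
    if shift == 0:
        return mv
    offset = 1 << (shift - 1)

    def round_component(c):
        magnitude = ((abs(c) + offset) >> shift) << shift
        return magnitude if c >= 0 else -magnitude

    return tuple(round_component(c) for c in mv)

# Example: MVP (13, -7) in quarter-pel units, rounded for integer-pel MVD precision
print(round_mv_to_precision((13, -7), 2))  # -> (12, -8)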
2.3.3 affine motion compensated prediction
In HEVC, only the translational motion model is applied for Motion Compensated Prediction (MCP). In the real world, there are many kinds of movements, such as zoom-in/zoom-out, rotation, perspective movement, and other irregular movements. In VVC, a simplified affine transform motion compensated prediction is applied using a 4-parameter affine model and a 6-parameter affine model. As shown in fig. 13, the affine motion field of the block is described by two Control Point Motion Vectors (CPMV) of a 4-parameter affine model and 3 CPMV of a 6-parameter affine model.
Fig. 11 shows a simplified affine motion model of (a) 4-parameter affine and (b) 6-parameter affine.
The Motion Vector Field (MVF) of a block is described by the following equations, where the 4-parameter affine model is used in equation (1) (in which the 4 parameters are defined as the variables a, b, e and f) and the 6-parameter affine model is used in equation (2) (in which the 6 parameters are defined as the variables a, b, c, d, e and f):

mv^h(x, y) = a*x - b*y + e = ((mv_1^h - mv_0^h)/w)*x - ((mv_1^v - mv_0^v)/w)*y + mv_0^h
mv^v(x, y) = b*x + a*y + f = ((mv_1^v - mv_0^v)/w)*x + ((mv_1^h - mv_0^h)/w)*y + mv_0^v        (1)

mv^h(x, y) = a*x + c*y + e = ((mv_1^h - mv_0^h)/w)*x + ((mv_2^h - mv_0^h)/h)*y + mv_0^h
mv^v(x, y) = b*x + d*y + f = ((mv_1^v - mv_0^v)/w)*x + ((mv_2^v - mv_0^v)/h)*y + mv_0^v        (2)

wherein (mv_0^h, mv_0^v) is the motion vector of the top-left corner control point, (mv_1^h, mv_1^v) is the motion vector of the top-right corner control point, and (mv_2^h, mv_2^v) is the motion vector of the bottom-left corner control point; all three are referred to as Control Point Motion Vectors (CPMV). (x, y) represents the coordinate of a representative point relative to the top-left sample within the current block, and (mv^h(x, y), mv^v(x, y)) is the motion vector derived for the sample located at (x, y). The CP motion vectors may be signaled (as in affine AMVP mode) or derived on the fly (as in affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right shift with a rounding operation. In the VTM, the representative point is defined as the center position of a sub-block; for example, when the coordinate of the top-left corner of a sub-block relative to the top-left sample within the current block is (xs, ys), the coordinate of the representative point is defined as (xs+2, ys+2). For each sub-block (e.g., 4×4 in the VTM), the representative point is utilized to derive the motion vector of the whole sub-block.
To further simplify motion compensated prediction, sub-block based affine transform prediction is applied. To derive the motion vector of each M×N sub-block (both M and N are set to 4 in the current VVC), the motion vector of the center sample of each sub-block, as shown in fig. 12, is calculated according to equations (1) and (2) and rounded to 1/16 fractional accuracy. Then the motion compensation interpolation filters for 1/16-pel are applied to generate the prediction of each sub-block with the derived motion vector. The interpolation filters for 1/16-pel are introduced by the affine mode.
After MCP, the high precision motion vector of each sub-block is rounded and saved with the same precision as the normal motion vector.
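The per-sub-block MV derivation described above can be sketched as follows; MVs are handled as floats for readability, whereas the codec works in 1/16-pel fixed point with shifts, and the helper name is hypothetical.

def affine_subblock_mvs(cpmv, w, h, six_param=False, sb=4):
    # cpmv: [(mv0x, mv0y), (mv1x, mv1y)] for the 4-parameter model, plus
    # (mv2x, mv2y) for the 6-parameter model, as in equations (1) and (2).
    (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
    a = (mv1x - mv0x) / w
    b = (mv1y - mv0y) / w
    if six_param:
        mv2x, mv2y = cpmv[2]
        c = (mv2x - mv0x) / h
        d = (mv2y - mv0y) / h
    else:               # 4-parameter model
        c, d = -b, a
    mvs = {}
    for ys in range(0, h, sb):
        for xs in range(0, w, sb):
            x, y = xs + sb // 2, ys + sb // 2   # representative point: sub-block center
            mvx = a * x + c * y + mv0x
            mvy = b * x + d * y + mv0y
            mvs[(xs, ys)] = (round(mvx * 16) / 16, round(mvy * 16) / 16)  # 1/16-pel rounding
    return mvs

# Example: a 16x16 block with top-left CPMV (1, 0) and top-right CPMV (2, 1)
print(affine_subblock_mvs([(1.0, 0.0), (2.0, 1.0)], 16, 16))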
Signaling of affine predictions
Similarly to the translational motion model, there are also two modes for signaling the side information of affine prediction. They are the AFFINE_INTER and AFFINE_MERGE modes.
-AF_INTER mode
For CUs with both width and height larger than 8, the AF_INTER mode may be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used.
In this mode, for each reference picture list (List 0 or List 1), an affine AMVP candidate list is constructed with three types of affine motion predictors in the following order, where each candidate includes the estimated CPMVs of the current block. The differences between the best CPMVs found at the encoder side (such as mv_0, mv_1, mv_2 in fig. 15) and the estimated CPMVs are signaled. In addition, the index of the affine AMVP candidate from which the estimated CPMVs are derived is further signaled.
1) Inherited affine motion predictor
The checking order is similar to that of spatial MVPs in HEVC AMVP list construction. First, a left inherited affine motion predictor is derived from the first block in {A1, A0} that is affine coded and has the same reference picture as the current block. Second, an above inherited affine motion predictor is derived from the first block in {B1, B0, B2} that is affine coded and has the same reference picture as the current block. The five blocks A1, A0, B1, B0, B2 are depicted in fig. 16.
Once a neighboring block is found to be coded with affine mode, the CPMVs of the coding unit covering the neighboring block are used to derive predictors of the CPMVs of the current block. For example, if A1 is coded with a non-affine mode and A0 is coded with a 4-parameter affine mode, the left inherited affine MV predictor will be derived from A0. In this case, the CPMVs of the CU covering A0, i.e., the top-left CPMV and the top-right CPMV shown in fig. 16, are utilized to derive the estimated CPMVs of the current block for the top-left position (with coordinate (x0, y0)), the top-right position (with coordinate (x1, y1)) and the bottom-right position (with coordinate (x2, y2)) of the current block.
2) Constructed affine motion predictors
A constructed affine motion predictor consists of Control Point Motion Vectors (CPMVs) that are derived from neighboring inter-coded blocks having the same reference picture, as shown in fig. 15. If the current affine motion model is a 4-parameter affine, the number of CPMVs is 2; otherwise, if the current affine motion model is a 6-parameter affine, the number of CPMVs is 3. The top-left CPMV is derived from the MV at the first block in the group {A, B, C} that is inter coded and has the same reference picture as the current block. The top-right CPMV is derived from the MV at the first block in the group {D, E} that is inter coded and has the same reference picture as the current block. The bottom-left CPMV is derived from the MV at the first block in the group {F, G} that is inter coded and has the same reference picture as the current block.
- If the current affine motion model is a 4-parameter affine, a constructed affine motion predictor is inserted into the candidate list only if both the top-left CPMV and the top-right CPMV are found, that is, only if they can be used as the estimated CPMVs for the top-left position (with coordinate (x0, y0)) and the top-right position (with coordinate (x1, y1)) of the current block.
- If the current affine motion model is a 6-parameter affine, a constructed affine motion predictor is inserted into the candidate list only if the top-left, top-right and bottom-left CPMVs are all found, that is, only if they can be used as the estimated CPMVs for the top-left position (with coordinate (x0, y0)), the top-right position (with coordinate (x1, y1)) and the bottom-right position (with coordinate (x2, y2)) of the current block.
- No pruning process is applied when a constructed affine motion predictor is inserted into the candidate list.
3) Normal AMVP motion predictors
The following applies until the number of affine motion predictors reaches the maximum:
1) Derive an affine motion predictor by setting all CPMVs equal to the constructed bottom-left CPMV, if available.
2) Derive an affine motion predictor by setting all CPMVs equal to the constructed top-right CPMV, if available.
3) Derive an affine motion predictor by setting all CPMVs equal to the constructed top-left CPMV, if available.
4) Derive an affine motion predictor by setting all CPMVs equal to the HEVC TMVP, if available.
5) Derive an affine motion predictor by setting all CPMVs to the zero MV.
Note that the constructed CPMVs referred to here have already been derived for the constructed affine motion predictors.
Fig. 14 shows MVPs for AF_INTER for inherited affine candidates.
Fig. 15 shows MVPs for AF_INTER for constructed affine candidates.
In AF_INTER mode, when the 4/6-parameter affine mode is applied, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in fig. 15. It is proposed to derive the MVs in the following way, i.e., mvd_1 and mvd_2 are predicted from mvd_0:

mv_0 = mvp_0 + mvd_0
mv_1 = mvp_1 + mvd_1 + mvd_0
mv_2 = mvp_2 + mvd_2 + mvd_0

wherein mvp_i, mvd_i and mv_i are the predicted motion vector, the motion vector difference and the motion vector of the top-left pixel (i = 0), the top-right pixel (i = 1) or the bottom-left pixel (i = 2), respectively, as shown in fig. 15. Note that the addition of two motion vectors (e.g., mvA(xA, yA) and mvB(xB, yB)) is equal to the sum of the two components separately; that is, newMV = mvA + mvB, and the two components of newMV are set to (xA + xB) and (yA + yB), respectively.
-AF_MERGE mode
When a CU is applied in AF_MERGE mode, it gets the first block coded with affine mode from the valid neighboring reconstructed blocks. The selection order for the candidate blocks is from left, above, above-right, bottom-left to above-left, as shown in fig. 16 (denoted in order by A, B, C, D, E). For example, if the neighboring bottom-left block is coded in affine mode (as denoted by A0 in fig. 16), the Control Point (CP) motion vectors mv_0^N, mv_1^N and mv_2^N of the top-left, top-right and bottom-left corners of the neighboring CU/PU that contains block A are fetched. Based on mv_0^N, mv_1^N and mv_2^N, the motion vectors mv_0^C, mv_1^C and mv_2^C of the top-left/top-right/bottom-left corners of the current CU/PU are calculated (mv_2^C is used only for the 6-parameter affine model). It should be noted that in VTM-2.0, if the current block is affine coded, the sub-block located in the top-left corner (e.g., the 4×4 block in VTM) stores mv0 and the sub-block located in the top-right corner stores mv1. If the current block is coded with the 6-parameter affine model, the sub-block located in the bottom-left corner stores mv2; otherwise (with the 4-parameter affine model), LB (the bottom-left sub-block) stores mv2'. The other sub-blocks store the MVs used for MC.
After the CPMVs of the current CU, mv_0^C, mv_1^C and mv_2^C, are derived, the MVF of the current CU is generated according to the simplified affine motion model in equations (1) and (2). In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in affine mode.
The affine merge candidate list is constructed by the following steps:
1) inserting inherited affine candidates
Inherited affine candidates are candidates that are derived from the affine motion model of a valid affine-coded neighboring block. At most two inherited affine candidates, derived from the affine motion models of the neighboring blocks, are inserted into the candidate list. For the left predictor, the scan order is {A0, A1}; for the above predictor, the scan order is {B0, B1, B2}.
2) Insert constructed affine candidates
If the number of candidates in the affine merge candidate list is less than MaxNumAffineCand (e.g., 5), constructed affine candidates are inserted into the candidate list. A constructed affine candidate means that the candidate is constructed by combining the neighboring motion information of each control point.
a) The motion information for the control points is first derived from the specified spatial neighbors and temporal neighbor shown in fig. 17. CPk (k = 1, 2, 3, 4) represents the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are spatial positions for predicting CPk (k = 1, 2, 3); T is the temporal position for predicting CP4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
The motion information of each control point is obtained according to the following priority order:
for CP1, the check priority is B2- > B3- > a 2. If B2 is available, then B2 applies. Otherwise, if B2 is not available, then B3 is used. If neither B2 nor B3 is available, then A2 is used. If all three candidates are not available, then no motion information for CP1 is available.
For CP2, check priority is B1- > B0.
For CP3, check priority is a1- > a 0.
For CP4, T is used.
b) Next, affine merge candidates are constructed using combinations of control points.
I. Motion information of three control points is required to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations: {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}. The combinations {CP1, CP2, CP4}, {CP2, CP3, CP4}, {CP1, CP3, CP4} will be converted to a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.
Motion information of two control points is needed to construct a 4-parameter affine candidate. The two control points may be selected from one of two combinations ({ CP1, CP2}, { CP1, CP3 }). These two combinations will be converted into a 4-parameter motion model represented by the upper left control point and the upper right control point.
The combinations of constructed affine candidates are inserted into the candidate list in the following order: {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}.
i. For each combination, the reference index of list X of each CP is checked, and if they are all the same, then this combination has a valid CPMV for list X. If the combination does not have a valid CPMV for both list 0 and list 1, then the combination is marked as invalid. Otherwise, it is valid and CPMV is put into the subblock merge list.
3) Filling with zero motion vectors
If the number of candidates in the affine merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.
More specifically, for the sub-block merge candidate list, a 4-parameter merge candidate is added with its MVs set to (0, 0) and its prediction direction set to uni-prediction from list 0 (for P slices) or bi-prediction (for B slices).
In VTM4, the CPMV of the affine CU is stored into a separate buffer. The stored CPMVs are only used to generate the inherited CPMVPs in affine merge mode and affine AMVP mode for the recently coded CU. Subblocks MV derived from CPMV are used for motion compensation, MV derivation of merge/AMVP lists for panning MV, and deblocking. To avoid the picture line buffers for additional CPMVs, the affine motion data inheritance from CUs originating from above CTUs is treated differently than from the normal neighboring CUs. If the candidate CU for affine motion data inheritance is within a row of upper CTUs, the lower left and lower right sub-blocks MV in the row buffer (instead of CPMV) are used for affine MVP derivation. In this way, the CPMV is only stored into the local buffer. If the candidate CU is 6-parameter affine coded, the affine model is downgraded to a 4-parameter model.
Merge with motion vector differences (MMVD)
The ultimate motion vector expression (UMVE, also known as MMVD) is introduced here. UMVE is used for either skip or merge mode with the proposed motion vector expression method.
UMVE reuses the same merge candidates as those included in the conventional merge candidate list in VVC. Among these merge candidates, basic candidates can be selected and further extended by the proposed motion vector expression method.
UMVE provides a new Motion Vector Difference (MVD) representation method in which MVDs are represented using a starting point, a motion magnitude, and a motion direction.
This proposed technique uses the merge candidate list as it is. But only candidates of the default merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE expansion.
The base candidate index defines a starting point. The base candidate index indicates the best candidate among the candidates in the list, as described below.
TABLE 1 Base candidate IDX
Base candidate IDX: 0, 1, 2, 3
N-th MVP: 1st MVP, 2nd MVP, 3rd MVP, 4th MVP
If the number of base candidates is equal to 1, the base candidate IDX is not signaled.
The distance index specifies motion magnitude information and indicates a predefined distance from the starting point. The predefined distances are as follows:
TABLE 2 Distance IDX
Distance IDX: 0, 1, 2, 3, 4, 5, 6, 7
Pixel distance: 1/4-pel, 1/2-pel, 1-pel, 2-pel, 4-pel, 8-pel, 16-pel, 32-pel
The direction index represents the direction of the MVD relative to the starting point. The direction index can represent the four directions shown below.
TABLE 3 Direction IDX
Direction IDX: 00, 01, 10, 11
x-axis: +, -, N/A, N/A
y-axis: N/A, N/A, +, -
The UMVE flag is signaled right after sending the skip flag or the merge flag. If the skip or merge flag is true, the UMVE flag is parsed. If the UMVE flag is equal to 1, the UMVE syntax is parsed. But if not 1, the AFFINE flag is parsed. If the AFFINE flag is equal to 1, AFFINE mode is used; but if not 1, the skip/merge index is parsed for the skip/merge mode of the VTM.
Additional line buffers due to UMVE candidates are not needed, because the skip/merge candidates of the software are directly used as base candidates. Using the input UMVE index, the supplement to the MV is decided right before motion compensation. There is no need to hold a long line buffer for this.
Under the current common test condition, the first merge candidate or the second merge candidate in the merge candidate list can be selected as the basic candidate.
UMVE is also known as Merge with MV Differences (MMVD).
In addition, the flag tile_group_fpel_mmvd_enabled_flag, which indicates whether fractional distances are used, is signaled to the decoder in the slice header. When fractional distances are disabled, the distances in the default table are all multiplied by 4, i.e., the distance table {1, 2, 4, 8, 16, 32, 64, 128} pixels is used. Since the size of the distance table does not change, the entropy coding of the distance index is not changed.
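Putting the three signaled indices together, the following sketch derives the MMVD offset and the final MV; the tables mirror Tables 2 and 3, the offsets are expressed in luma samples, and the function names are hypothetical.

# Distance table in luma samples (Table 2) and direction table (Table 3)
MMVD_DISTANCES = [1/4, 1/2, 1, 2, 4, 8, 16, 32]
MMVD_DIRECTIONS = {0b00: (+1, 0), 0b01: (-1, 0), 0b10: (0, +1), 0b11: (0, -1)}

def mmvd_offset(distance_idx, direction_idx, fpel_only=False):
    # When fractional distances are disabled (per tile_group_fpel_mmvd_enabled_flag),
    # all distances are multiplied by 4, giving {1, 2, 4, ..., 128} samples.
    dist = MMVD_DISTANCES[distance_idx] * (4 if fpel_only else 1)
    sx, sy = MMVD_DIRECTIONS[direction_idx]
    return (sx * dist, sy * dist)

def apply_mmvd(base_mv, distance_idx, direction_idx, fpel_only=False):
    # Add the MMVD offset to the MV of the selected base merge candidate
    dx, dy = mmvd_offset(distance_idx, direction_idx, fpel_only)
    return (base_mv[0] + dx, base_mv[1] + dy)

# Example: base MV (3.25, -1.0), distance index 2 (1 sample), direction 00 (+x)
print(apply_mmvd((3.25, -1.0), 2, 0b00))  # -> (4.25, -1.0)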
Decoder side motion vector refinement (DMVR)
In the bi-directional prediction operation, for prediction of a region of one block, two prediction blocks formed using Motion Vectors (MVs) of list 0 and MVs of list 1, respectively, are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined.
DMVR in JEM
In the JEM design, the motion vectors are refined by a two-sided template matching process. The two-sided template matching is applied in the decoder to perform a distortion-based search between the two-sided template and the reconstructed samples in the reference pictures in order to obtain refined MVs without transmission of additional motion information. An example is depicted in fig. 20. The two-sided template is generated as the weighted combination (i.e., average) of the two prediction blocks from the initial MV0 of list 0 and MV1 of list 1, respectively, as shown in fig. 20. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is considered as the updated MV of that list to replace the original one. In JEM, nine MV candidates are searched for each list. The nine MV candidates include the original MV and 8 surrounding MVs with one luma sample offset to the original MV in either the horizontal or the vertical direction, or both. Finally, the two new MVs, i.e., MV0' and MV1' as shown in fig. 20, are used to generate the final bi-prediction results. A Sum of Absolute Differences (SAD) is used as the cost measure. Note that when calculating the cost of a prediction block generated by one surrounding MV, the rounded MV (to the integer-pel level) is actually used to obtain the prediction block instead of the real MV.
Fig. 20 shows DMVR based on two-sided template matching.
DMVR in VVC
For DMVR in VVC, it is assumed that the MVD between list 0 and list 1 is mirrored, as shown in fig. 21, and bilateral matching is performed to refine the MVs, i.e., to find the best MVD among several MVD candidates. Denote the MVs for the two reference picture lists by MVL0(L0X, L0Y) and MVL1(L1X, L1Y). The MVD denoted by (MvdX, MvdY) for list 0 that minimizes a cost function (e.g., SAD) is defined as the best MVD. For the SAD function, it is defined as the SAD between the list 0 reference block, derived with the motion vector (L0X+MvdX, L0Y+MvdY) in the list 0 reference picture, and the list 1 reference block, derived with the motion vector (L1X-MvdX, L1Y-MvdY) in the list 1 reference picture.
The motion vector refinement process may iterate twice. As shown in fig. 22, in each iteration a maximum of 6 MVDs (with integer-pel accuracy) may be checked in two steps. In the first step, the MVDs (0,0), (-1,0), (1,0), (0,-1) and (0,1) are checked. In the second step, one of the MVDs (-1,-1), (-1,1), (1,-1) or (1,1) may be selected and further checked. Suppose the function Sad(x, y) returns the SAD value of the MVD (x, y). The MVD, denoted by (MvdX, MvdY), checked in the second step is decided as follows:
MvdX=-1;
MvdY=-1;
If(Sad(1,0)<Sad(-1,0))
MvdX=1;
If(Sad(0,1)<Sad(0,-1))
MvdY=1;
in the first iteration, the starting point is the signaled MV, and in the second iteration, the starting point is the signaled MV plus the selected best MVD in the first iteration. DMVR applies only when one reference picture is a preceding picture and the other reference picture is a succeeding picture and both reference pictures have the same picture order count distance from the current picture.
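The two-step integer search described above can be sketched as follows; sad is a hypothetical callable returning the bilateral cost of a mirrored MVD candidate, and early-termination checks are omitted.

def dmvr_integer_search(sad):
    # Step 1: check the center and the four cross positions
    step1 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    costs = {mvd: sad(*mvd) for mvd in step1}
    # Step 2: pick one diagonal according to the sign of the cross comparisons
    mvd_x = 1 if costs[(1, 0)] < costs[(-1, 0)] else -1
    mvd_y = 1 if costs[(0, 1)] < costs[(0, -1)] else -1
    costs[(mvd_x, mvd_y)] = sad(mvd_x, mvd_y)
    # Return the integer MVD with the lowest cost among the checked candidates
    return min(costs, key=costs.get)

# Example with a toy cost surface centered slightly right of the origin
print(dmvr_integer_search(lambda x, y: (x - 0.6) ** 2 + y ** 2))  # -> (1, 0)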
Fig. 21 is an example of MVDs (0,1) mirrored between list 0 and list 1 in a DMVR.
Fig. 22 shows MVs that can be examined in one iteration. Also, the DMVR in the VVC first performs integer MVD refinement as described above. This is the first step. Thereafter, MVD refinement in fractional precision is conditionally performed, thereby further refining the motion vectors. This is the second step. The condition of whether to perform the second step is based on whether the MVD after the current iteration is a zero MV. If it is a zero MV (the vertical and horizontal components of the MV are 0), then the second step will be performed.
Details of the fractional MVD refinement are given below. It should be noted that the MVD represents the difference between the initial motion vector and the final motion vector used in motion compensation.
The integer distance positions and the estimated cost at these positions are used to fit a parametric error surface, which is then used to determine 1/16 pixel accuracy sub-pixel offsets.
The proposed method will be summarized below:
1. the parametric error surface fit is only calculated when the center position is the best cost position in a given iteration.
2. The center position cost and the cost at the (-1,0), (0, -1), (1,0) and (0,1) positions from the center are used to fit a 2-D parabolic error surface equation of the form
E(x, y) = A(x - x0)² + B(y - y0)² + C
where (x0, y0) corresponds to the position with the lowest cost and C corresponds to the lowest cost value. By solving the 5 equations in 5 unknowns, (x0, y0) is computed as:
x0 = (E(-1,0) - E(1,0)) / (2(E(-1,0) + E(1,0) - 2E(0,0)))
y0 = (E(0,-1) - E(0,1)) / (2(E(0,-1) + E(0,1) - 2E(0,0)))
(x0, y0) can be computed to any required sub-pixel precision by adjusting the precision at which the divisions are performed (i.e., how many bits of the quotient are computed). For 1/16-pel accuracy, only 4 bits in the absolute value of the quotient need to be computed, which lends itself to a fast shift-and-subtract based implementation of the 2 divisions required per CU.
3. The computed (x0, y0) is added to the integer-distance refinement MV to obtain the sub-pixel-accurate refinement delta MV.
The magnitude of the derived fractional motion vector is constrained to be less than or equal to half a pixel.
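A minimal sketch of the fractional offset derivation described above is given below. The helper names and the use of a plain integer division (instead of the shift-and-subtract division mentioned above) are simplifying assumptions; the 1/16-pel scaling and the clamp to half a pixel follow the description in this section.

#include <algorithm>
#include <cstdint>

// Sub-pel offset (in 1/16-pel units) from the parametric error surface
// E(x,y) = A(x-x0)^2 + B(y-y0)^2 + C, fitted to the centre cost and the
// four neighbouring integer-distance costs. Assumes the centre position
// was the best position of the last integer iteration.
struct SubPelOffset { int x; int y; };

static int axisOffset16(int64_t eNeg, int64_t ePos, int64_t eCenter) {
  int64_t num = eNeg - ePos;                     // E(-1) - E(+1)
  int64_t den = 2 * (eNeg + ePos - 2 * eCenter); // 2(E(-1) + E(+1) - 2E(0))
  if (den == 0) return 0;
  // (num/den) scaled to 1/16-pel; a real implementation would use the
  // shift-and-subtract division mentioned above.
  int off = static_cast<int>((16 * num) / den);
  return std::clamp(off, -8, 8);                 // |offset| <= half a pixel
}

SubPelOffset errorSurfaceOffset(int64_t eC, int64_t eL, int64_t eR,
                                int64_t eT, int64_t eB) {
  return { axisOffset16(eL, eR, eC), axisOffset16(eT, eB, eC) };
}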
To further simplify the DMVR process, several variations on the JEM design are proposed. More specifically, the DMVR design adopted in VTM-4.0 (to be released soon) has the following main features:
● terminate early when the (0,0) position SAD between List 0 and List 1 is less than the threshold.
● terminate early when the SAD between List 0 and List 1 is zero for a certain position.
● Block size restriction of DMVR: W × H >= 64 && H >= 8, where W and H are the width and height of the block.
● for a DMVR with a CU size >16 × 16, the CU is divided into multiple 16 × 16 sub-blocks. When only the width or height of a CU is greater than 16, it is divided only in the vertical direction or the horizontal direction.
● reference block size (W +7) × (H +7) (for luminance).
● 25-point SAD based integer-pel search (i.e., a (±)2 refinement search range, single stage)
● Bilinear-interpolation based DMVR.
● sub-pixel refinement based on the "parametric error surface equation". This process is only performed when the lowest SAD cost is not equal to zero and the best MVD is (0,0) in the last MV refinement iteration.
● luma/chroma MC w/reference block padding (if needed)
● Refined MVs are used only for MC and TMVP.
Use of DMVR
DMVR may be enabled when all of the following conditions are true (a minimal condition check is sketched after this list):
- The DMVR enabling flag in the SPS (i.e., sps_dmvr_enabled_flag) is equal to 1.
- The TPM flag, the inter affine flag and the sub-block merge flag (either ATMVP merge or affine merge) are all equal to 0.
The merge flag is equal to 1.
-the current block is bi-predicted and the POC distance between the current picture and the reference picture in list 1 is equal to the POC distance between the reference picture in list 0 and the current picture.
- The current CU height is greater than or equal to 8.
- The number of luma samples (CU width × height) is greater than or equal to 64.
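The sketch below collects the conditions above into a single check; the structure fields are illustrative names rather than the actual VTM data structures.

// Sketch of the DMVR enabling check described above.
struct CuInfo {
  bool spsDmvrEnabledFlag;
  bool tpmFlag, interAffineFlag, subBlockMergeFlag;
  bool mergeFlag, biPredicted;
  int  pocCur, pocRefL0, pocRefL1;
  int  width, height;
};

bool dmvrEnabled(const CuInfo& cu) {
  if (!cu.spsDmvrEnabledFlag) return false;
  if (cu.tpmFlag || cu.interAffineFlag || cu.subBlockMergeFlag) return false;
  if (!cu.mergeFlag || !cu.biPredicted) return false;
  // The two reference pictures must have equal POC distance to the
  // current picture (one preceding, one following).
  if (cu.pocCur - cu.pocRefL0 != cu.pocRefL1 - cu.pocCur) return false;
  if (cu.height < 8) return false;
  if (cu.width * cu.height < 64) return false;
  return true;
}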
- Required reference samples in DMVR
For a block of size W × H, assuming that the maximum allowed MVD value is +/- offset (e.g., 2 in VVC) and that the filter size is filterSize (e.g., 8 for luma and 4 for chroma in VVC), (W + 2 × offset + filterSize - 1) × (H + 2 × offset + filterSize - 1) reference samples are needed. To reduce memory bandwidth, only the central (W + filterSize - 1) × (H + filterSize - 1) reference samples are fetched, and the other pixels are generated by repeating the boundary of the fetched samples. An example for an 8 × 8 block is shown in fig. 23.
During motion vector refinement, bilinear motion compensation is performed using these reference samples. At the same time, final motion compensation is also performed using these reference samples.
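A minimal sketch of the boundary-repetition padding described above is given below; the row-major layout, the 16-bit sample type and the function name are assumptions made for illustration.

#include <algorithm>
#include <cstdint>
#include <vector>

// Pad a fetched (W+filterSize-1) x (H+filterSize-1) reference block to the
// (W+2*offset+filterSize-1) x (H+2*offset+filterSize-1) area required by
// DMVR, repeating the boundary samples instead of fetching extra pixels.
std::vector<int16_t> padReference(const std::vector<int16_t>& fetched,
                                  int fetchedW, int fetchedH, int offset) {
  const int outW = fetchedW + 2 * offset;
  const int outH = fetchedH + 2 * offset;
  std::vector<int16_t> out(outW * outH);
  for (int y = 0; y < outH; ++y) {
    // Clamp to the nearest fetched row/column (boundary repetition).
    int srcY = std::min(std::max(y - offset, 0), fetchedH - 1);
    for (int x = 0; x < outW; ++x) {
      int srcX = std::min(std::max(x - offset, 0), fetchedW - 1);
      out[y * outW + x] = fetched[srcY * fetchedW + srcX];
    }
  }
  return out;
}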
Fig. 23 shows an example of a required reference sampling point with padding.
-Combined Intra and Inter Prediction (CIIP)
Multi-hypothesis prediction is proposed, where combining intra and inter prediction is one way to generate multiple hypotheses.
When multi-hypothesis prediction is applied to improve intra mode, it combines one intra prediction and one merge-indexed prediction. In a merge CU, a flag is signaled for merge mode so that an intra mode is selected from an intra candidate list when the flag is true. For the luma component, the intra candidate list is derived from 4 intra prediction modes including the DC, planar, horizontal and vertical modes, and the size of the intra candidate list may be 3 or 4 depending on the block shape. When the CU width is larger than twice the CU height, the horizontal mode is excluded from the intra mode list, and when the CU height is larger than twice the CU width, the vertical mode is removed from the intra mode list. One intra prediction mode selected by the intra mode index and one merge-indexed prediction selected by the merge index are combined using a weighted average. For the chroma component, DM is always applied without extra signaling. The weights used for the combined prediction are described as follows. Equal weights are applied when the DC or planar mode is selected, or when the CB width or height is smaller than 4. For those CBs with width and height larger than or equal to 4, when the horizontal/vertical mode is selected, the CB is first split vertically/horizontally into four equal-area regions. Each weight set, denoted (w_intra_i, w_inter_i) with i from 1 to 4, where (w_intra_1, w_inter_1) = (6, 2), (w_intra_2, w_inter_2) = (5, 3), (w_intra_3, w_inter_3) = (3, 5) and (w_intra_4, w_inter_4) = (2, 6), is applied to the corresponding region. (w_intra_1, w_inter_1) is used for the region closest to the reference samples and (w_intra_4, w_inter_4) is used for the region farthest from the reference samples. The combined prediction is then computed by summing the two weighted predictions and right-shifting by 3 bits. In addition, the intra prediction mode of the intra hypothesis of the predictor can be saved for reference by the following neighboring CUs.
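A minimal sketch of the position-dependent weighting described above is given below for a block using the vertical intra mode, i.e., the block is split into four equal-height row bands, the band nearest the above reference samples uses (w_intra, w_inter) = (6, 2) and the farthest band uses (2, 6). The function name and buffer layout are illustrative assumptions, and no rounding offset is added so that the combination matches the sum-and-right-shift-by-3 description above.

#include <algorithm>
#include <cstdint>
#include <vector>

// Combine intra and inter predictions with region-dependent weights for the
// vertical intra mode: region 0 (top rows, nearest the reference samples)
// uses (6, 2), region 3 (bottom rows) uses (2, 6); each weight pair sums to
// 8, so the weighted sum is right-shifted by 3.
void ciipCombineVertical(const std::vector<int16_t>& intraPred,
                         const std::vector<int16_t>& interPred,
                         std::vector<int16_t>& combPred, int W, int H) {
  static const int wIntra[4] = {6, 5, 3, 2};
  static const int wInter[4] = {2, 3, 5, 6};
  for (int y = 0; y < H; ++y) {
    const int region = std::min(4 * y / H, 3);
    for (int x = 0; x < W; ++x) {
      const int idx = y * W + x;
      combPred[idx] = static_cast<int16_t>(
          (wIntra[region] * intraPred[idx] + wInter[region] * interPred[idx]) >> 3);
    }
  }
}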
In VTM4, when a CU is coded in merge mode, and if the CU contains at least 64 luma samples (i.e., CU width multiplied by CU height is equal to or greater than 64), an additional flag is signaled to indicate whether or not to apply combined inter/intra prediction (CIIP) to the current CU.
3. Problems
CIIP employs a weighted combination of intra prediction and inter prediction, which may be less efficient when coding screen content with sharp edges: the weighted combination may blur the prediction signal and thereby impair coding performance.
4. Example enumeration of embodiments and techniques
It is proposed that when CIIP is enabled for a block, certain samples within the block can be predicted from intra prediction only, while others can be predicted from inter prediction only.
The following detailed description is to be taken as an example to illustrate the general concepts. These techniques should not be construed narrowly. Furthermore, these inventions may be combined in any manner.
The methods described below may also be applicable to other decoder motion information derivation techniques in addition to the CIIP mentioned below.
1. The weights may be updated in CIIP mode for the prediction unit/codec block/region.
a. In one example, the weight sets in CIIP mode, which can be denoted as (w_intra_i, w_inter_i) where i is from 1 to 4, are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N).
b. In one example, the weight sets in CIIP mode, which can be denoted as (w_intra_i, w_inter_i) where i is from 1 to 4, are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N).
c. In one example, the weight sets in CIIP mode, which can be denoted as (w_intra_i, w_inter_i) where i is from 1 to 4, are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N).
d. In one example, the weights of only some portions are updated according to the above method, and other portions still use the current weights.
e. In one example, the upper k rows of pixels may have an (intra, inter) weight of (N,0), and the other rows may have a weight of (0, N).
f. In one example, the left k columns of pixels may have an (intra, inter) weight of (N,0), and the other columns may have a weight of (0, N).
g. In one example, N is set to 1. Alternatively, N is set to 8 when the final prediction block is obtained as the weighted sum of the two prediction blocks divided by 8.
h. In one example, when the updated weight sets described above are used, the weighted prediction in CIIP mode may be implemented as a block copy (see the sketch after this list).
i. In one example, the value of the weight of the update in CIIP mode for a prediction unit/codec block/region may depend on, for example, a message (e.g., flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
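As noted in item h, with the updated weight sets the combination degenerates into copying one of the two predictions per region. A minimal sketch for the row-based variant of item e (top k rows taken from the intra prediction, the remaining rows from the inter prediction, with N = 1) is shown below; the function and parameter names are illustrative.

#include <cstdint>
#include <vector>

// Updated-weight CIIP as a per-row copy: the top k rows take the intra
// prediction unchanged, all other rows take the inter prediction unchanged,
// i.e. (w_intra, w_inter) = (1, 0) for the top rows and (0, 1) elsewhere.
void ciipUpdatedWeightsRows(const std::vector<int16_t>& intraPred,
                            const std::vector<int16_t>& interPred,
                            std::vector<int16_t>& combPred,
                            int W, int H, int k) {
  for (int y = 0; y < H; ++y) {
    const std::vector<int16_t>& src = (y < k) ? intraPred : interPred;
    for (int x = 0; x < W; ++x)
      combPred[y * W + x] = src[y * W + x];
  }
}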
2. The indication of the use of updating weights in CIIP mode for a prediction unit/codec block/region may depend on, for example, a message (e.g., a flag) signaled in a sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
a. In one example, when a flag (e.g., tile_group_MHIntra_SCC_weight_enabled_flag) signaled at the sequence level (e.g., SPS)/slice level (e.g., slice header)/slice group level (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) is true, the weights in CIIP mode may be updated.
b. In one example, when a flag (e.g., tile_group_MHIntra_SCC_weight_enabled_flag) signaled at the sequence level (e.g., SPS)/slice level (e.g., slice header)/slice group level (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) is false, the weights in CIIP mode may not be updated.
c. In one example, an indication of the use of updates to weights in CIIP mode may be inferred, and may depend on
a) Dimension of current block
b) Current quantization parameter
c) Transform type
d) Coding mode of the block pointed to by the inter motion vector
e) Motion vector accuracy
f) Merge index used in CIIP
g) Intra-prediction modes used in CIIP
h) Magnitude of motion vector used in CIIP
i) Indication of which weight set to use for neighboring blocks
d. In one example, the indication of the use of updating weights in the CIIP mode may be signaled on a sub-block level.
e. In one example, an indication of the use of updates to weights in CIIP mode may be inferred at the sub-block level.
3. Multiple weight sets may be provided in CIIP mode. The indication of which weight set to use for a prediction unit/codec block/region may depend on, for example, a message (e.g., a flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
a. In one example, a notification message (e.g., a flag) may be signaled in a sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) to indicate which weight set may be used in the CIIP mode.
b. Alternatively, in one example, an indication of which weight set to use in CIIP mode may be inferred, and further, in one example, inference thereof may be based on
a) Dimension of current block
b) Current quantization parameter
c) Merge index used in CIIP
d) Intra-prediction modes used in CIIP
e) Motion vector magnitude used within CIIP
f) Indication of which weight set to use for neighboring blocks
c. In one example, an indication of which weight set to use in CIIP mode may be signaled at the sub-block level.
d. In one example, an indication of which weight set to use in CIIP mode may be inferred at the sub-block level.
Examples of the embodiments
7.3.2 original byte sequence payload, trailing bits (trailing bits) and byte alignment syntax
7.3.2.1 sequence parameter set RBSP syntax
[SPS syntax table not reproduced in this text]
sps_scc_MHIntra_weight_enabled_flag equal to 1 specifies that the weights in the CIIP mode are updated. sps_scc_MHIntra_weight_enabled_flag equal to 0 specifies that the weights in the CIIP mode are not updated. When sps_scc_MHIntra_weight_enabled_flag is not present, it is inferred to be 0.
7.3.4 slice group header syntax
7.3.4.1 generic slice group header syntax
[slice group header syntax table not reproduced in this text]
8.5.7.6 weighted sample prediction process for combining merge and intra prediction
The inputs to this process are:
-the width cbWidth of the current codec block,
-the height cbHeight of the current codec block,
-two (cbWidth) x (cbHeight) arrays predSamplesInter and predSamplesIntra,
-the intra prediction mode predModeIntra,
-a variable cIdx specifying the color component index.
The output of this process is a (cbWidth) x (cbHeight) array predSamplesComb of predicted sample values.
The variable bitDepth is derived as follows:
-if cIdx equals 0, bitDepth is set equal to bitDepthY
Else, bitDepth is set equal to bitDepthC
The derivation of the predicted sample point predSamplesComb [ x ] [ y ] (where x is 0.. cbWidth-1 and y is 0.. cbHeight-1) is as follows:
the weight w is derived as follows:
-w is set equal to 4 if one or more of the following conditions is true:
-cbWidth is less than 4.
-cbHeight less than 4.
predModeIntra equals INTRA _ PLANAR
predModeIntra equals INTRA _ DC.
Otherwise, if predModeIntra is INTRA _ ANGULAR50, then w is specified in tables 8-11, where nPos equals y and nSize equals cbHeight.
Otherwise, if predModeIntra is INTRA _ ANGULAR18, then w is specified in tables 8-11, where nPos equals x and nSize equals cbWidth.
-otherwise, w is set equal to 4.
The derivation of the predicted sample point predSamplesComb [ x ] [ y ] is as follows:
-if tile_group_MHIntra_SCC_weight_enabled_flag is 1,
[equation not reproduced in this text]
-if tile_group_MHIntra_SCC_weight_enabled_flag is 0,
[equation not reproduced in this text]
Table 8-11 - Specification of w1 as a function of the position nP and the size nS
[table not reproduced in this text]
Table 8-12 - Specification of w2 as a function of the position nP and the size nS
[table not reproduced in this text]
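The two derivation branches and the weight tables above are embedded as images in this text and are not reproduced here. Purely for illustration, a hypothetical per-sample combination is sketched below, assuming that the conventional branch (flag equal to 0) uses a (w * intra + (8 - w) * inter) >> 3 combination and that the updated branch (flag equal to 1) copies the intra or the inter sample outright depending on the region weight; the actual equations and table entries of this embodiment may differ.

#include <cstdint>

// Hypothetical realization of the weighted sample prediction process above.
// w is the position-dependent weight from the tables (assumed to lie in
// [0, 8]); the flag selects between the conventional weighted combination
// and the updated, copy-style behaviour. This is an assumption made for
// illustration, not the exact equation of this embodiment.
int16_t combSample(int16_t intraS, int16_t interS, int w,
                   bool updatedWeightsFlag) {
  if (updatedWeightsFlag)
    return (w >= 4) ? intraS : interS;  // assumed copy-style branch
  return static_cast<int16_t>((w * intraS + (8 - w) * interS) >> 3);
}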
Fig. 24 is a block diagram of the video processing apparatus 1000. Device 1000 may be used to implement one or more of the methods described herein. The device 1000 may be embodied in a smartphone, tablet, computer, internet of things (IoT) receiver, and the like. The device 1000 may include one or more processors 1002, one or more memories 1004, and video processing hardware 1006. The processor(s) 1002 may be configured to implement one or more methods described in this document. The memory(s) 1004 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1006 may be used to implement some of the techniques described in this document in hardware circuitry.
Fig. 25 is a flow diagram of an example method 2500 of video processing. The method 2500 includes updating (2502) weights for combining inter and intra prediction modes during a transition between a current video region and a bitstream representation of a current video block, and performing (2504) the transition based on the updating.
Additional features are described in the claims section and chapter 4.
Fig. 26 is a flow diagram of an example method 2600 of video processing. The method 2600 comprises: determining (2502) whether to enable or disable use of updating weights in a Combined Intra and Inter Prediction (CIIP) mode to be applied during a transition between a first block of video and a bitstream representation of the first block of video for the transition; updating (2504) the weights applied in the partial pixels of the first block in the CIIP mode in response to determining that updating of the weights in the CIIP mode is enabled; and performing (2506) a transformation based on the updated weights and the non-updated weights.
In some examples, CIIP mode is applied to derive a final prediction for the first block based on a weighted sum of intermediate intra prediction and intermediate merge inter prediction for the first block.
In some examples, CIIP mode is applied for at least one of prediction units, codec blocks, and regions during the conversion.
In some examples, the weights applied in the partial pixels of the first block in the CIIP mode include one or more weight sets, and each weight set (w_intra_i, w_inter_i) includes a weight for the intra mode (w_intra_i) and a weight for the inter mode (w_inter_i) and is applied to a corresponding prediction unit or codec block or region, where i is an integer.
In some examples, i is from 1 to 4.
In some examples, the weight set (w_intra_1, w_inter_1) is used for the region closest to the reference samples, and (w_intra_4, w_inter_4) is used for the region farthest from the reference samples.
In some examples, the weight sets in the CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight sets in the CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight sets in the CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight set (w_intra, w_inter) for pixels in the top K rows of the first block is (N, 0) and the weight set (w_intra, w_inter) for the other rows is (0, N), K being an integer.
In some examples, the weight set (w_intra, w_inter) for pixels in the left K columns of the first block is (N, 0) and the weight set (w_intra, w_inter) for the other columns is (0, N), K being an integer.
In one example, N is set to 1.
In some examples, N is set to 8 when the final prediction block is a weighted average of the two prediction blocks divided by 8.
In some examples, weighted prediction is implemented as block replication in CIIP mode when an updated set of weights is used during the transition.
In some examples, the values of the weights applied in the CIIP mode in the partial pixels of the first block depend on a message signaled in at least one of the following options: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU).
In some examples, the determination is based on an indication of use of updates to weights in the CIIP mode.
In some examples, the indication of the use of updates to weights in CIIP mode depends on a message signaled in at least one of the following options: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU).
In some examples, the message includes a flag present in the bitstream representation.
In some examples, when the flag is true, use of updates to weights in CIIP mode is enabled.
In some examples, when the flag is false, use of updating weights in the CIIP mode is disabled.
In some examples, the flag is tile_group_MHIntra_SCC_weight_enabled_flag.
In some examples, an indication of the use of updates to weights in the CIIP mode is inferred.
In some examples, the indication of the use of updates to weights in CIIP mode depends on at least one of the following options:
a) a current block dimension;
b) a current quantization parameter;
c) transform type;
d) coding mode of the block pointed to by the inter motion vector;
e) motion vector accuracy;
f) merge index used in CIIP mode;
g) intra prediction mode used in CIIP mode;
h) the magnitude of the motion vector used within CIIP mode;
i) an indication of which weight set is used by neighboring blocks of the first block.
In some examples, the indication of the use of updating weights in CIIP mode is signaled at the sub-block level.
In some examples, the indication of the use of updates to weights in CIIP mode is inferred at the sub-block level.
Fig. 27 is a flow diagram of an example method 2700 of video processing. The method 2700 includes: determining (2702) a set of weights from a plurality of sets of weights being used in a Combined Intra and Inter Prediction (CIIP) mode for a transition between a first block of video and a bitstream representation of the first block of video, the determining depending on a message present in at least one of: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU); applying (2704) a CIIP mode based on the determined set of weights to generate a final prediction of the first block; and performing (2706) the conversion based on the final prediction; wherein the final prediction of the first block is generated based on a weighted sum of the intermediate intra prediction and the intermediate merge inter prediction of the first block.
In some examples, at least one of the plurality of sets of weights in the CIIP mode is applied for at least one of a prediction unit, a codec block, and a region during the converting.
In some examples, the message includes a flag present in the bitstream representation.
In some examples, the message is signaled to indicate which weight set to use in CIIP mode.
In some examples, an indication of which of a plurality of weight sets is being used in CIIP mode is inferred.
In some examples, the indication of which of the plurality of weight sets is being used in CIIP mode depends on at least one of:
a) a current block dimension;
b) a current quantization parameter;
c) merge index used in CIIP mode;
d) intra prediction mode used in CIIP mode;
e) the magnitude of the motion vector used in CIIP mode;
f) an indication of which weight set is used by neighboring blocks of the first block.
In some examples, the indication of which of the plurality of weight sets being used in CIIP mode is signaled at the sub-block level.
In some examples, the indication of which of the plurality of weight sets is being used in CIIP mode is inferred at the sub-block level.
In some examples, the conversion generates a first block of the video from the bitstream representation.
In some examples, the conversion generates a bitstream representation from a first block of the video.
Other aspects, examples, embodiments, modules, and functional operations disclosed and described herein may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed herein and their structural equivalents, or in combinations of one or more of them. The disclosed embodiments and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or claims, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although certain features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (37)

1. A video processing method, comprising:
determining, for a transition between a first block of video and a bitstream representation of said first block of video, whether to enable or disable use of updating weights in a Combined Intra and Inter Prediction (CIIP) mode to be applied during said transition;
in response to determining that updating of weights in CIIP mode is enabled, updating weights applied in a portion of pixels of the first block in CIIP mode; and
the conversion is performed based on the updated weights and the non-updated weights.
2. The method of claim 1, wherein the CIIP mode is applied to derive a final prediction for the first block based on a weighted sum of intermediate intra prediction and intermediate merge inter prediction for the first block.
3. The method of claim 1 or 2, wherein the CIIP mode is applied for at least one of a prediction unit, a coded block, and a region during the converting.
4. The method according to any of claims 1-3, wherein the weights applied in the CIIP mode in the partial pixels of the first block comprise one or more weight sets, and each weight set (w_intra_i, w_inter_i) comprises a weight for the intra mode (w_intra_i) and a weight for the inter mode (w_inter_i) and is applied to a corresponding prediction unit or codec block or region, where i is an integer.
5. The method of claim 4, wherein i is from 1 to 4.
6. The method according to claim 5, wherein the weight set (w_intra_1, w_inter_1) is used for the region closest to the reference samples, and (w_intra_4, w_inter_4) is used for the region farthest from the reference samples.
7. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
8. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
9. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
10. The method according to any of claims 1-5, wherein the weight set (w_intra, w_inter) for pixels in the top K rows of the first block is (N, 0) and the weight set (w_intra, w_inter) for the other rows is (0, N), K being an integer.
11. The method according to any of claims 1-5, wherein the weight set (w_intra, w_inter) for pixels in the left K columns of the first block is (N, 0) and the weight set (w_intra, w_inter) for the other columns is (0, N), K being an integer.
12. The method of any of claims 6-11, wherein N is set to 1.
13. The method according to any of claims 6-11, wherein N is set to 8 when the final prediction block is a weighted average of the two prediction blocks divided by 8.
14. The method according to any of claims 1-13, wherein weighted prediction is implemented as block copy in CIIP mode when using an updated weight set during the transition.
15. The method according to any of claims 1-14, wherein the values of the weights applied in the CIIP mode in the partial pixels of the first block depend on a message signaled in at least one of the following options: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU).
16. The method of any of claims 1-15, wherein the determining is based on an indication of a use of updating weights in CIIP mode.
17. The method of claim 16, wherein the indication of the use of updates to weights in CIIP mode depends on a message signaled in at least one of the following options: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU).
18. The method of claim 17, wherein the message comprises a flag present in the bitstream representation.
19. The method of claim 18, wherein when the flag is true, enabling use of updates to weights in CIIP mode.
20. The method of claim 18, wherein use of updating weights in CIIP mode is disabled when the flag is false.
21. The method of any of claims 18-20, wherein the flag is tile_group_MHIntra_SCC_weight_enabled_flag.
22. The method of claim 16, wherein an indication of use of updating weights in CIIP mode is inferred.
23. The method of claim 22, wherein the indication of the use of updates to weights in CIIP mode depends on at least one of:
a) a current block dimension;
b) a current quantization parameter;
c) transform type;
d) coding mode of the block pointed to by the inter motion vector;
e) motion vector accuracy;
f) merge index used in CIIP mode;
g) intra prediction mode used in CIIP mode;
h) the magnitude of the motion vector used in CIIP mode;
i) an indication of which weight set is used by neighboring blocks of the first block.
24. The method of claim 16, wherein the indication of the use of updates to weights in CIIP mode is signaled at a sub-block level.
25. The method of claim 16, wherein the indication of use of updating weights in CIIP mode is inferred at a sub-block level.
26. A video processing method, comprising:
determining a set of weights from a plurality of sets of weights being used in a combined intra and inter prediction CIIP mode for a transition between a first block of video and a bitstream representation of said first block of video, said determining being dependent on a message present in at least one of: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a slice group level including a slice group header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU);
applying a CIIP mode to generate a final prediction for the first block based on the determined set of weights; and
performing the conversion based on the final prediction;
wherein the final prediction of the first block is generated based on a weighted sum of the intermediate intra prediction and the intermediate merge inter prediction of the first block.
27. The method of claim 26, wherein, during the converting, at least one of the plurality of sets of weights in the CIIP mode is applied for at least one of a prediction unit, a coded block, and a region.
28. The method of claim 26 or 27, wherein the message comprises a flag present in the bitstream representation.
29. The method according to any of claims 26-28, wherein the message is signaled to indicate which weight set to use in CIIP mode.
30. The method of claim 26 or 27, wherein the indication of which of the plurality of weight sets is being used in CIIP mode is inferred.
31. The method of claim 30, wherein the indication of which of the plurality of weight sets is being used in CIIP mode depends on at least one of:
a) a current block dimension;
b) a current quantization parameter;
c) merge index used in CIIP mode;
d) intra prediction mode used in CIIP mode;
e) the magnitude of the motion vector used in CIIP mode;
f) an indication of which weight set is used by neighboring blocks of the first block.
32. The method of claim 26 or 27, wherein the indication of which of the plurality of weight sets is being used in CIIP mode is signaled at a sub-block level.
33. The method of claim 26 or 27, wherein the indication of which of the plurality of weight sets is being used in CIIP mode is inferred at a sub-block level.
34. The method of any of claims 1-33, wherein the converting generates a first block of video from the bitstream representation.
35. The method of any of claims 1-33, wherein the converting generates the bitstream representation from a first block of the video.
36. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1 to 35.
37. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for implementing the method of any of claims 1 to 35.
CN202080019945.1A 2019-03-12 2020-03-12 Adaptive weights in multi-hypothesis prediction in video coding Pending CN113557720A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019077838 2019-03-12
CNPCT/CN2019/077838 2019-03-12
PCT/CN2020/078988 WO2020182187A1 (en) 2019-03-12 2020-03-12 Adaptive weight in multi-hypothesis prediction in video coding

Publications (1)

Publication Number Publication Date
CN113557720A true CN113557720A (en) 2021-10-26

Family

ID=72427741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080019945.1A Pending CN113557720A (en) 2019-03-12 2020-03-12 Adaptive weights in multi-hypothesis prediction in video coding

Country Status (2)

Country Link
CN (1) CN113557720A (en)
WO (1) WO2020182187A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108370441A (en) * 2015-11-12 2018-08-03 Lg 电子株式会社 Method and apparatus in image compiling system for intra prediction caused by coefficient
EP3367681A1 (en) * 2015-10-22 2018-08-29 LG Electronics Inc. Modeling-based image decoding method and device in image coding system
US20180249156A1 (en) * 2015-09-10 2018-08-30 Lg Electronics Inc. Method for processing image based on joint inter-intra prediction mode and apparatus therefor
US20180278942A1 (en) * 2017-03-22 2018-09-27 Qualcomm Incorporated Intra-prediction mode propagation
US20180288410A1 (en) * 2014-11-06 2018-10-04 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus
CN108702515A (en) * 2016-02-25 2018-10-23 联发科技股份有限公司 The method and apparatus of coding and decoding video

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180288410A1 (en) * 2014-11-06 2018-10-04 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus
US20180249156A1 (en) * 2015-09-10 2018-08-30 Lg Electronics Inc. Method for processing image based on joint inter-intra prediction mode and apparatus therefor
EP3367681A1 (en) * 2015-10-22 2018-08-29 LG Electronics Inc. Modeling-based image decoding method and device in image coding system
CN108370441A (en) * 2015-11-12 2018-08-03 Lg 电子株式会社 Method and apparatus in image compiling system for intra prediction caused by coefficient
CN108702515A (en) * 2016-02-25 2018-10-23 联发科技股份有限公司 The method and apparatus of coding and decoding video
US20180278942A1 (en) * 2017-03-22 2018-09-27 Qualcomm Incorporated Intra-prediction mode propagation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDRE, SEIXAS DIAS: "CE10-related: Multi-Hypothesis Intra with Weighted Combination", pages 1 *
CHIANG,MAN-SHU: "CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode", pages 2 *

Also Published As

Publication number Publication date
WO2020182187A1 (en) 2020-09-17

Similar Documents

Publication Publication Date Title
CN113170099B (en) Interaction between intra copy mode and inter prediction tools
JP7417670B2 (en) Partial cost calculation
US11070820B2 (en) Condition dependent inter prediction with geometric partitioning
JP7263529B2 (en) Size selection application for decoder-side refinement tools
WO2020114404A1 (en) Pruning method in different prediction mode
WO2020177684A1 (en) Enabling dmvr based on the information in the picture header
WO2020244659A1 (en) Interactions between sub-block based intra block copy and different coding tools
JP2024008948A (en) Conditional execution of motion candidate list construction process
CN113316935A (en) Motion candidate list using local illumination compensation
WO2020192726A1 (en) History-based motion vector prediction
WO2020244660A1 (en) Motion candidate list construction for video coding
CN114175636A (en) Indication of adaptive loop filtering in an adaptive parameter set
CN113597759B (en) Motion vector refinement in video coding and decoding
WO2020164543A1 (en) Motion prediction based on shared merge list
CN113273208A (en) Improvement of affine prediction mode
CN113557720A (en) Adaptive weights in multi-hypothesis prediction in video coding
CN115152229A (en) BV list construction process of IBC blocks under merge estimation region

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination