CN113676733A - System and method for improved inter-frame intra joint prediction - Google Patents
- Publication number
- CN113676733A CN113676733A CN202111033487.XA CN202111033487A CN113676733A CN 113676733 A CN113676733 A CN 113676733A CN 202111033487 A CN202111033487 A CN 202111033487A CN 113676733 A CN113676733 A CN 113676733A
- Authority
- CN
- China
- Prior art keywords
- prediction
- coding block
- current coding
- gradient value
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
All codes below fall under H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals):
- H04N19/527—Global motion vector estimation
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/172—Adaptive coding characterised by the coding unit, the unit being a picture, frame or field
- H04N19/176—Adaptive coding characterised by the coding unit, the unit being a block, e.g. a macroblock
- H04N19/42—Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/513—Processing of motion vectors
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
- H04N19/70—Syntax aspects related to video coding, e.g. related to compression standards
Abstract
The present disclosure relates to a system and method for improving inter-frame intra joint prediction (CIIP) for video coding and decoding. The method includes: obtaining a first reference picture and a second reference picture associated with a current prediction block; obtaining a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; obtaining a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether to apply a bi-directional optical flow (BDOF) operation, which computes first and second gradient values; and calculating a bi-prediction of the current prediction block based on the first prediction L0, the second prediction L1, the first gradient values, and the second gradient values.
Description
Cross Reference to Related Applications
This application is based on and claims priority from provisional application No. 62/790,421, filed on January 9, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to video coding and compression. More particularly, the present application relates to a method and apparatus related to an inter-frame intra joint prediction (CIIP) method for video coding and decoding.
Background
Various video codec techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video codec standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and so forth. Video coding typically uses prediction methods that exploit redundancy present in video pictures or sequences (e.g., inter prediction, intra prediction, etc.). An important goal of video codec techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
Examples of the present disclosure provide methods for improving the signaling efficiency of merge-related modes.
According to a first aspect of the present disclosure, a video encoding and decoding method includes: obtaining a first reference picture and a second reference picture associated with a current prediction block, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order; obtaining a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; obtaining a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether to apply a bi-directional optical flow (BDOF) operation, wherein the BDOF calculates first horizontal and vertical gradient values ∂I(0)(i,j)/∂x and ∂I(0)(i,j)/∂y of the prediction samples associated with the first prediction L0, and second horizontal and vertical gradient values ∂I(1)(i,j)/∂x and ∂I(1)(i,j)/∂y associated with the second prediction L1; and calculating a bi-prediction of the current prediction block based on the first prediction L0, the second prediction L1, the first gradient values ∂I(0)(i,j)/∂x and ∂I(0)(i,j)/∂y, and the second gradient values ∂I(1)(i,j)/∂x and ∂I(1)(i,j)/∂y.
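As an illustration of the final step of this aspect, the sketch below shows one way the two predictions and their gradients could be combined. This is a minimal NumPy sketch, not the patent's or the VVC spec's exact method: central-difference gradients stand in for the spec's interpolation-based gradients, the refinement motion (vx, vy), normally derived by the BDOF process itself, is taken as an input, and the fixed-point shifts and offsets are omitted.

```python
import numpy as np

def bdof_biprediction(pred_l0, pred_l1, vx, vy):
    """Blend two motion-compensated predictions with an optical-flow correction."""
    # Horizontal/vertical gradients of each prediction; simple central
    # differences stand in for the interpolation-filter gradients of the spec.
    gx0 = np.gradient(pred_l0, axis=1)
    gy0 = np.gradient(pred_l0, axis=0)
    gx1 = np.gradient(pred_l1, axis=1)
    gy1 = np.gradient(pred_l1, axis=0)
    # Per-sample correction from the optical-flow model; when the refinement
    # motion (vx, vy) is zero, this reduces to plain averaging of L0 and L1.
    b = vx * (gx0 - gx1) + vy * (gy0 - gy1)
    return (pred_l0 + pred_l1 + b) / 2.0
```

With (vx, vy) = (0, 0), the result is exactly the simple bi-prediction average; a nonzero refinement motion shifts each sample along the local gradient difference.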
According to a second aspect of the present disclosure, a video encoding and decoding method includes: obtaining a reference picture in a reference picture list associated with a current prediction block; generating an inter prediction based on a first motion vector from the current picture to a first reference picture; obtaining an intra prediction mode associated with the current prediction block; generating an intra prediction of the current prediction block based on the intra prediction mode; generating a final prediction of the current prediction block by averaging the inter prediction and the intra prediction; and determining whether the current prediction block is considered as inter mode or intra mode for Most Probable Mode (MPM) based intra mode prediction.
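The averaging step of this aspect can be sketched as follows. The weighted integer form with wt=2 reduces to a simple average of the two predictions; the weight derivation from neighboring modes used in VVC is omitted, and the function name is illustrative.

```python
def ciip_prediction(inter_pred, intra_pred, wt=2):
    """Combine inter and intra predictions sample by sample.

    wt is the intra weight in quarters; wt=2 gives the simple average
    described above. VVC derives wt from the modes of neighboring blocks,
    which is not modeled here.
    """
    return [((4 - wt) * p_inter + wt * p_intra + 2) >> 2
            for p_inter, p_intra in zip(inter_pred, intra_pred)]
```

The `+ 2` before the right shift rounds to nearest instead of truncating, a common pattern in integer video-codec arithmetic.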
According to a third aspect of the disclosure, a non-transitory computer-readable storage medium having instructions stored therein is provided. The instructions, when executed by one or more processors, cause a computing device to perform operations comprising: obtaining a first reference picture and a second reference picture associated with a current prediction block, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order; obtaining a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; obtaining a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether to apply a bi-directional optical flow (BDOF) operation, wherein the BDOF calculates first horizontal and vertical gradient values ∂I(0)(i,j)/∂x and ∂I(0)(i,j)/∂y of the prediction samples associated with the first prediction L0, and second horizontal and vertical gradient values ∂I(1)(i,j)/∂x and ∂I(1)(i,j)/∂y associated with the second prediction L1; and calculating a bi-prediction of the current prediction block.
According to a fourth aspect of the disclosure, a non-transitory computer-readable storage medium having instructions stored therein is provided. The instructions, when executed by one or more processors, cause a computing device to perform operations comprising: obtaining a reference picture in a reference picture list associated with a current prediction block; generating an inter prediction based on a first motion vector from the current picture to a first reference picture; obtaining an intra prediction mode associated with the current prediction block; generating an intra prediction of the current prediction block based on the intra prediction mode; generating a final prediction of the current prediction block by averaging the inter prediction and the intra prediction; and determining whether the current prediction block is considered as inter mode or intra mode for Most Probable Mode (MPM) based intra mode prediction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.
Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.
Fig. 3 is a flow diagram illustrating a method for generating inter-frame intra joint prediction (CIIP) according to an example of the present disclosure.
Fig. 4 is a flow chart illustrating a method for generating a CIIP according to an example of the present disclosure.
Fig. 5A is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5B is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5C is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5D is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5E is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 6A is a diagram illustrating inter-frame intra joint prediction (CIIP) according to an example of the present disclosure.
Fig. 6B is a diagram illustrating inter-frame intra joint prediction (CIIP) according to an example of the present disclosure.
Fig. 6C is a diagram illustrating inter-frame intra joint prediction (CIIP) according to an example of the present disclosure.
Fig. 7A is a flow chart of an MPM candidate list generation process according to an example of the present disclosure.
Fig. 7B is a flow chart of an MPM candidate list generation process according to an example of the present disclosure.
Fig. 8 is a diagram illustrating a workflow of an existing CIIP design in a VVC, according to an example of the present disclosure.
Fig. 9 is a diagram illustrating a workflow of a CIIP method proposed by removing BDOF according to an example of the present disclosure.
Fig. 10 is a diagram illustrating a workflow of CIIP based on unidirectional prediction, where a prediction list is selected based on POC distances, according to an example of the present disclosure.
Fig. 11A is a flow chart of MPM candidate list generation when CIIP blocks are enabled (treated as intra mode), according to an example of the present disclosure.
Fig. 11B is a flow chart of MPM candidate list generation when CIIP blocks are disabled (treated as inter mode), according to an example of the present disclosure.
Fig. 12 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
Detailed Description
Reference will now be made in detail to examples of the present disclosure, some of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments set forth in the following description of the examples of the present disclosure do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects related to the disclosure set forth in the claims below.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein is intended to mean and include any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood, depending on the context, to mean "when", "upon", or "in response to determining".
The first version of the HEVC standard, finalized in October 2013, provides a bit rate saving of about 50% at equivalent perceptual quality compared to the previous-generation video codec standard H.264/MPEG AVC. Although the HEVC standard provides significant codec improvements over its predecessors, there is evidence that higher codec efficiency than HEVC can be achieved with additional codec tools. On this basis, both VCEG and MPEG started exploring new codec technologies for future video codec standardization. In October 2015, ITU-T VCEG and ISO/IEC MPEG established the Joint Video Exploration Team (JVET), and significant research began on advanced technologies capable of greatly improving coding and decoding efficiency. By integrating several additional codec tools on top of the HEVC test model (HM), JVET maintains a reference software named the Joint Exploration Model (JEM).
In October 2017, ITU-T and ISO/IEC published a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating compression efficiency gains of about 40% over HEVC. Based on these evaluation results, JVET launched a new project to develop a new-generation video codec standard, named Versatile Video Coding (VVC). In the same month, a reference software code base called the VVC Test Model (VTM) was established for providing a reference implementation of the VVC standard.
Like HEVC, VVC is built on a block-based hybrid video codec framework. Fig. 1 (described below) presents a block diagram of a generic block-based hybrid video codec system. The input video signal is processed block by block; each block is called a Coding Unit (CU). In VTM-1.0, a CU may be up to 128 × 128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, in VVC one Coding Tree Unit (CTU) is partitioned into CUs based on quadtree/binary-tree/ternary-tree structures to accommodate varying local features. Furthermore, the concept of multiple partition unit types in HEVC is removed, i.e., there is no longer a distinction among CU, Prediction Unit (PU), and Transform Unit (TU) in VVC; instead, each CU is always used as the basic unit for both prediction and transform without further partitioning. In the multi-type tree structure, one CTU is first divided by a quadtree structure. Each leaf node of the quadtree may then be further partitioned by binary-tree and ternary-tree structures. As shown in Figs. 5A, 5B, 5C, 5D, and 5E (described below), there are five partition types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
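The five partition types can be sketched as a small helper that returns child-block sizes. The names and the 1:2:1 ternary split ratio follow the common VVC description and are illustrative, not taken from the patent text.

```python
def split_block(width, height, split_type):
    """Child block sizes for the five multi-type-tree split types (a sketch)."""
    if split_type == "QT":    # quaternary split: four equal quadrants
        return [(width // 2, height // 2)] * 4
    if split_type == "BT_H":  # horizontal binary split: two halves, stacked
        return [(width, height // 2)] * 2
    if split_type == "BT_V":  # vertical binary split: two halves, side by side
        return [(width // 2, height)] * 2
    if split_type == "TT_H":  # horizontal ternary split, 1:2:1 ratio
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if split_type == "TT_V":  # vertical ternary split, 1:2:1 ratio
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split type: {split_type}")
```

Each split preserves the parent block's area, which is easy to check for any of the five types.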
In fig. 1 (described below), spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from already coded samples (called reference samples) of neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in video signals. The temporal prediction signal for a given CU is typically signaled by one or more Motion Vectors (MVs), which indicate the amount and direction of motion between the current CU and its temporal reference. Further, when multiple reference pictures are supported, one reference picture index is additionally transmitted to identify from which reference picture in the reference picture library the temporal prediction signal comes. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using transform and quantization.
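A minimal sketch of the motion-compensated fetch implied by an MV, assuming an integer-pel motion vector and a picture stored as a list of rows; fractional-pel interpolation filters, which both HEVC and VVC actually use, are omitted.

```python
def motion_compensate(ref_picture, x, y, width, height, mv):
    """Fetch the block of ref_picture displaced from (x, y) by motion vector mv."""
    dx, dy = mv  # integer-pel motion vector components (horizontal, vertical)
    # Copy the width x height region of the reference picture shifted by mv.
    return [row[x + dx : x + dx + width]
            for row in ref_picture[y + dy : y + dy + height]]
```

The returned block serves as the temporal prediction signal; the encoder subtracts it from the current block to form the residual.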
The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering, such as deblocking filtering, sample adaptive offset (SAO), and adaptive loop filtering (ALF), may be applied to the reconstructed CU before it is placed in the reference picture library and used to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy coding unit to be further compressed and packed into the bitstream.
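The residual round trip described above (subtract prediction, quantize, inverse quantize, add back) can be sketched as follows. The transform stage is omitted and uniform scalar quantization with step `qstep` is assumed; real codecs quantize transform coefficients, not spatial residuals.

```python
import numpy as np

def reconstruct_block(block, prediction, qstep):
    """Residual coding round trip: subtract, quantize, dequantize, add back."""
    residual = block - prediction            # prediction residual
    levels = np.round(residual / qstep)      # forward quantization (lossy step)
    recon = prediction + levels * qstep      # inverse quantization + prediction
    return recon
```

The only information loss comes from rounding in the quantizer, so the per-sample reconstruction error is bounded by half the quantization step.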
Fig. 2 (described below) presents a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded in an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (when intra coded) or a temporal prediction unit (when inter coded) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may further undergo in-loop filtering before being stored in the reference picture library. The reconstructed video in the reference picture library is then sent out to drive the display device and used to predict future video blocks.
Fig. 1 shows a typical encoder 100. The encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
Fig. 2 shows a block diagram of a typical decoder 200. The decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.
Fig. 3 illustrates an example method 300 for generating inter-frame intra joint prediction (CIIP) in accordance with this disclosure.
At step 310, a first reference picture and a second reference picture associated with the current prediction block are obtained, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order.
At step 312, a first prediction L0 is obtained based on a first motion vector MV0 from the current prediction block to a reference block in a first reference picture.
At step 314, a second prediction L1 is obtained based on a second motion vector MV1 from the current prediction block to a reference block in a second reference picture.
At step 316, it is determined whether a bi-directional optical flow (BDOF) operation is to be applied, wherein the BDOF computes first horizontal and vertical gradient values for the predicted samples associated with the first prediction L0 and second horizontal and vertical gradient values associated with the second prediction L1. For example, the BDOF calculates the first horizontal and vertical gradient values ∂I(0)/∂x and ∂I(0)/∂y for the predicted samples associated with the first prediction L0, and the second horizontal and vertical gradient values ∂I(1)/∂x and ∂I(1)/∂y associated with the second prediction L1.
At step 318, a bi-prediction of the current prediction block is calculated based on the first and second predictions L0 and L1 and the first and second gradient values, e.g., the first gradient values ∂I(0)/∂x and ∂I(0)/∂y and the second gradient values ∂I(1)/∂x and ∂I(1)/∂y.
FIG. 4 illustrates an example method for generating CIIPs in accordance with this disclosure. For example, the method includes uni-directional prediction based inter prediction and MPM based intra prediction for generating CIIP.
At step 410, a reference picture in a reference picture list associated with the current prediction block is acquired.
At step 412, an inter prediction is generated based on a first motion vector from the current prediction block to a reference block in the reference picture.
At step 414, the intra prediction mode associated with the current prediction block is obtained.
At step 416, an intra prediction of the current prediction block is generated based on the intra prediction mode.
At step 418, a final prediction of the current prediction block is generated by averaging the inter prediction and the intra prediction.
At step 420, for Most Probable Mode (MPM) based intra-mode prediction, it is determined whether the current prediction block is considered as an inter-mode or an intra-mode.
Fig. 5A illustrates a diagram showing block quad-partitioned in a multi-type tree structure according to an example of the present disclosure.
Fig. 5B illustrates a diagram showing block vertical binary partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5C illustrates a diagram showing block horizontal binary partition in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5D illustrates a diagram showing block vertical ternary partition in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5E illustrates a diagram showing block horizontal ternary partition in a multi-type tree structure, according to an example of the present disclosure.
Inter-frame intra joint prediction
As shown in fig. 1 and 2, inter and intra prediction methods are used in a hybrid video coding scheme, in which each PU is allowed to select only inter prediction or intra prediction, never both, to exploit correlation in the temporal or spatial domain. However, as noted in previous literature, the residual signals generated by inter-predicted blocks and intra-predicted blocks can exhibit very different characteristics from each other. Thus, if the two kinds of prediction can be combined efficiently, a more accurate prediction can be expected, reducing the energy of the prediction residual and thereby improving coding efficiency. Furthermore, in natural video content, the motion of moving objects can be complex. For example, there may be regions that contain both old content (e.g., objects included in previously coded pictures) and emerging new content (e.g., objects not included in previously coded pictures). In this scenario, neither inter prediction nor intra prediction alone can provide an accurate prediction of the current block.
To further improve prediction efficiency, inter-frame intra joint prediction (CIIP), which combines the inter and intra predictions of one CU coded in merge mode, is adopted in the VVC standard. Specifically, for each merge CU, an additional flag is signaled to indicate whether CIIP is enabled for the current CU. For the luma component, CIIP supports four frequently used intra modes: planar prediction (PLANAR), DC prediction (DC), horizontal prediction (HORIZONTAL), and vertical prediction (VERTICAL). For the chroma component, DM (i.e., the chroma component reuses the same intra mode as the luma component) is always applied without additional signaling. In addition, in the existing CIIP design, weighted averaging is applied to combine the inter-predicted samples and intra-predicted samples of a CIIP CU. Specifically, when the PLANAR or DC mode is selected, an equal weight (i.e., 0.5) is applied. Otherwise (i.e., the HORIZONTAL or VERTICAL mode is applied), the current CU is first split horizontally (for the HORIZONTAL mode) or vertically (for the VERTICAL mode) into four equal-sized regions.
Denote the weight set applied to the i-th region as (w_intra_i, w_inter_i), where i = 0 and i = 3 represent the regions closest to and farthest from the reconstructed neighboring samples used for intra prediction. In the current CIIP design, the weight sets are set to (w_intra_0, w_inter_0) = (0.75, 0.25), (w_intra_1, w_inter_1) = (0.625, 0.375), (w_intra_2, w_inter_2) = (0.375, 0.625), and (w_intra_3, w_inter_3) = (0.25, 0.75). Figs. 6A, 6B, and 6C (described below) provide examples to illustrate the CIIP mode.
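As a non-normative illustration, the weighted combination described above can be sketched in Python. Floating-point weights are used for readability (the actual codec uses integer arithmetic), and the mapping of region 0 to the left column (HORIZONTAL) or top row (VERTICAL) is our reading of the description, not a quote from the specification:

```python
# Region weight sets (w_intra_i, w_inter_i); region 0 is closest to the
# reconstructed neighboring samples used for intra prediction.
WEIGHTS = [(0.75, 0.25), (0.625, 0.375), (0.375, 0.625), (0.25, 0.75)]

def ciip_blend(intra_pred, inter_pred, mode):
    """Blend H x W intra/inter prediction arrays (lists of lists) per CIIP.

    For PLANAR/DC, an equal weight of 0.5 is used for every sample.
    For HORIZONTAL, the block is split into four vertical strips, region 0
    at the left; for VERTICAL, four horizontal strips, region 0 at the top.
    """
    h, w = len(intra_pred), len(intra_pred[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mode in ("PLANAR", "DC"):
                wi, we = 0.5, 0.5                      # equal weighting
            elif mode == "HORIZONTAL":
                wi, we = WEIGHTS[min(4 * x // w, 3)]   # regions left -> right
            else:                                      # "VERTICAL"
                wi, we = WEIGHTS[min(4 * y // h, 3)]   # regions top -> bottom
            out[y][x] = wi * intra_pred[y][x] + we * inter_pred[y][x]
    return out
```

For a 4×4 block, each region is exactly one row or column, so the sample nearest the intra reference receives 75% intra weight and the farthest receives 25%.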
Furthermore, in the current VVC working draft, the intra mode of a CIIP CU can be used as a predictor for the intra modes of its neighboring CIIP CUs through the Most Probable Mode (MPM) mechanism. Specifically, for each CIIP CU, when its neighboring blocks are also CIIP CUs, the intra modes of those neighboring blocks are first rounded to the closest of the PLANAR, DC, HORIZONTAL, and VERTICAL modes and then added to the MPM candidate list of the current CU. However, when constructing the MPM list of an intra CU, a neighboring block coded in CIIP mode is deemed unavailable, i.e., the intra mode of a CIIP CU is not allowed to predict the intra mode of a neighboring intra CU. Figs. 7A and 7B (described below) compare the MPM list generation processes of an intra CU and a CIIP CU.
Bi-directional optical flow
Conventional bi-prediction in video coding is a simple combination of two temporal prediction blocks obtained from already reconstructed reference pictures. However, due to the limitations of block-based motion compensation, small residual motion can still be observed between the samples of the two prediction blocks, which reduces the efficiency of motion-compensated prediction. To solve this problem, bi-directional optical flow (BDOF) is applied in VVC to reduce the effect of such motion on every sample within a block.
Specifically, as shown in figs. 6A, 6B, and 6C (described below), BDOF is a sample-wise motion refinement performed on top of block-based motion-compensated prediction when bi-prediction is used. The motion refinement (vx, vy) of each 4×4 sub-block is calculated by minimizing the difference between the L0 and L1 prediction samples after applying BDOF inside a 6×6 window Ω around the sub-block. Specifically, the values of (vx, vy) are derived as follows:

vx = S1 > 0 ? clip3(−th_BDOF, th_BDOF, −((S3 · 2^(nb − na)) >> ⌊log2(S1)⌋)) : 0
vy = S5 > 0 ? clip3(−th_BDOF, th_BDOF, −((S6 · 2^(nb − na) − ((vx · S2,m) << nS2 + vx · S2,s) / 2) >> ⌊log2(S5)⌋)) : 0        (1)

where ⌊∙⌋ is the floor function; clip3(min, max, x) is a function that clips a given value x within the range [min, max]; the symbol >> represents a bitwise right-shift operation; the symbol << represents a bitwise left-shift operation; th_BDOF is a motion-refinement threshold that prevents the propagation of errors caused by irregular local motion, which is equal to 2^(13 − BD), where BD is the bit depth of the input video. In (1), S2,m = S2 >> nS2 and S2,s = S2 & (2^nS2 − 1).

The values of S1, S2, S3, S5, and S6 are calculated as follows:

S1 = Σ(i,j)∈Ω ψx(i,j) · ψx(i,j),    S3 = Σ(i,j)∈Ω θ(i,j) · ψx(i,j)
S2 = Σ(i,j)∈Ω ψx(i,j) · ψy(i,j)
S5 = Σ(i,j)∈Ω ψy(i,j) · ψy(i,j),    S6 = Σ(i,j)∈Ω θ(i,j) · ψy(i,j)

where

ψx(i,j) = (∂I(1)/∂x(i,j) + ∂I(0)/∂x(i,j)) >> na
ψy(i,j) = (∂I(1)/∂y(i,j) + ∂I(0)/∂y(i,j)) >> na
θ(i,j) = (I(1)(i,j) >> nb) − (I(0)(i,j) >> nb)

where na and nb are internal bit-depth adjustment parameters, and I(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in list k (k = 0, 1), which is generated at intermediate high precision (i.e., 16 bits); ∂I(k)/∂x(i,j) and ∂I(k)/∂y(i,j) are the horizontal and vertical gradient values of the sample, obtained by directly calculating the difference between its two neighboring samples, i.e.,

∂I(k)/∂x(i,j) = (I(k)(i + 1, j) − I(k)(i − 1, j)) >> 4
∂I(k)/∂y(i,j) = (I(k)(i, j + 1) − I(k)(i, j − 1)) >> 4

Based on the motion refinement derived in (1), the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical-flow model, as indicated by the following equations:

pred_BDOF(x, y) = (I(0)(x, y) + I(1)(x, y) + b + o_offset) >> shift
b = rnd((vx · (∂I(1)/∂x(x, y) − ∂I(0)/∂x(x, y))) / 2) + rnd((vy · (∂I(1)/∂y(x, y) − ∂I(0)/∂y(x, y))) / 2)

where shift and o_offset are the right-shift and offset values applied to combine the L0 and L1 prediction signals for bi-prediction, which are equal to 15 − BD and 1 << (14 − BD), respectively.
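The per-sample gradient and correction steps above can be sketched with a simplified floating-point model. This is illustrative only: the actual BDOF process uses fixed-point shifts, windowed least-squares sums over Ω, and clipping of (vx, vy), all of which are omitted here; the motion refinement is passed in rather than derived:

```python
def gradients(pred):
    """Horizontal/vertical gradients of a 2-D prediction block, computed as
    the difference of the two neighboring samples (half the central
    difference), with edge replication at the block borders."""
    h, w = len(pred), len(pred[0])
    gx = [[(pred[y][min(x + 1, w - 1)] - pred[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    gy = [[(pred[min(y + 1, h - 1)][x] - pred[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return gx, gy

def bdof_sample(i0, i1, gx0, gy0, gx1, gy1, vx, vy):
    """Final BDOF sample at one position: the average of the L0/L1
    predictions plus the optical-flow correction b."""
    b = vx * (gx1 - gx0) / 2.0 + vy * (gy1 - gy0) / 2.0
    return (i0 + i1 + b) / 2.0
```

For example, with L0/L1 samples 10 and 20, horizontal gradients 2 and 4, and a refinement of (vx, vy) = (1, 0), the correction b is 1 and the output sample is 15.5 instead of the plain average 15.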
Fig. 6A illustrates a diagram showing inter-frame intra joint prediction for the HORIZONTAL mode according to an example of the present disclosure.
Fig. 6B illustrates a diagram showing inter-frame intra joint prediction for VERTICAL mode according to an example of the present disclosure.
Fig. 6C illustrates a diagram showing inter-frame intra joint prediction for PLANAR and DC modes according to an example of the present disclosure.
Fig. 7A shows a flowchart of an MPM candidate list generation process for an intra CU according to an example of the present disclosure.
Fig. 7B shows a flow diagram of an MPM candidate list generation process for a CIIP CU according to an example of the present disclosure.
Improvements in CIIP
Although CIIP can improve the efficiency of conventional motion-compensated prediction, its design can be further improved. Specifically, the following problems in the existing CIIP design in VVC are identified in this disclosure.
First, as discussed in the "inter-frame intra joint prediction" section, because CIIP combines inter-and intra-predicted samples, each CIIP CU needs to use its reconstructed neighboring samples to generate the prediction signal. This means that the decoding of one CIIP CU depends on the complete reconstruction of its neighboring blocks. Due to this interdependency, for practical hardware implementations, CIIP needs to be performed at the reconstruction stage where neighboring reconstructed samples become available for intra prediction. Since the decoding of CUs in the reconstruction stage has to be performed sequentially (i.e. one after the other), the number of computational operations involved in the CIIP process (e.g. multiplication, addition and bit shifting) cannot be too high in order to ensure a sufficient throughput for real-time decoding.
As mentioned in the "bi-directional optical flow" section, when an inter-coded CU is predicted from two reference blocks in the forward and backward temporal directions, BDOF is enabled to improve prediction quality. As shown in fig. 8 (described below), in the current VVC, BDOF is also invoked to generate the inter prediction samples of the CIIP mode. Given the additional complexity of BDOF, such a design can severely reduce the encoding/decoding throughput of a hardware codec when CIIP is enabled.
Second, in the current CIIP design, when a CIIP CU refers to a bi-predicted merge candidate, the motion-compensated prediction signals in both lists L0 and L1 need to be generated. When one or more MVs are not of integer precision, an additional interpolation process must be invoked to interpolate samples at fractional sample positions. Such a process increases not only computational complexity but also memory bandwidth, because more reference samples need to be accessed from external memory.
Third, as discussed in the "inter-frame intra joint prediction" section, in the current CIIP design, the intra mode of a CIIP CU and the intra mode of an intra CU are treated differently when constructing the MPM lists of their neighboring blocks. Specifically, when the current CU is coded in CIIP mode, its neighboring CIIP CUs are considered intra, i.e., the intra mode of a neighboring CIIP CU may be added to the MPM candidate list. However, when the current CU is coded in intra mode, its neighboring CIIP CUs are considered inter, i.e., the intra mode of a neighboring CIIP CU is not included in the MPM candidate list. This non-uniform design may not be optimal for the final version of the VVC standard.
Fig. 8 illustrates a diagram showing a workflow of an existing CIIP design in a VVC, according to an example of the present disclosure.
Simplified CIIP
In the present disclosure, methods are provided that simplify existing CIIP designs to facilitate hardware codec implementations. In general, the main aspects of the technology presented in this disclosure are summarized below.
First, in order to improve CIIP encoding/decoding throughput, it is proposed to exclude BDOF from the generation of inter-frame prediction samples in CIIP mode.
Second, to reduce computational complexity and memory bandwidth consumption, a method is proposed that converts a block from bi-prediction to uni-prediction to generate the inter prediction samples when a CIIP CU is bi-predicted (i.e., has both an L0 MV and an L1 MV).
Third, two methods are proposed to coordinate intra modes of a CIIP CU and an intra CU when forming MPM candidates for neighboring blocks of the CU.
CIIP without BDOF
As noted in the "Improvements in CIIP" section, BDOF is always enabled to generate the inter prediction samples of the CIIP mode when the current CU is bi-predicted. Due to the additional complexity of BDOF, the existing CIIP design can significantly reduce encoding/decoding throughput, and in particular makes real-time decoding difficult for VVC decoders. On the other hand, for CIIP CUs, the final prediction samples are generated by averaging the inter prediction samples and the intra prediction samples; in other words, the BDOF-refined prediction samples are not used directly as the prediction signal of a CIIP CU. Thus, the improvement obtained from BDOF is less effective for CIIP CUs than for conventional bi-predicted CUs (where BDOF is applied directly to generate the prediction samples). Based on the above considerations, it is therefore proposed to disable BDOF when generating the inter prediction samples of the CIIP mode. Fig. 9 (described below) shows the corresponding workflow of the proposed CIIP process after BDOF is removed.
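The proposed rule reduces to a simple guard in the inter-prediction stage. The sketch below is illustrative pseudologic, not VVC syntax; the function name and arguments are hypothetical:

```python
def bdof_enabled(is_bi_predicted: bool, is_ciip: bool) -> bool:
    """Proposed rule: BDOF refines the inter prediction only for bi-predicted
    CUs that are NOT coded in CIIP mode. In the existing design the second
    condition is absent, so CIIP CUs also pay the BDOF cost."""
    return is_bi_predicted and not is_ciip
```

Under this rule, a bi-predicted merge CU still benefits from BDOF, while the same CU with the CIIP flag set skips the refinement and keeps the reconstruction-stage workload low.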
Fig. 9 illustrates a diagram showing a workflow of the CIIP method proposed by removing BDOF according to an example of the present disclosure.
CIIP based on unidirectional prediction
As discussed above, when the merging candidates referred to by one CIIP CU are bidirectionally predicted, both L0 and L1 prediction signals are generated to predict samples within the CU. To reduce memory bandwidth and interpolation complexity, in one embodiment of the present disclosure, it is proposed to use only inter-predicted samples generated with uni-directional prediction (even when the current CU is bi-directionally predicted) in combination with intra-predicted samples in the CIIP mode. Specifically, when the current CIIP CU is predicted uni-directionally, inter-prediction samples will be directly combined with intra-prediction samples. Otherwise (i.e., the current CU is bi-predicted), inter prediction samples used by the CIIP are generated based on uni-directional prediction from one prediction list (L0 or L1). To select the prediction list, different methods may be applied. In the first approach, it is proposed to always select the first prediction (i.e., list L0) for any CIIP block predicted by two reference pictures.
In the second approach, it is proposed to always select the second prediction (i.e., list L1) for any CIIP block predicted by two reference pictures. In a third approach, an adaptive approach is applied, where a prediction list associated with one reference picture is selected, the reference picture having a smaller Picture Order Count (POC) distance from the current picture. Fig. 10 (described below) illustrates a unidirectional prediction based CIIP workflow where the prediction list is selected based on POC distances.
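The adaptive (third) approach can be sketched as follows. The tie-breaking rule in favor of L0 when both POC distances are equal is an assumption for illustration, not something specified above:

```python
def select_prediction_list(poc_cur: int, poc_l0: int, poc_l1: int) -> int:
    """Return the prediction list (0 or 1) whose reference picture has the
    smaller Picture Order Count (POC) distance to the current picture.
    Ties are broken toward list L0 (an assumption)."""
    return 0 if abs(poc_cur - poc_l0) <= abs(poc_cur - poc_l1) else 1
```

For example, with the current picture at POC 8, an L0 reference at POC 4, and an L1 reference at POC 16, list L0 is selected because its distance (4) is smaller than L1's (8).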
Finally, in the last approach, it is proposed to enable the CIIP mode only when the current CU is uni-predicted. Furthermore, to reduce overhead, the signaling of the CIIP enable/disable flag depends on the prediction direction of the current CIIP CU. When the current CU is uni-predicted, a CIIP flag is signaled in the bitstream to indicate whether CIIP is enabled or disabled. Otherwise (i.e., the current CU is bi-predicted), the signaling of the CIIP flag is skipped and the flag is always inferred to be false, i.e., CIIP is always disabled.
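The dependent signaling rule of this last approach can be sketched as follows, where `read_bit` stands in for the entropy decoder's flag-reading call (the name and signature are hypothetical):

```python
def parse_ciip_flag(is_uni_predicted: bool, read_bit) -> bool:
    """Proposed signaling: the CIIP flag is present in the bitstream only for
    uni-predicted CUs; for bi-predicted CUs no bit is read and the flag is
    inferred to be false (CIIP disabled)."""
    return bool(read_bit()) if is_uni_predicted else False
```

Note that for a bi-predicted CU the decoder never calls `read_bit`, so no bit is consumed from the bitstream.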
Fig. 10 illustrates a diagram showing a workflow of unidirectional prediction based CIIP of a POC distance based selection prediction list according to one example of the present disclosure.
Coordination of intra modes for CIIP CU and intra CU for MPM candidate list construction
As discussed above, the current CIIP design is not uniform in how the intra modes of CIIP CUs and intra CUs are used to form the MPM candidate lists of their neighboring blocks. Specifically, the intra modes of both CIIP CUs and intra CUs can be used to predict the intra mode of a neighboring block coded in CIIP mode; however, only the intra mode of an intra CU can be used to predict the intra mode of a neighboring intra CU. To achieve a more uniform design, this section proposes two methods to harmonize the use of the intra modes of CIIP CUs and intra CUs in MPM list construction.
In the first method, it is proposed to treat the CIIP mode as an inter mode for MPM list construction. Specifically, when generating the MPM list of either a CIIP CU or an intra CU, a neighboring block is marked as unavailable for intra mode prediction when it is coded in CIIP mode. In this way, the MPM list is constructed without using the intra modes of any CIIP blocks. In contrast, in the second method, it is proposed to treat the CIIP mode as an intra mode for MPM list construction. Specifically, in this method, the intra mode of a CIIP CU can predict the intra modes of both its neighboring CIIP blocks and its neighboring intra blocks. Figs. 11A and 11B (described below) show the MPM candidate list generation processes when the two methods described above are applied.
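The two harmonization methods differ only in the availability rule applied to a CIIP-coded neighbor during MPM list construction, which can be sketched as follows (the mode names and function signature are illustrative, not codec syntax):

```python
def mpm_candidate_from_neighbor(coding_mode: str, intra_mode, method: int):
    """Return the neighbor's intra mode to add to the MPM candidate list,
    or None if the neighbor is treated as unavailable.

    method 1: CIIP treated as inter -> a CIIP neighbor is unavailable.
    method 2: CIIP treated as intra -> a CIIP neighbor contributes its
              (rounded) intra mode, just like an intra neighbor.
    """
    if coding_mode == "intra":
        return intra_mode
    if coding_mode == "ciip":
        return intra_mode if method == 2 else None
    return None  # a regular inter neighbor carries no intra mode
```

Either method applies the same rule regardless of whether the current CU is a CIIP CU or an intra CU, which removes the asymmetry of the existing design.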
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise examples described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the disclosure be limited only by the claims appended hereto.
Fig. 11A illustrates a flowchart of the MPM candidate list generation method in which CIIP blocks are enabled as intra-mode predictors, according to an example of the present disclosure.
Fig. 11B illustrates a flowchart of the MPM candidate list generation method in which CIIP blocks are disabled as intra-mode predictors, according to an example of the present disclosure.
FIG. 12 illustrates a computing environment 1210 coupled with a user interface 1260. The computing environment 1210 may be part of a data processing server. Computing environment 1210 includes a processor 1220, memory 1240, and I/O interfaces 1250.
The processor 1220 typically controls the overall operation of the computing environment 1210, such as operations associated with display, data acquisition, data communication, and picture processing. Processor 1220 may include one or more processors to execute instructions to perform all or some of the steps of the methods described above. Further, the processor 1220 may include one or more circuits that facilitate interaction between the processor 1220 and other components. The processor may be a central processing unit (CPU), a microprocessor, a microcontroller, a GPU, and so on.
The memory 1240 is configured to store various types of data to support the operation of the computing environment 1210. Examples of such data include instructions for any application or method operating on computing environment 1210, video data, picture data, and so forth. The memory 1240 may be implemented using any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
I/O interface 1250 provides an interface between processor 1220 and peripheral interface modules, such as a keyboard, click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 1250 may be coupled with an encoder and a decoder.
In an embodiment, a non-transitory computer readable storage medium comprising a plurality of programs, such as included in memory 1240, executable by processor 1220 in computing environment 1210, for performing the above-described methods is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
In an embodiment, the computing environment 1210 may be implemented with one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
Claims (14)
1. A method of video decoding, the method comprising:
obtaining a first reference picture and a second reference picture associated with a current encoding block of a current picture, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order;
obtaining a first prediction based on a first motion vector from the current coding block to a reference block in the first reference picture;
obtaining a second prediction based on a second motion vector from the current coding block to a reference block in the second reference picture; and
calculating a bi-prediction for the current coding block based at least on the first prediction and the second prediction, comprising: in response to determining that inter-frame intra joint prediction is not applied to compute bi-directional prediction for the current coding block, enabling bi-directional optical flow (BDOF) in computing bi-directional prediction for the current coding block.
2. The method of claim 1, wherein calculating a bi-prediction for the current coding block based at least on the first prediction and the second prediction further comprises:
in response to determining to apply inter-frame intra joint prediction to compute bi-prediction for the current coding block, disabling BDOF in computing bi-prediction for the current coding block.
3. The method of claim 2, in response to determining to apply inter-intra joint prediction to compute a bi-prediction for the current coding block, computing a bi-prediction for the current coding block further comprising:
calculating a bi-prediction for the current coding block based on averaging the first prediction and the second prediction.
4. The method of claim 1, in response to determining that inter-intra joint prediction is not applied to compute bi-prediction for the current coding block, computing bi-prediction for the current coding block further comprises:
calculating first horizontal gradient values ∂I(0)/∂x(i, j) and first vertical gradient values ∂I(0)/∂y(i, j), respectively, of predicted samples associated with the first prediction, and calculating second horizontal gradient values ∂I(1)/∂x(i, j) and second vertical gradient values ∂I(1)/∂y(i, j), respectively, of predicted samples associated with the second prediction, wherein I(0)(i, j) is the predicted sample at sample position (i, j) associated with the first prediction, and I(1)(i, j) is the predicted sample at sample position (i, j) associated with the second prediction; and
calculating a bi-prediction for the current coding block based on the first prediction, the second prediction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, and the second vertical gradient value.
5. The method of claim 4, in response to determining that inter-intra joint prediction is not applied to compute bi-prediction for the current coding block, computing bi-prediction for the current coding block further comprises:
calculating a motion correction for each sub-block by minimizing a difference between predicted samples of the first prediction and the second prediction; and
And calculating the bidirectional prediction of the current coding block based on the motion correction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, the second vertical gradient value, the first prediction and the second prediction.
6. The method of claim 5, in response to determining that inter-intra joint prediction is not applied to compute bi-prediction for the current coding block, computing bi-prediction for the current coding block further comprises:
calculating a BDOF value based on the motion correction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, and the second vertical gradient value;
and calculating the bidirectional prediction of the current coding block based on the BDOF value and the first prediction and the second prediction.
7. A video decoding device comprising one or more processors and one or more memories coupled to the one or more processors, the video decoding device configured to perform operations comprising:
obtaining a first reference picture and a second reference picture associated with a current encoding block of a current picture, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order;
obtaining a first prediction based on a first motion vector from the current coding block to a reference block in the first reference picture;
obtaining a second prediction based on a second motion vector from the current coding block to a reference block in the second reference picture; and
calculating a bi-prediction for the current coding block based at least on the first prediction and the second prediction, comprising: in response to determining that inter-frame intra joint prediction is not applied to compute bi-directional prediction for the current coding block, enabling bi-directional optical flow (BDOF) in computing bi-directional prediction for the current coding block.
8. The video coding and decoding apparatus of claim 7, wherein calculating a bi-prediction for the current coding block based at least on the first prediction and the second prediction further comprises: in response to determining to apply inter-frame intra joint prediction to compute bi-prediction for the current coding block, disabling BDOF in computing bi-prediction for the current coding block.
9. The video coding and decoding apparatus of claim 8, in response to determining to apply inter-intra joint prediction to compute a bi-prediction for the current coding block, computing a bi-prediction for the current coding block further comprising:
calculating a bi-prediction for the current coding block based on averaging the first prediction and the second prediction.
10. The video coding and decoding apparatus of claim 7, in response to determining that inter-intra joint prediction is not applied to compute a bi-prediction for the current coding block, computing a bi-prediction for the current coding block further comprises:
calculating first horizontal gradient values ∂I(0)/∂x(i, j) and first vertical gradient values ∂I(0)/∂y(i, j), respectively, of predicted samples associated with the first prediction, and calculating second horizontal gradient values ∂I(1)/∂x(i, j) and second vertical gradient values ∂I(1)/∂y(i, j), respectively, of predicted samples associated with the second prediction, wherein I(0)(i, j) is the predicted sample at sample position (i, j) associated with the first prediction, and I(1)(i, j) is the predicted sample at sample position (i, j) associated with the second prediction; and
calculating a bi-prediction for the current coding block based on the first prediction, the second prediction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, and the second vertical gradient value.
11. The video coding and decoding apparatus of claim 10, in response to determining that inter-intra joint prediction is not applied to compute bi-prediction for the current coding block, computing bi-prediction for the current coding block further comprises:
calculating a motion correction for each sub-block by minimizing a difference between predicted samples of the first prediction and the second prediction; and
And calculating the bidirectional prediction of the current coding block based on the motion correction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, the second vertical gradient value, the first prediction and the second prediction.
12. The video coding and decoding device of claim 11, wherein in response to determining that inter-intra joint prediction is not applied to compute bi-prediction for the current coding block, computing bi-prediction for the current coding block further comprises:
calculating a BDOF value based on the motion correction, the first horizontal gradient value, the first vertical gradient value, the second horizontal gradient value, and the second vertical gradient value;
and calculating the bidirectional prediction of the current coding block based on the BDOF value and the first prediction and the second prediction.
13. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the method of any of claims 1-6.
14. A computer program product comprising instructions which, when executed by a processor, carry out the method according to any one of claims 1-6.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962790421P | 2019-01-09 | 2019-01-09 | |
US62/790421 | 2019-01-09 | ||
CN202080008692.8A CN113661704A (en) | 2019-01-09 | 2020-01-09 | System and method for improved inter-frame intra joint prediction |
PCT/US2020/012826 WO2020146562A1 (en) | 2019-01-09 | 2020-01-09 | System and method for improving combined inter and intra prediction |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080008692.8A Division CN113661704A (en) | 2019-01-09 | 2020-01-09 | System and method for improved inter-frame intra joint prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113676733A true CN113676733A (en) | 2021-11-19 |
CN113676733B CN113676733B (en) | 2023-02-24 |
Family
ID=78230437
Family Applications (6)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310876401.2A Active CN117014615B (en) | 2019-01-09 | 2020-01-09 | Video encoding method, apparatus, and non-transitory computer readable storage medium |
CN202310936271.7A Pending CN116800962A (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
CN202310315285.7A Active CN116347102B (en) | 2019-01-09 | 2020-01-09 | Video encoding method, apparatus, non-transitory computer readable storage medium |
CN202111033487.XA Active CN113676733B (en) | 2019-01-09 | 2020-01-09 | Video decoding method, apparatus, non-transitory computer-readable storage medium |
CN202110784990.2A Active CN113542748B (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
CN202311175593.0A Pending CN117294842A (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
Family Applications Before (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310876401.2A Active CN117014615B (en) | 2019-01-09 | 2020-01-09 | Video encoding method, apparatus, and non-transitory computer readable storage medium |
CN202310936271.7A Pending CN116800962A (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
CN202310315285.7A Active CN116347102B (en) | 2019-01-09 | 2020-01-09 | Video encoding method, apparatus, non-transitory computer readable storage medium |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110784990.2A Active CN113542748B (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
CN202311175593.0A Pending CN117294842A (en) | 2019-01-09 | 2020-01-09 | Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (6) | CN117014615B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017188566A1 (en) * | 2016-04-25 | 2017-11-02 | LG Electronics Inc. | Inter-prediction method and apparatus in image coding system |
CN108781294A (en) * | 2016-02-05 | 2018-11-09 | MediaTek Inc. | Motion compensation method and apparatus based on bi-directional prediction optical flow technique for video coding and decoding |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108632608B (en) * | 2011-09-29 | 2022-07-29 | Sharp Corporation | Image decoding device, image decoding method, image encoding device, and image encoding method |
WO2013109124A1 (en) * | 2012-01-19 | 2013-07-25 | Samsung Electronics Co., Ltd. | Method and device for encoding video to limit bidirectional prediction and block merging, and method and device for decoding video |
WO2017043816A1 (en) * | 2015-09-10 | 2017-03-16 | LG Electronics Inc. | Joint inter-intra prediction mode-based image processing method and apparatus therefor |
US10375413B2 (en) * | 2015-09-28 | 2019-08-06 | Qualcomm Incorporated | Bi-directional optical flow for video coding |
CN115278229A (en) * | 2015-11-11 | 2022-11-01 | Samsung Electronics Co., Ltd. | Apparatus for decoding video and apparatus for encoding video |
KR20170058838A (en) | 2015-11-19 | 2017-05-29 | Electronics and Telecommunications Research Institute | Method and apparatus for encoding/decoding of improved inter prediction |
US11032550B2 (en) * | 2016-02-25 | 2021-06-08 | Mediatek Inc. | Method and apparatus of video coding |
WO2018048265A1 (en) * | 2016-09-11 | 2018-03-15 | LG Electronics Inc. | Method and apparatus for processing video signal by using improved optical flow motion vector |
US10623737B2 (en) * | 2016-10-04 | 2020-04-14 | Qualcomm Incorporated | Peak sample adaptive offset |
US10805630B2 (en) * | 2017-04-28 | 2020-10-13 | Qualcomm Incorporated | Gradient based matching for motion search and derivation |
EP3410724A1 (en) * | 2017-05-31 | 2018-12-05 | Thomson Licensing | Method and apparatus for signalling bi-directional intra prediction in video encoding and decoding |
CN110800302A (en) * | 2017-06-07 | 2020-02-14 | MediaTek Inc. | Method and apparatus for intra-inter prediction for video encoding and decoding |
US10757420B2 (en) * | 2017-06-23 | 2020-08-25 | Qualcomm Incorporated | Combination of inter-prediction and intra-prediction in video coding |
2020
- 2020-01-09 CN CN202310876401.2A patent/CN117014615B/en active Active
- 2020-01-09 CN CN202310936271.7A patent/CN116800962A/en active Pending
- 2020-01-09 CN CN202310315285.7A patent/CN116347102B/en active Active
- 2020-01-09 CN CN202111033487.XA patent/CN113676733B/en active Active
- 2020-01-09 CN CN202110784990.2A patent/CN113542748B/en active Active
- 2020-01-09 CN CN202311175593.0A patent/CN117294842A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108781294A (en) * | 2016-02-05 | 2018-11-09 | MediaTek Inc. | Motion compensation method and apparatus based on bi-directional prediction optical flow technique for video coding and decoding |
WO2017188566A1 (en) * | 2016-04-25 | 2017-11-02 | LG Electronics Inc. | Inter-prediction method and apparatus in image coding system |
Non-Patent Citations (1)
Title |
---|
XIAOYU XIU et al.: "CE9-related: Complexity reduction and bit-width control for bi-directional optical flow (BIO)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 * |
Also Published As
Publication number | Publication date |
---|---|
CN113542748B (en) | 2023-07-11 |
CN117014615A (en) | 2023-11-07 |
CN113542748A (en) | 2021-10-22 |
CN116347102B (en) | 2024-01-23 |
CN117014615B (en) | 2024-03-12 |
CN116347102A (en) | 2023-06-27 |
CN117294842A (en) | 2023-12-26 |
CN116800962A (en) | 2023-09-22 |
CN113676733B (en) | 2023-02-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11172203B2 (en) | Intra merge prediction | |
CN114363612B (en) | Method and apparatus for bit width control of bi-directional optical flow | |
JP7313533B2 (en) | Method and Apparatus in Predictive Refinement by Optical Flow | |
US20230051193A1 (en) | System and method for combined inter and intra prediction | |
CN114125441B (en) | Bidirectional optical flow method for decoding video signal, computing device and storage medium | |
JP2023100979A (en) | Methods and apparatuses for prediction refinement with optical flow, bi-directional optical flow, and decoder-side motion vector refinement | |
CN114009033A (en) | Method and apparatus for signaling symmetric motion vector difference mode | |
JP2023063506A (en) | Method for deriving constructed affine merge candidate | |
CN113676733B (en) | Video decoding method, apparatus, non-transitory computer-readable storage medium | |
KR102450491B1 (en) | System and method for combined inter and intra prediction | |
JP7303255B2 (en) | Video coding method, video coding device, computer readable storage medium and computer program | |
WO2024016955A1 (en) | Out-of-boundary check in video coding | |
WO2023250047A1 (en) | Methods and devices for motion storage in geometric partitioning mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||