CN117294842A - Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium


Info

Publication number: CN117294842A
Application number: CN202311175593.0A
Authority: CN (China)
Prior art keywords: reference picture, prediction, intra, inter, code block
Legal status: Pending
Other languages: Chinese (zh)
Inventors: Xiaoyu Xiu (修晓宇), Yi-Wen Chen (陈漪纹), Xianglin Wang (王祥林)
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2019-01-09; Filing date: 2020-01-09; Publication date: 2023-12-26
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority claimed from PCT/US2020/012826 (WO2020146562A1)
Classifications

All classifications fall under H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals):

    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/172 Adaptive coding where the coding unit is an image region being a picture, frame or field
    • H04N19/176 Adaptive coding where the coding unit is an image region being a block, e.g. a macroblock
    • H04N19/42 Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/513 Processing of motion vectors
    • H04N19/527 Global motion vector estimation
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/70 Syntax aspects related to video coding, e.g. related to compression standards

Abstract

The present disclosure relates to a method for video encoding and decoding. The method includes obtaining a first reference picture and a second reference picture associated with a current prediction block; obtaining a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; obtaining a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether to apply a bidirectional optical flow (BDOF) operation; and calculating a bi-directional prediction of the current prediction block based on the first prediction L0, the second prediction L1, first gradient values, and second gradient values.

Description

Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium
Cross Reference to Related Applications
The present application is based on and claims priority to U.S. provisional application No. 62/790,421, filed on January 9, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to video coding and compression. More particularly, the present application relates to methods and apparatus for combined inter and intra prediction (CIIP) for video coding.
Background
Various video coding techniques may be used to compress video data. Video encoding and decoding are performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and the like. Video coding typically uses prediction methods (e.g., inter prediction, intra prediction, etc.) that exploit redundancy present in video pictures or sequences. An important goal of video coding technology is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
Examples of the present disclosure provide methods for improving the coding efficiency and syntax signaling of merge-related modes.
According to a first aspect of the present disclosure, a video encoding and decoding method includes: acquiring a first reference picture and a second reference picture associated with a current prediction block, wherein, in display order, the first reference picture precedes the current picture and the second reference picture follows the current picture; acquiring a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; acquiring a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether a bidirectional optical flow (BDOF) operation is applied, wherein BDOF calculates first horizontal and vertical gradient values $\partial I^{(0)}/\partial x$ and $\partial I^{(0)}/\partial y$ of the prediction samples associated with the first prediction L0, and second horizontal and vertical gradient values $\partial I^{(1)}/\partial x$ and $\partial I^{(1)}/\partial y$ associated with the second prediction L1; and calculating a bi-prediction of the current prediction block based on the first prediction L0, the second prediction L1, the first gradient values, and the second gradient values.
According to a second aspect of the present disclosure, a video encoding and decoding method includes: obtaining a reference picture in a reference picture list associated with a current prediction block; generating an inter prediction based on a first motion vector from the current picture to a first reference picture; obtaining an intra prediction mode associated with the current prediction block; generating an intra prediction of the current prediction block based on the intra prediction mode; generating a final prediction of the current prediction block by averaging the inter prediction and the intra prediction; and determining, for intra mode prediction based on the most probable mode (MPM), whether the current prediction block is treated as an inter mode or an intra mode.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions stored therein. The instructions, when executed by one or more processors, cause a computing device to perform operations including: acquiring a first reference picture and a second reference picture associated with a current prediction block, wherein, in display order, the first reference picture precedes the current picture and the second reference picture follows the current picture; acquiring a first prediction L0 based on a first motion vector MV0 from the current prediction block to a reference block in the first reference picture; acquiring a second prediction L1 based on a second motion vector MV1 from the current prediction block to a reference block in the second reference picture; determining whether a bidirectional optical flow (BDOF) operation is applied, wherein BDOF calculates first horizontal and vertical gradient values $\partial I^{(0)}/\partial x$ and $\partial I^{(0)}/\partial y$ of the prediction samples associated with the first prediction L0, and second horizontal and vertical gradient values $\partial I^{(1)}/\partial x$ and $\partial I^{(1)}/\partial y$ associated with the second prediction L1; and calculating a bi-prediction of the current prediction block.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions stored therein. The instructions, when executed by one or more processors, cause a computing device to perform operations including: obtaining a reference picture in a reference picture list associated with a current prediction block; generating an inter prediction based on a first motion vector from the current picture to a first reference picture; obtaining an intra prediction mode associated with the current prediction block; generating an intra prediction of the current prediction block based on the intra prediction mode; generating a final prediction of the current prediction block by averaging the inter prediction and the intra prediction; and determining, for intra mode prediction based on the most probable mode (MPM), whether the current prediction block is treated as an inter mode or an intra mode.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive, of the present disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.
Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.
Fig. 3 is a flow chart illustrating a method for generating combined inter and intra prediction (CIIP) according to an example of the present disclosure.
Fig. 4 is a flow chart illustrating a method for generating CIIP in accordance with an example of the present disclosure.
Fig. 5A is a diagram illustrating block partitioning in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5B is a diagram illustrating block partitioning in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5C is a diagram illustrating block partitioning in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5D is a diagram illustrating block partitioning in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5E is a diagram illustrating block partitioning in a multi-type tree structure, according to an example of the present disclosure.
Fig. 6A is a diagram illustrating combined inter and intra prediction (CIIP) according to an example of the present disclosure.
Fig. 6B is a diagram illustrating combined inter and intra prediction (CIIP) according to an example of the present disclosure.
Fig. 6C is a diagram illustrating combined inter and intra prediction (CIIP) according to an example of the present disclosure.
Fig. 7A is a flowchart of an MPM candidate list generation procedure according to an example of the present disclosure.
Fig. 7B is a flowchart of an MPM candidate list generation procedure according to an example of the present disclosure.
Fig. 8 is a diagram illustrating the workflow of the existing CIIP design in VVC, according to an example of the present disclosure.
Fig. 9 is a diagram illustrating the workflow of the proposed CIIP method with BDOF removed, according to an example of the present disclosure.
Fig. 10 is a diagram illustrating the workflow of unidirectional-prediction-based CIIP in which the prediction list is selected based on POC distance, according to an example of the present disclosure.
Fig. 11A is a flowchart of the MPM candidate list generation method when CIIP blocks are enabled as MPM candidates, according to an example of the disclosure.
Fig. 11B is a flowchart of the MPM candidate list generation method when CIIP blocks are disabled as MPM candidates, according to an example of the present disclosure.
FIG. 12 is a diagram illustrating a computing environment coupled with a user interface according to an example of the present disclosure.
Detailed Description
Reference will now be made in detail to examples of the present disclosure, some of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements, unless otherwise indicated. The embodiments set forth in the following description of examples of the present disclosure do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects related to the disclosure recited in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein is intended to mean and include any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed second information; and similarly, second information may also be termed first information. As used herein, the term "if" may be understood to mean "when," "upon," or "in response to determining," depending on the context.
The first version of the HEVC standard was finalized in October 2013, and offers approximately 50% bit-rate savings at equivalent perceptual quality compared with the prior-generation video coding standard H.264/MPEG AVC. Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that higher coding efficiency can be achieved with additional coding tools. Based on this, both VCEG and MPEG started exploring new coding technologies for future video coding standardization. In October 2015, ITU-T VCEG and ISO/IEC MPEG formed the Joint Video Exploration Team (JVET) and began substantial study of advanced technologies that could greatly improve coding efficiency. By integrating several additional coding tools on top of the HEVC test model (HM), JVET maintained a piece of reference software called the Joint Exploration Model (JEM).
In October 2017, ITU-T and ISO/IEC issued a joint Call for Proposals (CfP) on video compression with capability beyond HEVC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating compression efficiency gains of about 40% over HEVC. Based on these evaluation results, JVET launched a new project to develop the next-generation video coding standard, named Versatile Video Coding (VVC). In the same month, a reference software codebase, called the VVC Test Model (VTM), was created to provide a reference implementation of the VVC standard.
Like HEVC, VVC is built on a block-based hybrid video coding framework. Fig. 1 (described below) presents a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block, where each block is called a coding unit (CU). In VTM-1.0, a CU can be up to 128×128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, in VVC one coding tree unit (CTU) is split into CUs based on quad/binary/ternary trees to adapt to varying local characteristics. Additionally, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, prediction unit (PU), and transform unit (TU) no longer exists in VVC; instead, each CU is always used as the basic unit for both prediction and transform, without further partitioning. In the multi-type tree structure, a CTU is first partitioned by a quadtree structure. Each quadtree leaf node can then be further partitioned by binary and ternary tree structures. As shown in Fig. 5A, 5B, 5C, 5D, and 5E (described below), there are five split types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
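The geometry of the five split types can be made concrete with a short sketch. The following is a minimal illustration of the partition shapes only, not the normative VVC splitting process; the function and variable names are our own.

```python
# A minimal sketch (assumption: not the normative VVC algorithm) of how one
# block is divided by each of the five multi-type-tree split types.
def split_block(x, y, w, h, split_type):
    """Return the sub-blocks (x, y, width, height) produced by one split."""
    if split_type == "QT":      # quaternary: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split_type == "BT_H":    # horizontal binary: two equal halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split_type == "BT_V":    # vertical binary
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split_type == "TT_H":    # horizontal ternary: 1/4, 1/2, 1/4
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if split_type == "TT_V":    # vertical ternary
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    return [(x, y, w, h)]       # no split: the block is one CU

# Example: a 128x128 CTU quad-split, then one quadrant split by vertical ternary.
ctu = split_block(0, 0, 128, 128, "QT")
print(split_block(*ctu[0], "TT_V"))
```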
In Fig. 1 (described below), spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") predicts the current video block using pixels from samples (referred to as reference samples) of already-coded neighboring blocks in the same video picture/slice. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal of a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and direction of motion between the current CU and its temporal reference. Also, when multiple reference pictures are supported, one reference picture index is additionally sent to identify from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision block in the encoder chooses the best prediction mode, for example based on the rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using transform and quantization.
The quantized residual coefficients are inverse-quantized and inverse-transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering, such as deblocking filtering, sample adaptive offset (SAO), and adaptive loop filtering (ALF), may be applied to the reconstructed CU before it is placed in the reference picture store and used to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed to form the bitstream.
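The hybrid loop above can be summarized in a few lines. The sketch below is a toy model only: a plain SAD comparison stands in for rate-distortion optimization, and a scalar quantizer stands in for the real transform and quantization stages.

```python
import numpy as np

def encode_block(block, intra_pred, inter_pred, qp=32):
    """Toy hybrid-coding step: mode decision, residual, quantize, reconstruct."""
    block = block.astype(np.int32)
    # Crude mode decision: the prediction with the smaller SAD wins
    # (a stand-in for true rate-distortion optimization).
    pred = intra_pred if (np.abs(block - intra_pred).sum()
                          < np.abs(block - inter_pred).sum()) else inter_pred
    residual = block - pred                       # prediction residual
    qstep = 2.0 ** ((qp - 4) / 6.0)               # toy scalar quantizer step
    coeff = np.round(residual / qstep)            # "transform" + quantization
    recon = pred + (coeff * qstep).astype(np.int32)   # inverse path
    # The reconstructed block would feed the reference picture store
    # (after in-loop filtering); coefficients go to entropy coding.
    return coeff, np.clip(recon, 0, 255)
```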
Fig. 2 (described below) presents a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded in the entropy decoding unit. The coding mode and prediction information are sent to either the spatial prediction unit (when intra-coded) or the temporal prediction unit (when inter-coded) to form the prediction block. The residual transform coefficients are sent to the inverse quantization unit and the inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may further go through in-loop filtering before being stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
Fig. 1 shows a typical encoder 100. Encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
Fig. 2 shows a block diagram of a typical decoder 200. Decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.
Fig. 3 illustrates an example method 300 for generating combined inter and intra prediction (CIIP) in accordance with this disclosure.
In step 310, a first reference picture and a second reference picture associated with a current prediction block are acquired, wherein the first reference picture precedes the current picture and the second reference picture follows the current picture in display order.
In step 312, a first prediction L0 is obtained based on a first motion vector MV0 from the current prediction block to a reference block in a first reference picture.
In step 314, a second prediction L1 is obtained based on a second motion vector MV1 from the current prediction block to a reference block in a second reference picture.
At step 316, it is determined whether a bidirectional optical flow (BDOF) operation is applied, where BDOF calculates first horizontal and vertical gradient values $\partial I^{(0)}/\partial x$ and $\partial I^{(0)}/\partial y$ for the prediction samples associated with the first prediction L0, and second horizontal and vertical gradient values $\partial I^{(1)}/\partial x$ and $\partial I^{(1)}/\partial y$ associated with the second prediction L1.
At step 318, a bi-prediction of the current prediction block is calculated based on the first prediction L0, the second prediction L1, the first gradient values $\partial I^{(0)}/\partial x$ and $\partial I^{(0)}/\partial y$, and the second gradient values $\partial I^{(1)}/\partial x$ and $\partial I^{(1)}/\partial y$.
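Steps 316 and 318 can be illustrated with the gradient computation alone. A minimal sketch, assuming the form recited later in the "Bidirectional optical flow" section: each gradient is the difference of the two horizontally or vertically neighboring prediction samples, right-shifted by 4.

```python
import numpy as np

def bdof_gradients(I):
    """I: 2-D integer array of prediction samples (L0 or L1), padded by one
    sample on each side; returns the horizontal and vertical gradients."""
    I = I.astype(np.int32)
    gx = (I[1:-1, 2:] - I[1:-1, :-2]) >> 4   # horizontal: right minus left
    gy = (I[2:, 1:-1] - I[:-2, 1:-1]) >> 4   # vertical: below minus above
    return gx, gy
```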
Fig. 4 illustrates an example method for generating CIIP in accordance with this disclosure. For example, the method uses unidirectional-prediction-based inter prediction and MPM-based intra prediction to generate the CIIP.
In step 410, a reference picture in a reference picture list associated with a current prediction block is acquired.
At step 412, inter-prediction is generated based on a first motion vector from the current picture to a first reference picture.
At step 414, the intra prediction mode associated with the current prediction block is obtained.
At step 416, the intra prediction of the current prediction block is generated based on the intra prediction mode.
At step 418, the final prediction of the current prediction block is generated by averaging the inter prediction and intra prediction.
At step 420, for intra mode prediction based on a Most Probable Mode (MPM), it is determined whether the current prediction block is considered as an inter mode or an intra mode.
Fig. 5A illustrates a diagram showing block quadtree partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5B illustrates a diagram showing block vertical binary partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5C illustrates a diagram showing block horizontal binary partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5D illustrates a diagram showing block vertical ternary partitions in a multi-type tree structure, according to an example of the present disclosure.
Fig. 5E illustrates a diagram showing block horizontal ternary partitions in a multi-type tree structure, according to an example of the present disclosure.
Combined inter and intra prediction
As shown in Fig. 1 and Fig. 2, inter and intra prediction methods are used in the hybrid video coding scheme, where each PU is only allowed to select either inter prediction or intra prediction to exploit the correlation in the temporal or spatial domain, but never both. However, as noted in prior literature, the residual signals generated by inter-predicted blocks and intra-predicted blocks can exhibit very different characteristics from each other. Therefore, if the two kinds of prediction can be combined in an efficient way, a more accurate prediction can be expected, reducing the energy of the prediction residual and thereby improving coding efficiency. Moreover, in natural video content, the motion of moving objects can be complicated. For example, there may exist areas that contain both old content (e.g., objects included in previously coded pictures) and emerging new content (e.g., objects not included in previously coded pictures). In such a scenario, neither inter prediction nor intra prediction alone can provide an accurate prediction of the current block.
To further improve prediction efficiency, combined inter and intra prediction (CIIP), which combines the inter prediction and the intra prediction of one CU coded in merge mode, is adopted in the VVC standard. Specifically, for each merge CU, one additional flag is signaled to indicate whether CIIP is enabled for the current CU. For the luma component, CIIP supports four frequently used intra modes, including the planar (PLANAR), DC, horizontal (HORIZONTAL), and vertical (VERTICAL) modes. For the chroma components, DM is always applied (i.e., chroma reuses the same intra mode as the luma component) without extra signaling. Additionally, in the existing CIIP design, weighted averaging is applied to combine the inter-prediction samples and the intra-prediction samples of one CIIP CU. Specifically, when the PLANAR or DC mode is selected, equal weights (i.e., 0.5) are applied. Otherwise (i.e., the HORIZONTAL or VERTICAL mode is applied), the current CU is first split into four equal-sized regions, horizontally (for the HORIZONTAL mode) or vertically (for the VERTICAL mode).
The inter and intra prediction samples in the different regions are combined using four weight sets, denoted $(w\_intra_i, w\_inter_i)$, where $i = 0$ and $i = 3$ represent the regions closest to and farthest from the reconstructed neighboring samples used for intra prediction. In the current CIIP design, the weight sets are $(w\_intra_0, w\_inter_0) = (0.75, 0.25)$, $(w\_intra_1, w\_inter_1) = (0.625, 0.375)$, $(w\_intra_2, w\_inter_2) = (0.375, 0.625)$, and $(w\_intra_3, w\_inter_3) = (0.25, 0.75)$. Fig. 6A, 6B, and 6C (described below) provide examples illustrating the CIIP mode.
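To make the region weighting concrete, below is a minimal sketch of the combination rule. The orientation of the four strips relative to the intra mode is our assumption based on the description above; only the weight values come from the text.

```python
import numpy as np

# Region 0 is nearest the intra reference samples, region 3 farthest.
W_INTRA = [0.75, 0.625, 0.375, 0.25]

def ciip_combine(inter_pred, intra_pred, intra_mode):
    """Blend inter and intra predictions per the CIIP weight sets."""
    p = np.asarray(inter_pred, dtype=np.float64)
    q = np.asarray(intra_pred, dtype=np.float64)
    if intra_mode in ("PLANAR", "DC"):
        return (p + q) / 2                      # equal weights (0.5, 0.5)
    out = np.empty_like(p)
    h, w = p.shape
    for i in range(4):                          # four equal-sized regions
        w_intra = W_INTRA[i]
        if intra_mode == "HORIZONTAL":          # strips moving away from the
            s, e = i * w // 4, (i + 1) * w // 4 # left reference column
            out[:, s:e] = w_intra * q[:, s:e] + (1 - w_intra) * p[:, s:e]
        else:                                   # VERTICAL: strips moving away
            s, e = i * h // 4, (i + 1) * h // 4 # from the top reference row
            out[s:e, :] = w_intra * q[s:e, :] + (1 - w_intra) * p[s:e, :]
    return out
```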
Furthermore, in the current VVC working specification, the intra mode of a CIIP CU can be used as a predictor of the intra modes of its neighboring CIIP CUs through the most probable mode (MPM) mechanism. Specifically, for each CIIP CU, when its neighboring blocks are also CIIP CUs, the intra modes of those neighboring blocks are first rounded to the nearest mode among the PLANAR, DC, HORIZONTAL, and VERTICAL modes and then added to the MPM candidate list of the current CU. However, when the MPM list is built for an intra CU, a neighboring block coded in CIIP mode is regarded as unavailable, i.e., the intra mode of a CIIP CU is not allowed to predict the intra modes of its neighboring intra CUs. Fig. 7A and 7B (described below) compare the MPM list generation processes of an intra CU and a CIIP CU, as sketched after this paragraph.
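A sketch of the recited rounding step: a CIIP neighbor's intra mode is mapped to the nearest of the four CIIP modes before entering the MPM list. The nearest-angle rule and the VVC mode indices used here (PLANAR=0, DC=1, HORIZONTAL=18, VERTICAL=50) are our assumptions for illustration.

```python
def round_ciip_mode(mode):
    """Map an intra mode index to the nearest of PLANAR/DC/HORIZONTAL/VERTICAL."""
    if mode in (0, 1):            # PLANAR and DC map to themselves
        return mode
    # Angular modes snap to whichever of HORIZONTAL (18) or VERTICAL (50)
    # is closer in mode index.
    return 18 if abs(mode - 18) <= abs(mode - 50) else 50
```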
Bidirectional optical flow
Conventional bi-prediction in video codec is a simple combination of two temporal prediction blocks taken from reference pictures that have been reconstructed. However, due to the limitations of block-based motion compensation, small residual motion can be observed between samples of two prediction blocks, thus reducing the efficiency of motion-compensated prediction. To solve this problem, bidirectional optical flow (BDOF) is applied in the VVC to reduce the effect of such motion on each sample within a block.
In particular, as shown in Fig. 6A, 6B, and 6C (described below), BDOF is a sample-wise motion refinement performed on top of the block-based motion-compensated prediction when bi-directional prediction is used. The motion refinement $(v_x, v_y)$ of each 4×4 sub-block is calculated by minimizing the difference between the L0 and L1 prediction samples after BDOF is applied, within a 6×6 window $\Omega$ around the sub-block. Specifically, the values of $(v_x, v_y)$ are derived as

$$v_x = S_1 > 0 \;?\; \mathrm{clip3}\!\left(-th_{BDOF},\, th_{BDOF},\, -\big((S_3 \cdot 2^3) \gg \lfloor \log_2 S_1 \rfloor\big)\right) \;:\; 0$$

$$v_y = S_5 > 0 \;?\; \mathrm{clip3}\!\left(-th_{BDOF},\, th_{BDOF},\, -\big((S_6 \cdot 2^3 - v_x S_2 / 2) \gg \lfloor \log_2 S_5 \rfloor\big)\right) \;:\; 0 \tag{1}$$

where $\lfloor\cdot\rfloor$ is the floor function; $\mathrm{clip3}(\min, \max, x)$ is a function that clips a given value $x$ inside the range $[\min, \max]$; the symbol $\gg$ represents a bitwise right-shift operation; the symbol $\ll$ represents a bitwise left-shift operation; and $th_{BDOF}$ is the motion refinement threshold that prevents propagation errors caused by irregular local motion, equal to $2^{13-BD}$, where $BD$ is the bit depth of the input video. In (1), the values of $S_1$, $S_2$, $S_3$, $S_5$, and $S_6$ are calculated as

$$S_1 = \sum_{(i,j)\in\Omega} \psi_x(i,j)\cdot\psi_x(i,j),\qquad S_3 = \sum_{(i,j)\in\Omega} \theta(i,j)\cdot\psi_x(i,j),$$

$$S_2 = \sum_{(i,j)\in\Omega} \psi_x(i,j)\cdot\psi_y(i,j),\qquad S_5 = \sum_{(i,j)\in\Omega} \psi_y(i,j)\cdot\psi_y(i,j),$$

$$S_6 = \sum_{(i,j)\in\Omega} \theta(i,j)\cdot\psi_y(i,j)$$

where

$$\psi_x(i,j) = \left(\frac{\partial I^{(1)}}{\partial x}(i,j) + \frac{\partial I^{(0)}}{\partial x}(i,j)\right) \gg 3,\qquad \psi_y(i,j) = \left(\frac{\partial I^{(1)}}{\partial y}(i,j) + \frac{\partial I^{(0)}}{\partial y}(i,j)\right) \gg 3,$$

$$\theta(i,j) = \big(I^{(1)}(i,j) \gg 6\big) - \big(I^{(0)}(i,j) \gg 6\big)$$

and $I^{(k)}(i,j)$ is the sample value at coordinate $(i,j)$ of the prediction signal in list $k$ ($k = 0, 1$), which is generated at intermediate high precision (i.e., 16 bits); $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ are the horizontal and vertical gradients of the sample, obtained by directly calculating the difference between its two neighboring samples, that is,

$$\frac{\partial I^{(k)}}{\partial x}(i,j) = \big(I^{(k)}(i+1, j) - I^{(k)}(i-1, j)\big) \gg 4,\qquad \frac{\partial I^{(k)}}{\partial y}(i,j) = \big(I^{(k)}(i, j+1) - I^{(k)}(i, j-1)\big) \gg 4.$$

Based on the motion refinement derived in (1), the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory according to the optical-flow model, as indicated by

$$pred_{BDOF}(x,y) = \big(I^{(0)}(x,y) + I^{(1)}(x,y) + b + o_{offset}\big) \gg shift$$

where

$$b = \mathrm{rnd}\!\left(\frac{v_x\big(\partial I^{(1)}(x,y)/\partial x - \partial I^{(0)}(x,y)/\partial x\big)}{2}\right) + \mathrm{rnd}\!\left(\frac{v_y\big(\partial I^{(1)}(x,y)/\partial y - \partial I^{(0)}(x,y)/\partial y\big)}{2}\right)$$

and $shift$ and $o_{offset}$ are the right-shift value and the offset applied to combine the L0 and L1 prediction signals for bi-prediction, equal to $15 - BD$ and $1 \ll (14 - BD) + 2\cdot(1 \ll 13)$, respectively.
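The derivation above maps onto a few lines of code. A minimal sketch, assuming the shifts as reconstructed above ($\psi \gg 3$, $\theta \gg 6$, gradients $\gg 4$) and the simplified $v_y$ form without the specification's high/low split of $S_2$; floor shifts stand in for the exact rounding, so this is illustrative rather than conforming.

```python
import numpy as np

def bdof_refine(I0, I1, gx0, gy0, gx1, gy1, bd=10):
    """One sub-block refinement over its window; all inputs are int arrays."""
    th = 2 ** (13 - bd)                                   # th_BDOF
    psi_x = (gx1.astype(np.int64) + gx0) >> 3
    psi_y = (gy1.astype(np.int64) + gy0) >> 3
    theta = (I1.astype(np.int64) >> 6) - (I0.astype(np.int64) >> 6)

    s1 = int((psi_x * psi_x).sum()); s2 = int((psi_x * psi_y).sum())
    s3 = int((theta * psi_x).sum()); s5 = int((psi_y * psi_y).sum())
    s6 = int((theta * psi_y).sum())

    vx = 0
    if s1 > 0:                                            # equation (1)
        vx = max(-th, min(th, -((s3 * 8) >> (s1.bit_length() - 1))))
    vy = 0
    if s5 > 0:
        vy = max(-th, min(th,
                          -((s6 * 8 - vx * s2 // 2) >> (s5.bit_length() - 1))))

    # Per-sample correction b (integer approximation of rnd(./2)), then the
    # final shift/offset combination of the L0 and L1 signals.
    b = (vx * (gx1.astype(np.int64) - gx0)
         + vy * (gy1.astype(np.int64) - gy0)) // 2
    shift, offset = 15 - bd, (1 << (14 - bd)) + 2 * (1 << 13)
    return (I0.astype(np.int64) + I1 + b + offset) >> shift
```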
Fig. 6A illustrates a diagram showing combined inter and intra prediction for the HORIZONTAL mode according to an example of the present disclosure.
Fig. 6B illustrates a diagram showing combined inter and intra prediction for the VERTICAL mode according to an example of the present disclosure.
Fig. 6C illustrates a diagram showing combined inter and intra prediction for the PLANAR and DC modes, according to an example of the present disclosure.
Fig. 7A shows a flowchart of an MPM candidate list generation procedure of an intra CU according to an example of the present disclosure.
Fig. 7B shows a flowchart of an MPM candidate list generation procedure of a CIIP CU according to an example of the present disclosure.
Improvements in CIIP
While CIIP may improve the efficiency of conventional motion compensated prediction, its design may be further improved. Specifically, the following problems in existing CIIP designs in VVCs are identified in this disclosure.
First, as discussed in the section "Combined inter and intra prediction," because CIIP combines the inter-prediction and intra-prediction samples, each CIIP CU needs to use its reconstructed neighboring samples to generate the prediction signal. This means that the decoding of one CIIP CU depends on the full reconstruction of its neighboring blocks. Because of such interdependency, for practical hardware implementations, CIIP needs to be performed in the reconstruction stage, where neighboring reconstructed samples become available for intra prediction. Because the decoding of CUs in the reconstruction stage must be carried out sequentially (i.e., one after another), the number of computational operations (e.g., multiplications, additions, and bit shifts) involved in the CIIP process cannot be too high, in order to ensure sufficient throughput for real-time decoding.
As mentioned in the section "Bidirectional optical flow," BDOF is enabled to enhance the prediction quality when one inter-coded CU is predicted from two reference blocks in the forward and backward temporal directions. As shown in Fig. 8 (described later), in the current VVC, BDOF is also invoked to generate the inter-prediction samples of the CIIP mode. Given the additional complexity of BDOF, such a design could severely lower the encoding/decoding throughput of hardware codecs when CIIP is enabled.
Second, in the current CIIP design, when one CIIP CU references one bi-directionally predicted merge candidate, it is necessary to generate motion compensated prediction signals in the lists L0 and L1. When one or more MVs are not of integer precision, an additional interpolation process must be invoked to interpolate samples at fractional sample positions. This process not only increases computational complexity, but also increases memory bandwidth because more reference samples need to be accessed from external memory.
Third, as discussed in the section "Combined inter and intra prediction," in the current CIIP design, the intra mode of a CIIP CU and that of an intra CU are treated differently when constructing the MPM lists of their neighboring blocks. Specifically, when one current CU is coded in CIIP mode, its neighboring CIIP CUs are regarded as intra, i.e., the intra modes of the neighboring CIIP CUs can be added to the MPM candidate list. However, when the current CU is coded in intra mode, its neighboring CIIP CUs are regarded as inter, i.e., the intra modes of the neighboring CIIP CUs are excluded from the MPM candidate list. Such a non-unified design may not be optimal for the final version of the VVC standard.
Fig. 8 shows a diagram illustrating the workflow of the existing CIIP design in VVC, according to an example of the present disclosure.
Simplifying CIIP
In this disclosure, a method of simplifying existing CIIP designs to facilitate hardware codec implementations is provided. In general, the main aspects of the technology presented in this disclosure are summarized below.
First, in order to improve the CIIP encoding/decoding throughput, it is suggested to exclude BDOF from generation of inter-prediction samples in CIIP mode.
Second, in order to reduce computational complexity and memory bandwidth consumption, when one CIIP CU is bi-directionally predicted (i.e., has L0 MV and L1 MV), a method of converting a block from bi-directional prediction to uni-directional prediction to generate inter-prediction samples is proposed.
Third, two methods are proposed to harmonize the use of the intra modes of CIIP CUs and intra CUs when forming the MPM candidate lists of neighboring blocks.
CIIP without BDOF
As indicated in the "Improvements in CIIP" section, BDOF is always enabled to generate the inter-prediction samples of the CIIP mode when the current CU is bi-directionally predicted. Due to the additional complexity of BDOF, the existing CIIP design could significantly lower the encoding/decoding throughput, especially making real-time decoding difficult for VVC decoders. On the other hand, for CIIP CUs, the final prediction samples are generated by averaging the inter-prediction samples and the intra-prediction samples; in other words, the BDOF-refined prediction samples are not directly used as the prediction signal of a CIIP CU. Therefore, compared with conventional bi-predicted CUs, where BDOF is directly applied to generate the prediction samples, the corresponding improvement obtained from BDOF is less effective for CIIP CUs. Based on the above considerations, it is proposed to disable BDOF when generating the inter-prediction samples of the CIIP mode. Fig. 9 (described below) shows the corresponding workflow of the proposed CIIP process after BDOF is removed.
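A sketch of the proposed flow: for a bi-predicted CIIP CU, the inter part is a plain average of the L0 and L1 predictions, with the BDOF refinement of Fig. 8 removed. It reuses the `ciip_combine` sketch given earlier; the rounding in the average is our assumption.

```python
import numpy as np

def ciip_predict_no_bdof(I0, I1, intra_pred, intra_mode):
    """Inter part is a plain L0/L1 average; BDOF is never invoked here."""
    inter_pred = (I0.astype(np.int64) + I1 + 1) >> 1   # average, no BDOF
    return ciip_combine(inter_pred, intra_pred, intra_mode)  # sketch above
```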
Fig. 9 shows a diagram illustrating the workflow of the proposed CIIP method with BDOF removed, according to an example of the present disclosure.
CIIP based on unidirectional prediction
As discussed above, when the merge candidate referenced by one CIIP CU is bi-directionally predicted, both the L0 and L1 prediction signals are generated to predict the samples inside the CU. To reduce memory bandwidth and interpolation complexity, in one embodiment of the present disclosure, it is proposed to use only the inter-prediction samples generated with unidirectional prediction (even when the current CU is bi-predicted) for combination with the intra-prediction samples in the CIIP mode. Specifically, when the current CIIP CU is uni-directionally predicted, the inter-prediction samples are directly combined with the intra-prediction samples. Otherwise (i.e., the current CU is bi-directionally predicted), the inter-prediction samples used by CIIP are generated based on the unidirectional prediction from one prediction list (L0 or L1). To select the prediction list, different methods may be applied. In the first method, it is proposed to always select the first prediction (i.e., list L0) for any CIIP block that is predicted from two reference pictures.
In the second approach, it is suggested to always select the second prediction (i.e., list L1) for any CIIP block predicted by two reference pictures. In a third method, an adaptive method is applied in which a prediction list associated with a reference picture is selected, the reference picture having a smaller Picture Order Count (POC) distance from the current picture. Fig. 10 (described below) shows a workflow of CIIP based on unidirectional prediction, in which a prediction list is selected based on POC distance.
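The adaptive third method reduces to a one-line comparison. A minimal sketch; the tie-break toward L0 when the two POC distances are equal is our assumption.

```python
def select_uni_prediction_list(poc_cur, poc_ref_l0, poc_ref_l1):
    """Pick the list whose reference picture is closer in POC distance
    to the current picture (tie-break toward L0 is an assumption)."""
    return "L0" if abs(poc_cur - poc_ref_l0) <= abs(poc_cur - poc_ref_l1) else "L1"
```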
Finally, in the last method, it is proposed to enable the CIIP mode only when the current CU is uni-directionally predicted. Moreover, to reduce the overhead, the signaling of the CIIP enable/disable flag is made dependent on the prediction direction of the current CIIP CU. When the current CU is uni-directionally predicted, the CIIP flag is signaled in the bitstream to indicate whether CIIP is enabled or disabled. Otherwise (i.e., the current CU is bi-directionally predicted), the signaling of the CIIP flag is skipped, and the flag is always inferred to be false, i.e., CIIP is always disabled.
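The conditional signaling amounts to a small parsing rule. A minimal sketch, where `bs.read_flag()` is a hypothetical one-bit bitstream reader, not an API from any real codec library.

```python
def parse_ciip_flag(bs, is_bi_predicted):
    """Parse the CIIP flag only for uni-predicted merge CUs."""
    if is_bi_predicted:
        return False          # not signaled: inferred as disabled
    return bs.read_flag()     # hypothetical 1-bit read helper
```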
Fig. 10 illustrates a diagram showing the workflow of unidirectional-prediction-based CIIP in which the prediction list is selected based on POC distance, according to one example of the present disclosure.
Coordination of intra modes of CIIP CUs and intra CUs for MPM candidate list construction
As discussed above, the current CIIP design is not unified with respect to how the intra modes of CIIP CUs and intra CUs are used to form the MPM candidate lists of their neighboring blocks. Specifically, the intra modes of both CIIP CUs and intra CUs can predict the intra modes of their neighboring blocks coded in CIIP mode. However, only the intra modes of intra CUs can predict the intra modes of neighboring intra CUs. To achieve a more unified design, this section proposes two methods to harmonize the use of the intra modes of CIIP CUs and intra CUs in the MPM list construction.
In the first method, it is proposed to treat the CIIP mode as an inter mode for MPM list construction. Specifically, when generating the MPM list of either a CIIP CU or an intra CU, if a neighboring block is coded in CIIP mode, the intra mode of that neighboring block is marked as unavailable. In this way, the intra modes of CIIP blocks are never used to construct MPM lists. In contrast, in the second method, it is proposed to treat the CIIP mode as an intra mode for MPM list construction. Specifically, in this method, the intra mode of a CIIP CU can predict the intra modes of both its neighboring CIIP blocks and its neighboring intra blocks. Fig. 11A and 11B (described below) illustrate the MPM candidate list generation processes when the two methods above are applied, and a sketch contrasting them follows this paragraph.
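A minimal sketch contrasting the two harmonization methods: under method 1, a CIIP neighbor contributes no intra mode to any MPM list; under method 2, it contributes its intra mode regardless of whether the current CU is CIIP or intra. The data layout is our own illustration.

```python
def neighbor_mode_for_mpm(neighbor, method):
    """neighbor: dict with 'coding_mode' and, for INTRA/CIIP, 'intra_mode'.
    Returns the intra mode to add to the MPM list, or None if unavailable."""
    if neighbor["coding_mode"] == "INTRA":
        return neighbor["intra_mode"]
    if neighbor["coding_mode"] == "CIIP":
        # Method 1: CIIP treated as inter (unavailable for MPM).
        # Method 2: CIIP treated as intra (its mode is usable).
        return neighbor["intra_mode"] if method == 2 else None
    return None                # pure inter block: no intra mode available
```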
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise examples described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the disclosure be limited only by the claims appended hereto.
Fig. 11A illustrates a flowchart of the MPM candidate list generation method when CIIP blocks are enabled as MPM candidates, according to an example of the disclosure.
Fig. 11B illustrates a flowchart of the MPM candidate list generation method when CIIP blocks are disabled as MPM candidates, according to an example of the disclosure.
FIG. 12 illustrates a computing environment 1210 coupled with a user interface 1260. The computing environment 1210 may be part of a data processing server. The computing environment 1210 includes a processor 1220, memory 1240, and an I/O interface 1250.
Processor 1220 generally controls the overall operation of computing environment 1210, such as operations associated with display, data acquisition, data communication, and picture processing. Processor 1220 may include one or more processors to execute instructions to perform all or some of the steps of the methods described above. Further, processor 1220 may include one or more circuits that facilitate interaction between processor 1220 and other components. The processor may be a central processing unit (CPU), a microprocessor, a single-chip machine, a GPU, or the like.
Memory 1240 is configured to store various types of data to support the operation of computing environment 1210. Examples of such data include instructions, video data, picture data, and the like for any application or method operating on computing environment 1210. The memory 1240 may be implemented using any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
I/O interface 1250 provides an interface between processor 1220 and peripheral interface modules, such as a keyboard, click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. I/O interface 1250 may be coupled with an encoder and a decoder.
In an embodiment, there is also provided a non-transitory computer readable storage medium, such as included in memory 1240, comprising a plurality of programs executable by processor 1220 in computing environment 1210 for performing the methods described above. For example, the non-transitory computer readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
The non-transitory computer readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the method for motion prediction described above.
In an embodiment, the computing environment 1210 may be implemented with one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), graphics Processing Units (GPUs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods described above.

Claims (12)

1. A video encoding method, the method comprising:
dividing a current picture into one or more code blocks;
generating an inter prediction value of the current code block based on the inter prediction mode;
generating an intra-frame prediction value of the current code block based on an intra-frame prediction mode;
generating a final prediction value of the current code block based on the inter prediction value and the intra prediction value;
when constructing a Most Probable Mode (MPM) list of neighboring code blocks, determining that the prediction mode of the current code block is regarded as a unified mode, wherein the unified mode is defined either as an inter mode or as an intra mode,
wherein bidirectional optical flow (BDOF) operation is disabled for the current code block in response to a final prediction of the current code block being generated based on both the inter-prediction value and the intra-prediction value; and
a video bitstream output is formed based on the final prediction of the current code block.
2. The method of claim 1, wherein the generating an inter prediction value of a current code block based on an inter prediction mode comprises: an inter prediction value of the current code block is generated based on at least one motion vector from the current picture to at least one reference picture of a reference picture list, respectively.
3. The method of claim 2, wherein the reference picture list is a first reference picture list L0 when inter-predicting the current code block from one reference picture in the first reference picture list L0.
4. The method of claim 2, wherein the reference picture list is a second reference picture list L1 when inter-predicting the current code block from one reference picture in the second reference picture list L1.
5. The method of claim 2, wherein the reference picture list is the first reference picture list L0 when the current code block is inter-predicted from one reference picture in the first reference picture list L0 and one reference picture in the second reference picture list L1.
6. The method of claim 2, wherein the reference picture list is the second reference picture list L1 when the current code block is inter-predicted from a first reference picture in the first reference picture list L0 and a second reference picture in the second reference picture list L1.
7. The method of claim 2, wherein, when the current code block is inter-predicted from a first reference picture in a first reference picture list L0 and a second reference picture in a second reference picture list L1, the reference picture list is the reference picture list associated with the one of the first reference picture and the second reference picture that has the smaller Picture Order Count (POC) distance from the current picture.
8. The method of claim 1, wherein the generating the intra-prediction value for the current code block based on the intra-prediction mode comprises: an intra prediction value of the current code block is generated based on an intra planar prediction mode.
9. A video encoding device comprising one or more processors and one or more memories coupled to the one or more processors, the one or more processors configured to perform the method of any of claims 1-8.
10. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the method of any of claims 1-8.
11. A non-transitory computer readable storage medium storing a plurality of programs for execution by a video encoding device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the video encoding device to perform the method of any of claims 1-8, obtain a corresponding video bitstream, and store the video bitstream in the non-transitory computer readable storage medium.
12. A computer program product comprising instructions which, when executed by a processor, implement the method according to any of claims 1-8.
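For orientation outside the legal wording, the Python sketch below restates the computational content of claims 1, 3, 4, 7 and 8: building a planar intra predictor, blending it with a motion-compensated inter predictor into the final prediction value, gating BDOF off for such combined blocks, and picking the reference picture list by POC distance. Everything not stated in the claims is an assumption of this illustration, not part of the claimed method: the use of NumPy, square blocks, the (n+1)-sample reference layout, the equal 1:1 blending weights, the tie-break toward L0, and all function names. The unified-mode MPM rule of claim 1 is bookkeeping on neighboring blocks' modes and is not sketched.

```python
# Illustrative sketch only (not part of the claims). Assumed here and
# not found in the claims: NumPy, square blocks, the reference-sample
# layout, equal 1:1 CIIP weights, and the L0 tie-break in the POC rule.
import numpy as np

def planar_intra_prediction(top, left, n):
    """Simplified square-block intra planar predictor (cf. claim 8).
    `top` and `left` each hold n+1 reconstructed reference samples;
    top[n] is the top-right sample, left[n] the bottom-left one."""
    pred = np.zeros((n, n), dtype=np.int32)
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top[n]
            vert = (n - 1 - y) * top[x] + (y + 1) * left[n]
            pred[y, x] = (horiz + vert + n) // (2 * n)  # rounded average
    return pred

def ciip_predict(inter_pred, intra_pred, w_inter=1, w_intra=1):
    """Final prediction value of claim 1: a weighted combination of the
    inter and intra prediction values (equal weights assumed here)."""
    total = w_inter + w_intra
    return (w_inter * inter_pred + w_intra * intra_pred + total // 2) // total

def bdof_enabled(bi_predicted, uses_combined_prediction):
    """BDOF gate of claim 1: bidirectional optical flow is disabled for
    any block whose final prediction mixes inter and intra predictors."""
    return bi_predicted and not uses_combined_prediction

def select_reference_list(poc_current, poc_ref_l0=None, poc_ref_l1=None):
    """Reference-picture-list rules of claims 3, 4 and 7. For
    bi-prediction, the list whose reference picture lies closer to the
    current picture in Picture Order Count (POC) is chosen; the
    tie-break toward L0 is an assumption of this sketch."""
    if poc_ref_l1 is None:
        return "L0"  # uni-prediction from list 0 (claim 3)
    if poc_ref_l0 is None:
        return "L1"  # uni-prediction from list 1 (claim 4)
    dist_l0 = abs(poc_current - poc_ref_l0)
    dist_l1 = abs(poc_current - poc_ref_l1)
    return "L0" if dist_l0 <= dist_l1 else "L1"  # claim 7

# Tiny demonstration with made-up sample values.
top = np.full(5, 128, dtype=np.int32)         # 4x4 block: n+1 = 5 refs
left = np.full(5, 64, dtype=np.int32)
intra = planar_intra_prediction(top, left, 4)
inter = np.full((4, 4), 100, dtype=np.int32)  # stand-in MC prediction
final = ciip_predict(inter, intra)
print(final[0, 0])                                                     # 98
print(bdof_enabled(bi_predicted=True, uses_combined_prediction=True))  # False
print(select_reference_list(8, poc_ref_l0=4, poc_ref_l1=6))            # "L1"
```

One reading of the BDOF gate: once the final predictor blends in intra samples, it is no longer purely motion-compensated, so a further optical-flow refinement of the inter part would polish a signal that is never output on its own; the claims state only the rule, and this rationale is an interpretation.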
CN202311175593.0A 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium Pending CN117294842A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962790421P 2019-01-09 2019-01-09
US62/790421 2019-01-09
PCT/US2020/012826 WO2020146562A1 (en) 2019-01-09 2020-01-09 System and method for improving combined inter and intra prediction
CN202080008692.8A CN113661704A (en) 2019-01-09 2020-01-09 System and method for improved inter-frame intra joint prediction

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202080008692.8A Division CN113661704A (en) 2019-01-09 2020-01-09 System and method for improved inter-frame intra joint prediction

Publications (1)

Publication Number Publication Date
CN117294842A (en) 2023-12-26

Family

ID=78230437

Family Applications (6)

Application Number Title Priority Date Filing Date
CN202310936271.7A Pending CN116800962A (en) 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium
CN202311175593.0A Pending CN117294842A (en) 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium
CN202111033487.XA Active CN113676733B (en) 2019-01-09 2020-01-09 Video decoding method, apparatus, non-transitory computer-readable storage medium
CN202310876401.2A Active CN117014615B (en) 2019-01-09 2020-01-09 Video encoding method, apparatus, and non-transitory computer readable storage medium
CN202310315285.7A Active CN116347102B (en) 2019-01-09 2020-01-09 Video encoding method, apparatus, non-transitory computer readable storage medium
CN202110784990.2A Active CN113542748B (en) 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310936271.7A Pending CN116800962A (en) 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium

Family Applications After (4)

Application Number Title Priority Date Filing Date
CN202111033487.XA Active CN113676733B (en) 2019-01-09 2020-01-09 Video decoding method, apparatus, non-transitory computer-readable storage medium
CN202310876401.2A Active CN117014615B (en) 2019-01-09 2020-01-09 Video encoding method, apparatus, and non-transitory computer readable storage medium
CN202310315285.7A Active CN116347102B (en) 2019-01-09 2020-01-09 Video encoding method, apparatus, non-transitory computer readable storage medium
CN202110784990.2A Active CN113542748B (en) 2019-01-09 2020-01-09 Video encoding and decoding method, apparatus, and non-transitory computer readable storage medium

Country Status (1)

Country Link
CN (6) CN116800962A (en)

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2763414B1 (en) * 2011-09-29 2020-09-23 Sharp Kabushiki Kaisha Image decoding device and image decoding method for performing bi-prediction to uni-prediction conversion
WO2013109124A1 (en) * 2012-01-19 2013-07-25 Samsung Electronics Co., Ltd. Method and device for encoding video to limit bidirectional prediction and block merging, and method and device for decoding video
WO2017043816A1 (en) * 2015-09-10 2017-03-16 LG Electronics Inc. Joint inter-intra prediction mode-based image processing method and apparatus therefor
US10375413B2 (en) * 2015-09-28 2019-08-06 Qualcomm Incorporated Bi-directional optical flow for video coding
JP2018533871A (en) * 2015-11-11 2018-11-15 Samsung Electronics Co., Ltd. Video decoding method and apparatus, and video encoding method and apparatus
KR20170058838A (en) * 2015-11-19 2017-05-29 Electronics and Telecommunications Research Institute Method and apparatus for encoding/decoding of improved inter prediction
US11109061B2 (en) * 2016-02-05 2021-08-31 Mediatek Inc. Method and apparatus of motion compensation based on bi-directional optical flow techniques for video coding
US11032550B2 (en) * 2016-02-25 2021-06-08 Mediatek Inc. Method and apparatus of video coding
WO2017188566A1 (en) * 2016-04-25 2017-11-02 LG Electronics Inc. Inter-prediction method and apparatus in image coding system
WO2018048265A1 (en) * 2016-09-11 2018-03-15 LG Electronics Inc. Method and apparatus for processing video signal by using improved optical flow motion vector
US10623737B2 (en) * 2016-10-04 2020-04-14 Qualcomm Incorporated Peak sample adaptive offset
US10805630B2 (en) * 2017-04-28 2020-10-13 Qualcomm Incorporated Gradient based matching for motion search and derivation
EP3410724A1 (en) * 2017-05-31 2018-12-05 Thomson Licensing Method and apparatus for signalling bi-directional intra prediction in video encoding and decoding
EP3625963A4 (en) * 2017-06-07 2020-11-18 MediaTek Inc. Method and apparatus of intra-inter prediction mode for video coding
US10757420B2 (en) * 2017-06-23 2020-08-25 Qualcomm Incorporated Combination of inter-prediction and intra-prediction in video coding

Also Published As

Publication number Publication date
CN116800962A (en) 2023-09-22
CN113676733B (en) 2023-02-24
CN113676733A (en) 2021-11-19
CN113542748B (en) 2023-07-11
CN113542748A (en) 2021-10-22
CN117014615B (en) 2024-03-12
CN116347102A (en) 2023-06-27
CN116347102B (en) 2024-01-23
CN117014615A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
JP7218449B2 (en) Constrained motion vector derivation for long-term reference images in video coding and decoding
JP7232345B2 (en) Method and device for selectively applying bi-directional optical flow and decoder-side motion vector correction for video coding
CN114143558B (en) Constrained and adjusted application of combined inter and intra prediction modes
JP7313533B2 (en) Method and Apparatus in Predictive Refinement by Optical Flow
US20230051193A1 (en) System and method for combined inter and intra prediction
CN114125441B (en) Bidirectional optical flow method for decoding video signal, computing device and storage medium
KR20220046707A (en) Methods and apparatuses for prediction improvement by optical flow, bidirectional optical flow and decoder-side motion vector improvement
JP2023063506A (en) Method for deriving constructed affine merge candidate
CN113545086A (en) Bi-directional optical flow and decoder-side motion vector refinement for video coding and decoding
CN116347102B (en) Video encoding method, apparatus, non-transitory computer readable storage medium
JP7303255B2 (en) Video coding method, video coding device, computer readable storage medium and computer program
KR102450491B1 (en) System and method for combined inter and intra prediction
CN114363612B (en) Method and apparatus for bit width control of bi-directional optical flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination