CN113615197B - Method and apparatus for bit depth control of bi-directional optical flow - Google Patents

Method and apparatus for bit depth control of bi-directional optical flow

Info

Publication number
CN113615197B
CN113615197B (application CN202080024193.8A)
Authority
CN
China
Prior art keywords
value
obtaining
predicted sample
predicted
motion refinement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080024193.8A
Other languages
Chinese (zh)
Other versions
CN113615197A (en)
Inventor
修晓宇 (Xiaoyu Xiu)
陈漪纹 (Yi-Wen Chen)
王祥林 (Xianglin Wang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co., Ltd.
Publication of CN113615197A
Application granted
Publication of CN113615197B


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 — … using predictive coding
    • H04N19/503 — … using predictive coding involving temporal prediction
    • H04N19/51 — Motion estimation or motion compensation
    • H04N19/577 — Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/513 — Processing of motion vectors

Abstract

A bit depth control method, apparatus, and non-transitory computer-readable storage medium are provided. The method includes: obtaining a first reference picture I(0) and a second reference picture I(1) associated with a video block; obtaining a first prediction sample I(0)(i, j) of the video block from a reference block in the first reference picture I(0); obtaining a second prediction sample I(1)(i, j) of the video block from a reference block in the second reference picture I(1); controlling the internal bit depth of bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters; applying BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j); and obtaining bi-prediction samples of the video block based on the motion refinements.

Description

Method and apparatus for bit depth control of bi-directional optical flow
Cross Reference to Related Applications
This application is based on and claims priority from provisional application No. 62/823,951, filed on March 26, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to video coding and compression. More specifically, the present disclosure relates to methods and apparatus for the bi-directional optical flow (BDOF) method for video coding.
Background
Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), the Joint Exploration Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and the like. Video coding generally uses prediction methods (e.g., inter prediction, intra prediction, and the like) that take advantage of redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
Examples of the present disclosure provide methods and apparatus for bit depth control of bi-directional optical flow. According to a first aspect of the present disclosure, a BDOF bit depth control method for coding a video signal is provided. The method may include obtaining a first reference picture I(0) and a second reference picture I(1) associated with a video block. In display order, the first reference picture I(0) may precede the current picture and the second reference picture I(1) may follow the current picture. The method may include obtaining a first prediction sample I(0)(i, j) of the video block from a reference block in the first reference picture I(0). i and j may represent the coordinates of a sample within the current picture. The method may include obtaining a second prediction sample I(1)(i, j) of the video block from a reference block in the second reference picture I(1). The method may include controlling the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters when the coding bit depth is greater than 12 bits. The BDOF may use internal BDOF parameters that include horizontal and vertical gradient values obtained based on the first prediction sample I(0)(i, j) and horizontal and vertical gradient values obtained based on the second prediction sample I(1)(i, j). The method may include applying BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j). The method may include obtaining bi-prediction samples of the video block based on the motion refinements.
According to a second aspect of the present disclosure, a computing device is provided. The computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors. The one or more processors may be configured to obtain a first reference picture I(0) and a second reference picture I(1) associated with a video block. In display order, the first reference picture I(0) may precede the current picture and the second reference picture I(1) may follow the current picture. The one or more processors may be configured to obtain a first prediction sample I(0)(i, j) of the video block from a reference block in the first reference picture I(0). i and j may represent the coordinates of a sample within the current picture. The one or more processors may be configured to obtain a second prediction sample I(1)(i, j) of the video block from a reference block in the second reference picture I(1). The one or more processors may be configured to control the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters when the coding bit depth is greater than 12 bits. The BDOF may use internal BDOF parameters that include horizontal and vertical gradient values obtained based on the first prediction sample I(0)(i, j) and horizontal and vertical gradient values obtained based on the second prediction sample I(1)(i, j). The one or more processors may be configured to apply BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j). The one or more processors may be configured to obtain bi-prediction samples of the video block based on the motion refinements.
According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium having instructions stored therein is provided. When executed by one or more processors of an apparatus, the instructions may cause the apparatus to obtain a first reference picture I(0) and a second reference picture I(1) associated with a video block. In display order, the first reference picture I(0) may precede the current picture and the second reference picture I(1) may follow the current picture. The instructions may cause the apparatus to obtain a first prediction sample I(0)(i, j) of the video block from a reference block in the first reference picture I(0). i and j may represent the coordinates of a sample within the current picture. The instructions may cause the apparatus to obtain a second prediction sample I(1)(i, j) of the video block from a reference block in the second reference picture I(1). The instructions may cause the apparatus to control the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters when the coding bit depth is greater than 12 bits. The BDOF may use internal BDOF parameters that include horizontal and vertical gradient values obtained based on the first prediction sample I(0)(i, j) and horizontal and vertical gradient values obtained based on the second prediction sample I(1)(i, j). The instructions may cause the apparatus to apply BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j). The instructions may cause the apparatus to obtain bi-prediction samples of the video block based on the motion refinements.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.
Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.
Fig. 3A is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3B is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3C is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3D is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3E is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 4 is a graphical illustration of a bi-directional optical flow (BDOF) model according to an example of the present disclosure.
Fig. 5 is a flowchart illustrating a bit depth control method of encoding and decoding a video signal according to an example of the present disclosure.
Fig. 6 is a flowchart illustrating a method for controlling an internal bit depth of a BDOF according to an example of the present disclosure.
Fig. 7 is a diagram illustrating a computing environment coupled with a user interface according to an example of the present disclosure.
Detailed Description
Reference will now be made in detail to example embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same reference numerals in different drawings represent the same or similar elements, unless otherwise specified. The implementations set forth in the following description of example embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects set forth in the claims below related to the present disclosure.
The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is intended to mean and include any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when" or "upon" or "in response to a determination", depending on the context.
The first version of the HEVC standard was finalized in October 2013, and offers approximately 50% bit rate savings, or equivalent perceptual quality, compared to the prior-generation video coding standard H.264/MPEG AVC. Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools over HEVC. On this basis, both VCEG and MPEG started the exploration of new coding technologies for future video coding standardization. A Joint Video Exploration Team (JVET) was formed by ITU-T VCEG and ISO/IEC MPEG in October 2015, and significant study of advanced technologies that could enable substantial enhancement of coding efficiency began. The JVET maintains a reference software called the Joint Exploration Model (JEM) by integrating several additional coding tools on top of the HEVC test model (HM).
In October 2017, a joint call for proposals (CfP) on video compression with capability beyond HEVC was issued by ITU-T and ISO/IEC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating a compression efficiency gain over HEVC of around 40%. Based on such evaluation results, the JVET launched a new project to develop the new-generation video coding standard named Versatile Video Coding (VVC). In the same month, a reference software codebase called the VVC Test Model (VTM) was established for demonstrating a reference implementation of the VVC standard.
Like HEVC, VVC is built on a block-based hybrid video codec framework. Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. In particular, fig. 1 shows a typical encoder 100. The encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block prediction value 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, loop filter 122, entropy coding 138, and bitstream 144.
The input video signal is processed block by block; the blocks are called coding units (CUs). In VTM-1.0, a CU can be up to 128 × 128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, in VVC one coding tree unit (CTU) is split into CUs based on quadtree, binary tree, or ternary tree structures to adapt to varying local characteristics. In addition, the concept of multiple partition unit types in HEVC is removed; that is, the separation of CU, prediction unit (PU), and transform unit (TU) no longer exists in VVC. Instead, each CU is always used as the basic unit for both prediction and transform without further partitioning. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Then, each quadtree leaf node can be further partitioned by binary and ternary tree structures.
As shown in fig. 3A, 3B, 3C, 3D, and 3E, there are five split types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
Fig. 3A shows a diagram illustrating block quaternary partitioning in a multi-type tree structure according to the present disclosure.
FIG. 3B shows a diagram illustrating block vertical binary partitioning in a multi-type tree structure according to the present disclosure.
FIG. 3C shows a diagram illustrating block horizontal binary partitioning in a multi-type tree structure according to the present disclosure.
FIG. 3D illustrates a diagram illustrating vertical ternary partitioning of blocks in a multi-type tree structure according to the present disclosure.
FIG. 3E shows a diagram illustrating block horizontal ternary partitioning in a multi-type tree structure according to the present disclosure.
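As a concrete illustration of the five split types above, the following sketch shows how each split divides a block of size width × height into sub-block rectangles. The function and split-type names are ours for illustration, not identifiers from the standard:

```python
# Illustrative sketch (not VVC code): sub-block shapes produced by the five
# multi-type tree split types.
def split_block(width, height, split_type):
    """Return the list of (width, height) sub-blocks for a given split type."""
    if split_type == "QUAD":          # quaternary split: four equal quadrants
        return [(width // 2, height // 2)] * 4
    if split_type == "VERT_BINARY":   # vertical binary: two halves side by side
        return [(width // 2, height)] * 2
    if split_type == "HORZ_BINARY":   # horizontal binary: two stacked halves
        return [(width, height // 2)] * 2
    if split_type == "VERT_TERNARY":  # vertical ternary: 1/4, 1/2, 1/4 columns
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if split_type == "HORZ_TERNARY":  # horizontal ternary: 1/4, 1/2, 1/4 rows
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(split_type)
```

Note that every split preserves the total block area, which is why each leaf CU can serve directly as the prediction and transform unit.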
In fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples (referred to as reference samples) of already-coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as "inter prediction" or "motion-compensated prediction") uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent to identify from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision block in the encoder chooses the best prediction mode, for example based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using a transform and then quantized. The quantized residual coefficients are inverse-quantized and inverse-transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further, in-loop filtering, such as a deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture store and used to code future video blocks.
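The predict/residual/quantize/reconstruct loop described above can be sketched as follows. This is a toy illustration only: the transform is omitted and a uniform quantizer stands in for the real design, so none of this is VVC code:

```python
# Toy sketch of the hybrid coding loop: predict -> residual -> quantize ->
# inverse-quantize -> reconstruct, over a flat list of samples.
def encode_block(samples, prediction, qstep):
    # Residual: current block minus its (spatial or temporal) prediction.
    residual = [s - p for s, p in zip(samples, prediction)]
    # Quantization (the de-correlating transform is omitted in this toy).
    levels = [round(r / qstep) for r in residual]
    # Decoder-side reconstruction: inverse-quantize, then add the prediction
    # back; this reconstruction is what future blocks will predict from.
    recon = [p + l * qstep for p, l in zip(prediction, levels)]
    return levels, recon
```

With a quantization step of 1 the reconstruction is lossless; larger steps trade reconstruction error for fewer bits, which is the rate-distortion trade-off the mode decision optimizes.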
To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy coding unit for further compression and packing to form the bitstream.
Fig. 2 presents a general block diagram of a block-based video decoder. In particular, fig. 2 shows a block diagram of a typical decoder 200. The decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.
In fig. 2, a video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (in the case of intra-coding) or a temporal prediction unit (in the case of inter-coding) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct a residual block. Then, the prediction block and the residual block are added. The reconstructed block may be further loop filtered and then stored in a reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
Bi-directional optical flow
Conventional bi-prediction in video coding is a simple combination of two temporal prediction blocks obtained from already-reconstructed reference pictures. However, due to the limitation of block-based motion compensation, there can still be remaining small motion observable between the samples of the two prediction blocks, which reduces the efficiency of motion-compensated prediction. To solve this problem, BDOF is applied in VVC to reduce the impact of such motion on every sample inside one block.
Specifically, when bi-prediction is used, BDOF is a sample-level motion refinement performed on top of block-based motion compensated prediction, as shown in fig. 4. Fig. 4 shows a diagram of a BDOF model according to the present disclosure.
When BDOF is applied, the motion refinement (vx, vy) of each 4×4 sub-block is calculated by minimizing the difference between the L0 and L1 prediction samples inside a 6×6 window Ω around the sub-block. Specifically, (vx, vy) is derived as
  vx = S1 > 0 ? clip3(−thBDOF, thBDOF, −((S3 · 2^3) >> ⌊log2(S1)⌋)) : 0
  vy = S5 > 0 ? clip3(−thBDOF, thBDOF, −((S6 · 2^3 − ((vx · S2,m) << 12 + vx · S2,s)/2) >> ⌊log2(S5)⌋)) : 0        (1)
wherein ⌊·⌋ is the floor function; clip3(min, max, x) is a function that clips a given value x inside the range [min, max]; the symbol >> represents a bitwise right shift operation; the symbol << represents a bitwise left shift operation; thBDOF is the motion refinement threshold used to prevent propagated errors due to irregular local motion, and is equal to 2^(13−BD), where BD is the bit depth of the input video. For example, the bit depth represents the number of bits used to define each pixel. In (1),

  S2,m = S2 >> 12,  S2,s = S2 & (2^12 − 1)
The values of S1, S2, S3, S5 and S6 are calculated as

  S1 = Σ_{(i,j)∈Ω} ψx(i,j)·ψx(i,j),  S3 = Σ_{(i,j)∈Ω} θ(i,j)·ψx(i,j)
  S2 = Σ_{(i,j)∈Ω} ψx(i,j)·ψy(i,j)
  S5 = Σ_{(i,j)∈Ω} ψy(i,j)·ψy(i,j),  S6 = Σ_{(i,j)∈Ω} θ(i,j)·ψy(i,j)        (2)
wherein

  ψx(i,j) = (∂I(1)/∂x(i,j) + ∂I(0)/∂x(i,j)) >> 3
  ψy(i,j) = (∂I(1)/∂y(i,j) + ∂I(0)/∂y(i,j)) >> 3
  θ(i,j) = (I(0)(i,j) >> 6) − (I(1)(i,j) >> 6)        (3)
wherein I(k)(i, j) is the sample value at coordinate (i, j) of the prediction signal in list k (k = 0, 1), which is generated at intermediate high precision (i.e., 16 bits);
∂I(k)/∂x(i, j) and ∂I(k)/∂y(i, j) are the horizontal and vertical gradients of the sample, which are obtained by directly calculating the difference between the two neighboring samples of the sample, that is,

  ∂I(k)/∂x(i,j) = (I(k)(i+1, j) − I(k)(i−1, j)) >> 4
  ∂I(k)/∂y(i,j) = (I(k)(i, j+1) − I(k)(i, j−1)) >> 4        (4)
Based on the motion refinement derived in (1), the final bidirectional prediction samples of the CU are computed by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical flow model, as shown in
  pred_BDOF(i,j) = (I(0)(i,j) + I(1)(i,j) + vx·(∂I(0)/∂x(i,j) − ∂I(1)/∂x(i,j))/2 + vy·(∂I(0)/∂y(i,j) − ∂I(1)/∂y(i,j))/2 + o_offset) >> shift        (5)
wherein shift and o_offset are the right shift value and the offset value that are applied to combine the L0 and L1 prediction signals for bi-prediction, equal to 15 − BD and 1 << (14 − BD) + 2·(1 << 13), respectively. Table 1 illustrates the specific bit widths of the intermediate parameters involved in the BDOF process. As shown in Table 1, the internal bit width of the whole BDOF process does not exceed 32 bits. In addition, the multiplication with the worst possible input happens at the product vx·S2,m in equation (1), where the input S2,m is 15 bits and vx is 4 bits. Therefore, a 15-bit multiplier is sufficient for BDOF.
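The refinement and final combination of equations (1) and (5) can be sketched as below, using thBDOF = 2^(13−BD) and the shift/offset values stated in the text, with BD = 10 assumed for the example. The sign conventions and exact shift placements are our interpretation of the image-only formulas:

```python
# Sketch of eq. (1) (motion refinement from the window sums) and eq. (5)
# (final bi-prediction sample), with BD = 10 assumed.
from math import floor, log2

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def motion_refinement(S1, S2, S3, S5, S6, bd=10):
    th = 2 ** (13 - bd)                      # motion refinement threshold
    vx = 0
    if S1 > 0:
        vx = clip3(-th, th, -((S3 << 3) >> floor(log2(S1))))
    vy = 0
    if S5 > 0:
        S2m, S2s = S2 >> 12, S2 & (2 ** 12 - 1)   # split of S2, per eq. (1)
        vy = clip3(-th, th,
                   -(((S6 << 3) - ((vx * S2m) << 12 + vx * S2s) // 2)
                     >> floor(log2(S5))))
    return vx, vy

def bdof_sample(i0, i1, vx, vy, gx0, gx1, gy0, gy1, bd=10):
    shift = 15 - bd                           # combining shift for bi-prediction
    offset = (1 << (14 - bd)) + 2 * (1 << 13)
    return (i0 + i1 + (vx * (gx0 - gx1)) // 2
            + (vy * (gy0 - gy1)) // 2 + offset) >> shift
```

Note how the clipping to ±thBDOF bounds the refinement, which is what keeps vx small enough (4 bits in Table 1) for a narrow multiplier.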
Table 1. Bit widths of the intermediate parameters of BDOF in VVC

[Table 1 is presented as an image in the original publication; its entries are not recoverable from this text extraction.]
Efficiency of bi-directional prediction
Although BDOF can improve the efficiency of bi-prediction, there is still an opportunity to further improve its design. In particular, the present disclosure identifies the following problems in how the existing BDOF design in VVC controls the bit width of intermediate parameters.
As shown in Table 1, the parameter θ(i, j) (i.e., the difference between the L0 prediction samples and the L1 prediction samples) and the parameters ψx(i, j) and ψy(i, j) (i.e., the sums of the horizontal/vertical L0 and L1 gradient values) are all represented with the same 11-bit width. While this approach can facilitate the overall control of the internal bit width of the BDOF, it is suboptimal in terms of the precision of the derived motion refinements. This is partly because the gradient values are calculated as the difference between neighboring prediction samples, as shown in equation (4). Due to the high-pass nature of this process, the reliability of the derived gradients is reduced in the presence of noise (e.g., noise captured in the original video and coding noise generated during the coding process). Therefore, it may not always be beneficial to represent the gradient values with a high bit width.
As shown in Table 1, the maximum bit width usage in the whole BDOF process occurs in the calculation of the vertical motion refinement vy in equation (1), where S6 (27 bits) is first left-shifted by 3 bits and then subtracted by ((vx · S2,m) << 12 + vx · S2,s)/2 (30 bits). Thus, the maximum bit width of the current design is equal to 31 bits. In practical hardware implementations, a coding process whose maximum internal bit width is greater than 16 bits is typically implemented with 32 bits. Therefore, the existing design does not fully utilize the effective dynamic range of a 32-bit implementation. This may result in unnecessary precision loss of the motion refinements derived by BDOF.
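The bit-width arithmetic quoted above can be double-checked mechanically. The helper below is illustrative only; the magnitudes (27-bit S6, 30-bit subtracted term) are the ones given in the text, not values computed by a real codec:

```python
# Illustrative check of the worst-case bit widths quoted in the text.
def signed_bits(v):
    """Two's-complement width of integer v, including the sign bit."""
    return v.bit_length() + 1 if v >= 0 else (-v - 1).bit_length() + 1

S6_MAX = (1 << 26) - 1        # largest magnitude of a 27-bit signed value
TERM_MAX = (1 << 29) - 1      # largest magnitude of the 30-bit subtracted term
SHIFTED = S6_MAX << 3         # S6 << 3 grows to 30 bits
WORST = SHIFTED + TERM_MAX    # worst-case |S6 << 3| + |term|: 31 bits
```

This reproduces the observation that the existing design peaks at 31 bits, one bit short of the 32-bit range a typical hardware implementation provides.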
Improving efficiency of bi-directional prediction using BDOF
In the present disclosure, an improved bit width control method is proposed to solve the two problems pointed out above for the bit width control of the existing BDOF design.
Fig. 5 illustrates a bit depth control method of encoding and decoding a video signal according to the present disclosure.
In step 510, a first reference picture I(0) and a second reference picture I(1) associated with a video block are obtained. In display order, the first reference picture I(0) precedes the current picture and the second reference picture I(1) follows the current picture. For example, a reference picture may be a video picture neighboring the current picture being encoded.
In step 512, a first prediction sample I(0)(i, j) of the video block is obtained from a reference block in the first reference picture I(0), where i and j represent the coordinates of a sample within the current picture. For example, the first prediction sample I(0)(i, j) may be a prediction sample, using a motion vector, in the L0 list of the reference picture that precedes the current picture in display order.
In step 514, a second prediction sample I(1)(i, j) of the video block is obtained from a reference block in the second reference picture I(1). For example, the second prediction sample I(1)(i, j) may be a prediction sample, using a motion vector, in the L1 list of the reference picture that follows the current picture in display order.
In step 516, when the coding bit depth is greater than 12 bits, the internal bit depth of the BDOF is controlled by applying a right shift to the internal BDOF parameters. For example, the BDOF uses internal BDOF parameters that include horizontal and vertical gradient values obtained based on the first prediction sample I(0)(i, j) and horizontal and vertical gradient values obtained based on the second prediction sample I(1)(i, j).
In step 518, BDOF is applied to the video block to obtain motion refinements of the samples in the video block based on the first prediction samples I(0)(i, j) and the second prediction samples I(1)(i, j).
In step 520, bi-predictive samples for the video block are obtained based on the motion refinement.
First, to overcome the negative effect of gradient estimation errors, in the proposed method an additional right shift n_grad is introduced into the calculation of the gradient values ∂I(k)/∂x(i, j) and ∂I(k)/∂y(i, j) in equation (4), in order to reduce the internal bit width of the gradient values. Specifically, the horizontal and vertical gradients at each sample position are calculated as

  ∂I(k)/∂x(i,j) = (I(k)(i+1, j) − I(k)(i−1, j)) >> (4 + n_grad)
  ∂I(k)/∂y(i,j) = (I(k)(i, j+1) − I(k)(i, j−1)) >> (4 + n_grad)        (6)
In addition, the additional bit shift n_adj is introduced into the calculation of the variables ψx(i, j), ψy(i, j) and θ(i, j), in order to control the whole BDOF process to operate at an appropriate internal bit width, as depicted below:

  ψx(i,j) = (∂I(1)/∂x(i,j) + ∂I(0)/∂x(i,j)) >> (3 − n_adj)
  ψy(i,j) = (∂I(1)/∂y(i,j) + ∂I(0)/∂y(i,j)) >> (3 − n_adj)
  θ(i,j) = (I(0)(i,j) >> (6 − n_adj)) − (I(1)(i,j) >> (6 − n_adj))        (7)
as will be seen in Table 2, the parameter ψ is due to the modification applied to the number of bits shifted to the right in equations (6) and (7) x (i,j)、ψ y The dynamic ranges of (i, j) and θ (i, j) will be different, in contrast to the prior BDOF design shown in Table 1, where all three parameters are represented by the same dynamic range (i.e., 21 bits). Such variations may increase the internal parameter S 1 、S 2 、S 3 、S 5 And S 6 Which may increase the maximum bit width of the internal BDOF process to above 32 bits. Therefore, to ensure a 32-bit implementation, two additional clipping operations are introduced to the pair S 2 And S 6 In the calculation of the value of (c). In particular, in the proposed method, the values of these two parameters are calculated as:
  S2 = clip3(−2^B2, 2^B2 − 1, Σ_{(i,j)∈Ω} ψx(i,j)·ψy(i,j))
  S6 = clip3(−2^B6, 2^B6 − 1, Σ_{(i,j)∈Ω} θ(i,j)·ψy(i,j))        (8)
wherein B2 and B6 are the parameters used to control the output dynamic ranges of S2 and S6, respectively. It should be noted that, unlike the gradient calculation, in equation (8) the clipping operations are only applied once to calculate the motion refinement of each 4×4 sub-block inside one BDOF CU, i.e., they are invoked on a 4×4-unit basis. The corresponding complexity increase due to the clipping operations introduced in the proposed method is therefore quite negligible.
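A sketch of the proposed modifications of equations (6)-(8), using the first embodiment's parameters (n_grad = 2, n_adj = 2, B2 = 25, B6 = 27). The placement of the adjusted shifts is our interpretation of the image-only formulas:

```python
# Sketch of the proposed BDOF bit-width control: extra gradient shift n_grad,
# adjusted psi/theta shifts via n_adj, and clipping of S2/S6 to B2/B6 bits.
N_GRAD, N_ADJ, B2, B6 = 2, 2, 25, 27

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def proposed_sums(I0, I1, window):
    S1 = S2 = S3 = S5 = S6 = 0
    for (i, j) in window:
        # Additional right shift n_grad on the gradients, per eq. (6).
        gx0 = (I0[j][i + 1] - I0[j][i - 1]) >> (4 + N_GRAD)
        gx1 = (I1[j][i + 1] - I1[j][i - 1]) >> (4 + N_GRAD)
        gy0 = (I0[j + 1][i] - I0[j - 1][i]) >> (4 + N_GRAD)
        gy1 = (I1[j + 1][i] - I1[j - 1][i]) >> (4 + N_GRAD)
        # Adjusted shifts on psi and theta, per eq. (7).
        psi_x = (gx1 + gx0) >> (3 - N_ADJ)
        psi_y = (gy1 + gy0) >> (3 - N_ADJ)
        theta = (I0[j][i] >> (6 - N_ADJ)) - (I1[j][i] >> (6 - N_ADJ))
        S1 += psi_x * psi_x
        S2 += psi_x * psi_y
        S3 += theta * psi_x
        S5 += psi_y * psi_y
        S6 += theta * psi_y
    # Eq. (8): clip S2 and S6 so the downstream vy derivation stays in 32 bits.
    S2 = clip3(-(1 << B2), (1 << B2) - 1, S2)
    S6 = clip3(-(1 << B6), (1 << B6) - 1, S6)
    return S1, S2, S3, S5, S6
```

Since the clipping runs once per 4×4 sub-block rather than once per sample, its cost is marginal compared to the per-sample gradient work.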
In practice, different values of n_grad, n_adj, B2 and B6 may be applied to achieve different trade-offs between the intermediate bit width and the precision of the internal BDOF derivation. As one embodiment of the present disclosure, it is proposed to set n_grad and n_adj to 2, B2 to 25, and B6 to 27. As another embodiment of the present disclosure, it is proposed to set n_grad to 1, n_adj to 4, B2 to 26, and B6 to 28.
Table 2 shows the corresponding bit width of each intermediate parameter when the proposed bit-width control method is applied to the BDOF. In Table 2, grey highlighting marks the changes applied in the proposed bit-width control method compared to the existing BDOF design in VVC. As shown in Table 2, with the proposed bit-width control method the internal bit width of the entire BDOF process does not exceed 32 bits. In addition, with the proposed design, the maximum bit width is exactly 32 bits, so the available dynamic range of a 32-bit hardware implementation can be fully exploited. On the other hand, as shown in Table 2, the multiplication with the worst-case inputs occurs at the product v_x·S_2,m, where the input S_2,m is 14 bits and the input v_x is 6 bits. Thus, as with the existing BDOF design, a 16-bit multiplier is also sufficient when applying the proposed method.
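The worst-case multiplier claim above can be checked with a short sanity calculation. The bit-width helper below is an illustration written for this note, not part of the disclosure; it counts magnitude bits plus one sign bit for non-negative values.

```python
def signed_bit_width(v):
    """Bits needed to hold non-negative v as a signed value: magnitude + sign bit."""
    return v.bit_length() + 1

# Worst-case magnitudes for a 6-bit signed v_x and a 14-bit signed S_2,m.
vx_max = (1 << 5) - 1    # 31
s2m_max = (1 << 13) - 1  # 8191
product = vx_max * s2m_max

# Both multiplier inputs fit in 16 bits, and the product fits in 19 bits,
# comfortably inside a 32-bit accumulator.
```

This mirrors the text's conclusion: a 16-bit multiplier covers the 6-bit × 14-bit worst case with ample margin.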
Table 2: Bit widths of the intermediate parameters of the proposed method
[Table 2 image: bit widths of the intermediate BDOF parameters under the proposed method]
Fig. 6 illustrates an example method for controlling the internal bit depth of a BDOF according to this disclosure.
In step 610, a first horizontal gradient value of the first prediction sample I^(0)(i,j) is obtained based on the first prediction samples I^(0)(i+1,j) and I^(0)(i-1,j).
In step 612, a second horizontal gradient value of the second prediction sample I^(1)(i,j) is obtained based on the second prediction samples I^(1)(i+1,j) and I^(1)(i-1,j).
In step 614, a first vertical gradient value of the first prediction sample I^(0)(i,j) is obtained based on the first prediction samples I^(0)(i,j+1) and I^(0)(i,j-1).
In step 616, a second vertical gradient value of the second prediction sample I^(1)(i,j) is obtained based on the second prediction samples I^(1)(i,j+1) and I^(1)(i,j-1).
In step 618, the first horizontal gradient value and the second horizontal gradient value are right-shifted by a first shift value.
In step 620, the first vertical gradient value and the second vertical gradient value are right-shifted by the first shift value.
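Steps 610 through 620 can be sketched as below. This is a simplified scalar sketch under stated assumptions: the shift is applied to the neighbour difference (matching the step order above), and the first shift value equals the coded bit depth minus 6 as recited in claim 1; the toy 3×3 block and sample values are illustrative only.

```python
def horizontal_gradient(pred, i, j, shift):
    """Difference of the horizontal neighbours, right-shifted by `shift`."""
    return (pred[i + 1][j] - pred[i - 1][j]) >> shift

def vertical_gradient(pred, i, j, shift):
    """Difference of the vertical neighbours, right-shifted by `shift`."""
    return (pred[i][j + 1] - pred[i][j - 1]) >> shift

bit_depth = 10
shift1 = bit_depth - 6  # first shift value per claim 1: coded bit depth minus 6

# Toy 3x3 prediction block; gradients are taken at the centre sample (1, 1).
pred0 = [[100, 120, 140],
         [110, 130, 150],
         [120, 140, 160]]
gx = horizontal_gradient(pred0, 1, 1, shift1)  # (140 - 120) >> 4
gy = vertical_gradient(pred0, 1, 1, shift1)    # (150 - 110) >> 4
```

The same two helpers apply unchanged to the I^(1) prediction block, giving the second horizontal and vertical gradient values of steps 612 and 616.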
In the above method, a clipping operation as in equation (8) is added to avoid overflow of the intermediate parameters when deriving v_x and v_y. However, such clipping is only required when the relevant parameters are accumulated over a large local window. When a small window is applied, overflow may not occur. Therefore, in another embodiment of the present disclosure, the following bit-depth control method is proposed for a BDOF method that does not require clipping, as described below.
First, at each sample position, the gradient values ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) in equation (4) are calculated as:
[Equation image: gradient calculation]
Second, the relevant parameters ψ_x(i,j), ψ_y(i,j), and θ(i,j) for the BDOF process are calculated as follows:
[Equation image: calculation of ψ_x(i,j), ψ_y(i,j), and θ(i,j)]
Third, S_1, S_2, S_3, S_5, and S_6 are calculated as:
[Equation image: calculation of S_1, S_2, S_3, S_5, and S_6]
Fourth, the motion refinement (v_x, v_y) for each 4×4 sub-block is derived as:
[Equation image: derivation of (v_x, v_y)]
Fifth, based on the optical-flow model, the final bi-directional prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory, as shown in the following equation:
[Equation image: final bi-directional prediction sample calculation]
The above-described BDOF bit-width control method is based on the assumption that the internal bit depth used to code the video does not exceed 12 bits, so that the precision of the output signal from motion compensation (MC) is 14 bits. In other words, when the internal bit depth is greater than 12 bits, the BDOF bit-width control methods specified in equations (9) to (13) cannot guarantee that all bit depths of the internal BDOF operations remain within 32 bits. To solve this overflow problem at high internal bit depths, an improved BDOF bit-depth control method is disclosed below, which introduces an additional bit-depth-dependent right-shift stage applied after the MC stage. In this method, when the internal bit depth is greater than 12 bits, the output signal of the MC is always shifted down to 14 bits, so that the existing BDOF bit-depth control method, designed for internal bit depths of 8 to 12 bits, can be reused for the BDOF process of high-bit-depth video. Specifically, assuming bitdepth is the internal bit depth, the proposed method can be implemented as follows:
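The extra normalization stage described above can be sketched as follows. The function name is invented for this sketch, and the exact shift amount (bitdepth − 12) is an assumption inferred from the text's statement that the MC output is brought back to 14-bit precision when the internal bit depth exceeds 12 bits.

```python
def normalize_mc_output(mc_sample, bit_depth):
    """Shift the motion-compensation output down to 14-bit precision when the
    internal bit depth exceeds 12 bits, so that the existing 8- to 12-bit BDOF
    bit-depth control can be reused unchanged (sketch, assumed shift amount)."""
    extra = max(0, bit_depth - 12)
    return mc_sample >> extra

# 12-bit content: MC output is already at 14-bit precision, so no shift.
# 14-bit content: two extra bits are shifted out after the MC stage.
```

With this normalization in place, every downstream BDOF equation sees the same 14-bit input range regardless of the coded bit depth.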
First, at each sample position, the gradient values ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j) in equation (4) are calculated as:
[Equation image: gradient calculation for the high-bit-depth method]
Second, the relevant parameters ψ_x(i,j), ψ_y(i,j), and θ(i,j) for the BDOF process are calculated as follows:
[Equation image: calculation of ψ_x(i,j), ψ_y(i,j), and θ(i,j)]
Third, S_1, S_2, S_3, S_5, and S_6 are calculated as:
[Equation image: calculation of S_1, S_2, S_3, S_5, and S_6]
Fourth, the motion refinement (v_x, v_y) for each 4×4 sub-block is derived as:
[Equation image: derivation of (v_x, v_y) with threshold clipping]
where th_BDOF is the motion refinement threshold, which is calculated as 1 << max(5, bitdepth − 7) based on the internal bit depth. In another example, th_BDOF can be calculated as 1 << (bitdepth − 7) based on the internal bit depth. In other words, to control the dynamic range of the BDOF motion refinement, the motion refinement threshold is determined as 2 to the power of the coded bit depth minus 7.
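The threshold derivation above, together with the clipping it controls, can be sketched as follows. The threshold formula 1 << max(5, bitdepth − 7) is taken from the text; the symmetric clip helper is an assumption about the clip convention.

```python
def motion_refinement_threshold(bit_depth):
    """th_BDOF = 1 << max(5, bit_depth - 7), per the text."""
    return 1 << max(5, bit_depth - 7)

def clip_refinement(v, th):
    """Clip a motion refinement component to [-th, th] (assumed convention)."""
    return max(-th, min(th, v))

th = motion_refinement_threshold(10)  # 1 << 5 = 32 for 10-bit content
vx = clip_refinement(100, th)         # clipped down to the threshold
vy = clip_refinement(-7, th)          # already in range, unchanged
```

Note how the max(5, ...) floor keeps the threshold at 32 for the common 8- to 12-bit depths, while higher depths get a proportionally larger dynamic range.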
FIG. 7 illustrates a computing environment 710 coupled with a user interface 760. The computing environment 710 may be part of a data processing server. The computing environment 710 includes a processor 720, a memory 740, and an I/O interface 750.
The processor 720 generally controls the overall operation of the computing environment 710, such as operations associated with display, data acquisition, data communication, and image processing. The processor 720 may include one or more processors to execute instructions to perform all or some of the steps of the above-described methods. Further, the processor 720 may include one or more modules that facilitate interaction between the processor 720 and other components. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, a GPU, etc.
The memory 740 is configured to store various types of data to support the operation of the computing environment 710. The memory 740 may include predetermined software 742. Examples of such data include instructions for any application or method operating on the computing environment 710, video data sets, image data, and so forth. The memory 740 may be implemented using any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
I/O interface 750 provides an interface between processor 720 and peripheral interface modules such as a keyboard, click wheel, buttons, etc. The buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 750 may be coupled to an encoder and a decoder.
In an embodiment, a non-transitory computer readable storage medium is also provided, comprising a plurality of programs, such as embodied in the memory 740, executable by the processor 720 in the computing environment 710 for performing the above-described methods. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The non-transitory computer readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the motion prediction method described above.
In embodiments, the computing environment 710 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the precise forms disclosed. Many modifications, variations, and alternative embodiments will become apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The embodiments were chosen and described in order to explain the principles of the disclosure and to enable others of ordinary skill in the art to understand the disclosure in its various embodiments, with the best mode of practicing the disclosure, and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not limited to the specific examples of the embodiments disclosed, and that modifications and other embodiments are intended to be included within its scope.

Claims (21)

1. A bi-directional optical flow (BDOF) bit-depth control method for coding and decoding a video signal, the bit-depth control method comprising:
obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;
obtaining first prediction samples I^(0)(i,j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample within the current picture;
obtaining second prediction samples I^(1)(i,j) of the video block from a reference block in the second reference picture I^(1);
controlling an internal bit depth of a BDOF by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal and vertical gradient values obtained based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j), and wherein controlling the internal bit depth of the BDOF by applying the right shift to the internal BDOF parameters comprises:
obtaining a first horizontal gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i+1,j) and I^(0)(i-1,j);
obtaining a second horizontal gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i+1,j) and I^(1)(i-1,j);
obtaining a first vertical gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i,j+1) and I^(0)(i,j-1);
obtaining a second vertical gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i,j+1) and I^(1)(i,j-1);
right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value; and
right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value,
wherein the first shift value is equal to the coded bit depth minus 6;
applying the BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j); and
obtaining bi-directional prediction samples of the video block based on the motion refinements.
2. The method of claim 1, further comprising:
obtaining a first correlation value, wherein the first correlation value is the sum of a horizontal gradient value based on the first prediction sample I^(0)(i,j) and a horizontal gradient value based on the second prediction sample I^(1)(i,j);
obtaining a second correlation value, wherein the second correlation value is the sum of a vertical gradient value based on the first prediction sample I^(0)(i,j) and a vertical gradient value based on the second prediction sample I^(1)(i,j);
modifying the first correlation value by right-shifting the first correlation value using a second shift value; and
modifying the second correlation value by right-shifting the second correlation value using the second shift value.
3. The method of claim 2, wherein the second shift value is equal to the coded bit depth minus 11.
4. The method of claim 2, further comprising:
obtaining a first modified prediction sample by right-shifting the first prediction sample I^(0)(i,j) using a third shift value;
obtaining a second modified prediction sample by right-shifting the second prediction sample I^(1)(i,j) using the third shift value; and
obtaining a third correlation value, wherein the third correlation value is the difference between the first modified prediction sample and the second modified prediction sample.
5. The method of claim 4, wherein the third shift value is equal to the coded bit depth minus 8.
6. The method of claim 4, further comprising:
obtaining a first internal sum value based on a sum of squares of the first correlation value for each sample point within each 4 x 4 sub-block of the video block;
obtaining a second internal summation value based on the sum of the products of the first correlation value and the second correlation value of each sample point in each 4 × 4 sub-block of the video block;
obtaining a third internal summation value based on the sum of products of the first correlation value and the third correlation value of each sample point in each 4 x 4 sub-block of the video block;
obtaining a fourth internal sum value based on a sum of squares of the second correlation values for each sample point within each 4 x 4 sub-block of the video block;
obtaining a fifth internal sum value based on the sum of products of the second correlation value and the third correlation value of each sample point in each 4 × 4 sub-block of the video block;
obtaining a horizontal motion refinement value based on a quotient of the third internal summation value and the first internal summation value, wherein the motion refinement value comprises the horizontal motion refinement value;
obtaining a vertical motion refinement value based on the second, fourth, fifth, and horizontal motion refinement values, wherein the motion refinement value comprises the vertical motion refinement value; and
Clipping the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
7. The method of claim 6, wherein the motion refinement threshold is determined as 2 to the power of the coded bit depth minus 7.
8. A computing device, comprising:
one or more processors;
a non-transitory computer-readable storage medium storing instructions executable by the one or more processors, wherein the one or more processors are configured to:
obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;
obtaining first prediction samples I^(0)(i,j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample within the current picture;
obtaining second prediction samples I^(1)(i,j) of the video block from a reference block in the second reference picture I^(1);
controlling an internal bit depth of a bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal and vertical gradient values obtained based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j), and wherein controlling the internal bit depth of the BDOF by applying the right shift to the internal BDOF parameters comprises:
obtaining a first horizontal gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i+1,j) and I^(0)(i-1,j);
obtaining a second horizontal gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i+1,j) and I^(1)(i-1,j);
obtaining a first vertical gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i,j+1) and I^(0)(i,j-1);
obtaining a second vertical gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i,j+1) and I^(1)(i,j-1);
right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value; and
right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value,
wherein the first shift value is equal to the coded bit depth minus 6;
applying the BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j); and
obtaining bi-directional prediction samples of the video block based on the motion refinements.
9. The computing device of claim 8, wherein the one or more processors are further configured to:
obtaining a first correlation value, wherein the first correlation value is the sum of a horizontal gradient value based on the first prediction sample I^(0)(i,j) and a horizontal gradient value based on the second prediction sample I^(1)(i,j);
obtaining a second correlation value, wherein the second correlation value is the sum of a vertical gradient value based on the first prediction sample I^(0)(i,j) and a vertical gradient value based on the second prediction sample I^(1)(i,j);
modifying the first correlation value by right-shifting the first correlation value using a second shift value; and
modifying the second correlation value by right-shifting the second correlation value using the second shift value.
10. The computing device of claim 9, wherein the second shift value is equal to the coded bit depth minus 11.
11. The computing device of claim 9, wherein the one or more processors are further configured to:
obtaining a first modified prediction sample by right-shifting the first prediction sample I^(0)(i,j) using a third shift value;
obtaining a second modified prediction sample by right-shifting the second prediction sample I^(1)(i,j) using the third shift value; and
obtaining a third correlation value, wherein the third correlation value is the difference between the first modified prediction sample and the second modified prediction sample.
12. The computing device of claim 11, wherein the third shift value is equal to the coded bit depth minus 8.
13. The computing device of claim 11, wherein the one or more processors are further configured to:
obtaining a first internal sum value based on a sum of squares of the first correlation value for each sample point within each 4 x 4 sub-block of the video block;
obtaining a second internal summation value based on the sum of products of the first correlation value and the second correlation value of each sample point in each 4 x 4 sub-block of the video block;
obtaining a third internal summation value based on the sum of products of the first correlation value and the third correlation value of each sample point in each 4 x 4 sub-block of the video block;
obtaining a fourth internal sum value based on a sum of squares of the second correlation values for each sample point within each 4 x 4 sub-block of the video block;
Obtaining a fifth internal sum value based on the sum of products of the second correlation value and the third correlation value of each sample point in each 4 × 4 sub-block of the video block;
obtaining a horizontal motion refinement value based on a quotient of the third internal summation value and the first internal summation value, wherein the motion refinement value comprises the horizontal motion refinement value;
obtaining a vertical motion refinement value based on the second, fourth, fifth, and horizontal motion refinement values, wherein the motion refinement value comprises the vertical motion refinement value; and
clipping the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
14. The computing device of claim 13, wherein the motion refinement threshold is determined as 2 to the power of the coded bit depth minus 7.
15. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to:
obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block, wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;
obtaining first prediction samples I^(0)(i,j) of the video block from a reference block in the first reference picture I^(0), wherein i and j represent the coordinates of a sample within the current picture;
obtaining second prediction samples I^(1)(i,j) of the video block from a reference block in the second reference picture I^(1);
controlling an internal bit depth of a bi-directional optical flow (BDOF) by applying a right shift to internal BDOF parameters when a coded bit depth is greater than 12 bits, wherein the internal BDOF parameters comprise horizontal and vertical gradient values obtained based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j), and wherein controlling the internal bit depth of the BDOF by applying the right shift to the internal BDOF parameters comprises:
obtaining a first horizontal gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i+1,j) and I^(0)(i-1,j);
obtaining a second horizontal gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i+1,j) and I^(1)(i-1,j);
obtaining a first vertical gradient value of the first prediction sample I^(0)(i,j) based on the first prediction samples I^(0)(i,j+1) and I^(0)(i,j-1);
obtaining a second vertical gradient value of the second prediction sample I^(1)(i,j) based on the second prediction samples I^(1)(i,j+1) and I^(1)(i,j-1);
right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value; and
right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value,
wherein the first shift value is equal to the coded bit depth minus 6;
applying the BDOF to the video block to obtain motion refinements of samples in the video block based on the first prediction samples I^(0)(i,j) and the second prediction samples I^(1)(i,j); and
obtaining bi-directional prediction samples of the video block based on the motion refinements.
16. The non-transitory computer readable storage medium of claim 15, wherein the plurality of programs further cause the computing device to:
obtaining a first correlation value, wherein the first correlation value is the sum of a horizontal gradient value based on the first prediction sample I^(0)(i,j) and a horizontal gradient value based on the second prediction sample I^(1)(i,j);
obtaining a second correlation value, wherein the second correlation value is the sum of a vertical gradient value based on the first prediction sample I^(0)(i,j) and a vertical gradient value based on the second prediction sample I^(1)(i,j);
modifying the first correlation value by right-shifting the first correlation value using a second shift value; and
modifying the second correlation value by right-shifting the second correlation value using the second shift value.
17. The non-transitory computer-readable storage medium of claim 16, wherein the second shift value is equal to the coded bit depth minus 11.
18. The non-transitory computer readable storage medium of claim 16, wherein the plurality of programs further cause the computing device to:
obtaining a first modified prediction sample by right-shifting the first prediction sample I^(0)(i,j) using a third shift value;
obtaining a second modified prediction sample by right-shifting the second prediction sample I^(1)(i,j) using the third shift value; and
obtaining a third correlation value, wherein the third correlation value is the difference between the first modified prediction sample and the second modified prediction sample.
19. The non-transitory computer-readable storage medium of claim 18, wherein the third shift value is equal to the coded bit depth minus 8.
20. The non-transitory computer readable storage medium of claim 18, wherein the plurality of programs further cause the computing device to:
obtaining a first internal sum value based on a sum of squares of the first correlation value for each sample point within each 4 x 4 sub-block of the video block;
obtaining a second internal summation value based on the sum of products of the first correlation value and the second correlation value of each sample point in each 4 x 4 sub-block of the video block;
obtaining a third internal summation value based on the sum of products of the first correlation value and the third correlation value of each sample point in each 4 x 4 sub-block of the video block;
obtaining a fourth internal sum value based on a sum of squares of the second correlation values for each sample point within each 4 x 4 sub-block of the video block;
obtaining a fifth internal summation value based on the sum of the products of the second correlation value and the third correlation value of each sample point in each 4 × 4 sub-block of the video block;
Obtaining a horizontal motion refinement value based on a quotient of the third internal summation value and the first internal summation value, wherein the motion refinement value comprises the horizontal motion refinement value;
obtaining a vertical motion refinement value based on the second, fourth, fifth, and horizontal motion refinement values, wherein the motion refinement value comprises the vertical motion refinement value; and
clipping the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
21. The non-transitory computer readable storage medium of claim 20, wherein the motion refinement threshold is determined as 2 to the power of the coded bit depth minus 7.
CN202080024193.8A 2019-03-26 2020-03-26 Method and apparatus for bit depth control of bi-directional optical flow Active CN113615197B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962823951P 2019-03-26 2019-03-26
US62/823,951 2019-03-26
PCT/US2020/025084 WO2020198543A1 (en) 2019-03-26 2020-03-26 Methods and devices for bit-depth control for bi-directional optical flow

Publications (2)

Publication Number Publication Date
CN113615197A CN113615197A (en) 2021-11-05
CN113615197B true CN113615197B (en) 2023-03-07

Family

ID=72609143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080024193.8A Active CN113615197B (en) 2019-03-26 2020-03-26 Method and apparatus for bit depth control of bi-directional optical flow

Country Status (2)

Country Link
CN (1) CN113615197B (en)
WO (1) WO2020198543A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108541375A (en) * 2016-02-03 2018-09-14 夏普株式会社 Moving image decoding apparatus, dynamic image encoding device and prognostic chart picture generating means
WO2018230493A1 (en) * 2017-06-14 2018-12-20 シャープ株式会社 Video decoding device, video encoding device, prediction image generation device and motion vector derivation device
CN113228682A (en) * 2018-12-27 2021-08-06 夏普株式会社 Predictive image generation device, moving image decoding device, moving image encoding device, and predictive image generation method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10523964B2 (en) * 2017-03-13 2019-12-31 Qualcomm Incorporated Inter prediction refinement based on bi-directional optical flow (BIO)
EP3586513A4 (en) * 2017-03-16 2020-12-09 MediaTek Inc Method and apparatus of motion refinement based on bi-directional optical flow for video coding

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN108541375A (en) * 2016-02-03 2018-09-14 夏普株式会社 Moving image decoding apparatus, dynamic image encoding device and prognostic chart picture generating means
WO2018230493A1 (en) * 2017-06-14 2018-12-20 シャープ株式会社 Video decoding device, video encoding device, prediction image generation device and motion vector derivation device
CN113228682A (en) * 2018-12-27 2021-08-06 夏普株式会社 Predictive image generation device, moving image decoding device, moving image encoding device, and predictive image generation method

Non-Patent Citations (4)

Title
Xiaoyu Xiu, Yuwen He, Yan Ye, "CE9-related: Complexity reduction and bit-width control for bi-directional optical flow (BIO)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-L0256, 2018-09-29, entire document *
Xiaoyu Xiu, Yi-Wen Chen, Tsung-Chuan, "CE9-related: Improvements on bi-directional optical flow (BDOF)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-N0325, 2019-03-13, entire document *
Takeshi Chujoh, "Non-CE9: An improvement of BDOF", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-M0063, 2018-12-31, entire document *
Ruoyang Yu, "Non-CE9: On motion refinement parameter derivation in BDOF", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JVET-N0152, 2019-03-12, entire document *

Also Published As

Publication number Publication date
WO2020198543A1 (en) 2020-10-01
CN113615197A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
JP7355894B2 (en) Bit width control method and device for bidirectional optical flow
JP2023030062A (en) Bit-width control for bi-directional optical flow
CN114402619A (en) Method and apparatus for optical flow Prediction Refinement (PROF)
CN114125441B (en) Bidirectional optical flow method for decoding video signal, computing device and storage medium
EP3963887A1 (en) Methods and apparatus of prediction refinement with optical flow
CN113545086A (en) Bi-directional optical flow and decoder-side motion vector refinement for video coding and decoding
CN113615197B (en) Method and apparatus for bit depth control of bi-directional optical flow
CN114175659A (en) Apparatus and method for bit width control of bi-directional optical flow
CN117014615B (en) Video encoding method, apparatus, and non-transitory computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant