CN114175659A - Apparatus and method for bit width control of bi-directional optical flow - Google Patents

Apparatus and method for bit width control of bi-directional optical flow Download PDF

Info

Publication number
CN114175659A
CN114175659A CN202080045432.8A CN202080045432A CN114175659A CN 114175659 A CN114175659 A CN 114175659A CN 202080045432 A CN202080045432 A CN 202080045432A CN 114175659 A CN114175659 A CN 114175659A
Authority
CN
China
Prior art keywords
value
decoder
predicted sample
predicted
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080045432.8A
Other languages
Chinese (zh)
Inventor
修晓宇
陈漪纹
王祥林
马宗全
朱弘正
叶水明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Publication of CN114175659A publication Critical patent/CN114175659A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/573Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods, apparatuses, and non-transitory computer-readable storage media for decoding a video signal are provided. The method comprises the following steps: obtaining and videoBlock associated first reference picture I(0)And a second reference picture I(1)From the first reference picture I(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(I, j) from the second reference picture I(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(I, j) controlling an internal bit depth of the BDOF by applying a right shift to internal BDOF parameters, based on which(0)(I, j) and a second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples for the video block.

Description

Apparatus and method for bit width control of bi-directional optical flow
Cross Reference to Related Applications
This application is based on and claims priority from provisional application No. 62/866,607 filed on 25.6.2019 and provisional application No. 62/867,185 filed on 26.6.2019, the entire contents of which are incorporated herein by reference in their entirety for all purposes.
Technical Field
The present disclosure relates to video coding and compression. More particularly, the present disclosure relates to methods and apparatus for a bi-directional optical flow (BDOF) method for video coding and decoding.
Background
Various video codec techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video codec standards include general video codec (VVC), joint exploration test model (JEM), high efficiency video codec (h.265/HEVC), advanced video codec (h.264/AVC), Moving Picture Experts Group (MPEG) codec, and so forth. Video coding typically uses prediction methods that exploit redundancy present in video images or sequences (e.g., inter-prediction, intra-prediction, etc.). An important goal of video codec technology is to compress video data into a form using a lower bit rate while avoiding or minimizing video quality degradation.
Disclosure of Invention
Examples of the present disclosure provide methods and apparatus for motion vector prediction in video coding.
According to a first aspect of the present disclosure, there is provided a method of decoding a video signal. The method may include obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1). First reference picture I in display order(0)Preceding the current picture and a second reference picture I(1)After the current picture. The method may also include, at a decoder, decoding from a first reference picture I(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j). i and j may represent the coordinates of a sample point within the current picture. The method may include, at a decoder, decoding from a second reference picture I(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i, j). The method may include controlling, at a decoder, an inner bit depth of the BDOF by applying a right shift to an inner BDOF parameter. The BDOF is independent of input video bit depth. The internal BDOF parameter may include a first predicted sample point I based(0)(I, j) deriving horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) sample difference between (i, j). The method may compriseIncluding at the decoder, based on the BDOF, according to a first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples for the video block.
According to a second aspect of the present disclosure, there is provided a method of decoding a video signal. The method may include obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1). First reference picture I in display order(0)Preceding the current picture and a second reference picture I(1)After the current picture. The method may also include, at a decoder, decoding from a first reference picture I(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j). i and j represent the coordinates of a sample point within the current picture. The method may include, at a decoder, decoding from a second reference picture I(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i, j). The method may further include controlling, at a decoder and when an internal bit depth is greater than 12 bits, the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters to align the precision of an output prediction signal to a constant. The internal BDOF parameter includes a first predicted sample point I(0)(I, j) deriving horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) sample difference between (i, j). The method may comprise, at a decoder, based on the BDOF, predicting samples I from a first prediction sample I(0)(I, j) and a second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples of the video block. The method may also include obtaining, at a decoder, the output prediction signal based on the final bi-directionally predicted samples.
According to a third aspect of the present disclosure, a computing device for decoding a video signal is provided. The computing device may include one or more processors, and the storage may be oneOr non-transitory computer readable memory of instructions executed by more processors. The one or more processors may be configured to: obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1). First reference picture I in display order(0)Preceding the current picture and a second reference picture I(1)After the current picture, the one or more processors may be further configured to, at a decoder, derive a reference picture from a first reference picture I(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j). i and j represent the coordinates of a sample point within the current picture. The one or more processors may be configured to derive a reference picture from a second reference picture at a decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i, j). The one or more processors may be further configured to control, at the decoder, an internal bit depth of a bi-directional optical flow (BDOF) by applying a right shift to the BDOF parameters. The BDOF is independent of input video bit depth. The internal BDOF parameter includes a first predicted sample point I(0)(I, j) deriving horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) a sample difference value between; and based on the video block based on a first prediction sample I at a decoder(0)(I, j) and a second predicted sample I(1)(i, j) is applied the BDOF to obtain final bi-predictive samples of the video block.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium having instructions stored thereon is provided. When the instructions are executed by one or more processors of the apparatus, the instructions may convey the apparatus to perform the steps of: obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1). First reference picture I in display order(0)Preceding the current picture and a second reference picture I(1)After the current picture. The instructions may further cause the device to perform the steps of: from a first reference picture I at a decoder(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j). i and j represent the coordinates of a sample point within the current picture. The instructions may additionally cause the device to perform the steps of: from a second reference picture I at a decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i, j). The instructions may further cause the device to perform the steps of: at a decoder and when an internal bit depth is greater than 12 bits, controlling an internal bit depth of an internal bi-directional optical flow (BDOF) parameter by applying a right shift to the BDOF to align the precision of an output prediction signal to a constant. The internal BDOF parameter includes a first predicted sample point I(0)(I, j) deriving horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) sample difference between (i, j). The instructions may additionally further cause the device to perform the steps of: at decoder from first predicted samples I based on the BDOF(0)(I, j) and a second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples for the video block. The instructions may also further cause the device to perform the steps of: obtaining, at a decoder, the output prediction signal based on the final bi-directional prediction samples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.
Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.
Fig. 3A is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3B is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3C is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3D is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 3E is a diagram illustrating block partitioning in a multi-type tree structure according to an example of the present disclosure.
Fig. 4 is a diagram of a bi-directional optical flow (BDOF) model according to an example of the present disclosure.
Fig. 5 is a bit depth control method of BDOF according to an example of the present disclosure.
Fig. 6 is a bit depth control method of BDOF according to an example of the present disclosure.
Fig. 7 is a diagram illustrating a computing environment coupled with a user interface according to an example of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same reference numerals in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects related to the present disclosure as recited in the claims below.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" (are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is intended to mean and include any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first information may be referred to as a second information without departing from the scope of the present disclosure; and similarly, the second information may also be referred to as the first information. As used herein, the term "if" can be understood to mean "when … …" or "at … …" or "in response to a determination", depending on the context.
The first version of the HEVC standard was completed in 2013 in 10 months, which provides a bit rate saving of about 50% or equivalent perceptual quality compared to the previous generation video codec standard h.264/MPEG AVC. Although the HEVC standard provides more significant codec improvements than its previous generation, evidence suggests that additional codec tools may be utilized to achieve codec efficiencies superior to HEVC. On this basis, both VCEG and MPEG have begun the quest for new codec techniques for future video codec standardization. ITU-T VECG and ISO/IEC MPEG constitute the joint video exploration team (jfet) 10 months 2015 to begin important research into advanced technologies that can achieve substantial enhancement of codec efficiency. One reference software, called Joint Exploration Model (JEM), is maintained by jfet by integrating several additional codec tools on top of the HEVC test model (HM).
In 2017, month 10, a joint proposal (CfP) was published by ITU-T and ISO/IEC on video compression with capabilities exceeding HEVC. In month 4 of 2018, 23 CfP responses were received and evaluated at the 10 th jfet meeting, which demonstrated a compression efficiency gain of about 40% above HEVC. Based on such evaluation results, jfet launched a new project to develop a new generation of video codec standard named universal video codec (VVC). In the same month, a reference software code base called a VVC Test Model (VTM) for demonstrating a reference implementation of the VVC standard is established.
Like HEVC, VVC is built on a block-based hybrid video codec framework.
Fig. 1 shows a general diagram of a block-based video encoder for VVC. In particular, fig. 1 shows a typical encoder 100. The encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block prediction value 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
In encoder 100, a video frame is partitioned into multiple video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method.
A prediction residual, representing the difference between the current video block (part of video input 110) and the prediction of the current video block (part of block prediction value 140), is sent from adder 128 to transform 130. The transform coefficients are then sent from transform 130 to quantization 132 for entropy reduction. The quantized coefficients are then fed to entropy encoding 138 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 142 from the intra/inter mode decision 116, such as video block partition information, Motion Vectors (MVs), reference picture indices and intra prediction modes, is also fed through entropy coding 138 and saved into a compressed bitstream 144. The compressed bitstream 144 comprises a video bitstream.
In the encoder 100, decoder-related circuitry is also required in order to reconstruct the pixels for prediction purposes. First, the prediction residual is reconstructed by inverse quantization 134 and inverse transformation 136. The reconstructed prediction residual is combined with the block prediction value 140 to generate an unfiltered reconstructed pixel of the current video block.
Spatial prediction (or "intra prediction") uses pixels from samples (called reference samples) of already coded neighboring blocks in the same video frame as the current video block to predict the current video block.
Temporal prediction (also referred to as "inter prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in video signals. The temporal prediction signal for a given Coding Unit (CU) or coding block is typically denoted by one or more MVs, where the one or more MVs indicate the amount and direction of motion between the current CU and its temporal reference. Furthermore, if multiple reference pictures are supported, one reference picture index is additionally transmitted, wherein the reference picture index is used to identify which reference picture in the reference picture memory the temporal prediction signal comes from.
Motion estimation 114 accesses video input 110 and signals from picture buffer 120 and outputs motion estimation signals to motion compensation 112. Motion compensation 112 accesses video input 110, signals from picture buffer 120, and motion estimation signals from motion estimation 114 and outputs motion compensated signals to intra/inter mode decision 116.
After performing spatial and/or temporal prediction, an intra/inter mode decision 116 in the encoder 100 selects the best prediction mode, e.g., based on a rate-distortion optimization method. The block prediction value 140 is then subtracted from the current video block and the resulting prediction residual is decorrelated using transform 130 and quantization 132. The resulting quantized residual coefficients are dequantized by dequantization 134 and inverse transformed by inverse transform 136 to form a reconstructed residual, which is then added back to the prediction block to form the reconstructed signal for the CU. In-loop filtering 122, such as a deblocking filter, Sample Adaptive Offset (SAO), and/or adaptive in-loop filter (ALF), may be further applied to the reconstructed CU prior to placing the reconstructed CU in a reference picture memory of picture buffer 120 and used to encode future video blocks. To form the output video bitstream 144, the codec mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy encoding unit 138 to be further compressed and packed to form the bitstream.
Fig. 1 presents a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block, called Coding Unit (CU). In VTM-1.0, a CU may be 128 × 128 pixels maximum. However, unlike HEVC which partitions blocks based on only quadtrees, in VVC one Coding Tree Unit (CTU) is partitioned into CUs to adapt to the varying local characteristics based on quadtrees/binary trees/ternary trees. In addition, the concept of multiple partition unit types in HEVC is removed, i.e., there is no longer a distinction of CU, Prediction Unit (PU), and Transform Unit (TU) in VVC; instead, each CU is always used as a base unit for both prediction and transform without further partitioning. In the multi-type tree structure, one CTU is first divided by the quadtree structure. Each leaf node of the quadtree may then be further partitioned by binary and ternary tree structures.
As shown in fig. 3A, 3B, 3C, 3D, and 3E, there are five partition types: quaternary segmentation, horizontal binary segmentation, vertical binary segmentation, horizontal ternary segmentation, and vertical ternary segmentation.
Fig. 3A shows a diagram illustrating block quartering in a multi-type tree structure according to the present invention.
FIG. 3B shows a diagram illustrating block vertical binary partitioning in a multi-type tree structure according to the present invention.
FIG. 3C shows a diagram illustrating block-level binary partitioning in a multi-type tree structure according to the present invention.
FIG. 3D shows a diagram illustrating vertical ternary partitioning of blocks in a multi-type tree structure in accordance with the present invention.
FIG. 3E shows a diagram illustrating block-level ternary partitioning in a multi-type tree structure in accordance with the present invention.
In fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples (called reference samples) of already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in video signals. The temporal prediction signal for a given CU is typically denoted by one or more Motion Vectors (MV), where the one or more Motion Vectors (MV) indicate the amount and direction of motion between the current CU and its temporal reference. Furthermore, if multiple reference pictures are supported, one reference picture index is additionally transmitted, wherein the reference picture index is used to identify which reference picture in the reference picture memory the temporal prediction signal comes from. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block and the prediction residual is decorrelated using transform and quantization. The quantized residual coefficients are inverse quantized and inverse transformed to form a reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Furthermore, in-loop filtering, such as deblocking filters, Sample Adaptive Offset (SAO), and adaptive in-loop filters (ALF), may be applied to the reconstructed CU prior to placing the reconstructed CU in reference picture memory and used to encode future video blocks. To form the output video bitstream, the codec mode (inter or intra), prediction mode information, motion information and quantized residual coefficients are all sent to an entropy coding unit to be further compressed and packed to form the bitstream.
Fig. 2 shows a general block diagram of a video decoder for VVC. In particular, fig. 2 shows a block diagram of a typical decoder 200. The decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.
The decoder 200 is similar to the reconstruction related part of the encoder 100 belonging to fig. 1. In the decoder 200, an input video bitstream 210 is first decoded by entropy decoding 212 to derive quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed by inverse quantization 214 and inverse transformation 216 to obtain a reconstructed prediction residual. The block prediction value mechanism implemented in the intra/inter mode selector 220 is configured to perform intra prediction 222 or motion compensation 224 based on the decoded prediction information. A set of unfiltered reconstructed pixels is obtained by adding the reconstructed prediction residual from the inverse transform 216 to the prediction output generated by the block prediction value mechanism using adder 218.
The reconstructed block may further pass through an in-loop filter 228 before being stored in a picture buffer 226 that serves as a reference picture storage. The reconstructed video in the picture buffer 226 may be sent to drive a display device and used to predict future video blocks. With the in-loop filter 228 turned on, a filtering operation is performed on these reconstructed pixels to derive a final reconstructed video output 232.
Fig. 2 presents a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (if intra coding) or a temporal prediction unit (if inter coding) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct a residual block. The prediction block and the residual block are then added together. The reconstructed block may be further subjected to in-loop filtering before it is stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
Bidirectional light stream
Conventional bi-prediction in video coding is a simple combination of two temporally predicted blocks obtained from already reconstructed reference pictures. However, due to the limitation of block-based motion compensation, there may be remaining small motion that may be observed between the samples of the two prediction blocks, thereby reducing the efficiency of motion compensated prediction. To address this inefficiency, bi-directional optical flow (BDOF) is applied in VVC, for example, to reduce the effect of such motion on each sample point within a block.
Fig. 4 shows a diagram of a BDOF model according to the present disclosure.
Specifically, an error! No reference source is found. When bi-prediction is used, BDOF is a sample-wise motion refinement performed on top of block-based motion compensated prediction. After applying BDOF within a 6 × 6 window Ω around the sub-block, motion refinement (v) for each 4 × 4 sub-block is calculated by minimizing the difference between the L0 predicted samples and the L1 predicted samples (v £ v @)s,vy). Specifically, (v)s,vy) The value of (d) is derived as:
Figure BDA0003423201720000091
wherein the content of the first and second substances,
Figure BDA0003423201720000098
is a floor function; clip3(min, max, x) is the clipping of a given value x into the range min, max]A function of; the symbol > denotes a bit-by-bit right shift operation; the symbol < represents a bit-by-bit left shift operation; th (h)BDOFIs a motion refinement threshold to prevent propagation errors due to irregular local motion, which is equal to 213-BDWhere BD is the bit depth of the input video. In the step (1), the first step is carried out,
Figure BDA0003423201720000092
Figure BDA0003423201720000093
S1、S2、S3、S5and S6The value of (d) is calculated as:
Figure BDA0003423201720000094
wherein
Figure BDA0003423201720000095
Wherein, I(k)(I, j) is the sample value at coordinate (I, j) of the prediction signal in list k, k being 0, 1, where I is generated with medium to high precision (i.e., 16 bits)(k)(i,j);
Figure BDA0003423201720000096
And
Figure BDA0003423201720000097
is a horizontal gradient of a sample point obtained by directly calculating a difference between two adjacent sample points of the sample pointAnd a vertical gradient, i.e.,
Figure BDA0003423201720000101
based on the motion refinement derived in (1), the final bidirectional predicted samples for the CU are computed by interpolating the L0/L1 predicted samples along the optical flow model-based motion trajectory, as shown below:
Figure BDA0003423201720000102
wherein shift and ooffsetIs a right shift value and an offset value applied to combine the L0 predicted signal and the L1 predicted signal for bi-directional prediction, shift and ooffsetEqual to 15-BD and 1 < (14-BD) +2 (1 < 13), respectively. Table 1 shows the specific bit widths of the intermediate parameters involved in the BDOF procedure. As shown in the table, the internal bit width of the entire BDOF process does not exceed 32 bits. In addition, the multiplication with the worst possible input takes place by the product v in (1)xS2m, where the inputs are 15 bits and 4 bits. Therefore, a 15-bit multiplier is sufficient for BDOF.
TABLE 1 bit Width of intermediate parameters of BDOF in VVC
Figure BDA0003423201720000103
Figure BDA0003423201720000111
Figure BDA0003423201720000121
The terminology used in the present disclosure is for the purpose of describing illustrative examples only and is not intended to be limiting of the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" (are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that, unless the context clearly dictates otherwise, the terms "or" and/or "as used herein are intended to mean and include any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be included herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may include what is referred to as second information without departing from the scope of the present disclosure; and similarly, the second information may also be referred to as the first information. As used herein, the term "if" may be understood to mean "when … …" or "at … …" or "in response to … …", depending on the context.
Reference throughout this specification to "one example," "an exemplary example," etc., in the singular or plural form means that one or more particular features, structures, or characteristics described in connection with the example are included in at least one example of the present disclosure. Thus, the appearances of the phrases "in one example" or "in an example," "in an illustrative example," or the like, in the singular or plural, in various places throughout this specification are not necessarily all referring to the same example. Furthermore, the particular features, structures, or characteristics in one or more examples may be combined in any suitable manner.
Current BDOF and PROF designs
Although BDOF may improve the efficiency of bi-predictive prediction, its design may still be further improved. In particular, the following inefficiencies in existing BDOF designs in VVCs for controlling the bit width of intermediate parameters are identified in this disclosure.
1) As shown in table 1, the parameter θ (i, j) (i.e., the difference between the L0 predicted samples and the L1 predicted samples) and the parameter ψ are expressed in the same bit width of 11 bitsx(i, j) and parametersψy(i, j) (i.e., the sum of the horizontal L0 gradient values and the horizontal L1 gradient values/the sum of the vertical L0 gradient values and the vertical L1 gradient values). While such an approach may facilitate overall control of the internal bit-width of the BDOF, it is suboptimal with respect to the accuracy of the derived motion refinement. This is because, as shown in (4), the gradient value is calculated as a difference between neighboring prediction samples; due to the high-pass nature of this process, the derived gradient is less reliable in the presence of noise (e.g., noise captured in the original video and codec noise generated during the codec process). This means that it may not always be beneficial to represent the gradient values with a high bit width.
2) As shown in table 1, refining v with vertical motionyThe calculation of (b) occurs the maximum bit width usage of the entire BDOF process, where S6(27 bits) are first left-shifted by 3 bits and then subtracted ((v)xS2,m)<<12+vxS2,s) 2 (/ 30 bits). Thus, the maximum bit width of the current design is equal to 31 bits. In practical hardware implementations, the codec procedure is typically implemented by a 32-bit implementation with a maximum internal bit width greater than 16 bits. Thus, existing designs do not take full advantage of the effective dynamic range of 32-bit implementations. This may result in an unnecessary loss of precision for the motion refinement derived from BDOF.
Improved bit width control method
In the present disclosure, an improved bit-width control method is proposed to solve two problems of the bit-width control method, as indicated in the "current BDOF and PROF design" section for the existing BDOF design. First, to overcome the negative effects of gradient estimation errors, gradient values are calculated in (4)
Figure BDA0003423201720000141
And
Figure BDA0003423201720000142
introducing an additional right shift n in the proposed methodgradI.e. to reduce the interior of the gradient valuesA partial bit width. Specifically, the horizontal and vertical gradients at each sample location are calculated as:
Figure BDA0003423201720000143
furthermore, in order to control the entire BDOF process such that the entire BDOF process is operated with the appropriate internal bit width, the additional bits are shifted by nadjIntroduced into variable ψx(i,j)、ψyIn the calculation of (i, j) and θ (i, j), as follows:
Figure BDA0003423201720000144
as will be seen in table 2, due to the modification to the number of right shift bits applied in (6) and (7), the three parameters ψ are represented with the same dynamic range (i.e., 21 bits) as in the existing BDOF design in table 1x(i,j)、ψy(i, j) and θ (i, j)) by comparison of the parameter ψx(i, j), parameter ψyThe dynamic ranges of (i, j) and the parameter θ (i, j) will be different. Such a change may increase the internal parameter S1Internal parameter S2Internal parameter S3Internal parameter S5And an internal parameter S6Which may potentially increase the maximum bit width of the internal BDOF process to over 32 bits. Therefore, to ensure a 32-bit implementation, S is calculated2And S6Two additional clipping operations are introduced for the value of (c). In particular, in the proposed method, the values of two parameters are calculated as
Figure BDA0003423201720000151
Wherein, B2And B6Respectively is control of S2And S6The output dynamic range of the parameter. It should be noted that, unlike the gradient calculation, the clipping operation in (8) is applied only once to calculate oneThe motion refinement of each 4 x 4 sub-block within a BDOF CU (i.e., the clipping operation in (8) is invoked based on 4 x 4 units). The corresponding increase in complexity due to the clipping operation introduced in the proposed method is therefore completely negligible.
In practice, different n may be appliedgrad、nadj、B2And B6To achieve different trade-offs between intermediate bit width and internal BDOF derivation precision. As an embodiment of the present disclosure, n is proposedgradAnd nadjSet to 2, set B2Set to 25 and set B6Set to 27. Table 2 shows the corresponding bit width for each intermediate parameter when the proposed bit width control method is applied to the BDOF. In table 2, the grey highlights the changes applied in the proposed bit width control method compared to the existing BDOF design in VVC. As can be seen from table 2, with the proposed bit width control method, the internal bit width of the entire BDOF process does not exceed 32 bits. Furthermore, with the proposed design, the maximum bit width is exactly 32 bits, which may fully exploit the available dynamic range of a 32-bit hardware implementation. On the other hand, as shown in the table, at the product vxS2,mWhere multiplication with the worst possible input takes place, where the input S2,mIs 14 bits and inputs vxIs 6 bits. Thus, as with the existing BDOF design, a 16-bit multiplier is also large enough when applying the proposed method.
Table 2 bit width of the intermediate parameters of the proposed method
Figure BDA0003423201720000152
Figure BDA0003423201720000161
Figure BDA0003423201720000171
In the above method, a clipping operation as in equation (8) is added to avoid deriving vsAnd vyOverflow of the intermediate parameter. However, such clipping is only required when the relevant parameters accumulate in a large local window. When a small window is applied, it may not overflow. Therefore, in another embodiment of the present disclosure, the following bit depth control method is proposed for the BDOF method without clipping, as described below.
1) First, the gradient value in (4) at each sampling point position
Figure BDA0003423201720000172
And
Figure BDA0003423201720000173
is calculated as:
Figure BDA0003423201720000174
Figure BDA0003423201720000181
2) then, the relevant parameter psi for the BDOF procedurex(i,j)、ψy(i, j) and θ (i, j) are calculated as:
Figure BDA0003423201720000182
3)S1、S2、S3、S5and S6The value of (d) is calculated as:
Figure BDA0003423201720000183
4) motion refinement (v) for each 4 × 4 sub-blocks,vy) Is derived as:
Figure BDA0003423201720000184
5) the final bidirectional predicted samples for the CU are computed by interpolating the L0/L1 predicted samples along the motion trajectory based on the optical flow model, as shown by:
predBDOF(x,y)=(I(0)(x,y)+I(1)(x,y)+b+ooffset)>>shift (13)
Figure BDA0003423201720000191
1) the above-described BDOF bit-width control method is based on the consumption that the internal bit-depth for coding and decoding video cannot exceed 12 bits, so that the accuracy of the output signal from Motion Compensation (MC) is 14 bits. In other words, when the internal bit depth is greater than 12 bits, the BDOF bit width control method as specified in (9) to (13) cannot guarantee that all bit depths of the internal BDOF operation are within 32 bits. To address this overflow inefficiency for high internal bit depths, the BDOF bit depth control method is improved below by introducing an additional bit-wise right shift that depends on the internal bit depth applied after the MC stage. By this method, the MC output signal is always shifted to 14 bits when the internal bit depth is larger than 12 bits, so that the existing BDOF bit depth control method designed for the internal bit depth of 8 bits to 12 bits can be reused for the BDOF process of high bit depth video. In particular, assuming that the bit depth is an internal bit depth, the proposed method is described by the following steps: first, in (4), the gradient value at each sampling point position
Figure BDA0003423201720000192
And
Figure BDA0003423201720000193
is calculated as:
Figure BDA0003423201720000194
2) then, the relevant parameter psi for the BDOF procedurex(i,j),ψy(i, j) and θ (i, j) are calculated as:
Figure BDA0003423201720000195
3)S1、S2、S3、S5and S6The value of (d) is calculated as:
Figure BDA0003423201720000201
4) the motion refinement for each 4 x 4 sub-block is derived as:
Figure BDA0003423201720000202
therein, thBDOFIs the motion refinement threshold, which is calculated as 1 < max (5, bit depth-7) based on the internal bit depth.
Fig. 5 illustrates a method 500 of decoding a video signal according to the present disclosure. The method may be applied, for example, to a decoder.
In step 510, the decoder may obtain a first reference picture I associated with a video block(0)And a second reference picture I(1). First reference picture I in display order(0)May precede the current picture and be a second reference picture I(1)After the current picture.
In step 512, the decoder may derive a first reference picture I(0)Obtain a first prediction sample I of the video block(0)(i, j). The numbers i and j may represent the coordinates of a sample point within the current picture.
In step 514, the decoder may read the second reference picture I(1)Chinese ginsengObtaining second prediction sample point I of video block by considering block(1)(i,j)。
In step 516, the decoder may control an inner bit depth of the BDOF by applying a right shift to an inner BDOF parameter, wherein the BDOF is independent of the input video bit depth, and wherein the inner BDOF parameter comprises an inner BDOF parameter based on the first predicted sample I(0)(I, j) deriving horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and first predicted sample I(0)(I, j) and a second predicted sample I(1)(i, j) sample difference between (i, j).
In step 518, the decoder may predict samples I from the first prediction sample based on BDOF(0)(I, j) and a second predicted sample I(1)(i, j) are applied to the video block to obtain final bi-predictive samples for the video block.
In the above approach, although the design may ensure that the maximum intermediate bit depth for all internal parameters of the BDOF derivation does not exceed 32 bits, it may still result in different internal bit depths for the BDOF refinement derivation when the bit depths of the input video are different. In the following, another BDOF bit depth control method is proposed, wherein the internal bit depth of the BDOF derivation is independent of the input video bit depth.
1) First, the gradient value in (4) at each sampling point position
Figure BDA0003423201720000211
And
Figure BDA0003423201720000212
is calculated as:
Figure BDA0003423201720000213
2) then, the relevant parameter psi for the BDOF procedurex(i,j)、ψy(i, j) and θ (i, j) are calculated as:
Figure BDA0003423201720000214
3)S1、S2、S3、S5and S6The value of (d) is calculated as:
Figure BDA0003423201720000215
Figure BDA0003423201720000221
Figure BDA0003423201720000222
4) motion refinement (v) for each 4 × 4 sub-blocks,vy) Is derived as:
Figure BDA0003423201720000223
therein, thBDOFIs the motion refinement threshold, which is a constant equal to 32.
5) The final bidirectional predicted samples for the CU are computed by interpolating the L0/L1 predicted samples along the motion trajectory based on the optical flow model, as shown by:
Figure BDA0003423201720000224
fig. 6 illustrates a method 600 of decoding a video signal according to the present disclosure. The method can be applied, for example, to a decoder.
In step 610, the decoder may obtain the horizontal gradient difference. The horizontal gradient difference may be a difference between the first horizontal gradient value and the second horizontal gradient value.
In step 612, the decoder may obtain the vertical gradient difference. The vertical gradient difference may be a difference between the first vertical gradient value and the second vertical gradient value.
In step 614, the decoder may left shift the horizontal gradient difference by a third shift value.
In step 616, the decoder may left-shift the vertical gradient difference by a third shift value.
In step 618, the decoder may calculate a sample refinement value based on the sum of the product of the horizontal motion refinement value and the horizontal gradient difference and the product of the vertical motion refinement value and the vertical gradient difference. For example, the sampling point refinement value may be calculated based on the following equation (27). The sample refinement value may be b in equation (27), where the sample refinement value is used to calculate pred for the final bi-directionally predicted sampleBDOF(x,y)。
In step 620, the decoder may base on the first predicted sample I(0)(I, j) second predicted sample point I (1)(i, j), the sample refinement value, and the offset value to obtain a final bi-directionally predicted sample for the video block.
In step 622, the decoder may right-shift the final bidirectional prediction samples by a fourth shift value.
In current VVC designs, the accuracy of the output prediction signal is not constant when the internal bit depth is greater than 12 bits. Such a design may not be very friendly for the following BDOF derivation. In one embodiment of the present disclosure, a new bit-shifting method is proposed for motion compensated prediction for high internal bit depths (i.e., > 12 bits) to align the accuracy of the output prediction signal to a constant (e.g., 20 bits). For example, the proposed method may comprise at least the following steps:
1) horizontal interpolation is performed using reference samples from a temporal reference picture to obtain horizontal fractional interpolated prediction samples, and then, when the internal bit depth is not greater than 12 bits, a right shift of one (bit depth-8) is applied to the interpolated prediction samples. In addition, if the internal bit depth is greater than 12 bits, a left shift of one (bit depth-12) is applied to the interpolated predicted samples.
2) Vertical interpolation is performed using the interpolated samples from the first step, applying a 6-bit right shift.
Given a fixed precision of the interpolated predicted samples, the following unified BDOF bit depth control method can be applied to high internal bit depths exceeding 12 bits:
1) first, the gradient value in (4) at each sampling point position
Figure BDA0003423201720000231
And
Figure BDA0003423201720000232
is calculated as:
Figure BDA0003423201720000233
2) then, the relevant parameter psi for the BDOF procedurex(i,j)、ψy(i, j) and θ (i, j) are calculated as:
Figure BDA0003423201720000234
Figure BDA0003423201720000241
θ(i,j)=(I(1)(i,j)>>8)-(I(0)(i,j)>>8)
3)S1、S2、S3、S5and S6The value of (d) is calculated as:
Figure BDA0003423201720000242
4) motion refinement (v) for each 4 × 4 sub-blocks,vy) Is derived as:
Figure BDA0003423201720000243
therein, thBDOFIs the motion refinement threshold, which is a constant equal to 32.
5) The final bidirectional predicted samples for the CU are computed by interpolating the L0/L1 predicted samples along the motion trajectory based on the optical flow model, as shown by:
Figure BDA0003423201720000251
the above-described methods may be implemented using an apparatus comprising one or more circuits including an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use circuitry in combination with other hardware components or software components to perform the above-described method. Each module, sub-module, unit or sub-unit disclosed above may be implemented at least in part using the one or more circuits.
Fig. 7 illustrates a computing environment 710 coupled with a user interface 760. The computing environment 710 may be part of a data processing server. The computing environment 710 includes a processor 720, a memory 740, and an I/O interface 750.
The processor 720 generally controls the overall operation of the computing environment 710, such as operations associated with display, data acquisition, data communication, and image processing. The processor 720 may include one or more processors for executing instructions to perform all or some of the steps of the above-described methods. Further, processor 720 may include one or more modules that facilitate interaction between processor 720 and other components. The processor may be a Central Processing Unit (CPU), microprocessor, single-chip, GPU, etc.
The memory 740 is configured to store various types of data to support the operation of the computing environment 710. The memory 740 may include predetermined software 742. Examples of such data include instructions for any application or method operating on computing environment 710, video data sets, image data, and so forth. The memory 740 may be implemented using any type or combination of volatile and non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic or optical disk.
I/O interface 750 provides an interface between processor 720 and peripheral interface modules such as a keyboard, click wheel, buttons, etc. The buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 750 may be coupled with an encoder and a decoder.
In some embodiments, a non-transitory computer readable storage medium comprising a plurality of programs, such as embodied in the memory 740, executable by the processor 720 in the computing environment 710 for performing the above-described methods is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
In some embodiments, the computing environment 710 may be implemented using one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), Graphics Processors (GPUs), controllers, micro-controllers, microprocessors, or other electronic components that perform the above-described methods.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will become apparent to those skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The examples were chosen and described in order to explain the principles of the disclosure and to enable others of ordinary skill in the art to understand the disclosure for various embodiments and with the best mode of carrying out the disclosure with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the disclosure.

Claims (34)

1. A bit-depth control method for decoding a bi-directional optical flow BDOF of video signal, comprising:
obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1)Wherein, in display order, the first reference picture I(0)Prior to a current picture and the second reference picture I(1)Subsequent to the current picture;
determining a reference picture I from the first reference picture I at the decoder(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j), wherein i and j represent coordinates of a sample point within the current picture;
determining a second reference picture I from the second reference picture I at the decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i,j);
Controlling, at the decoder, an inner bit depth of the BDOF by applying a right shift to an inner BDOF parameter, wherein the BDOF is independent of an input video bit depth, and wherein the inner BDOF parameter comprises an I-prediction sample based on the first prediction sample(0)(I, j) derived horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) a sample difference value between; and is
At the decoder, based on the BDOF, according to the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples for the video block.
2. The method of claim 1, wherein controlling the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters comprises:
based on first predicted samples I at the decoder(0)(I +1, j) and the first predicted sample I(0)(I-1, j) to obtain a first predicted sample I(0)(ii) a first horizontal gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I +1, j) and a second predicted sample I(1)(I-1, j) to obtain a second predicted sample I(1)(ii) a second horizontal gradient value of (i, j);
based on first predicted samples I at the decoder(0)(I, j +1) and the first predicted sample I(0)(I, j-1) to obtain a first predicted sample I(0)(ii) a first vertical gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I, j +1) and a second predicted sample I(1)(I, j-1) to obtain a second predicted sample I(1)(ii) a second vertical gradient value of (i, j);
right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value at the decoder; and
right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value at the decoder.
3. The method of claim 2, wherein the first shift value is equal to a codec bit depth minus 6.
4. The method of claim 1, further comprising:
obtaining, at the decoder, a first correlation value, wherein the first correlation value is based on the first predicted sample I(0)(I, j) and (ii) a horizontal gradient value based on the second predicted sample point I(1)(ii) the sum of the horizontal gradient values of (i, j);
obtaining, at the decoder, a second correlation value, wherein the second correlation value is based on the first predicted sample I(0)(I, j) and (ii) a vertical gradient value based on the second predicted sample I(1)(ii) the sum of the vertical gradient values of (i, j);
modifying the first correlation value by right shifting the first correlation value by 1 at the decoder; and
modifying the second correlation value by right shifting the second correlation value by 1 at the decoder.
5. The method of claim 4, further comprising:
transmitting the first predicted sample I at the decoder by using a second shift value(0)(i, j) right shifting to obtain a first modified predicted sample;
transmitting the second predicted sample I at the decoder by using the second shift value(1)(i, j) right shifting to obtain a second modified predicted sample; and
obtaining a third correlation value at the decoder, wherein the third correlation value is a difference between the first modified predicted sample and the second modified predicted sample.
6. The method of claim 5, wherein the second shift value is equal to a codec bit depth minus 8.
7. The method of claim 5, further comprising:
obtaining, at the decoder, a plurality of internal summation values based on the first correlation value, the second correlation value, and the third correlation value within each 4 x 4 sub-block of the video block.
8. The method of claim 7, further comprising:
obtaining, at the decoder, a horizontal motion refinement value based on at least one of the plurality of internal summation values, wherein a motion refinement value comprises the horizontal motion refinement value;
obtaining, at the decoder, a vertical motion refinement value based on at least one of the plurality of intra-summed values and the horizontal motion refinement value, wherein a motion refinement value comprises the vertical motion refinement value; and
clipping, at the decoder, the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
9. The method of claim 8, further comprising:
obtaining, at the decoder, a horizontal gradient difference value, wherein the horizontal gradient difference value is a difference between a first horizontal gradient value and a second horizontal gradient value;
obtaining, at the decoder, a vertical gradient difference, wherein the vertical gradient difference is a difference between a first vertical gradient value and a second vertical gradient value;
left-shifting the horizontal gradient difference by a third shift value at the decoder;
left shifting the vertical gradient difference by the third shift value at the decoder;
calculating, at the decoder, a sample refinement value based on a sum of a product of the horizontal motion refinement value and the horizontal gradient difference and a product of the vertical motion refinement value and the vertical gradient difference;
based on the first predicted sample I at the decoder(0)(I, j) the second predicted sample point I(1)(i, j), a sum of the sample refinement value and an offset value to obtain the final bi-directionally predicted sample of the video block; and
right shifting the final bi-directionally predicted samples by a fourth shift value at the decoder.
10. The method of claim 9, wherein the third shift value is equal to a codec bit depth minus 12.
11. A bit-depth control method for decoding a bi-directional optical flow BDOF of video signal, comprising:
obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1)Wherein, in display order, the first reference picture I(0)Prior to a current picture and the second reference picture I(1)Subsequent to the current picture;
determining a reference picture I from the first reference picture I at the decoder(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j), wherein i and j represent coordinates of a sample point within the current picture;
determining a second reference picture I from the second reference picture I at the decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i,j);
Controlling, at the decoder and when an internal bit depth is greater than 12 bits, an internal BDOF parameter of the BDOF by applying a right shift to the internal BDOF parameter to align accuracy of an output prediction signal to a constant, wherein the internal BDOF parameter includes I based on the first prediction samples(0)(I, j) derived horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) a sample difference value between;
at the decoder, based on the BDOF, according to the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples of the video block; and is
Obtaining, at the decoder, the output prediction signal based on the final bi-directionally predicted samples.
12. The method of claim 11, wherein when the internal bit depth is greater than 12 bits, controlling the internal bit depth of the BDOF by applying a right shift to the internal BDOF parameters to align the precision of the output prediction signal to a constant comprises:
based on first predicted samples I at the decoder(0)(I +1, j) and the first predicted sample I(0)(I-1, j) to obtain a first predicted sample I(0)(ii) a first horizontal gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I +1, j) and a second predicted sample I(1)(I-1, j) to obtain a second predicted sample I(1)(ii) a second horizontal gradient value of (i, j);
based on first predicted samples I at the decoder(0)(I, j +1) and the first predicted sample I(0)(I, j-1) to obtain a first predicted sample I(0)(ii) a first vertical gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I, j +1) and a second predicted sample I(1)(I, j-1) to obtain a second predicted sample I(1)(ii) a second vertical gradient value of (i, j);
right-shifting the first horizontal gradient values and the second horizontal gradient values by 10 at the decoder; and
right-shifting the first vertical gradient value and the second vertical gradient value by 10 at the decoder.
13. The method of claim 11, further comprising:
obtaining, at the decoder, a first correlation value, wherein the first correlation value is based on the first predicted sample I(0)(I, j) and (ii) a horizontal gradient value based on the second predicted sample point I(1)(ii) the sum of the horizontal gradient values of (i, j);
obtaining, at the decoder, a second correlation value, wherein the second correlation value is based on the first predicted sample I(0)(I, j) and (ii) a vertical gradient value based on the second predicted sample I(1)(ii) the sum of the vertical gradient values of (i, j);
modifying the first correlation value by right shifting the first correlation value by 1 at the decoder; and
modifying the second correlation value by right shifting the second correlation value by 1 at the decoder.
14. The method of claim 13, further comprising:
at the decoder by sampling the first predicted samples I(0)(i, j) right shifting by 8 to obtain a first modified predicted sample;
by combining the second predicted samples I at the decoder(1)(i, j) right shifting by 8 to obtain a second modified predicted sample; and
obtaining a third correlation value at the decoder, wherein the third correlation value is a difference between the first modified predicted sample and the second modified predicted sample.
15. The method of claim 14, further comprising:
obtaining, at the decoder, a plurality of internal summation values based on the first correlation value, the second correlation value, and the third correlation value within each 4 x 4 sub-block of the video block.
16. The method of claim 15, further comprising:
obtaining, at the decoder, a horizontal motion refinement value based on at least one of the plurality of internal summation values, wherein a motion refinement value comprises the horizontal motion refinement value;
obtaining, at the decoder, a vertical motion refinement value based on at least one of the plurality of intra-summed values and the horizontal motion refinement value, wherein a motion refinement value comprises the vertical motion refinement value; and
clipping, at the decoder, the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
17. The method of claim 16, further comprising:
obtaining, at the decoder, a horizontal gradient difference value, wherein the horizontal gradient difference value is a difference between a first horizontal gradient value and a second horizontal gradient value;
obtaining, at the decoder, a vertical gradient difference, wherein the vertical gradient difference is a difference between a first vertical gradient value and a second vertical gradient value;
left shifting the horizontal gradient difference by 4 at the decoder;
left-shifting the vertical gradient difference by 4 at the decoder;
calculating a sample refinement value based on a sum of a product of the horizontal motion refinement value and the horizontal gradient difference and a product of the vertical motion refinement value and the vertical gradient difference;
based on the first predicted sample I at the decoder(0)(I, j) the second predicted sample point I(1)(i, j), a sum of the sample refinement value and an offset value to obtain the final bi-directionally predicted sample of the video block; and
right-shifting the final bi-directionally predicted samples by a shift value at the decoder.
18. A computing device, comprising:
one or more processors;
a non-transitory computer-readable storage medium storing instructions executable by the one or more processors, wherein the one or more processors are configured to:
obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1)Wherein, in the display order,the first reference picture I(0)Prior to a current picture and the second reference picture I(1)Subsequent to the current picture;
determining a reference picture I from the first reference picture I at the decoder(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j), wherein i and j represent coordinates of a sample point within the current picture;
determining a second reference picture I from the second reference picture I at the decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i,j);
Controlling, at the decoder, an internal bit depth of a bi-directional optical flow (BDOF) by applying a right shift to BDOF parameters, wherein the BDOF is independent of input video bit depth, and wherein the internal BDOF parameters include an I-prediction based on the first prediction samples(0)(I, j) derived horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) a sample difference value between; and
at the decoder, based on the BDOF, according to the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples for the video block.
19. The computing device of claim 18, wherein the one or more processors configured to control the internal bit depth of the BDOF by applying a right shift to internal BDOF parameters are further configured to:
based on first predicted samples I at the decoder(0)(I +1, j) and the first predicted sample I(0)(I-1, j) to obtain a first predicted sample I(0)(ii) a first horizontal gradient value of (i, j);
based on second predicted samples I at the decoder(1)(i +1, j) and second predicted samplesI(1)(I-1, j) to obtain a second predicted sample I(1)(ii) a second horizontal gradient value of (i, j);
based on first predicted samples I at the decoder(0)(I, j +1) and the first predicted sample I(0)(I, j-1) to obtain a first predicted sample I(0)(ii) a first vertical gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I, j +1) and a second predicted sample I(1)(I, j-1) to obtain a second predicted sample I(1)(ii) a second vertical gradient value of (i, j);
right-shifting the first horizontal gradient value and the second horizontal gradient value by a first shift value at the decoder; and
right-shifting the first vertical gradient value and the second vertical gradient value by the first shift value at the decoder.
20. The computing device of claim 19, wherein the first shift value is equal to a codec bit depth minus 6.
21. The computing device of claim 18, wherein the one or more processors are further configured to:
obtaining, at the decoder, a first correlation value, wherein the first correlation value is based on the first predicted sample I(0)(I, j) horizontal gradient values and sampling points I based on second prediction(1)(ii) the sum of the horizontal gradient values of (i, j);
obtaining, at the decoder, a second correlation value, wherein the second correlation value is based on the first predicted sample I(0)(I, j) and (ii) a vertical gradient value based on the second predicted sample I(1)(ii) the sum of the vertical gradient values of (i, j);
modifying the first correlation value by right shifting the first correlation value by 1 at the decoder; and
modifying the second correlation value by right shifting the second correlation value by 1 at the decoder.
22. The computing device of claim 21, wherein the one or more processors are further configured to:
transmitting the first predicted sample I at the decoder by using a second shift value(0)(i, j) right shifting to obtain a first modified predicted sample;
transmitting the second predicted sample I at the decoder by using the second shift value(1)(i, j) right shifting to obtain a second modified predicted sample; and
obtaining a third correlation value at the decoder, wherein the third correlation value is a difference between the first modified predicted sample and the second modified predicted sample.
23. The computing device of claim 22, wherein the second shift value is equal to a codec bit depth minus 8.
24. The computing device of claim 22, wherein the one or more processors are further configured to:
obtaining, at the decoder, a plurality of internal summation values based on the first correlation value, the second correlation value, and the third correlation value within each 4 x 4 sub-block of the video block.
25. The computing device of claim 24, further comprising:
obtaining, at the decoder, a horizontal motion refinement value based on at least one of the plurality of internal summation values, wherein a motion refinement value comprises the horizontal motion refinement value;
obtaining, at the decoder, a vertical motion refinement value based on at least one of the plurality of intra-summed values and the horizontal motion refinement value, wherein a motion refinement value comprises the vertical motion refinement value; and
clipping, at the decoder, the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
26. The computing device of claim 25, wherein the one or more processors are further configured to:
obtaining, at the decoder, a horizontal gradient difference value, wherein the horizontal gradient difference value is a difference between a first horizontal gradient value and a second horizontal gradient value;
obtaining, at the decoder, a vertical gradient difference, wherein the vertical gradient difference is a difference between a first vertical gradient value and a second vertical gradient value;
left-shifting the horizontal gradient difference by a third shift value at the decoder;
left shifting the vertical gradient difference by the third shift value at the decoder;
calculating, at the decoder, a sample refinement value based on a sum of a product of the horizontal motion refinement value and the horizontal gradient difference and a product of the vertical motion refinement value and the vertical gradient difference;
based on the first predicted sample I at the decoder(0)(I, j) the second predicted sample point I(1)(i, j), a sum of the sample refinement value and an offset value to obtain the final bi-directionally predicted sample of the video block; and
right shifting the final bi-directionally predicted samples by a fourth shift value at the decoder.
27. The computing device of claim 26, wherein the third shift value is equal to a codec bit depth minus 12.
28. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform acts comprising:
obtaining, at a decoder, a first reference picture I associated with a video block(0)And a second reference picture I(1)Wherein, in display order, the first reference picture I(0)Prior to a current picture and the second reference picture I(1)Subsequent to the current picture;
determining a reference picture I from the first reference picture I at the decoder(0)Obtaining a first prediction sample point I of the video block from the reference block in (1)(0)(i, j), wherein i and j represent coordinates of a sample point within the current picture;
determining a second reference picture I from the second reference picture I at the decoder(1)Obtaining a second prediction sample point I of the video block from the reference block in (1)(1)(i,j);
Controlling, at the decoder and when an internal bit depth is greater than 12 bits, an internal bi-directional optical flow (BDOF) parameter by applying a right shift to the BDOF parameter to align accuracy of an output prediction signal to a constant, wherein the internal BDOF parameter includes I/O based on the first prediction samples(0)(I, j) derived horizontal and vertical gradient values based on the second predicted sample I(1)(I, j) derived horizontal and vertical gradient values, and the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) a sample difference value between;
at the decoder, based on the BDOF, according to the first predicted sample I(0)(I, j) and the second predicted sample I(1)(i, j) is applied to the video block to obtain final bi-predictive samples of the video block; and
obtaining, at the decoder, the output prediction signal based on the final bi-directionally predicted samples.
29. The non-transitory computer readable storage medium of claim 28, wherein the plurality of programs further cause the computing device to perform the acts of:
based on first predicted samples I at the decoder(0)(i+1, j) and the first predicted sample I(0)(I-1, j) to obtain a first predicted sample I(0)(ii) a first horizontal gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I +1, j) and a second predicted sample I(1)(I-1, j) to obtain a second predicted sample I(1)(ii) a second horizontal gradient value of (i, j);
based on first predicted samples I at the decoder(0)(I, j +1) and the first predicted sample I(0)(I, j-1) to obtain a first predicted sample I(0)(ii) a first vertical gradient value of (i, j);
based on second predicted samples I at the decoder(1)(I, j +1) and a second predicted sample I(1)(I, j-1) to obtain a second predicted sample I(1)(ii) a second vertical gradient value of (i, j);
right-shifting the first horizontal gradient values and the second horizontal gradient values by 10 at the decoder; and
right-shifting the first vertical gradient value and the second vertical gradient value by 10 at the decoder.
30. The non-transitory computer readable storage medium of claim 28, wherein the plurality of programs further cause the computing device to perform the acts of:
obtaining, at the decoder, a first correlation value, wherein the first correlation value is based on the first predicted sample I(0)(I, j) and (ii) a horizontal gradient value based on the second predicted sample point I(1)(ii) the sum of the horizontal gradient values of (i, j);
obtaining, at the decoder, a second correlation value, wherein the second correlation value is based on the first predicted sample I(0)(I, j) and (ii) a vertical gradient value based on the second predicted sample I(1)(ii) the sum of the vertical gradient values of (i, j);
modifying the first correlation value by right shifting the first correlation value by 1 at the decoder; and
modifying the second correlation value by right shifting the second correlation value by 1 at the decoder.
31. The non-transitory computer readable storage medium of claim 30, wherein the plurality of programs further cause the computing device to perform the actions of:
at the decoder by sampling the first predicted samples I(0)(i, j) right shifting by 8 to obtain a first modified predicted sample;
by combining the second predicted samples I at the decoder(1)(i, j) right shifting by 8 to obtain a second modified predicted sample; and
obtaining a third correlation value at the decoder, wherein the third correlation value is a difference between the first modified predicted sample and the second modified predicted sample.
32. The non-transitory computer readable storage medium of claim 31, wherein the plurality of programs further cause the computing device to perform the acts of:
obtaining, at the decoder, a plurality of internal summation values based on the first correlation value, the second correlation value, and the third correlation value within each 4 x 4 sub-block of the video block.
33. The non-transitory computer-readable storage medium of claim 32, further comprising:
obtaining, at the decoder, a horizontal motion refinement value based on at least one of the plurality of internal summation values, wherein a motion refinement value comprises the horizontal motion refinement value;
obtaining, at the decoder, a vertical motion refinement value based on at least one of the plurality of intra-summed values and the horizontal motion refinement value, wherein a motion refinement value comprises the vertical motion refinement value; and
clipping, at the decoder, the horizontal motion refinement value and the vertical motion refinement value based on a motion refinement threshold.
34. The non-transitory computer readable storage medium of claim 33, wherein the plurality of programs further cause the computing device to perform the actions of:
obtaining, at the decoder, a horizontal gradient difference value, wherein the horizontal gradient difference value is a difference between a first horizontal gradient value and a second horizontal gradient value;
obtaining, at the decoder, a vertical gradient difference, wherein the vertical gradient difference is a difference between a first vertical gradient value and a second vertical gradient value;
left shifting the horizontal gradient difference by 4 at the decoder;
left-shifting the vertical gradient difference by 4 at the decoder;
calculating, at the decoder, a sample refinement value based on a sum of a product of the horizontal motion refinement value and the horizontal gradient difference and a product of the vertical motion refinement value and the vertical gradient difference;
based on the first predicted sample I at the decoder(0)(I, j) the second predicted sample point I(1)(i, j), a sum of the sample refinement value and an offset value to obtain the final bi-directionally predicted sample of the video block; and
right shifting the final bidirectional prediction samples by a shift value at the decoder.
CN202080045432.8A 2019-06-25 2020-06-25 Apparatus and method for bit width control of bi-directional optical flow Pending CN114175659A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962866607P 2019-06-25 2019-06-25
US62/866,607 2019-06-25
US201962867185P 2019-06-26 2019-06-26
US62/867,185 2019-06-26
PCT/US2020/039702 WO2020264221A1 (en) 2019-06-25 2020-06-25 Apparatuses and methods for bit-width control of bi-directional optical flow

Publications (1)

Publication Number Publication Date
CN114175659A true CN114175659A (en) 2022-03-11

Family

ID=74061322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080045432.8A Pending CN114175659A (en) 2019-06-25 2020-06-25 Apparatus and method for bit width control of bi-directional optical flow

Country Status (2)

Country Link
CN (1) CN114175659A (en)
WO (1) WO2020264221A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668717B (en) 2019-01-06 2024-03-15 北京达佳互联信息技术有限公司 Video encoding method, computing device, and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10687069B2 (en) * 2014-10-08 2020-06-16 Microsoft Technology Licensing, Llc Adjustments to encoding and decoding when switching color spaces
CN114827599A (en) * 2016-02-03 2022-07-29 Oppo广东移动通信有限公司 Moving image decoding device, encoding device, and predicted image generation device

Also Published As

Publication number Publication date
WO2020264221A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
CN114363612B (en) Method and apparatus for bit width control of bi-directional optical flow
US9270993B2 (en) Video deblocking filter strength derivation
JP2023030062A (en) Bit-width control for bi-directional optical flow
EP4018667A1 (en) Methods and apparatus on prediction refinement with optical flow
EP4032298A1 (en) Methods and apparatus for prediction refinement with optical flow
JP2023159292A (en) Methods and apparatus on prediction refinement with optical flow
WO2020257629A1 (en) Methods and apparatus for prediction refinement with optical flow
WO2020223552A1 (en) Methods and apparatus of prediction refinement with optical flow
EP3942824A1 (en) Methods and apparatuses for prediction refinement with optical flow
WO2021072326A1 (en) Methods and apparatuses for prediction refinement with optical flow, bi-directional optical flow, and decoder-side motion vector refinement
CN114175659A (en) Apparatus and method for bit width control of bi-directional optical flow
CN113615197B (en) Method and apparatus for bit depth control of bi-directional optical flow
EP3777175A1 (en) Image processing apparatus and method
US11785204B1 (en) Frequency domain mode decision for joint chroma coding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination