WO2021188707A1 - Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement - Google Patents
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
Definitions
- This disclosure is related to video coding and compression. More specifically, this disclosure relates to methods and apparatuses for simplification of the bidirectional optical flow (BIO) tool (also abbreviated as BDOF) and decoder-side motion vector refinement (DMVR).
- Video coding is performed according to one or more video coding standards.
- Examples of video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which are jointly developed by ISO/IEC MPEG and ITU-T VCEG.
- Examples of the present disclosure provide methods and apparatuses for simplifications of bidirectional optical flow (BIO) and decoder-side motion vector refinement (DMVR).
- a method for decoding a video signal may include a decoder obtaining a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU).
- the forward reference picture L^(0) may be before a current picture and the backward reference picture L^(1) may be after the current picture in display order.
- the decoder may also obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0).
- the x and y may represent an integer coordinate of one sample in the forward reference picture L^(0).
- the decoder may also obtain backward reference samples L^(1)(x’, y’) of the CU from a reference block in the backward reference picture L^(1).
- the x’ and y’ may represent an integer coordinate of one sample in the backward reference picture L^(1).
- the decoder may also skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples may indicate a difference between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x’, y’).
- the decoder may also obtain, when the BIO process is skipped, prediction samples of the CU.
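The skip decision described above can be sketched as follows. This is an illustrative reconstruction only: the SAD-style distortion measure, the threshold, and all function names are assumptions, not details taken from the disclosure.

```python
def sad(l0_samples, l1_samples):
    """Sum of absolute differences between forward (L0) and backward (L1)
    integer reference samples -- one possible distortion measurement."""
    return sum(abs(a - b) for a, b in zip(l0_samples, l1_samples))

def should_skip_bio(l0_samples, l1_samples, threshold):
    """Skip the BIO process when the L0/L1 distortion is small enough
    (threshold value is a free parameter here, not from the disclosure)."""
    return sad(l0_samples, l1_samples) < threshold

def average_bi_prediction(l0_samples, l1_samples):
    """When BIO is skipped, fall back to plain rounded averaging of the
    two prediction signals."""
    return [(a + b + 1) >> 1 for a, b in zip(l0_samples, l1_samples)]
```

The point of the check is that when the two reference blocks already agree closely, the per-sample optical-flow refinement buys little and its cost can be avoided.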
- a method for decoding a video signal may include a decoder obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU).
- the first reference picture I^(0) may be before a current picture and the second reference picture I^(1) may be after the current picture in display order.
- the decoder may also obtain first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0).
- the i and j may represent a coordinate of one sample within the current picture.
- the decoder may also obtain second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1).
- the decoder may also obtain motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values.
- the horizontal gradient values and the vertical gradient values may be calculated using gradient filters with fewer coefficients.
- the decoder may also obtain bi-prediction samples of the CU based on the motion refinements.
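As one illustration of "gradient filters with fewer coefficients," a 3-tap central-difference filter can stand in for a longer gradient filter. The actual reduced filter coefficients used by the disclosure are not reproduced in this text, so treat the taps below as placeholders.

```python
def horizontal_gradient(row):
    """Horizontal gradient of one sample row using the 3-tap central
    difference [-1, 0, 1] with a right shift by 1 -- a placeholder for
    a short gradient filter (interior samples only)."""
    return [(row[x + 1] - row[x - 1]) >> 1 for x in range(1, len(row) - 1)]

def vertical_gradient(block):
    """Vertical gradient: apply the same short filter down each column."""
    cols = [list(c) for c in zip(*block)]
    grads = [horizontal_gradient(c) for c in cols]
    return [list(r) for r in zip(*grads)]
```

Shorter filters reduce the per-sample multiply/add count at the cost of a coarser gradient estimate, which is the trade-off this aspect of the disclosure targets.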
- a computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors.
- the one or more processors may be configured to obtain a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU).
- the forward reference picture L^(0) may be before a current picture and the backward reference picture L^(1) may be after the current picture in display order.
- the one or more processors may also be configured to obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0).
- the x and y may represent an integer coordinate of one sample in the forward reference picture L^(0).
- the one or more processors may also be configured to obtain backward reference samples L^(1)(x’, y’) of the CU from a reference block in the backward reference picture L^(1).
- the x’ and y’ may represent an integer coordinate of one sample in the backward reference picture L^(1).
- the one or more processors may also be configured to skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples may indicate a difference between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x’, y’).
- the one or more processors may also be configured to obtain, when the BIO process is skipped, prediction samples of the CU.
- a non-transitory computer-readable storage medium having stored therein instructions is provided.
- the instructions may cause the apparatus to obtain, at a decoder, a first reference picture and a second reference picture associated with a coding unit (CU).
- the first reference picture I^(0) may be before a current picture and the second reference picture I^(1) may be after the current picture in display order.
- the instructions may also cause the apparatus to obtain, at the decoder, first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0).
- the i and j may represent a coordinate of one sample within the current picture.
- the instructions may further cause the apparatus to obtain, at the decoder, second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1).
- the instructions may also cause the apparatus to obtain, at the decoder, motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values.
- the horizontal gradient values and the vertical gradient values may be calculated using gradient filters with fewer coefficients.
- the instructions may also cause the apparatus to obtain, at the decoder, bi-prediction samples of the CU based on the motion refinements.
- FIG. 1 is a block diagram of an encoder, according to an example of the present disclosure.
- FIG. 2 is a block diagram of a decoder, according to an example of the present disclosure.
- FIG. 3A is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3B is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3C is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3D is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3E is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 4 is an illustration of a bi-directional optical flow (BDOF or BIO) model, according to an example of the present disclosure.
- FIG. 5 is an illustration of a DMVR model, according to an example of the present disclosure.
- FIG. 6 is an illustration of integer searching candidates for the DMVR, according to an example of the present disclosure.
- FIG. 7 is a flowchart of a motion compensation process with the DMVR and the BIO, according to an example of the present disclosure.
- FIG. 8 is a flowchart of the proposed multi-stage early termination scheme for the BIO and the DMVR, according to an example of the present disclosure.
- FIG. 9 is a flowchart illustrating a method for decoding a video signal, according to an example of the present disclosure.
- FIG. 10 is a flowchart illustrating a method for decoding a video signal, according to an example of the present disclosure.
- FIG. 11 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
- Although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed second information; and similarly, second information may also be termed first information.
- the term “if” may be understood to mean “when” or “upon” or “in response to a judgment,” depending on the context.
- the first generation AVS standard includes the Chinese national standard “Information Technology, Advanced Audio Video Coding, Part 2: Video” (known as AVS1) and “Information Technology, Advanced Audio Video Coding Part 16: Radio Television Video” (known as AVS+). It can offer around 50% bit-rate saving at the same perceptual quality compared to the MPEG-2 standard.
- the AVS1 standard video part was promulgated as the Chinese national standard in February 2006.
- the second-generation AVS standard includes the series of Chinese national standards “Information Technology, Efficient Multimedia Coding” (known as AVS2), which is mainly targeted at the transmission of extra HD TV programs.
- the coding efficiency of the AVS2 is double that of the AVS+. In May 2016, the AVS2 was issued as the Chinese national standard.
- the AVS2 standard video part was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as an international standard for applications.
- the AVS3 standard is a new-generation video coding standard for UHD video applications, aiming at surpassing the coding efficiency of the latest international standard HEVC.
- in March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, which provides approximately 30% bit-rate savings over the HEVC standard.
- the AVS3 standard is built upon the block-based hybrid video coding framework.
- FIG. 1 shows a general diagram of a block-based video encoder for the VVC.
- FIG. 1 shows a typical encoder 100.
- the encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related info 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
- a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.
- a prediction residual representing the difference between a current video block, part of video input 110, and its predictor, part of block predictor 140, is sent to a transform 130 from adder 128.
- Transform coefficients are then sent from the Transform 130 to a Quantization 132 for entropy reduction.
- Quantized coefficients are then fed to an Entropy Coding 138 to generate a compressed video bitstream.
- prediction related information 142 from an intra/inter mode decision 116 such as video block partition info, motion vectors (MVs), reference picture index, and intra prediction mode, is also fed through the Entropy Coding 138 and saved into a compressed bitstream 144.
- Compressed bitstream 144 includes a video bitstream.
- a prediction residual is reconstructed through an Inverse Quantization 134 and an Inverse Transform 136. This reconstructed prediction residual is combined with a Block Predictor 140 to generate un-filtered reconstructed pixels for a current video block.
- Spatial prediction (or “intra prediction”) uses pixels from samples of already coded neighboring blocks (which are called reference samples) in the same video frame as the current video block to predict the current video block.
- Temporal prediction uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
- the temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs, which indicate the amount and the direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage, the temporal prediction signal comes from.
- Motion estimation 114 takes in video input 110 and a signal from picture buffer 120 and outputs, to motion compensation 112, a motion estimation signal.
- Motion compensation 112 takes in video input 110, a signal from picture buffer 120, and the motion estimation signal from motion estimation 114 and outputs, to intra/inter mode decision 116, a motion compensation signal.
- an intra/inter mode decision 116 in the encoder 100 chooses the best prediction mode, for example, based on the rate-distortion optimization method.
- the block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform 130 and the quantization 132.
- the resulting quantized residual coefficients are inverse quantized by the inverse quantization 134 and inverse transformed by the inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
- in-loop filtering 122 such as a deblocking filter, a sample adaptive offset (SAO), and/or an adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage of the picture buffer 120 and used to code future video blocks.
- coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 138 to be further compressed and packed to form the bitstream.
- FIG. 1 gives the block diagram of a generic block-based hybrid video encoding system.
- the input video signal is processed block by block (called coding units (CUs)).
- one coding tree unit (CTU) is split into CUs to adapt to varying local characteristics based on quad/binary/extended-quad-tree.
- the concept of multiple partition unit type in the HEVC is removed, i.e., the separation of CU, prediction unit (PU), and transform unit (TU) does not exist in the AVS3; instead, each CU is always used as the basic unit for both prediction and transform without further partitions.
- each CTU is firstly partitioned based on a quad-tree structure. Then, each quad-tree leaf node can be further partitioned based on a binary and extended-quad-tree structure. As shown in FIGS. 3A, 3B, 3C, 3D, and 3E, there are five splitting types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quad-tree partitioning, and vertical extended quad-tree partitioning.
- FIG. 3A shows a diagram illustrating block quaternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3B shows a diagram illustrating block vertical binary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3C shows a diagram illustrating block horizontal binary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3D shows a diagram illustrating block vertical extended quaternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3E shows a diagram illustrating block horizontal ternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- spatial prediction and/or temporal prediction may be performed.
- Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
- Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
- Temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference.
- one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage the temporal prediction signal comes.
- the mode decision block in the encoder chooses the best prediction mode, for example, based on the rate-distortion optimization method.
- the prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using transform and then quantized.
- the quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
- in-loop filtering such as deblocking filter, sample adaptive offset (SAO), and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage and used as a reference code for future video blocks.
- coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed.
- FIG. 2 shows a general block diagram of a video decoder for the VVC. Specifically, FIG. 2 shows a typical decoder 200 block diagram. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.
- Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1.
- an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information.
- the quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual.
- a block predictor mechanism implemented in an Intra/inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information.
- a set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
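The summation in the decoder's reconstruction path can be sketched as follows. The clip to a valid sample range and the 8-bit default depth are illustrative assumptions; the text above does not specify them.

```python
def reconstruct(residual, predictor, bit_depth=8):
    """Unfiltered reconstruction: add the reconstructed prediction residual
    to the block predictor output, sample by sample, and clip the result to
    the valid sample range for the given bit depth (assumed here)."""
    max_val = (1 << bit_depth) - 1
    return [min(max(r + p, 0), max_val) for r, p in zip(residual, predictor)]
```

These unfiltered samples are what the in-loop filter then processes before the block enters the picture buffer.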
- the reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture storage.
- the reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks.
- a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.
- FIG. 2 gives a general block diagram of a block-based video decoder.
- the video bitstream is first entropy decoded at entropy decoding unit.
- the coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter-coded) to form the prediction block.
- the residual transform coefficients are sent to inverse quantization unit and inverse transform unit to reconstruct the residual block.
- the prediction block and the residual block are then added together.
- the reconstructed block may further go through in-loop filtering before it is stored in reference picture storage.
- the reconstructed video in reference picture storage is then sent out for display, as well as used to predict future video blocks.
- the focus of the disclosure is to reduce the complexity of the bidirectional optical flow (BIO) tool and the decoder-side motion vector refinement (DMVR) tool that are used in both the VVC and the AVS3 standards.
- the BIO tool is also abbreviated as BDOF.
- In this disclosure, the existing BIO and DMVR designs in the AVS3 standard are used as examples to explain the main design aspects of the two coding tools. After that, possible improvements to the existing BIO and DMVR designs are discussed. Finally, some methods are proposed to reduce the complexity while maintaining the majority of the gain of the two coding tools.
- Although the BIO and DMVR designs in the AVS3 standard are used as the base in the following description, to a person skilled in the art of video coding, the proposed methods described in the disclosure can also be applied to other BIO and DMVR designs or other coding tools with the same or a similar design flavor.
- the derivation of the refined motion vector for each sample in one block is based on the classical optical flow model.
- based on the classical optical flow constraint ∂I/∂t + v_x·∂I/∂x + v_y·∂I/∂y = 0, the motion refinement (v_x, v_y) at (x, y) can be derived.
- FIG. 4 shows an illustration of a BDOF model, in accordance with the present disclosure.
- (MV_x0, MV_y0) and (MV_x1, MV_y1) indicate the block-level motion vectors that are used to generate the two prediction blocks I^(0) and I^(1).
- the motion refinement (v_x, v_y) at the sample location (x, y) is calculated by minimizing the difference D between the values of the samples after motion refinement compensation (i.e., A and B in FIG. 4).
- the gradients need to be derived in the BIO for every sample of each motion compensated block (i.e., I^(0) and I^(1)) in order to derive the local motion refinement and generate the final prediction at that sample location.
- the gradients are calculated by a 2D separable finite impulse response (FIR) filtering process, which defines a set of 8-tap filters and applies different filters to derive the horizontal and vertical gradients according to the precision of the block-level motion vector (e.g., (MV_x0, MV_y0) and (MV_x1, MV_y1) in FIG. 4).
- Table 1 illustrates the coefficients of the gradient filters that are used by the BIO.
- Table 1: Gradient filters used in BIO
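The 2D separable FIR structure can be sketched generically as below. The filter taps passed in are placeholders; the actual 8-tap coefficients of Table 1 are not reproduced in this text.

```python
def filter_1d(samples, taps):
    """Plain 1-D FIR filtering; output length is len(samples) - len(taps) + 1."""
    n = len(taps)
    return [sum(t * samples[i + k] for k, t in enumerate(taps))
            for i in range(len(samples) - n + 1)]

def filter_rows(block, taps):
    """Apply the 1-D filter along each row."""
    return [filter_1d(row, taps) for row in block]

def filter_cols(block, taps):
    """Apply the 1-D filter down each column."""
    cols = [list(c) for c in zip(*block)]
    return [list(r) for r in zip(*(filter_1d(c, taps) for c in cols))]

def horizontal_gradient_2d(block, grad_taps, interp_taps):
    """Separable 2-D filtering for the horizontal gradient: the gradient
    filter runs along rows, the interpolation filter along columns
    (for the vertical gradient the roles are swapped)."""
    return filter_cols(filter_rows(block, grad_taps), interp_taps)
```

Separability is what makes the cost expression discussed later (a row pass plus a column pass, rather than a full 2-D convolution) hold.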
- the BIO is only applied to bi-prediction blocks, which are predicted by two reference blocks from temporal neighboring pictures. Additionally, the BIO is enabled without sending additional information from the encoder to the decoder. Specifically, the BIO is applied to all the bi-directional predicted blocks, which have both the forward and backward prediction signals.
- a bilateral-matching based decoder side motion vector refinement (DMVR) is applied in the AVS3.
- a refined MV is searched around the initial MVs in the reference picture list L0 and reference picture list L1.
- the method calculates the distortion between the two candidate blocks in the reference picture list L0 and list L1.
- the SAD between the candidate blocks based on each MV candidate around the initial MV is calculated.
- the MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-predicted signal.
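The candidate selection reduces to a minimum over SAD costs; a minimal sketch, where the cost function and the candidate offsets are illustrative stand-ins for the real bilateral-matching SAD:

```python
def dmvr_search(candidates, sad_cost):
    """Return the MV offset whose mirrored L0/L1 block pair gives the lowest
    SAD; `sad_cost` maps an (dx, dy) integer offset to its SAD value."""
    return min(candidates, key=sad_cost)

# Toy usage: pretend SAD values for three offsets around the initial MV.
costs = {(-1, 0): 10, (0, 0): 4, (1, 0): 7}
best = dmvr_search(costs.keys(), costs.__getitem__)
```

The offset with the smallest SAD becomes the integer refinement, and the corresponding refined MV pair generates the bi-predicted signal.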
- FIG. 5 shows the decoder-side motion vector refinement (DMVR) process.
- FIG. 5 includes 520 refPic in list L0, 540 current picture, and 560 refPic in list L1.
- 520 refPic in list L0 is a reference picture of the first list and includes 522 current CU, 524 reference block, 526 MVdiff, 528 MV0, and 530 MV0’.
- 526 MVdiff is the motion vector difference between 522 current CU and 524 reference block.
- 528 MV0 is the initial motion vector between blocks 522 current CU and 542 current CU.
- 530 MV0’ is the refined motion vector between blocks 522 current CU and 542 current CU.
- 540 current picture is a current picture of the video and includes 542 current CU, 544 MVU, and 546 MV1.
- 544 MV1’ is the refined motion vector between block 542 current CU and 562 reference block.
- 546 MV1 is the motion vector between blocks 542 current CU and 564 current CU.
- 560 refPic in List L1 is a reference picture in the second list and includes 562 reference block, 564 current CU, and 566 -MVdiff.
- 566 -MVdiff is the motion vector difference between 562 reference block and 564 current CU.
- the search points surround the integer sample position pointed to by the initial MV, and the MV offsets that are considered conform to the mirroring rule.
- any MV refinement that is checked by DMVR should satisfy the following two equations:
- MV0’ = MV0 + MV_offset (5)
- MV1’ = MV1 − MV_offset (6)
- MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures.
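Equations (5) and (6) translate directly into code; a minimal sketch with MVs represented as (x, y) integer pairs:

```python
def mirrored_refinement(mv0, mv1, offset):
    """Apply the DMVR mirroring rule: the refinement offset is added to the
    L0 motion vector (equation (5)) and subtracted from the L1 motion
    vector (equation (6)), so the two refined MVs move symmetrically."""
    mv0_refined = (mv0[0] + offset[0], mv0[1] + offset[1])
    mv1_refined = (mv1[0] - offset[0], mv1[1] - offset[1])
    return mv0_refined, mv1_refined
```

Because the two offsets mirror each other, only one offset per candidate needs to be searched, which halves the size of the search space.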
- the refinement search range is two integer luma samples from the initial MV.
- the search includes an integer sample search stage and a fractional sample refinement stage.
- in the integer sample search stage, the sum of absolute differences (SAD) values at 21 integer sample positions are checked.
- the SAD of the initial MV pair is first calculated.
- the integer offset, which minimizes the SAD value, is selected as the integer sample offset of the integer searching stage.
- FIG. 6 shows integer searching candidates for the DMVR.
- the black triangle is the integer sample position associated with the initial MV, and the blank or white triangles are the neighboring integer sample positions.
- the integer sample search is followed by fractional sample refinement.
- the fractional sample refinement is derived by using a parametric error surface method, instead of an additional search with SAD comparison.
- in the parametric error surface based sub-pixel offset estimation, the center position cost and the costs at the four neighboring positions from the center are used to fit a 2-D parabolic error surface equation of the form E(x, y) = A(x − x_min)² + B(y − y_min)² + C, where (x_min, y_min) corresponds to the fractional position with the least cost and C corresponds to the minimum cost value.
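The sub-pixel offsets from such a parabolic fit have a commonly used closed form (as in the VVC DMVR); treat the exact form below as an assumed sketch, valid when the center cost is the smallest of the five.

```python
def sub_pel_offset(e_c, e_l, e_r, e_t, e_b):
    """Fractional (x, y) offset from a parabolic fit to five costs:
    center e_c, left e_l, right e_r, top e_t, bottom e_b.
    Assumes e_c is a strict minimum, so both denominators are positive."""
    x = (e_l - e_r) / (2.0 * (e_l + e_r - 2.0 * e_c))
    y = (e_t - e_b) / (2.0 * (e_t + e_b - 2.0 * e_c))
    return x, y
```

This replaces an explicit fractional-position SAD search with a handful of arithmetic operations, which is exactly why the error-surface method is cheaper.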
- while the BIO and the DMVR can effectively enhance the efficiency of motion compensation, they also introduce a significant complexity increase to the encoder and decoder designs in both hardware and software. Specifically, in this disclosure, the following complexity issues in the existing BIO and DMVR designs are identified:
- the DMVR and the BIO are always enabled for the bi-predicted blocks, which have both forward and backward prediction signals.
- Such a design may not be practical for certain video applications (e.g., video streaming on mobile devices) that cannot afford heavy computations due to their limited power.
- the BIO needs to derive the gradient values at each sample location, which requires a number of multiplications and additions due to the 2D FIR filtering operations, and the DMVR needs to calculate multiple SAD values during the bilateral matching process. All of those operations require intensive computations.
- Such a complexity increase may become even worse when the BIO is jointly applied with the DMVR to one bi-predicted CU, as shown in FIG. 7.
- FIG. 7 shows a flowchart of the motion compensation process with the DMVR and the BIO.
- step 701 the process starts.
- step 702 the CU is divided into multiple sub-blocks with a size equal to min(16, CUWidth) x min(16, CUHeight).
- step 703 the variable i is set to 0.
- step 704 DMVR is applied to the i- th sub-block.
- step 705 motion compensation is applied to the i-th sub-block with the refined motion.
- step 706 BIO is applied to the i-th sub-block.
- step 707 it is determined if the current sub-block is the last sub-block. If yes, the process continues to step 709. If no, the process continues to step 708.
- step 708 the variable i is increased by 1 and the process continues to step 704.
- step 709 the process ends.
- the current BIO uses 2D separable FIR filters to calculate the horizontal and vertical gradient values. Specifically, the low-pass 8-tap interpolation filters (that are used for the interpolation in regular motion compensation) and the high-pass 8-tap gradient filters (as shown in Table 1) are applied, and the filter selection is based on the fractional position of the corresponding motion vector. Assuming both the L0 and L1 MVs point to reference samples at fractional sample positions in both the horizontal and vertical directions, the number of multiplications and additions to calculate the horizontal and vertical gradients in L0 and L1 will be (W x (H+7) + W x H) x 2 x 2.
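The operation count above can be reproduced with a small helper. This is a sketch; counting one filter application per output sample, and N multiplications per N-tap application, is an illustrative assumption rather than a statement from the source.

```python
def bio_gradient_filter_positions(w, h):
    """Separable-filter applications needed for the horizontal and vertical
    gradients in both L0 and L1 for a W x H block: a first pass over the
    vertically extended W x (H + 7) area plus a second pass over W x H,
    doubled for the two gradient directions and the two reference lists."""
    return (w * (h + 7) + w * h) * 2 * 2

def bio_gradient_multiplications(w, h, taps=8):
    """Assuming each application of an N-tap FIR filter costs N multiplications."""
    return bio_gradient_filter_positions(w, h) * taps
```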
- both the BIO and the DMVR utilize the L0 and L1 prediction samples to derive local motion refinements at different granularity levels (e.g., the BIO derives the motion refinements for each sample while the motion refinement of the DMVR is calculated per sub-block).
- the two prediction blocks are highly correlated such that the DMVR and the BIO processes can be safely skipped without incurring substantial coding loss.
- the initial motion vectors may point to reference samples at fractional sample positions, so the generation of the L0 and L1 prediction samples may invoke the interpolation process, which requires non-negligible complexity and leads to some delay in making the decision.
- SSE: sum of square error
- SAD: sum of absolute difference
- SATD: sum of absolute transformed difference
- the BIO and/or the DMVR could be skipped at the motion compensation stage when the difference measurement is no larger than one predefined threshold, i.e., Diff ⁇ D thres ; otherwise, the BIO and/or the DMVR still needs to be applied.
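A minimal sketch of this threshold test follows (illustrative function names; equation 10 itself is not reproduced in this excerpt, so plain SAD or SSE over the sample arrays stands in for the difference measurement Diff):

```python
import numpy as np

def block_difference(pred0, pred1, metric="sad"):
    """Difference measurement between the list-0 and list-1 samples; SSE or
    SAD (SATD omitted here for brevity) may serve as the measure."""
    d = pred0.astype(np.int64) - pred1.astype(np.int64)
    if metric == "sse":
        return int((d * d).sum())
    if metric == "sad":
        return int(np.abs(d).sum())
    raise ValueError("unsupported metric: " + metric)

def skip_refinement(pred0, pred1, threshold, metric="sad"):
    """BIO/DMVR is bypassed when Diff <= threshold."""
    return block_difference(pred0, pred1, metric) <= threshold
```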
- the proposed early termination method for the DMVR and the BIO can be carried out at either CU level or sub-block level, which can potentially provide various tradeoffs between coding performance and complexity reduction.
- sub-block-level early termination can better maintain the coding gains of the BIO and the DMVR due to the finer control granularity of the BIO and the DMVR. This may not be optimal in terms of the complexity reduction, given that the distortion measurement and the early termination decision need to be performed for each sub-block separately.
- while CU-level early termination can potentially lead to a more significant complexity reduction, it may not be able to achieve an acceptable tradeoff between performance and complexity for coding blocks with inhomogeneous characteristics.
- when the DMVR is applied to the current CU, the decoder may determine on a sub-block basis whether the DMVR and the BIO processes are to be bypassed or not. Otherwise (i.e., when the DMVR is not applied), it is acceptable to rely on the CU-level distortion measurement to determine if the BIO process for the whole CU is to be bypassed or not.
- one multi-stage early termination method is proposed to adaptively skip the BIO and the DMVR processes at either CU-level or sub-block-level, depending on whether the DMVR is allowed for the current CU or not.
- FIG. 8 illustrates the modified motion compensation process with the proposed multi-stage early termination method being applied to the BIO and the DMVR.
- FIG. 8 shows a flowchart of the proposed multi-stage early termination scheme for the BIO and the DMVR.
- step 801 the process starts.
- step 802 it is determined if the DMVR is applied to the CU. If no, the process continues to step 811. If yes, the process continues to step 803.
- step 803 the CU is divided into multiple sub-blocks with a size equal to min(16, CUWidth) X min(16, CUHeight).
- step 804 the variable i is set to 0.
- step 805 it is determined if the difference between the L0 and L1 integer reference samples of the sub-block, as calculated in equation 10, is less than or equal to the threshold thresDMVR. If yes, the process continues to step 807. If no, the process continues to step 806.
- step 806 DMVR is applied to the i-th sub-block.
- step 807 it is determined if the difference between L0 and L1 integer reference samples of the sub-block, as calculated in equation 10, is less than threshold thresBIO. If yes, the process continues to step 809. If no, the process continues to step 808.
- step 808 BIO is applied to the i-th sub-block.
- step 809 it is determined if the current sub-block is the last sub-block. If yes, the process continues to step 813. If no, the process continues to step 810.
- step 810 the variable i is increased by 1 and the process continues to step 805.
- step 811 it is determined if the difference, as calculated in equation 10, is less than threshold thresBIO. If yes, the process continues to step 813. If no, the process continues to step 812.
- step 812 BIO is applied to the CU.
- step 813 the process ends.
- the decision on whether to bypass the BIO process is made at CU-level. Specifically, if the distortion measurement (as shown in (10)) of the reference samples of the CU is no larger than a predefined threshold thresBIO, the BIO process is completely disabled for the whole CU; otherwise, the BIO is still applied for the CU.
- the decision on whether to bypass the BIO and the DMVR is made at sub-block-level. Additionally, two thresholds, thresBIO and thresDMVR are used to bypass the BIO and the DMVR for each sub block separately.
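The two branches of FIG. 8 can be condensed into a small decision helper. This is a sketch over precomputed difference values; the function name and the (run_dmvr, run_bio) return format are illustrative assumptions.

```python
def multi_stage_early_termination(cu_diff, subblock_diffs, dmvr_allowed,
                                  thres_dmvr, thres_bio):
    """When DMVR is allowed for the CU, decide per sub-block whether DMVR
    and BIO run (steps 805-808); otherwise make a single CU-level BIO
    decision (steps 811-812)."""
    if dmvr_allowed:
        decisions = []
        for diff in subblock_diffs:
            run_dmvr = diff > thres_dmvr  # step 805: skip DMVR when diff <= thresDMVR
            run_bio = diff >= thres_bio   # step 807: skip BIO when diff < thresBIO
            decisions.append((run_dmvr, run_bio))
        return decisions
    return [(False, cu_diff >= thres_bio)]  # step 811: CU-level BIO decision
```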
- FIG. 9 shows a method for decoding a video signal.
- the method may be, for example, applied to a decoder.
- the decoder may obtain a forward reference picture L(0) and a backward reference picture L(1) associated with a coding unit (CU).
- the forward reference picture L(0) is before a current picture and the backward reference picture L(1) is after the current picture in display order.
- the decoder may obtain forward reference samples L(0)(x, y) of the CU from a reference block in the forward reference picture L(0).
- the x and y represent an integer coordinate of one sample in the forward reference picture L(0).
- the decoder may obtain backward reference samples L(1)(x', y') of the CU from a reference block in the backward reference picture L(1).
- the x' and y' represent an integer coordinate of one sample in the backward reference picture L(1).
- the decoder may skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples indicates a difference between the forward reference samples L(0)(x, y) and the backward reference samples L(1)(x', y').
- the decoder may obtain, when the BIO process is skipped, prediction samples of the CU.
- the BIO is designed to improve the accuracy of motion compensated prediction by providing sample-wise motion refinement, which is calculated based on the local gradients calculated at each sample location in one motion compensated block.
- for coding blocks whose local gradients are close to zero, the BIO cannot provide effective refinements to the prediction samples. This can be demonstrated by equation (2): when the local gradients are close to zero, the final prediction signal obtained from the BIO is approximately equal to the prediction signal generated by the conventional bi-prediction.
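A toy version of the combination step demonstrates this behavior (a hedged sketch: the rounding offsets, shifts, and clipping of the actual BIO equations are omitted, and the function name is illustrative):

```python
import numpy as np

def bio_final_prediction(i0, i1, vx, vy, gx0, gx1, gy0, gy1):
    """Simplified optical-flow combination: average of the two predictions
    plus a gradient-weighted correction term.  When the local gradients are
    zero, the correction vanishes and the result reduces to conventional
    bi-prediction."""
    b = vx * (gx0.astype(np.int64) - gx1) + vy * (gy0.astype(np.int64) - gy1)
    return (i0.astype(np.int64) + i1.astype(np.int64) + b + 1) >> 1
```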
- to reduce the complexity, it is proposed to only apply the BIO to the prediction samples of coding blocks that include enough high-frequency information.
- the decision on whether the prediction signals of one video block include enough high-frequency information can be made based on various criteria.
- the average magnitude of the gradients for the samples within a block can be used. If the average gradient magnitude is smaller than one threshold, then the block is classified as a flat area and the BIO should not be applied; otherwise, the block is considered to include sufficient high-frequency details where the BIO is still applicable.
- the proposed gradient-based BIO early termination method can also be applied at either CU-level or sub-block-level.
- the gradient values of all the prediction samples inside the CU are used to determine whether the BIO is bypassed or not. Otherwise, when the method is applied at sub-block-level, the decision on whether to skip the BIO process is made for each sub-block individually by comparing the average gradient value of the prediction samples within the corresponding sub-block.
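The average-gradient flatness test can be sketched as follows. This is an illustrative stand-in: the excerpt does not fix the gradient operator used for this decision, so a simple 2-tap sample difference is assumed here.

```python
import numpy as np

def is_flat_block(pred, threshold):
    """Classify a block (CU or sub-block) as flat, so that BIO is skipped,
    when the average magnitude of its horizontal and vertical gradients
    falls below the threshold."""
    p = np.asarray(pred, dtype=np.int64)
    gx = np.abs(p[:, 1:] - p[:, :-1])  # 2-tap horizontal difference
    gy = np.abs(p[1:, :] - p[:-1, :])  # 2-tap vertical difference
    return bool((gx.mean() + gy.mean()) / 2.0 < threshold)
```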
- a hybrid condition that checks both reference sample difference (according to equation 10) and gradient information is proposed. Note that in this condition, reference sample difference and gradient information may be checked jointly or separately. In the joint case, both sample difference and gradient information should be significant in order to apply BIO. Otherwise, BIO is skipped. In the separate case, when either sample difference or gradient information is small (e.g., through threshold comparison), BIO is skipped.
- the current BIO design uses 2D separable 8-tap FIR filters to calculate the horizontal and vertical gradient values, i.e., 8-tap interpolation filters and 8-tap gradient filters.
- 8-tap gradient filters may not always be effective to accurately extract gradient information from reference samples while resulting in a non-negligible computational complexity increase.
- gradient filters with fewer coefficients are proposed for calculating the gradient information that is used by the BIO.
- the input to the gradient derivation process may include the same reference samples as those used for the motion compensation and the fractional components (fracX, fracY) of the input motion (MVx, MVy) of the current block.
- the order of applying the gradient filter hG and the interpolation filter hL differs between the horizontal and vertical gradients. Specifically, in the case of deriving horizontal gradients, the gradient filter hG is firstly applied in the horizontal direction to derive the horizontal gradient values at horizontal fractional sample position fracX; then, the interpolation filter hL is applied vertically to interpolate the gradient values at vertical fractional sample position fracY. On the contrary, when vertical gradients are derived, the interpolation filter hL is firstly applied horizontally to interpolate intermediate interpolation samples at horizontal fractional sample position fracX, followed by the gradient filter hG being applied in the vertical direction to derive the vertical gradient at vertical fractional sample position fracY from the intermediate interpolation samples.
- the 4-tap gradient filters as shown in Table 2 are proposed for the gradient calculation of the BIO.
- the 6-tap gradient filters in Table 3 are proposed to derive the gradients for the BIO.
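The separable application order described above can be sketched with placeholder filters. The actual coefficients of Tables 1-3 are not reproduced in this excerpt, so the 4-tap values used below are illustrative placeholders; only the order of the horizontal/vertical passes reflects the text.

```python
import numpy as np

def apply_filter_1d(samples, taps, axis):
    """'valid' 1-D correlation of integer samples with a short FIR filter,
    along rows (axis=1) or columns (axis=0)."""
    taps = np.asarray(taps, dtype=np.int64)
    s = np.asarray(samples, dtype=np.int64)
    if axis == 0:
        s = s.T
    out = np.stack([np.convolve(row, taps[::-1], mode="valid") for row in s])
    return out.T if axis == 0 else out

def bio_gradients(ref, h_grad, h_interp):
    """Horizontal gradient: gradient filter horizontally, then interpolation
    vertically.  Vertical gradient: interpolation horizontally, then
    gradient filter vertically."""
    gx = apply_filter_1d(apply_filter_1d(ref, h_grad, axis=1), h_interp, axis=0)
    gy = apply_filter_1d(apply_filter_1d(ref, h_interp, axis=1), h_grad, axis=0)
    return gx, gy
```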
- FIG. 10 shows a method for decoding a video signal.
- the method may be, for example, applied to a decoder.
- the decoder may obtain a first reference picture I(0) and a second reference picture I(1) associated with a coding unit (CU).
- the first reference picture I(0) is before a current picture and the second reference picture I(1) is after the current picture in display order.
- the decoder may obtain first prediction samples I(0)(i, j) of the CU from a reference block in the first reference picture I(0).
- the i and j represent a coordinate of one sample within the current picture.
- the decoder may obtain second prediction samples I(1)(i, j) of the CU from a reference block in the second reference picture I(1).
- the decoder may obtain motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I(0)(i, j) and the second prediction samples I(1)(i, j).
- the horizontal gradient values and the vertical gradient values are calculated using gradient filters with fewer coefficients.
- the decoder may obtain bi-prediction samples of the CU based on the motion refinements.
- FIG. 11 shows a computing environment 1110 coupled with a user interface 1160.
- the computing environment 1110 can be part of a data processing server.
- the computing environment 1110 includes processor 1120, memory 1140, and I/O interface 1150.
- the processor 1120 typically controls overall operations of the computing environment 1110, such as the operations associated with the display, data acquisition, data communications, and image processing.
- the processor 1120 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods.
- the processor 1120 may include one or more modules that facilitate the interaction between the processor 1120 and other components.
- the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
- the memory 1140 is configured to store various types of data to support the operation of the computing environment 1110.
- Memory 1140 may include predetermined software 1142. Examples of such data include instructions for any applications or methods operated on the computing environment 1110, video datasets, image data, etc.
- the memory 1140 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
- the I/O interface 1150 provides an interface between the processor 1120 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like.
- the buttons may include but are not limited to, a home button, a start scan button, and a stop scan button.
- the I/O interface 1150 can be coupled with an encoder and decoder.
- a non-transitory computer-readable storage medium comprising a plurality of programs, such as comprised in the memory 1140, executable by the processor 1120 in the computing environment 1110, for performing the above-described methods.
- the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device or the like.
- the non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
- the computing environment 1110 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above methods.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180022872.6A CN115315955A (en) | 2020-03-20 | 2021-03-17 | Simplified method and apparatus for bi-directional optical flow and decoder-side motion vector refinement |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062992893P | 2020-03-20 | 2020-03-20 | |
US62/992,893 | 2020-03-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021188707A1 true WO2021188707A1 (en) | 2021-09-23 |
Family
ID=77771348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/022811 WO2021188707A1 (en) | 2020-03-20 | 2021-03-17 | Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115315955A (en) |
WO (1) | WO2021188707A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019195643A1 (en) * | 2018-04-06 | 2019-10-10 | Vid Scale, Inc. | A bi-directional optical flow method with simplified gradient derivation |
- 2021-03-17 WO PCT/US2021/022811 patent/WO2021188707A1/en active Application Filing
- 2021-03-17 CN CN202180022872.6A patent/CN115315955A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019195643A1 (en) * | 2018-04-06 | 2019-10-10 | Vid Scale, Inc. | A bi-directional optical flow method with simplified gradient derivation |
Non-Patent Citations (4)
Title |
---|
H.-C. CHUANG (QUALCOMM), J. CHEN (QUALCOMM), K. ZHANG (QUALCOMM), M. KARCZEWICZ (QUALCOMM): "EE2-related: A simplified gradient filter for Bi-directional optical flow (BIO)", 7. JVET MEETING; 20170713 - 20170721; TORINO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 6 July 2017 (2017-07-06), XP030150879 * |
J. CHEN, Y. YE, S. KIM: "Algorithm description for Versatile Video Coding and Test Model 8 (VTM 8)", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 3 March 2020 (2020-03-03), XP030288000 * |
X. XIU (KWAI), Y.-W. CHEN (KWAI), T.-C. MA (KWAI), H.-J. JHU (KWAI), X. WANG (KWAI INC.): "Non-CE4: On SAD threshold for BDOF early termination", 16. JVET MEETING; 20191001 - 20191011; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 25 September 2019 (2019-09-25), XP030217561 * |
Y.-C. YANG (FGINNOV), P.-H. LIN (FOXCONN): "CE4-related: On Conditions for enabling PROF", 15. JVET MEETING; 20190703 - 20190712; GOTHENBURG; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-O0313, 26 June 2019 (2019-06-26), pages 1 - 4, XP030219229 * |
Also Published As
Publication number | Publication date |
---|---|
CN115315955A (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11166037B2 (en) | Mutual excluding settings for multiple tools | |
JP2020526109A (en) | Motion vector refinement for multi-reference prediction | |
EP3566447B1 (en) | Method and apparatus for encoding and decoding motion information | |
WO2020073937A1 (en) | Intra prediction for multi-hypothesis | |
JP2022123067A (en) | Video decoding method, program and decoder readable storage medium | |
EP3912352B1 (en) | Early termination for optical flow refinement | |
US20220094913A1 (en) | Methods and apparatus for signaling symmetrical motion vector difference mode | |
US20230115074A1 (en) | Geometric partition mode with motion vector refinement | |
CN113196783A (en) | De-blocking filter adaptive encoder, decoder and corresponding methods | |
US9438925B2 (en) | Video encoder with block merging and methods for use therewith | |
WO2021188598A1 (en) | Methods and devices for affine motion-compensated prediction refinement | |
JP2023063506A (en) | Method for deriving constructed affine merge candidate | |
EP4320863A1 (en) | Geometric partition mode with explicit motion signaling | |
WO2022081878A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2022032028A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2021188707A1 (en) | Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement | |
WO2020264221A1 (en) | Apparatuses and methods for bit-width control of bi-directional optical flow | |
CN114009017A (en) | Motion compensation using combined inter and intra prediction | |
WO2022026480A1 (en) | Weighted ac prediction for video coding | |
US20220272375A1 (en) | Overlapped block motion compensation for inter prediction | |
WO2022026888A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2021248135A1 (en) | Methods and apparatuses for video coding using satd based cost calculation | |
EP4352960A1 (en) | Geometric partition mode with motion vector refinement | |
CN113727107A (en) | Video decoding method and apparatus, and video encoding method and apparatus | |
Kim et al. | Multilevel Residual Motion Compensation for High Efficiency Video Coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21772582; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21772582; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/04/2023) |