CN115315955A - Simplified method and apparatus for bi-directional optical flow and decoder-side motion vector refinement - Google Patents

Simplified method and apparatus for bi-directional optical flow and decoder-side motion vector refinement

Info

Publication number
CN115315955A
Authority
CN
China
Prior art keywords
decoder
samples
bio
gradient
block
Prior art date
Legal status
Pending
Application number
CN202180022872.6A
Other languages
Chinese (zh)
Inventor
修晓宇
陈伟
陈漪纹
马宗全
朱弘正
王祥林
于冰
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Publication of CN115315955A publication Critical patent/CN115315955A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/513 Processing of motion vectors
    • H04N 19/517 Processing of motion vectors by encoding
    • H04N 19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures


Abstract

Methods, apparatuses, and non-transitory computer-readable storage media for decoding a video signal are provided. A decoder obtains a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU). The decoder may also obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0). The decoder may also obtain backward reference samples L^(1)(x', y') of the CU from a reference block in the backward reference picture L^(1). The decoder may also skip bi-directional optical flow (BIO) processing based on a distortion measure between the integer reference samples. The decoder may also obtain the prediction samples of the CU when the BIO processing is skipped.

Description

Simplified method and apparatus for bi-directional optical flow and decoder-side motion vector refinement
Cross Reference to Related Applications
This application is based upon and claims priority to Provisional Application No. 62/992,893, filed on 3/2/2020, which is hereby incorporated by reference in its entirety for all purposes.
Technical Field
The present disclosure relates to video coding and compression. More particularly, the present disclosure relates to simplified methods and apparatus for bi-directional optical flow (BIO) tools (also abbreviated BDOF) and decoder-side motion vector refinement (DMVR).
Background
Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, some well-known video coding standards today include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which were jointly developed by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video Coding (AVS), a standard for digital audio and digital video compression, is another video compression standard family developed by the Audio and Video Coding Standard Workgroup of China. Most existing video coding standards are built upon the well-known hybrid video coding framework, i.e., they use block-based prediction methods (e.g., inter prediction, intra prediction) to reduce the redundancy present in video images or sequences and use transform coding to compact the energy of the prediction errors. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
Examples of the present disclosure provide simplified methods and apparatus for bi-directional optical flow (BIO) and decoder-side motion vector refinement (DMVR).
According to a first aspect of the present disclosure, a method for decoding a video signal is provided. The method may include a decoder obtaining a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU). In display order, the forward reference picture L^(0) may precede the current picture and the backward reference picture L^(1) may follow the current picture. The decoder may obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0), where x and y may represent the integer coordinates of one sample of the forward reference picture L^(0). The decoder may obtain backward reference samples L^(1)(x', y') of the CU from a reference block in the backward reference picture L^(1), where x' and y' may represent the integer coordinates of one sample of the backward reference picture L^(1). The decoder may skip bi-directional optical flow (BIO) processing based on a distortion measure between the integer reference samples. The distortion measure between the integer reference samples may indicate the similarity between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x', y'). When the BIO processing is skipped, the decoder may obtain the prediction samples of the CU.
According to a second aspect of the present disclosure, a method for decoding a video signal is provided. The method may include a decoder obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU). In display order, the first reference picture I^(0) may precede the current picture and the second reference picture I^(1) may follow the current picture. The decoder may obtain first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0), where i and j may represent the coordinates of one sample of the current picture. The decoder may obtain second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1). The decoder may obtain motion refinements for samples in the CU based on bi-directional optical flow (BIO) processing, where the BIO processing is applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values. The horizontal and vertical gradient values may be calculated using a gradient filter having fewer coefficients. The decoder may obtain bi-directional prediction samples of the CU based on the motion refinements.
According to a third aspect of the present disclosure, a computing device is provided. The computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors. The one or more processors may be configured to obtain a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU). In display order, the forward reference picture L^(0) may precede the current picture and the backward reference picture L^(1) may follow the current picture. The one or more processors may be configured to obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0), where x and y may represent the integer coordinates of one sample of the forward reference picture L^(0). The one or more processors may be configured to obtain backward reference samples L^(1)(x', y') of the CU from a reference block in the backward reference picture L^(1), where x' and y' may represent the integer coordinates of one sample of the backward reference picture L^(1). The one or more processors may be configured to skip bi-directional optical flow (BIO) processing based on a distortion measure between the integer reference samples. The distortion measure between the integer reference samples may indicate the similarity between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x', y'). When the BIO processing is skipped, the one or more processors may be configured to obtain the prediction samples of the CU.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium having instructions stored thereon is provided. When executed by one or more processors of a device, the instructions may cause the device to obtain, at a decoder, a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU). In display order, the first reference picture I^(0) may precede the current picture and the second reference picture I^(1) may follow the current picture. The instructions may also cause the device to obtain, at the decoder, first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0), where i and j may represent the coordinates of one sample of the current picture. The instructions may also cause the device to obtain, at the decoder, second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1). The instructions may also cause the device to obtain, at the decoder, motion refinements for samples in the CU based on bi-directional optical flow (BIO) processing, where the BIO processing is based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values. The horizontal and vertical gradient values may be calculated using a gradient filter having fewer coefficients. The instructions may also cause the device to obtain, at the decoder, bi-directional prediction samples of the CU based on the motion refinements.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a block diagram of an encoder according to an example of the present disclosure.
Fig. 2 is a block diagram of a decoder according to an example of the present disclosure.
Fig. 3A is a diagram illustrating block partitioning in a tree partitioning structure of AVS3 according to an example of the present disclosure.
Fig. 3B is a diagram illustrating block partitioning in a tree partitioning structure of AVS3 according to an example of the present disclosure.
Fig. 3C is a diagram illustrating block partitioning in a tree partitioning structure of AVS3 according to an example of the present disclosure.
Fig. 3D is a diagram illustrating block partitioning in a tree partitioning structure of AVS3 according to an example of the present disclosure.
Fig. 3E is a diagram illustrating block partitioning in a tree partitioning structure of AVS3 according to an example of the present disclosure.
Fig. 4 is an illustration of a bi-directional optical flow (BDOF or BIO) model according to an example of the present disclosure.
Fig. 5 is an illustration of a DMVR model according to an example of the present disclosure.
Fig. 6 is an illustration of integer search candidates for DMVR according to an example of the disclosure.
Fig. 7 is a flow diagram of a motion compensation process utilizing DMVR and BIO according to an example of the present disclosure.
Fig. 8 is a flow diagram of a proposed multi-order early termination scheme for BIO and DMVR in accordance with an example of the present disclosure.
Fig. 9 is a method for decoding a video signal according to an example of the present disclosure.
Fig. 10 is a method for decoding a video signal according to an example of the present disclosure.
Fig. 11 is a diagram illustrating a computing environment connected with a user interface according to an example of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which like numerals in different drawings represent the same or similar elements, unless otherwise specified. The embodiments set forth in the following description of the examples do not represent all embodiments consistent with the present disclosure. Rather, as recited in the appended claims, they are merely examples of apparatus and methods consistent with aspects related to the present disclosure.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein is intended to mean and include any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when," "upon," or "in response to a judgment," depending on the context.
The first generation of the AVS standard includes the Chinese national standards "Information Technology, Advanced Audio and Video Coding, Part 2: Video" (known as AVS1) and "Information Technology, Advanced Audio and Video Coding, Part 16: Broadcast Television Video" (known as AVS+). It can offer a bit-rate saving of about 50% compared to the MPEG-2 standard at the same perceptual quality. The video part of the AVS1 standard was issued as a Chinese national standard in February 2006. The second generation of the AVS standard includes the series of Chinese national standards "Information Technology, Efficient Multimedia Coding" (known as AVS2), which mainly targets the transmission of extra-HD TV programs. The coding efficiency of AVS2 is double that of AVS+. In May 2016, AVS2 was issued as a Chinese national standard. Meanwhile, the video part of the AVS2 standard was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as one international standard for applications. The AVS3 standard is a new generation of video coding standard for UHD video applications, aiming to surpass the coding efficiency of the latest international standard, HEVC. In March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, which provides a bit-rate saving of approximately 30% over the HEVC standard. Currently, there is one reference software, called the High Performance Model (HPM), maintained by the AVS group to demonstrate a reference implementation of the AVS3 standard.
As with HEVC, the AVS3 standard builds on a block-based hybrid video codec framework.
Fig. 1 shows a general diagram of a block-based video encoder for VVC. In particular, fig. 1 shows a typical encoder 100. The encoder 100 has a video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block prediction value 140, adder 128, transform 130, quantization 132, prediction related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, loop filter 122, entropy coding 138, and bitstream 144.
In encoder 100, a video frame is partitioned into multiple video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method.
A prediction residual, representing the difference between the current video block (part of video input 110) and its prediction value (part of block prediction value 140), is sent from adder 128 to transform 130. The transform coefficients are then sent from transform 130 to quantization 132 for entropy reduction. The quantized coefficients are then fed to entropy encoding 138 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 142 from the intra/inter mode decision 116, such as video block partitioning information, motion Vectors (MVs), reference picture indices, and intra prediction modes, is also fed through entropy coding 138 and saved into a compressed bitstream 144. The compressed bitstream 144 comprises a video bitstream.
In the encoder 100, decoder-related circuitry is also required in order to reconstruct the pixels for prediction purposes. First, the prediction residual is reconstructed by inverse quantization 134 and inverse transformation 136. The reconstructed prediction residual is combined with the block prediction value 140 to generate an unfiltered reconstructed pixel for the current video block.
Spatial prediction (or "intra prediction") uses pixels from samples (referred to as reference samples) of already coded neighboring blocks in the same video frame as the current video block to predict the current video block.
Temporal prediction (also referred to as "inter prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in video signals. The temporal prediction signal for a given Coding Unit (CU) or coding block is typically signaled according to one or more MVs indicating the amount and direction of motion between the current CU and its temporal reference. Furthermore, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture memory the temporal prediction signal comes.
Motion estimation 114 receives video input 110 and signals from picture buffer 120 and outputs motion estimation signals to motion compensation 112. Motion compensation 112 receives video input 110, a signal from picture buffer 120, and a motion estimation signal from motion estimation 114, and outputs the motion compensated signal to intra/inter mode decision 116.
After spatial and/or temporal prediction is performed, an intra/inter mode decision 116 in the encoder 100 selects the best prediction mode, e.g., based on a rate-distortion optimization method. The block prediction value 140 is then subtracted from the current video block and the resulting prediction residual is decorrelated using transform 130 and quantization 132. The resulting quantized residual coefficients are dequantized by dequantization 134 and inverse transformed by inverse transform 136 to form a reconstructed residual, which is then added back to the prediction block to form the reconstructed signal for the CU. Further loop filtering 122, such as a deblocking filter, sample Adaptive Offset (SAO), and/or Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before the reconstructed CU is placed in the reference picture memory of picture buffer 120 and used to encode future video blocks. To form the output video bitstream 144, the codec mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy encoding unit 138 to be further compressed and packed to form the bitstream.
Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block (each block is called a coding unit (CU)). Unlike HEVC, which partitions blocks only based on quadtrees, in AVS3 one coding tree unit (CTU) is split into CUs based on a quadtree/binary tree/extended quadtree structure to adapt to varying local characteristics. In addition, the concept of multiple partition unit types in HEVC is removed, i.e., there is no separation of CU, prediction unit (PU), and transform unit (TU) in AVS3; instead, each CU is always used as the basic unit for both prediction and transform without further partitioning. In the tree partitioning structure of AVS3, one CTU is first partitioned based on a quadtree structure. Then, each quadtree leaf node can be further partitioned based on the binary tree and extended quadtree structures. As shown in figs. 3A, 3B, 3C, 3D, and 3E, there are five partition types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quadtree partitioning, and vertical extended quadtree partitioning.
Fig. 3A shows a diagram illustrating block quaternary partitioning in the tree partitioning structure of AVS3 according to the present disclosure.
Fig. 3B shows a diagram illustrating block vertical binary partitioning in the tree partitioning structure of AVS3 according to the present disclosure.
Fig. 3C shows a diagram illustrating block horizontal binary partitioning in the tree partitioning structure of AVS3 according to the present disclosure.
Fig. 3D shows a diagram illustrating block vertical extended quadtree partitioning in the tree partitioning structure of AVS3 according to the present disclosure.
Fig. 3E shows a diagram illustrating block horizontal extended quadtree partitioning in the tree partitioning structure of AVS3 according to the present disclosure.
In fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples (referred to as reference samples) of already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in video signals. The temporal prediction signal for a given CU is typically signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference. Furthermore, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture memory the temporal prediction signal comes. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and then quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering, such as a deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture memory and used as a reference to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed.
Fig. 2 shows a general block diagram of a video decoder for VVC. In particular, fig. 2 shows a block diagram of a typical decoder 200. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, loop filter 228, motion compensation 224, picture buffer 226, prediction related information 234, and video output 232.
The decoder 200 is similar to the reconstruction related parts present in the encoder 100 of fig. 1. In the decoder 200, an input video bitstream 210 is first decoded by entropy decoding 212 to obtain quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed by inverse quantization 214 and inverse transformation 216 to obtain a reconstructed prediction residual. The block prediction value mechanism implemented in the intra/inter mode selector 220 is configured to perform intra prediction 222 or motion compensation 224 based on the decoded prediction information. A set of unfiltered reconstructed pixels is obtained by adding the reconstructed prediction residual from the inverse transform 216 and the prediction output generated according to the block prediction value mechanism using a summer 218.
The reconstructed block may further undergo a loop filter 228 before being stored in a picture buffer 226 that is used as a reference picture memory. The reconstructed video in the picture buffer 226 may be sent to drive a display device and used to predict future video blocks. With the loop filter 228 turned on, a filtering operation is performed on these reconstructed pixels to obtain a final reconstructed video output 232.
Fig. 2 presents a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at an entropy decoding unit. The codec mode and prediction information are sent to a spatial prediction unit (if intra-coded) or a temporal prediction unit (if inter-coded) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct a residual block. The prediction block and the residual block are then added together. The reconstructed block may further undergo loop filtering before being stored in the reference picture memory. The reconstructed video in the reference picture store is then sent out for display and used to predict future video blocks.
The focus of the present disclosure is to reduce the complexity of the bi-directional optical flow (BIO) tool and the decoder-side motion vector refinement (DMVR) tool used in both the VVC standard and the AVS3 standard. In VVC, the BIO tool is also abbreviated as BDOF. For ease of description, in the following, the existing BIO and DMVR designs in the AVS3 standard are used as examples to explain the main design aspects of the two coding tools. After that, possible improvements to the existing BIO and DMVR designs are discussed. Finally, methods are proposed to reduce the complexity while maintaining most of the gains of the two coding tools. Note that although the existing BIO and DMVR designs in the AVS3 standard are used as the basis in the following description, it will be apparent to those skilled in the art of video coding that the proposed methods described in this disclosure may also be applied to other BIO and DMVR designs or other coding tools with the same or similar design spirit.
Bi-directional optical flow (BIO)
Conventional bi-prediction in video coding is a simple combination of two temporal prediction blocks obtained from reference pictures. However, due to the trade-off between the signaling cost and the accuracy of motion vectors, the motion vectors received at the decoder side may be less accurate. As a result, small residual motion may still be observed between the two prediction blocks, which can reduce the efficiency of motion compensated prediction. To address this, the BIO tool is adopted in both the VVC standard and the AVS3 standard to compensate for such motion for every sample inside one block. Specifically, BIO is a sample-wise motion refinement that is performed on top of the block-based motion compensated prediction when bi-prediction is used. In the existing BIO design, the refined motion vector of each sample in one block is derived based on the classical optical flow model. Let I^(k)(x, y) be the sample value at the coordinate (x, y) of the prediction block derived from the reference picture list k (k = 0, 1), and let ∂I^(k)(x, y)/∂x and ∂I^(k)(x, y)/∂y be the horizontal and vertical gradients of the sample. Assuming the optical flow model is valid, the motion refinement (v_x, v_y) at (x, y) satisfies
∂I^(k)(x, y)/∂t + v_x · ∂I^(k)(x, y)/∂x + v_y · ∂I^(k)(x, y)/∂y = 0    (1)
Combining the optical flow equation (1) with the interpolation of the prediction blocks along the motion trajectory (as shown in fig. 4 described below), the following BIO prediction is obtained:
pred_BIO(x, y) = 1/2 · [I^(0)(x, y) + I^(1)(x, y) + v_x/2 · (∂I^(1)(x, y)/∂x − ∂I^(0)(x, y)/∂x) + v_y/2 · (∂I^(1)(x, y)/∂y − ∂I^(0)(x, y)/∂y)]    (2)
fig. 4 shows a diagram of a BDOF model according to the present disclosure.
In fig. 4, (MV_x0, MV_y0) and (MV_x1, MV_y1) indicate the block-level motion vectors used to generate the two prediction blocks I^(0) and I^(1). The motion refinement (v_x, v_y) at the sample position (x, y) is calculated by minimizing the difference Δ between the values of the samples after motion refinement compensation (i.e., A and B in fig. 4):
Δ(x, y) = I^(0)(x, y) − I^(1)(x, y) + v_x · (∂I^(0)(x, y)/∂x + ∂I^(1)(x, y)/∂x) + v_y · (∂I^(0)(x, y)/∂y + ∂I^(1)(x, y)/∂y)    (3)
Furthermore, to ensure the regularity of the derived motion refinement, it is assumed that the motion refinement is consistent within one local surrounding region centered at (x, y); thus, in the current BIO design in AVS3, the value of (v_x, v_y) is obtained by minimizing Δ inside the 4×4 window Ω around the current sample at (x, y), as follows:
(v_x, v_y) = argmin Σ_{(i,j)∈Ω} Δ²(i, j)    (4)
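As an illustration of equations (2) through (4), the following Python sketch derives (v_x, v_y) by least squares over one 4×4 window and then forms the BIO prediction. The floating-point least-squares solver and the helper names are assumptions made for clarity; the actual AVS3 design works in fixed-point arithmetic and derives (v_x, v_y) from accumulated gradient auto-/cross-correlations rather than a generic solver.

```python
import numpy as np

def bio_refine_and_predict(i0, i1, gx0, gy0, gx1, gy1):
    """Floating-point sketch of equations (2)-(4) for one 4x4 window.

    i0, i1           : 4x4 arrays of L0/L1 prediction samples
    gx0/gy0, gx1/gy1 : horizontal/vertical gradients of the two predictions
    """
    # Equation (3) is linear in (vx, vy): delta = theta + vx*psi_x + vy*psi_y
    theta = (i0 - i1).ravel()
    psi_x = (gx0 + gx1).ravel()
    psi_y = (gy0 + gy1).ravel()
    # Equation (4): minimize the sum of delta^2 over the window Omega
    a = np.stack([psi_x, psi_y], axis=1)
    (vx, vy), *_ = np.linalg.lstsq(a, -theta, rcond=None)
    # Equation (2): sample-wise BIO prediction
    pred = 0.5 * (i0 + i1
                  + 0.5 * vx * (gx1 - gx0)
                  + 0.5 * vy * (gy1 - gy0))
    return vx, vy, pred
```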
as shown in (2) and (4), in addition to block level MC, it is also necessary to compensate the block (i.e., I) for each motion in the BIO (0) And I (1) ) In order to obtain local motion refinement and generate a final prediction at the location of the sample.In AVS3, the gradient is computed by a 2D separable Finite Impulse Response (FIR) filtering process that defines a set of 8-tap filters and applies different filters to account for block-level motion vectors (e.g., (MV) in fig. 4 x0 ,MV y0 ) And (MV) x1 ,MV y1 ) ) yields horizontal and vertical gradients. Table 1 shows the coefficients of the gradient filter used by the BIO.
Table 1. Gradient filters used in BIO
Fractional position    Gradient filter
0      {-4, 11, -39, -1, 41, -14, 8, -2}
1/4    {-2, 6, -19, -31, 53, -12, 7, -2}
1/2    {0, -1, 0, -50, 50, 0, 1, 0}
3/4    {2, -7, 12, -53, 31, 19, -6, 2}
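To illustrate how one entry of Table 1 is used, the sketch below applies the 8-tap gradient filter in the horizontal direction. It is a simplified, unclipped version that omits border extension, the companion interpolation pass in the other direction, and the fixed-point rounding and shifting of the normative process.

```python
# 8-tap BIO gradient filters from Table 1, indexed by fractional position
GRAD_FILTER = {
    0:    [-4, 11, -39, -1, 41, -14, 8, -2],
    1/4:  [-2, 6, -19, -31, 53, -12, 7, -2],
    1/2:  [0, -1, 0, -50, 50, 0, 1, 0],
    3/4:  [2, -7, 12, -53, 31, 19, -6, 2],
}

def horizontal_gradient(ref, x, y, frac_x=0):
    """Horizontal gradient at (x, y) in the 2D sample array `ref`,
    using the Table 1 filter selected by the fractional position frac_x.
    Assumes 3 <= x <= width - 5 so that all 8 taps stay inside `ref`."""
    taps = GRAD_FILTER[frac_x]
    return sum(taps[i] * ref[y][x - 3 + i] for i in range(8))
```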
Finally, the BIO is only applied to bi-prediction blocks, which are predicted from two reference blocks from temporally neighboring pictures. Furthermore, BIO is enabled without sending additional information from the encoder to the decoder. In particular, the BIO is applied to all bi-prediction blocks having both forward and backward prediction signals.
Decoder-side motion vector refinement (DMVR)
Similar to the VVC standard, to increase the accuracy of the MVs of the regular merge mode, bilateral-matching-based decoder-side motion vector refinement (DMVR) is applied in AVS3. In the bi-prediction operation, a refined MV is searched around the initial MVs in the reference picture list L0 and the reference picture list L1. The method calculates the distortion between the two candidate blocks in the reference picture list L0 and the reference picture list L1. As illustrated in fig. 5 (described below), the SAD between the candidate blocks (shown in red in the figure) identified by each MV candidate around the initial MV is calculated. The MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-prediction signal.
Fig. 5 shows decoder-side motion vector refinement.
Fig. 5 includes refPic 520 in list L0, current picture 540, and refPic 560 in list L1. RefPic 520 in list L0 is the reference picture in the first list and includes block 522, reference block 524, MVdiff 526, MV0 528, and MV0' 530. MVdiff 526 is the motion vector difference between block 522 and reference block 524. MV0 528 is the initial motion vector between current CU 542 and block 522. MV0' 530 is the refined motion vector between current CU 542 and reference block 524. Current picture 540 is the current picture of the video and includes current CU 542, MV1' 544, and MV1 546. MV1' 544 is the refined motion vector between current CU 542 and reference block 562. MV1 546 is the initial motion vector between current CU 542 and block 564. RefPic 560 in list L1 is the reference picture in the second list and includes reference block 562, block 564, and -MVdiff 566. -MVdiff 566 is the motion vector difference between reference block 562 and block 564.
In DMVR, the search points surround the initial MV, and the MV offset obeys the MV difference mirroring rule. In other words, any MV refinement checked by DMVR should satisfy the following two equations:
MV0′=MV0+MV_offset (5)
MV1′=MV1-MV_offset (6)
where MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures. The refinement search range is two integer luma samples from the initial MV. The search includes an integer sample search stage and a fractional sample refinement stage.
In the integer sample search stage, the sum of absolute differences (SAD) of 21 integer sample positions (including the integer sample position corresponding to the initial MV), as shown in fig. 6 (described below), is checked. The SAD of the initial MV pair is calculated first. The integer offset that minimizes the SAD value is then selected as the integer sample offset of the integer search stage.
Fig. 6 shows integer search candidates for DMVR. The black triangles are integer sample positions associated with the initial MV, and the blank or white triangles are adjacent integer sample positions.
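A minimal sketch of the bilateral matching rule of equations (5) and (6) applied over a set of integer offsets: every candidate offset is added to MV0 and subtracted from MV1, and the candidate that minimizes the SAD between the two reference blocks is kept. The block-fetching helper `get_block` and the candidate list are placeholders, not the normative 21-point pattern.

```python
def dmvr_integer_search(mv0, mv1, ref0, ref1, pos, size, offsets, get_block):
    """Select the integer offset minimizing the bilateral SAD.

    mv0, mv1 : initial L0/L1 motion vectors as (x, y) tuples
    offsets  : candidate (dx, dy) offsets, including (0, 0)
    get_block(ref, pos, mv, size) -> flat list of block samples
    """
    best_sad, best_offset = float("inf"), (0, 0)
    for dx, dy in offsets:
        cand0 = (mv0[0] + dx, mv0[1] + dy)   # MV0' = MV0 + MV_offset, eq. (5)
        cand1 = (mv1[0] - dx, mv1[1] - dy)   # MV1' = MV1 - MV_offset, eq. (6)
        b0 = get_block(ref0, pos, cand0, size)
        b1 = get_block(ref1, pos, cand1, size)
        sad = sum(abs(a - b) for a, b in zip(b0, b1))
        if sad < best_sad:
            best_sad, best_offset = sad, (dx, dy)
    return best_offset, best_sad
```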
The integer sample search is followed by fractional sample refinement. To save computational complexity, the fractional sample refinement is derived by using a parametric error surface approach rather than an additional search with SAD comparisons. In the parametric-error-surface-based sub-pel offset estimation, the cost of the center position and the costs of the four positions neighboring the center are used to fit a 2-D parabolic error surface equation of the following form:
E(x, y) = A(x − x_min)² + B(y − y_min)² + C    (7)
where (x_min, y_min) corresponds to the fractional position with the least cost and C corresponds to the minimum cost value. By solving the above equation using the cost values of the five search points, (x_min, y_min) is computed as:
x_min = (E(−1, 0) − E(1, 0)) / (2(E(−1, 0) + E(1, 0) − 2E(0, 0)))    (8)
y_min = (E(0, −1) − E(0, 1)) / (2(E(0, −1) + E(0, 1) − 2E(0, 0)))    (9)
Since all cost values are positive and the smallest value is E(0, 0), x_min and y_min are automatically constrained to be between −8 and 8. The computed fractional offset (x_min, y_min) is added to the integer distance refinement MV to obtain the sub-pel accuracy refinement delta MV.
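The fractional refinement of equations (7) through (9) reduces to two closed-form divisions over the five integer-search costs, as in the sketch below (plain floating point; the fixed-point scaling used by the codec is omitted):

```python
def error_surface_offset(e_center, e_left, e_right, e_up, e_down):
    """Sub-pel offset from equations (8) and (9).

    e_* are the SAD costs at (0, 0), (-1, 0), (1, 0), (0, -1), (0, 1).
    Because e_center is the smallest of the five costs, the results lie
    in (-0.5, 0.5) sample units before any fixed-point scaling."""
    x_min = (e_left - e_right) / (2 * (e_left + e_right - 2 * e_center))
    y_min = (e_up - e_down) / (2 * (e_up + e_down - 2 * e_center))
    return x_min, y_min
```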
Improvements to BIO and DMVR
Although BIO and DMVR can effectively enhance the efficiency of motion compensation, they also introduce a significant complexity increase, in both hardware and software, to the encoder and decoder design. Specifically, the following complexity issues of the existing BIO and DMVR designs are addressed in this disclosure:
first, as discussed above, in the conventional motion compensation stage, DMVR and BIO are always enabled for bi-predicted blocks with both forward and backward prediction signals. Such a design may not be practical for a particular video application (e.g., a video stream on a mobile device) that cannot afford too heavy a calculation due to its limited power. For example, the BIO needs to derive a gradient value at each sample position, which requires multiple multiplications and additions due to the 2D FIR filtering operation, and the DMVR needs to compute multiple SAD values during the bilateral matching process. All of these operations require intensive computations. This complexity increase may become even more pronounced when applying the BIO in conjunction with the DMVR to a bi-directionally predicted CU. As shown in fig. 7 (described below), in this case, the CU needs to be further divided into a plurality of sub-blocks for DMVR, and each sub-block can present a unique motion vector. Therefore, when applying the BIO further, all BIO-related operations need to be performed separately for each sub-block. This may result in considerable computational complexity and memory bandwidth, and may potentially complicate pipeline design and/or parallel processing in terms of hardware and software.
Fig. 7 shows a flow chart of a motion compensation process using DMVR and BIO.
At step 701, the process begins.
In step 702, the CU is divided into sub-blocks with a size equal to min(16, CUWidth) × min(16, CUHeight).
In step 703, a variable i is set to 0.
At step 704, the DMVR is applied to the ith sub-block.
In step 705, motion compensation is applied to the ith sub-block with refined motion.
In step 706, BIO is applied to the ith sub-block.
In step 707, it is determined whether the current sub-block is the last sub-block. If so, the process continues to step 709. If not, the process continues to step 708.
At step 708, the variable i is incremented by 1, and the process continues to step 704.
At step 709, the process ends.
Second, as previously described, the current BIO uses 2D separable FIR filters to calculate the horizontal and vertical gradient values. Specifically, one low-pass 8-tap interpolation filter (the one used for interpolation in regular motion compensation) and one high-pass 8-tap gradient filter (as shown in Table 1) are applied, and the filter selection is based on the fractional position of the corresponding motion vector. Assuming the L0 and L1 MVs point to reference samples at fractional sample positions in both the horizontal and vertical directions, the number of multiplications and additions used to calculate the horizontal and vertical gradients in L0 and L1 is (W × (H + 7) + W × H) × 2 × 2. However, due to the high-pass nature of gradient calculation, using a gradient filter with more coefficients is not always beneficial for accurately extracting useful gradient information from neighboring samples. Therefore, it is highly desirable to further reduce the filter length of the gradient filter used for BIO, which not only potentially improves the accuracy of the gradient derivation but also reduces the computational complexity of BIO.
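For example, for a 16×16 block (W = H = 16), this amounts to (16 × (16 + 7) + 16 × 16) × 2 × 2 = 2496 multiply-and-add operations for the gradient derivation alone, on top of the regular motion compensation interpolation.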
Proposed method
To reduce the complexity of motion compensation, methods are first proposed to conditionally bypass the DMVR and/or BIO processing in regular motion compensation. Specifically, in the proposed methods, the main aspects of BIO and DMVR remain the same as in their existing designs. However, when certain conditions, as set forth below, are satisfied, BIO and DMVR are entirely skipped at the CU level or sub-block level. Second, to reduce the computational complexity of BIO, a gradient filter with fewer coefficients is proposed to replace the existing 8-tap gradient filter of the existing BIO design.
Skipping BIO and DMVR based on similarity of reference samples
As described above, both BIO and DMVR utilize the L0 and L1 prediction samples to derive local motion refinements at different levels of granularity (e.g., BIO derives one motion refinement for each sample, while the motion refinement of DMVR is calculated for each sub-block). When the difference between the L0 prediction signal and the L1 prediction signal is small, the two prediction blocks are highly correlated, such that the DMVR and BIO processing can be safely skipped without incurring a significant coding loss. However, because the initial motion vectors may point to reference samples at fractional sample positions, the generation of the L0 and L1 prediction samples may invoke the interpolation process, which requires non-negligible complexity and introduces some latency before the decision can be made. On the other hand, instead of directly comparing the prediction samples, another way to measure the correlation between the two prediction blocks is to calculate the difference between their integer reference samples in the reference pictures from which the two prediction blocks are generated, i.e.,
Diff = (1/N) · Σ_{(x,y)∈B, (x',y')∈B'} D(L^(0)(x, y), L^(1)(x', y'))    (10)
where L^(0)(x, y) and L^(1)(x', y') are the sample values at the integer sample coordinates (x, y) and (x', y') of the forward and backward reference pictures; B and B' are the sets of integer sample coordinates used to generate the L0 and L1 prediction samples of the current block; N is the number of samples in the current block; and D is a distortion measure to which different metrics may be applied, such as the sum of squared error (SSE), the sum of absolute difference (SAD), and the sum of absolute transformed difference (SATD). Given equation (10), when the difference measure is not greater than one predefined threshold (i.e., Diff ≤ D_thres), BIO and/or DMVR can be skipped in the motion compensation stage; otherwise, BIO and/or DMVR still need to be applied.
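The early termination test of equation (10) can be sketched as follows, using SAD as the distortion measure D; the pairing of corresponding integer coordinates into two flat lists is an assumption made for brevity:

```python
def may_skip_bio_dmvr(ref0_samples, ref1_samples, threshold):
    """Return True when Diff <= threshold, i.e., BIO/DMVR may be bypassed.

    ref0_samples, ref1_samples : corresponding integer reference samples
    (the sets B and B' of equation (10)) flattened into equal-length lists.
    """
    n = len(ref0_samples)
    diff = sum(abs(a - b) for a, b in zip(ref0_samples, ref1_samples)) / n
    return diff <= threshold
```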
The proposed early termination methods for DMVR and BIO may be performed at either the CU level or the sub-block level, which can provide different trade-offs between coding performance and complexity reduction. On one hand, due to its finer-granularity control of BIO and DMVR, sub-block-level early termination can better maintain the coding gains of BIO and DMVR; however, it may not be optimal in terms of complexity reduction, given that the distortion measurement and early termination decision need to be performed separately for each sub-block. On the other hand, although CU-level early termination can potentially achieve a more significant complexity reduction, it may fail to achieve an acceptable performance/complexity trade-off for coding blocks with non-uniform characteristics. Furthermore, as described above, when DMVR is applied, the motion field of the current CU is derived at the sub-block level. Therefore, after DMVR is applied, the prediction samples of the various sub-blocks inside the current CU are likely to become more diverse. In such a case, it is preferable for the decoder to determine whether the DMVR and BIO processing are to be bypassed on a sub-block basis. Otherwise (i.e., when DMVR is not applied), it is acceptable to rely on the CU-level distortion measure to determine whether the BIO processing of the whole CU is to be bypassed. Based on these considerations, a multi-stage early termination method is proposed to adaptively skip the BIO and DMVR processing at the CU level or sub-block level, depending on whether DMVR is allowed for the current CU. Fig. 8 (described below) illustrates the modified motion compensation process of BIO and DMVR with the proposed multi-stage early termination method applied.
Fig. 8 shows a flow diagram of a proposed multi-stage early termination scheme for BIO and DMVR.
At step 801, the process begins.
At step 802, it is determined whether DMVR is applied to the CU. If not, the process continues to step 811. If so, the process continues to step 803.
In step 803, the CU is divided into sub-blocks with a size equal to min(16, CUWidth) × min(16, CUHeight).
At step 804, a variable i is set to 0.
In step 805, it is determined whether the difference between the L0 integer reference samples and the L1 integer reference samples of the sub-block, as calculated in equation (10), is less than or equal to a threshold ThresDMVR. If so, the process continues to step 807. If not, the process continues to step 806.
At step 806, the DMVR is applied to the ith sub-block.
In step 807, it is determined whether the difference between the L0 integer reference samples and the L1 integer reference samples of the sub-block, as calculated in equation (10), is less than a threshold ThresBIO. If so, the process continues to step 809. If not, the process continues to step 808.
In step 808, BIO is applied to the ith sub-block.
At step 809, it is determined whether the current sub-block is the last sub-block. If so, the process continues to step 813. If not, the process continues to step 810.
At step 810, the variable i is incremented by 1 and the process continues to step 805.
In step 811, it is determined whether the difference, as calculated in equation (10), is less than the threshold ThresBIO. If so, the process continues to step 813. If not, the process continues to step 812.
In step 812, the BIO is applied to the CU.
At step 813, the process ends.
As shown in fig. 8, the proposed method can be summarized as follows.
First, when DMVR is disabled for the current CU, the decision on whether to bypass the BIO processing is made at the CU level. Specifically, if the distortion measure of the reference samples of the CU (as calculated in equation (10)) is not greater than the predefined threshold ThresBIO, the BIO processing is completely disabled for the whole CU; otherwise, BIO is still applied to the CU.
Second, when DMVR is enabled (i.e., allowed) for the current CU, the decisions on whether to bypass BIO and DMVR are made at the sub-block level. Furthermore, two separate thresholds, ThresBIO and ThresDMVR, are used to bypass BIO and DMVR, respectively, for each sub-block.
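The decision logic of fig. 8 can be summarized in the following sketch, where `distortion` stands for the CU-level or sub-block-level measure of equation (10) and `apply_dmvr`/`apply_bio` stand for the unchanged refinement processes; the function names and structure are illustrative only:

```python
def motion_compensation_with_early_termination(
        cu, dmvr_enabled, thres_dmvr, thres_bio,
        distortion, apply_dmvr, apply_bio, split_subblocks):
    if not dmvr_enabled:
        # CU-level decision (steps 811-812): bypass BIO for the whole CU
        # when the two integer reference blocks are similar enough.
        if distortion(cu) > thres_bio:
            apply_bio(cu)
        return
    # Sub-block-level decisions (steps 803-810), sub-block size
    # min(16, CUWidth) x min(16, CUHeight).
    for sub_block in split_subblocks(cu, 16):
        d = distortion(sub_block)
        if d > thres_dmvr:
            apply_dmvr(sub_block)
        if d > thres_bio:
            apply_bio(sub_block)
```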
Fig. 9 shows a method for decoding a video signal. The method may for example be applied to a decoder.
At step 910, the decoder may obtain a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU). In display order, the forward reference picture L^(0) precedes the current picture and the backward reference picture L^(1) follows the current picture.
In step 912, the decoder may obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0). x and y denote the integer coordinates of one sample of the forward reference picture L^(0).
In step 914, the decoder may obtain backward reference samples L^(1)(x', y') of the CU from a reference block in the backward reference picture L^(1). x' and y' denote the integer coordinates of one sample of the backward reference picture L^(1).
In step 916, the decoder may skip bi-directional optical flow (BIO) processing based on a distortion measure between the integer reference samples. The distortion measure between the integer reference samples indicates the similarity between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x', y').
In step 918, when the BIO processing is skipped, the decoder may obtain the prediction samples of the CU.
Skipping BIO based on gradient information
As discussed above, BIO is designed to enhance the accuracy of motion compensated prediction by providing a sample-wise motion refinement, which is calculated based on the local gradients at each sample position in one motion compensated block. For blocks inside a region that contains less high-frequency detail (e.g., a flat region), the gradient values derived using the gradient filters tend to be small. Consequently, BIO cannot provide an efficient refinement of the prediction samples for those blocks. This can be seen from equation (2): when the local gradients (i.e., ∂I^(k)(x, y)/∂x and ∂I^(k)(x, y)/∂y) are close to zero, the final prediction signal obtained from BIO is approximately equal to the prediction signal generated by conventional bi-prediction (i.e., (I^(0)(x, y) + I^(1)(x, y))/2). In view of this, in one embodiment of the present disclosure, it is proposed to apply BIO only to the prediction samples of coding blocks that contain sufficient high-frequency information. The determination of whether the prediction signal of one video block contains sufficient high-frequency information may be made based on various criteria. In one example, the average magnitude of the gradients of the samples within the block may be used. If the average gradient magnitude is smaller than one threshold, the block is classified as a flat region and BIO should not be applied; otherwise, the block is considered to contain sufficient high-frequency detail and BIO is still applicable.
Similar to the early termination method proposed in the section "Skipping BIO and DMVR based on similarity of reference samples," the proposed gradient-based BIO early termination method can also be applied at the CU level or the sub-block level. When the method is applied at the CU level, the gradient values of all the prediction samples within the CU are used to determine whether to bypass BIO. Otherwise, when the method is applied at the sub-block level, the decision on whether to skip the BIO processing is made separately for each sub-block by comparing the average gradient value of the prediction samples within the corresponding sub-block with the threshold.
In another embodiment, it is proposed to check a mixed condition based on both the reference sample difference (per equation (10)) and the gradient information. Note that under such a condition, the reference sample difference and the gradient information may be checked jointly or individually. In the joint case, both the sample difference and the gradient information should be significant for BIO to be applied; otherwise, BIO is skipped. In the individual case, BIO is skipped whenever either the sample difference or the gradient information is small (e.g., based on a threshold comparison).
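A sketch of the joint test described above, combining the reference sample difference of equation (10) with an average gradient magnitude check; the per-sample averaging and the two thresholds are illustrative assumptions:

```python
def bio_applicable(grad_x, grad_y, grad_thres, diff, diff_thres):
    """Joint early termination test: apply BIO only when both the
    reference sample difference (equation (10)) and the average gradient
    magnitude of the block are significant.

    grad_x, grad_y : per-sample horizontal/vertical gradients (flat lists)
    diff           : precomputed Diff value from equation (10)
    """
    n = len(grad_x)
    avg_grad = sum(abs(gx) + abs(gy) for gx, gy in zip(grad_x, grad_y)) / n
    return avg_grad >= grad_thres and diff >= diff_thres
```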
Simplified gradient filter for BIO
In the current BIO design, 2D separable 8-tap FIR filters (i.e., an 8-tap interpolation filter and an 8-tap gradient filter) are used to calculate the horizontal and vertical gradient values. As discussed previously, using an 8-tap gradient filter may not always be effective in accurately extracting gradient information from the reference samples, while it causes a non-negligible increase in computational complexity. To address this, in this section, a gradient filter with fewer coefficients is proposed for calculating the gradient information used by BIO.
In particular, the inputs to the gradient derivation process may include the same reference samples as used for motion compensation and the fractional components (fracX, fracY) of the input motion (MV_x, MV_y) of the current block. Furthermore, depending on the direction of the derived gradient, the gradient filter h_G and the interpolation filter h_L are applied in different orders. Specifically, when deriving the horizontal gradient, the gradient filter h_G is first applied in the horizontal direction to derive the horizontal gradient values at the horizontal fractional sample position fracX; then the interpolation filter h_L is applied in the vertical direction to interpolate the gradient values at the vertical fractional sample position fracY. Conversely, when deriving the vertical gradient, the interpolation filter h_L is first applied in the horizontal direction to interpolate the intermediate interpolated samples at the horizontal fractional sample position fracX, and then the gradient filter h_G is applied in the vertical direction to derive the vertical gradients at the vertical fractional sample position fracY from the intermediate interpolated samples. In one embodiment, a 4-tap gradient filter, as shown in Table 2, is proposed for the gradient calculation of BIO.
Table 2. Proposed 4-tap gradient filter for BIO
(The 4-tap filter coefficients are given as an image in the original publication and are not reproduced here.)
In another embodiment, the 6-tap gradient filter shown in Table 3 is proposed to derive the gradients for BIO.
Table 3. Proposed 6-tap gradient filter for BIO
(The 6-tap filter coefficients are given as an image in the original publication and are not reproduced here.)
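The two-pass ordering described above can be sketched as follows. The concrete 4-tap and 6-tap coefficients of Tables 2 and 3 are given only as images in the original publication, so the filter dictionaries below are taken as inputs; only the filtering order (h_G then h_L for the horizontal gradient, h_L then h_G for the vertical gradient) reflects the described design.

```python
import numpy as np

def separable_gradient(ref, h_g, h_l, frac_x, frac_y, horizontal=True):
    """2D separable gradient derivation with a short gradient filter.

    ref      : 2D numpy array of reference samples
    h_g, h_l : dicts mapping a fractional position to a 1D tap array for
               the gradient filter and the interpolation filter
    """
    def filter_rows(a, taps):   # FIR filtering along x (each row)
        return np.stack([np.convolve(r, taps[::-1], mode="same") for r in a])

    def filter_cols(a, taps):   # FIR filtering along y (each column)
        return filter_rows(a.T, taps).T

    if horizontal:
        tmp = filter_rows(ref, h_g[frac_x])    # gradient first, along x
        return filter_cols(tmp, h_l[frac_y])   # then interpolate along y
    tmp = filter_rows(ref, h_l[frac_x])        # interpolate first, along x
    return filter_cols(tmp, h_g[frac_y])       # then gradient along y
```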
Fig. 10 shows a method for decoding a video signal. The method may for example be applied to a decoder.
In step 1010, the decoder may obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU). In display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture.
In step 1012, the decoder may obtain first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0). i and j represent the coordinates of one sample of the current picture.
In step 1014, the decoder may obtain second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1).
In step 1016, the decoder may obtain motion refinements for samples in the CU based on bi-directional optical flow (BIO) processing, where the BIO processing is applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values. The horizontal and vertical gradient values are calculated using a gradient filter having fewer coefficients.
In step 1018, the decoder may obtain bi-directional predicted samples for the CU based on the motion refinement.
FIG. 11 illustrates a computing environment 1110 coupled with a user interface 1160. The computing environment 1110 may be part of a data processing server. The computing environment 1110 includes a processor 1120, memory 1140, and I/O interfaces 1150.
The processor 1120 typically controls the overall operations of the computing environment 1110, such as the operations associated with display, data acquisition, data communications, and image processing. The processor 1120 may include one or more processors to execute instructions to perform all or some of the steps of the above-described methods. Moreover, the processor 1120 may include one or more modules that facilitate the interaction between the processor 1120 and other components. The processor may be a central processing unit (CPU), a microprocessor, a single-chip processor, a GPU, or the like.
The memory 1140 is configured to store various types of data to support operation of the computing environment 1110. The memory 1140 may include predetermined software 1142. Examples of such data include instructions for any application or method operating on computing environment 1110, video data sets, image data, and so forth. The memory 1140 may be implemented using any type of volatile or non-volatile memory device, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic or optical disk, or a combination thereof.
I/O interface 1150 provides an interface between processor 1120 and peripheral interface modules such as a keyboard, click wheel, buttons, etc. The buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 1150 may interface with an encoder and a decoder.
In some embodiments, a non-transitory computer-readable storage medium is also provided, comprising a plurality of programs, for example included in the memory 1140, executable by the processor 1120 in the computing environment 1110 to perform the above-described methods. For example, the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In a non-transitory computer readable storage medium, a plurality of programs for execution by a computing device having one or more processors are stored, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
In some embodiments, the computing environment 1110 may be implemented with one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), graphics Processing Units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components to perform the above-described methods.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will become apparent to those skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The embodiments were chosen and described in order to explain the principles of the disclosure and to enable others of ordinary skill in the art to understand the disclosure for various embodiments and to best utilize the disclosure with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the disclosure.

Claims (26)

1. A method for decoding a video signal, comprising:
obtaining, at a decoder, a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU), wherein, in display order, the forward reference picture L^(0) precedes the current picture and the backward reference picture L^(1) follows the current picture;

obtaining, at the decoder, forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0), wherein x and y denote integer coordinates of a sample of the forward reference picture L^(0);

obtaining, at the decoder, backward reference samples L^(1)(x′, y′) of the CU from a reference block in the backward reference picture L^(1), wherein x′ and y′ denote integer coordinates of a sample of the backward reference picture L^(1);
skipping, at the decoder, bi-directional optical flow (BIO) processing based on a distortion measure between integer reference samples, wherein the distortion measure between the integer reference samples indicates a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′); and
obtaining, at the decoder, predicted samples of the CU when the BIO processing is skipped.
2. The method of claim 1, wherein skipping the BIO processing based on the distortion measure between the integer reference samples at the decoder comprises:
at the decoder, skipping the BIO processing based on the distortion measure between the integer reference samples, calculated using a set of reference samples at integer coordinates and a distortion metric, wherein the distortion metric comprises one of: a sum of squared errors (SSE), a sum of absolute differences (SAD), and a sum of absolute transformed differences (SATD).
3. The method of claim 1, wherein skipping, at the decoder, the BIO processing based on the distortion measure between the integer reference samples comprises:
at the decoder, determining that decoder-side motion vector refinement (DMVR) processing is not applied to the current CU;
determining, at the decoder, that a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of the CU is less than or equal to a predefined threshold thresBIO; and
skipping, at the decoder, the BIO processing for the CU.
4. The method of claim 1, wherein skipping, at the decoder, the BIO processing based on the distortion measure between the integer reference samples comprises:
at the decoder, determining that a decoder-side motion vector refinement DMVR process is applied to the current CU;
at a decoder, dividing the current CU into a plurality of sub-blocks of equal size;
determining, at the decoder, that a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of a sub-block is less than or equal to a predefined threshold threshDMVR;
skipping, at a decoder, DMVR processing for the sub-blocks;
determining, at the decoder, that the distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of the sub-block is less than or equal to a predefined threshold thresBIO; and
skipping, at the decoder, the BIO processing for the sub-blocks.
5. The method of claim 1, further comprising:
skipping, at the decoder, the BIO processing based on gradient information.
6. The method of claim 5, wherein skipping BIO processing at the decoder based on the gradient information comprises:
skipping BIO processing at a decoder when an average of gradients for samples within the CU is less than a flat region threshold, wherein the gradient information comprises an average of gradients for samples within the CU.
7. The method of claim 5, wherein skipping BIO processing at the decoder based on the gradient information comprises: when the average gradient value of all predicted samples within the CU is less than a flat region threshold, at a decoder and at a CU level, skipping BIO processing, wherein the gradient information comprises the average gradient value of all predicted samples within the CU.
8. The method of claim 5, wherein skipping BIO processing at the decoder based on the gradient information comprises: skipping BIO processing at a decoder and at a sub-block level when an average gradient value of predicted samples within a corresponding sub-block of the CU is less than a flat region threshold, wherein the gradient information comprises an average gradient value of predicted samples within a corresponding sub-block of the CU.
9. The method of claim 5, further comprising:
when the distortion measure and the gradient information are less than a predefined threshold, skipping BIO processing at a decoder, wherein the gradient information comprises an average of gradients for samples within a CU.
10. The method of claim 5, further comprising:
when the distortion measure and the gradient information are less than a predefined threshold, skipping BIO processing at a decoder, wherein the gradient information comprises an average of gradients for samples within a CU.
11. A method for decoding a video signal, comprising:
obtaining, at a decoder, a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU), wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;

obtaining, at the decoder, first predicted samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0), wherein i and j represent coordinates of a sample of the current picture;

obtaining, at the decoder, second predicted samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1);

obtaining, at the decoder, a motion refinement for samples in the CU based on bi-directional optical flow (BIO) processing, wherein the BIO processing is applied to the CU based on the first predicted samples I^(0)(i, j), the second predicted samples I^(1)(i, j), horizontal gradient values, and vertical gradient values, the horizontal gradient values and the vertical gradient values being calculated using a gradient filter with fewer coefficients; and

obtaining, at the decoder, bi-directional predicted samples of the CU based on the motion refinement.
12. The method of claim 11, wherein the gradient filter comprises a 4-tap gradient filter comprising fractional sample positions having values equal to-6, 0, 6, and 0.
13. The method of claim 11, wherein the gradient filter comprises a 6-tap gradient filter comprising fractional sample positions having values equal to 4, -11, 0, 9, -3, and 1.
14. A computing device, comprising:
one or more processors; and
a non-transitory computer-readable storage medium storing instructions executable by the one or more processors, wherein the one or more processors are configured to:
obtain a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU), wherein, in display order, the forward reference picture L^(0) precedes the current picture and the backward reference picture L^(1) follows the current picture;

obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0), wherein x and y denote integer coordinates of a sample of the forward reference picture L^(0);

obtain backward reference samples L^(1)(x′, y′) of the CU from a reference block in the backward reference picture L^(1), wherein x′ and y′ denote integer coordinates of a sample of the backward reference picture L^(1);

skip bi-directional optical flow (BIO) processing based on a distortion measure between integer reference samples, wherein the distortion measure between the integer reference samples indicates a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′); and

obtain predicted samples of the CU when the BIO processing is skipped.
15. The computing device of claim 14, wherein the one or more processors configured to skip BIO processing based on distortion measurements between integer reference samples are further configured to:
skip the BIO processing based on the distortion measure between the integer reference samples, calculated using a set of reference samples at integer coordinates and a distortion metric, wherein the distortion metric comprises one of: a sum of squared errors (SSE), a sum of absolute differences (SAD), and a sum of absolute transformed differences (SATD).
16. The computing device of claim 14, wherein the one or more processors configured to skip BIO processing based on distortion measurements between integer reference samples are further configured to:
determine that decoder-side motion vector refinement (DMVR) processing is not applied to the current CU;

determine that a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of the CU is less than or equal to a predefined threshold thresBIO; and

skip the BIO processing for the CU.
17. The computing device of claim 14, wherein the one or more processors configured to skip BIO processing based on distortion measurements between integer reference samples are further configured to:
determine that decoder-side motion vector refinement (DMVR) processing is applied to the current CU;

divide the current CU into a plurality of sub-blocks of equal size;

determine that a distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of a sub-block is less than or equal to a predefined threshold threshDMVR;

skip DMVR processing for the sub-blocks;

determine that the distortion between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x′, y′) of the sub-block is less than or equal to a predefined threshold thresBIO; and

skip the BIO processing for the sub-blocks.
18. The computing device of claim 14, wherein the one or more processors are further configured to:
skip the BIO processing based on gradient information.
19. The computing device of claim 18, wherein the one or more processors configured to skip BIO processing based on gradient information are further configured to:
skip BIO processing when an average of gradients for samples within the CU is less than a flat region threshold, wherein the gradient information includes the average of gradients for samples within the CU.
20. The computing device of claim 18, wherein the one or more processors configured to skip BIO processing based on gradient information are further configured to:
skip BIO processing at a CU level when an average gradient value of all predicted samples within the CU is less than a flat region threshold, wherein the gradient information comprises the average gradient value of all predicted samples within the CU.
21. The computing device of claim 18, wherein the one or more processors configured to skip BIO processing based on gradient information are further configured to:
skip BIO processing at a sub-block level when an average gradient value of predicted samples within a corresponding sub-block of the CU is less than a flat region threshold, wherein the gradient information comprises the average gradient value of predicted samples within the corresponding sub-block of the CU.
22. The computing device of claim 18, wherein the one or more processors are further configured to:
skip BIO processing when the distortion measure and the gradient information are less than a predefined threshold, wherein the gradient information comprises an average of gradients for samples within a CU.
23. The computing device of claim 18, wherein the one or more processors are further configured to:
skip BIO processing when the distortion measure and the gradient information are less than a predefined threshold, wherein the gradient information comprises an average of gradients for samples within a CU.
24. A non-transitory computer readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform acts comprising:
obtaining, at a decoder, a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU), wherein, in display order, the first reference picture I^(0) precedes the current picture and the second reference picture I^(1) follows the current picture;

obtaining, at the decoder, first predicted samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0), wherein i and j represent coordinates of a sample of the current picture;

obtaining, at the decoder, second predicted samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1);

obtaining, at the decoder, a motion refinement for samples in the CU based on bi-directional optical flow (BIO) processing, wherein the BIO processing is applied to the CU based on the first predicted samples I^(0)(i, j), the second predicted samples I^(1)(i, j), horizontal gradient values, and vertical gradient values, the horizontal gradient values and the vertical gradient values being calculated using a gradient filter with fewer coefficients; and

obtaining, at the decoder, bi-directional predicted samples of the CU based on the motion refinement.
25. The non-transitory computer readable storage medium of claim 24, wherein the gradient filter comprises a 4-tap gradient filter comprising fractional sample positions with values equal to-6, 0, 6, and 0.
26. The non-transitory computer readable storage medium of claim 24, wherein the gradient filter comprises a 6-tap gradient filter comprising fractional sample positions having values equal to 4, -11, 0, 9, -3, and 1.
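The early-termination cascade recited in claims 1 through 10 can be sketched as follows. This is a minimal illustration assuming SAD as the distortion metric of claim 2; the threshold values, the 16x16 sub-block size, and the function names are ours, not taken from the source. The gradient-based test of claims 5-10 would be chained in the same way, comparing an average gradient magnitude against a flat-region threshold.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two integer sample blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def bio_dmvr_skip_decisions(L0, L1, dmvr_applied,
                            thres_dmvr=2.0, thres_bio=1.0, sub=16):
    """Sketch of the claim 1-10 cascade; thresholds are per-sample and illustrative."""
    if not dmvr_applied:
        # Claim 3: no DMVR -> a single CU-level distortion test gates BIO.
        return {"cu_skip_bio": sad(L0, L1) <= thres_bio * L0.size}
    # Claim 4: DMVR applied -> split the CU into equal sub-blocks and test each.
    h, w = L0.shape
    decisions = {}
    for y in range(0, h, sub):
        for x in range(0, w, sub):
            b0 = L0[y:y + sub, x:x + sub]
            b1 = L1[y:y + sub, x:x + sub]
            d = sad(b0, b1)
            decisions[(y, x)] = {
                "skip_dmvr": d <= thres_dmvr * b0.size,
                "skip_bio": d <= thres_bio * b0.size,
            }
    return decisions
```

The design intent visible in the claims is that the same integer-pel distortion computed for the DMVR early exit is reused to gate BIO, so the skip decision adds no extra interpolation work.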