WO2021188707A1 - Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement - Google Patents
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
Definitions
- This disclosure is related to video coding and compression. More specifically, this disclosure relates to methods and apparatuses for simplification of the bidirectional optical flow (BIO) tool (also abbreviated as BDOF) and decoder-side motion vector refinement (DMVR).
- Video coding is performed according to one or more video coding standards.
- Examples of video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which are jointly developed by ISO/IEC MPEG and ITU-T VCEG.
- Examples of the present disclosure provide methods and apparatuses for simplifications of bidirectional optical flow (BIO) and decoder-side motion vector refinement (DMVR).
- a method for decoding a video signal may include a decoder obtaining a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU).
- the forward reference picture L^(0) may be before a current picture and the backward reference picture L^(1) may be after the current picture in display order.
- the decoder may also obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0).
- the x and y may represent an integer coordinate of one sample in the forward reference picture L^(0).
- the decoder may also obtain backward reference samples L^(1)(x’, y’) of the CU from a reference block in the backward reference picture L^(1).
- the x’ and y’ may represent an integer coordinate of one sample in the backward reference picture L^(1).
- the decoder may also skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples may indicate a difference between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x’, y’).
- the decoder may also obtain, when the BIO process is skipped, prediction samples of the CU.
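The skip decision described above can be sketched as follows. This is an illustrative reconstruction only: the SAD-style distortion measure, the threshold, and all function names are assumptions, not details taken from the disclosure.

```python
def sad(l0_samples, l1_samples):
    """Sum of absolute differences between forward (L0) and backward (L1)
    integer reference samples -- one possible distortion measurement."""
    return sum(abs(a - b) for a, b in zip(l0_samples, l1_samples))

def should_skip_bio(l0_samples, l1_samples, threshold):
    """Skip the BIO process when the L0/L1 distortion is small enough
    (threshold value is a free parameter here, not from the disclosure)."""
    return sad(l0_samples, l1_samples) < threshold

def average_bi_prediction(l0_samples, l1_samples):
    """When BIO is skipped, fall back to plain rounded averaging of the
    two prediction signals."""
    return [(a + b + 1) >> 1 for a, b in zip(l0_samples, l1_samples)]
```

The point of the check is that when the two reference blocks already agree closely, the per-sample optical-flow refinement buys little and its cost can be avoided.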
- a method for decoding a video signal may include a decoder obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a coding unit (CU).
- the first reference picture I^(0) may be before a current picture and the second reference picture I^(1) may be after the current picture in display order.
- the decoder may also obtain first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0).
- the i and j may represent a coordinate of one sample within the current picture.
- the decoder may also obtain second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1).
- the decoder may also obtain motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values.
- the horizontal gradient values and the vertical gradient values may be calculated using gradient filters with fewer coefficients.
- the decoder may also obtain bi-prediction samples of the CU based on the motion refinements.
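As one illustration of "gradient filters with fewer coefficients," a 3-tap central-difference filter can stand in for a longer gradient filter. The actual reduced filter coefficients used by the disclosure are not reproduced in this text, so treat the taps below as placeholders.

```python
def horizontal_gradient(row):
    """Horizontal gradient of one sample row using the 3-tap central
    difference [-1, 0, 1] with a right shift by 1 -- a placeholder for
    a short gradient filter (interior samples only)."""
    return [(row[x + 1] - row[x - 1]) >> 1 for x in range(1, len(row) - 1)]

def vertical_gradient(block):
    """Vertical gradient: apply the same short filter down each column."""
    cols = [list(c) for c in zip(*block)]
    grads = [horizontal_gradient(c) for c in cols]
    return [list(r) for r in zip(*grads)]
```

Shorter filters reduce the per-sample multiply/add count at the cost of a coarser gradient estimate, which is the trade-off this aspect of the disclosure targets.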
- a computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors.
- the one or more processors may be configured to obtain a forward reference picture L^(0) and a backward reference picture L^(1) associated with a coding unit (CU).
- the forward reference picture L^(0) may be before a current picture and the backward reference picture L^(1) may be after the current picture in display order.
- the one or more processors may also be configured to obtain forward reference samples L^(0)(x, y) of the CU from a reference block in the forward reference picture L^(0).
- the x and y may represent an integer coordinate of one sample in the forward reference picture L^(0).
- the one or more processors may also be configured to obtain backward reference samples L^(1)(x’, y’) of the CU from a reference block in the backward reference picture L^(1).
- the x’ and y’ may represent an integer coordinate of one sample in the backward reference picture L^(1).
- the one or more processors may also be configured to skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples may indicate a difference between the forward reference samples L^(0)(x, y) and the backward reference samples L^(1)(x’, y’).
- the one or more processors may also be configured to obtain, when the BIO process is skipped, prediction samples of the CU.
- a non-transitory computer-readable storage medium having stored therein instructions is provided.
- the instructions may cause the apparatus to obtain, at a decoder, a first reference picture and a second reference picture associated with a coding unit (CU).
- the first reference picture I^(0) may be before a current picture and the second reference picture I^(1) may be after the current picture in display order.
- the instructions may also cause the apparatus to obtain, at the decoder, first prediction samples I^(0)(i, j) of the CU from a reference block in the first reference picture I^(0).
- the i and j may represent a coordinate of one sample within the current picture.
- the instructions may further cause the apparatus to obtain, at the decoder, second prediction samples I^(1)(i, j) of the CU from a reference block in the second reference picture I^(1).
- the instructions may also cause the apparatus to obtain, at the decoder, motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I^(0)(i, j), the second prediction samples I^(1)(i, j), horizontal gradient values, and vertical gradient values.
- the horizontal gradient values and the vertical gradient values may be calculated using gradient filters with fewer coefficients.
- the instructions may also cause the apparatus to obtain, at the decoder, bi-prediction samples of the CU based on the motion refinements.
- FIG. 1 is a block diagram of an encoder, according to an example of the present disclosure.
- FIG. 2 is a block diagram of a decoder, according to an example of the present disclosure.
- FIG. 3A is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3B is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3C is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3D is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 3E is a diagram illustrating block partitions in a tree partition structure of the AVS3, according to an example of the present disclosure.
- FIG. 4 is an illustration of a bi-directional optical flow (BDOF or BIO) model, according to an example of the present disclosure.
- FIG. 5 is an illustration of a DMVR model, according to an example of the present disclosure.
- FIG. 6 is an illustration of integer searching candidates for the DMVR, according to an example of the present disclosure.
- FIG. 7 is a flowchart of a motion compensation process with the DMVR and the BIO, according to an example of the present disclosure.
- FIG. 8 is a flowchart of the proposed multi-stage early termination scheme for the BIO and the DMVR, according to an example of the present disclosure.
- FIG. 9 is a flowchart illustrating a method for decoding a video signal, according to an example of the present disclosure.
- FIG. 10 is a flowchart illustrating a method for decoding a video signal, according to an example of the present disclosure.
- FIG. 11 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
- Although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed second information; and similarly, second information may also be termed first information.
- the term “if” may be understood to mean “when” or “upon” or “in response to a judgment,” depending on the context.
- the first generation AVS standard includes the Chinese national standard “Information Technology, Advanced Audio Video Coding, Part 2: Video” (known as AVS1) and “Information Technology, Advanced Audio Video Coding Part 16: Radio Television Video” (known as AVS+). It can offer around 50% bit-rate saving at the same perceptual quality compared to the MPEG-2 standard.
- the AVS1 standard video part was promulgated as the Chinese national standard in February 2006.
- the second-generation AVS standard includes the series of Chinese national standards “Information Technology, Efficient Multimedia Coding” (known as AVS2), which is mainly targeted at the transmission of extra HD TV programs.
- the coding efficiency of the AVS2 is double that of the AVS+. In May 2016, the AVS2 was issued as the Chinese national standard.
- the AVS2 standard video part was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as an international standard for applications.
- the AVS3 standard is a new-generation video coding standard for UHD video applications, aiming at surpassing the coding efficiency of the latest international standard HEVC.
- in March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, which provides approximately 30% bit-rate savings over the HEVC standard.
- the AVS3 standard is built upon the block-based hybrid video coding framework.
- FIG. 1 shows a general diagram of a block-based video encoder for the VVC.
- FIG. 1 shows a typical encoder 100.
- the encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related info 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
- a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.
- a prediction residual representing the difference between a current video block, part of video input 110, and its predictor, part of block predictor 140, is sent to a transform 130 from adder 128.
- Transform coefficients are then sent from the Transform 130 to a Quantization 132 for entropy reduction.
- Quantized coefficients are then fed to an Entropy Coding 138 to generate a compressed video bitstream.
- prediction related information 142 from an intra/inter mode decision 116 such as video block partition info, motion vectors (MVs), reference picture index, and intra prediction mode, is also fed through the Entropy Coding 138 and saved into a compressed bitstream 144.
- Compressed bitstream 144 includes a video bitstream.
- a prediction residual is reconstructed through an Inverse Quantization 134 and an Inverse Transform 136. This reconstructed prediction residual is combined with a Block Predictor 140 to generate un-filtered reconstructed pixels for a current video block.
- Spatial prediction (or “intra prediction”) uses pixels from samples of already coded neighboring blocks (which are called reference samples) in the same video frame as the current video block to predict the current video block.
- Temporal prediction uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
- the temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs, which indicate the amount and the direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage, the temporal prediction signal comes from.
- Motion estimation 114 takes in video input 110 and a signal from picture buffer 120 and outputs, to motion compensation 112, a motion estimation signal.
- Motion compensation 112 takes in video input 110, a signal from picture buffer 120, and the motion estimation signal from motion estimation 114 and outputs, to intra/inter mode decision 116, a motion compensation signal.
- an intra/inter mode decision 116 in the encoder 100 chooses the best prediction mode, for example, based on the rate-distortion optimization method.
- the block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform 130 and the quantization 132.
- the resulting quantized residual coefficients are inverse quantized by the inverse quantization 134 and inverse transformed by the inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
- in-loop filtering 122 such as a deblocking filter, a sample adaptive offset (SAO), and/or an adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage of the picture buffer 120 and used to code future video blocks.
- coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 138 to be further compressed and packed to form the bitstream.
- FIG. 1 gives the block diagram of a generic block-based hybrid video encoding system.
- the input video signal is processed block by block (called coding units (CUs)).
- one coding tree unit (CTU) is split into CUs to adapt to varying local characteristics based on quad/binary/extended-quad-tree.
- the concept of multiple partition unit type in the HEVC is removed, i.e., the separation of CU, prediction unit (PU), and transform unit (TU) does not exist in the AVS3; instead, each CU is always used as the basic unit for both prediction and transform without further partitions.
- each CTU is firstly partitioned based on a quad-tree structure. Then, each quad-tree leaf node can be further partitioned based on a binary and extended-quad-tree structure. As shown in FIGS. 3A, 3B, 3C, 3D, and 3E, there are five splitting types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quad-tree partitioning, and vertical extended quad-tree partitioning.
- FIG. 3A shows a diagram illustrating block quaternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3B shows a diagram illustrating block vertical binary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3C shows a diagram illustrating block horizontal binary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3D shows a diagram illustrating block vertical extended quaternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- FIG. 3E shows a diagram illustrating block horizontal ternary partition in a tree partition structure of the AVS3, in accordance with the present disclosure.
- spatial prediction and/or temporal prediction may be performed.
- Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
- Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
- Temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference.
- one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage the temporal prediction signal comes.
- the mode decision block in the encoder chooses the best prediction mode, for example, based on the rate-distortion optimization method.
- the prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using transform and then quantized.
- the quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU.
- in-loop filtering such as deblocking filter, sample adaptive offset (SAO), and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage and used as a reference code for future video blocks.
- coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed.
- FIG. 2 shows a general block diagram of a video decoder for the VVC. Specifically, FIG. 2 shows a typical decoder 200 block diagram. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.
- Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1.
- an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information.
- the quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual.
- a block predictor mechanism implemented in an Intra/inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information.
- a set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
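The summation in the decoder's reconstruction path can be sketched as follows. The clip to a valid sample range and the 8-bit default depth are illustrative assumptions; the text above does not specify them.

```python
def reconstruct(residual, predictor, bit_depth=8):
    """Unfiltered reconstruction: add the reconstructed prediction residual
    to the block predictor output, sample by sample, and clip the result to
    the valid sample range for the given bit depth (assumed here)."""
    max_val = (1 << bit_depth) - 1
    return [min(max(r + p, 0), max_val) for r, p in zip(residual, predictor)]
```

These unfiltered samples are what the in-loop filter then processes before the block enters the picture buffer.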
- the reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture storage.
- the reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks.
- a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.
- FIG. 2 gives a general block diagram of a block-based video decoder.
- the video bitstream is first entropy decoded at entropy decoding unit.
- the coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter-coded) to form the prediction block.
- the residual transform coefficients are sent to inverse quantization unit and inverse transform unit to reconstruct the residual block.
- the prediction block and the residual block are then added together.
- the reconstructed block may further go through in-loop filtering before it is stored in reference picture storage.
- the reconstructed video in reference picture storage is then sent out for display, as well as used to predict future video blocks.
- the focus of the disclosure is to reduce the complexity of the bidirectional optical flow (BIO) tool and the decoder-side motion vector refinement (DMVR) tool that are used in both the VVC and the AVS3 standards.
- the BIO tool is also abbreviated as BDOF.
- In this disclosure, the existing BIO and DMVR designs in the AVS3 standard are used as examples to explain the main design aspects of the two coding tools. After that, possible improvements to the existing BIO and DMVR designs are discussed. Finally, some methods are proposed to reduce the complexity while maintaining the majority of the gain of the two coding tools.
- Although the BIO and DMVR designs in the AVS3 standard are used as the base in the following description, to a person skilled in the art of video coding, the proposed methods described in the disclosure can also be applied to other BIO and DMVR designs or other coding tools with the same or a similar design flavor.
- the derivation of the refined motion vector for each sample in one block is based on the classical optical flow model.
- based on the classical optical flow constraint ∂I/∂t + v_x·∂I/∂x + v_y·∂I/∂y = 0, the motion refinement (v_x, v_y) at (x, y) can be derived.
- FIG. 4 shows an illustration of a BDOF model, in accordance with the present disclosure.
- (MV_x0, MV_y0) and (MV_x1, MV_y1) indicate the block-level motion vectors that are used to generate the two prediction blocks I^(0) and I^(1).
- the motion refinement (v_x, v_y) at the sample location (x, y) is calculated by minimizing the difference D between the values of the samples after motion refinement compensation (i.e., A and B in FIG. 4).
- the gradients need to be derived in the BIO for every sample of each motion compensated block (i.e., I^(0) and I^(1)) in order to derive the local motion refinement and generate the final prediction at that sample location.
- the gradients are calculated by a 2D separable finite impulse response (FIR) filtering process, which defines a set of 8-tap filters and applies different filters to derive the horizontal and vertical gradients according to the precision of the block-level motion vector (e.g., (MV_x0, MV_y0) and (MV_x1, MV_y1) in FIG. 4).
- Table 1 illustrates the coefficients of the gradient filters that are used by the BIO.
- Table 1: Gradient filters used in BIO
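The 2D separable FIR structure can be sketched generically as below. The filter taps passed in are placeholders; the actual 8-tap coefficients of Table 1 are not reproduced in this text.

```python
def filter_1d(samples, taps):
    """Plain 1-D FIR filtering; output length is len(samples) - len(taps) + 1."""
    n = len(taps)
    return [sum(t * samples[i + k] for k, t in enumerate(taps))
            for i in range(len(samples) - n + 1)]

def filter_rows(block, taps):
    """Apply the 1-D filter along each row."""
    return [filter_1d(row, taps) for row in block]

def filter_cols(block, taps):
    """Apply the 1-D filter down each column."""
    cols = [list(c) for c in zip(*block)]
    return [list(r) for r in zip(*(filter_1d(c, taps) for c in cols))]

def horizontal_gradient_2d(block, grad_taps, interp_taps):
    """Separable 2-D filtering for the horizontal gradient: the gradient
    filter runs along rows, the interpolation filter along columns
    (for the vertical gradient the roles are swapped)."""
    return filter_cols(filter_rows(block, grad_taps), interp_taps)
```

Separability is what makes the cost expression discussed later (a row pass plus a column pass, rather than a full 2-D convolution) hold.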
- the BIO is only applied to bi-prediction blocks, which are predicted by two reference blocks from temporal neighboring pictures. Additionally, the BIO is enabled without sending additional information from the encoder to the decoder. Specifically, the BIO is applied to all the bi-directional predicted blocks, which have both the forward and backward prediction signals.
- a bilateral-matching based decoder side motion vector refinement (DMVR) is applied in the AVS3.
- a refined MV is searched around the initial MVs in the reference picture list L0 and reference picture list L1.
- the method calculates the distortion between the two candidate blocks in the reference picture list L0 and list L1.
- the SAD between the candidate blocks based on each MV candidate around the initial MV is calculated.
- the MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-predicted signal.
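The candidate selection reduces to a minimum over SAD costs; a minimal sketch, where the cost function and the candidate offsets are illustrative stand-ins for the real bilateral-matching SAD:

```python
def dmvr_search(candidates, sad_cost):
    """Return the MV offset whose mirrored L0/L1 block pair gives the lowest
    SAD; `sad_cost` maps an (dx, dy) integer offset to its SAD value."""
    return min(candidates, key=sad_cost)

# Toy usage: pretend SAD values for three offsets around the initial MV.
costs = {(-1, 0): 10, (0, 0): 4, (1, 0): 7}
best = dmvr_search(costs.keys(), costs.__getitem__)
```

The offset with the smallest SAD becomes the integer refinement, and the corresponding refined MV pair generates the bi-predicted signal.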
- FIG. 5 shows the decoder-side motion vector refinement (DMVR) process.
- FIG. 5 includes 520 refPic in list L0, 540 current picture, and 560 refPic in list L1.
- 520 refPic in list L0 is a reference picture of the first list and includes 522 current CU, 524 reference block, 526 MVdiff, 528 MV0, and 530 MV0’.
- 526 MVdiff is the motion vector difference between 522 current CU and 524 reference block.
- 528 MV0 is the initial motion vector between blocks 522 current CU and 542 current CU.
- 530 MV0’ is the refined motion vector between blocks 522 current CU and 542 current CU.
- 540 current picture is a current picture of the video and includes 542 current CU, 544 MVU, and 546 MV1.
- 544 MV1’ is the refined motion vector between block 542 current CU and 562 reference block.
- 546 MV1 is the motion vector between blocks 542 current CU and 564 current CU.
- 560 refPic in List L1 is a reference picture in the second list and includes 562 reference block, 564 current CU, and 566 -MVdiff.
- 566 -MVdiff is the motion vector difference between 562 reference block and 564 current CU.
- the search points surround the integer sample position pointed to by the initial MV, and the MV offsets that are considered conform to the mirroring rule.
- any MV refinement that is checked by DMVR should satisfy the following two equations:
- MV0’ = MV0 + MV_offset (5)
- MV1’ = MV1 − MV_offset (6)
- MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures.
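Equations (5) and (6) translate directly into code; a minimal sketch with MVs represented as (x, y) integer pairs:

```python
def mirrored_refinement(mv0, mv1, offset):
    """Apply the DMVR mirroring rule: the refinement offset is added to the
    L0 motion vector (equation (5)) and subtracted from the L1 motion
    vector (equation (6)), so the two refined MVs move symmetrically."""
    mv0_refined = (mv0[0] + offset[0], mv0[1] + offset[1])
    mv1_refined = (mv1[0] - offset[0], mv1[1] - offset[1])
    return mv0_refined, mv1_refined
```

Because the two offsets mirror each other, only one offset per candidate needs to be searched, which halves the size of the search space.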
- the refinement search range is two integer luma samples from the initial MV.
- the search includes an integer sample search stage and a fractional sample refinement stage.
- in the integer sample search stage, the sum of absolute differences (SAD) values at 21 integer sample positions are checked.
- the SAD of the initial MV pair is first calculated.
- the integer offset, which minimizes the SAD value, is selected as the integer sample offset of the integer searching stage.
- FIG. 6 shows integer searching candidates for the DMVR.
- the black triangle is the integer sample position associated with the initial MV, and the blank or white triangles are the neighboring integer sample positions.
- the integer sample search is followed by fractional sample refinement.
- the fractional sample refinement is derived by using a parametric error surface method, instead of an additional search with SAD comparison.
- in the parametric error surface based sub-pixel offset estimation, the center position cost and the costs at the four neighboring positions from the center are used to fit a 2-D parabolic error surface equation of the form E(x, y) = A(x − x_min)² + B(y − y_min)² + C, where (x_min, y_min) corresponds to the fractional position with the least cost and C corresponds to the minimum cost value.
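The sub-pixel offsets from such a parabolic fit have a commonly used closed form (as in the VVC DMVR); treat the exact form below as an assumed sketch, valid when the center cost is the smallest of the five.

```python
def sub_pel_offset(e_c, e_l, e_r, e_t, e_b):
    """Fractional (x, y) offset from a parabolic fit to five costs:
    center e_c, left e_l, right e_r, top e_t, bottom e_b.
    Assumes e_c is a strict minimum, so both denominators are positive."""
    x = (e_l - e_r) / (2.0 * (e_l + e_r - 2.0 * e_c))
    y = (e_t - e_b) / (2.0 * (e_t + e_b - 2.0 * e_c))
    return x, y
```

This replaces an explicit fractional-position SAD search with a handful of arithmetic operations, which is exactly why the error-surface method is cheaper.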
- while the BIO and the DMVR can effectively enhance the efficiency of motion compensation, they also introduce a significant complexity increase to the encoder and decoder designs in both hardware and software. Specifically, in this disclosure, the following complexity issues in the existing BIO and DMVR designs are identified:
- the DMVR and the BIO are always enabled for the bi-predicted blocks, which have both forward and backward prediction signals.
- Such a design may not be practical for certain video applications (e.g., video streaming on mobile devices) that cannot afford heavy computations due to their limited power.
- the BIO needs to derive the gradient values at each sample location, which requires a number of multiplications and additions due to the 2D FIR filtering operations, and the DMVR needs to calculate multiple SAD values during the bilateral matching process. All of those operations require intensive computations.
- Such a complexity increase may become even worse when the BIO is jointly applied with the DMVR to one bi-predicted CU, as shown in FIG. 7.
- FIG. 7 shows a flowchart of the motion compensation process with the DMVR and the BIO.
- step 701 the process starts.
- step 702 the CU is divided into multiple sub-blocks with a size equal to min(16, CUWidth) x min(16, CUHeight).
- step 703 the variable i is set to 0.
- step 704 DMVR is applied to the i- th sub-block.
- step 705 motion compensation is applied to the i-th sub-block with the refined motion.
- step 706 BIO is applied to the i-th sub-block.
- step 707 it is determined if the current sub-block is the last sub-block. If yes, the process continues to step 709. If no, the process continues to step 708.
- step 708 the variable i is increased by 1 and the process continues to step 704.
- step 709 the process ends.
- the current BIO uses 2D separable FIR filters to calculate the horizontal and vertical gradient values. Specifically, the low-pass 8-tap interpolation filters (that are used for the interpolation in regular motion compensation) and the high-pass 8-tap gradient filters (as shown in Table 1) are applied, and the filter selection is based on the fractional position of the corresponding motion vector. Assuming both the L0 and L1 MVs point to reference samples at fractional sample positions in both the horizontal and vertical directions, the number of multiplications and additions to calculate the horizontal and vertical gradients in L0 and L1 will be (W x (H+7) + W x H) x 2 x 2.
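The operation count above can be reproduced with a small helper. This is a sketch; counting one filter application per output sample, and N multiplications per N-tap application, is an illustrative assumption rather than a statement from the source.

```python
def bio_gradient_filter_positions(w, h):
    """Separable-filter applications needed for the horizontal and vertical
    gradients in both L0 and L1 for a W x H block: a first pass over the
    vertically extended W x (H + 7) area plus a second pass over W x H,
    doubled for the two gradient directions and the two reference lists."""
    return (w * (h + 7) + w * h) * 2 * 2

def bio_gradient_multiplications(w, h, taps=8):
    """Assuming each application of an N-tap FIR filter costs N multiplications."""
    return bio_gradient_filter_positions(w, h) * taps
```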
- both the BIO and the DMVR utilize the L0 and L1 prediction samples to derive local motion refinements at different granularity levels (e.g., the BIO derives the motion refinements for each sample while the motion refinement of the DMVR is calculated per sub-block).
- the two prediction blocks are highly correlated such that the DMVR and the BIO processes can be safely skipped without incurring substantial coding loss.
- the initial motion vectors may point to reference samples at fractional sample positions, so the generation of the L0 and L1 prediction samples may invoke the interpolation process, which requires non-negligible complexity and leads to some delay in making the decision.
- SSE: sum of square error
- SAD: sum of absolute difference
- SATD: sum of absolute transformed difference
- the BIO and/or the DMVR could be skipped at the motion compensation stage when the difference measurement is no larger than one predefined threshold, i.e., Diff ⁇ D thres ; otherwise, the BIO and/or the DMVR still needs to be applied.
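A minimal sketch of this threshold test follows (illustrative function names; equation 10 itself is not reproduced in this excerpt, so plain SAD or SSE over the sample arrays stands in for the difference measurement Diff):

```python
import numpy as np

def block_difference(pred0, pred1, metric="sad"):
    """Difference measurement between the list-0 and list-1 samples; SSE or
    SAD (SATD omitted here for brevity) may serve as the measure."""
    d = pred0.astype(np.int64) - pred1.astype(np.int64)
    if metric == "sse":
        return int((d * d).sum())
    if metric == "sad":
        return int(np.abs(d).sum())
    raise ValueError("unsupported metric: " + metric)

def skip_refinement(pred0, pred1, threshold, metric="sad"):
    """BIO/DMVR is bypassed when Diff <= threshold."""
    return block_difference(pred0, pred1, metric) <= threshold
```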
- the proposed early termination method for the DMVR and the BIO can be carried out at either CU level or sub-block level, which can potentially provide various tradeoffs between coding performance and complexity reduction.
- sub-block-level early termination can better maintain the coding gains of the BIO and the DMVR due to the finer control granularity of the BIO and the DMVR. This may not be optimal in terms of the complexity reduction, given that the distortion measurement and the early termination decision need to be performed for each sub-block separately.
- while CU-level early termination can potentially lead to a more significant complexity reduction, it may not be able to achieve an acceptable tradeoff between performance and complexity for coding blocks with inhomogeneous characteristics.
- when the DMVR is applied to the current CU, the decoder may determine on a sub-block basis whether the DMVR and the BIO processes are to be bypassed or not. Otherwise (i.e., when the DMVR is not applied), it is acceptable to rely on the CU-level distortion measurement to determine if the BIO process for the whole CU is to be bypassed or not.
- one multi-stage early termination method is proposed to adaptively skip the BIO and the DMVR processes at either CU-level or sub-block-level, depending on whether the DMVR is allowed for the current CU or not.
- FIG. 8 illustrates the modified motion compensation process with the proposed multi-stage early termination method being applied to the BIO and the DMVR.
- FIG. 8 shows a flowchart of the proposed multi-stage early termination scheme for the BIO and the DMVR.
- step 801 the process starts.
- step 802 it is determined if the DMVR is applied to the CU. If no, the process continues to step 811. If yes, the process continues to step 803.
- step 803 the CU is divided into multiple sub-blocks with a size equal to min(16, CUWidth) X min(16, CUHeight).
- step 804 the variable i is set to 0.
- step 805 it is determined if the difference between the L0 and L1 integer reference samples of the sub-block, as calculated in equation 10, is less than or equal to the threshold thresDMVR. If yes, the process continues to step 807. If no, the process continues to step 806.
- step 806 DMVR is applied to the i-th sub-block.
- step 807 it is determined if the difference between L0 and L1 integer reference samples of the sub-block, as calculated in equation 10, is less than threshold thresBIO. If yes, the process continues to step 809. If no, the process continues to step 808.
- step 808 BIO is applied to the i-th sub-block.
- step 809 it is determined if the current sub-block is the last sub-block. If yes, the process continues to step 813. If no, the process continues to step 810.
- step 810 the variable i is increased by 1 and the process continues to step 805.
- step 811 it is determined if the difference, as calculated in equation 10, is less than threshold thresBIO. If yes, the process continues to step 813. If no, the process continues to step 812.
- step 812 BIO is applied to the CU.
- step 813 the process ends.
- the decision on whether to bypass the BIO process is made at CU-level. Specifically, if the distortion measurement (as shown in (10)) of the reference samples of the CU is no larger than a predefined threshold thresBIO, the BIO process is completely disabled for the whole CU; otherwise, the BIO is still applied for the CU.
- the decision on whether to bypass the BIO and the DMVR is made at sub-block-level. Additionally, two thresholds, thresBIO and thresDMVR are used to bypass the BIO and the DMVR for each sub block separately.
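The two branches of FIG. 8 can be condensed into a small decision helper. This is a sketch over precomputed difference values; the function name and the (run_dmvr, run_bio) return format are illustrative assumptions.

```python
def multi_stage_early_termination(cu_diff, subblock_diffs, dmvr_allowed,
                                  thres_dmvr, thres_bio):
    """When DMVR is allowed for the CU, decide per sub-block whether DMVR
    and BIO run (steps 805-808); otherwise make a single CU-level BIO
    decision (steps 811-812)."""
    if dmvr_allowed:
        decisions = []
        for diff in subblock_diffs:
            run_dmvr = diff > thres_dmvr  # step 805: skip DMVR when diff <= thresDMVR
            run_bio = diff >= thres_bio   # step 807: skip BIO when diff < thresBIO
            decisions.append((run_dmvr, run_bio))
        return decisions
    return [(False, cu_diff >= thres_bio)]  # step 811: CU-level BIO decision
```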
- FIG. 9 shows a method for decoding a video signal.
- the method may be, for example, applied to a decoder.
- the decoder may obtain a forward reference picture L(0) and a backward reference picture L(1) associated with a coding unit (CU).
- the forward reference picture L(0) is before a current picture and the backward reference picture L(1) is after the current picture in display order.
- the decoder may obtain forward reference samples L(0)(x, y) of the CU from a reference block in the forward reference picture L(0).
- the x and y represent an integer coordinate of one sample in the forward reference picture L(0).
- the decoder may obtain backward reference samples L(1)(x', y') of the CU from a reference block in the backward reference picture L(1).
- the x' and y' represent an integer coordinate of one sample in the backward reference picture L(1).
- the decoder may skip the bi-directional optical flow (BIO) process based on a distortion measurement between integer reference samples.
- the distortion measurement between the integer reference samples indicates a difference between the forward reference samples L(0)(x, y) and the backward reference samples L(1)(x', y').
- the decoder may obtain, when the BIO process is skipped, prediction samples of the CU.
- the BIO is designed to improve the accuracy of motion compensated prediction by providing sample-wise motion refinement, which is calculated based on the local gradients calculated at each sample location in one motion compensated block.
- for coding blocks whose local gradients are close to zero, the BIO cannot provide effective refinements to the prediction samples. This can be demonstrated by equation (2): when the local gradients are close to zero, the final prediction signal obtained from the BIO is approximately equal to the prediction signal generated by the conventional bi-prediction.
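A toy version of the combination step demonstrates this behavior (a hedged sketch: the rounding offsets, shifts, and clipping of the actual BIO equations are omitted, and the function name is illustrative):

```python
import numpy as np

def bio_final_prediction(i0, i1, vx, vy, gx0, gx1, gy0, gy1):
    """Simplified optical-flow combination: average of the two predictions
    plus a gradient-weighted correction term.  When the local gradients are
    zero, the correction vanishes and the result reduces to conventional
    bi-prediction."""
    b = vx * (gx0.astype(np.int64) - gx1) + vy * (gy0.astype(np.int64) - gy1)
    return (i0.astype(np.int64) + i1.astype(np.int64) + b + 1) >> 1
```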
- to reduce the complexity, it is proposed to only apply the BIO to the prediction samples of coding blocks that include enough high-frequency information.
- the decision on whether the prediction signals of one video block include enough high-frequency information can be made based on various criteria.
- the average magnitude of the gradients for the samples within a block can be used. If the average gradient magnitude is smaller than one threshold, then the block is classified as a flat area and the BIO should not be applied; otherwise, the block is considered to include sufficient high-frequency details where the BIO is still applicable.
- the proposed gradient-based BIO early termination method can also be applied at either CU-level or sub-block-level.
- the gradient values of all the prediction samples inside the CU are used to determine whether the BIO is bypassed or not. Otherwise, when the method is applied at sub-block-level, the decision on whether to skip the BIO process is made for each sub-block individually by comparing the average gradient value of the prediction samples within the corresponding sub-block.
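The average-gradient flatness test can be sketched as follows. This is an illustrative stand-in: the excerpt does not fix the gradient operator used for this decision, so a simple 2-tap sample difference is assumed here.

```python
import numpy as np

def is_flat_block(pred, threshold):
    """Classify a block (CU or sub-block) as flat, so that BIO is skipped,
    when the average magnitude of its horizontal and vertical gradients
    falls below the threshold."""
    p = np.asarray(pred, dtype=np.int64)
    gx = np.abs(p[:, 1:] - p[:, :-1])  # 2-tap horizontal difference
    gy = np.abs(p[1:, :] - p[:-1, :])  # 2-tap vertical difference
    return bool((gx.mean() + gy.mean()) / 2.0 < threshold)
```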
- a hybrid condition that checks both reference sample difference (according to equation 10) and gradient information is proposed. Note that in this condition, reference sample difference and gradient information may be checked jointly or separately. In the joint case, both sample difference and gradient information should be significant in order to apply BIO. Otherwise, BIO is skipped. In the separate case, when either sample difference or gradient information is small (e.g., through threshold comparison), BIO is skipped.
- the current BIO design uses 2D separable 8-tap FIR filters to calculate the horizontal and vertical gradient values, i.e., 8-tap interpolation filters and 8-tap gradient filters.
- 8-tap gradient filters may not always be effective to accurately extract gradient information from reference samples while resulting in a non-negligible computational complexity increase.
- gradient filters with fewer coefficients are proposed for calculating the gradient information that is used by the BIO.
- the input to the gradient derivation process may include the same reference samples as those used for the motion compensation and the fractional components (fracX, fracY) of the input motion (MVx, MVy) of the current block.
- the order of applying the gradient filter hG and the interpolation filter hL differs between the horizontal and vertical gradients. Specifically, in the case of deriving horizontal gradients, the gradient filter hG is firstly applied in the horizontal direction to derive the horizontal gradient values at horizontal fractional sample position fracX; then, the interpolation filter hL is applied vertically to interpolate the gradient values at vertical fractional sample position fracY. On the contrary, when vertical gradients are derived, the interpolation filter hL is firstly applied horizontally to interpolate intermediate interpolation samples at horizontal fractional sample position fracX, followed by the gradient filter hG being applied in the vertical direction to derive the vertical gradient at vertical fractional sample position fracY from the intermediate interpolation samples.
- the 4-tap gradient filters as shown in Table 2 are proposed for the gradient calculation of the BIO.
- the 6-tap gradient filters in Table 3 are proposed to derive the gradients for the BIO.
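The separable application order described above can be sketched with placeholder filters. The actual coefficients of Tables 1-3 are not reproduced in this excerpt, so the 4-tap values used below are illustrative placeholders; only the order of the horizontal/vertical passes reflects the text.

```python
import numpy as np

def apply_filter_1d(samples, taps, axis):
    """'valid' 1-D correlation of integer samples with a short FIR filter,
    along rows (axis=1) or columns (axis=0)."""
    taps = np.asarray(taps, dtype=np.int64)
    s = np.asarray(samples, dtype=np.int64)
    if axis == 0:
        s = s.T
    out = np.stack([np.convolve(row, taps[::-1], mode="valid") for row in s])
    return out.T if axis == 0 else out

def bio_gradients(ref, h_grad, h_interp):
    """Horizontal gradient: gradient filter horizontally, then interpolation
    vertically.  Vertical gradient: interpolation horizontally, then
    gradient filter vertically."""
    gx = apply_filter_1d(apply_filter_1d(ref, h_grad, axis=1), h_interp, axis=0)
    gy = apply_filter_1d(apply_filter_1d(ref, h_interp, axis=1), h_grad, axis=0)
    return gx, gy
```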
- FIG. 10 shows a method for decoding a video signal.
- the method may be, for example, applied to a decoder.
- the decoder may obtain a first reference picture I(0) and a second reference picture I(1) associated with a coding unit (CU).
- the first reference picture I(0) is before a current picture and the second reference picture I(1) is after the current picture in display order.
- the decoder may obtain first prediction samples I(0)(i, j) of the CU from a reference block in the first reference picture I(0).
- the i and j represent a coordinate of one sample within the current picture.
- the decoder may obtain second prediction samples I(1)(i, j) of the CU from a reference block in the second reference picture I(1).
- the decoder may obtain motion refinements for samples in the CU based on a bi-directional optical flow (BIO) process being applied to the CU based on the first prediction samples I(0)(i, j) and the second prediction samples I(1)(i, j).
- the horizontal gradient values and the vertical gradient values are calculated using gradient filters with fewer coefficients.
- the decoder may obtain bi-prediction samples of the CU based on the motion refinements.
- FIG. 11 shows a computing environment 1110 coupled with a user interface 1160.
- the computing environment 1110 can be part of a data processing server.
- the computing environment 1110 includes processor 1120, memory 1140, and I/O interface 1150.
- the processor 1120 typically controls overall operations of the computing environment 1110, such as the operations associated with the display, data acquisition, data communications, and image processing.
- the processor 1120 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods.
- the processor 1120 may include one or more modules that facilitate the interaction between the processor 1120 and other components.
- the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
- the memory 1140 is configured to store various types of data to support the operation of the computing environment 1110.
- Memory 1140 may include predetermined software 1142. Examples of such data include instructions for any applications or methods operated on the computing environment 1110, video datasets, image data, etc.
- the memory 1140 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
- the I/O interface 1150 provides an interface between the processor 1120 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like.
- the buttons may include but are not limited to, a home button, a start scan button, and a stop scan button.
- the I/O interface 1150 can be coupled with an encoder and decoder.
- a non-transitory computer-readable storage medium comprising a plurality of programs, such as comprised in the memory 1140, executable by the processor 1120 in the computing environment 1110, for performing the above-described methods.
- the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device or the like.
- the non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
- the computing environment 1110 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above methods.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180022872.6A CN115315955A (en) | 2020-03-20 | 2021-03-17 | Simplified method and apparatus for bi-directional optical flow and decoder-side motion vector refinement |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062992893P | 2020-03-20 | 2020-03-20 | |
US62/992,893 | 2020-03-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021188707A1 true WO2021188707A1 (en) | 2021-09-23 |
Family
ID=77771348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/022811 WO2021188707A1 (en) | 2020-03-20 | 2021-03-17 | Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115315955A (en) |
WO (1) | WO2021188707A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019195643A1 (en) * | 2018-04-06 | 2019-10-10 | Vid Scale, Inc. | A bi-directional optical flow method with simplified gradient derivation |
- 2021-03-17 WO PCT/US2021/022811 patent/WO2021188707A1/en active Application Filing
- 2021-03-17 CN CN202180022872.6A patent/CN115315955A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019195643A1 (en) * | 2018-04-06 | 2019-10-10 | Vid Scale, Inc. | A bi-directional optical flow method with simplified gradient derivation |
Non-Patent Citations (4)
Title |
---|
H.-C. CHUANG (QUALCOMM), J. CHEN (QUALCOMM), K. ZHANG (QUALCOMM), M. KARCZEWICZ (QUALCOMM): "EE2-related: A simplified gradient filter for Bi-directional optical flow (BIO)", 7. JVET MEETING; 20170713 - 20170721; TORINO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 6 July 2017 (2017-07-06), XP030150879 * |
J. CHEN, Y. YE, S. KIM: "Algorithm description for Versatile Video Coding and Test Model 8 (VTM 8)", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 3 March 2020 (2020-03-03), XP030288000 * |
X. XIU (KWAI), Y.-W. CHEN (KWAI), T.-C. MA (KWAI), H.-J. JHU (KWAI), X. WANG (KWAI INC.): "Non-CE4: On SAD threshold for BDOF early termination", 16. JVET MEETING; 20191001 - 20191011; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 25 September 2019 (2019-09-25), XP030217561 * |
Y.-C. YANG (FGINNOV), P.-H. LIN (FOXCONN): "CE4-related: On Conditions for enabling PROF", 15. JVET MEETING; 20190703 - 20190712; GOTHENBURG; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-O0313, 26 June 2019 (2019-06-26), pages 1 - 4, XP030219229 * |
Also Published As
Publication number | Publication date |
---|---|
CN115315955A (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11166037B2 (en) | Mutual excluding settings for multiple tools | |
JP2020526109A (en) | Motion vector refinement for multi-reference prediction | |
EP3566447B1 (en) | Method and apparatus for encoding and decoding motion information | |
WO2020073937A1 (en) | Intra prediction for multi-hypothesis | |
JP2022123067A (en) | Video decoding method, program and decoder readable storage medium | |
EP3912352B1 (en) | Early termination for optical flow refinement | |
US20220094913A1 (en) | Methods and apparatus for signaling symmetrical motion vector difference mode | |
US20230115074A1 (en) | Geometric partition mode with motion vector refinement | |
CN113196783A (en) | De-blocking filter adaptive encoder, decoder and corresponding methods | |
US9438925B2 (en) | Video encoder with block merging and methods for use therewith | |
WO2021188598A1 (en) | Methods and devices for affine motion-compensated prediction refinement | |
JP2023063506A (en) | Method for deriving constructed affine merge candidate | |
EP4320863A1 (en) | Geometric partition mode with explicit motion signaling | |
WO2022081878A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2022032028A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2021188707A1 (en) | Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement | |
WO2020264221A1 (en) | Apparatuses and methods for bit-width control of bi-directional optical flow | |
CN114009017A (en) | Motion compensation using combined inter and intra prediction | |
WO2022026480A1 (en) | Weighted ac prediction for video coding | |
US20220272375A1 (en) | Overlapped block motion compensation for inter prediction | |
WO2022026888A1 (en) | Methods and apparatuses for affine motion-compensated prediction refinement | |
WO2021248135A1 (en) | Methods and apparatuses for video coding using satd based cost calculation | |
EP4352960A1 (en) | Geometric partition mode with motion vector refinement | |
CN113727107A (en) | Video decoding method and apparatus, and video encoding method and apparatus | |
Kim et al. | Multilevel Residual Motion Compensation for High Efficiency Video Coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21772582; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21772582; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/04/2023) |