CN113632480B - Interaction between adaptive loop filtering and other codec tools

Publication number: CN113632480B (application number CN202080025296.6A)
Other versions: CN113632480A
Inventors: Li Zhang, Kai Zhang, Hsiao Chiang Chuang, Hongbin Liu, Yue Wang
Assignee: Beijing ByteDance Network Technology Co., Ltd.; ByteDance Inc.
Priority claimed from PCT/CN2020/082038 (published as WO2020200159A1)
Legal status: Active (granted)

Abstract

Interactions between adaptive loop filtering and other codec tools are described. In one exemplary aspect, a method for video processing includes: for a transition between a first block of video and a bitstream representation of the first block, configuring a first derivation process for deriving first gradient values used in a first codec tool to align with a second derivation process for deriving second gradient values used in a second codec tool different from the first codec tool; and performing the conversion based on the configured first derivation process.

Description

Interaction between adaptive loop filtering and other codec tools
Cross Reference to Related Applications
The present application claims the priority of and benefit from International Patent Application No. PCT/CN2019/080356, filed on March 29, 2019, under the applicable rules of patent law and/or the Paris Convention. The entire disclosure of International Patent Application No. PCT/CN2019/080356 is incorporated by reference as part of the present disclosure.
Technical Field
This patent document relates to video encoding and decoding techniques, devices, and systems.
Background
Despite advances in video compression, digital video still accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth requirements for digital video usage are expected to continue to grow.
Disclosure of Invention
Devices, systems, and methods related to digital video codec, and in particular to the interaction between adaptive loop filtering and other codec tools, are described. The methods may be applied to existing video codec standards (e.g., High Efficiency Video Codec (HEVC)) as well as future video codec standards (e.g., Versatile Video Codec (VVC)) or codecs.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises the following steps: for a current video block, configuring a first derivation process for deriving first gradient values for use in a first codec tool, the configuring being based on a second derivation process for deriving second gradient values for use in a second codec tool different from the first codec tool; and reconstructing the current video block from the corresponding bitstream representation based on the first derivation process and the first codec, wherein at least one of the first codec and the second codec involves a pixel filtering process, and wherein the first gradient value and the second gradient value are indicative of a directional change in light intensity or color components over a subset of samples in the current video block.
In another representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises the following steps: for a current video block, configuring a first padding process for use in a first codec tool, the configuring being based on a second padding process for use in a second codec tool different from the first codec tool; and reconstructing the current video block from the corresponding bitstream representation based on the first padding process and the first codec tool, wherein the first padding process and the second padding process comprise adding out-of-range samples to the calculation of gradient values, the gradient values being indicative of directional changes in light intensity or color components over a subset of the samples in the current video block.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises the following steps: for a transition between a first block of video and a bitstream representation of the first block, configuring a first derivation process for deriving first gradient values used in a first codec tool to align with a second derivation process for deriving second gradient values used in a second codec tool different from the first codec tool; and performing the conversion based on the configured first derivation process.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises the following steps: deriving gradient values for use in one or more codec tools by applying a sub-block level gradient calculation procedure for a transition between a first block and a bitstream representation of the first block of video, wherein the gradient values are derived for partial samples within a predicted block of the first block; and performing the conversion based on the derived gradient values.
In one representative aspect, the disclosed techniques can be used to provide a method for video processing. The method comprises the following steps: for a transition between a first block of video and a bitstream representation of the first block, configuring a first padding process in a first codec tool to align with a second padding process in a second codec tool different from the first codec tool, wherein the first padding process is used to pad samples outside a range of gradient values used in the first codec tool and the second padding process is used to pad samples outside the range of gradient values used in the second codec tool; and performing the conversion based on the configured first filling process.
In yet another representative aspect, the above-described method is embodied in the form of processor-executable code and stored in a computer-readable program medium.
In yet another representative aspect, an apparatus configured or operable to perform the above-described method is disclosed. The device may include a processor programmed to implement such a method.
In yet another representative aspect, a video decoder device may implement the methods described herein.
The above and other aspects and features of the disclosed technology are described in more detail in the accompanying drawings, description and claims.
Drawings
Fig. 1 shows an example of a block diagram of an encoder for video codec.
Fig. 2A, 2B and 2C show examples of filter shapes for the geometry transformation-based adaptive loop filter (GALF).
Fig. 3 shows an example of a flow chart of GALF encoder decisions.
Fig. 4A-4D illustrate example sub-sampled laplacian calculations for Adaptive Loop Filter (ALF) classification.
Fig. 5 shows an example of the shape of the luminance filter.
Fig. 6 shows an example of region division of a Wide Video Graphics Array (WVGA) sequence.
FIG. 7 shows an example of optical flow trajectories used by a bi-directional optical flow (BIO) algorithm.
FIGS. 8A and 8B illustrate example snapshots using a bi-directional optical flow (BIO) algorithm without block expansion.
Fig. 9 shows an example of interpolation samples used in BIO.
Fig. 10 shows an example of Predictive Refinement (PROF) employing optical flow.
Fig. 11A and 11B illustrate example methods for interaction between adaptive loop filtering and other codec tools in accordance with the techniques of this disclosure.
Fig. 12 is a block diagram of an example of a hardware platform for implementing the visual media decoding or visual media encoding techniques described in this document.
Fig. 13 shows a flow chart of yet another example method for video processing.
Fig. 14 shows a flow chart of yet another example method for video processing.
Fig. 15 shows a flow chart of yet another example method for video processing.
Detailed Description
Video codec methods and techniques are ubiquitous in modern technology as the demand for higher resolution video is increasing. Video codecs typically include electronic circuitry or software that compresses and decompresses digital video, and are continually being improved to provide higher codec efficiency. The video codec converts uncompressed video into a compressed format, or vice versa. There is a complex relationship between video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, the sensitivity to data loss and errors, the ease of editing, random access and end-to-end delay. Compression formats typically conform to standard video compression specifications, such as the High Efficiency Video Codec (HEVC) standard (also known as h.265 or MPEG-H part 2), the Versatile Video Codec (VVC) standard to be finalized, or other current and/or future video codec standards.
In some embodiments, future video codec techniques are explored using reference software called the Joint Exploration Model (JEM). In JEM, sub-block based prediction is adopted in several codec tools, such as affine prediction, alternative temporal motion vector prediction (ATMVP), spatial-temporal motion vector prediction (STMVP), bi-directional optical flow (BIO), Frame-Rate Up Conversion (FRUC), Locally Adaptive Motion Vector Resolution (LAMVR), Overlapped Block Motion Compensation (OBMC), Local Illumination Compensation (LIC), and decoder-side motion vector refinement (DMVR).
Embodiments of the disclosed technology may be applied to existing video codec standards (e.g., HEVC, h.265) and future standards to improve runtime performance. The section headings are used herein to enhance the readability of the description, and do not in any way limit the discussion or embodiments (and/or implementations) to the corresponding sections.
1 Example of color space and chroma subsampling
A color space, also known as a color model (or color system), is an abstract mathematical model that describes the range of colors as tuples of numbers, typically as 3 or 4 values or color components (e.g., RGB). Basically, a color space is an elaboration of the coordinate system and sub-space.
For video compression, the most frequently used color spaces are YCbCr and RGB.
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr (also written as YCBCR or Y′CBCR) is a family of color spaces used as part of the color image pipeline in video and digital photography systems. Y′ is the luma component, and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, and indicates that light intensity is non-linearly encoded based on gamma-corrected RGB primaries.
Chroma subsampling is the process of encoding and decoding an image by implementing a lower resolution for chroma information than for luma information, which takes advantage of the fact that the human visual system has less sensitivity to color differences than to luma.
1.1 4:4:4 Color format
Each of the three Y' CbCr components has the same sampling rate and thus no chroma sub-sampling. This approach is sometimes used in high-end film scanners and in motion picture post-production.
1.2 4:2:2 Color format
The two chrominance components are sampled at half the sampling rate of the luminance, e.g. halving the horizontal chrominance resolution. This reduces the bandwidth of the uncompressed video signal by one third with little visual difference.
1.3 4:2:0 Color format
In 4:2:0, the horizontal sampling is doubled compared to 4:1:1, but since the Cb and Cr channels are only sampled on alternate lines in this scheme, the vertical resolution is halved. The data rate is thus the same. Cb and Cr are each sub-sampled by a factor of two both horizontally and vertically. There are three variants of the 4:2:0 scheme, having different horizontal and vertical siting.
In MPEG-2, Cb and Cr are co-sited horizontally. Cb and Cr are sited between pixels in the vertical direction (sited interstitially).
In JPEG/JFIF, H.261, and MPEG-1, Cb and Cr are sited interstitially, halfway between alternate luma samples.
In 4:2:0 DV, Cb and Cr are co-sited in the horizontal direction. In the vertical direction, they are co-sited on alternating lines.
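For illustration, the following minimal Python sketch (assuming NumPy, even-sized planes, and simple 2×2 averaging; the averaging and rounding are illustrative choices, not taken from any standard) shows how a 4:2:0 chroma plane keeps half the luma resolution in both directions:

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 neighborhood so the chroma plane keeps half the
    luma resolution both horizontally and vertically (4:2:0)."""
    p = chroma.astype(np.int32)
    return (p[0::2, 0::2] + p[1::2, 0::2] + p[0::2, 1::2] + p[1::2, 1::2] + 2) >> 2

# A 720x1280 Cb plane becomes 360x640; the siting conventions above (MPEG-2
# vs. JPEG/MPEG-1 vs. DV) only change where the averages are taken from.
cb = np.zeros((720, 1280), dtype=np.uint8)
assert subsample_420(cb).shape == (360, 640)
```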
2 Example of codec flow for a typical video codec
Fig. 1 shows an example of an encoder block diagram of VVC, which contains three in-loop filtering blocks: deblocking filter (DF), sample adaptive offset (SAO), and ALF. Unlike DF, which uses predefined filters, SAO and ALF reduce the mean square error between the original and reconstructed samples by adding an offset and by applying a finite impulse response (FIR) filter, respectively, with coded side information signaling the offsets and filter coefficients. ALF is located at the last processing stage of each picture and can be regarded as a tool that attempts to catch and fix artifacts created by the previous stages.
3 Examples of geometry transformation-based adaptive loop filters in JEM
In JEM, a geometry transformation-based adaptive loop filter (GALF) with block-based filter adaptation is applied. For the luma component, one of 25 filters is selected for each 2×2 block based on the direction and activity of the local gradients.
3.1 Example of Filter shape
In JEM, up to three diamond filter shapes may be selected for the luminance component (as shown in fig. 2A, 2B, and 2C for 5×5, 7×7, and 9×9 diamonds, respectively). The index is signaled at the picture level to indicate the filter shape for the luma component. For the chrominance components in the picture, a 5×5 diamond shape is always used.
3.1.1 Block Classification
Each 2×2 block is categorized into one of 25 classes. The classification index C is derived based on its directionality D and a quantized value of activity Â, as follows:

C = 5D + Â (1)

To calculate D and Â, the gradients in the horizontal, vertical and two diagonal directions are first calculated using the 1-D Laplacian:

g_v = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} V_{k,l}, with V_{k,l} = |2R(k,l) − R(k,l−1) − R(k,l+1)| (2)
g_h = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} H_{k,l}, with H_{k,l} = |2R(k,l) − R(k−1,l) − R(k+1,l)| (3)
g_d1 = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} D1_{k,l}, with D1_{k,l} = |2R(k,l) − R(k−1,l−1) − R(k+1,l+1)| (4)
g_d2 = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} D2_{k,l}, with D2_{k,l} = |2R(k,l) − R(k−1,l+1) − R(k+1,l−1)| (5)

The indices i and j refer to the coordinates of the top-left sample in the 2×2 block, and R(i,j) indicates the reconstructed sample at coordinates (i,j).

The maximum and minimum values of the gradients in the horizontal and vertical directions are then set as:

g_{h,v}^max = max(g_h, g_v), g_{h,v}^min = min(g_h, g_v) (6)

and the maximum and minimum values of the gradients in the two diagonal directions are set as:

g_{d1,d2}^max = max(g_d1, g_d2), g_{d1,d2}^min = min(g_d1, g_d2) (7)

To derive the value of the directionality D, these values are compared against each other and with two thresholds t1 and t2:

Step 1: If both g_{h,v}^max ≤ t1·g_{h,v}^min and g_{d1,d2}^max ≤ t1·g_{d1,d2}^min are true, D is set to 0.
Step 2: If g_{h,v}^max / g_{h,v}^min > g_{d1,d2}^max / g_{d1,d2}^min, continue from Step 3; otherwise continue from Step 4.
Step 3: If g_{h,v}^max > t2·g_{h,v}^min, D is set to 2; otherwise D is set to 1.
Step 4: If g_{d1,d2}^max > t2·g_{d1,d2}^min, D is set to 4; otherwise D is set to 3.

The activity value A is calculated as:

A = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} (V_{k,l} + H_{k,l}) (8)

A is further quantized to the range of 0 to 4, inclusive, and the quantized value is denoted as Â.
For two chrominance components in a picture, no classification method is applied, i.e. a single set of ALF coefficients is applied for each chrominance component.
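As a concrete illustration of the classification above, here is a minimal Python sketch of the per-2×2-block class index C = 5D + Â. The threshold values (t1 = 2, t2 = 4.5) and the activity quantization are illustrative assumptions rather than normative values; Step 2's ratio comparison is done by cross-multiplication to avoid division by zero:

```python
def classify_2x2_block(R, i, j, t1=2, t2=4.5):
    """1-D Laplacian gradients over the 6x6 window around the 2x2 block whose
    top-left sample is (i, j); returns the class index C = 5*D + A_hat."""
    gv = gh = gd1 = gd2 = act = 0
    for k in range(i - 2, i + 4):
        for l in range(j - 2, j + 4):
            V = abs(2 * R[k][l] - R[k][l - 1] - R[k][l + 1])
            H = abs(2 * R[k][l] - R[k - 1][l] - R[k + 1][l])
            gv += V; gh += H; act += V + H
            gd1 += abs(2 * R[k][l] - R[k - 1][l - 1] - R[k + 1][l + 1])
            gd2 += abs(2 * R[k][l] - R[k - 1][l + 1] - R[k + 1][l - 1])
    hv_max, hv_min = max(gh, gv), min(gh, gv)
    d_max, d_min = max(gd1, gd2), min(gd1, gd2)
    if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
        D = 0                                      # step 1
    elif hv_max * d_min > d_max * hv_min:          # step 2: compare the two ratios
        D = 2 if hv_max > t2 * hv_min else 1       # step 3
    else:
        D = 4 if d_max > t2 * d_min else 3         # step 4
    a_hat = min(4, act >> 5)                       # illustrative quantization of A
    return 5 * D + a_hat
```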
3.1.2 Geometric transformation of Filter coefficients
Before filtering each 2×2 block, geometric transformations such as rotation or diagonal and vertical flipping are applied to the filter coefficients f(k,l) depending on the gradient values calculated for that block. This is equivalent to applying these transforms to the samples in the filter support region. The idea is to make the different blocks to which ALF is applied more similar by aligning their directionality.
Three geometric transformations, including diagonal flip, vertical flip and rotation, are introduced:

Diagonal: f_D(k,l) = f(l,k),
Vertical flip: f_V(k,l) = f(k, K−l−1), (9)
Rotation: f_R(k,l) = f(K−l−1, k).

Here, K is the size of the filter, and 0 ≤ k, l ≤ K−1 are coefficient coordinates, such that position (0,0) is at the upper-left corner and position (K−1, K−1) is at the lower-right corner. A transform is applied to the filter coefficients f(k,l) depending on the gradient values calculated for the block. The relationship between the transform and the four gradients in the four directions is summarized in Table 1.
TABLE 1 Mapping of the gradients calculated for a block and the transforms

Gradient values               Transform
g_d2 < g_d1 and g_h < g_v     No transformation
g_d2 < g_d1 and g_v < g_h     Diagonal
g_d1 < g_d2 and g_h < g_v     Vertical flip
g_d1 < g_d2 and g_v < g_h     Rotation
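A minimal Python sketch of the Table 1 selection and the three transforms of equation (9), assuming the coefficients are stored as a K×K array f[k][l] with position (0, 0) at the upper-left corner:

```python
def transform_coeffs(f, gd1, gd2, gh, gv):
    """Pick the geometric transform of the filter coefficients from the four
    block gradients, per Table 1."""
    K = len(f)
    if gd2 < gd1 and gh < gv:
        return f                                                     # no transformation
    if gd2 < gd1 and gv < gh:                                        # diagonal: fD(k,l) = f(l,k)
        return [[f[l][k] for l in range(K)] for k in range(K)]
    if gd1 < gd2 and gh < gv:                                        # vertical flip: fV(k,l) = f(k, K-l-1)
        return [[f[k][K - l - 1] for l in range(K)] for k in range(K)]
    return [[f[K - l - 1][k] for l in range(K)] for k in range(K)]   # rotation: fR(k,l) = f(K-l-1, k)
```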
3.1.3 Signaling of Filter parameters
In JEM, GALF filter parameters are signaled for the first CTU, i.e., after the slice header and before the SAO parameters of the first CTU. Up to 25 sets of luma filter coefficients may be signaled. To reduce bit overhead, filter coefficients of different classifications may be merged. Also, the GALF coefficients of reference pictures are stored and allowed to be reused as the GALF coefficients of the current picture. The current picture may choose to use the GALF coefficients stored for a reference picture and bypass GALF coefficient signaling. In this case, only an index to one of the reference pictures is signaled, and the stored GALF coefficients of the indicated reference picture are inherited for the current picture.
To support GALF temporal prediction, a candidate list of GALF filter sets is maintained. At the beginning of decoding a new sequence, the candidate list is empty. After decoding one picture, the corresponding set of filters may be added to the candidate list. Once the size of the candidate list reaches the maximum allowed value (i.e., 6 in the current JEM), a new set of filters overwrites the oldest set in decoding order, i.e., a first-in-first-out (FIFO) rule is applied to update the candidate list. To avoid duplication, a set is added to the list only when the corresponding picture does not use GALF temporal prediction. To support temporal scalability, there are multiple candidate lists of filter sets, and each candidate list is associated with a temporal layer. More specifically, each array assigned by the temporal layer index (TempIdx) may contain filter sets of previously decoded pictures with TempIdx equal to or lower than it. For example, the k-th array is assigned to be associated with TempIdx equal to k, and it contains only filter sets from pictures with TempIdx smaller than or equal to k. After coding a certain picture, the filter sets associated with the picture are used to update those arrays associated with equal or higher TempIdx.
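The FIFO update rule can be sketched as follows (a simplified illustration; the bound MAX_TEMPORAL_LAYERS is an assumption, while the list size of 6 matches the JEM value quoted above):

```python
from collections import deque

MAX_TEMPORAL_LAYERS = 5   # assumed bound, for illustration
MAX_LIST_SIZE = 6         # maximum candidate-list size in the current JEM

# one FIFO candidate list of filter sets per temporal layer (TempIdx)
cand_lists = [deque(maxlen=MAX_LIST_SIZE) for _ in range(MAX_TEMPORAL_LAYERS)]

def update_after_picture(filter_set, temp_idx, used_temporal_prediction):
    """Add the picture's filter set to every list with equal or higher TempIdx;
    deque(maxlen=...) silently drops the oldest set, i.e. the FIFO rule."""
    if used_temporal_prediction:
        return  # avoid duplication: such pictures add nothing to the lists
    for k in range(temp_idx, MAX_TEMPORAL_LAYERS):
        cand_lists[k].append(filter_set)
```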
Temporal prediction of GALF coefficients is used for inter-coded frames to minimize the signaling overhead. For intra frames, temporal prediction is not available, and a set of 16 fixed filters is assigned to each class. To indicate the usage of a fixed filter, a flag for each class is signaled and, if required, the index of the chosen fixed filter. Even when a fixed filter is selected for a given class, the coefficients of the adaptive filter f(k,l) can still be sent for this class, in which case the coefficients of the filter to be applied to the reconstructed image are the sum of both sets of coefficients.
The filtering process of the luma component can be controlled at the CU level. A flag is signaled to indicate whether GALF is applied to the luma component of a CU. For the chroma components, whether GALF is applied is indicated only at the picture level.
3.1.4 Filtering procedure
On the decoder side, when GALF is enabled for a block, each sample R(i,j) within the block is filtered, resulting in the sample value R′(i,j) as shown below, where L denotes the filter length and f(k,l) denotes the decoded filter coefficients:

R′(i,j) = Σ_{k=−L/2..L/2} Σ_{l=−L/2..L/2} f(k,l) · R(i+k, j+l) (10)
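A minimal Python sketch of this filtering formula; the 7-bit fixed-point coefficient precision (rounding offset 64, shift 7) is an assumption for illustration, not a normative value:

```python
def galf_filter_sample(R, i, j, f, L):
    """R'(i,j) = sum over the LxL support of f(k,l) * R(i+k, j+l), with f
    stored as an L x L array indexed from the top-left of the support."""
    half = L // 2
    acc = 0
    for k in range(-half, half + 1):
        for l in range(-half, half + 1):
            acc += f[k + half][l + half] * R[i + k][j + l]
    return (acc + 64) >> 7  # assumed 7-bit coefficient precision with rounding
```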
3.1.5 Determination of encoder-side Filter parameters
The overall encoder decision process for GALF is shown in Fig. 3. For the luma samples of each CU, the encoder decides whether GALF is applied, and the appropriate signaling flag is included in the slice header. For chroma samples, the decision to apply the filter is made at the picture level rather than the CU level. Furthermore, chroma GALF for a picture is checked only when luma GALF is enabled for the picture.
4 Examples of geometry transformation-based adaptive loop filters in VVC
The current design of GALF in VVC has the following major changes compared to JEM:
1) The adaptive filter shape is removed. Only 7 x 7 filter shapes are allowed for the luminance component and only 5 x5 filter shapes are allowed for the chrominance component.
2) Both the time domain prediction of the ALF parameters and the prediction from the fixed filter are removed.
3) For each CTU, a one-bit flag is signaled to indicate whether ALF is enabled or disabled.
4) The calculation of the class index is performed at the 4×4 level instead of the 2×2 level. In addition, as proposed, a sub-sampled Laplacian calculation method for ALF classification is utilized. More specifically, it is not necessary to calculate the horizontal/vertical/45-degree diagonal/135-degree diagonal gradients for every sample within a block. Instead, 1:2 sub-sampling is utilized.
5 Examples of region-based adaptive loop filters in AVS2
ALF is the last stage of in-loop filtering. There are two stages in this process. The first stage is filter coefficient derivation. To train the filter coefficients, the encoder classifies the reconstructed pixels of the luma component into 16 regions, and one set of filter coefficients is trained for each class using the Wiener-Hopf equations so as to minimize the mean square error between the original frame and the reconstructed frame. To reduce the redundancy between these 16 sets of filter coefficients, the encoder adaptively merges them based on the rate-distortion performance. At most, 16 different filter sets can be allocated for the luma component, while only one is allocated for each chroma component. The second stage is the filter decision, which includes both the frame level and the LCU level. First, the encoder decides whether frame-level adaptive loop filtering is performed. If frame-level ALF is enabled, the encoder further decides whether LCU-level ALF is performed.
5.1 Filter shape
The filter shape adopted in AVS-2 is a 7×7 cross shape superimposed on a 3×3 square shape, as shown in Fig. 5, for both the luma and chroma components. Each square in Fig. 5 corresponds to a sample. Therefore, a total of 17 samples are used to derive a filtered value for the sample at position C8. Considering the overhead of transmitting the coefficients, a point-symmetric filter is utilized with only nine coefficients {C0, C1, ..., C8} left, which halves the number of filter coefficients as well as the number of multiplications in the filtering. The point-symmetric filter thus reduces the computation for one filtered sample by half, e.g., only 9 multiplications and 14 additions are required per filtered sample.
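The point symmetry can be sketched as follows; the offset list `support` is an assumed input that would follow the exact 7×7-cross-plus-3×3-square tap layout of Fig. 5:

```python
def avs2_alf_sample(R, y, x, c, support):
    """c = [C0..C8]; support[i] is the (dy, dx) offset of the tap paired with
    its point reflection (-dy, -dx), so only 9 multiplications are needed."""
    acc = c[8] * R[y][x]                     # center tap C8
    for i, (dy, dx) in enumerate(support):   # the 8 symmetric pairs
        acc += c[i] * (R[y + dy][x + dx] + R[y - dy][x - dx])  # add pair, multiply once
    return acc  # normalization/rounding of the fixed-point result omitted
```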
5.2 Adaptive region-based merging
To adapt to different coding errors, AVS-2 adopts region-based multiple adaptive loop filters for the luma component. The luma component is divided into 16 roughly equal-sized basic regions, where each basic region is aligned with largest coding unit (LCU) boundaries as shown in Fig. 6, and one Wiener filter is derived for each region. The more filters are used, the more distortion is reduced, but the bits used to encode these coefficients increase along with the number of filters. In order to achieve the best rate-distortion performance, these regions can be merged into fewer, larger regions that share the same filter coefficients. To simplify the merging process, each region is assigned an index following a modified Hilbert order based on the prior correlation of the image. Two regions with consecutive indices can be merged based on the rate-distortion cost.
Mapping information between the regions should be signaled to the decoder. In AVS-2, the merging result is represented using the number of base regions, and the filter coefficients are sequentially compressed according to the region ordering of the filter coefficients. For example, when {0,1}, {2,3,4}, {5,6,7,8,9} and the remaining basic region are each merged into one region, only three integers are encoded and decoded to represent the merged map, i.e., 2,3, 5.
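The merged-map representation can be made concrete with a small Python sketch reproducing the example above, where the integers 2, 3, 5 expand into a region-to-filter mapping over the 16 basic regions:

```python
def expand_merge_map(counts, num_regions=16):
    """counts[i] = number of consecutive basic regions merged into filter i;
    any remaining basic regions share one final filter."""
    mapping, r = [], 0
    for filt, n in enumerate(counts):
        mapping += [filt] * n
        r += n
    mapping += [len(counts)] * (num_regions - r)  # remaining regions
    return mapping

# The example above: {0,1}, {2,3,4}, {5..9} merged, remainder shares filter 3.
assert expand_merge_map([2, 3, 5]) == [0,0, 1,1,1, 2,2,2,2,2, 3,3,3,3,3,3]
```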
5.3 Signaling of side information
Multiple switch flags are also used. The sequence switch flag adaptive_loop_filter_enable controls whether the adaptive loop filter is applied for the whole sequence. The picture switch flag picture_alf_enable[i] controls whether ALF is applied to the corresponding i-th picture component. The corresponding LCU-level flags and filter coefficients for that color component are transmitted only when picture_alf_enable[i] is enabled. The LCU-level flag lcu_alf_enable[k] controls whether ALF is enabled for the corresponding k-th LCU, and is interleaved into the slice data. The decisions of these flags at the different levels are all based on the rate-distortion cost. The high flexibility further allows ALF to improve the codec efficiency significantly.
In some embodiments, and for the luminance component, there may be up to 16 sets of filter coefficients.
In some embodiments, and for each chrominance component (Cb and Cr), a set of filter coefficients may be transmitted.
6 Bidirectional optical flow (BIO)
6.1 Overview and analysis of BIO
In BIO, motion compensation is first performed to generate the first predictions (in each prediction direction) of the current block. The first predictions are used to derive the spatial gradient, the temporal gradient, and the optical flow of each sub-block or pixel within the block, which are then used to generate the second prediction, e.g., the final prediction of the sub-block or pixel. The details are described as follows.
The bi-directional optical flow (BIO) method is a sample-by-sample motion refinement performed on top of block-by-block motion compensation for bi-prediction. In some implementations, the sample level motion refinement does not use signaling.
Let I^(k) be the luma value from reference k (k = 0, 1) after block motion compensation, and denote ∂I^(k)/∂x and ∂I^(k)/∂y as the horizontal and vertical components of the I^(k) gradient, respectively. Assuming the optical flow is valid, the motion vector field (v_x, v_y) is given by:

∂I^(k)/∂t + v_x·∂I^(k)/∂x + v_y·∂I^(k)/∂y = 0 (11)

Combining this optical flow equation with Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values I^(k) and the derivatives ∂I^(k)/∂x, ∂I^(k)/∂y at the ends. The value of this polynomial at t = 0 is the BIO prediction:

pred_BIO = (1/2)·(I^(0) + I^(1) + (v_x/2)·(τ_1·∂I^(1)/∂x − τ_0·∂I^(0)/∂x) + (v_y/2)·(τ_1·∂I^(1)/∂y − τ_0·∂I^(0)/∂y)) (12)
FIG. 7 illustrates an example optical flow trajectory in the bi-directional optical flow (BIO) method. Here, τ_0 and τ_1 denote the distances to the reference frames, calculated based on the POC of Ref_0 and Ref_1: τ_0 = POC(current) − POC(Ref_0), τ_1 = POC(Ref_1) − POC(current). If both predictions come from the same temporal direction (either both from the past or both from the future), the signs are different (e.g., τ_0·τ_1 < 0). In this case, BIO is applied only if the predictions are not from the same time instant (e.g., τ_0 ≠ τ_1), both referenced regions have non-zero motion (e.g., MVx_0, MVy_0, MVx_1, MVy_1 ≠ 0), and the block motion vectors are proportional to the temporal distances (e.g., MVx_0/MVx_1 = MVy_0/MVy_1 = −τ_0/τ_1).
The motion vector field (v_x, v_y) is determined by minimizing the difference Δ between the values at points A and B. Fig. 7 shows an example of the intersection of a motion trajectory with the reference frame planes. The model uses only the first linear term of the local Taylor expansion of Δ:

Δ = I^(0) − I^(1) + v_x·(τ_1·∂I^(1)/∂x + τ_0·∂I^(0)/∂x) + v_y·(τ_1·∂I^(1)/∂y + τ_0·∂I^(0)/∂y) (13)
All values in the above equation depend on the sample position, denoted as (i′, j′). Assuming the motion is uniform in the local surrounding area, Δ can be minimized inside a (2M+1)×(2M+1) square window Ω centered on the currently predicted point (i, j), where M equals 2:

(v_x, v_y) = argmin_{v_x, v_y} Σ_{[i′,j′]∈Ω} Δ²[i′, j′] (14)

For this optimization problem, JEM uses a simplified approach that first minimizes in the vertical direction and then in the horizontal direction. This results in:

v_x = (s_1 + r) > m ? clip3(−thBIO, thBIO, −s_3/(s_1 + r)) : 0 (15)
v_y = (s_5 + r) > m ? clip3(−thBIO, thBIO, −(s_6 − v_x·s_2/2)/(s_5 + r)) : 0 (16)

where

s_1 = Σ_{[i′,j′]∈Ω} (τ_1·∂I^(1)/∂x + τ_0·∂I^(0)/∂x)²,
s_2 = Σ_{[i′,j′]∈Ω} (τ_1·∂I^(1)/∂x + τ_0·∂I^(0)/∂x)·(τ_1·∂I^(1)/∂y + τ_0·∂I^(0)/∂y),
s_3 = Σ_{[i′,j′]∈Ω} (I^(1) − I^(0))·(τ_1·∂I^(1)/∂x + τ_0·∂I^(0)/∂x),
s_5 = Σ_{[i′,j′]∈Ω} (τ_1·∂I^(1)/∂y + τ_0·∂I^(0)/∂y)²,
s_6 = Σ_{[i′,j′]∈Ω} (I^(1) − I^(0))·(τ_1·∂I^(1)/∂y + τ_0·∂I^(0)/∂y) (17)

To avoid division by zero or a very small value, regularization parameters r and m are introduced in equations (15) and (16):

r = 500·4^(d−8) (18)
m = 700·4^(d−8) (19)
Here, d is the bit depth of the video samples.
To keep the memory access for BIO the same as for regular bi-predictive motion compensation, all prediction and gradient values I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y are calculated only for positions inside the current block. Fig. 8A shows an example of access positions outside of block 800. As shown in Fig. 8A, in equation (17), the (2M+1)×(2M+1) square window Ω centered on a currently predicted point on the boundary of the prediction block needs to access positions outside the block. In JEM, the values of I^(k), ∂I^(k)/∂x, ∂I^(k)/∂y outside the block are set equal to the nearest available value inside the block. This may be implemented, for example, as padding region 801, as shown in Fig. 8B.
With BIO, it is possible to refine the motion field for each sample. To reduce the computational complexity, a block-based design of BIO is used in JEM. The motion refinement is calculated based on 4×4 blocks. In the block-based BIO, the values of s_n in equation (17) for all samples in a 4×4 block are first aggregated, and then the aggregated values of s_n are used to derive the BIO motion vector offset for the 4×4 block. More specifically, the following formula is used for block-based BIO derivation:

s_{n,bk} = Σ_{(x,y)∈bk} s_n(x, y)

Here, bk denotes the set of samples belonging to the k-th 4×4 block of the prediction block. s_n in equations (15) and (16) is replaced by ((s_{n,bk}) >> 4) to derive the associated motion vector offsets.
In some cases, the MV regiment of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV regiment is clipped to a threshold. The threshold is determined based on whether the reference pictures of the current picture are all from one direction. For example, if all reference pictures of the current picture are from one direction, the value of the threshold is set to 12×2^(14−d); otherwise, it is set to 12×2^(13−d).
Gradients for BIO can be calculated at the same time as motion compensated interpolation, using operations consistent with the HEVC motion compensation process, e.g., a 2D separable finite impulse response (FIR). In some embodiments, the input to the 2D separable FIR is the same reference frame samples as for the motion compensation process, and fractional positions (fracX, fracY) according to the fractional part of the block motion vector. For horizontal gradients ∂I/∂x, the signal is first interpolated vertically using BIOfilterS corresponding to fractional position fracY with de-scaling shift d−8. The gradient filter BIOfilterG is then applied in the horizontal direction corresponding to fractional position fracX with de-scaling shift 18−d. For vertical gradients ∂I/∂y, the gradient filter is first applied vertically using BIOfilterG corresponding to fractional position fracY with de-scaling shift d−8. Signal displacement is then performed using BIOfilterS in the horizontal direction corresponding to fractional position fracX with de-scaling shift 18−d. The lengths of the interpolation filter BIOfilterG for gradient calculation and the interpolation filter BIOfilterS for signal displacement are kept shorter (e.g., 6 taps) to maintain reasonable complexity. Table 2 shows example filters that may be used for gradient calculation for different fractional positions of the block motion vector in BIO. Table 3 shows example interpolation filters that may be used for prediction signal generation in BIO.
TABLE 2 example filters for gradient computation in BIO
Fractional pixel location Interpolation filter for gradient (BIOfilterG)
0 {8,-39,-3,46,-17,5}
1/16 {8,-32,-13,50,-18,5}
1/8 {7,-27,-20,54,-19,5}
3/16 {6,-21,-29,57,-18,5}
1/4 {4,-17,-36,60,-15,4}
5/16 {3,-9,-44,61,-15,4}
3/8 {1,-4,-48,61,-13,3}
7/16 {0,1,-54,60,-9,2}
1/2 {-1,4,-57,57,-4,1}
TABLE 3 example interpolation Filter for prediction Signal Generation in BIO
In JEM, BIO may be applied to all bi-predicted blocks when the two predictions are from different reference pictures. The BIO may be disabled when Local Illumination Compensation (LIC) is enabled for the CU.
In some embodiments, OBMC is applied for the block after the normal MC process. To reduce computational complexity, no BIO may be applied during the OBMC process. This means that when using the block's own MV, BIO is applied in the MC process for the block, and when using the MV of the neighboring block during the OBMC process, BIO is not applied in the MC process.
6.2 Examples of BIO in VTM-3.0
Step 1: determining whether BIO is applicable (W/H is width/height of current block)
BIO is not applicable under the following conditions:
o the current video block is o (iPOC-iPOC 0)×(iPOC-iPOC1) equal to or greater than 0 of affine codec or ATMVP codec
O h= =4 or (w= =4 and h= =8)
O employs weighted prediction
The weight of the key Gbi is not (1, 1)
If the total SAD between the two reference blocks (denoted R 0 and R 1) is less than the threshold, then BIO is not used, where
Step 2: data preparation
For a W H block, (W+2) x (H+2) samples are interpolated.
Just as in normal motion compensation, the inner w×h samples are interpolated using an 8 tap interpolation filter.
The four outer rows of samples (black circles in fig. 9) are interpolated using a bilinear filter.
For each position, the gradient at two reference points (R 0 and R 1) is calculated.
Gx0(x,y)=(R0(x+1,y)-R0(x-1,y))>>4
Gy0(x,y)=(R0(x,y+1)-R0(x,y-1))>>4
Gx1(x,y)=(R1(x+1,y)-R1(x-1,y))>>4
Gy1(x,y)=(R1(x,y+1)-R1(x,y-1))>>4
For each position, the intermediate values are calculated as:
T1=(R0(x,y)>>6)-(R1(x,y)>>6),T2=(Gx0(x,y)+Gx1(x,y))>>3,T3=(Gy0(x,y)+Gy1(x,y))>>3; And
B1(x,y)=T2*T2,B2(x,y)=T2*T3,B3(x,y)=-T1*T2,B5(x,y)=T3*T3,B6(x,y)=-T1*T3
Step 3: computing predictions for each block
For one 4 x 4 block, if the SAD between two 4 x 4 reference blocks is less than the threshold, then the BIO is skipped.
Vx and Vy are calculated.
The final prediction for each location in a4 x 4 block is calculated.
b(x,y)=(Vx(Gx0(x,y)-Gx1(x,y))+Vy(Gy0(x,y)-Gy1(x,y))+1)>>1
P(x,y)=(R0(x,y)+R1(x,y)+b(x,y)+offset)>>shift
Here, b (x, y) is referred to as a correction term.
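The per-position arithmetic of steps 2 and 3 can be sketched as follows (Vx and Vy are assumed to have been derived already; R0 and R1 are the interpolated reference blocks, indexed as R[x][y]):

```python
def bdof_predict_sample(R0, R1, Vx, Vy, x, y, shift, offset):
    """Gradients at both references, the correction term b(x, y), and the
    final bi-directional prediction P(x, y), following the formulas above."""
    Gx0 = (R0[x + 1][y] - R0[x - 1][y]) >> 4
    Gy0 = (R0[x][y + 1] - R0[x][y - 1]) >> 4
    Gx1 = (R1[x + 1][y] - R1[x - 1][y]) >> 4
    Gy1 = (R1[x][y + 1] - R1[x][y - 1]) >> 4
    b = (Vx * (Gx0 - Gx1) + Vy * (Gy0 - Gy1) + 1) >> 1  # correction term
    return (R0[x][y] + R1[x][y] + b + offset) >> shift
```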
6.3 Alternative examples of BIO in VTM-3.0
8.3.4 Decoding process for inter blocks
- If predFlagL0 and predFlagL1 are both equal to 1, DiffPicOrderCnt(currPic, refPicList0[refIdx0]) * DiffPicOrderCnt(currPic, refPicList1[refIdx1]) < 0, MotionModelIdc[xCb][yCb] is equal to 0, and MergeModeList[merge_idx[xCb][yCb]] is not equal to SbCol, the value of bioAvailableFlag is set to true.
Otherwise, the value of bioAvailableFlag is set to false.
If bioAvailableFlag is equal to true, then the following applies:
-setting the variable shift equal to Max (2, 14-bitDepth).
The variables cuLevelAbsDiffThres and subCuLevelAbsDiffThres are set equal to (1 << (bitDepth − 8 + shift)) * cbWidth * cbHeight and 1 << (bitDepth − 3 + shift), respectively. The variable cuLevelSumAbsoluteDiff is set equal to 0.
For xSbIdx = 0..(cbWidth >> 2) − 1 and ySbIdx = 0..(cbHeight >> 2) − 1, the variable subCuLevelSumAbsoluteDiff[xSbIdx][ySbIdx] and the bidirectional optical flow utilization flag bioUtilizationFlag[xSbIdx][ySbIdx] of the current sub-block are derived as:
subCuLevelSumAbsoluteDiff[xSbIdx][ySbIdx] = Σi Σj Abs(predSamplesL0L[(xSbIdx << 2) + 1 + i][(ySbIdx << 2) + 1 + j] − predSamplesL1L[(xSbIdx << 2) + 1 + i][(ySbIdx << 2) + 1 + j]), with i, j = 0..3
bioUtilizationFlag[xSbIdx][ySbIdx]=
subCuLevelSumAbsoluteDiff[xSbIdx][ySbIdx]>=subCuLevelAbsDiffThres
cuLevelSumAbsoluteDiff+=subCuLevelSumAbsoluteDiff[xSbIdx][ySbIdx]
If cuLevelSumAbsoluteDiff is less than cuLevelAbsDiffThres, then bioAvailableFlag is set to false.
- If bioAvailableFlag is equal to true, the prediction samples predSamplesL[xL + xSb][yL + ySb] inside the current luma coding sub-block are derived by invoking the bi-directional optical flow sample prediction process specified in clause 8.3.4.5 with the luma coding sub-block width sbWidth, the luma coding sub-block height sbHeight, the sample arrays predSamplesL0L and predSamplesL1L, and the variables predFlagL0, predFlagL1, refIdxL0, refIdxL1 as inputs, where xL = 0..sbWidth − 1 and yL = 0..sbHeight − 1.
8.3.4.3 Fractional sample interpolation procedure
8.3.4.3.1 Overview
The inputs to this process are:
- a luma location (xSb, ySb), specifying the top-left sample of the current coding sub-block relative to the top-left sample of the current picture,
A variable sbWidth specifying the width of the current codec sub-block in luma samples,
A variable sbHeight specifying the height of the current codec sub-block in luma samples,
Luminance motion vectors mvLX given in units of 1/16 luminance samples,
A chrominance motion vector mvCLX given in units of 1/32 chrominance samples,
-A selected reference picture sample array refPicLXL and arrays refPicLXCb and refPicLXCr.
-A bidirectional optical flow enabled flag bioAvailableFlag.
The output of this process is:
-an (sbWidth) x (sbHeight) array predSamplesLXL of predicted luminance sample values when bioAvailableFlag is false, or an (sbWidth +2) x (sbHeight +2) array predSamplesLXL of predicted luminance sample values when bioAvailableFlag is true.
Two (sbWidth/2) x (sbHeight/2) arrays predSamplesLXCb and predSamplesLXCr of predicted chroma-sample values.
Let (xIntL, yIntL) be a luma location given in full-sample units, and (xFracL, yFracL) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample arrays refPicLXL, refPicLXCb and refPicLXCr.
When bioAvailableFlag is equal to true, for each luma sample location (xL = −1..sbWidth, yL = −1..sbHeight) inside the predicted luma sample array predSamplesLXL, the corresponding predicted luma sample value predSamplesLXL[xL][yL] is derived as follows:
variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL=xSb-1+(mvLX[0]>>4)+xL
yIntL=ySb-1+(mvLX[1]>>4)+yL
xFracL=mvLX[0]&15
yFracL=mvLX[1]&15
-deriving bilinearFiltEnabledFlag the value as follows:
-setting the value of bilinearFiltEnabledFlag to true if xL is equal to-1 or sbWidth, or yL is equal to-1 or sbHeight.
Otherwise, the value of bilinearFiltEnabledFlag is set to false.
-Deriving the predicted luminance sample value predSamplesLXL [ xL ] [ yL ] by invoking the procedure specified in clause 8.3.4.3.2 with (xIntL, yIntL), (xFracL, yFracL), refPicLXL, and bilinearFiltEnabledFlag as inputs.
When bioAvailableFlag is equal to false, for each luma sample location (xL = 0..sbWidth − 1, yL = 0..sbHeight − 1) inside the predicted luma sample array predSamplesLXL, the corresponding predicted luma sample value predSamplesLXL[xL][yL] is derived as follows:
variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL=xSb+(mvLX[0]>>4)+xL
yIntL=ySb+(mvLX[1]>>4)+yL
xFracL=mvLX[0]&15
yFracL=mvLX[1]&15
Setting variable bilinearFiltEnabledFlag to false.
-Deriving the predicted luminance sample value predSamplesLXL [ xL ] [ yL ] by invoking the procedure specified in clause 8.3.4.3.2 with (xIntL, yIntL), (xFracL, yFracL) and refPicLXL and bilinearFiltEnabledFlag as inputs.
8.3.4.5 Bidirectional optical flow prediction process
The inputs to this process are:
Two variables nCbW and nCbH, specifying the width and height of the current codec block,
Two (nCbW + 2) x (nCbH + 2) arrays of predicted luminance samples PREDSAMPLESL0 and PREDSAMPLESL1,
The prediction list uses the flags predflag l0 and predflag l1,
Reference indices refIdxL0 and refIdxL1,
- the bidirectional optical flow utilization flag bioUtilizationFlag[xSbIdx][ySbIdx], where xSbIdx = 0..(nCbW >> 2) − 1, ySbIdx = 0..(nCbH >> 2) − 1
The output of this process is the (nCbW) x (nCbH) array pbSamples of luminance prediction samples values.
Variable bitDepth is set equal to BitDepthY.
The variable shift2 is set equal to Max(3, 15 − bitDepth), and the variable offset2 is set equal to 1 << (shift2 − 1).
Variable MVREFINETHRES is set equal to Max (2, 13-bitDepth).
For xSbIdx = 0..(nCbW >> 2) − 1 and ySbIdx = 0..(nCbH >> 2) − 1,
- If bioUtilizationFlag[xSbIdx][ySbIdx] is false, for x = xSb..xSb + 3, y = ySb..ySb + 3, the prediction sample values of the current prediction unit are derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,
(predSamplesL0[x][y]+predSamplesL1[x][y]+offset2)>>shift2)
otherwise, deriving the prediction sample value of the current prediction unit as follows:
- The location (xSb, ySb), specifying the top-left sample of the current sub-block relative to the top-left samples of the prediction sample arrays predSamplesL0 and predSamplesL1, is derived as follows:
xSb=(xSbIdx<<2)+1
ySb=(ySbIdx<<2)+1
For x = xSb − 1..xSb + 4, y = ySb − 1..ySb + 4, the following applies:
-deriving the position (hx, vy) of each of the corresponding samples (x, y) within the array of predicted samples as follows:
hx=Clip3(1,nCbW,x)
vy=Clip3(1,nCbH,y)
The variables gradientHL0[x][y], gradientVL0[x][y], gradientHL1[x][y] and gradientVL1[x][y] are derived as follows:
gradientHL0[x][y]=(predSamplesL0[hx+1][vy]-predSampleL0[hx-1][vy])>>4
gradientVL0[x][y]=(predSampleL0[hx][vy+1]-predSampleL0[hx][vy-1])>>4
gradientHL1[x][y]=(predSamplesL1[hx+1][vy]-predSampleL1[hx-1][vy])>>4
gradientVL1[x][y]=(predSampleL1[hx][vy+1]-predSampleL1[hx][vy-1])>>4
the variables temp, tempX and tempY are derived as follows:
temp[x][y]=(predSamplesL0[hx][vy]>>6)-(predSamplesL1[hx][vy]>>6)
tempX[x][y]=(gradientHL0[x][y]+gradientHL1[x][y])>>3
tempY[x][y]=(gradientVL0[x][y]+gradientVL1[x][y])>>3
variables sGx, sGy2, sGxGy, sGxdI and sGydI are derived as follows:
sGx2=∑xy(tempX[xSb+x][ySb+y]*
tempX [ xsb+x ] [ ySb +y ]), wherein x, y= -1..4
sGy2=∑xy(tempY[xSb+x][ySb+y]*
TempY [ xsb+x ] [ ySb +y ]), wherein x, y= -1..4
sGxGy=∑xy(tempX[xSb+x][ySb+y]*
TempY [ xsb+x ] [ ySb +y ]), wherein x, y= -1..4
sGxdI=∑xy(-tempX[xSb+x][ySb+y]*
Temp [ xsb+x ] [ ySb +y ]), where x, y= -1..4
sGydI=∑xy(-tempY[xSb+x][ySb+y]*
Temp [ xsb+x ] [ ySb +y ]), where x, y= -1..4
- The horizontal and vertical motion refinements of the current sub-block are derived as follows:

vx = sGx2 > 0 ? Clip3(−mvRefineThres, mvRefineThres, −(sGxdI << 3) >> Floor(Log2(sGx2))) : 0
vy = sGy2 > 0 ? Clip3(−mvRefineThres, mvRefineThres, ((sGydI << 3) − ((vx * sGxGym) << 12 + vx * sGxGys) >> 1) >> Floor(Log2(sGy2))) : 0

where
sGxGym = sGxGy >> 12
sGxGys = sGxGy & ((1 << 12) − 1)
For x = xSb − 1..xSb + 2, y = ySb − 1..ySb + 2, the following applies:
sampleEnh=Round((vx*(gradientHL1[x+1][y+1]-gradientHL0[x+1][y+1]))>>1)+Round((vy*(gradientVL1[x+1][y+1]-gradientVL0[x+1][y+1]))>>1)
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x+1][y+1]+predSamplesL1[x+1][y+1]+sampleEnh+offset2)>>shift2)
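The motion-refinement derivation above can be sketched in Python as follows; the operator grouping in the vy expression follows one plausible reading of the draft text, so treat this as illustrative rather than normative:

```python
def bdof_motion_refinement(sGx2, sGy2, sGxGy, sGxdI, sGydI, mvRefineThres):
    """Sketch of the vx/vy derivation of clause 8.3.4.5 from the aggregated sums."""
    def clip3(lo, hi, v): return max(lo, min(hi, v))
    def floor_log2(v): return v.bit_length() - 1      # Floor(Log2(v)) for v > 0
    vx = clip3(-mvRefineThres, mvRefineThres,
               -(sGxdI << 3) >> floor_log2(sGx2)) if sGx2 > 0 else 0
    sGxGym, sGxGys = sGxGy >> 12, sGxGy & ((1 << 12) - 1)
    vy = clip3(-mvRefineThres, mvRefineThres,
               ((sGydI << 3) - (((vx * sGxGym) << 12) + vx * sGxGys >> 1))
               >> floor_log2(sGy2)) if sGy2 > 0 else 0
    return vx, vy
```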
7 Prediction refinement with optical flow (PROF)
The present contribution proposes a method for sub-block based affine motion compensated prediction using optical flow refinement. After performing sub-block based affine motion compensation, the prediction samples are refined by adding the differences derived from the optical flow equations, which is called Prediction Refinement (PROF) with optical flow. The proposed method enables inter prediction at pixel level granularity without increasing memory access bandwidth.
After the sub-block based affine motion compensation is performed, the luma prediction samples are refined by adding a difference derived by the optical flow equation. The proposed PROF is described in the following four steps.
Step 1) performs sub-block based affine motion compensation to generate sub-block predictions I (I, j).
Step 2) The spatial gradients g_x(i, j) and g_y(i, j) of the sub-block prediction are calculated at each sample position using a 3-tap filter [−1, 0, 1]:
gx(i,j)=I(i+1,j)-I(i-1,j)
gy(i,j)=I(i,j+1)-I(i,j-1)
The sub-block prediction is extended by one pixel on each side for gradient computation. To reduce memory bandwidth and complexity, pixels on the extended boundary are copied from the nearest integer pixel position in the reference picture. Thus, additional interpolation for the filled region is avoided.
Step 3) The luma prediction refinement is calculated by the optical flow equation:
ΔI(i,j)=gx(i,j)*Δvx(i,j)+gy(i,j)*Δvy(i,j)
Here, Δv (i, j) is a difference between the pixel MV (represented by v (i, j)) calculated for the sample point (i, j) and the sub-block MV of the sub-block to which the pixel (i, j) belongs, as shown in fig. 10.
Since the affine model parameters and the pixel location relative to the sub-block center are not changed from sub-block to sub-block, Δv(i, j) can be calculated for the first sub-block and reused for the other sub-blocks in the same CU. Let x and y be the horizontal and vertical offsets from the pixel location to the center of the sub-block; Δv(x, y) can then be derived by the following equations:

Δv_x(x, y) = c·x + d·y
Δv_y(x, y) = e·x + f·y

For a 4-parameter affine model:

c = f = (v_1x − v_0x)/w
e = −d = (v_1y − v_0y)/w

For a 6-parameter affine model:

c = (v_1x − v_0x)/w
d = (v_2x − v_0x)/h
e = (v_1y − v_0y)/w
f = (v_2y − v_0y)/h

Here, (v_0x, v_0y), (v_1x, v_1y), (v_2x, v_2y) are the top-left, top-right and bottom-left control point motion vectors, and w and h are the width and height of the CU.
Step 4) Finally, the luma prediction refinement is added to the sub-block prediction I(i, j). The final prediction I′ is generated according to the following equation:
I′(i,j)=I(i,j)+ΔI(i,j)
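The four steps can be summarized in a short Python sketch; `dv` holds the per-sample Δv(i, j) from step 3, and the fixed-point scaling/rounding of a real codec is omitted for clarity:

```python
def prof_refine(I, dv):
    """I is the sub-block prediction extended by one sample on each side
    (step 2's padding); dv maps inner positions (i, j) to (dvx, dvy)."""
    refined = {}
    for (i, j), (dvx, dvy) in dv.items():
        gx = I[i + 1][j] - I[i - 1][j]     # g_x(i,j), 3-tap [-1, 0, 1] (step 2)
        gy = I[i][j + 1] - I[i][j - 1]     # g_y(i,j)
        dI = gx * dvx + gy * dvy           # optical flow equation (step 3)
        refined[(i, j)] = I[i][j] + dI     # I'(i,j) = I(i,j) + dI(i,j) (step 4)
    return refined
```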
8 Disadvantages of existing implementations
The design of VVC has the following problems:
(1) Different gradient calculation methods are used in BIO (also known as BDOF)/ALF/PROF, such as [1, -1], [ -1,2, -1].
(2) In the BIO process, gradient calculation is done at the sample level, where a gradient is calculated for each sample, while the refined motion vectors (such as Vx and Vy) are derived at the 4×4 sub-block level, depending on the gradient values of the 6×6 block covering the sub-block. Calculating the gradient for every sample increases the computational complexity.
9 Exemplary methods for interaction of ALF with other codec tools
Embodiments of the disclosed technology overcome the drawbacks of existing implementations, thereby providing higher codec efficiency for video codec. Interactions between adaptive loop filtering and other codec tools based on the disclosed techniques may enhance existing and future video codec standards and will be elucidated in the following examples described for various implementations. The examples of the disclosed technology provided below explain the general principles and are not intended to be limiting. In the examples, various features described in these examples may be combined unless explicitly indicated to the contrary.
In the following examples, Shift(x, s) is defined as:

Shift(x, s) = (x + off) >> s

In the following examples, SatShift(x, n) is defined as:

SatShift(x, n) = (x + offset0) >> n, if x ≥ 0
SatShift(x, n) = −((−x + offset1) >> n), if x < 0

In an example, offset0 and/or offset1 is set to (1 << n) >> 1 or (1 << (n − 1)). In another example, offset0 and/or offset1 is set to 0. In another example, offset0 = offset1 = ((1 << n) >> 1) − 1 or ((1 << (n − 1))) − 1.

In the following examples, Clip3(x, min, max) is defined as:

Clip3(x, min, max) = min, if x < min
Clip3(x, min, max) = max, if x > max
Clip3(x, min, max) = x, otherwise
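In Python, these helper operations can be sketched directly (the rounding offsets default to zero here; the alternatives listed above plug into the same signatures):

```python
def Shift(x, s, off=0):
    return (x + off) >> s

def SatShift(x, n, offset0=0, offset1=0):
    # shift the magnitude, keeping the sign (arithmetic shift toward zero)
    return (x + offset0) >> n if x >= 0 else -((-x + offset1) >> n)

def Clip3(x, lo, hi):
    return lo if x < lo else hi if x > hi else x
```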
It is proposed that the gradient computation process in ALF/non-local ALF is aligned with the gradient computation process used in decoder derivation methods (e.g. decoder motion vector refinement/BIO, etc.).
1. It is proposed that all or part of the gradient value derivation process used in BIO is aligned with the corresponding gradient value derivation process used in ALF (see the sketch after this list).
A. in one example, the vertical gradient value in BIO (denoted as g v) is calculated in the same manner as the vertical gradient value in ALF is calculated.
i. In one example, the vertical gradient computation is defined as a [−1, 2, −1] filter.
ii. In one example,
gv=Shift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec)
Where R (i, j) indicates the reconstructed or predicted samples at coordinates (i, j), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0.
Alternatively, the composition of the present invention,
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec)
Where R (i, j) indicates the reconstructed or predicted samples at coordinates (i, j), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0.
B. In one example, the horizontal gradient value in BIO (denoted as g h) is calculated in the same manner as that used to calculate the horizontal gradient value in ALF.
i. In one example, the horizontal gradient calculation is defined as a [−1, 2, −1] filter.
ii. In one example,
G h =shift (2R (k, l) -R (k-off 1, l) -R (k+off 2, l), prec), where R (i, j) indicates the reconstructed or predicted samples at coordinates (i, j), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0;
alternatively, the composition of the present invention,
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec)
Where R (i, j) indicates the reconstructed or predicted samples at coordinates (i, j), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0.
2. It is proposed that all or part of the gradient value derivation process used in the PROF is identical to the corresponding gradient value derivation process used in the ALF.
A. In one example, the vertical gradient value in the PROF (denoted as g v) is calculated in the same manner as the vertical gradient value in the ALF is calculated.
I. In one example, the vertical gradient computation is defined as a [ -1,2, -1] filter.
In one example, g v =shift (2R (k, l) -R (k, l-off 1) -R (k, l+off 2), prec), where R (i, j) indicates a reconstructed or predicted sample point at coordinate (i, j), off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0;
alternatively, the composition of the present invention,
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec)
B. In one example, the horizontal gradient value in the PROF (denoted as g h) is calculated in the same manner as the horizontal gradient value in the ALF is calculated.
I. in one example, the horizontal gradient calculation is defined as a [ -1,2, -1] filter.
In one example, g h =shift (2R (k, l) -R (k-off 1, l) -R (k+off 2, l), prec), where R (i, j) indicates a reconstructed or predicted sample point at coordinate (i, j), off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0;
alternatively, the composition of the present invention,
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec)
3. It is proposed that a sub-block level gradient calculation method may be applied to BIO/PROF and other non-ALF codec tools, wherein gradients are not calculated for all samples within a block (see the sketch after this list).
A. in one example, gradient values may be derived using only selected coordinates, such as shown in fig. 4.
B. In one example, when gradient values are not calculated for certain coordinates, the associated gradient values may be copied from the gradient values associated with its neighbors in which the gradient was calculated.
C. How the gradient values are copied from the selected spots to those remaining spots may depend on the gradient direction, e.g. horizontal or vertical gradient.
4. It is proposed that all or part of the gradient value derivation process used in ALF is the same as the corresponding derivation process of other codec tools (e.g., gradient values used in BIO).
A. In one example, the vertical gradient value in the ALF (denoted as g v) is calculated in the same manner as the vertical gradient value in the BIO.
i. For example,
gv(x,y)=Shift(R(x,y+off1)-R(x,y-off2),prec),
Where R (x, y) indicates the reconstructed or predicted sample point at coordinates (x, y), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0;
alternatively, the composition of the present invention,
gv(x,y)=SatShift(R(x,y+off1)-R(x,y-off2),prec)。
Alternatively, the composition of the present invention,
gv(x,y)=Shift(|R(x,y+off1)-R(x,y-off2)|,prec)。
Alternatively, the composition of the present invention,
gv(x,y)=SatShift(|R(x,y+off1)-R(x,y-off2)|,prec)。
B. in one example, the horizontal gradient value in the ALF (denoted as g h) is calculated in the same manner as the horizontal gradient value in the BIO.
i. For example,
gh(x,y)=Shift(R(x+1,y)-R(x-1,y),prec),
Where R (x, y) indicates the reconstructed or predicted sample point at coordinates (x, y), the variables off1, off2, and prec are integers, e.g., off1 = off2 = 1 and prec = 0;
alternatively, the composition of the present invention,
gh(x,y)=SatShift(R(x+1,y)-R(x-1,y),prec)。
Alternatively, the composition of the present invention,
gh(x,y)=Shift(|R(x+1,y)-R(x-1,y)|,prec)。
Alternatively, the composition of the present invention,
gh(x,y)=SatShift(|R(x+1,y)-R(x-1,y)|,prec)。
C. In one example, the vertical gradient values for all or some of the pixels within the block are calculated in the same manner as the vertical gradient values in the BIO are calculated and averaged (or otherwise processed) to obtain the vertical gradient values for the block used in the ALF.
D. In one example, the horizontal gradient values for all or some of the pixels within the block are calculated in the same manner as the horizontal gradient values in the BIO are calculated and averaged (or otherwise processed) to obtain the horizontal gradient values for the block used in the ALF.
5. It is proposed that all or part of the padding method used to pad out-of-range samples for deriving gradients in ALF is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in BIO.
A. Alternatively, all or part of the padding method used to pad out-of-range samples for deriving gradients in BIO is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in ALF.
6. It is proposed that all or part of the padding method used to pad out-of-range samples for deriving gradients in PROF is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in BIO.
A. Alternatively, all or part of the padding method used to pad out-of-range samples for deriving gradients in BIO is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in PROF.
7. It is proposed that all or part of the padding method used to pad out-of-range samples for deriving gradients in PROF is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in ALF.
A. Alternatively, all or part of the padding method used to pad out-of-range samples for deriving gradients in ALF is the same as the corresponding padding method used to pad out-of-range samples for deriving gradients in PROF.
8. The proposed method can also be applied to other codec tools that rely on gradient computation.
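To make items 1-4 concrete, the following Python sketch shows the aligned [−1, 2, −1] gradient of items 1-2 together with a sub-block level evaluation in the spirit of item 3, computing at every other column and copying to the skipped neighbor. R is assumed to be indexed as R[x][y], and the every-other-column selection pattern is an illustrative choice, not a normative one:

```python
def Shift(x, s, off=0):               # Shift(x, s) as defined earlier
    return (x + off) >> s

def grad_h(R, x, y, off1=1, off2=1, prec=0):
    """Items 1-2: g_h = Shift(2*R(x,y) - R(x-off1,y) - R(x+off2,y), prec)."""
    return Shift(2 * R[x][y] - R[x - off1][y] - R[x + off2][y], prec)

def grad_v(R, x, y, off1=1, off2=1, prec=0):
    """Items 1-2: g_v = Shift(2*R(x,y) - R(x,y-off1) - R(x,y+off2), prec)."""
    return Shift(2 * R[x][y] - R[x][y - off1] - R[x][y + off2], prec)

def subblock_grad_h(R, W, H):
    """Item 3: evaluate g_h only at selected samples (here: every other x) and
    copy each value to the skipped horizontal neighbor (items 3.b/3.c)."""
    g = {}
    for y in range(1, H - 1):
        for x in range(1, W - 1, 2):
            g[(x, y)] = grad_h(R, x, y)
            g[(x + 1, y)] = g[(x, y)]  # copied from the computed neighbor
    return g
```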
The above-described examples may be incorporated into the context of the methods described below (e.g., method 1100 and method 1150), which may be implemented at a video decoder or video encoder.
Fig. 11A shows a flowchart of an exemplary method for video processing. The method 1100 includes: in step 1102, for a current video block, a first derivation process for deriving first gradient values used in a first codec tool is configured based on a second derivation process for deriving second gradient values used in a second codec tool different from the first codec tool.
The method 1100 includes: in step 1104, the current video block is reconstructed from the corresponding bitstream representation based on the first derivation process and the first codec tool. In some embodiments, at least one of the first codec tool and the second codec tool involves a pixel filtering process, and the first gradient value and the second gradient value indicate directional changes in light intensity or color components over a subset of samples of the current video block.
In some embodiments, the pixel filtering process is an adaptive loop filtering process.
In some embodiments, the first codec tool is a bi-directional optical flow (BIO) refinement and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is a Prediction Refinement (PROF) process that employs optical flow, and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a bi-directional optical flow (BIO) refinement.
In some embodiments, the first and second derivation processes include either vertical gradient value calculations or horizontal gradient value calculations. In an example, the vertical gradient value calculation or the horizontal gradient value calculation is based on a [-1, 2, -1] filter.
In some embodiments, the first derivation process includes sub-block level gradient value calculation. In an example, the first derivation process is not applied to each of the samples of the current video block.
Fig. 11B shows a flowchart of an exemplary method for video processing. The method 1150 includes: in step 1152, for the current video block, a first padding process used in a first codec tool is configured, the configuration being based on a second padding process used in a second codec tool different from the first codec tool.
The method 1150 includes: in step 1154, the current video block is reconstructed from the corresponding bitstream representation based on the first padding process and the first codec tool. In some embodiments, the first padding process and the second padding process include adding out-of-range samples, used by a subset of samples in the current video block, to the calculation of gradient values indicative of directional changes in light intensity or color components.
In some embodiments, the first codec tool is a bi-directional optical flow (BIO) refinement and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a bi-directional optical flow (BIO) refinement.
In some embodiments, the first codec tool is a bi-directional optical flow (BIO) refinement and the second codec tool is a Predictive Refinement (PROF) process employing optical flow.
In some embodiments, the first codec tool is a Predictive Refinement (PROF) process that employs optical flow, and the second codec tool is a bi-directional optical flow (BIO) refinement.
In some embodiments, the first codec tool is a Prediction Refinement (PROF) process that employs optical flow, and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a Predictive Refinement (PROF) process that employs optical flow.
10 Example implementations of the disclosed technology
Fig. 12 is a block diagram of a video processing apparatus 1200. The apparatus 1200 may be used to implement one or more of the methods described herein. The apparatus 1200 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, or the like. The apparatus 1200 may include one or more processors 1202, one or more memories 1204, and video processing hardware 1206. The processor(s) 1202 may be configured to implement one or more methods described herein (including, but not limited to, methods 1100 and 1150). Memory(s) 1204 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 1206 may be used to implement some of the techniques described in this document in hardware circuitry.
In some embodiments, these video codec methods may be implemented using an apparatus implemented on a hardware platform as described with respect to fig. 12.
Fig. 13 is a flow chart of an example method 1300 of video processing. The method 1300 includes: configuring (1302), for a transition between a first block of video and a bitstream representation of the first block, a first derivation process for deriving first gradient values used in a first codec tool to be aligned with a second derivation process for deriving second gradient values used in a second codec tool different from the first codec tool; and performing (1304) the conversion based on the configured first derivation process.
In some embodiments, all or part of the first derivation process is aligned with corresponding all or part of the second derivation process.
In some embodiments, the first derivation process and the second derivation process comprise the same vertical gradient value calculation for calculating vertical gradient values and/or the same horizontal gradient value calculation for calculating horizontal gradient values.
In some embodiments, the first codec tool is a bi-directional optical flow (BDOF) refinement and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is the ALF process and the second codec tool is the BDOF refinement.
In some embodiments, the vertical gradient value calculation and/or the horizontal gradient value calculation is based on a [-1, 2, -1] filter.
In some embodiments, the vertical gradient value (gv) in BDOF and/or the horizontal gradient value (gh) in BDOF are calculated using the function Shift(x, n), which is defined as
Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
In some embodiments, the vertical gradient value (gv) in BDOF is calculated as follows:
gv=Shift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in BDOF is calculated as follows:
gh=Shift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, off1 = off2 = 1 and prec = 0.
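As an illustrative sketch (the C function names shift_rnd, bdof_grad_v, and bdof_grad_h, the row-by-row sample layout, and 32-bit arithmetic are assumptions, not part of the specification), the Shift-based BDOF gradients above, with off1 = off2 = 1 and prec = 0, might look like:

    #include <stdint.h>

    /* Shift(x, n) with the rounding option offset0 = (1 << n) >> 1; for
     * n = 0 the offset is 0, so the value passes through unchanged. */
    static int32_t shift_rnd(int32_t x, int n) {
        return (x + ((1 << n) >> 1)) >> n;
    }

    /* BDOF gradients at (k, l) from a predicted sample array R laid out
     * row by row with the given stride, using the [-1, 2, -1] formulas.
     * R is assumed padded by one sample on each side. */
    static int32_t bdof_grad_v(const int32_t *R, int stride, int k, int l) {
        return shift_rnd(2 * R[l * stride + k]
                         - R[(l - 1) * stride + k]
                         - R[(l + 1) * stride + k], 0);
    }

    static int32_t bdof_grad_h(const int32_t *R, int stride, int k, int l) {
        return shift_rnd(2 * R[l * stride + k]
                         - R[l * stride + (k - 1)]
                         - R[l * stride + (k + 1)], 0);
    }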
In some embodiments, the vertical gradient value (gv) in BDOF and/or the horizontal gradient value (gh) in BDOF are calculated using a function SatShift(x, n), which is defined as
SatShift(x,n)=(x+offset0)>>n, if x>=0,
SatShift(x,n)=-((-x+offset1)>>n), if x<0,
where x is a variable, offset0 and/or offset1 are set to (1<<n)>>1 or (1<<(n-1)), or offset0 and/or offset1 are set to 0, or offset0 = offset1 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
In some embodiments, the vertical gradient value (gv) in BDOF is calculated as follows:
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in BDOF is calculated as follows:
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, off1 = off2 = 1 and prec = 0.
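A minimal sketch of SatShift as reconstructed above (the two-branch structure follows the definition; choosing offset0 = offset1 = (1 << n) >> 1 is one assumption among the listed options):

    #include <stdint.h>

    /* Sign-symmetric rounding right shift: negative inputs are negated,
     * shifted with the same rounding offset, and negated back. */
    static int32_t sat_shift(int32_t x, int n) {
        int32_t offset = (1 << n) >> 1;   /* offset0 == offset1 here */
        if (x >= 0)
            return (x + offset) >> n;
        return -((-x + offset) >> n);
    }

Unlike Shift, which rounds toward negative infinity for negative inputs under an arithmetic right shift (e.g., Shift(-3, 1) = -1 with offset0 = 1), SatShift treats the magnitude symmetrically (SatShift(-3, 1) = -2).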
In some embodiments, the first codec tool is a Prediction Refinement (PROF) process employing optical flow applied to affine codec blocks, and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the vertical gradient value calculation and/or the horizontal gradient value calculation is based on a [-1, 2, -1] filter.
In some embodiments, the vertical gradient value (gv) in the PROF and/or the horizontal gradient value (gh) in the PROF are calculated using the function Shift(x, n), which is defined as
Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
In some embodiments, the vertical gradient value (gv) in the PROF is calculated as follows:
gv=Shift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in the PROF is calculated as follows:
gh=Shift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, off1 = off2 = 1 and prec = 0.
In some embodiments, satShift (x, n) is defined as by calculating the vertical gradient value (g v) in PROF and/or the horizontal gradient value (g h) in PROF using a function SatShift (x, n)
Where x is a variable, offset0 and/or offset1 is set to (1 < < n) > >1 or (1 < < (n-1)), or offset0 and/or offset1 is set to 0, or offset0 = offset 1= ((1 < < n) > > 1) -1 or ((1 < < (n-1))) -1, n is an integer.
In some embodiments, the vertical gradient value (gv) in the PROF is calculated as follows:
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in the PROF is calculated as follows:
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, off1 = off2 = 1 and prec = 0.
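For illustration (a non-normative sketch; the subblock dimensions, the one-sample padding margin, and the array names are assumptions), the PROF gradients can be evaluated for every sample of an affine subblock with the formulas above, reusing sat_shift from the earlier sketch:

    /* PROF gradients over a w x h subblock of the prediction P (assumed
     * padded by one sample on each side), with off1 = off2 = 1 and
     * prec = 0. gh and gv are w x h output arrays. */
    static void prof_gradients(const int32_t *P, int stride, int w, int h,
                               int32_t *gh, int32_t *gv) {
        for (int l = 0; l < h; l++) {
            for (int k = 0; k < w; k++) {
                gh[l * w + k] = sat_shift(2 * P[l * stride + k]
                                          - P[l * stride + (k - 1)]
                                          - P[l * stride + (k + 1)], 0);
                gv[l * w + k] = sat_shift(2 * P[l * stride + k]
                                          - P[(l - 1) * stride + k]
                                          - P[(l + 1) * stride + k], 0);
            }
        }
    }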
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a bi-directional optical flow (BDOF) refinement.
In some embodiments, the vertical gradient value calculation and/or the horizontal gradient value calculation is based on a [1, -1] filter.
In some embodiments, the vertical gradient value (gv) in the ALF and/or the horizontal gradient value (gh) in the ALF are calculated using the function Shift(x, n), which is defined as
Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
In some embodiments, the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=Shift(R(x,y+off1)-R(x,y-off2),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=Shift(|R(x,y+off1)-R(x,y-off2)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=Shift(R(x+1,y)-R(x-1,y),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
In some embodiments, the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=Shift(|R(x+1,y)-R(x-1,y)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
In some embodiments, off1 = off2 = 1 and prec = 0.
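As a sketch under the same assumptions as before (the function names and sample layout are illustrative), the ALF gradients matching the [1, -1]-style formulas above, with off1 = off2 = 1 and prec = 0, could be:

    /* ALF gradients at (x, y): two-sided first differences rather than
     * the [-1, 2, -1] second difference used in the BDOF/PROF sketches.
     * R is assumed padded by one sample on each side. */
    static int32_t alf_grad_v(const int32_t *R, int stride, int x, int y) {
        return shift_rnd(R[(y + 1) * stride + x]
                         - R[(y - 1) * stride + x], 0);
    }

    static int32_t alf_grad_h(const int32_t *R, int stride, int x, int y) {
        return shift_rnd(R[y * stride + (x + 1)]
                         - R[y * stride + (x - 1)], 0);
    }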
In some embodiments, satShift (x, n) is defined as by calculating a vertical gradient value (g v) in ALF and/or a horizontal gradient value (g h) in ALF using a function SatShift (x, n)
Where x is a variable, offset0 and/or offset1 is set to (1 < < n) > >1 or (1 < < (n-1)), or offset0 and/or offset1 is set to 0, or offset0 = offset 1= ((1 < < n) > > 1) -1 or ((1 < < (n-1))) -1, n is an integer.
In some embodiments, the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=SatShift(R(x,y+off1)-R(x,y-off2),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=SatShift(|R(x,y+off1)-R(x,y-off2)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
In some embodiments, the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=SatShift(R(x+1,y)-R(x-1,y),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
In some embodiments, the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=SatShift(|R(x+1,y)-R(x-1,y)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
In some embodiments, off1 = off2 = 1 and prec = 0.
In some embodiments, the vertical gradient values for all or part of the samples in the first block are calculated in the same manner as the vertical gradient values in BDOF, and are averaged to obtain the vertical gradient value for the first block used in the ALF.
In some embodiments, the horizontal gradient values for all or part of the samples in the first block are calculated in the same manner as the horizontal gradient values in BDOF, and are averaged to obtain the horizontal gradient value for the first block used in the ALF.
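A sketch of this block-level derivation (plain averaging with truncating integer division and the function name are illustrative choices; the embodiment only requires BDOF-style per-sample gradients followed by averaging):

    /* Block-level ALF vertical gradient: per-sample gradients computed
     * the BDOF way ([-1, 2, -1]) and averaged over the w x h block.
     * R is assumed padded by one sample on each side. */
    static int32_t alf_block_grad_v(const int32_t *R, int stride,
                                    int w, int h) {
        int64_t sum = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                sum += sat_shift(2 * R[y * stride + x]
                                 - R[(y - 1) * stride + x]
                                 - R[(y + 1) * stride + x], 0);
        return (int32_t)(sum / (w * h));
    }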
Fig. 14 is a flow chart of an example method 1400 of video processing. The method 1400 includes: deriving (1402) gradient values for use in one or more codec tools by applying a sub-block level gradient calculation procedure for a transition between a first block and a bitstream representation of the first block of video, wherein the gradient values are derived for partial samples within a predicted block of the first block; and performing (1404) the conversion based on the derived gradient values.
In some embodiments, the one or more codec tools include at least one of bi-directional optical flow (BDOF) refinement, predictive Refinement (PROF) procedures employing optical flow, and other non-ALF codec tools.
In some embodiments, gradient values are derived using only samples at selected coordinates.
In some embodiments, when gradient values are not calculated for samples at certain coordinates, the gradient values associated with those samples are copied from the gradient values of neighboring samples at which gradient values were calculated.
In some embodiments, how the gradient values are copied from the selected sample points to the remaining sample points depends on the gradient direction, which includes at least one of a horizontal gradient or a vertical gradient.
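For example (a sketch; computing on even rows only and copying downward are assumptions chosen for illustration, not mandated selections), a sub-block-level vertical gradient derivation could look like:

    /* Gradients are computed only at even rows; each odd row copies the
     * gradient of the sample directly above, i.e., its nearest computed
     * neighbor in the vertical direction. R is assumed padded by one
     * sample on each side; gv is a w x h output array. */
    static void subsampled_grad_v(const int32_t *R, int stride,
                                  int w, int h, int32_t *gv) {
        for (int y = 0; y < h; y += 2)
            for (int x = 0; x < w; x++)
                gv[y * w + x] = sat_shift(2 * R[y * stride + x]
                                          - R[(y - 1) * stride + x]
                                          - R[(y + 1) * stride + x], 0);
        for (int y = 1; y < h; y += 2)
            for (int x = 0; x < w; x++)
                gv[y * w + x] = gv[(y - 1) * w + x];
    }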
Fig. 15 is a flow chart of an example method 1500 of video processing. The method 1500 includes: configuring (1502) a first padding process in a first codec tool to be aligned with a second padding process in a second codec tool different from the first codec tool for a transition between a first block of video and a bitstream representation of the first block, wherein the first padding process is used to pad samples outside a range of gradient values used in the first codec tool and the second padding process is used to pad samples outside the range of gradient values used in the second codec tool; and performing (1504) the conversion based on the configured first filling procedure.
In some embodiments, all or part of the first filling process is aligned with the corresponding all or part of the second filling process.
In some embodiments, the first codec tool is a bi-directional optical flow (BDOF) refinement and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a bi-directional optical flow (BDOF) refinement.
In some embodiments, the first codec tool is a bi-directional optical flow (BDOF) refinement and the second codec tool is a Predictive Refinement (PROF) process that employs optical flow.
In some embodiments, the first codec tool is a Predictive Refinement (PROF) process that employs optical flow, and the second codec tool is a bi-directional optical flow (BDOF) refinement.
In some embodiments, the first codec tool is a Prediction Refinement (PROF) process that employs optical flow, and the second codec tool is an Adaptive Loop Filtering (ALF) process.
In some embodiments, the first codec tool is an Adaptive Loop Filtering (ALF) process and the second codec tool is a Predictive Refinement (PROF) process that employs optical flow.
In some embodiments, the first codec tool and/or the second codec tool comprises a codec tool that relies on the computation of gradient values.
In some embodiments, the conversion generates a first block of video from the bitstream representation.
In some embodiments, the conversion generates a bitstream representation from a first block of video.
From the foregoing, it will be appreciated that specific embodiments of the technology disclosed herein have been described for purposes of illustration, but that various modifications may be made without deviating from the scope of the invention. Accordingly, the techniques of the present disclosure are not limited except by the appended claims.
Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a storage device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, an apparatus may include code that creates an execution environment for a computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. These processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the use of "or" is intended to include "and/or" unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although certain features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are illustrated in a particular order in the figures, this should not be understood as requiring that such operations be performed in sequential order or a particular order as illustrated, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described, and other implementations, enhancements, and variations may be made based on what is described and illustrated in this patent document.

Claims (43)

1. A method for processing video, comprising:
For a transition between a first block of video and a bitstream of the first block, configuring a first derivation process for deriving first gradient values used in a first codec tool to be aligned with a second derivation process for deriving second gradient values used in a second codec tool different from the first codec tool; and
Performing the conversion based on the configured first derivation process;
wherein the first codec tool is a bi-directional optical flow, BDOF, refinement process and the second codec tool is an adaptive loop filter, ALF, process, or
The first codec tool is the ALF process and the second codec tool is the BDOF refinement process, or
The first codec tool is a prediction refinement, PROF, process using optical flow applied to affine codec blocks, and the second codec tool is the ALF process; and
Wherein gradient values used in the BDOF refinement process or the PROF process are derived by applying a sub-block-level gradient calculation process, the gradient values being derived for part of the samples within the prediction block of the first block.
2. The method of claim 1, wherein all or part of the first derivation process is aligned with a corresponding all or part of the second derivation process.
3. The method of claim 1, wherein the first and second derivation processes comprise the same vertical gradient value calculation for calculating vertical gradient values and/or the same horizontal gradient value calculation for calculating horizontal gradient values.
4. A method according to claim 3, wherein the vertical gradient value calculation and/or the horizontal gradient value calculation is based on a [-1, 2, -1] filter.
5. The method of claim 4, wherein the vertical gradient value (gv) in BDOF and/or the horizontal gradient value (gh) in BDOF are calculated using a function Shift(x, n), the Shift(x, n) being defined as
Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
6. The method of claim 5, wherein the vertical gradient value (gv) in BDOF is calculated as follows:
gv=Shift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
7. The method of claim 5, wherein the horizontal gradient value (gh) in BDOF is calculated as follows:
gh=Shift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
8. The method of claim 6 or 7, wherein off1 = off2 = 1 and prec = 0.
9. The method of claim 4, wherein the vertical gradient value (gv) in BDOF and/or the horizontal gradient value (gh) in BDOF are calculated by using a function SatShift(x, n), SatShift(x, n) being defined as
SatShift(x,n)=(x+offset0)>>n, if x>=0,
SatShift(x,n)=-((-x+offset1)>>n), if x<0,
where x is a variable, offset0 and/or offset1 are set to (1<<n)>>1 or (1<<(n-1)), or offset0 and/or offset1 are set to 0, or offset0 = offset1 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
10. The method of claim 9, wherein the vertical gradient value (gv) in BDOF is calculated as follows:
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
11. The method of claim 9, wherein the horizontal gradient value (gh) in BDOF is calculated as follows:
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
12. The method of claim 10 or 11, wherein off1 = off2 = 1 and prec = 0.
13. The method according to claim 4, wherein the vertical gradient value (gv) in the PROF and/or the horizontal gradient value (gh) in the PROF are calculated by using a function Shift(x, n), the Shift(x, n) being defined as
Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
14. The method of claim 13, wherein the vertical gradient value (gv) in the PROF is calculated as follows:
gv=Shift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
15. The method of claim 13, wherein the horizontal gradient value (gh) in the PROF is calculated as follows:
gh=Shift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
16. The method of claim 14 or 15, wherein off1 = off2 = 1 and prec = 0.
17. The method according to claim 4, wherein the vertical gradient value (gv) in the PROF and/or the horizontal gradient value (gh) in the PROF are calculated by using a function SatShift(x, n), SatShift(x, n) being defined as
SatShift(x,n)=(x+offset0)>>n, if x>=0,
SatShift(x,n)=-((-x+offset1)>>n), if x<0,
where x is a variable, offset0 and/or offset1 are set to (1<<n)>>1 or (1<<(n-1)), or offset0 and/or offset1 are set to 0, or offset0 = offset1 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
18. The method of claim 17, wherein the vertical gradient value (gv) in the PROF is calculated as follows:
gv=SatShift(2R(k,l)-R(k,l-off1)-R(k,l+off2),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
19. The method of claim 17, wherein the horizontal gradient value (gh) in the PROF is calculated as follows:
gh=SatShift(2R(k,l)-R(k-off1,l)-R(k+off2,l),prec),
where R(i, j) indicates the reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
20. The method of claim 18 or 19, wherein off1 = off2 = 1 and prec = 0.
21. A method according to claim 3, wherein the vertical gradient value calculation and/or the horizontal gradient value calculation is based on a [1, -1] filter.
22. The method according to claim 21, wherein the vertical gradient value (gv) in the ALF and/or the horizontal gradient value (gh) in the ALF are calculated by using a function Shift(x, n), which is defined as Shift(x,n)=(x+offset0)>>n,
where x is a variable, offset0 is set to (1<<n)>>1 or (1<<(n-1)), or offset0 is set to 0, or offset0 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
23. The method of claim 22, wherein the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=Shift(R(x,y+off1)-R(x,y-off2),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
24. The method of claim 22, wherein the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=Shift(|R(x,y+off1)-R(x,y-off2)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
25. The method of claim 22, wherein the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=Shift(R(x+1,y)-R(x-1,y),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
26. The method of claim 22, wherein the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=Shift(|R(x+1,y)-R(x-1,y)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
27. The method of any of claims 23-26, wherein off1 = off2 = 1 and prec = 0.
28. The method of claim 21, wherein the vertical gradient value (gv) in the ALF and/or the horizontal gradient value (gh) in the ALF are calculated by using a function SatShift(x, n), SatShift(x, n) being defined as
SatShift(x,n)=(x+offset0)>>n, if x>=0,
SatShift(x,n)=-((-x+offset1)>>n), if x<0,
where x is a variable, offset0 and/or offset1 are set to (1<<n)>>1 or (1<<(n-1)), or offset0 and/or offset1 are set to 0, or offset0 = offset1 = ((1<<n)>>1)-1 or ((1<<(n-1)))-1, and n is an integer.
29. The method of claim 28, wherein the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=SatShift(R(x,y+off1)-R(x,y-off2),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
30. The method of claim 28, wherein the vertical gradient value (gv) in the ALF is calculated as follows:
gv(x,y)=SatShift(|R(x,y+off1)-R(x,y-off2)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variables off1, off2, and prec are integers.
31. The method of claim 28, wherein the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=SatShift(R(x+1,y)-R(x-1,y),prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
32. The method of claim 28, wherein the horizontal gradient value (gh) in the ALF is calculated as follows:
gh(x,y)=SatShift(|R(x+1,y)-R(x-1,y)|,prec),
where x and y are variables, R(i, j) indicates reconstructed or predicted samples at coordinates (i, j), and variable prec is an integer.
33. The method of any of claims 29-32, wherein off1 = off2 = 1 and prec = 0.
34. A method according to claim 3, wherein the vertical gradient values of all or part of the samples in the first block are calculated in the same way as the vertical gradient values in BDOF, and are averaged to obtain the vertical gradient value for the first block used in the ALF.
35. A method according to claim 3, wherein the horizontal gradient values of all or part of the samples in the first block are calculated in the same way as the horizontal gradient values in BDOF, and are averaged to obtain the horizontal gradient value for the first block used in the ALF.
36. The method of claim 1, wherein gradient values are derived using only samples at selected coordinates.
37. The method of claim 36, wherein, when gradient values are not calculated for samples at certain coordinates, the gradient values associated with the samples at the certain coordinates are copied from gradient values associated with neighboring samples at which the gradient values were calculated.
38. The method of claim 37, wherein how the gradient values are copied from the selected sample points to those remaining sample points depends on a gradient direction comprising at least one of a horizontal gradient or a vertical gradient.
39. The method of claim 1, wherein the first codec tool and/or the second codec tool comprises a codec tool that relies on the calculation of gradient values.
40. The method of claim 1, wherein the converting generates the first block of video from the bitstream.
41. The method of claim 1, wherein the converting generates the bitstream from a first block of the video.
42. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1-41.
43. A non-transitory computer readable medium storing instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-41.