WO2024088003A1 - Method and apparatus of position-aware reconstruction in in-loop filtering

Publication number: WO2024088003A1 (application PCT/CN2023/121900)
Inventors: Shih-Chun Chiu, Ching-Yeh Chen
Applicant: Mediatek Inc.

Brief description of the drawings

  • Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
  • Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
  • Fig. 2 illustrates the ALF filter shapes for the chroma (left) and luma (right) components.
  • Figs. 3A-D illustrate the subsampled Laplacian calculations for g_v (Fig. 3A), g_h (Fig. 3B), g_{d1} (Fig. 3C) and g_{d2} (Fig. 3D).
  • Fig. 4A illustrates the placement of CC-ALF with respect to the other loop filters.
  • Fig. 4B illustrates a diamond-shaped filter for the chroma samples.
  • Fig. 5 illustrates an example of the Bilateral Filter (BIF) in ECM, where both BIF and SAO use samples from the deblocking stage as input and the outputs are combined and clipped before proceeding to ALF.
  • Fig. 6 shows the naming convention for the samples surrounding the centre sample I_C for BIF processing.
  • Fig. 7 illustrates an example of partitioning a coding region into M×N blocks, where the samples in each M×N block are divided into K (K ≤ M×N) groups for selecting K at least partially different in-loop filters.
  • Fig. 9 illustrates a flowchart of an exemplary video coding system that utilizes position-aware reconstruction in in-loop filtering according to an embodiment of the present invention.

Detailed description of embodiments

In the present invention, position-aware reconstruction is disclosed to improve the performance of ALF. In the VVC in-loop filtering stage, a sample is processed by deblocking filtering, SAO, and ALF. In ECM, a sample is processed by deblocking filtering, SAO, BIF, and ALF, with more complicated SAO and ALF designs. It is therefore proposed to consider the positional information in the reconstruction process of in-loop filtering. In this disclosure, the reconstruction process of in-loop filtering is also referred to as the filtering process using an in-loop filter.

First, a horizontal period M and a vertical period N are determined. The samples in a current coding region are divided into K groups based on M and N, and the samples in different groups undergo at least partially different in-loop filtering mechanisms. The difference can be different classification rules in SAO, different offsets in SAO, different lookup tables (LUT_ROW) in BIF, different intermediate filtering result (m_sum, c_v, or ΔI_BIF) derivation methods in BIF, different classification rules in ALF, different sets of coefficients in ALF, or different on/off control in the in-loop filtering process.

In Fig. 7, each small square represents a sample and the number in the small square is the positional group index assigned based on the positional information of the sample. The same positional group number is assigned to samples at the same corresponding position within the M×N blocks. For example, the positional group number assigned to all sample locations at (0, 0) of the M×N blocks in the current coding region is 0, the positional group number assigned to all sample locations at (0, 1) or (1, 0) of the M×N blocks is 1, the positional group number assigned to all sample locations at (2, 0) is 3, the positional group number assigned to all sample locations at (3, 0) is 4, and so on, as illustrated in the sketch below.
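A minimal sketch of this grouping (Python is used for illustration only; the particular group map below is a hypothetical example, since the mapping from in-block offsets to group numbers is a codec design choice):

    def positional_group(x, y, M, N, group_map):
        """Return the positional group index of the sample at (x, y)."""
        return group_map[(x % M, y % N)]

    # Example with M = N = 2 and K = 3 groups, where offsets (0, 1) and (1, 0)
    # share one group, mirroring the text's example of K < MxN.
    group_map = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 2}
    print(positional_group(5, 7, 2, 2, group_map))  # (5 % 2, 7 % 2) = (1, 1) -> group 2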
In one embodiment, for SAO, the offset derivation process for each sample is affected by the positional group index. For example, 6 additional offsets c_M, c_{M+1}, ..., c_{M+5} are signalled, and one or more of them will be added to the existing SAO offset for each sample according to the positional group index. Denoting the existing SAO offset for the sample at (x, y) as c_SAO(x, y), the SAO reconstruction for samples with positional group index g adds the signalled group-dependent offset on top of c_SAO(x, y), as sketched below.
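A minimal sketch of this offset addition (Python; using exactly one group-dependent offset per sample and clipping to the sample range are assumptions, since the text leaves these details open):

    def position_aware_sao(rec, c_sao, extra_offsets, M, N, group_map, bit_depth=10):
        """Add the existing SAO offset plus a signalled per-group offset, then clip."""
        max_val = (1 << bit_depth) - 1
        out = [row[:] for row in rec]
        for y, row in enumerate(rec):
            for x, sample in enumerate(row):
                g = group_map[(x % M, y % N)]          # positional group index
                val = sample + c_sao[y][x] + extra_offsets[g]
                out[y][x] = min(max(val, 0), max_val)  # clip to sample range
        return out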
In another embodiment, one new classifier, which classifies the to-be-processed samples into different classes according to the positions of the to-be-processed samples, is added in SAO. That is, one additional mode can be used in SAO compared to the original design. In yet another embodiment, the samples in some specific group(s) are not filtered by SAO.
For ALF, the filtering process for each sample is affected by the positional group index; for example, the terms of the ALF reconstruction equation can be adapted according to the positional group index.
Additional flags can be signalled at CTB, slice, or APS level to indicate the period selection (i.e., M and N). Furthermore, additional flags can be signalled at CTB, slice, or APS level to indicate whether to use the positional information or not. If positional information is not used, the reconstruction process of in-loop filtering will follow the existing design.

In another embodiment, such position-aware reconstruction is a standalone process rather than being integrated with the existing in-loop filtering tools. This process can be placed between deblocking filtering and SAO/BIF, between SAO/BIF and ALF, or after ALF. Alternatively, this process can be executed in parallel with one of the current in-loop filtering stages, e.g. the deblocking filter, SAO/BIF, or ALF. In that case, the final output will be the combination of this process and the one in-loop filtering stage that is performed in parallel. In one example, this process is added into the same stage as SAO/BIF. The reconstructed picture will then be restored by three kinds of in-loop filtering, SAO, BIF and the proposed position-aware reconstruction, in parallel, and the final output of this stage will be the combination of the outputs from these three in-loop filters.
Any of the position-aware reconstruction in in-loop filtering methods described above can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in the in-loop filter module (e.g. ILPF 130 in Fig. 1A and Fig. 1B) of an encoder or a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter coding module of an encoder and/or the motion compensation module or a merge candidate derivation module of the decoder. The proposed methods may also be implemented using executable software or firmware code stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).

Fig. 9 illustrates a flowchart of an exemplary video coding system that utilizes position-aware reconstruction in in-loop filtering according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program code executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
According to this flowchart, reconstructed pixels associated with a current coding region are received in step 910. A horizontal period M and a vertical period N are determined in step 920, wherein M and N are positive integers. Samples in the current coding region are divided into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N in step 930, wherein K is a positive integer smaller than or equal to M×N. K in-loop filters are determined for the K positional groups in step 940, wherein any two of the K in-loop filters are different in at least one part of the filtering process. In step 950, a target filtered output for a target sample of the current coding region having a target positional group number is derived by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number. Filtered-reconstructed pixels are provided in step 960, wherein the filtered-reconstructed pixels comprise the target filtered output. The overall flow is summarized in the sketch below.
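A minimal sketch of steps 930 to 960 (Python; `filters` is a hypothetical list of the K group-specific in-loop filters, each taking the reconstructed picture and a sample position):

    def position_aware_in_loop_filter(rec, M, N, group_map, filters):
        """Group each sample by position and filter it with the in-loop
        filter selected by its positional group number."""
        h, w = len(rec), len(rec[0])
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                g = group_map[(x % M, y % N)]        # step 930: group number
                out[y][x] = filters[g](rec, x, y)    # step 950: group-selected filter
        return out                                   # step 960: filtered pixels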
Embodiments of the present invention as described above may be implemented in various hardware, software code, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software code, and other means of configuring code to perform the tasks in accordance with the invention, will not depart from the spirit and scope of the invention.


Abstract

A method and apparatus for video coding using position-aware in-loop filtering. According to the method, a horizontal period M and a vertical period N are determined. Samples in the current coding region are divided into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N. K in-loop filters are determined for the K positional groups, wherein any two of the K in-loop filters are different in at least one part of the filtering process. A target filtered output for a target sample of the current coding region having a target positional group number is derived by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number. Filtered-reconstructed pixels are provided, wherein the filtered-reconstructed pixels comprise the target filtered output.

Description

METHOD AND APPARATUS OF POSITION-AWARE RECONSTRUCTION IN IN-LOOP FILTERING
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/380,591, filed on October 24, 2022. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates to in-loop filters in a video coding system. In particular, the present invention relates to techniques that use a position-aware in-loop filtering process to improve the coding performance.
BACKGROUND AND RELATED ART
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC was developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and to handle various types of video sources, including 3-dimensional (3D) video signals.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data. Switch 114 selects Intra Prediction 110 or Inter Prediction 112, and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with the loop filters applied to the underlying image area. The side information associated with Intra Prediction 110, Inter Prediction 112 and In-loop Filter 130 is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct the video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this series of processing. Accordingly, In-loop Filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, a deblocking filter (DF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, In-loop Filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the Reference Picture Buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
The decoder, as shown in Fig. 1B, can use similar functional blocks or a portion of the same functional blocks as the encoder, except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information). The Intra Prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to the Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to the Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
According to VVC, an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC. Each CTU can be partitioned into one or multiple smaller-size coding units (CUs). The resulting CU partitions can be in square or rectangular shapes. Also, VVC divides a CTU into prediction units (PUs) as units to apply the prediction process, such as Inter prediction, Intra prediction, etc.
Adaptive Loop Filter in VVC
In VVC, an Adaptive Loop Filter (ALF) with block-based filter adaption is applied. For the luma component, one filter is selected among 25 filters for each 4×4 block, based on the direction and activity of local gradients.
Filter shape
Two diamond filter shapes (as shown in Fig. 2) are used. The 7×7 diamond shape 220 is applied for the luma component and the 5×5 diamond shape 210 is applied for the chroma components.
Block classification
For the luma component, each 4×4 block is categorized into one out of 25 classes. The classification index C is derived based on its directionality D and a quantized value of activity \hat{A} as follows:

C = 5D + \hat{A}

To calculate D and \hat{A}, gradients of the horizontal, vertical and two diagonal directions are first calculated using the 1-D Laplacian:

g_v = \sum_{k=i-2}^{i+3} \sum_{l=j-2}^{j+3} V_{k,l}, \quad V_{k,l} = |2R(k,l) - R(k,l-1) - R(k,l+1)|

g_h = \sum_{k=i-2}^{i+3} \sum_{l=j-2}^{j+3} H_{k,l}, \quad H_{k,l} = |2R(k,l) - R(k-1,l) - R(k+1,l)|

g_{d1} = \sum_{k=i-2}^{i+3} \sum_{l=j-2}^{j+3} D1_{k,l}, \quad D1_{k,l} = |2R(k,l) - R(k-1,l-1) - R(k+1,l+1)|

g_{d2} = \sum_{k=i-2}^{i+3} \sum_{l=j-2}^{j+3} D2_{k,l}, \quad D2_{k,l} = |2R(k,l) - R(k-1,l+1) - R(k+1,l-1)|

where indices i and j refer to the coordinates of the upper left sample within the 4×4 block and R(i, j) indicates a reconstructed sample at coordinate (i, j).
To reduce the complexity of block classification, the subsampled 1-D Laplacian calculation is applied to the vertical direction (Fig. 3A) and the horizontal direction (Fig. 3B). As shown in Figs. 3C-D, the same subsampled positions are used for the gradient calculation of all directions (g_{d1} in Fig. 3C and g_{d2} in Fig. 3D).
Then the maximum and minimum values of the gradients of the horizontal and vertical directions are set as:

g^{max}_{h,v} = max(g_h, g_v), \quad g^{min}_{h,v} = min(g_h, g_v)

The maximum and minimum values of the gradients of the two diagonal directions are set as:

g^{max}_{d1,d2} = max(g_{d1}, g_{d2}), \quad g^{min}_{d1,d2} = min(g_{d1}, g_{d2})

To derive the value of the directionality D, these values are compared against each other and with two thresholds t_1 and t_2:

Step 1. If both g^{max}_{h,v} \le t_1 \cdot g^{min}_{h,v} and g^{max}_{d1,d2} \le t_1 \cdot g^{min}_{d1,d2} are true, D is set to 0.
Step 2. If g^{max}_{h,v} / g^{min}_{h,v} > g^{max}_{d1,d2} / g^{min}_{d1,d2}, continue from Step 3; otherwise continue from Step 4.
Step 3. If g^{max}_{h,v} > t_2 \cdot g^{min}_{h,v}, D is set to 2; otherwise D is set to 1.
Step 4. If g^{max}_{d1,d2} > t_2 \cdot g^{min}_{d1,d2}, D is set to 4; otherwise D is set to 3.

The activity value A is calculated as:

A = \sum_{k=i-2}^{i+3} \sum_{l=j-2}^{j+3} (V_{k,l} + H_{k,l})

A is further quantized to the range of 0 to 4, inclusively, and the quantized value is denoted as \hat{A}.
For chroma components in a picture, no classification is applied.
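For illustration, the luma classification above can be written as a small routine (Python; the activity quantizer shown is a simplified stand-in, since the exact VTM quantization table is not reproduced in this text):

    def alf_class_index(g_h, g_v, g_d1, g_d2, A, bit_depth=10, t1=2, t2=4.5):
        """Directionality D (Steps 1-4) and class index C = 5*D + A_hat."""
        hv_max, hv_min = max(g_h, g_v), min(g_h, g_v)
        d_max, d_min = max(g_d1, g_d2), min(g_d1, g_d2)
        if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
            d = 0                                    # Step 1
        elif hv_max * d_min > d_max * hv_min:        # Step 2 (ratio compare)
            d = 2 if hv_max > t2 * hv_min else 1     # Step 3
        else:
            d = 4 if d_max > t2 * d_min else 3       # Step 4
        a_hat = min((A * 5) >> (bit_depth + 3), 4)   # simplified A quantizer
        return 5 * d + a_hat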
Geometric transformations of filter coefficients and clipping values
Before filtering each 4×4 luma block, geometric transformations such as rotation or diagonal and vertical flipping are applied to the filter coefficients f (k, l) and to the corresponding filter clipping values c (k, l) depending on gradient values calculated for that block. This is equivalent to applying these transformations to the samples in the filter support region. The idea is to make different blocks to which ALF is applied more similar by aligning their directionality.
Three geometric transformations, including diagonal, vertical flip and rotation, are introduced:

Diagonal: f_D(k, l) = f(l, k), c_D(k, l) = c(l, k)
Vertical flip: f_V(k, l) = f(k, K-l-1), c_V(k, l) = c(k, K-l-1)
Rotation: f_R(k, l) = f(K-l-1, k), c_R(k, l) = c(K-l-1, k)
where K is the size of the filter and 0 ≤ k, l ≤ K-1 are coefficient coordinates, such that location (0, 0) is at the upper left corner and location (K-1, K-1) is at the lower right corner. The transformations are applied to the filter coefficients f(k, l) and to the clipping values c(k, l) depending on the gradient values calculated for that block. The relationship between the transformation and the four gradients of the four directions is summarized in the following table.

Table 1. Mapping of the gradient calculated for one block and the transformations

Gradient values                    Transformation
g_{d2} < g_{d1} and g_h < g_v      No transformation
g_{d2} < g_{d1} and g_v < g_h      Diagonal
g_{d1} < g_{d2} and g_h < g_v      Vertical flip
g_{d1} < g_{d2} and g_v < g_h      Rotation
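Concretely, the three re-indexings can be sketched as follows (Python, operating on a K×K coefficient array given as nested lists; the same functions apply unchanged to the clipping values c(k, l)):

    def diagonal(f):
        K = len(f)
        return [[f[l][k] for l in range(K)] for k in range(K)]          # f_D(k,l) = f(l,k)

    def vertical_flip(f):
        K = len(f)
        return [[f[k][K - 1 - l] for l in range(K)] for k in range(K)]  # f_V(k,l) = f(k,K-l-1)

    def rotation(f):
        K = len(f)
        return [[f[K - 1 - l][k] for l in range(K)] for k in range(K)]  # f_R(k,l) = f(K-l-1,k)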
Filtering process
At the decoder side, when ALF is enabled for a CTB (Coding Tree Block), each sample R(i, j) within the CU is filtered, resulting in sample value R'(i, j) as shown below:

R'(i, j) = R(i, j) + \left( \left( \sum_{k \ne 0} \sum_{l \ne 0} f(k, l) \times K(R(i+k, j+l) - R(i, j), c(k, l)) + 64 \right) >> 7 \right)

where f(k, l) denotes the decoded filter coefficients, K(x, y) is the clipping function and c(k, l) denotes the decoded clipping parameters. The variables k and l vary between -L/2 and L/2, where L denotes the filter length. The clipping function K(x, y) = min(y, max(-y, x)) corresponds to the function Clip3(-y, y, x). The clipping operation introduces non-linearity to make ALF more efficient by reducing the impact of neighbour sample values that are too different from the current sample value.
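A per-sample sketch of this non-linear filtering (Python; `taps` is an assumed list of the non-centre diamond positions with their decoded coefficients and clipping parameters, and the (+64) >> 7 normalization assumes the coefficient norm of 128 described later in this text):

    def clip_k(x, y):
        """K(x, y) = min(y, max(-y, x)), i.e. Clip3(-y, y, x)."""
        return min(y, max(-y, x))

    def alf_filter_sample(R, i, j, taps):
        """taps: iterable of (k, l, f_kl, c_kl) for the non-centre positions."""
        centre = R[i][j]
        acc = 0
        for k, l, f_kl, c_kl in taps:
            acc += f_kl * clip_k(R[i + k][j + l] - centre, c_kl)
        return centre + ((acc + 64) >> 7)   # 7-bit coefficient normalization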
Cross Component Adaptive Loop Filter
CC-ALF uses luma sample values to refine each chroma component by applying an adaptive, linear filter to the luma channel and then using the output of this filtering operation for chroma refinement. Fig. 4A provides a system level diagram of the CC-ALF process with respect to the SAO, luma ALF and chroma ALF processes. As shown in Fig. 4A, each colour component (i.e., Y, Cb and Cr) is processed by its respective SAO (i.e., SAO Luma 410, SAO Cb 412 and SAO Cr 414) . After SAO, ALF Luma 420 is applied to the SAO-processed luma and ALF Chroma 430 is applied to SAO-processed Cb and Cr. However, there is a cross-component term from luma to a chroma component (i.e., CC-ALF Cb 422 and CC-ALF Cr 424) . The outputs from the cross-component ALF are added (using adders 432 and 434 respectively) to the outputs from ALF Chroma 430.
Filtering in CC-ALF is accomplished by applying a linear, diamond-shaped filter (e.g. filters 440 and 442 in Fig. 4B) to the luma channel. In Fig. 4B, a blank circle indicates a luma sample and a dot-filled circle indicates a chroma sample. One filter is used for each chroma channel, and the operation is expressed as:

\Delta I_i(x, y) = \sum_{(x_0, y_0) \in S_i} I_0(x_Y + x_0, y_Y + y_0) \, c_i(x_0, y_0)

where (x, y) is the chroma component i location being refined, (x_Y, y_Y) is the luma location based on (x, y), S_i is the filter support area in the luma component, and c_i(x_0, y_0) represents the filter coefficients. As shown in Fig. 4B, the luma filter support is the region collocated with the current chroma sample after accounting for the spatial scaling factor between the luma and chroma planes.
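A minimal sketch of the refinement for one chroma sample (Python; the 4:2:0 luma collocation and the final 7-bit normalization are assumptions, and `coeffs` is a hypothetical mapping over the diamond support S_i):

    def cc_alf_refine(luma, chroma, x, y, coeffs):
        """Refine chroma sample (x, y) from the collocated luma region.
        coeffs: {(x0, y0): c} over the diamond support S_i (sum of c is 0)."""
        xY, yY = x << 1, y << 1                 # 4:2:0 luma collocation (assumed)
        delta = sum(c * luma[yY + y0][xY + x0] for (x0, y0), c in coeffs.items())
        return chroma[y][x] + (delta >> 7)      # normalization assumed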
In the VVC reference software, CC-ALF filter coefficients are computed by minimizing the mean square error of each chroma channel with respect to the original chroma content. To achieve this, the VTM (VVC Test Model) algorithm uses a coefficient derivation process similar to the one used for chroma ALF. Specifically, a correlation matrix is derived, and the coefficients are computed using a Cholesky decomposition solver in an attempt to minimize a mean square error metric. In designing the filters, a maximum of 8 CC-ALF filters can be designed and transmitted per picture. The resulting filters are then indicated for each of the two chroma channels on a CTU basis.
Additional characteristics of CC-ALF include:
● The design uses a 3x4 diamond shape with 8 taps.
● Seven filter coefficients are transmitted in the APS (Adaptation Parameter Set) .
● Each of the transmitted coefficients has a 6-bit dynamic range and is restricted to power-of-2 values.
● The eighth filter coefficient is derived at the decoder such that the sum of the filter coefficients is equal to 0.
● An APS may be referenced in the slice header.
● CC-ALF filter selection is controlled at CTU-level for each chroma component.
● Boundary padding for the horizontal virtual boundaries uses the same memory access pattern as luma ALF.
As an additional feature, the reference encoder can be configured to enable some basic subjective tuning through the configuration file. When enabled, the VTM attenuates the application of CC-ALF in regions that are coded with high QP and are either near mid-grey or contain a large amount of luma high frequencies. Algorithmically, this is accomplished by disabling the application of CC-ALF in CTUs where any of the following conditions are true:
● The slice QP value minus 1 is less than or equal to the base QP value.
● The number of chroma samples for which the local contrast is greater than (1 << (bitDepth - 2)) - 1 exceeds the CTU height, where the local contrast is the difference between the maximum and minimum luma sample values within the filter support region.
● More than a quarter of chroma samples are in the range between (1 << (bitDepth - 1)) - 16 and (1 << (bitDepth - 1)) + 16.
The motivation for this functionality is to provide some assurance that CC-ALF does not amplify artefacts introduced earlier in the decoding path (this is largely due to the fact that the VTM currently does not explicitly optimize for chroma subjective quality). It is anticipated that alternative encoder implementations may either not use this functionality or incorporate alternative strategies suitable for their encoding characteristics.
Filter parameters signalling
ALF filter parameters are signalled in the Adaptation Parameter Set (APS). In one APS, up to 25 sets of luma filter coefficients and clipping value indexes, and up to eight sets of chroma filter coefficients and clipping value indexes, can be signalled. To reduce bit overhead, filter coefficients of different classifications for the luma component can be merged. In the slice header, the indices of the APSs used for the current slice are signalled.
Clipping value indexes, which are decoded from the APS, allow determining clipping values using a table of clipping values for both luma and chroma components. These clipping values are dependent on the internal bitdepth. More precisely, the clipping values are obtained by the following formula:

AlfClip = \{ \text{round}(2^{B - \alpha n}) : n \in [0 .. N-1] \}

with B equal to the internal bitdepth, α a pre-defined constant value equal to 2.35, and N equal to 4, which is the number of allowed clipping values in VVC. Each AlfClip value is then rounded to the nearest power of 2.
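For example, the formula can be evaluated as follows (Python; interpreting "nearest power of 2" in the log domain is an assumption):

    import math

    def alf_clip_values(bit_depth, alpha=2.35, n_values=4):
        vals = []
        for n in range(n_values):
            v = 2 ** (bit_depth - alpha * n)       # 2^(B - alpha*n)
            vals.append(1 << round(math.log2(v)))  # round to a power of 2
        return vals

    print(alf_clip_values(10))  # -> [1024, 256, 32, 8] under these assumptions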
In the slice header, up to 7 APS indices can be signalled to specify the luma filter sets that are used for the current slice. The filtering process can be further controlled at CTB level. A flag is always signalled to indicate whether ALF is applied to a luma CTB. A luma CTB can choose a filter set among 16 fixed filter sets and the filter sets from APSs. A filter set index is signalled for a luma CTB to indicate which filter set is applied. The 16 fixed filter sets are pre-defined and hard-coded in both the encoder and the decoder.
For the chroma component, an APS index is signalled in the slice header to indicate the chroma filter sets being used for the current slice. At CTB level, a filter index is signalled for each chroma CTB if there is more than one chroma filter set in the APS.
The filter coefficients are quantized with norm equal to 128. In order to restrict the multiplication complexity, a bitstream conformance constraint is applied so that the coefficient value of the non-central positions shall be in the range of -2^7 to 2^7 - 1, inclusive. The central position coefficient is not signalled in the bitstream and is inferred to be equal to 128.
Adaptive Loop Filter in ECM
In ECM (Muhammed Coban, et al., “Algorithm description of Enhanced Compression Model 5 (ECM 5)”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 26th Meeting, by teleconference, 20–29 April 2022, Document JVET-Z2025), some changes have been made to ALF compared to the ALF in the VVC standard, as shown below.
ALF simplification
ALF gradient subsampling and ALF virtual boundary processing are removed. Block size for classification is reduced from 4x4 to 2x2. Filter size for both luma and chroma, for which ALF coefficients are signalled, is increased to 9x9.
ALF with fixed filters
To filter a luma sample, three different classifiers (C0, C1 and C2) and three different sets of filters (F0, F1 and F2) are used. Sets F0 and F1 contain fixed filters, with coefficients trained for classifiers C0 and C1. Coefficients of filters in F2 are signalled. Which filter from a set Fi is used for a given sample is decided by a class Ci assigned to this sample using classifier Ci.
Filtering
At first, two 13x13 diamond-shape fixed filters F_0 and F_1 are applied to derive two intermediate samples R_0(x, y) and R_1(x, y). After that, F_2 is applied to R_0(x, y), R_1(x, y), and neighbouring samples to derive the filtered sample as a weighted sum of clipped differences, where f_{i,j} is the clipped difference between a neighbouring sample and the current sample R(x, y), and g_i is the clipped difference between R_{i-20}(x, y) and the current sample. The filter coefficients c_i, i = 0, ..., 21, are signalled.
Classification
Based on directionality D_i and activity \hat{A}_i, a class C_i is assigned to each 2x2 block, where M_{D,i} represents the total number of directionalities D_i.
As in VVC, values of the horizontal, vertical, and two diagonal gradients are calculated for each sample using the 1-D Laplacian. The sum of the sample gradients within a 4×4 window that covers the target 2×2 block is used for classifier C_0, and the sum of sample gradients within a 12×12 window is used for classifiers C_1 and C_2. The sums of the horizontal, vertical and two diagonal gradients are denoted, respectively, as g_h, g_v, g_{d1} and g_{d2}. The directionality D_i is determined by comparing these values with a set of thresholds. The directionality D_2 is derived as in VVC using thresholds 2 and 4.5.

For D_0 and D_1, the horizontal/vertical edge strength E_{HV} and the diagonal edge strength E_D are calculated first, using the thresholds Th = [1.25, 1.5, 2, 3, 4.5, 8]. Edge strength E_{HV} is 0 if g^{max}_{h,v} ≤ Th[0] · g^{min}_{h,v}; otherwise, E_{HV} is the maximum integer such that g^{max}_{h,v} > Th[E_{HV} - 1] · g^{min}_{h,v}. Edge strength E_D is 0 if g^{max}_{d1,d2} ≤ Th[0] · g^{min}_{d1,d2}; otherwise, E_D is the maximum integer such that g^{max}_{d1,d2} > Th[E_D - 1] · g^{min}_{d1,d2}. When E_{HV} ≥ E_D, i.e., horizontal/vertical edges are dominant, D_i is derived by using Table 2A; otherwise, diagonal edges are dominant and D_i is derived by using Table 2B.
Table 2A. Mapping of E_{HV} and E_D to D_i when horizontal/vertical edges are dominant
Table 2B. Mapping of E_{HV} and E_D to D_i when diagonal edges are dominant
To obtain \hat{A}_i, the sum of the vertical and horizontal gradients A_i is mapped to the range of 0 to n, where n is equal to 4 for \hat{A}_2 and 15 for \hat{A}_0 and \hat{A}_1.
In an ALF_APS, up to 4 luma filter sets are signalled, and each set may have up to 25 filters.
Bilateral Filter in ECM
The filter is carried out in the sample adaptive offset (SAO) loop-filter stage, as shown in Fig. 5. Both the bilateral filter (BIF) 510 and SAO 520 use samples from deblocking as input. Each filter creates an offset per sample, and these are added (via adder 530) to the input sample and then clipped (by Clip module 540), before proceeding to ALF.
In detail, the output sample I_{OUT} is obtained as

I_{OUT} = clip3(I_C + \Delta I_{BIF} + \Delta I_{SAO}),       (1)

where I_C is the input sample from deblocking, \Delta I_{BIF} is the offset from the bilateral filter and \Delta I_{SAO} is the offset from SAO.
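Equation (1) corresponds to the following per-sample operation (a minimal Python sketch; clipping to the valid sample range is assumed):

    def sao_bif_stage(i_c, d_bif, d_sao, bit_depth=10):
        """I_OUT = clip3(I_C + dI_BIF + dI_SAO), clipped to the sample range."""
        return max(0, min((1 << bit_depth) - 1, i_c + d_bif + d_sao))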
The implementation provides the possibility for the encoder to enable or disable filtering at the CTU and slice level. The encoder makes a decision by evaluating the RDO cost.
For CTUs that are filtered, the filtering process proceeds as follows. At the picture border, where samples are unavailable, the bilateral filter uses extension (sample repetition) to fill in the unavailable samples. For virtual boundaries, the behaviour is the same as for SAO, i.e., no filtering occurs. When crossing horizontal CTU borders, the bilateral filter can access the same samples as SAO is accessing. As an example, if the centre sample I_C (see Fig. 6) is located on the top line of a CTU, I_{NW}, I_A and I_{NE} are read from the CTU above, just like SAO does, but I_{AA} is padded, so no extra line buffer is needed compared to JVET-P0073 (Jacob et al., “CE5-3.1 Combination of bilateral filter and SAO”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 16th Meeting: Geneva, CH, 1–11 October 2019, Document JVET-P0073).

The samples surrounding the centre sample I_C are denoted according to Fig. 6, where A, B, L and R stand for above, below, left and right, and where NW, NE, SW, SE stand for the diagonal directions such as north-west (NW). Likewise, AA stands for above-above, BB for below-below, etc. This diamond shape in ECM is different from JVET-P0073, which used a square filter support that did not use I_{AA}, I_{BB}, I_{LL}, or I_{RR}.
Each surrounding sample I_A, I_R, etc. will contribute with a corresponding modifier value m_A, m_R, etc. These are calculated the following way: starting with the contribution from the sample to the right, I_R, we calculate the difference

\Delta I_R = (|I_R - I_C| + 4) >> 3,        (2)

where |·| denotes absolute value. For data that is not 10-bit, we use \Delta I_R = (|I_R - I_C| + 2^{n-6}) >> (n-7) instead, where n = 8 for 8-bit data etc. The resulting value is now clipped so that it is smaller than 16:

sI_R = min(15, \Delta I_R).          (3)

The modifier value is now calculated as

m_R = LUT_ROW[sI_R] if I_R - I_C \ge 0, and m_R = -LUT_ROW[sI_R] otherwise,       (4)

where LUT_ROW[] is an array of 16 values determined by the value of qpb = clip(0, 25, QP + bilateral_filter_qp_offset - 17):
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, } , if qpb = 0
{0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, } , if qpb = 1
{0, 2, 2, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, } , if qpb = 2
{0, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, -1, } , if qpb = 3
{0, 3, 3, 3, 2, 2, 1, 2, 1, 1, 1, 1, 0, 1, 1, -1, } , if qpb = 4
{0, 4, 4, 4, 3, 2, 1, 2, 1, 1, 1, 1, 0, 1, 1, -1, } , if qpb = 5
{0, 5, 5, 5, 4, 3, 2, 2, 2, 2, 2, 1, 0, 1, 1, -1, } , if qpb = 6
{0, 6, 7, 7, 5, 3, 3, 3, 3, 2, 2, 1, 1, 1, 1, -1, } , if qpb = 7
{0, 6, 8, 8, 5, 4, 3, 3, 3, 3, 3, 2, 1, 2, 2, -2, } , if qpb = 8
{0, 7, 10, 10, 6, 4, 4, 4, 4, 3, 3, 2, 2, 2, 2, -2, } , if qpb = 9
{0, 8, 11, 11, 7, 5, 5, 4, 5, 4, 4, 2, 2, 2, 2, -2, } , if qpb = 10
{0, 8, 12, 13, 10, 8, 8, 6, 6, 6, 5, 3, 3, 3, 3, -2, } , if qpb = 11
{0, 8, 13, 14, 13, 12, 11, 8, 8, 7, 7, 5, 5, 4, 4, -2, } , if qpb = 12
{0, 9, 14, 16, 16, 15, 14, 11, 9, 9, 8, 6, 6, 5, 6, -3, } , if qpb = 13
{0, 9, 15, 17, 19, 19, 17, 13, 11, 10, 10, 8, 8, 6, 7, -3, } , if qpb = 14
{0, 9, 16, 19, 22, 22, 20, 15, 12, 12, 11, 9, 9, 7, 8, -3, } , if qpb = 15
{0, 10, 17, 21, 24, 25, 24, 20, 18, 17, 15, 12, 11, 9, 9, -3, } , if qpb = 16
{0, 10, 18, 23, 26, 28, 28, 25, 23, 22, 18, 14, 13, 11, 11, -3, } , if qpb = 17
{0, 11, 19, 24, 29, 30, 32, 30, 29, 26, 22, 17, 15, 13, 12, -3, } , if qpb = 18
{0, 11, 20, 26, 31, 33, 36, 35, 34, 31, 25, 19, 17, 15, 14, -3, } , if qpb = 19
{0, 12, 21, 28, 33, 36, 40, 40, 40, 36, 29, 22, 19, 17, 15, -3, } , if qpb = 20
{0, 13, 21, 29, 34, 37, 41, 41, 41, 38, 32, 23, 20, 17, 15, -3, } , if qpb = 21
{0, 14, 22, 30, 35, 38, 42, 42, 42, 39, 34, 24, 20, 17, 15, -3, } , if qpb = 22
{0, 15, 22, 31, 35, 39, 42, 42, 43, 41, 37, 25, 21, 17, 15, -3, } , if qpb = 23
{0, 16, 23, 32, 36, 40, 43, 43, 44, 42, 39, 26, 21, 17, 15, -3, } , if qpb = 24
{0, 17, 23, 33, 37, 41, 44, 44, 45, 44, 42, 27, 22, 17, 15, -3, } , if qpb = 25
This is different from JVET-P0073, where 5 such tables are used, and the same table is reused for several qp-values.
As described in JVET-N0493 (Jacob et al., “CE1-related: Multiplication-free bilateral loop filter” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Geneva, CH, 19–27 March 2019, Document JVET-N0493) section 3.1.3, these values can be stored using six bits per entry resulting in 26*16*6/8=312 bytes or 300 bytes if excluding the first row which is all zeros.
The modifier values m_L, m_A and m_B are calculated from I_L, I_A and I_B in the same way. For the diagonal samples I_{NW}, I_{NE}, I_{SE}, I_{SW}, and the samples two steps away I_{AA}, I_{BB}, I_{RR} and I_{LL}, the calculation also follows Equations (2) and (3), but uses a value shifted by 1. Using the diagonal sample I_{SE} as an example, we get

m_{SE} = LUT_ROW[sI_{SE}] >> 1,       (5)

with the sign determined as in Equation (4), and the other diagonal samples and two-steps-away samples are calculated likewise. The modifier values are summed together:

m_{sum} = m_A + m_B + m_L + m_R + m_{NW} + m_{NE} + m_{SW} + m_{SE} + m_{AA} + m_{BB} + m_{LL} + m_{RR}.       (6)

Note that m_L equals -m_R for the previous sample. Likewise, m_A equals -m_B for the sample above, and similar symmetries can be found also for the diagonal and two-steps-away modifier values. This means that in a hardware implementation, it is sufficient to calculate the six values m_R, m_B, m_{SE}, m_{SW}, m_{RR} and m_{BB}, and the remaining six values can be obtained from previously calculated values.
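The per-neighbour computation of Equations (2) to (5) can be sketched as follows (Python, 10-bit case; the sign handling and the half-weighting of diagonal and two-steps-away samples are reconstructed details and are therefore assumptions):

    def modifier(i_x, i_c, lut_row, half_weight=False):
        """Modifier value m_X for one surrounding sample I_X."""
        s = min(15, (abs(i_x - i_c) + 4) >> 3)   # Eqs. (2) and (3), 10-bit data
        mu = lut_row[s] >> 1 if half_weight else lut_row[s]
        return mu if i_x >= i_c else -mu         # Eq. (4) sign rule

    # m_sum (Eq. (6)) is then the sum over the four nearest neighbours plus the
    # four diagonal and four two-steps-away neighbours, the latter eight
    # computed with half_weight=True.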
The m_{sum} value is now multiplied either by c = 1, 2 or 3, which can be done using a single adder and logical AND gates in the following way:

c_v = k_1 \& (m_{sum} << 1) + k_2 \& m_{sum},       (7)

where \& denotes logical AND, k_1 is the most significant bit of the multiplier c, and k_2 is the least significant bit. The value to multiply with is obtained using the minimum block dimension D = min(width, height) as shown in the following table:

Table. Obtaining the c parameter from the minimum size D = min(width, height) of the block.
Finally, the bilateral filter offset ΔIBIF is calculated. For full strength filtering, we use
ΔIBIF = (cv + 16) >> 5,           (8)
whereas for half-strength filtering, we instead use
ΔIBIF = (cv + 32) >> 6.        (9)
A general formula for n-bit data is to use
radd = 2^ (14 - n + bilateral_filter_strength)
rshift = 15 - n + bilateral_filter_strength             (10)
ΔIBIF = (cv + radd) >> rshift,
where bilateral_filter_strength can be 0 or 1 and is signalled in the PPS (Picture Parameter Set) . With n = 10, Equation (10) reduces to Equation (8) for bilateral_filter_strength = 0 and to Equation (9) for bilateral_filter_strength = 1.
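As a consistency check, the following sketch evaluates Equation (10) under the sign convention reconstructed above (an editorial assumption) and confirms that it reduces to Equations (8) and (9) for 10-bit data.

def bif_offset(cv, n=10, bilateral_filter_strength=0):
    # Equation (10): rshift = 15 - n + bilateral_filter_strength and
    # radd = 2^(14 - n + bilateral_filter_strength).
    rshift = 15 - n + bilateral_filter_strength
    radd = 1 << (rshift - 1)
    return (cv + radd) >> rshift

assert bif_offset(100, n=10, bilateral_filter_strength=0) == (100 + 16) >> 5
assert bif_offset(100, n=10, bilateral_filter_strength=1) == (100 + 32) >> 6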
In the present invention, a position-aware in-loop filtering technique is disclosed to improve the coding performance.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for video coding using position-aware in-loop filtering are disclosed. According to the method, reconstructed pixels associated with a current coding region are received. A horizontal period M and a vertical period N are determined, wherein M and N are positive integers. Samples in the current coding region are divided into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N, wherein K is a positive integer smaller than or equal to MxN. K in-loop filters are determined for the K positional groups, wherein any two of the K in-loop filters are different in at least one part of filtering process. A target filtered output for a target sample of the current coding region having a target positional group number is derived by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number. Filtered-reconstructed pixels are provided, wherein the filtered-reconstructed pixels comprise the target filtered output.
In one embodiment, the K in-loop filters correspond to SAO (Sample Adaptive Offset) with K different classification rules or K different offsets. In another embodiment, the K in-loop filters correspond to BIF (Bilateral Filter) with K different lookup tables or K different intermediate filtering result derivation methods. In yet another embodiment, the K in-loop filters correspond to  ALF (Adaptive Loop Filter) with K different classification rules or K different sets of coefficients. In yet another embodiment, the K in-loop filters correspond to one target in-loop filter with K different on/off control methods.
In one embodiment, said applying the target in-loop filter to the target sample is performed as a standalone process. In one embodiment, said applying the target in-loop filter to the target sample is performed between deblocking filtering and SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) . In another embodiment, said applying the target in-loop filter to the target sample is performed between SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) and ALF (Adaptive Loop Filter) . In yet another embodiment, said applying the target in-loop filter to the target sample is performed after ALF (Adaptive Loop Filter) .
In one embodiment, said applying the target in-loop filter to the target sample is performed in parallel with SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) . In another embodiment, said applying the target in-loop filter to the target sample is performed in parallel with ALF (Adaptive Loop Filter) . In yet another embodiment, said applying the target in-loop filter to the target sample is performed in parallel with deblocking filtering.
In one embodiment, one or more first flags indicating the horizontal period M and/or the vertical period N are signalled or parsed at CTB (Coding Tree Block) , slice, or APS (Adaptation Parameter Set) level. In another embodiment, a second flag is signalled or parsed at the CTB, the slice, or the APS level to indicate whether to select the target in-loop filter from the K in-loop filters according to the target positional group number, and whether to derive the target filtered output for the target sample of the current coding region by applying the target in-loop filter to the target sample.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
Fig. 2 illustrates the ALF filter shapes for the chroma (left) and luma (right) components.
Figs. 3A-D illustrate the subsampled Laplacian calculations for gv (3A) , gh (3B) , gd1 (3C) and gd2 (3D) .
Fig. 4A illustrates the placement of CC-ALF with respect to other loop filters.
Fig. 4B illustrates a diamond shaped filter for the chroma samples.
Fig. 5 illustrates an example of Bilateral Filter (BIF) in ECM, where both BIF and SAO use samples from the deblocking stage as input and the outputs are combined and clipped before proceeding to ALF.
Fig. 6 shows the naming convention for the samples surrounding the centre sample IC in BIF processing.
Fig. 7 illustrates an example of partitioning a coding region into MxN blocks, where the samples in each MxN block are divided into K (K < MxN) groups for selecting K in-loop filters that are at least partially different.
Fig. 8 illustrates an example of partitioning a coding region into MxN blocks, where the samples in each MxN block are divided into K (K = MxN) groups for selecting K in-loop filters that are at least partially different.
Fig. 9 illustrates a flowchart of an exemplary video coding system that utilizes position-aware reconstruction in in-loop filtering according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
In the present invention, position-aware reconstruction is disclosed to improve the performance of in-loop filtering.
Position-Aware Reconstruction in In-Loop Filtering
In the VVC in-loop filtering stage, a sample is processed by deblocking filtering, SAO, and ALF. In ECM, a sample is processed by deblocking filtering, SAO, BIF, and ALF, with more complicated SAO and ALF designs. In this invention, it is proposed to consider the positional information in the reconstruction process of in-loop filtering. The reconstruction process of in-loop filtering is also referred to as the filtering process using an in-loop filter.
In one embodiment, a horizontal period M and a vertical period N are determined. The samples in a current coding region are divided into K groups based on M and N, and the samples in different groups undergo at least partially different in-loop filtering mechanisms. The difference can be different classification rules in SAO, different offsets in SAO, different look-up tables (LUTROW) in BIF, different intermediate filtering result (msum, cv, or ΔIBIF) derivation methods in BIF, different classification rules in ALF, different sets of coefficients in ALF, or different on/off control in the in-loop filtering process.
For example, let M = 4 and N = 2. Given a current coding region, the samples in each M by N block are divided into K = 6 groups as shown in Fig. 7, where each small square represents a sample and the number in the square is the positional group index (0, 1, 2, …, 5) assigned based on the positional information of the sample. This example shows that the number of groups K is not necessarily equal to MxN (i.e., K = 6 is not equal to 4x2 here) ; in other words, not every sample position in an M by N block is categorized into a distinct group, and some positions within an MxN block are assigned the same group index. In Fig. 7, the same positional group number is assigned to samples at the same corresponding position within the MxN blocks. For example, if the upper-left corner coordinate of each MxN block is named (0, 0) , the positional group number assigned to all sample locations at (0, 0) of the MxN blocks in the current coding region is 0. The positional group number assigned to all sample locations at (0, 1) or (1, 0) of the MxN blocks is 1. Similarly, all sample locations at (2, 0) of the MxN blocks in the current coding region are assigned positional group number 3, all sample locations at (3, 0) are assigned positional group number 4, and so on.
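For illustration only, the grouping can be sketched in Python as below. Only the entries named above ( (0, 0) → 0, (1, 0) and (0, 1) → 1, (2, 0) → 3, (3, 0) → 4) are taken from the description; the remaining entries of GROUP_TABLE are hypothetical placeholders, not values from Fig. 7.

M, N = 4, 2
GROUP_TABLE = [      # indexed as GROUP_TABLE[y % N][x % M]
    [0, 1, 3, 4],    # row for y % N == 0
    [1, 2, 5, 5],    # row for y % N == 1 (hypothetical values)
]

def positional_group(x, y):
    # Group index of the sample at picture position (x, y).
    return GROUP_TABLE[y % N][x % M]

assert positional_group(0, 0) == 0
assert positional_group(5, 2) == 1   # (5 % 4, 2 % 2) == (1, 0) -> group 1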
If SAO is used for the current coding region, the offset derivation process for each sample is affected by the positional group index. For example, 6 additional offsets cM, cM+1, …, cM+5 are signalled, and one or more of them will be added to the existing SAO offset for each sample according to the positional group index. As an example, denote the existing SAO offset for sample at (x, y) as cSAO (x, y) , the SAO reconstruction for samples with positional group index g becomes:
R' (x, y) = R (x, y) + cSAO (x, y) + cM+g,
where R (x, y) and R' (x, y) are the current sample value before and after SAO reconstruction, respectively.
In another example, one new classifier, which classifies the to-be-processed samples into different classes according to the positions of the to-be-processed samples, is added to SAO. That is, one additional mode can be used in SAO compared to the original design. In another example, the samples in some specific group (s) are not filtered by SAO.
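For illustration only, a minimal sketch of the positional SAO reconstruction above follows; the clipping to the legal sample range is an assumption, as the text leaves it implicit.

def sao_positional(r, c_sao, c_pos_g, bitdepth=10):
    # Add the positional offset c_{M+g} on top of the existing SAO offset
    # c_sao, then clip to the legal sample range (assumed behaviour).
    out = r + c_sao + c_pos_g
    return max(0, min(out, (1 << bitdepth) - 1))

# Example: sample value 512, existing SAO offset 2, positional offset -1.
assert sao_positional(512, 2, -1) == 513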
As another example, let M = 4 and N = 2. Given a current coding region, the samples in each M by N block are divided into K = 8 groups as shown in Fig. 8, where the number in each small square indicates the group index of the sample, and the group index can be 0, 1, 2, …, 7.
If ALF is used for the current coding region, the filtering process for each sample is affected by the positional group index. For example, in the ALF reconstruction equation
R' (x, y) = R (x, y) + ( (Σ (i, j) ≠ (0, 0) f (i, j) × K (R (x+i, y+j) - R (x, y) , c (i, j) ) + 64) >> 7) ,
8 additional coefficients cM, cM+1, …, cM+7 are signalled, and one or more of them will be introduced into the existing ALF filtering equation for each sample according to the positional group index. As an example, the ALF reconstruction for samples with positional group g (g = 0, 1, …, 7) becomes:
R' (x, y) = R (x, y) + ( (Σ (i, j) ≠ (0, 0) f (i, j) × K (R (x+i, y+j) - R (x, y) , c (i, j) ) + cM+g + 64) >> 7) ,
where R (x, y) and R' (x, y) are the current sample value before and after ALF reconstruction, respectively, f (i, j) and c (i, j) are the signalled filter coefficients and clipping parameters, and K (d, b) = min (b, max (-b, d) ) is the clipping operation.
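For illustration only, the positional ALF filtering can be sketched as below. The placement of the positional coefficient c_pos_g as a constant term inside the rounding and shift follows the per-group equation above, which is itself a reconstruction; treat it as an assumption rather than the normative design.

def clip_k(d, b):
    # ALF clipping K (d, b): limit the neighbour difference d to [-b, b].
    return max(-b, min(d, b))

def alf_positional(rec, x, y, filt_taps, c_pos_g):
    # filt_taps: list of (dx, dy, f, c) tuples giving each tap offset, its
    # coefficient f (i, j) and its clipping parameter c (i, j).
    centre = rec[y][x]
    acc = sum(f * clip_k(rec[y + dy][x + dx] - centre, c)
              for dx, dy, f, c in filt_taps)
    return centre + ((acc + c_pos_g + 64) >> 7)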
In the above embodiment, additional flags can be signalled at CTB, slice, or APS level to indicate the period selection (i.e., M and N) . Furthermore, additional flags can be signalled at CTB, slice, or APS level to indicate whether to use the positional information or not. If positional information is not used, the reconstruction process of in-loop filtering will follow the existing design.
In another embodiment, such position-aware reconstruction is a standalone process rather than being integrated with existing in-loop filtering tools. This process can be placed between deblocking filtering and SAO/BIF, between SAO/BIF and ALF, or after ALF. In another embodiment, this process can be executed in parallel with one of the current in-loop filters, e.g. the deblocking filter, SAO/BIF, or ALF, and the final output will be the combination of the output of this process and that of the in-loop filter performed in parallel. In one example, this process is added into the same stage as SAO/BIF. That is, after the deblocking filter, the reconstructed picture will be restored by three kinds of in-loop filtering, SAO, BIF and the proposed position-aware reconstruction, in parallel, and the final output of this stage will be the combination of the outputs from these three in-loop filters.
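For illustration only, the parallel combination in the SAO/BIF stage can be sketched as below, assuming each tool produces a per-sample offset relative to the deblocked sample and the sum is clipped once, mirroring how ECM already combines SAO and BIF (cf. Fig. 5).

def combine_stage(deblocked, d_sao, d_bif, d_pos, bitdepth=10):
    # Combine the offsets from SAO, BIF and the proposed position-aware
    # process, then clip to the legal sample range.
    out = deblocked + d_sao + d_bif + d_pos
    return max(0, min(out, (1 << bitdepth) - 1))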
Any of the position-aware reconstruction methods in in-loop filtering as described above can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in the in-loop filter module (e.g. ILPF 130 in Fig. 1A and Fig. 1B) of an encoder or a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the in-loop filter module of the encoder and/or the decoder. The proposed methods may also be implemented using executable software or firmware code stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array) ) .
Fig. 9 illustrates a flowchart of an exemplary video coding system that utilizes position-aware reconstruction in in-loop filtering according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program code executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to the method, reconstructed pixels associated with a current coding region are received in step 910. A horizontal period M and a vertical period N are determined in step 920, wherein M and N are positive integers. Samples in the current coding region are divided into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N in step 930, wherein K is a positive integer smaller than or equal to MxN. K in-loop filters are determined for the K positional groups in step 940, wherein any two of the K in-loop filters are different in at least one part of the filtering process. In step 950, a target filtered output for a target sample of the current coding region having a target positional group number is derived by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number. Filtered-reconstructed pixels are provided in step 960, wherein the filtered-reconstructed pixels comprise the target filtered output.
The flowchart shown is intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without some of these specific details.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA) . These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (15)

  1. A method for in-loop filter processing of reconstructed video, the method comprising:
    receiving reconstructed pixels associated with a current coding region;
    determining a horizontal period M and a vertical period N, wherein M and N are positive integers;
    dividing samples in the current coding region into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N, wherein K is a positive integer smaller than or equal to MxN;
    determining K in-loop filters for the K positional groups, wherein any two of the K in-loop filters are different in at least one part of filtering process;
    deriving a target filtered output for a target sample of the current coding region having a target positional group number by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number; and
    providing filtered-reconstructed pixels, wherein the filtered-reconstructed pixels comprise the target filtered output.
  2. The method of Claim 1, wherein the K in-loop filters correspond to SAO (Sample Adaptive Offset) with K different classification rules or K different offsets.
  3. The method of Claim 1, wherein the K in-loop filters correspond to BIF (Bilateral Filter) with K different lookup tables or K different intermediate filtering result derivation methods.
  4. The method of Claim 1, wherein the K in-loop filters correspond to ALF (Adaptive Loop Filter) with K different classification rules or K different sets of coefficients.
  5. The method of Claim 1, wherein the K in-loop filters correspond to one target in-loop filter with K different on/off control methods.
  6. The method of Claim 1, wherein said applying the target in-loop filter to the target sample is performed as a standalone process.
  7. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed between deblocking filtering and SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) .
  8. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed between SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) and ALF (Adaptive Loop Filter) .
  9. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed after ALF (Adaptive Loop Filter) .
  10. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed in parallel with SAO (Sample Adaptive Offset) /BIF (Bilateral Filter) .
  11. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed in parallel with ALF (Adaptive Loop Filter) .
  12. The method of Claim 6, wherein said applying the target in-loop filter to the target sample is performed in parallel with deblocking filtering.
  13. The method of Claim 1, wherein one or more first flags indicating the horizontal period M and/or the vertical period N are signalled or parsed at CTB (Coding Tree Block) , slice, or APS (Adaptation Parameter Set) level.
  14. The method of Claim 13, wherein a second flag is signalled or parsed at the CTB, the slice, or the APS level to indicate whether to select the target in-loop filter from the K in-loop filters according to the target positional group number, and whether to derive the target filtered output for the target sample of the current coding region by applying the target in-loop filter to the target sample.
  15. An apparatus for in-loop filter processing of reconstructed video, the apparatus comprising one or more electronics or processors arranged to:
    receive reconstructed pixels associated with a current coding region;
    determine a horizontal period M and a vertical period N, wherein M and N are positive integers;
    divide samples in the current coding region into K positional groups according to sample positions with respect to the horizontal period M and the vertical period N, wherein K is a positive integer smaller than or equal to MxN;
    determine K in-loop filters for the K positional groups, wherein any two of the K in-loop filters are different in at least one part of filtering process;
    derive a target filtered output for a target sample of the current coding region having a target positional group number by applying a target in-loop filter to the target sample, wherein the target in-loop filter is selected from the K in-loop filters according to the target positional group number; and
    provide filtered-reconstructed pixels, wherein the filtered-reconstructed pixels comprise the target filtered output.
PCT/CN2023/121900 2022-10-24 2023-09-27 Method and apparatus of position-aware reconstruction in in-loop filtering WO2024088003A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263380591P 2022-10-24 2022-10-24
US63/380591 2022-10-24

Publications (1)

Publication Number Publication Date
WO2024088003A1 true WO2024088003A1 (en) 2024-05-02

Family

ID=90830011

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/121900 WO2024088003A1 (en) 2022-10-24 2023-09-27 Method and apparatus of position-aware reconstruction in in-loop filtering

Country Status (1)

Country Link
WO (1) WO2024088003A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103370936A (en) * 2011-04-21 2013-10-23 MediaTek Inc. Method and apparatus for improved in-loop filtering
JP2018509074A (en) * 2015-02-11 2018-03-29 Qualcomm, Incorporated Coding tree unit (CTU) level adaptive loop filter (ALF)
CN110115038A (en) * 2016-12-22 2019-08-09 Canon Kabushiki Kaisha Encoding apparatus, encoding method and program, decoding apparatus, decoding method and program
US20200296365A1 (en) * 2019-03-16 2020-09-17 Mediatek Inc. Method and Apparatus for Signaling Adaptive Loop Filter Parameters in Video Coding
WO2021203394A1 (en) * 2020-04-09 2021-10-14 北京大学 Loop filtering method and apparatus

