CN117413516A

CN117413516A - Codec enhancement in cross-component sample adaptive offset

Info

Publication number: CN117413516A
Application number: CN202280038162.7A
Authority: CN
Inventors: 郭哲玮; 修晓宇; 陈伟; 王祥林; 陈漪纹; 朱弘正; 闫宁; 于冰
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-05-26
Filing date: 2022-05-26
Publication date: 2024-01-16

Abstract

An electronic device performs a method of decoding video data. The method comprises: receiving an Adaptive Parameter Set (APS) identifier from the video data, the APS identifier being associated with a plurality of previously used cross-component sample adaptive offset (CCSAO) filter offset sets stored in an APS; receiving syntax in a Picture Header (PH) or a Slice Header (SH) from the video data, the syntax indicating the APS identifier for a current picture or slice; decoding a filter set index for a current Coding Tree Unit (CTU), the filter set index indicating a particular previously used CCSAO filter offset set among the plurality of offset sets associated with the APS identifier in the APS; and applying the particular previously used CCSAO filter offset set to the current CTU of the video data.

Description

Codec enhancement in cross-component sample adaptive offset

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/193,539, entitled "Cross-component Sample Adaptive Offset," filed on May 26, 2021, and U.S. Provisional Patent Application No. 63/213,167, entitled "Cross-component Sample Adaptive Offset," filed on month 21 of 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates generally to video coding and compression, and more particularly to methods and apparatus for improving luminance and chrominance coding efficiency.

Background

Various electronic devices support digital video, such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smart phones, video teleconferencing devices, video streaming devices, and the like. Electronic devices transmit, receive, encode, decode, and/or store digital video data by implementing video compression/decompression standards. Some well-known video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which are developed jointly by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as the successor to its preceding standard VP9. Audio Video Coding (AVS), which refers to digital audio and digital video compression standards, is another series of video compression standards developed by the Audio and Video Coding Standard Workgroup of China.

Video compression typically includes performing spatial (intra) prediction and/or temporal (inter) prediction to reduce or eliminate redundancy inherent in video data. For block-based video coding, a video frame is divided into one or more slices, each slice having a plurality of video blocks, which may also be referred to as Coding Tree Units (CTUs). Each CTU may contain one Coding Unit (CU) or be recursively partitioned into smaller CUs until a predefined minimum CU size is reached. Each CU (also referred to as a leaf CU) contains one or more Transform Units (TUs), and each CU also contains one or more Prediction Units (PUs). Each CU may be encoded in intra mode, inter mode, or IBC mode. Video blocks in an intra-coded (I) slice of a video frame are coded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in inter-coded (P or B) slices of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame, or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
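The recursive partitioning of a CTU into leaf CUs described above can be sketched as follows. This is a minimal illustration that assumes a pure quadtree split; the function name `partition_ctu` and the `should_split` callback are hypothetical, and a real encoder would make the split decision from rate-distortion cost and may also use other split types.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block


def partition_ctu(x: int, y: int, size: int, min_cu: int,
                  should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square CTU into leaf CUs using a quadtree."""
    # Stop when the minimum CU size is reached or the caller declines to split.
    if size <= min_cu or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    # Visit the four quadrants in raster order: top-left, top-right,
    # bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition_ctu(x + dx, y + dy, half, min_cu, should_split)
    return leaves


# Hypothetical split rule: keep splitting only the block at the CTU origin.
leaves = partition_ctu(0, 0, 64, 8, lambda x, y, s: x == 0 and y == 0)
```

With this rule a 64x64 CTU yields leaf CUs of sizes 32, 16, and 8 that together tile the CTU exactly.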

A prediction block of a current video block to be encoded is generated based on spatial prediction or temporal prediction of a reference block (e.g., a neighboring block) that has been previously encoded. The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between a current block to be encoded and a prediction block is referred to as a residual block or prediction error. Inter-coded blocks are encoded according to motion vectors pointing to reference blocks in the reference frames forming the prediction block, as well as the residual block. The process of determining motion vectors is commonly referred to as motion estimation. The intra-coded block is coded according to an intra-prediction mode and a residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain (e.g., the frequency domain), resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, which are then entropy encoded into the video bitstream to achieve more compression.
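The residual, quantization, and scanning steps described above can be sketched as follows. This is a simplified illustration only: the transform itself is omitted (the coefficient block is given directly), the quantizer is an assumed uniform quantizer with truncation toward zero, and the scan is the classic zigzag order, which is one of several scan orders a codec may use. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def residual_block(cur: Matrix, pred: Matrix) -> Matrix:
    """Sample-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def quantize(coeffs: Matrix, step: int) -> Matrix:
    """Uniform quantization; int() truncates toward zero (an assumption)."""
    return [[int(v / step) for v in row] for row in coeffs]


def zigzag_scan(block: Matrix) -> List[int]:
    """Flatten a square 2-D block into a 1-D list along anti-diagonals."""
    n = len(block)
    out: List[int] = []
    for s in range(2 * n - 1):
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals are traversed from bottom-left to top-right,
        # odd ones from top-right to bottom-left.
        if s % 2 == 0:
            coords.reverse()
        out.extend(block[i][j] for i, j in coords)
    return out


# Illustrative 4x4 transform coefficients (made-up values).
coeffs = [[52, -9, 3, 0],
          [-7, 4, 0, 0],
          [2, 0, 0, 0],
          [0, 0, 0, 0]]
levels = zigzag_scan(quantize(coeffs, 4))
```

After scanning, the nonzero levels cluster at the front of the vector and the trailing zeros form one long run, which is what makes the subsequent entropy coding effective.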

The encoded video bitstream is then saved in a computer readable storage medium (e.g., flash memory) for access by another electronic device having digital video capabilities, or transmitted directly to the electronic device, either wired or wirelessly. The electronic device then performs video decompression (which is the reverse of the video compression described above) by, for example, parsing the encoded video bitstream to obtain syntax elements from the bitstream, and reconstructing the digital video data from the encoded video bitstream to its original format based at least in part on the syntax elements obtained from the bitstream, and rendering the reconstructed digital video data on a display of the electronic device.

As digital video quality changes from high definition to 4Kx2K or even 8Kx4K, the amount of video data to be encoded/decoded grows exponentially. It is a continuing challenge to be able to more efficiently encode/decode video data while maintaining the image quality of the decoded video data.

Disclosure of Invention

The present application describes embodiments relating to video data encoding and decoding, and more particularly to methods and apparatus for improving the coding efficiency of luminance and chrominance components, including improving coding efficiency by exploring the cross-component relationship between the luminance and chrominance components.

According to a first aspect of the present application, a method of decoding video data comprises: receiving an Adaptive Parameter Set (APS) identifier from the video data, the APS identifier being associated with a plurality of previously used cross-component sample adaptive offset (CCSAO) filter offset sets stored in an APS; receiving syntax in a Picture Header (PH) or a Slice Header (SH) from the video data, the syntax indicating the APS identifier for a current picture or slice; decoding a filter set index for a current Coding Tree Unit (CTU), the filter set index indicating a particular previously used CCSAO filter offset set among the plurality of offset sets of the APS associated with the APS identifier; and applying the particular previously used CCSAO filter offset set to the current CTU of the video data.
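The lookup chain in the first aspect (APS identifier from the picture or slice header, then a per-CTU filter set index) can be sketched as below. The `Aps` class, the `aps_table` dictionary, and `select_offset_set` are hypothetical names used only for illustration; the actual bitstream syntax and storage layout are not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Aps:
    """Hypothetical in-memory APS holding previously used CCSAO offset sets."""
    aps_id: int
    offset_sets: List[List[int]]  # one list of per-class offsets per filter set


def select_offset_set(aps_table: Dict[int, Aps],
                      header_aps_id: int,
                      ctu_filter_set_idx: int) -> List[int]:
    """Resolve the offset set for one CTU.

    header_aps_id comes from the PH or SH of the current picture or slice;
    ctu_filter_set_idx is decoded per CTU.
    """
    aps = aps_table[header_aps_id]
    return aps.offset_sets[ctu_filter_set_idx]


# Two stored APSs, each carrying offset sets used by earlier pictures.
aps_table = {
    0: Aps(0, [[1, 0, -1, 0], [2, 2, 0, -2]]),
    1: Aps(1, [[0, 0, 0, 1]]),
}
offsets_for_ctu = select_offset_set(aps_table, header_aps_id=0, ctu_filter_set_idx=1)
```

Because the header carries only an identifier and each CTU only an index, offsets already transmitted for earlier pictures can be reused without being sent again.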

In some embodiments, the video data includes a first component and a second component, and the particular set of previously used CCSAO filter offsets is obtained by: determining a respective classifier for the second component from a set of one or more samples of the first component associated with a respective sample of the second component; determining respective sample offsets for respective samples of the second component according to the respective classifiers to modify the respective samples of the second component based on the determined respective sample offsets; and storing the set of corresponding determined respective sample offsets for each respective classifier as the particular previously used CCSAO filter offset set.
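A minimal sketch of the classify-then-offset step is given below. It assumes, purely for illustration, that the first component is luma, the second is chroma, and the classifier is a band classifier on the single co-located luma sample; the names `band_class` and `apply_ccsao` are hypothetical, and other classifiers are equally possible.

```python
from typing import List


def band_class(luma: int, band_num: int, bit_depth: int) -> int:
    """Map a luma sample to one of band_num equal-width bands (assumed classifier)."""
    return (luma * band_num) >> bit_depth


def apply_ccsao(chroma: List[int], colocated_luma: List[int],
                offsets: List[int], bit_depth: int = 8) -> List[int]:
    """Add the per-class offset to each chroma sample and clip to the valid range."""
    max_val = (1 << bit_depth) - 1
    out = []
    for c, y in zip(chroma, colocated_luma):
        k = band_class(y, len(offsets), bit_depth)  # class from the first component
        out.append(min(max(c + offsets[k], 0), max_val))  # modify the second component
    return out


# Four bands with illustrative offsets; the values are made up.
filtered = apply_ccsao(chroma=[100, 50, 254, 0],
                       colocated_luma=[10, 100, 200, 130],
                       offsets=[2, 0, -1, 3])
```

The list of offsets, one per class, is what would be stored as a previously used CCSAO filter offset set.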

According to a second aspect of the present application, an electronic device includes one or more processing units, a memory, and a plurality of programs stored in the memory. The programs, when executed by the one or more processing units, cause the electronic device to perform the method of coding video data described above.

According to a third aspect of the present application, a non-transitory computer-readable storage medium stores a plurality of programs for execution by an electronic device having one or more processing units. The programs, when executed by the one or more processing units, cause the electronic device to perform the method of coding video data described above.

According to a fourth aspect of the present application, a computer-readable storage medium has stored therein a bitstream including video information generated by the video encoding and decoding method described above.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification, illustrate embodiments and together with the description serve to explain the principles. Like reference numerals designate corresponding parts.

Fig. 1 is a block diagram illustrating an exemplary video encoding and decoding system according to some embodiments of the present disclosure.

Fig. 2 is a block diagram illustrating an exemplary video encoder according to some embodiments of the present disclosure.

Fig. 3 is a block diagram illustrating an exemplary video decoder according to some embodiments of the present disclosure.

Figs. 4A-4E are block diagrams illustrating how a frame is recursively partitioned into multiple video blocks of different sizes and shapes according to some embodiments of the present disclosure.

Fig. 5A is a block diagram depicting four gradient modes used in Sample Adaptive Offset (SAO) according to some embodiments of the present disclosure.

Fig. 5B is a block diagram depicting a naming convention for samples around a center sample, in accordance with some embodiments of the present disclosure.

Fig. 6A is a block diagram illustrating a system and process of CCSAO applied to chroma samples and using DBF Y as input according to some embodiments of the present disclosure.

Fig. 6B is a block diagram illustrating a system and process for CCSAO applied to luminance and chrominance samples and using DBF Y/Cb/Cr as input according to some embodiments of the present disclosure.

Fig. 6C is a block diagram illustrating a system and process of CCSAO that may operate independently in accordance with some embodiments of the present disclosure.

Fig. 6D is a block diagram illustrating a system and process of CCSAO that may be progressively applied (2 or N times) with the same or different offsets, according to some embodiments of the present disclosure.

Fig. 6E is a block diagram illustrating a system and process of CCSAO applied in parallel with Enhanced Sample Adaptive Offset (ESAO) in the AVS standard, according to some embodiments of the present disclosure.

Fig. 6F is a block diagram illustrating a system and process of CCSAO applied after SAO according to some embodiments of the present disclosure.

Fig. 6G is a block diagram illustrating a system and process of CCSAO that may operate independently without CCALF, according to some embodiments of the present disclosure.

Fig. 6H is a block diagram illustrating a system and process of CCSAO applied in parallel with a cross-component adaptive loop filter (CCALF) according to some embodiments of the present disclosure.

Fig. 6I is a block diagram illustrating a system and process of CCSAO applied in parallel with SAO and BIF according to some embodiments of the present disclosure.

Fig. 6J is a block diagram illustrating a system and process of CCSAO applied in parallel with BIF by replacing SAO according to some embodiments of the present disclosure.

Fig. 7 is a block diagram illustrating a sample process using CCSAO according to some embodiments of the present disclosure.

Fig. 8 is a block diagram illustrating a CCSAO process being interleaved to vertical and horizontal deblocking filters (DBFs) according to some embodiments of the present disclosure.

Fig. 9 is a flowchart illustrating an exemplary process of decoding a video signal using cross-component correlation according to some embodiments of the present disclosure.

Fig. 10A is a block diagram illustrating a classifier using different luma (or chroma) sample positions for C0 classification according to some embodiments of the present disclosure.

Fig. 10B illustrates some examples of different shapes for luminance candidates according to some embodiments of the present disclosure.

Fig. 11 is a block diagram illustrating a sample process in which all co-located and adjacent luminance/chrominance samples may be fed into a CCSAO classification according to some embodiments of the present disclosure.

Fig. 12 illustrates exemplary classifiers that replace the co-located luminance sample value with a value obtained by weighting co-located and adjacent luminance samples, according to some embodiments of the present disclosure.

Fig. 13 is a block diagram illustrating the application of CCSAO with other loop filters having different clipping combinations according to some embodiments of the present disclosure.

Fig. 14A is a block diagram illustrating that CCSAO is not applied to a current luminance (luma) sample if any of co-located and adjacent luminance (luma) samples used for classification are outside of the current picture, according to some embodiments of the present disclosure.

Fig. 14B is a block diagram illustrating the application of CCSAO to a current luma or chroma sample if any of the co-located and adjacent luma or chroma samples used for classification are outside of the current picture, according to some embodiments of the present disclosure.

Fig. 14C is a block diagram illustrating that CCSAO is not applied to a current chroma sample if a corresponding selected co-located or neighboring luma sample for classification is outside of a virtual space defined by a Virtual Boundary (VB) according to some embodiments of the present disclosure.

Fig. 15 illustrates applying a repeating or mirrored fill to luminance samples outside of a virtual boundary, according to some embodiments of the present disclosure.

Fig. 16 illustrates that an additional 1 luma row buffer is required if all 9 co-located neighboring luma samples are used for classification according to some embodiments of the present disclosure.

Fig. 17 shows a diagram in AVS in which CCSAO with 9 luminance candidates may add 2 additional luminance line buffers beyond the VB, according to some embodiments of the present disclosure.

Fig. 18A shows a diagram in VVC in which CCSAO with 9 luminance candidates may add 1 additional luminance line buffer beyond the VB, according to some embodiments of the present disclosure.

Fig. 18B shows a diagram when co-located or adjacent chroma samples are used to classify a current luma sample, where the selected chroma candidates may span VB and require additional chroma line buffers, according to some embodiments of the present disclosure.

Figs. 19A-19C illustrate that in AVS and VVC, CCSAO is disabled for a chroma sample if any of the luma candidates of the chroma sample spans the VB (is outside the VB of the current chroma sample), according to some embodiments of the present disclosure.

Figs. 20A-20C illustrate that in AVS and VVC, CCSAO is enabled for a chroma sample using repeated padding if any of the luma candidates of the chroma sample spans the VB (is outside the VB of the current chroma sample), according to some embodiments of the present disclosure.

Figs. 21A-21C illustrate that in AVS and VVC, CCSAO is enabled for a chroma sample using mirrored padding if any of the luma candidates of the chroma sample spans the VB (is outside the VB of the current chroma sample), according to some embodiments of the present disclosure.

Figs. 22A-22B illustrate the use of bilaterally symmetric padding to enable CCSAO for different CCSAO sample shapes, according to some embodiments of the present disclosure.

Fig. 23 illustrates limitations of classifying using a limited number of luminance candidates according to some embodiments of the present disclosure.

Fig. 24 illustrates that the CCSAO application area may be misaligned with Coding Tree Block (CTB)/Coding Tree Unit (CTU) boundaries, according to some embodiments of the present disclosure.

Fig. 25 illustrates that the frame partitioning of the CCSAO application area may be fixed by CCSAO parameters, according to some embodiments of the present disclosure.

Fig. 26 illustrates that CCSAO application areas may be partitioned by Binary Tree (BT)/Quadtree (QT)/Ternary Tree (TT) splits from the frame/slice/CTB level, according to some embodiments of the present disclosure.

Fig. 27 is a block diagram illustrating multiple classifiers used and switched at different levels within a picture frame according to some embodiments of the present disclosure.

Fig. 28 is a block diagram illustrating that CCSAO application area partitions may be dynamic and switch at the image level according to some embodiments of the present disclosure.

Fig. 29 is a diagram illustrating that a CCSAO classifier may consider current or cross-component coding information according to some embodiments of the present disclosure.

Fig. 30 is a block diagram illustrating a SAO classification method disclosed in the present disclosure as a post-prediction filter according to some embodiments of the present disclosure.

Fig. 31 is a block diagram illustrating that each component may be classified using a current sample and neighboring samples for a post-prediction SAO filter according to some embodiments of the present disclosure.

Fig. 32 is a block diagram illustrating the SAO classification method disclosed in the present disclosure as a post-reconstruction filter according to some embodiments of the present disclosure.

Fig. 33 is a flowchart illustrating an exemplary process of decoding a video signal using cross-component correlation according to some embodiments of the present disclosure.

Fig. 34 is a diagram illustrating a computing environment coupled with a user interface according to some embodiments of the present disclosure.

Detailed Description

Reference will now be made in detail to the specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth to provide an understanding of the subject matter presented herein. It will be apparent, however, to one skilled in the art that various alternatives can be used without departing from the scope of the claims, and the subject matter can be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on a variety of types of electronic devices having digital video capabilities.

The first-generation AVS standard includes the Chinese national standards "Information Technology, Advanced Audio Video Coding, Part 2: Video" (known as AVS1) and "Information Technology, Advanced Audio Video Coding, Part 16: Radio Television Video" (known as AVS+). Compared with the MPEG-2 standard, the first-generation AVS standard can save about 50% of the bit rate at the same perceptual quality. The second-generation AVS standard includes the Chinese national standard series "Information Technology, Efficient Multimedia Coding" (known as AVS2), which mainly targets the transmission of ultra-high-definition television programs. The coding efficiency of AVS2 is twice that of AVS+. Meanwhile, the video part of the AVS2 standard was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as an international standard for applications. The AVS3 standard is a new-generation video coding standard for UHD video applications that aims to surpass the coding efficiency of the latest international standard, HEVC. In 2019, at the 68th AVS meeting, the AVS3-P2 baseline was completed, providing about 30% bit-rate savings over the HEVC standard. Currently, reference software called the High Performance Model (HPM) is maintained by the AVS group for the AVS3 standard, which is built on a hybrid video coding framework.

Fig. 1 is a block diagram illustrating an exemplary system 10 for encoding and decoding video blocks in parallel according to some embodiments of the present disclosure. As shown in Fig. 1, the system 10 includes a source device 12 that generates and encodes video data for later decoding by a destination device 14. Source device 12 and destination device 14 may comprise any of a variety of electronic devices, including desktop or laptop computers, tablet computers, smart phones, set-top boxes, digital televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, and the like. In some implementations, the source device 12 and the destination device 14 are equipped with wireless communication capabilities.

In some implementations, destination device 14 may receive encoded video data to be decoded via link 16. Link 16 may comprise any type of communication medium or device capable of moving encoded video data from source device 12 to destination device 14. In one example, link 16 may include a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated in accordance with a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include routers, switches, base stations, or any other device that may help facilitate communication from source device 12 to destination device 14.

In some other implementations, encoded video data may be transferred from output interface 22 to storage device 32. Destination device 14 may then access the encoded video data in storage device 32 via input interface 28. Storage device 32 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In further examples, storage device 32 may correspond to a file server or another intermediate storage device that may hold encoded video data generated by source device 12. Destination device 14 may access stored video data from storage device 32 via streaming or download. The file server may be any type of computer capable of storing encoded video data and transmitting the encoded video data to destination device 14. Exemplary file servers include web servers (e.g., for websites), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. The destination device 14 may access the encoded video data over any standard data connection suitable for accessing encoded video data stored on a file server, including a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both. The transmission of encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.

As shown in Fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of these sources. As one example, if video source 18 is a video camera of a security surveillance system, source device 12 and destination device 14 may form a camera phone or video phone. However, the embodiments described herein may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored on the storage device 32 for later access by the destination device 14 or other devices for decoding and/or playback. Output interface 22 may also include a modem and/or a transmitter.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and/or modem and receives encoded video data over link 16. The encoded video data transmitted over link 16 or provided on storage device 32 may include various syntax elements that are generated by video encoder 20 for use by video decoder 30 in decoding the video data. Such syntax elements may be contained within encoded video data transmitted over a communication medium, stored on a storage medium, or stored in a file server.

In some implementations, the destination device 14 may include a display device 34, and the display device 34 may be an integrated display device or an external display device configured to communicate with the destination device 14. The display device 34 displays the decoded video data to a user and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate in accordance with proprietary standards or industry standards such as VVC, HEVC, MPEG-4 Part 10 Advanced Video Coding (AVC), AVS, or extensions of such standards. It should be understood that the present application is not limited to a particular video encoding/decoding standard and may be applicable to other video encoding/decoding standards. It is generally contemplated that video encoder 20 of source device 12 may be configured to encode video data according to any of these current or future standards. Similarly, it is also generally contemplated that the video decoder 30 of the destination device 14 may be configured to decode video data according to any of these current or future standards.

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When implemented partially in software, an electronic device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the video encoding/decoding operations disclosed in the present disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

Fig. 2 is a block diagram illustrating an exemplary video encoder 20 according to some embodiments described in this application. Video encoder 20 may perform intra-prediction encoding and inter-prediction encoding of video blocks within video frames. Intra-prediction encoding relies on spatial prediction to reduce or eliminate spatial redundancy in video data within a given video frame or picture. Inter-prediction encoding relies on temporal prediction to reduce or eliminate temporal redundancy in video data within adjacent video frames or pictures of a video sequence.

As shown in Fig. 2, video encoder 20 includes a video data memory 40, a prediction processing unit 41, a Decoded Picture Buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a partition unit 45, an intra prediction processing unit 46, and an intra Block Copy (BC) unit 48. In some implementations, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. A loop filter 63, e.g., a deblocking filter, may be located between adder 62 and DPB 64 to filter block boundaries and remove blocking artifacts from the reconstructed video. In addition to the deblocking filter, another loop filter 63 may be used to filter the output of adder 62. Further loop filters 63, e.g., Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is put into the reference picture store and used as a reference for coding future video blocks. Video encoder 20 may take the form of fixed hardware units or programmable hardware units, or may be divided among one or more of the illustrated fixed or programmable hardware units.

Video data memory 40 may store video data to be encoded by components of video encoder 20. The video data in video data memory 40 may be obtained, for example, from video source 18. DPB 64 is a buffer that stores reference video data for use by video encoder 20 in encoding video data (e.g., in intra-prediction or inter-prediction encoding mode). Video data memory 40 and DPB 64 may be formed from any of a variety of memory devices. In various examples, video data memory 40 may be on-chip with other components of video encoder 20, or off-chip with respect to those components.

As shown in Fig. 2, after receiving video data, partition unit 45 within prediction processing unit 41 partitions the video data into video blocks. The partitioning may also include partitioning a video frame into slices, tiles, or other larger Coding Units (CUs) according to a predefined partitioning structure, such as a quadtree structure, associated with the video data. A video frame may be divided into multiple video blocks (or sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible prediction coding modes, for example one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (e.g., coding rate and level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-prediction coded block to adder 50 to generate a residual block and to adder 62 to reconstruct the encoded block for subsequent use as part of a reference frame. Prediction processing unit 41 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
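One common way to combine coding rate and distortion into a single mode decision is a Lagrangian cost, J = D + lambda * R. The sketch below assumes that formulation; the text above does not state which cost function is used, and the mode names, the numbers, and `select_mode` are hypothetical.

```python
from typing import Dict, Tuple


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode with the smallest Lagrangian cost D + lam * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Illustrative numbers only: intra is cheap in bits but distorts more.
candidates = {
    "intra_dc": (120.0, 10.0),
    "inter_merge": (80.0, 40.0),
}
best_low_lambda = select_mode(candidates, lam=0.5)   # costs: 125.0 vs 100.0
best_high_lambda = select_mode(candidates, lam=2.0)  # costs: 140.0 vs 160.0
```

A larger lambda penalizes bits more heavily, so the same candidates can yield a different choice at a lower target bit rate.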

To select an appropriate intra-prediction encoding mode for the current video block, intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block with respect to one or more neighboring blocks in the same frame as the current block to be encoded to provide spatial prediction. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-prediction encoding of the current video block relative to one or more prediction blocks in one or more reference frames to provide temporal prediction. Video encoder 20 may perform multiple coding passes, for example, to select an appropriate encoding mode for each block of video data.

In some embodiments, motion estimation unit 42 determines the inter-prediction mode for the current video frame, according to a predetermined pattern within the sequence of video frames, by generating a motion vector that indicates a displacement of a Prediction Unit (PU) of a video block within the current video frame relative to a prediction block within a reference video frame. The motion estimation performed by the motion estimation unit 42 is the process of generating motion vectors that estimate the motion of the video block. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a prediction block within a reference frame (or other coded unit), with respect to the current block being coded within the current frame (or other coded unit). The predetermined pattern may designate video frames in the sequence as P-frames or B-frames. The intra BC unit 48 may determine the vector (e.g., block vector) for intra BC encoding in a manner similar to the motion vector determined by the motion estimation unit 42 for inter prediction, or may utilize the motion estimation unit 42 to determine the block vector.

A prediction block is a block of a reference frame that is considered to closely match the PU of the video block to be encoded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), sum of Squared Differences (SSD), or other difference metric. In some implementations, video encoder 20 may calculate values for sub-integer pixel positions of reference frames stored in DPB 64. For example, video encoder 20 may interpolate values for one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference frame. Accordingly, the motion estimation unit 42 may perform a motion search with respect to the full pixel position and the fractional pixel position, and output a motion vector having fractional pixel accuracy.
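
By way of a non-limiting illustration, the two difference metrics named above can be sketched as follows. The function names (`sad`, `ssd`, `best_match`) and the sample values are hypothetical and chosen only for this example; an actual encoder would typically search many candidate positions, including fractional pixel positions, which this sketch does not attempt.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of Squared Differences between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(current, candidates, metric=sad):
    """Return (index, cost) of the candidate block with the smallest cost."""
    costs = [metric(current, cand) for cand in candidates]
    idx = min(range(len(costs)), key=costs.__getitem__)
    return idx, costs[idx]


# Hypothetical 2x2 blocks used only for illustration.
current = [[10, 12], [11, 13]]
candidates = [
    [[10, 10], [10, 10]],
    [[10, 12], [12, 13]],
]
print(best_match(current, candidates))       # uses SAD
print(best_match(current, candidates, ssd))  # uses SSD
```

SSD penalizes large individual differences more heavily than SAD, so the two metrics may rank candidate blocks differently.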

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter prediction encoded frame by comparing the location of the PU with a location of a prediction block of a reference frame selected from a first reference frame list (list 0) or a second reference frame list (list 1), each of list 0 and list 1 identifying one or more reference frames stored in DPB 64. The motion estimation unit 42 sends the calculated motion vector to the motion compensation unit 44 and then to the entropy encoding unit 56.

The motion compensation performed by motion compensation unit 44 may involve fetching or generating a prediction block based on the motion vector determined by motion estimation unit 42. Upon receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the prediction block to which the motion vector points in one of the reference frame lists, retrieve the prediction block from DPB 64, and forward the prediction block to adder 50. Adder 50 then forms a residual video block of pixel differences by subtracting the pixel values of the prediction block provided by motion compensation unit 44 from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include a luma difference component or a chroma difference component or both. Motion compensation unit 44 may also generate syntax elements associated with the video blocks of the video frames for use by video decoder 30 in decoding the video blocks of the video frames. The syntax elements may include, for example, syntax elements defining motion vectors used to identify the prediction block, any flags indicating the prediction mode, or any other syntax information described herein. Note that the motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, but are shown separately for conceptual purposes.

In some implementations, intra BC unit 48 may generate the vector and retrieve the prediction block in a manner similar to that described above in connection with motion estimation unit 42 and motion compensation unit 44, but with the prediction block in the same frame as the current block being encoded, and the vector is referred to as a block vector rather than a motion vector. In particular, the intra BC unit 48 may determine an intra prediction mode for encoding the current block. In some examples, intra BC unit 48 may encode the current block using various intra prediction modes, e.g., during separate encoding processes, and test the performance of these intra prediction modes by rate-distortion analysis. Next, intra BC unit 48 may select an appropriate intra prediction mode among the various tested intra prediction modes to use and generate an intra mode indicator accordingly. For example, the intra BC unit 48 may calculate rate distortion values using rate distortion analysis for various tested intra prediction modes, and select the intra prediction mode having the best rate distortion characteristics among the tested modes as the appropriate intra prediction mode to use. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, and determines the bit rate (i.e., the number of bits) used to produce the encoded block. The intra BC unit 48 may calculate ratios from the distortion and rate of the various encoded blocks to determine which intra prediction mode exhibits the best rate distortion value for the block.
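
As a hedged sketch of the mode selection described above, the example below combines distortion and rate into a single cost and picks the mode with the lowest cost. The Lagrangian form J = D + lambda * R is a commonly used formulation and is an assumption here; the present text does not specify how distortion and rate are combined. The mode names, the numeric values, and the parameter `lam` are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Assumed Lagrangian rate-distortion cost: J = D + lam * R."""
    return distortion + lam * rate_bits


def select_mode(results, lam):
    """Pick the tested mode with the smallest rate-distortion cost.

    results maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(results, key=lambda mode: rd_cost(*results[mode], lam))


# Hypothetical measurements from separate encoding passes.
tested = {
    "mode_a": (120.0, 40),
    "mode_b": (90.0, 70),
    "mode_c": (100.0, 45),
}
print(select_mode(tested, lam=1.0))
```

A larger `lam` favors modes that spend fewer bits, while a smaller `lam` favors modes with lower distortion.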

In other examples, intra BC unit 48 may use, in whole or in part, motion estimation unit 42 and motion compensation unit 44 to perform such functions for intra BC prediction in accordance with embodiments described herein. In either case, for intra block copying, the prediction block may be a block deemed to closely match the block to be encoded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), sum of Squared Differences (SSD), or other difference metric, and the identification of the prediction block may include calculating a value of sub-integer pixel locations.

Whether the prediction block is from the same frame according to intra prediction or from a different frame according to inter prediction, video encoder 20 may form the residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded, which form pixel differences. The pixel differences forming the residual video block may include both a luma component difference and a chroma component difference.

As described above, the intra-prediction processing unit 46 may perform intra-prediction on the current video block as an alternative to inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44 or intra-block copy prediction performed by the intra BC unit 48. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to be used for encoding the current block. To this end, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during a separate encoding process, and intra-prediction processing unit 46 (or a mode selection unit in some examples) may select an appropriate intra-prediction mode from among the tested intra-prediction modes for use. Intra-prediction processing unit 46 may provide entropy encoding unit 56 with information indicating the selected intra-prediction mode for the block. Entropy encoding unit 56 may encode information into the bitstream that indicates the selected intra-prediction mode.

After the prediction processing unit 41 determines the prediction block of the current video block via inter prediction or intra prediction, the adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be contained in one or more Transform Units (TUs) and provided to transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform such as a Discrete Cosine Transform (DCT) or a conceptually similar transform.
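
For illustration only, a floating-point, orthonormal two-dimensional DCT-II can be sketched as below. This is a textbook form of the transform and an assumption for this example; practical codecs generally use integer approximations, and the function names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols holds the transformed columns; transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


# A flat residual block concentrates all of its energy in the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

The example shows why the transform helps compression: a smooth residual block is represented by very few significant coefficients.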

The transform processing unit 52 may transmit the generated transform coefficient to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may also reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix including quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
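
A minimal sketch of uniform scalar quantization is given below, assuming a single integer step size applied to every coefficient. How a quantization parameter maps to a step size is codec-specific and is not shown; the names `quantize`, `dequantize`, and `step` are hypothetical.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level, rounding half away from zero."""
    def level(c):
        magnitude = (abs(c) + step // 2) // step
        return magnitude if c >= 0 else -magnitude
    return [[level(c) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate the original coefficients from the integer levels."""
    return [[lv * step for lv in row] for row in levels]


# Hypothetical transform coefficients and step size.
coeffs = [[100, -37], [8, -3]]
levels = quantize(coeffs, step=10)
print(levels)
print(dequantize(levels, step=10))
```

A larger step size yields smaller levels, and therefore fewer bits, at the price of a larger reconstruction error; small coefficients may be quantized to zero.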

After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients into a video bitstream using, for example, Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Syntax-based context-adaptive Binary Arithmetic Coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method or technique. The encoded bitstream may then be sent to video decoder 30 or archived in storage device 32 for later transmission to video decoder 30 or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vector and other syntax elements of the current video frame being encoded.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual video block in the pixel domain to generate a reference block for predicting other video blocks. As noted above, motion compensation unit 44 may generate motion compensated prediction blocks from one or more reference blocks of frames stored in DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the prediction block to calculate sub-integer pixel values for motion estimation.

Adder 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in DPB 64. The reference block may then be used as a prediction block by the intra BC unit 48, the motion estimation unit 42, and the motion compensation unit 44 to inter-predict another video block in a subsequent video frame.

Fig. 3 is a block diagram illustrating an exemplary video decoder 30 according to some embodiments of the present application. Video decoder 30 includes video data memory 79, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, and DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra prediction processing unit 84, and an intra BC unit 85. Video decoder 30 may perform a decoding process that is generally inverse to the encoding process described above in connection with fig. 2 with respect to video encoder 20. For example, the motion compensation unit 82 may generate prediction data based on the motion vector received from the entropy decoding unit 80, and the intra prediction processing unit 84 may generate prediction data based on the intra prediction mode indicator received from the entropy decoding unit 80.

In some examples, the units of video decoder 30 may be tasked to perform embodiments of the present application. Further, in some examples, embodiments of the present disclosure may be divided among one or more of the units of video decoder 30. For example, the intra BC unit 85 may perform embodiments of the present application alone or in combination with other units of the video decoder 30 (e.g., the motion compensation unit 82, the intra prediction processing unit 84, and the entropy decoding unit 80). In some examples, video decoder 30 may not include intra BC unit 85, and the functions of intra BC unit 85 may be performed by other components of prediction processing unit 81 (e.g., motion compensation unit 82).

Video data memory 79 may store video data (e.g., an encoded video bitstream) for decoding by other components of video decoder 30. The video data stored in the video data memory 79 may be obtained, for example, from the storage device 32, from a local video source (e.g., a camera), via wired or wireless network communication of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). The video data memory 79 may include a Coded Picture Buffer (CPB) that stores encoded video data from an encoded video bitstream. Decoded Picture Buffer (DPB) 92 of video decoder 30 stores reference video data for use in decoding video data by video decoder 30, e.g., in an intra-prediction encoding mode or an inter-prediction encoding mode. Video data memory 79 and DPB 92 may be formed from any of a variety of memory devices such as Dynamic Random Access Memory (DRAM), including Synchronous DRAM (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. For purposes of illustration, video data memory 79 and DPB 92 are depicted in fig. 3 as two different components of video decoder 30. It will be apparent to those skilled in the art that video data memory 79 and DPB 92 may be provided by the same memory device or separate memory devices. In some examples, video data memory 79 may be on-chip with other components of video decoder 30, or off-chip with respect to those components.

During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of encoded video frames and associated syntax elements. Video decoder 30 may receive syntax elements at the video frame level and/or the video block level. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantization coefficients, motion vectors, or intra-prediction mode indicators, as well as other syntax elements. Entropy decoding unit 80 then forwards the motion vectors and other syntax elements to prediction processing unit 81.

When a video frame is encoded as an intra-prediction encoded (I) frame or for an intra-encoded prediction block in another type of frame, the intra-prediction processing unit 84 of the prediction processing unit 81 may generate prediction data of a video block of the current video frame based on the signaled intra-prediction mode and reference data from a previously decoded block of the current frame.

When a video frame is encoded as an inter-prediction encoded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks of the video block of the current video frame based on the motion vector and other syntax elements received from the entropy decoding unit 80. Each of the prediction blocks may be generated from a reference frame within one of the reference frame lists. Video decoder 30 may construct reference frame lists, list 0 and list 1, using default construction techniques based on the reference frames stored in DPB 92.

In some examples, when a video block is encoded according to the intra BC mode described herein, intra BC unit 85 of prediction processing unit 81 generates a prediction block for the current video block based on the block vector and other syntax elements received from entropy decoding unit 80. The prediction block may be within a reconstructed region, defined by video encoder 20, of the same picture as the current video block.

The motion compensation unit 82 and/or the intra BC unit 85 determine prediction information for the video block of the current video frame by parsing the motion vector and other syntax elements, and then use the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) for encoding a video block of a video frame, an inter-prediction frame type (e.g., B or P), construction information for one or more of a reference frame list of frames, a motion vector for each inter-prediction encoded video block of a frame, an inter-prediction state for each inter-prediction encoded video block of a frame, and other information for decoding a video block in a current video frame.

Similarly, the intra BC unit 85 may use some of the received syntax elements (e.g., flags) to determine which video blocks of the frame are predicted using the intra BC mode, the build information of which video blocks of the frame are within the reconstruction region and should be stored in the DPB 92, the block vector of each intra BC predicted video block of the frame, the intra BC prediction status of each intra BC predicted video block of the frame, and other information for decoding video blocks in the current video frame.

The motion compensation unit 82 may also perform interpolation using interpolation filters used by the video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of the reference block. In this case, the motion compensation unit 82 may determine an interpolation filter used by the video encoder 20 according to the received syntax element, and generate a prediction block using the interpolation filter.

The inverse quantization unit 86 inversely quantizes quantized transform coefficients, which are provided in the bitstream and entropy decoded by the entropy decoding unit 80, using the same quantization parameter calculated by the video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual block in the pixel domain.

After the motion compensation unit 82 or the intra BC unit 85 generates a prediction block for the current video block based on the vector and other syntax elements, the adder 90 reconstructs a decoded video block for the current video block by summing the residual block from the inverse transform processing unit 88 and the corresponding prediction block generated by the motion compensation unit 82 or the intra BC unit 85. Loop filter 91 may be located between adder 90 and DPB 92 to further process the decoded video blocks. Loop filter 91, e.g., a deblocking filter, Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF), may be applied to the reconstructed CU, which is then placed in the reference picture store. The decoded video blocks in a given frame are then stored in DPB 92, which stores reference frames for subsequent motion compensation of the next video block. DPB 92 or a memory device separate from DPB 92 may also store decoded video for later presentation on a display device (e.g., display device 34 of fig. 1).
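
The summation performed by adder 90 can be sketched as follows. Clipping the result to the valid sample range for the bit depth is an assumption made for this example, as are the function name `reconstruct` and the sample values.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample.

    The sum is clipped to [0, 2**bit_depth - 1] (assumed behavior).
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, resid)]


# Hypothetical 2x2 prediction and residual blocks.
pred = [[250, 100], [3, 128]]
resid = [[10, -5], [-7, 0]]
print(reconstruct(pred, resid))
```

The reconstructed block produced this way is the input to the loop filter stage before it is stored as reference data.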

In a typical video codec process, a video sequence generally includes an ordered set of frames or pictures. Each frame may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array of luminance samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. In other cases, the frame may be monochromatic, and thus include only one two-dimensional array of luminance samples.

As with HEVC, the AVS3 standard is built upon a block-based hybrid video codec framework. The input video signal is processed block by block, where each block is referred to as a Coding Unit (CU). Unlike HEVC, which partitions blocks based on quadtree alone, in AVS3 one Coding Tree Unit (CTU) is split into CUs based on quadtree/binary tree/extended quadtree to accommodate different local characteristics. Furthermore, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, Prediction Unit (PU), and Transform Unit (TU) does not exist in AVS3. Instead, each CU is always used as a base unit for both prediction and transformation, without further partitioning. In the tree partitioning structure of AVS3, one CTU is first partitioned based on a quadtree structure. Each quadtree leaf node may then be further partitioned based on the binary tree and the extended quadtree structure.

As shown in fig. 4A, video encoder 20 (or more specifically partition unit 45) generates an encoded representation of a frame by first partitioning the frame into a set of Coding Tree Units (CTUs). A video frame may include an integer number of CTUs ordered consecutively in raster scan order from left to right and top to bottom. Each CTU is the largest logical coding unit and the width and height of the CTU are signaled by video encoder 20 in the sequence parameter set such that all CTUs in the video sequence have the same size, i.e., one of 128 x 128, 64 x 64, 32 x 32, and 16 x 16. It should be noted that the present application is not necessarily limited to a particular size. As shown in fig. 4B, each CTU may include one Coding Tree Block (CTB) of luma samples, two corresponding coding tree blocks of chroma samples, and syntax elements for encoding samples of the coding tree blocks. Syntax elements describe the properties of the different types of units of the encoded pixel blocks and how the video sequence may be reconstructed at video decoder 30, including inter-or intra-prediction, intra-prediction modes, motion vectors, and other parameters. In a monochrome picture or a picture having three separate color planes, a CTU may comprise a single coding tree block and syntax elements for encoding samples of the coding tree block. The coding tree block may be a block of samples of NxN.
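
As a rough sketch of the raster-scan CTU layout described above, the example below computes how many CTUs cover a frame and where a given CTU starts. Rounding up, so that partial CTUs at the right and bottom frame boundaries are counted, is an assumption of this example; the function names and the 1920x1080 frame size are hypothetical.

```python
def ctu_grid(width, height, ctu_size):
    """Return (columns, rows) of CTUs needed to cover the frame."""
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    return cols, rows


def ctu_origin(addr, width, ctu_size):
    """Top-left luma sample position of the CTU at raster-scan address addr."""
    cols = -(-width // ctu_size)
    return (addr % cols) * ctu_size, (addr // cols) * ctu_size


cols, rows = ctu_grid(1920, 1080, 128)
print(cols, rows, cols * rows)
print(ctu_origin(16, 1920, 128))
```

Addresses increase from left to right within a CTU row and then continue with the next row down, matching the raster scan order.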

To achieve better performance, video encoder 20 may recursively perform tree partitioning, e.g., binary tree partitioning, ternary tree partitioning, quadtree partitioning, or a combination thereof, on the coding tree blocks of a CTU, and split the CTU into smaller Coding Units (CUs). As depicted in fig. 4C, a 64x64 CTU 400 is first split into four smaller CUs, each having a block size of 32x32. Among the four smaller CUs, each of CUs 410 and 420 is split into four CUs having a block size of 16x16. The two 16x16 CUs 430 and 440 are each further split into four CUs having a block size of 8x8. Fig. 4D depicts a quadtree data structure showing the end result of the partitioning process of CTU 400 as depicted in fig. 4C, with each leaf node of the quadtree corresponding to a CU having a respective size ranging from 32x32 to 8x8. As with the CTU depicted in fig. 4B, each CU may include a Coding Block (CB) of luma samples and two corresponding coding blocks of chroma samples of a frame of the same size, and syntax elements for coding the samples of the coding blocks. In a monochrome picture or a picture having three separate color planes, a CU may comprise a single coding block and syntax structures for encoding samples of the coding block. It should be noted that the quadtree partitions depicted in fig. 4C and 4D are for illustration purposes only, and that one CTU may be split into CUs to accommodate different local characteristics based on quadtree/ternary tree/binary tree partitions. In a multi-type tree structure, one CTU is partitioned by a quadtree structure, and each quadtree leaf CU may be further partitioned by binary tree and ternary tree structures. As shown in fig. 4E, there are five splitting/partitioning types in AVS3, namely, quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quadtree partitioning, and vertical extended quadtree partitioning.
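
The recursive quadtree splitting of a 64x64 CTU can be sketched as below. The positions of the blocks that are split (the set `SPLITS`) are hypothetical and only mimic the shape of the example, namely two 32x32 blocks split further and two 16x16 blocks split further; the actual positions of CUs 410 to 440 in the figure are not stated in the text. Only quadtree splits are modeled, not binary, ternary, or extended quadtree splits.

```python
def quadtree_leaves(x, y, size, should_split, min_size=8):
    """Recursively split a square block and return its leaf CUs as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_leaves(x + dx, y + dy, half,
                                          should_split, min_size)
        return leaves
    return [(x, y, size)]


# Hypothetical split decisions, keyed by (x, y, size) of the block to split.
SPLITS = {(0, 0, 64), (0, 0, 32), (32, 32, 32), (16, 0, 16), (32, 32, 16)}

leaves = quadtree_leaves(0, 0, 64, lambda x, y, s: (x, y, s) in SPLITS)
for size in (32, 16, 8):
    print(size, sum(1 for leaf in leaves if leaf[2] == size))
```

In an encoder, the `should_split` decision would typically come from a rate-distortion comparison rather than from a fixed set.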

In some implementations, video encoder 20 may further partition the coding blocks of the CU into one or more MxN Prediction Blocks (PB). The prediction block is a rectangular (square or non-square) block of samples on which the same (inter or intra) prediction is applied. A Prediction Unit (PU) of a CU may include a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax elements for predicting the prediction blocks. In a monochrome picture or a picture having three separate color planes, a PU may comprise a single prediction block and syntax structures for predicting the prediction block. Video encoder 20 may generate a predicted luma block, a Cb block, and a Cr block for the luma prediction block, the Cb prediction block, and the Cr prediction block of each PU of the CU.

Video encoder 20 may use intra-prediction or inter-prediction to generate the prediction block for the PU. If video encoder 20 uses intra-prediction to generate the prediction block of the PU, video encoder 20 may generate the prediction block of the PU based on decoded samples of the frame associated with the PU. If video encoder 20 uses inter prediction to generate the prediction block of the PU, video encoder 20 may generate the prediction block of the PU based on decoded samples of one or more frames other than the frame associated with the PU.

After video encoder 20 generates the predicted luma block, the Cb block, and the Cr block for one or more PUs of the CU, video encoder 20 may generate a luma residual block of the CU by subtracting the predicted luma block of the CU from its original luma coded block such that each sample in the luma residual block of the CU indicates a difference between a luma sample in one of the predicted luma blocks of the CU and a corresponding sample in the original luma coded block of the CU. Similarly, video encoder 20 may generate the Cb residual block and the Cr residual block of the CU, respectively, such that each sample in the Cb residual block of the CU indicates a difference between a Cb sample in one of the predicted Cb blocks of the CU and a corresponding sample in the original Cb encoded block of the CU, and each sample in the Cr residual block of the CU may indicate a difference between a Cr sample in one of the predicted Cr blocks of the CU and a corresponding sample in the original Cr encoded block of the CU.

Further, as shown in fig. 4C, video encoder 20 may use quadtree partitioning to decompose a luma residual block, a Cb residual block, and a Cr residual block of a CU into one or more luma transform blocks, cb transform blocks, and Cr transform blocks. A transform block is a rectangular (square or non-square) block of samples to which the same transform is applied. A Transform Unit (TU) of a CU may include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax elements for transforming the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. In some examples, the luma transform block associated with a TU may be a sub-block of a luma residual block of a CU. The Cb transform block may be a sub-block of a Cb residual block of the CU. The Cr transform block may be a sub-block of a Cr residual block of the CU. In a monochrome picture or a picture having three separate color planes, a TU may comprise a single transform block and syntax structures for transforming the samples of the transform block.

Video encoder 20 may apply one or more transforms to the luma transform block of the TU to generate a luma coefficient block of the TU. The coefficient block may be a two-dimensional array of transform coefficients. The transform coefficients may be scalar quantities. Video encoder 20 may apply one or more transforms to the Cb transform block of the TU to generate a Cb coefficient block of the TU. Video encoder 20 may apply one or more transforms to the Cr transform blocks of the TUs to generate Cr coefficient blocks of the TUs.

After generating the coefficient block (e.g., the luma coefficient block, the Cb coefficient block, or the Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to a process of quantizing transform coefficients to potentially reduce the amount of data used to represent the transform coefficients, thereby providing further compression. After video encoder 20 quantizes the coefficient blocks, video encoder 20 may entropy encode syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 may perform context-adaptive binary arithmetic coding (CABAC) on syntax elements that indicate quantized transform coefficients. Finally, video encoder 20 may output a bitstream including a sequence of bits that form a representation of the encoded frames and associated data, which is stored in storage device 32 or transmitted to destination device 14.

Upon receiving the bitstream generated by video encoder 20, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the frames of video data based at least in part on the syntax elements obtained from the bitstream. The process of reconstructing video data is generally inverse to the encoding process performed by video encoder 20. For example, video decoder 30 may perform an inverse transform on the coefficient blocks associated with the TUs of the current CU to reconstruct residual blocks associated with the TUs of the current CU. Video decoder 30 also reconstructs the coding block of the current CU by adding samples of the prediction block of the PU of the current CU to corresponding samples of the transform block of the TU of the current CU. After reconstructing the coding blocks for each CU of the frame, video decoder 30 may reconstruct the frame.

SAO is a process of modifying decoded samples by conditionally adding an offset value to each sample after the deblocking filter is applied, based on values in a look-up table transmitted by the encoder. SAO filtering is performed on a region basis, based on a filtering type selected for each CTB by the syntax element sao-type-idx. A value of 0 for sao-type-idx indicates that the SAO filter is not applied to the CTB, and values 1 and 2 indicate the use of the band offset and edge offset filter types, respectively. In the band offset mode specified by sao-type-idx equal to 1, the selected offset value depends directly on the sample amplitude. In this mode, the entire sample amplitude range is split evenly into 32 segments, called bands, and sample values belonging to four of these bands (which are consecutive within the 32 bands) are modified by adding transmitted values denoted as band offsets, which may be positive or negative. The main reason for using four consecutive bands is that in smooth areas where banding artifacts may occur, the sample amplitudes in a CTB tend to concentrate in only a few bands. Furthermore, the design choice of using four offsets is unified with the edge offset mode of operation, which also uses four offset values. In the edge offset mode specified by sao-type-idx equal to 2, the syntax element sao-eo-class, having values from 0 to 3, signals which one of the horizontal, vertical, or two diagonal gradient directions is used for edge offset classification in the CTB.
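
The band offset mode described above can be sketched as follows. Deriving the band index by a right shift of (bit depth minus 5), which yields 32 equal bands, follows from the even split described in the text; clipping the result to the sample range is an assumption, and the names `start_band` and `offsets` are hypothetical stand-ins for the transmitted band position and band offsets.

```python
def sao_band_offset(samples, start_band, offsets, bit_depth=8):
    """Add an offset to samples that fall into the signaled consecutive bands."""
    shift = bit_depth - 5            # 2**5 == 32 equal-width bands
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - start_band    # position within the signaled bands
        if 0 <= k < len(offsets):
            s = min(max(s + offsets[k], 0), max_val)  # assumed clipping
        out.append(s)
    return out


# For 8-bit samples each band is 8 values wide, so bands 2..5 cover 16..47.
samples = [15, 16, 23, 24, 47, 48, 200]
print(sao_band_offset(samples, start_band=2, offsets=[2, -1, 3, -2]))
```

Samples outside the four signaled bands pass through unchanged, which is why the mode targets regions whose amplitudes cluster in a narrow range.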

Fig. 5A is a block diagram depicting the four gradient patterns used in SAO according to some embodiments of the present disclosure. The four gradient patterns 502, 504, 506, and 508 correspond to the respective sao-eo-class values in the edge offset mode. The sample labeled "p" is the center sample under consideration. The two samples labeled "n0" and "n1" are the two neighboring samples along (a) the horizontal gradient pattern (sao-eo-class = 0), (b) the vertical gradient pattern (sao-eo-class = 1), (c) the 135° diagonal gradient pattern (sao-eo-class = 2), and (d) the 45° diagonal gradient pattern (sao-eo-class = 3). Each sample in the CTB is classified into one of five EdgeIdx categories by comparing the sample value p at a given position with the values n0 and n1 of the two samples at neighboring positions, as shown in Fig. 5A. This classification is done for each sample based on decoded sample values, so no additional signaling is required for the EdgeIdx classification. For EdgeIdx categories 1 to 4, an offset value from the transmitted look-up table is added to the sample value, depending on the EdgeIdx category at the sample position. The offset values are always positive for categories 1 and 2 and negative for categories 3 and 4, so the filter generally has a smoothing effect in the edge offset mode. Table 1-1 below illustrates example EdgeIdx categories in the SAO edge classes.

Table 1-1: sample EdgeIdx class in SAO edge class.

For SAO types 1 and 2, a total of four amplitude offset values are transmitted to the decoder for each CTB. For type 1, the sign is also encoded. The offset values and the associated syntax elements (e.g., sao-type-idx and sao-eo-class) are determined by the encoder, typically using criteria that optimize rate-distortion performance. A merge flag may be used to indicate that the SAO parameters are inherited from the left or the above CTB, which makes the signaling efficient. In summary, SAO is a nonlinear filtering operation that allows additional refinement of the reconstructed signal and can enhance the signal representation in both smooth regions and around edges.
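The band and edge classifications described above can be sketched as follows. This is a minimal illustration, not normative text: the function names, the default 8-bit depth, the `start_band` parameter, and the final clipping to the valid sample range are assumptions made here, and the EdgeIdx mapping follows the conventional HEVC-style rule (local minimum, concave corner, convex corner, local maximum) rather than the contents of Table 1-1, which are not reproduced in this text.

```python
def band_index(sample, bit_depth=8):
    """Band offset mode: the amplitude range is split evenly into 32 bands."""
    return sample >> (bit_depth - 5)


def apply_band_offset(sample, start_band, offsets, bit_depth=8):
    """Add one of four transmitted offsets if the sample falls in one of the
    four consecutive bands starting at start_band (hypothetical parameter)."""
    k = band_index(sample, bit_depth) - start_band
    if 0 <= k < 4:
        sample += offsets[k]
    # Clipping to the sample range is an assumption of this sketch.
    return max(0, min((1 << bit_depth) - 1, sample))


def sign(x):
    return (x > 0) - (x < 0)


def edge_idx(p, n0, n1):
    """Edge offset mode: classify center sample p against neighbors n0, n1
    into one of five EdgeIdx categories (0 means no offset is applied)."""
    s = sign(p - n0) + sign(p - n1)
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]
```

With this mapping, categories 1 and 2 correspond to samples lying below at least one neighbor (positive offsets raise them), and categories 3 and 4 to samples lying above (negative offsets lower them), which is consistent with the smoothing behavior described above.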

In some embodiments, a pre-sample adaptive offset (Pre-SAO) is implemented. The coding performance of Pre-SAO at low complexity is promising for future video coding standard development. In some examples, Pre-SAO is applied only to luminance component samples, using luminance samples for classification. Pre-SAO operates by applying two SAO-like filtering operations, called SAOV and SAOH, jointly with the deblocking filter (DBF) before the existing (legacy) SAO is applied. The first SAO-like filter, SAOV, operates by applying SAO to the input picture $Y_2$ after the deblocking filter for vertical edges (DBFV) has been applied:

$$Y_3(i) = \mathrm{Clip1}\big(Y_2(i) + d_1 \cdot (f(i) > T\ ?\ 1 : 0) - d_2 \cdot (f(i) < -T\ ?\ 1 : 0)\big)$$

where $T$ is a predetermined positive constant, and $d_1$ and $d_2$ are offset coefficients associated with two classes based on the sample-wise difference between $Y_1(i)$ and $Y_2(i)$, given by

$$f(i) = Y_1(i) - Y_2(i).$$

The first class, for $d_1$, consists of all sample positions $i$ such that $f(i) > T$, and the second class, for $d_2$, is given by $f(i) < -T$. The offset coefficients $d_1$ and $d_2$ are calculated at the encoder such that the mean square error between the output picture $Y_3$ of SAOV and the original picture $X$ is minimized, in the same way as in the existing SAO process. After SAOV is applied, the second SAO-like filter, SAOH, operates by applying SAO to $Y_4$, the output picture of the deblocking filter for horizontal edges (DBFH), with a classification based on the sample-wise difference between $Y_3(i)$ and $Y_4(i)$. The same procedure as for SAOV applies to SAOH, with $Y_3(i) - Y_4(i)$ instead of $Y_1(i) - Y_2(i)$ used for classification. The two offset coefficients, the predetermined threshold $T$, and an enable flag for each of SAOH and SAOV are signaled at the slice level. SAOH and SAOV are applied independently to the luminance component and the two chrominance components.
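As an illustration of the SAOV step, the per-sample update can be sketched as below. This is a sketch under stated assumptions: `y1` and `y2` stand for the samples Y1(i) and Y2(i) of the formula, `Clip1` is assumed to clip to the range [0, 2^bit_depth - 1], and the function names and the 10-bit default are illustrative only.

```python
def clip1(value, bit_depth=10):
    """Assumed behavior of Clip1: clip to the valid sample range."""
    return max(0, min((1 << bit_depth) - 1, value))


def saov_sample(y1, y2, d1, d2, t, bit_depth=10):
    """Apply the SAOV offset to one sample.

    f = y1 - y2 selects the class: f > t adds d1, f < -t subtracts d2,
    and samples with |f| <= t are left unchanged.
    """
    f = y1 - y2
    y3 = y2
    if f > t:
        y3 += d1
    elif f < -t:
        y3 -= d2
    return clip1(y3, bit_depth)
```

SAOH would follow the same pattern with the difference Y3(i) - Y4(i) as the classifier and Y4(i) as the sample being corrected.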

In some cases, SAOV and SAOH operate only on picture samples affected by the respective deblocking (DBFV or DBFH). Thus, unlike the existing SAO process, Pre-SAO processes only a subset of all samples in a given spatial region (a picture, or a CTU in the case of legacy SAO), which keeps the resulting increase in decoder-side average operations per picture sample low (two or three comparisons and two additions per sample in the worst case, according to preliminary estimates). Pre-SAO needs only the samples used by the deblocking filter and does not require additional samples to be stored at the decoder.

In some embodiments, a bilateral filter (BIF) is implemented for compression efficiency exploration beyond VVC. The BIF is performed in a Sample Adaptive Offset (SAO) loop filter stage. Both the bilateral filter (BIF) and the SAO use samples from deblocking as inputs. Each filter creates an offset for each sample and adds these offsets to the input samples, which are then clipped before ALF.

In detail, the output sample $I_{OUT}$ is obtained as

$$I_{OUT} = \mathrm{clip3}(I_C + \Delta I_{BIF} + \Delta I_{SAO}),$$

where $I_C$ is the input sample from deblocking, $\Delta I_{BIF}$ is the offset from the bilateral filter, and $\Delta I_{SAO}$ is the offset from SAO.

In some embodiments, this implementation provides the encoder with the possibility to enable or disable filtering at CTU and slice level. The encoder makes the decision by evaluating the Rate Distortion Optimization (RDO) cost.

The following syntax elements are introduced in the PPS:

Table 1-2: Picture parameter set RBSP syntax.

pps_bilateral_filter_enabled_flag equal to 0 specifies that the bilateral loop filter is disabled for slices referring to the PPS. pps_bilateral_filter_enabled_flag equal to 1 specifies that the bilateral loop filter is enabled for slices referring to the PPS.

bilateral_filter_strength specifies the bilateral loop filter strength value used in the bilateral transform block filtering process. The value of bilateral_filter_strength should be in the range of 0 to 2, inclusive.

bilateral_filter_qp_offset specifies the offset used in the derivation of the bilateral filter look-up table LUT(x) for slices referring to the PPS. bilateral_filter_qp_offset should be in the range of -12 to +12, inclusive.

The following syntax elements are introduced:

Table 1-3: Slice header syntax.

Table 1-4: Coding tree unit syntax.

The semantics are as follows: slice_bilateral_filter_all_ctb_enabled_flag equal to 1 specifies that the bilateral filter is enabled and applied to all CTBs in the current slice. When slice_bilateral_filter_all_ctb_enabled_flag is not present, it is inferred to be equal to 0.

slice_bilateral_filter_enabled_flag equal to 1 specifies that the bilateral filter is enabled and may be applied to the CTBs of the current slice. When slice_bilateral_filter_enabled_flag is not present, it is inferred to be equal to slice_bilateral_filter_all_ctb_enabled_flag.

bilateral_filter_ctb_flag [ xCtb >> CtbLog2SizeY ] [ yCtb >> CtbLog2SizeY ] equal to 1 specifies that the bilateral filter is applied to the luma coding tree block of the coding tree unit at luma position (xCtb, yCtb). bilateral_filter_ctb_flag [ cIdx ] [ xCtb >> CtbLog2SizeY ] [ yCtb >> CtbLog2SizeY ] equal to 0 specifies that the bilateral filter is not applied to the luma coding tree block of the coding tree unit at luma position (xCtb, yCtb). When bilateral_filter_ctb_flag is not present, it is inferred to be equal to (slice_bilateral_filter_all_ctb_enabled_flag & slice_bilateral_filter_enabled_flag).

In some examples, for the CTUs being filtered, the filtering process proceeds as follows. At picture boundaries, where samples are unavailable, the bilateral filter uses extension (sample repetition) to fill in the unavailable samples. For virtual boundaries, the behavior is the same as for SAO, i.e., no filtering occurs. When crossing horizontal CTU boundaries, the bilateral filter may access the same samples that SAO accesses. Fig. 5B is a block diagram depicting a naming convention for the samples around a center sample, in accordance with some embodiments of the present disclosure. For example, if the center sample $I_C$ is located in the top row of a CTU, $I_{NW}$, $I_A$ and $I_{NE}$ are read from the CTU above, just as SAO does, but $I_{AA}$ is padded, so no additional line buffer is required. The samples surrounding the center sample $I_C$ are named according to Fig. 5B, where A, B, L and R stand for above, below, left and right, and NW, NE, SW, SE stand for north-west, and so on. Likewise, AA stands for above-above, BB for below-below, and so on. This diamond shape differs from another approach that uses a square filter support and does not use $I_{AA}$, $I_{BB}$, $I_{LL}$ or $I_{RR}$.

Each surrounding sample ($I_A$, $I_R$, etc.) contributes a corresponding correction value. These values are calculated as follows. Starting with the contribution from the sample to the right, $I_R$, the difference is calculated as:

$$\Delta I_R = (|I_R - I_C| + 4) \gg 3,$$

where $|\cdot|$ denotes the absolute value. For data that is not 10-bit, $\Delta I_R = (|I_R - I_C| + 2^{n-6}) \gg (n - 7)$ is used instead, where $n = 8$ for 8-bit data, and so on. The resulting value is then clipped to be less than 16:

$$sI_R = \min(15, \Delta I_R).$$
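The difference quantization and clipping above can be sketched as follows. This is a minimal illustration that follows the formulas exactly as stated in the text, including the rounding term given for non-10-bit data; the function name and parameter names are assumptions, and bit depths below 7 are not handled.

```python
def quantized_difference(i_neighbor, i_center, bit_depth=10):
    """Return the clipped, quantized absolute difference (0..15) between a
    neighboring sample and the center sample."""
    diff = abs(i_neighbor - i_center)
    if bit_depth == 10:
        delta = (diff + 4) >> 3
    else:
        # Formula for non-10-bit data, as written in the text.
        delta = (diff + (1 << (bit_depth - 6))) >> (bit_depth - 7)
    return min(15, delta)
```

Because the result is clipped to at most 15, it can directly index a row of 16 look-up table entries.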

The correction value is now calculated as

where LUT_ROW[] is an array of 16 values determined by the value qpb = clip(0, 25, QP + bilateral_filter_qp_offset - 17):

{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, }, if qpb =0

{0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0, }, if qpb =1

{0,2,2,2,1,1,0,1,0,0,0,0,0,0,0,0, }, if qpb =2

{0,2,2,2,2,1,1,1,1,1,1,1,0,1,1, -1, }, if qpb =3

{0,3,3,3,2,2,1,2,1,1,1,1,0,1,1, -1, }, if qpb =4

{0,4,4,4,3,2,1,2,1,1,1,1,0,1,1, -1, }, if qpb =5

{0,5,5,5,4,3,2,2,2,2,2,1,0,1,1, -1, }, if qpb =6

{0,6,7,7,5,3,3,3,3,2,2,1,1,1,1, -1, }, if qpb =7

{0,6,8,8,5,4,3,3,3,3,3,2,1,2,2, -2, }, if qpb =8

{0,7,10,10,6,4,4,4,4,3,3,2,2,2,2, -2, }, if qpb =9

{0,8,11,11,7,5,5,4,5,4,4,2,2,2,2, -2, }, if qpb =10

{0,8,12,13,10,8,8,6,6,6,5,3,3,3,3, -2, }, if qpb =11

{0,8,13,14,13,12,11,8,8,7,7,5,5,4,4, -2, }, if qpb =12

{0,9,14,16,16,15,14,11,9,9,8,6,6,5,6, -3, }, if qpb =13

{0,9,15,17,19,19,17,13,11,10,10,8,8,6,7, -3, }, if qpb =14

{0,9,16,19,22,22,20,15,12,12,11,9,9,7,8, -3, }, if qpb =15

{0,10,17,21,24,25,24,20,18,17,15,12,11,9,9, -3, }, if qpb =16

{0,10,18,23,26,28,28,25,23,22,18,14,13,11,11, -3, }, if qpb =17

{0,11,19,24,29,30,32,30,29,26,22,17,15,13,12, -3, }, if qpb =18

{0,11,20,26,31,33,36,35,34,31,25,19,17,15,14, -3, }, if qpb =19

{0,12,21,28,33,36,40,40,40,36,29,22,19,17,15, -3, }, if qpb =20

{0,13,21,29,34,37,41,41,41,38,32,23,20,17,15, -3, }, if qpb =21

{0,14,22,30,35,38,42,42,42,39,34,24,20,17,15, -3, }, if qpb =22

{0,15,22,31,35,39,42,42,43,41,37,25,21,17,15, -3, }, if qpb =23

{0,16,23,32,36,40,43,43,44,42,39,26,21,17,15, -3, }, if qpb =24

{0,17,23,33,37,41,44,44,45,44,42,27,22,17,15, -3, }, if qpb =25

These values may be stored using six bits per entry, yielding 26 × 16 × 6 / 8 = 312 bytes, or 300 bytes if the first row of all zeros is excluded. The correction values for the left, above and below samples are calculated in the same manner from $I_L$, $I_A$ and $I_B$. For the diagonal samples $I_{NW}$, $I_{NE}$, $I_{SE}$, $I_{SW}$ and the samples two steps away, $I_{AA}$, $I_{BB}$, $I_{RR}$ and $I_{LL}$, the calculation also follows equations 2 and 3, but uses a value shifted by 1. Taking the diagonal sample $I_{SE}$ as an example,

The other diagonal samples and the samples two steps away are calculated in the same way.

The correction values are then added together to give $m_{sum}$.

In some examples, a correction value of the current sample equals a correction value already calculated for the previous sample. Likewise, a correction value of the current sample equals one already calculated for the sample above, and similar symmetries exist for the diagonal and two-steps-away correction values. This means that in a hardware implementation it is sufficient to calculate six of the correction values; the remaining six can be obtained from previously calculated values.

The $m_{sum}$ value is now multiplied by c = 1, 2 or 3, which can be done using a single adder and logical AND gates in the following way:

$$c_v = k_1 \,\&\, (m_{sum} \ll 1) + k_2 \,\&\, m_{sum},$$

where & denotes logical AND, $k_1$ is the most significant bit of the multiplier c, and $k_2$ is the least significant bit. The value to multiply by is obtained from the minimum block dimension D = min(width, height), as shown in Table 1-5:

Block type	D≤4	4<D<16	D≥16
Intra-frame	3	2	1
Inter-frame	2	2	1

Table 1-5: The c parameter obtained from the minimum size D = min(width, height) of the block.
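The multiplier selection of Table 1-5 and the shift-and-AND multiplication can be sketched as follows. This is an illustrative sketch only: D is taken here as the smaller of the block width and height, the function names are hypothetical, and the bitwise gating of the hardware description is modeled with conditional selection.

```python
def multiplier_c(width, height, is_intra):
    """Look up c from the minimum block dimension D, per Table 1-5."""
    d = min(width, height)
    if d <= 4:
        return 3 if is_intra else 2
    if d < 16:
        return 2
    return 1


def multiply_by_c(m_sum, c):
    """Compute c * m_sum for c in {1, 2, 3} using one shift and one add.

    k1 (most significant bit of c) gates m_sum << 1, and k2 (least
    significant bit of c) gates m_sum, mirroring the AND gates in the text.
    """
    k1 = (c >> 1) & 1
    k2 = c & 1
    return (m_sum << 1 if k1 else 0) + (m_sum if k2 else 0)
```

For c = 3 both bits are set, so the result is (m_sum << 1) + m_sum, which is why a single adder suffices.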

Finally, the bilateral filter offset $\Delta I_{BIF}$ is calculated. For full-strength filtering, the following formula is used:

$$\Delta I_{BIF} = (c_v + 16) \gg 5,$$

whereas for half-strength filtering the following formula is used:

$$\Delta I_{BIF} = (c_v + 32) \gg 6.$$

The general formula for n-bit data is

$$r_{add} = 2^{14 - n - \mathrm{bilateral\_filter\_strength}}$$

$$r_{shift} = 15 - n - \mathrm{bilateral\_filter\_strength}$$

$$\Delta I_{BIF} = (c_v + r_{add}) \gg r_{shift},$$

where bilateral_filter_strength may be 0 or 1 and is signaled in the PPS.
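The final scaling step and the combination with the SAO offset can be sketched as below. The sketch implements the general n-bit formula exactly as written; the function names are hypothetical, and clip3 is assumed to clip to the range [0, 2^bit_depth - 1], which the text does not spell out.

```python
def bif_offset(c_v, bit_depth=10, strength=0):
    """Bilateral filter offset from the general n-bit formula:
    r_add = 2^(14 - n - strength), r_shift = 15 - n - strength."""
    r_add = 1 << (14 - bit_depth - strength)
    r_shift = 15 - bit_depth - strength
    return (c_v + r_add) >> r_shift


def combine_offsets(i_c, delta_bif, delta_sao, bit_depth=10):
    """I_OUT = clip3(I_C + dI_BIF + dI_SAO), with an assumed clipping range."""
    return max(0, min((1 << bit_depth) - 1, i_c + delta_bif + delta_sao))
```

For 10-bit data with strength 0, the general formula reduces to (c_v + 16) >> 5, matching the full-strength case given above.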

In some embodiments, the methods and systems disclosed herein improve coding efficiency or reduce complexity of Sample Adaptive Offset (SAO) by introducing cross-component information. SAO is used for HEVC, VVC, AVS and AVS3 standards. Although the SAO designs existing in the HEVC, VVC, AVS and AVS3 standards are used as the basic SAO method in the following description, the cross-component method described in the present disclosure may also be applied to other loop filter designs or other codec tools having similar design concepts to those skilled in the art of video codec. For example, in the AVS3 standard, SAO is replaced by a codec tool called Enhanced Sample Adaptive Offset (ESAO). However, the CCSAO disclosed herein may also be applied in parallel with ESAO. In another example, CCSAO may be applied in parallel with a Constrained Direction Enhancement Filter (CDEF) in the AV1 standard.

For existing SAO designs in the HEVC, VVC, AVS and AVS3 standards, the luma Y, chroma Cb and chroma Cr sample offset values are independently determined. That is, for example, the current chroma sample offset is determined only from the current and neighboring chroma sample values, regardless of the co-located or neighboring luma samples. However, luminance samples retain more original picture detail information than chrominance samples, which may be advantageous for the decision of the current chrominance sample offset. Furthermore, introducing luma samples preserving high frequency detail for chroma offset decision may be advantageous for chroma sample reconstruction, as the chroma samples typically lose high frequency detail after conversion from RGB colors to YCbCr, or after quantization and deblocking filters. Thus, further gains may be expected by exploring cross-component correlations, for example, by using methods and systems for cross-component sample adaptive offset (CCSAO). In some embodiments, the correlation here includes not only the cross-component sample values, but also picture/codec information, such as prediction/residual codec mode from the cross-component, transform type, and quantization/deblocking/SAO/ALF parameters.

As another example, in SAO the luminance sample offsets are determined only by the luminance samples. However, luminance samples with the same band offset (BO) classification may, for example, be further classified by their co-located and neighboring chroma samples, which may lead to a more effective classification. SAO classification can be regarded as a shortcut for compensating the sample difference between the original picture and the reconstructed picture. Therefore, an effective classification is desired.

Fig. 6A is a block diagram illustrating a system and process of CCSAO applied to chroma samples and using DBF Y as input, in accordance with some embodiments of the present disclosure. The luminance samples after the luminance deblocking filter (DBF Y) are used to determine additional offsets for chrominance Cb and Cr after SAO Cb and SAO Cr. For example, the current chroma sample 602 is first classified using the co-located luminance sample 604 and the neighboring (white) luminance samples 606, and the CCSAO offset value of the corresponding class is added to the current chroma sample value. Fig. 6B is a block diagram illustrating a system and process of CCSAO applied to luminance and chrominance samples and using DBF Y/Cb/Cr as input, according to some embodiments of the present disclosure. Fig. 6C is a block diagram illustrating a system and process of CCSAO that may operate independently, in accordance with some embodiments of the present disclosure. Fig. 6D is a block diagram illustrating a system and process of CCSAO that may be applied recursively (2 or N times) in the same codec stage with the same or different offsets, or repeated in different stages, according to some embodiments of the present disclosure. In summary, in some embodiments, to classify the current luminance sample, information from the current and neighboring luminance samples and from the co-located and neighboring chroma samples (Cb and Cr) may be used. In some embodiments, to classify a current chroma sample (Cb or Cr), the co-located and neighboring luminance samples, the co-located and neighboring cross-chroma samples, and the current and neighboring chroma samples may be used. In some embodiments, CCSAO may be cascaded (1) after DBF Y/Cb/Cr, (2) after the reconstructed image Y/Cb/Cr before DBF, (3) after SAO Y/Cb/Cr, or (4) after ALF Y/Cb/Cr.

In some embodiments, CCSAO may also be applied in parallel with other codec tools, such as ESAO in the AVS standard, or CDEF or Neural Network Loop Filter (NNLF) in the AV1 standard. Fig. 6E is a block diagram illustrating a system and process of CCSAO applied in parallel with ESAO in the AVS standard according to some embodiments of the present disclosure.

Fig. 6F is a block diagram illustrating a system and process of CCSAO applied after SAO according to some embodiments of the present disclosure. In some embodiments, fig. 6F shows that the location of CCSAO may be after SAO, i.e., the location of a cross-component adaptive loop filter (CCALF) in the VVC standard. Fig. 6G is a block diagram illustrating that the system and process of CCSAO may operate independently without a CCALF in accordance with some embodiments of the present disclosure. In some embodiments, for example in the AVS3 standard, SAO Y/Cb/Cr may be replaced by ESAO.

FIG. 6H is a block diagram illustrating a system and process of CCSAO applied in parallel with CCALF, in accordance with some embodiments of the present disclosure. In some embodiments, the positions of CCALF and CCSAO in Fig. 6H may be interchanged. In some embodiments, in Figs. 6A-6H or throughout the present disclosure, the SAO Y/Cb/Cr blocks may be replaced by ESAO Y/Cb/Cr (in AVS3) or CDEF (in AV1). Note that Y/Cb/Cr may also be denoted as Y/U/V in the field of video coding. In some embodiments, if the video is in RGB format, CCSAO may also be applied by simply mapping the YUV notation to GBR, respectively, in the present disclosure.

In some embodiments, the current chroma sample classification reuses the SAO type (edge offset (EO) or BO), class, and category of the co-located luminance sample. The corresponding CCSAO offset may be signaled or derived by the decoder itself. For example, let h_Y be the SAO offset of the co-located luminance sample, and let h_Cb and h_Cr be the CCSAO Cb and Cr offsets, respectively. Then h_Cb (or h_Cr) = w * h_Y, where w is selected from a limited table, for example ±1/4, ±1/2, 0, ±1, ±2, ±4, and so on, in which |w| includes only power-of-2 values.

In some embodiments, the comparison score [-8, 8] of the co-located luminance sample (Y0) and its 8 neighboring luminance samples is used, which yields 17 classes in total:

Initial Class = 0

Loop over the 8 neighboring luminance samples (Yi, i = 1 to 8):

If Y0 > Yi, Class += 1

Otherwise, if Y0 < Yi, Class -= 1

In some embodiments, the above classification methods may be combined. For example, the comparison score combined with SAO BO (32-band classification) is used to increase diversity, yielding 17 × 32 classes in total. In some embodiments, Cb and Cr may use the same class to reduce complexity or save bits.
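The comparison score and its combination with a 32-band classification can be sketched as follows. This is a sketch under stated assumptions: the function names, the 8-bit default, and in particular the way the two classifiers are merged into one joint index (score shifted to 0..16, then multiplied by 32) are illustrative choices, since the text only states the total of 17 × 32 classes.

```python
def comparison_score(y0, neighbors):
    """Count neighbors below y0 minus neighbors above y0 (range -8..8
    for 8 neighbors)."""
    score = 0
    for yi in neighbors:
        if y0 > yi:
            score += 1
        elif y0 < yi:
            score -= 1
    return score


def joint_class(y0, neighbors, bit_depth=8):
    """Combine the 17-way comparison score with a 32-band index into a
    single class index in 0..543 (hypothetical index layout)."""
    band = y0 >> (bit_depth - 5)
    return (comparison_score(y0, neighbors) + 8) * 32 + band
```

Any bijective layout of the (score, band) pair would serve equally well; what matters is that encoder and decoder index the offset table identically.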

Fig. 7 is a block diagram illustrating a sample process using CCSAO according to some embodiments of the present disclosure. In particular, FIG. 7 shows that the CCSAO input may include the inputs of the vertical and horizontal DBF, either to simplify class determination or to increase flexibility. For example, let Y0_DBF_V, Y0_DBF_H, and Y0 be the co-located luminance samples at the inputs of DBF_V, DBF_H, and SAO, respectively. Yi_DBF_V, Yi_DBF_H, and Yi are the 8 neighboring luminance samples at the inputs of DBF_V, DBF_H, and SAO, respectively, where i = 1 to 8.

MaxY0 = max(Y0_DBF_V, Y0_DBF_H, Y0_DBF)

MaxYi = max(Yi_DBF_V, Yi_DBF_H, Yi_DBF)

MaxY0 and MaxYi are then fed to the CCSAO classification.

Fig. 8 is a block diagram illustrating a CCSAO process interleaved with the vertical and horizontal DBF according to some embodiments of the present disclosure. In some embodiments, the CCSAO blocks in Figs. 6, 7 and 8 may be selective. For example, Y0_DBF_V and Yi_DBF_V are used for the first CCSAO_V, which applies the same sample processing as in Fig. 6 while using the luminance samples at the input of DBF_V as the CCSAO input.

In some embodiments, the CCSAO syntax implemented is shown in table 2 below.

Table 2: CCSAO syntax example

In some embodiments, when signaling the CCSAO Cb and Cr offset values, if the offset of one chroma component is signaled, the offset of the other chroma component may be derived by applying a plus or minus sign, or a weight, to save bit overhead. For example, let h_Cb and h_Cr be the CCSAO Cb and Cr offsets, respectively. With an explicitly signaled w, where w = ±|w| with a limited set of |w| candidates, h_Cr can be derived from h_Cb without explicitly signaling h_Cr itself.

h_Cr＝w*h_Cb
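The derivation of the Cr offsets from the signaled Cb offsets can be sketched as below. This is an illustrative sketch: the function name is hypothetical, and rounding of fractional results toward zero is an assumption of this example, since the text does not specify how non-integer products are handled.

```python
from fractions import Fraction


def derive_cr_offsets(h_cb, w):
    """Derive h_Cr = w * h_Cb for each signaled Cb offset.

    w is the explicitly signaled signed weight (for example -1, 1, or
    Fraction(1, 2)). Fractional products are truncated toward zero,
    which is an assumption of this sketch.
    """
    return [int(Fraction(w) * h) for h in h_cb]
```

With w = -1, for instance, only the Cb offsets and the single weight need to be transmitted, and the Cr offsets are their negation.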

Fig. 9 is a flow chart illustrating an exemplary process 900 for decoding a video signal using cross-component correlation in accordance with some embodiments of the present disclosure.

Video decoder 30 receives a video signal comprising a first component and a second component (910). In some embodiments, the first component is a luminance component and the second component is a chrominance component of the video signal.

Video decoder 30 also receives a plurality of offsets associated with the second component (920).

Video decoder 30 then utilizes the characteristic measurement of the first component to obtain a classification category associated with the second component (930). For example, in fig. 6, a current chroma sample 602 is first classified using a co-located luma sample 604 and an adjacent (white) luma sample 606, and a corresponding CCSAO offset value is added to the current chroma sample.

Video decoder 30 further selects a first offset from the plurality of offsets of the second component according to the classification category (940).

Video decoder 30 additionally modifies the second component based on the selected first offset (950).
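Steps 930 to 950 of process 900 can be sketched for one row of samples as follows. This is a minimal illustration rather than the claimed method: it assumes the band-based classifier (co-located luma value times the number of bands, shifted by the bit depth) as the characteristic measurement, assumes the chroma and co-located luma samples are already aligned one-to-one, and clips the result to the sample range; all names are hypothetical.

```python
def ccsao_apply(chroma, colocated_luma, offsets, bit_depth=10):
    """Apply CCSAO offsets to chroma samples classified by co-located luma.

    offsets holds one received offset per class (step 920); its length is
    used as the number of bands.
    """
    band_num = len(offsets)
    max_val = (1 << bit_depth) - 1
    out = []
    for c, y in zip(chroma, colocated_luma):
        cls = (y * band_num) >> bit_depth   # step 930: classify via luma
        offset = offsets[cls]               # step 940: select the offset
        out.append(max(0, min(max_val, c + offset)))  # step 950: modify
    return out
```

Because the class is derived from already decoded luma samples, no per-sample class information has to be transmitted; only the offset table is signaled.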

In some embodiments, utilizing the characteristic measurement of the first component to obtain the classification category (930) associated with the second component includes: obtaining a respective classification category for each respective sample of the second component using a respective sample of the first component, wherein the respective sample of the first component is the sample of the first component co-located with that sample of the second component. For example, the current chroma sample classification reuses the SAO type (EO or BO), class, and category of the co-located luminance sample.

In some embodiments, utilizing the characteristic measurement of the first component to obtain the classification category (930) associated with the second component includes: obtaining a respective classification category for each respective sample of the second component using a respective sample of the first component, wherein the respective sample of the first component is reconstructed either before or after deblocking. In some embodiments, the first component is deblocked at a deblocking filter (DBF). In some embodiments, the first component is deblocked at a luminance deblocking filter (DBF Y). For example, as an alternative to Fig. 6 or 7, the CCSAO input may also be taken before DBF Y.

In some embodiments, the characteristic measure is derived by dividing a range of sample values of the first component into several bands and selecting the bands based on intensity values of the samples in the first component. In some embodiments, the characteristic measurement is derived from a Band Offset (BO).

In some embodiments, the characteristic measure is derived based on the direction and intensity of the edge information of the sample in the first component. In some embodiments, the characteristic measurement is derived from Edge Offset (EO).

In some embodiments, modifying the second component (950) includes directly adding the selected first offset to the second component. For example, a corresponding CCSAO offset value is added to the current chroma component sample.

In some embodiments, modifying the second component (950) includes mapping the selected first offset to the second offset and adding the mapped second offset to the second component. For example, for signaling CCSAO Cb and Cr offset values, if one additional chroma offset is signaled, other chroma component offsets may be derived by using plus or minus signs or weights to save bit overhead.

In some embodiments, receiving the video signal (910) includes receiving a syntax element that indicates in a Sequence Parameter Set (SPS) whether a method of decoding the video signal using CCSAO is enabled for the video signal. In some embodiments, cc_sao_enabled_flag indicates whether CCSAO is enabled at the sequence level.

In some embodiments, receiving the video signal (910) includes receiving a syntax element indicating whether the method of decoding the video signal using CCSAO is enabled for the second component at the slice level. In some embodiments, slice_cc_sao_cb_flag or slice_cc_sao_cr_flag indicates whether CCSAO is enabled in the respective slice for Cb or Cr.

In some embodiments, receiving the plurality of offsets (920) associated with the second component includes receiving different offsets for different Coding Tree Units (CTUs). In some embodiments, for CTUs, cc_sao_offset_sign_flag indicates the sign of the offset, and cc_sao_offset_abs indicates the CCSAO Cb and Cr offset values of the current CTU.

In some embodiments, receiving the plurality of offsets associated with the second component (920) includes receiving a syntax element indicating whether the received offset of a CTU is the same as that of one of the neighboring CTUs of the CTU, wherein the neighboring CTU is either the left or the above neighboring CTU. For example, cc_sao_merge_up_flag indicates whether the CCSAO offset is merged from the left or the above CTU.

In some embodiments, the video signal further comprises a third component, and the method of decoding the video signal using CCSAO further comprises: receiving a second plurality of offsets associated with the third component; obtaining a second classification category associated with the third component using the characteristic measurement of the first component; selecting a third offset for the third component from the second plurality of offsets according to the second classification category; and modifying the third component based on the selected third offset.

Fig. 11 is a block diagram illustrating a sample process in which all co-located and adjacent (white) luminance/chrominance samples may be fed to a CCSAO classification in accordance with some embodiments of the present disclosure. Fig. 6A, 6B and 11 show the input of the CCSAO classification. In fig. 11, the current chroma sample is 1104, the cross-component co-located chroma sample is 1102, and the co-located luma sample is 1106.

In some embodiments, an example classifier (C0) uses the co-located luminance or chrominance sample value (Y0) in FIG. 12 (Y4/U4/V4 in FIG. 6B and FIG. 6C) for classification. Let band_num be the number of equally divided bands of the luminance or chrominance dynamic range and bit_depth be the sequence bit depth. An example of the class index for the current chroma sample is:

Class(C0)＝(Y0*band_num)>>bit_depth
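As a worked illustration of this formula, the band classifier can be written as a one-line function. The function name and the example values are illustrative assumptions; the arithmetic is the formula above.

```python
def class_c0(y0, band_num, bit_depth):
    """Map a sample value to one of band_num equal bands of the dynamic
    range [0, 2^bit_depth)."""
    return (y0 * band_num) >> bit_depth
```

For 10-bit samples and 16 bands, each band spans 64 consecutive sample values, so a mid-range sample of 512 falls into band 8 and the maximum value 1023 into band 15.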

In some embodiments, the classification takes rounding into account, for example:

Class(C0)＝((Y0*band_num)+(1<<bit_depth))>>bit_depth

Some band_num and bit_depth examples are listed in Table 3 below. Table 3 shows three classification examples, where the number of bands differs for each classification example.

Table 3: exemplary band_num and bit_depth for each class index.

In some embodiments, the classifier uses different luminance sample locations for the C0 classification. Fig. 10A is a block diagram illustrating a classifier using different luma (or chroma) sample positions for C0 classification, e.g., using adjacent Y7 instead of Y0, according to some embodiments of the present disclosure.

In some embodiments, different classifiers may switch at the Sequence Parameter Set (SPS)/Adaptation Parameter Set (APS)/Picture Parameter Set (PPS)/Picture Header (PH)/Slice Header (SH)/region/Coding Tree Unit (CTU)/Coding Unit (CU)/sub-block/sample level. For example, in fig. 10, POC0 uses Y0, POC1 uses Y7, as shown in table 4 below.

POC	Classifier	C0 band_num	Total classes
0	C0 uses Y0 position	8	8
1	C0 uses Y7 position	8	8

Table 4: Different classifiers applied to different pictures

In some embodiments, Fig. 10B illustrates some examples of different shapes for the luminance candidates according to some implementations of the disclosure. For example, constraints may be applied to the shape. In some cases, the total number of luminance candidates must be a power of 2, as shown in Fig. 10B (b) (c) (d). In some cases, the luminance candidates must be horizontally and vertically symmetric with respect to the chroma sample (at the center), as shown in Fig. 10B (a) (c) (d) (e). In some embodiments, the power-of-2 constraint and the symmetry constraint may also be applied to the chroma candidates. The U/V portions of Figs. 6B and 6C show examples of the symmetry constraint. In some embodiments, different color formats may have different classifier constraints. For example, the 420 color format uses the luminance/chrominance candidate selection shown in Figs. 6B and 6C (one candidate is selected from the 3x3 shape), whereas the 444 color format uses Fig. 10B (f) for luminance and chrominance candidate selection, and the 422 color format uses Fig. 10B (g) for luminance (2 chroma samples share 4 luminance candidates) and Fig. 10B (f) for the chroma candidates.

In some embodiments, the C0 position and C0 band_num may be combined and switched at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different combinations may be different classifiers as shown in table 5 below.

POC	Classifier	C0 band_num	Total classes
0	C0 uses Y0 position	16	16
1	C0 uses Y7 position	8	8

Table 5: Different classifier and band number combinations applied to different pictures

In some embodiments, the co-located luminance sample value (Y0) is replaced by a value (Yp) obtained by weighting the co-located and neighboring luminance samples. Fig. 12 illustrates exemplary classifiers, according to some embodiments of the present disclosure, in which the co-located luminance sample value is replaced with a value obtained by weighting the co-located and neighboring luminance samples. The co-located luminance sample value (Y0) may be replaced with a phase-corrected value (Yp) obtained by weighting neighboring luminance samples. Different Yp values may constitute different classifiers.

In some embodiments, different Yp values apply to different chroma formats. For example, in Fig. 12, the Yp of (a) is used for the 420 chroma format, the Yp of (b) is used for the 422 chroma format, and Y0 is used for the 444 chroma format.

In some embodiments, another classifier (C1) is the comparison score, in the range [-8, 8], of the co-located luminance sample (Y0) against its 8 neighboring luminance samples, which yields a total of 17 classes, as shown below.

Initialize Class (C1) = 0, then loop over the 8 neighboring luminance samples (Yi, i = 1 to 8):

If Y0 > Yi, Class += 1

Otherwise, if Y0 < Yi, Class -= 1

In some embodiments, the C1 example is equal to the following function, where the threshold th is 0.

ClassIdx = Index2ClassTable(f(C, P1) + f(C, P2) + … + f(C, P8))

Here f(x, y) = 1 if x - y > th; f(x, y) = 0 if x - y = th; and f(x, y) = -1 if x - y < th. Index2ClassTable is a look-up table (LUT), C is the current or co-located sample, and P1 to P8 are the neighboring samples.
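
The C1 comparison score can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, and Index2ClassTable is assumed here to be a simple shift of the score range [-8, 8] to the index range [0, 16], since the text does not give the table contents.

```python
def f(x, y, th=0):
    """Three-way comparison from the text: +1, 0 or -1 depending on x - y versus th."""
    d = x - y
    if d > th:
        return 1
    if d == th:
        return 0
    return -1


def c1_class(center, neighbors, th=0):
    """Sum the comparisons against all neighbors and map the score to a class index."""
    score = sum(f(center, p, th) for p in neighbors)
    # Hypothetical Index2ClassTable: shift the score range [-N, N] to [0, 2N].
    return score + len(neighbors)
```

With 8 neighbors the returned index covers 17 values, which matches the 17 classes stated for C1.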

In some embodiments, similar to the C4 classifier, one or more thresholds may be predefined (e.g., saved in a LUT) or signaled at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level to help classify (quantize) the differences.

In some embodiments, a variant (C1') counts only the comparison score in [0, 8], which yields 9 classes. (C1, C1') is a classifier set, and a PH/SH level flag can be signaled to switch between C1 and C1'.

Initialize Class (C1') = 0, then loop over the 8 neighboring luminance samples (Yi, i = 1 to 8):

If Y0 > Yi, Class += 1

In some embodiments, a variant (C1s) selectively uses N of the M neighboring samples to calculate the comparison score. An M-bit bitmask may be signaled at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level to indicate which neighboring samples are selected. Using fig. 6B as an example of a luminance classifier: the 8 neighboring luminance samples are candidates, and an 8-bit bitmask (01111110) is signaled at PH, indicating that the 6 samples Y1 through Y6 are selected, so the comparison score is in [-6, 6], which produces 13 offsets. The selective classifier C1s gives the encoder more choices for trading off offset signaling overhead against classification granularity.

Similar to C1s, a variant (C1's) counts only the comparison score in [0, +N]. With the previous bitmask example 01111110, the comparison score is in [0, 6], which yields 7 offsets.
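
The selective variants can be sketched as below. The mapping of bitmask characters to neighbor positions (character k selects neighbors[k]) and the function name are assumptions made for illustration only; the text specifies the bitmask but not its bit ordering.

```python
def c1s_score(center, neighbors, bitmask, signed=True):
    """Comparison score over the neighbors whose bitmask character is '1'.

    signed=True follows C1s (score in [-N, N]); signed=False follows C1's
    (score in [0, N]). Returns (score, number_of_offsets).
    """
    selected = [p for p, bit in zip(neighbors, bitmask) if bit == "1"]
    score = 0
    for p in selected:
        if center > p:
            score += 1
        elif signed and center < p:
            score -= 1
    n = len(selected)
    return score, (2 * n + 1 if signed else n + 1)
```

For the bitmask 01111110, six neighbors are selected, so the offset counts come out as 13 (signed) and 7 (unsigned), as stated in the text.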

In some embodiments, different classifiers are combined to produce a generic classifier. For example, for different pictures (different POC values), different classifiers are applied, as shown in table 6-1 below.

POC	Classifier	C0 band_num	Total classes
0	Combination of C0 and C1	16	16*17
1	Combination of C0 and C1'	16	16*9
2	Combination of C0 and C1	7	7*17

Table 6-1: different general classifiers are applied to different pictures

In some embodiments, another classifier example (C3) uses a bitmask for classification, as shown in Table 6-2. A 10-bit bitmask is signaled at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level to indicate the classifier. For example, the bitmask 11 1100 0000 means that, for a given 10-bit luma sample value, only the 4 most significant bits (MSBs) are used for classification, yielding a total of 16 classes. Another example bitmask, 10 0100 0001, means that only 3 bits are used for classification, yielding a total of 8 classes.

In some embodiments, the bitmask length (N) may be fixed or switched at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, for a 10-bit sequence, a 4-bit bitmask 1110 is signaled in the PH of a picture, and the 3 MSBs b9, b8, b7 are used for classification. Another example is a 4-bit bitmask 0011 on the LSBs, in which case b0 and b1 are used for classification. The bitmask classifier may be applied to luminance or chrominance classification. Whether the N-bit bitmask applies to the MSBs or the LSBs may be fixed or switched at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
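
A bitmask classifier of this kind can be sketched as follows. The sketch assumes that the sample bits selected by the mask are packed, most significant first, into the class index; the text gives the number of classes but not the packing order, so that ordering and the function names are illustrative assumptions.

```python
def c3_class(sample, mask):
    """Gather the sample bits selected by mask into a compact class index."""
    cls = 0
    for bit in range(mask.bit_length() - 1, -1, -1):
        if (mask >> bit) & 1:
            cls = (cls << 1) | ((sample >> bit) & 1)
    return cls


def c3_num_classes(mask):
    """Each 1 in the mask doubles the number of classes."""
    return 1 << bin(mask).count("1")
```

For example, c3_num_classes(0b1111000000) is 16 and c3_num_classes(0b1001000001) is 8, in line with the two bitmask examples above. A 4-bit mask 1110 on the MSBs of a 10-bit sample corresponds to the full-width mask 0b1110000000.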

In some embodiments, the luma position and the C3 bit mask may be combined and switched in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different combinations may be different classifiers.

In some embodiments, a limit on the maximum number of 1s in the bitmask may be applied to restrict the corresponding number of offsets. For example, limiting the maximum number of 1s in the bitmask to 4 in the SPS results in at most 16 offsets in the sequence. The bitmasks in different POCs may differ, but the number of 1s must not exceed 4 (the total number of classes must not exceed 16). The maximum number of 1s may be signaled and switched at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.

Table 6-2: classifier examples using bitmasks for classification (bitmask positions underlined)

In some embodiments, as shown in fig. 11, for the current chroma sample 1104, other cross-component chroma samples (e.g., chroma sample 1102) and their neighbors may also be fed into the CCSAO classification. For example, Cr chroma samples may be fed into the CCSAO Cb classification, and Cb chroma samples may be fed into the CCSAO Cr classification. The classifier for the cross-component chroma samples may be the same as the luma cross-component classifier, or it may be a separate classifier as described in this disclosure. The two classifiers may be combined to form a joint classifier that classifies the current chroma sample. For example, a joint classifier combining cross-component luma and chroma samples yields a total of 16 classes, as shown in Table 6-3 below.

Table 6-3: classifier examples (bitmask position underlined) using a joint classifier combining cross-component luma and chroma samples

All of the above classifiers (C0, C1, C1', C2, C3) may be combined. For example, see Table 6-4 below.

Table 6-4: different classifiers are combined

In some embodiments, classifier example (C2) uses the difference (Yn) of the co-located and neighboring luminance samples. Fig. 12 (c) shows an example of Yn, whose dynamic range is [-1024, 1023] when the bit depth is 10. Let C2 band_num be the number of equally divided bands of the Yn dynamic range; then

Class(C2) = (Yn + (1 << bit_depth) * band_num) >> (bit_depth + 1).
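
A minimal sketch of the C2 band mapping follows. It assumes that the offset (1 << bit_depth) is added to Yn before the multiplication by band_num, because that grouping maps the stated range [-1024, 1023] onto band indices 0 to band_num - 1; this grouping is an interpretation of the formula above, and the function name is hypothetical.

```python
def c2_class(yn, band_num, bit_depth=10):
    """Map a signed luma difference yn to one of band_num equal bands.

    Assumed grouping: ((yn + 2**bit_depth) * band_num) >> (bit_depth + 1).
    """
    return ((yn + (1 << bit_depth)) * band_num) >> (bit_depth + 1)
```

Under this reading, the most negative difference falls in band 0 and the most positive one in band band_num - 1.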

In some embodiments, C0 and C2 are combined to produce a general classifier. For example, different classifiers are applied to different pictures (different POCs), as shown in Table 7 below.

POC	Classifier	C0 band_num	C2 band_num	Total classes
0	Combination of C0 and C2	16	16	16*17
1	Combination of C0 and C2	8	7	8*7

Table 7: different general classifiers are applied to different pictures

In some embodiments, all of the above-described classifiers (C0, C1, C1', C2) are combined. For example, different classifiers are applied to different pictures (different POCs), as shown in Table 8-1 below.

POC	Classifier	C0 band_num	C2 band_num	Total classes
0	Combination of C0, C1 and C2	4	4	4*17*4
1	Combination of C0, C1' and C2	6	4	6*9*4

Table 8-1: different general classifiers are applied to different pictures

In some embodiments, classifier example (C4) uses the difference between the CCSAO input value and the sample value to be compensated for classification, as shown in Table 8-2 below. For example, if CCSAO is applied at the ALF stage, the difference between the pre-ALF and post-ALF sample values of the current component is used for classification. One or more thresholds may be predefined (e.g., stored in a look-up table (LUT)) or signaled at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level to help classify (quantize) the difference. The C4 classifier can be combined with the C0 Y/U/V bandNum to form a joint classifier (see the POC1 example in Table 8-2).

Table 8-2: classifier examples classify using the difference of CCSAO input values and sample values to be compensated

In some embodiments, classifier example (C5) uses coding information to help sub-block classification, because different coding modes may introduce different distortion statistics in the reconstructed image. A CCSAO sample is classified according to its previously coded information, and a combination of such coding information may form a classifier, for example, as shown in Table 8-3 below. Fig. 30 shows another example of the different stages of coding information for C5.

Table 8-3: CCSAO samples are classified according to their previously coded information, and a combination of the coding information may form a classifier

In some embodiments, classifier example (C6) classifies using YUV color-transformed values. For example, to classify the current Y component, 1/1/1 co-located or neighboring Y/U/V samples are selected and color-transformed to RGB, and the R value quantized with the C3 bandNum is used as the classifier for the current Y component.

In some embodiments, other classifier examples that use only current-component information for current-component classification may also be used for cross-component classification. For example, as shown in fig. 5A and Table 1, luminance sample information and eo-class are used to derive EdgeIdx and classify the current chroma sample. Other "non-cross-component" classifiers that may also be used as cross-component classifiers include edge direction, pixel intensity, pixel variance, sum of pixel Laplacians, the Sobel operator, the compass operator, high-pass filtered values, low-pass filtered values, and the like.

In some embodiments, multiple classifiers are used in the same POC. The current frame is divided into several regions, each region using the same classifier. For example, in POC0, 3 different classifiers are used, which classifier (0, 1 or 2) to use is signaled at the CTU level, as shown in table 9 below.

POC	Classifier	C0 band_num	Region
0	C0 uses Y0 position	16	0
0	C0 uses Y0 position	8	1
0	C0 uses Y1 position	8	2

Table 9: different general classifiers are applied to different regions of the same picture

In some embodiments, the maximum number of classifiers (multiple classifiers may also be referred to as alternative offset sets) may be fixed or signaled at SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. In one example, the fixed (predefined) maximum number of classifiers is 4. In this case, 4 different classifiers are used in POC0, and which classifier (0, 1, 2 or 3) to use is signaled at CTU level. Truncated Unary (TU) codes may be used to indicate the classifier for each luminance or chrominance CTB. For example, as shown in Table 10 below, when the TU code is 0, CCSAO is not applied; when the TU code is 10, set 0 is applied; when the TU code is 110, set 1 is applied; when the TU code is 1110, set 2 is applied; and when the TU code is 1111, set 3 is applied. Fixed-length codes, Golomb-Rice codes, and exponential-Golomb codes may also be used to indicate the classifier (offset set index) of the CTB. In POC1, 3 different classifiers are used.

POC	Classifier	C0 band_num	Region	TU code
0	C0 uses Y3 position	6	0	10
0	C0 uses Y3 position	7	1	110
0	C0 uses Y1 position	3	2	1110
0	C0 uses Y6 position	6	3	1111
1	C0 uses Y0 position	16	0	10
1	C0 uses Y0 position	8	1	110
1	C0 uses Y1 position	8	2	1110

Table 10: truncated Unary (TU) codes are used to indicate a classifier for each chroma CTB
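
The truncated unary mapping described for POC0 (0 means CCSAO off, 10 means set 0, and so on up to 1111 for set 3) can be sketched as follows. The function names and the use of bit strings are illustrative assumptions; a real codec would entropy-code these bins rather than handle character strings.

```python
def tu_encode(value, c_max):
    """Truncated unary: 'value' ones followed by a zero, no zero when value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < c_max else "")


def tu_decode(bits, c_max):
    """Count leading ones, stopping at the first zero or after c_max ones."""
    value = 0
    for b in bits:
        if value == c_max or b == "0":
            break
        value += 1
    return value
```

With c_max = 4, the decoded value 0 corresponds to "CCSAO not applied" and the values 1 to 4 correspond to offset sets 0 to 3.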

Examples of Cb and Cr CTB offset set indices are given for POC0 of a 1280x720 sequence (if the CTU size is 128x128, the number of CTUs in the frame is 10x6). In POC0, Cb uses 4 offset sets and Cr uses 1 offset set. As shown in Table 11-1 below, when the offset set index is 0, CCSAO is not applied; when the offset set index is 1, set 0 is applied; when the offset set index is 2, set 1 is applied; when the offset set index is 3, set 2 is applied; and when the offset set index is 4, set 3 is applied. The type indicates the position of the selected co-located luminance sample (Yi). Different offset sets may have different types, band_num values, and corresponding offsets.

Table 11-1: examples of Cb and Cr CTB offset set indices are given for the 1280x720 sequence POC0 (if the CTU size is 128x128, then the number of CTUs in the frame is 10x 6)

In some embodiments, examples of classification that jointly uses co-located/current and neighboring Y/U/V samples (3-component joint bandNum classification for each Y/U/V component) are listed in Table 11-2 below. In POC0, {2, 4, 1} offset sets are used for {Y, U, V}, respectively. Each offset set may be adaptively switched at the SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different offset sets may have different classifiers. For example, with the candidate positions (candPos) shown in figs. 6B and 6C, to classify the current Y4 luminance sample, Y set0 selects {current Y4, co-located U4, co-located V4} as candidates, with bandNum {Y, U, V} = {16, 1, 2}, respectively. With {candY, candU, candV} denoting the sample values of the selected {Y, U, V} candidates, the total number of classes is 32, and the class index derivation can be expressed as:

bandY = (candY * bandNumY) >> BitDepth;

bandU = (candU * bandNumU) >> BitDepth;

bandV = (candV * bandNumV) >> BitDepth;

classIdx = bandY * bandNumU * bandNumV + bandU * bandNumV + bandV;

In some embodiments, the classIdx derivation of the joint classifier may be expressed in an "or-shift" form to simplify the derivation. For example, with maximum bandNum = {16, 4, 4}:

classIdx = (bandY << 4) | (bandU << 2) | bandV
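
The two class index derivations can be compared with a short sketch. The function names are hypothetical, and 10-bit samples are assumed by default.

```python
def band(cand, band_num, bit_depth=10):
    """Band index of one candidate sample value."""
    return (cand * band_num) >> bit_depth


def joint_class_idx(cands, band_nums, bit_depth=10):
    """classIdx = bandY*bandNumU*bandNumV + bandU*bandNumV + bandV."""
    band_y, band_u, band_v = (band(c, n, bit_depth) for c, n in zip(cands, band_nums))
    _, num_u, num_v = band_nums
    return band_y * num_u * num_v + band_u * num_v + band_v


def joint_class_idx_or_shift(cands, bit_depth=10):
    """Or-shift form, valid when bandNum {Y, U, V} = {16, 4, 4}."""
    band_y, band_u, band_v = (band(c, n, bit_depth) for c, n in zip(cands, (16, 4, 4)))
    return (band_y << 4) | (band_u << 2) | band_v
```

With bandNum {16, 1, 2} the largest index is 31, which agrees with the 32 classes stated for Y set0. With bandNum {16, 4, 4} the multiplication form and the or-shift form give the same index, because bandU and bandV each fit in 2 bits.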

Another example is the POC1 component V set1 classification. In this example, candPos = {neighboring Y8, neighboring U3, neighboring V0} with bandNum = {4, 1, 2} is used, which yields 8 classes.

POC	Current component	Offset set	Classifier: candPos (Y, U, V) with bandNum (Y, U, V)	Total classes (number of offsets)
0	Y	0	(Y4, U4, V4), (16, 1, 2)	16*1*2 = 32
0	Y	1	(Y4, U0, V2), (15, 4, 1)	15*4*1 = 60
0	U	0	(Y8, U3, V0), (1, 1, 2)	2
0	U	1	(Y4, U1, V0), (15, 2, 2)	60
0	U	2	(Y6, U6, V6), (4, 4, 1)	16
0	U	3	(Y2, U0, V5), (1, 1, 1)	1
0	V	0	(Y2, U0, V5), (1, 1, 1)	1
1	Y	0	(Y4, U1, V0), (15, 2, 2)	60
1	U	0	(Y6, U2, V1), (7, 1, 2)	14
1	V	0	(Y8, U3, V0), (1, 1, 2)	2
1	V	1	(Y8, U3, V0), (4, 1, 2)	8

Table 11-2: examples of classification using co-located/current and neighboring Y/U/V samples in combination

In some embodiments, examples of jointly using co-located and neighboring Y/U/V samples for current Y/U/V sample classification (3-component joint edgeNum (C1s) and bandNum classification for each Y/U/V component) are listed in Table 11-3 below. Edge candPos is the center position used by the C1s classifier, edge bitMask is the C1s neighboring-sample activation indicator, and edgeNum is the corresponding number of C1s classes. In this example, C1s applies only to the Y classifier (so edgeNum is equal to edgeNumY), and edge candPos is always Y4 (the current/co-located sample position). However, C1s may be applied to the Y/U/V classifiers with edge candPos at a neighboring sample position.

With diff denoting the comparison score of the Y C1s classifier, the classIdx derivation may be

bandY = (candY * bandNumY) >> BitDepth;

bandU = (candU * bandNumU) >> BitDepth;

bandV = (candV * bandNumV) >> BitDepth;

edgeIdx = diff + (edgeNum >> 1);

bandIdx = bandY * bandNumU * bandNumV + bandU * bandNumV + bandV;

classIdx = bandIdx * edgeNum + edgeIdx;
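
The combination of the edge score and the band index can be sketched as below. The function name is hypothetical; diff is the signed comparison score and band_idx is the joint band index from the equations above.

```python
def joint_edge_band_class(diff, edge_num, band_idx):
    """classIdx = bandIdx * edgeNum + edgeIdx, with edgeIdx = diff + (edgeNum >> 1)."""
    edge_idx = diff + (edge_num >> 1)  # shift [-N, N] to [0, 2N] when edge_num = 2N + 1
    return band_idx * edge_num + edge_idx
```

For instance, with six selected neighbors (edgeNum = 13) the edge index spans 0 to 12, so consecutive band indices occupy consecutive blocks of 13 class indices.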

Table 11-3 (part 1): examples of classification using co-located/current and neighboring Y/U/V samples in combination

Table 11-3 (part 2): examples of classification using co-located/current and neighboring Y/U/V samples in combination

Table 11-3 (part 3): examples of classification using co-located/current and neighboring Y/U/V samples in combination

In some embodiments, the maximum band_num (bandNumY, bandNumU or bandNumV) may be fixed or signaled in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, the maximum band_num=16 is fixed in the decoder, and 4 bits are signaled for each frame to indicate c0band_num in the frame. Some other examples of maximum band_num are listed in table 12 below.

Band_num_min	Band_num_max	Band_num bits
1	1	0
1	2	1
1	4	2
1	8	3
1	16	4
1	32	5
1	64	6
1	128	7
1	256	8

Table 12: maximum band_num and band_num bit examples

In some embodiments, the maximum number of classes or offsets for each set (or for all sets added together), i.e., the combination of multiple classifiers used jointly (e.g., C1s edgeNum * C1 bandNumY * bandNumU * bandNumV), may be fixed or signaled at the SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, the maximum is fixed at class_num = 256 * 4 for all sets added together, and the constraint can be checked by an encoder conformance check or a decoder normative check.

In some embodiments, a restriction may be applied to the C0 classification, e.g., restricting band_num (bandNumY, bandNumU or bandNumV) to powers of 2 only. Instead of explicitly signaling band_num, the syntax element band_num_shift is signaled. The decoder may then use a shift operation to avoid multiplication. Different band_num_shift values may be used for different components.

Class(C0) = (Y0 >> band_num_shift) >> bit_depth

Another example takes rounding into account to reduce the error.

Class(C0) = ((Y0 + (1 << (band_num_shift - 1))) >> band_num_shift) >> bit_depth

For example, if band_num_max (Y, U or V) is 16, then the possible band_num_shift candidates are 0, 1, 2, 3, 4, corresponding to band_num=1, 2, 4, 8, 16, as shown in table 13.

POC	Classifier	C0 band_num_shift	C0 band_num	Total classes
0	C0 uses Y0 position	4	16	16
1	C0 uses Y7 position	3	8	8

Band_num_max	Valid band_num	Band_num_shift candidates
1	1	0
2	1, 2	0, 1
4	1, 2, 4	0, 1, 2
8	1, 2, 4, 8	0, 1, 2, 3
16	1, 2, 4, 8, 16	0, 1, 2, 3, 4
32	1, 2, 4, 8, 16, 32	0, 1, 2, 3, 4, 5
64	1, 2, 4, 8, 16, 32, 64	0, 1, 2, 3, 4, 5, 6
128	1, 2, 4, 8, 16, 32, 64, 128	0, 1, 2, 3, 4, 5, 6, 7
256	1, 2, 4, 8, 16, 32, 64, 128, 256	0, 1, 2, 3, 4, 5, 6, 7, 8

Table 13: band_num and corresponding band_num_shift candidates
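
The relationship between band_num and band_num_shift can be illustrated with a short sketch. It takes band_num = 1 << band_num_shift, as in Table 13, and reads the shift-based classification as a single right shift by (bit_depth - band_num_shift). That reading is an assumption made here because it reproduces the multiplication form (Y0 * band_num) >> bit_depth used in the joint classifier derivation; it is not a restatement of the Class(C0) expressions as printed. The function names are hypothetical.

```python
def class_c0_mul(y0, band_num, bit_depth=10):
    """Band index using a multiplication, as in bandY = (candY * bandNumY) >> BitDepth."""
    return (y0 * band_num) >> bit_depth


def class_c0_shift(y0, band_num_shift, bit_depth=10):
    """Assumed multiplication-free equivalent for band_num = 1 << band_num_shift."""
    return y0 >> (bit_depth - band_num_shift)
```

For band_num_shift candidates 0 to 4 (band_num 1 to 16) both functions return the same band index for every 10-bit sample value.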

In some embodiments, the classifiers applied to Cb and Cr are different. Cb and Cr offsets of all classes may be signaled separately. For example, different signaled offsets are applied to different chrominance components, as shown in table 14 below.

POC	Component	Classifier	C0 band_num	Total classes	Signaled offsets
0	Cb	C0	16	16	16
0	Cr	C0	5	5	5

Table 14: Cb and Cr offsets of all classes can be signaled separately

In some embodiments, the maximum offset value is fixed or signaled at the Sequence Parameter Set (SPS)/Adaptive Parameter Set (APS)/Picture Parameter Set (PPS)/Picture Header (PH)/Slice Header (SH)/region/CTU/CU/sub-block/sample level. For example, the offset is limited to the range [-15, 15]. Different components may have different maximum offset values.

In some embodiments, the offset signaling may use Differential Pulse Code Modulation (DPCM). For example, the offset {3,3,2,1, -1} may be signaled as {3,0, -1, -1, -2}.
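
A minimal sketch of DPCM offset coding, consistent with the example above, is shown below. The function names are hypothetical, and the first offset is assumed to be predicted from 0.

```python
def dpcm_encode(offsets):
    """Signal each offset as the difference from the previous one (first from 0)."""
    deltas, prev = [], 0
    for value in offsets:
        deltas.append(value - prev)
        prev = value
    return deltas


def dpcm_decode(deltas):
    """Rebuild the offsets by accumulating the signaled differences."""
    offsets, prev = [], 0
    for delta in deltas:
        prev += delta
        offsets.append(prev)
    return offsets
```

Encoding [3, 3, 2, 1, -1] gives [3, 0, -1, -1, -2], and decoding that list restores the original offsets.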

In some embodiments, the offsets may be stored in an APS or a memory buffer for reuse by the next picture/slice. An index may be signaled to indicate which stored offsets of a previous frame are used for the current picture.

In some embodiments, the classifiers for Cb and Cr are the same. All classes of Cb and Cr offsets may be signaled jointly, for example, as shown in table 15 below.

POC	Component	Classifier	C0 band_num	Total classes	Signaled offsets
0	Cb and Cr	C0	8	8	8

Table 15: Cb and Cr offsets of all classes may be signaled jointly

In some embodiments, the classifiers for Cb and Cr may be the same. The Cb and Cr offsets for all classes may be signaled jointly, e.g., with a sign flag difference, as shown in Table 16 below. According to Table 16, when the Cb offsets are (3, 3, 2, -1), the derived Cr offsets are (-3, -3, -2, 1).

Table 16: Cb and Cr offsets of all classes may be signaled jointly with a sign flag difference

In some embodiments, a sign flag may be signaled for each class, for example, as shown in Table 17 below. According to Table 17, when the Cb offsets are (3, 3, 2, -1), the Cr offsets derived from the corresponding sign flags are (-3, 3, 2, 1).

Table 17: Cb and Cr offsets of all classes may be signaled jointly with a sign flag signaled for each class

In some embodiments, the classifiers for Cb and Cr may be the same. The Cb and Cr offsets of all classes may be signaled jointly with a weight difference, for example, as shown in Table 18 below. The weight (w) may be selected from a limited table, e.g., ±1/4, ±1/2, 0, ±1, ±2, ±4, etc., where |w| includes only powers of 2. According to Table 18, when the Cb offsets are (3, 3, 2, -1), the derived Cr offsets are (-6, -6, -4, 2), which corresponds to a weight of -2.

Table 18: Cb and Cr offsets of all classes may be signaled jointly with a weight difference

In some embodiments, a weight may be signaled for each class, for example, as shown in Table 19 below. According to Table 19, when the Cb offsets are (3, 3, 2, -1), the Cr offsets derived from the corresponding weights are (-6, 12, 0, -1).

Table 19: Cb and Cr offsets of all classes may be signaled jointly with a weight signaled for each class
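
The four joint signaling variants (one sign, per-class signs, one weight, per-class weights) can all be viewed as multiplying the Cb offsets by weights. The sketch below illustrates this, taking the Cb offsets to be (3, 3, 2, -1), an assumption inferred from the four-entry Cr results quoted in the text. The function name is hypothetical, and no rounding rule is applied for fractional weights because the text does not specify one.

```python
from fractions import Fraction


def derive_cr_offsets(cb_offsets, weights):
    """Derive Cr offsets as weight * Cb offset.

    weights is either one value shared by all classes or one value per class.
    A sign flag is treated as a weight of +1 or -1.
    """
    if not isinstance(weights, (list, tuple)):
        weights = [weights] * len(cb_offsets)
    return [Fraction(w) * o for w, o in zip(weights, cb_offsets)]
```

With a shared weight of -1 this reproduces (-3, -3, -2, 1); with per-class weights (-2, 4, 0, 1) it reproduces (-6, 12, 0, -1).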

In some embodiments, if multiple classifiers are used in the same POC, different sets of offsets are signaled separately or jointly.

In some embodiments, the previously decoded offset may be stored for use by future frames. An index may be signaled to indicate which previously decoded offset set is used for the current frame to reduce offset signaling overhead. For example, POC2 may reuse POC0 offset and signal that offset set idx is equal to 0, as shown in table 20 below.

Table 20: the index may be signaled to indicate which previously decoded offset set is used for the current frame

In some embodiments, the reused offset set idx may be different for Cb and Cr, e.g., as shown in Table 21 below.

Table 21: the index may be signaled to indicate which previously decoded offset set is used for the current frame, and may be different for Cb and Cr components.

In some embodiments, offset signaling may use additional syntax elements, start and length, to reduce signaling overhead. For example, when band_num=256, only the offsets of band_idx=37 to 44 are signaled. In the example of Table 22-1 below, both start and length are coded with 8-bit fixed-length codes, which should match the band_num bits.

Table 22-1: offset signaling uses additional syntax, including start and length

In some embodiments, if CCSAO is applied to all YUV 3 components, co-located and neighboring YUV samples may be used jointly for classification, and all of the above-described offset signaling methods for Cb/Cr may be extended to Y/Cb/Cr. In some embodiments, different sets of component offsets may be stored and used separately (each component has its own set of stores) or jointly (each component shares/reuses the same store). Table 22-2 below shows an individual set example.

Table 22-2: the examples show that different sets of component offsets can be stored and used alone (each component has its own set of storage) or in combination (each component shares/reuses the same storage)

In some embodiments, if the sequence bit depth is higher than 10 (or a particular bit depth), the offsets may be quantized prior to signaling. On the decoder side, the decoded offsets are dequantized before being applied, as shown in Table 23-1 below. For example, for a 12-bit sequence, the decoded offsets are left-shifted (dequantized) by 2.

Signaled offset	Dequantized and applied offset
0	0
1	4
2	8
3	12
…
14	56
15	60

Table 23-1: the decoded offset is dequantized before application

In some embodiments, the offset may be calculated as CcSaoOffsetVal = (1 - 2 * ccsao_offset_sign_flag) * (ccsao_offset_abs << (BitDepth - Min(10, BitDepth))).
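
The offset reconstruction can be checked with a direct transcription of the formula. The function name is hypothetical; the parameter names mirror the syntax elements in the text.

```python
def ccsao_offset_val(sign_flag, offset_abs, bit_depth):
    """(1 - 2*sign_flag) * (offset_abs << (bit_depth - min(10, bit_depth)))."""
    shift = bit_depth - min(10, bit_depth)  # 0 up to 10 bits, 2 for 12-bit video
    return (1 - 2 * sign_flag) * (offset_abs << shift)
```

For a 12-bit sequence a signaled magnitude of 15 becomes 60, matching the last row of Table 23-1, while for bit depths of 10 or lower the signaled value is applied unchanged.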

In some embodiments, a filter strength concept is further introduced herein. For example, the classifier offsets may be further weighted before being applied to the samples. The weight (w) may be selected from a table of powers of 2, for example, ±1/4, ±1/2, 0, ±1, ±2, ±4, etc., where |w| includes only powers of 2. The weight index may be signaled at SPS/APS/PPS/PH/SH/region (set)/CTU/CU/sub-block/sample level. Quantized offset signaling may be regarded as a subset of this weight application. If progressive CCSAO is applied as shown in FIG. 6D, a similar weight index mechanism can be applied between the first stage and the second stage.

In some examples, different classifiers are weighted: the offsets of multiple classifiers may be applied to the same sample as a weighted combination. A weight index mechanism similar to the one described above may be signaled. For example,

offset_final = w * offset_1 + (1 - w) * offset_2, or

offset_final = w1 * offset_1 + w2 * offset_2 + …

In some embodiments, rather than signaling CCSAO parameters directly in the PH/SH, previously used parameters/offsets may be stored in an Adaptive Parameter Set (APS) or a memory buffer for reuse by subsequent pictures/slices. An index may be signaled in the PH/SH to indicate which stored offsets of a previous frame are used for the current picture/slice. A new APS ID may be created to maintain the CCSAO history offsets. The following table shows an example using the candPos of fig. 6I and bandNum {Y, U, V} = {16, 4, 4}. In some examples, the signaling method for candPos, bandNum, and the offsets may be a Fixed Length Code (FLC) or another method, such as a Truncated Unary (TU) code, a k-th order exponential-Golomb code (EGk), signed EG0 (SVLC), or unsigned EG0 (UVLC).

In this case, sao_cc_y_class_num (or cb, cr) is equal to sao_cc_y_band_num_y x sao_cc_y_band_num_u x sao_cc_y_band_num_v (or cb, cr). ph_sao_cc_y_aps_id is the parameter index used in the picture/slice. Note that the cb and cr components may follow the same signaling logic.

Table 23-2: adaptive Parameter Set (APS) syntax

aps_adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements. When aps_parameters_type is equal to ccsao_aps, the value of aps_adaptation_parameter_set_id should be in the range of 0 to 7, inclusive (for example).

ph_sao_cc_y_aps_id specifies the aps_adaptation_parameter_set_id of the CCSAO APS referenced by the Y color component of the slices in the current picture. When ph_sao_cc_y_aps_id is present, the following applies: the value of sao_cc_y_set_signal_flag of the APS NAL unit with aps_parameter_type equal to ccsao_aps and aps_adaptation_parameter_set_id equal to ph_sao_cc_y_aps_id should be equal to 1; and the TemporalId of the APS Network Abstraction Layer (NAL) unit with aps_parameters_type equal to ccsao_aps and aps_adaptation_parameter_set_id equal to ph_sao_cc_y_aps_id should be less than or equal to the TemporalId of the current picture.

In some embodiments, APS update mechanisms are described herein. The maximum number of APS offset sets may be predefined or signaled in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different components may have different maximum number limits. If the APS offset set is full, the newly added offset set may replace an existing stored offset using a first-in-first-out (FIFO), last-in-first-out (LIFO), or Least Recently Used (LRU) mechanism, or an index value indicating which APS offset set should be replaced is received. In some examples, if the selected classifier consists of candPos/edge info/coding info, etc., all classifier information may be part of the APS offset set and may also be stored in the APS offset set along with its offset value.

In some embodiments, constraints may be applied. For example, the newly received classifier information and offset cannot be the same as any of the stored APS offset sets (of the same component, or across different components).

In some examples, if the C0 candPos/bandNum classifier is used, the maximum number of APS offset sets is 4 for each of Y/U/V, FIFO update is used for Y/V, and an index indicating which set to update is used for U.

Table 23-3: CCSAO offset set updating using FIFO.

In some embodiments, the FIFO update may be (1) cyclic, continuing from the set index where the previous update left off (and wrapping around to set 0 once all sets have been updated), as in the example above, or (2) restarted from set 0 on every update. In some examples, the update may take place at the PH level (as illustrated) or at the SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level when a new offset set is received.
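
The two FIFO update modes can be sketched with a small store object. The class and method names are hypothetical, and the offset sets are represented by arbitrary Python values; only the replacement order is illustrated.

```python
class OffsetSetStore:
    """Holds up to max_sets offset sets and replaces them in FIFO order."""

    def __init__(self, max_sets=4):
        self.max_sets = max_sets
        self.sets = []
        self.next_idx = 0  # position the next received set will occupy

    def update(self, new_sets, restart_from_zero=False):
        # Mode (2): every update starts again from set 0.
        # Mode (1), the default: continue cyclically from the previous position.
        if restart_from_zero:
            self.next_idx = 0
        for offset_set in new_sets:
            if self.next_idx < len(self.sets):
                self.sets[self.next_idx] = offset_set
            else:
                self.sets.append(offset_set)
            self.next_idx = (self.next_idx + 1) % self.max_sets
```

For example, after receiving three sets and then two more with max_sets = 4, the fifth set overwrites set 0 because the cyclic position has wrapped around.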

For LRU updates, the decoder maintains a count table that counts the "total offset set usage count" that can be refreshed in SPS/APS/group of pictures (GOP) structure/PPS/PH/SH/region/CTU/CU/sub-block/sample level. The newly received offset set replaces the least recently used offset set in the APS. If the 2 stored offset sets have the same count, FIFO/LIFO may be used. For example, see component Y in Table 23-4 below.

Table 23-4: CCSAO offset set updating using LRU.

In some embodiments, different components may have different update mechanisms.

In some embodiments, different components (e.g., U/V) may share the same classifier (the same candPos/edge information/coding information/offsets, optionally with an additional weight as a modifier).

In some embodiments, sample processing is described below. Let R (x, y) be the input luminance or chrominance sample value before CCSAO, and R' (x, y) be the output luminance or chrominance sample value after CCSAO:

offset = ccsao_offset[class_index of R(x, y)]

R'(x, y) = Clip3(0, (1 << bit_depth) - 1, R(x, y) + offset)

According to the above equations, each luma or chroma sample value R(x, y) is classified using the indicated classifier and/or the current offset set index of the current picture. The offset corresponding to the derived class index is added to each luma or chroma sample value R(x, y). The clipping function Clip3 is applied to (R(x, y) + offset) to keep the output luma or chroma sample value R'(x, y) within the bit-depth dynamic range, e.g., the range 0 to (1 << bit_depth) - 1.
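
The sample processing step can be sketched as follows. The function names are hypothetical, and the class index of each sample is assumed to have been derived already by one of the classifiers described earlier.

```python
def clip3(low, high, value):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def apply_ccsao(samples, class_indices, ccsao_offset, bit_depth=10):
    """Add the per-class offset to each sample and clip to the bit-depth range."""
    max_val = (1 << bit_depth) - 1
    return [
        clip3(0, max_val, r + ccsao_offset[c])
        for r, c in zip(samples, class_indices)
    ]
```

Samples near the ends of the range are clipped: with 10-bit video, 1020 + 5 becomes 1023 and 2 - 4 becomes 0.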

In some embodiments, when CCSAO operates together with other loop filters, the clipping operation may be one of the following.

(1) Clipping after adding. The following equations show examples in which (a) CCSAO operates with SAO and BIF, or (b) CCSAO replaces SAO but still operates with BIF.

(a) I_OUT = clip1(I_C + ΔI_SAO + ΔI_BIF + ΔI_CCSAO)

(b) I_OUT = clip1(I_C + ΔI_CCSAO + ΔI_BIF)

(2) Clipping before adding, operating with BIF. In some embodiments, the clipping order may be switched.

(a) I_OUT = clip1(I_C + ΔI_SAO)

I'_OUT = clip1(I_OUT + ΔI_BIF)

I''_OUT = clip1(I'_OUT + ΔI_CCSAO)

(b) I_OUT = clip1(I_C + ΔI_BIF)

I'_OUT = clip1(I_OUT + ΔI_CCSAO)

(3) Clipping after partial adding.

(a) I_OUT = clip1(I_C + ΔI_SAO + ΔI_BIF)

I'_OUT = clip1(I_OUT + ΔI_CCSAO)

In some embodiments, different clipping combinations give different trade-offs between correction accuracy and hardware temporary buffer size (register or SRAM bit width).
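
The effect of the clipping order can be illustrated with a small sketch of cases (1)(a), (2)(a) and (3)(a). The function names are hypothetical, and each staged variant is assumed to feed the clipped output of one stage into the next.

```python
def clip1(value, bit_depth=10):
    """Clip to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def clip_after_adding(i_c, d_sao, d_bif, d_ccsao):
    """Case (1)(a): add all offsets, then clip once."""
    return clip1(i_c + d_sao + d_bif + d_ccsao)


def clip_before_adding(i_c, d_sao, d_bif, d_ccsao):
    """Case (2)(a): clip after every individual offset."""
    out = clip1(i_c + d_sao)
    out = clip1(out + d_bif)
    return clip1(out + d_ccsao)


def clip_after_partial_adding(i_c, d_sao, d_bif, d_ccsao):
    """Case (3)(a): clip after SAO + BIF, then again after CCSAO."""
    out = clip1(i_c + d_sao + d_bif)
    return clip1(out + d_ccsao)
```

The variants can differ near the range limits. With a 10-bit input of 1020, an SAO offset of +10, a BIF offset of -8 and a CCSAO offset of 0, clipping once at the end gives 1022, whereas clipping after every stage gives 1015 because the intermediate value saturates at 1023 before the negative offset is added.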

FIG. 13 (a) shows SAO/BIF offset clipping. Fig. 13 (b) shows an additional bit-depth clipping for CCSAO. FIG. 13 (c) shows joint clipping after the SAO/BIF/CCSAO offsets are added to the input samples. More specifically, FIG. 13 (a) shows the current BIF design when interacting with SAO: the offsets from SAO and BIF are added to the input samples, and then a bit-depth clipping is performed. When CCSAO is also added at the SAO stage, two clipping designs are possible: (1) adding an extra bit-depth clipping for CCSAO, as shown in FIG. 13 (b), and (2) a harmonized design that performs joint clipping after the SAO/BIF/CCSAO offsets are added to the input samples, as shown in FIG. 13 (c). In some embodiments, the above clipping designs differ only for luminance samples, because BIF is applied only to luminance samples.

In some embodiments, boundary processing is described below. CCSAO is not applied to the current luma (chroma) sample if any of the co-located and neighboring luma (chroma) samples used for classification is outside the current picture. Fig. 14A is a block diagram illustrating that CCSAO is not applied to the current luma (chroma) sample if any of the co-located and neighboring luma (chroma) samples used for classification is outside the current picture, according to some embodiments of the present disclosure. For example, in fig. 14A (a), if a classifier is used, CCSAO is not applied to the chroma components in the leftmost column of the current picture. For example, if C1' is used, CCSAO is not applied to the chroma components in the leftmost column and the topmost row of the current picture, as shown in fig. 14A (b).

FIG. 14B is a block diagram illustrating that CCSAO is applied to a current luma or chroma sample even if any of the co-located and neighboring luma or chroma samples used for classification is outside the current picture, according to some embodiments of the present disclosure. In some embodiments, as a variant, if any of the co-located and neighboring luma or chroma samples used for classification is outside the current picture, the missing samples are filled by repetition, as shown in FIG. 14B(a), or by mirroring, as shown in FIG. 14B(b), to create the samples for classification, so that CCSAO can still be applied to the current luma or chroma sample. In some embodiments, the disable/repeat/mirror picture boundary processing methods disclosed herein may also be applied at a sub-picture/slice/tile/CTU/360 virtual boundary if any of the co-located and neighboring luma (chroma) samples used for classification is outside the current sub-picture/slice/tile/CTU/360 virtual boundary.
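The three picture-boundary options (disable, repeat, mirror) can be sketched as a coordinate mapping applied before a classification sample is fetched. This is a minimal sketch under stated assumptions: the names `BoundaryMode`, `mapCoord`, and `fetchClassSample` are hypothetical, the plane is stored in raster order, and the mirror variant reflects about the boundary sample itself, which is only one possible reading of the mirror padding in FIG. 14B(b).

```cpp
#include <algorithm>
#include <vector>

enum class BoundaryMode { Disable, Repeat, Mirror };

// Map a coordinate that may lie outside [0, size) back into the picture.
// Returns -1 when the coordinate is outside and the mode is Disable.
inline int mapCoord(int p, int size, BoundaryMode mode) {
    if (p >= 0 && p < size) return p;
    if (mode == BoundaryMode::Repeat) return std::min(std::max(p, 0), size - 1);
    if (mode == BoundaryMode::Mirror) return p < 0 ? -p : 2 * (size - 1) - p;
    return -1;  // Disable: the sample is treated as missing
}

// Fetch the sample used for classification at (x, y).
// Returns false when CCSAO has to be skipped for the current sample.
inline bool fetchClassSample(const std::vector<int>& plane, int width, int height,
                             int x, int y, BoundaryMode mode, int& value) {
    const int mx = mapCoord(x, width, mode);
    const int my = mapCoord(y, height, mode);
    if (mx < 0 || my < 0) return false;
    value = plane[my * width + mx];
    return true;
}
```

The mapping is only meant for candidates that overshoot the boundary by less than the picture dimension, which holds for the one-sample neighborhoods discussed here.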

For example, a picture is divided into one or more tile rows and one or more tile columns. A tile is a series of CTUs that cover a rectangular area of a picture.

A slice consists of an integer number of complete tiles, or an integer number of consecutive complete CTU rows within one tile of a picture.

A sub-picture contains one or more slices that collectively cover a rectangular area of the picture.

In some embodiments, 360-degree video is captured on a sphere and inherently has no "boundary"; reference samples that fall outside the boundary of a reference picture in the projected domain can always be obtained from neighboring samples in the spherical domain. For projection formats composed of multiple faces, no matter what compact frame-packing arrangement is used, discontinuities may appear between two or more adjacent faces in the frame-packed picture. In VVC, vertical and/or horizontal virtual boundaries are introduced, across which the loop filtering operations are disabled, and the positions of these boundaries are signaled in the SPS or in the picture header. Using 360 virtual boundaries is more flexible than using two tiles (one for each set of contiguous faces) because it does not require the face size to be a multiple of the CTU size. In some embodiments, the maximum number of vertical 360 virtual boundaries is 3, and the maximum number of horizontal 360 virtual boundaries is also 3. In some embodiments, the distance between two virtual boundaries is greater than or equal to the CTU size, and the virtual boundary granularity is 8 luma samples, e.g., an 8x8 sample grid.
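The numeric constraints stated above (at most 3 boundaries per direction, spacing of at least one CTU, granularity of 8 luma samples) can be checked with a short C++ sketch. The function name `virtualBoundariesValid` and the assumption that positions are given in ascending order, in luma samples, are illustrative and not part of any signaled syntax.

```cpp
#include <cstddef>
#include <vector>

// Check one direction (vertical or horizontal) of 360 virtual boundaries.
// positions: boundary positions in luma samples, sorted in ascending order.
inline bool virtualBoundariesValid(const std::vector<int>& positions, int ctuSize) {
    const std::size_t kMaxBoundaries = 3;  // maximum count per direction
    const int kGranularity = 8;            // boundaries lie on an 8-sample grid
    if (positions.size() > kMaxBoundaries) return false;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        if (positions[i] % kGranularity != 0) return false;
        // Two boundaries must be at least one CTU apart.
        if (i > 0 && positions[i] - positions[i - 1] < ctuSize) return false;
    }
    return true;
}
```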

FIG. 14C is a block diagram illustrating that CCSAO is not applied to a current chroma sample if the corresponding selected co-located or neighboring luma sample used for classification is outside the virtual space defined by a virtual boundary, according to some embodiments of the present disclosure. In some embodiments, a Virtual Boundary (VB) is a virtual line that separates the space within a picture frame. In some embodiments, if a VB is applied in the current frame, CCSAO is not applied to chroma samples whose selected corresponding luma position is outside the virtual space defined by the virtual boundary. FIG. 14C shows an example of virtual boundaries for a C0 classifier with 9 luma position candidates. For each CTU, CCSAO is not applied to chroma samples whose corresponding selected luma position is outside the virtual space enclosed by the virtual boundary. For example, in FIG. 14C(a), CCSAO is not applied to chroma sample 1402 when the selected Y7 luma sample position is on the other side of horizontal virtual boundary 1406, which is located 4 pixel rows from the bottom side of the frame. For example, in FIG. 14C(b), CCSAO is not applied to chroma sample 1404 when the selected Y5 luma sample position is on the other side of vertical virtual boundary 1408, which is located Y pixel columns from the right side of the frame.
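The disable rule described for FIG. 14C amounts to checking whether the selected luma candidate and the co-located luma sample lie on the same side of each virtual boundary. A minimal sketch follows; the `LumaPos` type, the function name `ccsaoAllowedAtVb`, and the convention that a negative boundary position means "no boundary" are assumptions for illustration.

```cpp
struct LumaPos { int x; int y; };

// horVbY: first luma row below a horizontal VB (negative if there is none).
// verVbX: first luma column right of a vertical VB (negative if there is none).
// Returns false when the selected candidate is on the other side of a VB,
// in which case CCSAO is not applied to the corresponding chroma sample.
inline bool ccsaoAllowedAtVb(LumaPos colocated, LumaPos selected, int horVbY, int verVbX) {
    if (horVbY >= 0 && (colocated.y < horVbY) != (selected.y < horVbY)) return false;
    if (verVbX >= 0 && (colocated.x < verVbX) != (selected.x < verVbX)) return false;
    return true;
}
```

For a frame that is 64 luma rows high with a horizontal VB 4 rows from the bottom (row 60), a co-located sample in row 59 whose selected candidate is in row 60 is rejected, while a candidate in row 58 is accepted.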

FIG. 15 illustrates that repetitive or mirror padding may be applied to luma samples that are outside the virtual boundary, according to some embodiments of the present disclosure. FIG. 15(a) shows an example of repetitive padding: if the original Y7, which lies below VB 1502, is selected for the classifier, the Y4 luma sample value is used for classification (copied to the Y7 position) instead of the original Y7 luma sample value. FIG. 15(b) shows an example of mirror padding: if Y7, which lies below VB 1504, is selected for the classifier, the Y1 luma sample value, which is symmetric to the Y7 value with respect to the Y0 luma sample, is used for classification instead of the original Y7 luma sample value. The padding methods make it possible to apply CCSAO to more chroma samples, so more coding gain can be achieved.
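The two padding variants can be expressed as a remapping of the candidate index. The sketch below assumes, purely for illustration, that the 9 candidates are numbered in raster order over a 3x3 window with the center at index 4, which reproduces the Y7-to-Y4 (repeat) and Y7-to-Y1 (mirror) substitutions described above; the names `VbPadding` and `paddedCandidate` are hypothetical.

```cpp
enum class VbPadding { Repeat, Mirror };

// Candidate numbering assumed here (raster order, center at index 4):
//   0 1 2
//   3 4 5
//   6 7 8
// maxDy is the largest row offset, relative to the center row, that is still
// on the current side of the virtual boundary (0 means the VB lies directly
// below the center row; 1 means no candidate row is cut off).
inline int paddedCandidate(int idx, int maxDy, VbPadding mode) {
    const int dy = idx / 3 - 1;  // row offset: -1, 0, +1
    const int dx = idx % 3 - 1;  // column offset: -1, 0, +1
    int newDy = dy;
    if (dy > maxDy) {
        // Repeat: take the nearest available row. Mirror: reflect about it.
        newDy = (mode == VbPadding::Repeat) ? maxDy : 2 * maxDy - dy;
    }
    return (newDy + 1) * 3 + (dx + 1);
}
```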

In some embodiments, restrictions may be applied to reduce the line buffers required by CCSAO and to simplify the boundary processing condition checks. FIG. 16 illustrates that, if all 9 co-located and neighboring luma samples are used for classification, 1 additional luma line buffer, i.e., the whole line of luma samples of row -5 above the current VB 1602, may be required, according to some embodiments of the present disclosure. FIG. 10B(a) shows an example that uses only 6 luma candidates for classification, which reduces the line buffer and does not require any of the additional boundary checks of FIGS. 14A and 14B.

In some embodiments, using luma samples for CCSAO classification may increase the luma line buffer and hence the decoder hardware implementation cost. FIG. 17 illustrates that, in AVS, CCSAO with 9 luma candidates may add 2 additional luma line buffers beyond VB 1702, according to some embodiments of the present disclosure. For luma and chroma samples above Virtual Boundary (VB) 1702, DBF/SAO/ALF are processed at the current CTU row. For luma and chroma samples below VB 1702, DBF/SAO/ALF are processed at the next CTU row. In the AVS decoder hardware design, the pre-DBF samples of luma rows -4 to -1, the pre-SAO samples of luma row -5, the pre-DBF samples of chroma rows -3 to -1, and the pre-SAO samples of chroma row -4 are stored as line buffers for the DBF/SAO/ALF processing of the next CTU row. When the next CTU row is processed, luma and chroma samples that are not in the line buffer are unavailable. However, for example, at the chroma row -3 (b) position, the chroma sample is processed at the next CTU row, but CCSAO requires the pre-SAO luma sample rows -7, -6, and -5 for classification. The pre-SAO luma sample rows -7 and -6 are not in the line buffer and are therefore unavailable, and adding them to the line buffer would increase the decoder hardware implementation cost. In some examples, the luma VB (row -4) and the chroma VB (row -3) may be different (not aligned).

Similar to FIG. 17, FIG. 18A illustrates that, in VVC, CCSAO with 9 luma candidates may add 1 additional luma line buffer beyond VB 1802, according to some embodiments of the present disclosure. The VB may differ between standards. In VVC, the luma VB is row -4 and the chroma VB is row -2, so CCSAO with 9 candidates may add 1 luma line buffer.

In some embodiments, in a first solution, CCSAO is disabled for a chroma sample if any of its luma candidates crosses the VB (i.e., lies outside the VB of the current chroma sample). FIGS. 19A-19C illustrate that, in AVS and VVC, CCSAO is disabled for a chroma sample if any of its luma candidates crosses VB 1902 (outside the current chroma sample VB), according to some embodiments of the present disclosure. FIG. 14C also shows some examples of this embodiment.

In some embodiments, in a second solution, for "over VB" luma candidates, repetitive padding from the luma line that is close to, and on the other side of, the VB (for example, luma row -4) is used for CCSAO. In some embodiments, repetitive padding from the luma nearest neighbor below the VB is implemented for "over VB" chroma candidates. FIGS. 20A-20C illustrate that, in AVS and VVC, CCSAO is enabled for a chroma sample by using repetitive padding if any of its luma candidates crosses VB 2002 (outside the current chroma sample VB), according to some embodiments of the present disclosure. FIG. 14C(a) also shows some examples of this embodiment.

In some embodiments, in a third solution, for "over VB" luma candidates, mirror padding from below the luma VB is used for CCSAO. FIGS. 21A-21C illustrate that, in AVS and VVC, CCSAO is enabled for a chroma sample by using mirror padding if any of its luma candidates crosses VB 2102 (outside the current chroma sample VB), according to some embodiments of the present disclosure. FIG. 14C(b) and FIG. 14B(b) also show some examples of this embodiment.

In some embodiments, in a fourth solution, "bilateral symmetric padding" is used to apply CCSAO. FIGS. 22A-22B illustrate that CCSAO is enabled by using bilateral symmetric padding for some examples of different CCSAO shapes, e.g., 9 luma candidates (FIG. 22A) and 8 luma candidates (FIG. 22B), according to some embodiments of the present disclosure. For a luma sample set whose center is the luma sample co-located with the chroma sample, if one side of the luma sample set is outside VB 2202, bilateral symmetric padding is applied to both sides of the luma sample set. For example, in FIG. 22A, luma samples Y0, Y1, and Y2 are outside VB 2202, so both Y0, Y1, Y2 and Y6, Y7, Y8 are padded with Y3, Y4, Y5. For example, in FIG. 22B, luma sample Y0 is outside VB 2202, so Y0 is padded with Y2 and Y7 is padded with Y5.
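The 9-candidate case of the fourth solution (FIG. 22A) can be sketched as follows. The function name `symmetricPad` and the raster-order 3x3 layout (top row at indices 0 to 2, center row at 3 to 5, bottom row at 6 to 8) are assumptions chosen to match the Y0-Y2 / Y6-Y8 example above; the 8-candidate shape of FIG. 22B is not covered by this sketch.

```cpp
#include <array>

// y holds the 9 luma candidates in raster order. If either the top or the
// bottom row lies outside the VB, both rows are replaced by the center row,
// so the window stays symmetric about the co-located sample.
inline std::array<int, 9> symmetricPad(std::array<int, 9> y, bool topOutside, bool bottomOutside) {
    if (topOutside || bottomOutside) {
        for (int c = 0; c < 3; ++c) {
            y[c] = y[3 + c];      // top row    <- center row
            y[6 + c] = y[3 + c];  // bottom row <- center row
        }
    }
    return y;
}
```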

FIG. 18B illustrates that, when co-located or neighboring chroma samples are used to classify the current luma sample, the selected chroma candidate may cross the VB and require an additional chroma line buffer, according to some embodiments of the present disclosure. Solutions 1 to 4, similar to those described above, may be applied to handle this issue.

Solution 1 is to disable CCSAO for a luma sample when any of its chroma candidates may cross the VB.

Solution 2 is to use repetitive padding from the chroma nearest neighbor below the VB for "over VB" chroma candidates.

Solution 3 is to use mirror padding from below the chroma VB for "over VB" chroma candidates.

Solution 4 is to use "bilateral symmetric padding". For a candidate set centered on the CCSAO co-located chroma sample, if one side of the candidate set is outside the VB, bilateral symmetric padding is applied to both sides.

The padding approach gives the possibility to apply CCSAO for more luma or chroma samples and thus more codec gain can be achieved.

In some embodiments, at the bottom picture (or slice, tile) boundary CTU row, the samples below the VB are processed at the current CTU row, so the special handling described above (solutions 1, 2, 3, and 4) is not applied at the bottom picture (or slice, tile) boundary CTU row. For example, a 1920x1080 frame is divided into 128x128 CTUs, so the frame contains 15x9 CTUs (rounded up), and the bottom CTU row is the 9th CTU row. Decoding proceeds CTU row by CTU row and, within each CTU row, CTU by CTU. Deblocking needs to be applied along the horizontal CTU boundary between the current and the next CTU row. The CTB VB applies to every CTU row because, inside one CTU, the DBF samples of the bottom 4/2 luma/chroma rows (VVC case) are processed at the next CTU row and are not available to CCSAO at the current CTU row. However, at the bottom CTU row of the picture frame, the DBF samples of the bottom 4/2 luma/chroma rows are available at the current CTU row, because there is no next CTU row and they are DBF-processed at the current CTU row.
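The CTU count in the example follows from a rounded-up division of the frame dimensions by the CTU size. The short sketch below reproduces that arithmetic and the resulting "no VB handling in the last CTU row" rule; the names `CtuGrid`, `ctuGrid`, and `needsVbHandling` are hypothetical.

```cpp
struct CtuGrid { int cols; int rows; };

// Rounded-up integer division.
inline int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Number of CTU columns and rows covering a frame.
inline CtuGrid ctuGrid(int frameWidth, int frameHeight, int ctuSize) {
    return { ceilDiv(frameWidth, ctuSize), ceilDiv(frameHeight, ctuSize) };
}

// The VB special handling applies to every CTU row except the bottom one,
// where the samples below the VB are processed at the current CTU row.
inline bool needsVbHandling(int ctuRow, const CtuGrid& grid) {
    return ctuRow < grid.rows - 1;
}
```

For a 1920x1080 frame with 128x128 CTUs this gives 15 CTU columns and 9 CTU rows (1080 / 128 = 8.4375, rounded up), so the zero-based CTU row 8 is the bottom row.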

In some embodiments, the VB shown in FIGS. 13-22 may be replaced with a sub-picture/slice/tile/CTU/360 virtual boundary. In some embodiments, the positions of the chroma samples and the luma samples in FIGS. 13-22 may be switched. In some embodiments, the positions of the chroma samples and the luma samples in FIGS. 13-22 may be replaced with the positions of a first chroma sample and a second chroma sample. In some embodiments, the ALF VB inside a CTU is generally horizontal. In some embodiments, a sub-picture/slice/tile/CTU/360 virtual boundary may be horizontal or vertical.

In some embodiments, restrictions may be applied to reduce the line buffers required by CCSAO and to simplify the boundary processing condition checks, as explained with reference to FIG. 16. FIG. 23 illustrates restrictions that use a limited number of luma candidates for classification, according to some embodiments of the present disclosure. FIG. 23(a) shows a restriction that uses only 6 luma candidates for classification. FIG. 23(b) shows a restriction that uses only 4 luma candidates for classification.

In some embodiments, the application area is described as follows. The CCSAO application area unit may be CTB-based; that is, the on/off control and the CCSAO parameters (the offsets, luma candidate positions, band_num, bitmask, etc. used for classification, and the offset set index) are the same within one CTB.

In some embodiments, the application area may be not aligned with the CTB boundary. For example, the application area is not aligned with the chroma CTB boundary but is shifted. The syntax (on/off control, CCSAO parameters) is still signaled for each CTB, but the actual application area is not aligned with the CTB boundary. FIG. 24 illustrates that the CCSAO application area is not aligned with CTB/CTU boundary 2406, according to some embodiments of the present disclosure. For example, the application area is not aligned with chroma CTB/CTU boundary 2406 but is shifted toward the top left by (4, 4) samples to VB 2408. This non-aligned CTB boundary design benefits the deblocking process because the same deblocking parameters are used for each 8x8 deblocking process region.
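One possible reading of the shifted application area is that the parameters signaled for CTB i apply to the samples from i*S - 4 up to (i+1)*S - 5 in each dimension, where S is the CTB size. The sketch below implements only this interpretation; the function name `ctbIndexForSample` and the clamping of the last region to the picture edge are assumptions, not details given in the text.

```cpp
#include <algorithm>

// Returns the index (in one dimension) of the CTB whose signaled CCSAO
// parameters apply to sample position p, when the application area is
// shifted toward the top left by `shift` samples relative to the CTB grid.
inline int ctbIndexForSample(int p, int ctbSize, int shift, int numCtbs) {
    const int idx = (p + shift) / ctbSize;
    return std::min(idx, numCtbs - 1);  // last area extends to the picture edge
}
```

With a CTB size of 64 and a shift of 4, sample 59 still uses the parameters of CTB 0, while sample 60 already uses those of CTB 1.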

In some embodiments, the CCSAO application area unit (mask size) may vary (be larger or smaller than the CTB size), as shown in Table 24. The mask size may be different for different components. The mask size may be switched at the SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, in the PH, a series of mask on/off flags and offset set indices are signaled to indicate the information of each CCSAO region.

POC	Component	CTB size	Mask size
0	Cb	64x64	128x128
0	Cr	64x64	32x32
1	Cb	64x64	16x16
1	Cr	64x64	256x256

Table 24: CCSAO application area units (mask sizes) may vary

In some embodiments, CCSAO application area frame partitioning may be fixed. For example, a frame is partitioned into N regions. FIG. 25 illustrates that CCSAO application area frame partitions may be fixed with CCSAO parameters, according to some embodiments of the present disclosure.

In some embodiments, each region may have its own region on/off control flag and CCSAO parameters. Further, if the region size is larger than the CTB size, the region may have both CTB on/off control flags and a region on/off control flag. FIG. 25(a) and FIG. 25(b) show some examples of partitioning a frame into N regions. FIG. 25(a) shows a vertical partition into 4 regions. FIG. 25(b) shows a square partition into 4 regions. In some embodiments, similar to the picture-level CTB all-on control flag (ph_cc_sao_cb_ctb_control_flag/ph_cc_sao_cr_ctb_control_flag), if the region on/off control flag is off, CTB on/off flags may be further signaled. Otherwise, CCSAO is applied to all CTBs in the region without further signaling of CTB flags.

In some embodiments, different CCSAO application areas may share the same region on/off control and CCSAO parameters. For example, in FIG. 25(c), regions 0 to 2 share the same parameters, and regions 3 to 15 share the same parameters. FIG. 25(c) also shows that the region on/off control flags and the CCSAO parameters may be signaled in Hilbert scan order.
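A Hilbert scan visits the regions of a square grid so that consecutive regions are always spatial neighbors. The sketch below uses the standard iterative index-to-coordinate conversion for an n x n grid (n a power of two); the name `hilbertToXY` and the curve orientation (starting at the top-left region) are assumptions, since the orientation used in FIG. 25(c) is not specified in the text.

```cpp
struct RegionPos { int x; int y; };

// Convert position d along the Hilbert curve (0 <= d < n*n) into the
// (x, y) coordinates of a region on an n x n grid.
inline RegionPos hilbertToXY(int n, int d) {
    int x = 0, y = 0, t = d;
    for (int s = 1; s < n; s *= 2) {
        const int rx = 1 & (t / 2);
        const int ry = 1 & (t ^ rx);
        if (ry == 0) {  // rotate or flip the current quadrant
            if (rx == 1) {
                x = s - 1 - x;
                y = s - 1 - y;
            }
            const int tmp = x;
            x = y;
            y = tmp;
        }
        x += s * rx;
        y += s * ry;
        t /= 4;
    }
    return {x, y};
}
```

For the 16 regions of a 4x4 grid, the signaling order would be obtained by calling `hilbertToXY(4, d)` for d = 0 to 15.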

In some embodiments, the CCSAO application area unit may be a quadtree/binary tree/ternary tree split from the picture/slice/CTB level. Similar to CTB splitting, a series of split flags is signaled to indicate the CCSAO application area partition. FIG. 26 illustrates that the CCSAO application area may be a Binary Tree (BT)/Quadtree (QT)/Ternary Tree (TT) split from the frame/slice/CTB level, according to some embodiments of the present disclosure.

Fig. 27 is a block diagram illustrating multiple classifiers used and switched at different levels within a picture frame according to some embodiments of the present disclosure. In some embodiments, if multiple classifiers are used in one frame, the method of how to apply the classifier set index may be switched at the SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, four sets of classifiers are used in one frame, switching in PH, as shown in table 25 below. Fig. 27 (a) and (c) show default fixed-area classifiers. Fig. 27 (b) shows that classifier set indexes are signaled at the mask/CTB level, where 0 indicates CCSAO is turned off for this CTB and 1-4 indicate set indexes.

Table 25: using four sets of classifiers in one frame, switching in PH

In some embodiments, for the default-region case, a region-level flag may be signaled to indicate whether the CTBs in a region use the default set index. For example, the region-level flag is 1 if the default set index is used, and 0 if the CTBs in the region do not use the default set index but use the other classifier sets in the frame. For example, in a square partition with 4 regions, the classifier sets are used as shown in Table 26-1 below.

POC	Region	Flag	Use of default set index
0	1	1	Use default set: 1
0	2	1	Use default set: 2
0	3	1	Use default set: 3
0	4	0	CTB switches among sets 1 to 4

Table 26-1: a region-level flag may be signaled to indicate whether the CTBs in the region use the default set index

FIG. 28 is a block diagram illustrating that the CCSAO application area partition may be dynamic and switched at the picture level, according to some embodiments of the present disclosure. For example, FIG. 28(a) shows that 3 CCSAO offset sets (set_num = 3) are used in this POC, so the picture frame is vertically partitioned into 3 regions. FIG. 28(b) shows that 4 CCSAO offset sets (set_num = 4) are used in this POC, so the picture frame is horizontally partitioned into 4 regions. FIG. 28(c) shows that 3 CCSAO offset sets (set_num = 3) are used in this POC, so the picture frame is raster-partitioned into 3 regions. Each region may have its own region all-on flag to save per-CTB on/off control bits. The number of regions depends on the signaled picture set_num.

The CCSAO application area may also be a specific area defined by coding information (sample position, sample coding mode, loop filter parameters, etc.) inside a block. For example, (1) CCSAO is applied only to samples coded in skip mode, or (2) the CCSAO application area contains only N samples along CTU boundaries, or (3) the CCSAO application area contains only samples on an 8x8 grid in the frame, or (4) the CCSAO application area contains only DBF-filtered samples, or (5) the CCSAO application area contains only the top M rows and left N columns in a CU, or (6) the CCSAO application area contains only intra-coded samples, or (7) the CCSAO application area contains only samples in cbf=0 blocks, or (8) the CCSAO application area is only on blocks whose block QP is in [N, M], where (N, M) may be predefined or signaled at the SPS/APS/PPS/SH/region/CTU/CU/sub-block/sample level. Cross-component coding information may also be considered: (9) the CCSAO application area is on chroma samples whose co-located luma samples are in cbf=0 blocks.
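The picture-level dynamic partition can be sketched as a mapping from a CTB position to a region index that depends only on set_num and the partition type. The even split used below, the interpretation of "vertical" as side-by-side column bands and "horizontal" as stacked row bands, and the names `Partition` and `regionIndex` are all assumptions for illustration; the actual region boundaries in FIG. 28 are not specified in the text.

```cpp
enum class Partition { Vertical, Horizontal, Raster };

// Returns the region (offset set) index in [0, setNum) for the CTB at
// (ctbX, ctbY) in a frame of cols x rows CTBs, assuming an even split.
inline int regionIndex(int ctbX, int ctbY, int cols, int rows, int setNum, Partition p) {
    if (p == Partition::Vertical) return ctbX * setNum / cols;     // column bands
    if (p == Partition::Horizontal) return ctbY * setNum / rows;   // row bands
    return (ctbY * cols + ctbX) * setNum / (cols * rows);          // raster-order runs
}
```

Because the partition is derived from set_num, no explicit region geometry needs to be transmitted in this sketch; only set_num and the partition type would be required.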

In some embodiments, whether to introduce the coding-information application area restriction may be predefined, or one control flag may be signaled at the SPS/APS/PPS/PH/SH/region (per alternative set)/CTU/CU/sub-block/sample level to indicate whether the specified coding information is included in or excluded from the CCSAO application. The decoder skips CCSAO processing for those areas according to the predefined condition or the control flag. For example, Y, U, and V use different predefined or flag-controlled conditions that are switched at the region (set) level. The CCSAO application decision may be made at the CU/TU/PU level or at the sample level.


Table 26-2: YUV uses different predefined/flag control conditions that switch at the region (set) level

Another example is to reuse all or part of the (predefined) bilateral filter enabling constraints.

bool isInter = (currCU.predMode == MODE_INTER);

if (ccSaoParams.ctuOn[ctuRsAddr]
    && ((TU::getCbf(currTU, COMPONENT_Y) || !isInter) && (currTU.cu->qp > 17))
    && (128 > std::max(currTU.lumaSize().width, currTU.lumaSize().height))
    && (!isInter || (32 > std::min(currTU.lumaSize().width, currTU.lumaSize().height))))

In some embodiments, excluding some specific regions may be beneficial for CCSAO statistics collection. The offset derivation may be more accurate or appropriate for those areas that actually need correction. For example, a block with cbf=0 typically means that the block is perfectly predicted and may not require further correction. Excluding these blocks may be advantageous for offset derivation of other regions.

Different application areas may use different classifiers. For example, in a CTU, skip-mode samples use C1, samples on the 8x8 grid use C2, and skip-mode samples on the 8x8 grid use C3. As another example, in a CTU, skip-mode coded samples use C1, samples at the CU center use C2, and skip-mode coded samples at the CU center use C3. FIG. 29 is a diagram illustrating that a CCSAO classifier may take current or cross-component coding information into account, according to some embodiments of the present disclosure. For example, different coding modes/parameters/sample positions may form different classifiers. Different coding information may be combined to form a joint classifier. Different areas may use different classifiers. FIG. 29 also shows another example of the application area.

In some embodiments, the predefined or flag-controlled "coding information exclusion area" mechanism may be used in the DBF/Pre-SAO/SAO/BIF/CCSAO/ALF/CCALF/NN loop filter (NNLF), or in other loop filters.

In some embodiments, the implemented CCSAO syntax is shown in Table 27 below. In some examples, the binarization of each syntax element may be changed. In AVS3, the term "patch" is similar to a slice, and the patch header is similar to a slice header. FLC stands for fixed-length code. TU stands for truncated unary code. EGk stands for k-th order exponential-Golomb code, where k may be fixed. SVLC stands for signed EG0. UVLC stands for unsigned EG0.
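As an illustration of the UVLC and SVLC binarizations named above, the following sketch produces the bit strings of unsigned and signed 0-th order exponential-Golomb codes. It assumes the ue(v)/se(v) convention used in H.264/HEVC/VVC (zero prefix, then the binary value of the number plus one; signed values mapped as k > 0 to 2k - 1 and k <= 0 to -2k); whether AVS3 uses exactly this bit convention is not stated in the text. The function names `uvlcBits` and `svlcBits` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Unsigned EG0: (prefixLen zeros) followed by the binary form of value + 1.
inline std::string uvlcBits(std::uint32_t value) {
    const std::uint64_t code = static_cast<std::uint64_t>(value) + 1;
    int prefixLen = 0;  // floor(log2(code))
    for (std::uint64_t t = code; t > 1; t >>= 1) ++prefixLen;
    std::string bits(static_cast<std::size_t>(prefixLen), '0');
    for (int i = prefixLen; i >= 0; --i) {
        bits.push_back(((code >> i) & 1) ? '1' : '0');
    }
    return bits;
}

// Signed EG0: map the signed value to an unsigned code number, then use UVLC.
inline std::string svlcBits(std::int32_t value) {
    const std::int64_t v = value;
    const std::uint32_t mapped = static_cast<std::uint32_t>(v > 0 ? 2 * v - 1 : -2 * v);
    return uvlcBits(mapped);
}
```

Under this convention, small magnitudes get short codes (0 is coded as a single bit), which suits offsets that are usually close to zero.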


Table 27: exemplary CCSAO syntax

If the higher level flag is off, the lower level flag may be inferred from the off state of the flag and need not be signaled. For example, if ph_cc_sao_cb_flag is false in the picture, ph_cc_sao_cb_band_num_minus1, ph_cc_sao_cb_luma_type, cc_sao_cb_offset_sign_flag, cc_sao_cb_offset_abs, ctb_cc_sao_cb_flag, cc_sao_cb_merge_left_flag, and cc_sao_cb_merge_up_flag are not present and inferred to be false.

In some embodiments, the SPS ccsao_enabled_flag is conditioned on the SPS SAO enabled flag, as shown in Table 28 below.

Table 28: SPS ccsao_enabled_flag is conditioned on SPS SAO enable flag

In some embodiments, ph_cc_sao_cb_ctb_control_flag and ph_cc_sao_cr_ctb_control_flag indicate whether Cb/Cr CTB on/off control granularity is enabled. If ph_cc_sao_cb_ctb_control_flag and ph_cc_sao_cr_ctb_control_flag are enabled, ctb_cc_sao_cb_flag and ctb_cc_sao_cr_flag may be further signaled. Otherwise, whether CCSAO is applied in the current picture depends on ph_cc_sao_cb_flag and ph_cc_sao_cr_flag, without further signaling of ctb_cc_sao_cb_flag and ctb_cc_sao_cr_flag at the CTB level.
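The dependency between the picture-level flag, the CTB control flag, and the CTB-level flag can be summarized in a small decision function. The function name `ccsaoAppliedToCtb` and its boolean parameters are hypothetical stand-ins for the already-parsed values of ph_cc_sao_cb_flag, ph_cc_sao_cb_ctb_control_flag, and ctb_cc_sao_cb_flag (the Cr case is analogous).

```cpp
// phFlag:         picture-level CCSAO enable for the component
// ctbControlFlag: whether per-CTB on/off flags are signaled
// ctbFlag:        per-CTB on/off flag (only meaningful if ctbControlFlag is set)
inline bool ccsaoAppliedToCtb(bool phFlag, bool ctbControlFlag, bool ctbFlag) {
    if (!phFlag) return false;         // lower-level flags are inferred to be off
    if (!ctbControlFlag) return true;  // applied to all CTBs of the picture
    return ctbFlag;                    // decided per CTB
}
```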

In some embodiments, for ph_cc_sao_cb_type and ph_cc_sao_cr_type, a flag may be further signaled to distinguish whether the center co-located luma position (the Y0 position in FIG. 10) is used for the classification of a chroma sample, in order to reduce bit overhead. Similarly, if cc_sao_cb_type and cc_sao_cr_type are signaled at the CTB level, a flag may be further signaled using the same mechanism. For example, if the number of C0 luma position candidates is 9, cc_sao_cb_type0_flag is further signaled to distinguish whether the center co-located luma position is used, as shown in Table 29 below. If the center co-located luma position is not used, cc_sao_cb_type_idc is used to indicate which of the remaining 8 neighboring luma positions is used.

Table 29: signaling cc_sao_cb_type0_flag to distinguish whether the center co-located luma position is used
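A decoder-side sketch of this two-step signaling is given below. It assumes, for illustration only, that a flag value of 1 means the center position Y0 is used and that cc_sao_cb_type_idc values 0 to 7 map to the neighboring positions Y1 to Y8 in order; the function name `lumaCandidateIndex` is hypothetical.

```cpp
// type0Flag: parsed value of cc_sao_cb_type0_flag (assumed: true = center used)
// typeIdc:   parsed value of cc_sao_cb_type_idc in [0, 7], only read when
//            type0Flag is false
// Returns the index of the luma candidate position (0 = center, 1..8 = neighbors).
inline int lumaCandidateIndex(bool type0Flag, int typeIdc) {
    return type0Flag ? 0 : typeIdc + 1;
}
```

When the center position is the most frequent choice, it costs a single flag, and the index of a neighboring position is only sent in the remaining cases.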

Table 30 below shows an example in AVS of using a single classifier (set_num = 1) or multiple classifiers (set_num > 1) in a frame. Note that the syntax notation may be mapped to the notation used above.


Table 30: examples of using a single (set_num=1) or multiple (set_num > 1) classifiers in a picture frame in AVS

If combined with FIG. 25 or FIG. 27, in which each region has its own set, the syntax example may include a region on/off control flag (picture_ccsao_lcu_control_flag[compIdx][setIdx]), as shown in Table 31 below.

Table 31: each region has its own set, and the syntax example may include a region on/off control flag (picture_ccsao_lcu_control_flag[compIdx][setIdx])

In some embodiments, pps_ccsao_info_in_ph_flag and gci_no_ccsao_constraint_flag may be added for the high-level syntax.

In some embodiments, pps_ccsao_info_in_ph_flag equal to 1 specifies that CCSAO filter information may be present in the PH syntax structure and is not present in slice headers that refer to the PPS and do not contain a PH syntax structure. pps_ccsao_info_in_ph_flag equal to 0 specifies that CCSAO filter information is not present in the PH syntax structure and may be present in slice headers that refer to the PPS. When not present, the value of pps_ccsao_info_in_ph_flag is inferred to be equal to 0.

In some embodiments, gci_no_ccsao_constraint_flag equal to 1 specifies that sps_ccsao_enabled_flag shall be equal to 0 for all pictures in OlsInScope. gci_no_ccsao_constraint_flag equal to 0 does not impose such a constraint. In some embodiments, a video bitstream includes one or more Output Layer Sets (OLSs) according to a rule. In the examples herein, OlsInScope refers to the one or more OLSs that are in scope. In some examples, the profile_tier_level() syntax structure provides level information and, optionally, the profile, tier, sub-profile, and general constraint information to which OlsInScope conforms. When the profile_tier_level() syntax structure is contained in the VPS, OlsInScope is the one or more OLSs specified by the VPS. When the profile_tier_level() syntax structure is contained in the SPS, OlsInScope is the OLS that includes only the layer that is the lowest layer among the layers referring to the SPS, and this lowest layer is an independent layer.

In some embodiments, the extension to intra and inter post-prediction SAO filters is further described below. In some embodiments, the SAO classification methods disclosed in the present disclosure (including cross-component sample/coding information classification) may serve as a post-prediction filter, and the prediction may be intra prediction, inter prediction, or another prediction tool such as intra block copy. FIG. 30 is a block diagram illustrating that the SAO classification methods disclosed in the present disclosure serve as a post-prediction filter, according to some embodiments of the present disclosure.

In some embodiments, for each Y, U and V component, a respective classifier is selected. And for each component prediction sample, classification is first performed and a corresponding offset is added. For example, each component may be classified using the current sample and the neighboring samples. Y uses the current Y and neighbor Y samples, and U/V is classified using the current U/V sample, as shown in Table 32 below. Fig. 31 is a block diagram illustrating that each component may be classified using a current sample and neighboring samples for a post-prediction SAO filter according to some embodiments of the present disclosure.

Table 32: selecting a corresponding classifier for each Y, U and V component

In some embodiments, the refined prediction samples (Ypred', Upred', Vpred') are updated by adding the corresponding class offset and are then used for intra prediction, inter prediction, or other prediction.

Ypred' = clip3(0, (1 << bit_depth) - 1, Ypred + h_Y[i])

Upred' = clip3(0, (1 << bit_depth) - 1, Upred + h_U[i])

Vpred' = clip3(0, (1 << bit_depth) - 1, Vpred + h_V[i])

In some embodiments, for the chroma U and V components, the cross-component (Y) may be used for further offset classification in addition to the current chroma component. The additional cross-component offsets (h'_U, h'_V) may be added on top of the current component offsets (h_U, h_V), for example, as shown in Table 33 below.


Table 33: for chrominance U and V components, the cross-component (Y) may be used to further offset classification in addition to the current chrominance component

In some embodiments, the refined prediction samples (Upred'', Vpred'') are updated by adding the corresponding class offset and are used for subsequent intra prediction, inter prediction, or other prediction.

Upred'' = clip3(0, (1 << bit_depth) - 1, Upred' + h'_U[i])

Vpred'' = clip3(0, (1 << bit_depth) - 1, Vpred' + h'_V[i])
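The two refinement stages for a chroma prediction sample can be sketched directly from the formulas above. The function names `clip3`, `refinePred`, and `refineChromaTwoStage` are illustrative; the class index i is assumed to have been derived already, so the offsets are passed in as plain values.

```cpp
#include <algorithm>

// clip3(lo, hi, v): clamp v to the range [lo, hi].
inline int clip3(int lo, int hi, int v) {
    return std::min(std::max(v, lo), hi);
}

// First stage: add the current-component class offset, e.g. h_U[i].
inline int refinePred(int pred, int offset, int bitDepth) {
    return clip3(0, (1 << bitDepth) - 1, pred + offset);
}

// Second stage for U/V: add the cross-component class offset, e.g. h'_U[i],
// on top of the first-stage result.
inline int refineChromaTwoStage(int pred, int hCurrent, int hCross, int bitDepth) {
    return refinePred(refinePred(pred, hCurrent, bitDepth), hCross, bitDepth);
}
```

Note that each stage clips separately, so for a 10-bit sample of 1020 with offsets +5 and -2 the result is 1021 (the first stage saturates at 1023), not 1023.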

In some embodiments, intra-prediction and inter-prediction may use different SAO filter offsets.

In some embodiments, the SAO/CCSAO classification methods disclosed herein (including cross-component sample/coding information classification) may serve as a filter applied to the reconstructed samples of a transform unit (TU). As shown in FIG. 32, CCSAO may serve as a post-reconstruction filter, i.e., it uses the reconstructed samples (after the prediction/residual sample addition, before deblocking) as classification input and compensates the luma/chroma samples before they enter neighboring intra/inter prediction. The CCSAO post-reconstruction filter may reduce the distortion of the current TU samples and may provide a better prediction for neighboring intra/inter blocks. Better compression efficiency can be expected from the more accurate prediction.

Fig. 33 is a flow chart illustrating an exemplary process 3300 of decoding a video signal using cross-component correlation, according to some embodiments of the present disclosure.

Video decoder 30 (shown in fig. 3) receives an Adaptive Parameter Set (APS) identifier from the video data, the identifier associated with a plurality of previously used cross-component sample adaptive offset (CCSAO) filter offset sets stored in the APS (3310).

Video decoder 30 receives a syntax in a Picture Header (PH) or a Slice Header (SH) from video data that indicates an APS identifier for the current picture or slice (3320).

Video decoder 30 decodes a filter set index for a current Coding Tree Unit (CTU), the filter set index indicating a particular previously used CCSAO filter offset set among the plurality of offset sets associated with the APS identifier in the APS (3330).

Video decoder 30 applies a particular set of previously used CCSAO filter offsets to the current CTU of video data (3340).
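Steps 3310 to 3340 can be sketched as a lookup followed by an offset application. Everything in the sketch is a simplifying assumption: the `OffsetSet` and `Aps` types, the `std::map` keyed by APS identifier, and the precomputed per-sample class indices stand in for the bitstream parsing and classification that a real decoder would perform.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct OffsetSet { std::vector<int> offsets; };    // one offset per class
struct Aps { std::vector<OffsetSet> ccsaoSets; };  // previously used offset sets

// Steps 3310-3330: resolve the APS identifier from the PH/SH and the CTU-level
// filter set index to one stored offset set. Returns nullptr if not found.
inline const OffsetSet* selectOffsetSet(const std::map<int, Aps>& apsStore,
                                        int apsId, int filterSetIdx) {
    const auto it = apsStore.find(apsId);
    if (it == apsStore.end()) return nullptr;
    const std::vector<OffsetSet>& sets = it->second.ccsaoSets;
    if (filterSetIdx < 0 || static_cast<std::size_t>(filterSetIdx) >= sets.size()) return nullptr;
    return &sets[static_cast<std::size_t>(filterSetIdx)];
}

// Step 3340: add the class offset to every sample of the CTU and clip.
inline void applyOffsets(std::vector<int>& samples, const std::vector<int>& classIdx,
                         const OffsetSet& set, int bitDepth) {
    const int maxVal = (1 << bitDepth) - 1;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const int v = samples[i] + set.offsets.at(static_cast<std::size_t>(classIdx.at(i)));
        samples[i] = std::min(std::max(v, 0), maxVal);
    }
}
```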

In some embodiments, the APS identifier is associated with a maximum number of the previously used CCSAO filter offset sets, and in response to determining that the number of previously used CCSAO filter offset sets reaches the maximum number, a newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets with a first-in-first-out (FIFO) mechanism.

In some embodiments, the APS identifier is associated with a maximum number of the previously used CCSAO filter offset sets, and in response to determining that the number of previously used CCSAO filter offset sets reaches the maximum number, a newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets with a Least Recently Used (LRU) mechanism.

In some embodiments, in the FIFO mechanism, the newly added CCSAO filter offset set cyclically replaces one of the previously used CCSAO filter offset sets at one or more of the Sequence Parameter Set (SPS), APS, Picture Parameter Set (PPS), PH, SH, region, Coding Tree Unit (CTU), Coding Unit (CU), sub-block, and/or sample levels.
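A cyclic FIFO replacement of stored offset sets can be sketched with a fixed-capacity store and a wrap-around write position. The class name `FifoOffsetStore`, the representation of an offset set as a vector of integers, and the requirement that the capacity be greater than zero are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

using OffsetSet = std::vector<int>;

class FifoOffsetStore {
public:
    // maxSets is the maximum number of stored offset sets (must be > 0).
    explicit FifoOffsetStore(std::size_t maxSets) : maxSets_(maxSets), next_(0) {}

    // Adds a set and returns the slot it occupies. Once the store is full,
    // the oldest slot is overwritten and the write position advances cyclically.
    std::size_t add(const OffsetSet& set) {
        if (sets_.size() < maxSets_) {
            sets_.push_back(set);
            return sets_.size() - 1;
        }
        const std::size_t slot = next_;
        sets_[slot] = set;
        next_ = (next_ + 1) % maxSets_;
        return slot;
    }

    const OffsetSet& at(std::size_t slot) const { return sets_.at(slot); }
    std::size_t size() const { return sets_.size(); }

private:
    std::size_t maxSets_;
    std::size_t next_;  // slot to overwrite next once the store is full
    std::vector<OffsetSet> sets_;
};
```

With a capacity of 2, the third set that is added lands in slot 0 and the fourth in slot 1, so an encoder and decoder that add sets in the same order keep identical slot contents without extra signaling.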

In some embodiments, in the LRU mechanism, the newly added CCSAO filter offset set replaces the least recently used offset set, as identified by a count table of the previously used CCSAO filter offset sets, at one or more of the SPS, APS, group of pictures (GOP) structure, PPS, PH, SH, region, CTU, CU, sub-block, and/or sample levels.
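The LRU mechanism can be sketched in a similar way. The layout of the count table is not specified in this passage, so the sketch assumes it records, for each stored set, the value of a monotonically increasing counter at the time of its last use; `LruOffsetStore`, `use`, and `add` are hypothetical names.

```python
class LruOffsetStore:
    """Hypothetical store of CCSAO offset sets with LRU replacement."""

    def __init__(self, max_sets):
        self.max_sets = max_sets   # maximum number of stored sets (>= 1)
        self.sets = []             # previously used offset sets
        self.last_used = []        # assumed count table: last-use tick per slot
        self.tick = 0              # monotonically increasing use counter

    def _touch(self, idx):
        self.tick += 1
        self.last_used[idx] = self.tick

    def use(self, idx):
        """Return the offset set in a slot and mark it as most recently used."""
        self._touch(idx)
        return self.sets[idx]

    def add(self, offset_set):
        """Store a new offset set, evicting the least recently used if full."""
        if len(self.sets) < self.max_sets:
            self.sets.append(offset_set)
            self.last_used.append(0)
            idx = len(self.sets) - 1
        else:
            # The slot with the smallest tick was used least recently.
            idx = self.last_used.index(min(self.last_used))
            self.sets[idx] = offset_set
        self._touch(idx)
        return idx
```

Unlike the FIFO case, a set that CTUs keep selecting stays in the store, because each call to `use` refreshes its entry in the count table.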

In some embodiments, the maximum number is predefined or signaled in one or more of SPS, APS, PPS, PH, SH, region, CTU, CU, sub-block and/or sample levels.

In some embodiments, the plurality of previously used CCSAO filter offset sets includes one or more of candidate locations, band information, edge information, and coding information associated with the respective classifier.
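One way to picture the contents of a stored offset set is as a small record. The sketch below is illustrative only: every field name and type is hypothetical, and the actual syntax elements that carry candidate position, band, edge, and coding information are not defined by this passage.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CcsaoOffsetSet:
    """One stored CCSAO filter offset set (all field names are hypothetical)."""
    offsets: Tuple[int, ...]                  # one offset per classifier class
    candidate_position: Optional[int] = None  # candidate location used by the classifier
    band_num: Optional[int] = None            # band information
    edge_direction: Optional[int] = None      # edge information
    coding_info: Optional[str] = None         # coding information
```

Making the record immutable and comparable by value means two sets with identical offsets and classifier parameters compare equal, which is convenient when checking whether a newly added set duplicates a stored one.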

In some embodiments, if none of the previously used CCSAO filter offset sets associated with the APS identifier is the same as the newly added CCSAO filter offset set, the newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets.
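The duplicate check described above can be sketched as a guard in front of whichever replacement mechanism is in use. The function name `add_with_pruning` and the `pick_victim` callback are hypothetical; reusing the index of an identical stored set is an assumption about what happens when a match is found.

```python
def add_with_pruning(sets, new_set, max_sets, pick_victim):
    """Add new_set to sets unless an identical set is already stored.

    sets        -- list of previously used offset sets (modified in place)
    pick_victim -- callable returning the slot index to replace when full
                   (for example a FIFO or LRU policy; hypothetical hook)
    Returns the slot index that holds new_set afterwards.
    """
    for idx, existing in enumerate(sets):
        if existing == new_set:
            return idx             # identical set already stored: no replacement
    if len(sets) < max_sets:
        sets.append(new_set)       # room left: nothing needs to be replaced
        return len(sets) - 1
    idx = pick_victim(sets)        # full and no match: replace one stored set
    sets[idx] = new_set
    return idx
```

Skipping the replacement when a match exists keeps the limited number of slots from being filled with copies of the same offsets.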

In some embodiments, the APS identifier is for the first component and the second component in the current picture or slice, wherein the first component uses a first CCSAO filter offset set replacement mechanism and the second component uses a second CCSAO filter offset set replacement mechanism.

In some embodiments, a first syntax in PH or SH indicates a first APS identifier for the first component in the current picture or slice, and a second syntax in PH or SH indicates a second APS identifier for a second component in the current picture or slice.
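A minimal sketch of per-component APS selection follows. The syntax element names `ccsao_aps_id_cb` and `ccsao_aps_id_cr`, and the choice of Cb and Cr as the first and second components, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SliceHeader:
    """Hypothetical PH/SH fields carrying one APS identifier per component."""
    ccsao_aps_id_cb: int   # first syntax: APS identifier for the first component
    ccsao_aps_id_cr: int   # second syntax: APS identifier for the second component


def offsets_for_component(header, aps_store, component, filter_set_idx):
    """Look up the offset set for one component via its own APS identifier.

    aps_store maps an APS identifier to a list of stored offset sets.
    """
    aps_id = {"cb": header.ccsao_aps_id_cb,
              "cr": header.ccsao_aps_id_cr}[component]
    return aps_store[aps_id][filter_set_idx]
```

With separate identifiers, the two components can draw on different pools of previously used offset sets within the same picture or slice.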

Fig. 34 illustrates a computing environment 3410 coupled to a user interface 3450. The computing environment 3410 may be part of a data processing server. The computing environment 3410 includes a processor 3420, memory 3430, and an input/output (I/O) interface 3440.

The processor 3420 generally controls the overall operation of the computing environment 3410, such as operations associated with display, data acquisition, data communication, and image processing. The processor 3420 may include one or more processors to execute instructions to perform all or some of the steps of the methods described above. Further, the processor 3420 may include one or more modules that facilitate interactions between the processor 3420 and other components. The processor may be a Central Processing Unit (CPU), a microprocessor, a single-chip microcomputer, a Graphics Processing Unit (GPU), or the like.

The memory 3430 is configured to store various types of data to support the operation of the computing environment 3410. The memory 3430 may include predetermined software 3432. Examples of such data include instructions, video data sets, image data, and the like for any application or method operating on the computing environment 3410. The memory 3430 may be implemented using any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The I/O interface 3440 provides an interface between the processor 3420 and peripheral interface modules, such as a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 3440 may be coupled with an encoder and a decoder.

In one embodiment, a non-transitory computer readable storage medium is also provided that includes a plurality of programs, e.g., in memory 3430, executable by processor 3420 in computing environment 3410 for performing the methods described above. Alternatively, the non-transitory computer-readable storage medium may have stored therein a bitstream or data stream that includes encoded video information (e.g., video information including one or more syntax elements) generated by an encoder (e.g., video encoder 20 of fig. 2) using, for example, the encoding method described above for use by a decoder (e.g., video decoder 30 of fig. 3) in decoding video data. The non-transitory computer readable storage medium may be, for example, ROM, random-access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

In one embodiment, a computing device is also provided, including one or more processors (e.g., processor 3420); and a non-transitory computer-readable storage medium or memory 3430 having stored therein a plurality of programs executable by one or more processors, wherein the one or more processors are configured to perform the above-described methods when the plurality of programs are executed.

In one embodiment, a computer program product is also provided, comprising a plurality of programs, e.g., in the memory 3430, executable by the processor 3420 in the computing environment 3410 for performing the methods described above. For example, a computer program product may include a non-transitory computer-readable storage medium.

In one embodiment, the computing environment 3410 may be implemented with one or more ASICs, DSPs, Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), FPGAs, GPUs, controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

Further embodiments also include various subsets of the above embodiments combined or otherwise rearranged in various other embodiments.

In one or more examples, the described functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures to implement the embodiments described herein. The computer program product may include a computer-readable medium.

The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the claims. As used in the description of the embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof.

It will be further understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, the first electrode may be referred to as a second electrode, and similarly, the second electrode may be referred to as a first electrode, without departing from the scope of the embodiments. The first electrode and the second electrode are both electrodes, but not the same electrode.

Reference throughout this specification to "one example," "an example," etc., in the singular or plural form means that one or more particular features, structures, or characteristics described in connection with the example are included in at least one example of the present disclosure. Thus, the appearances of the phrases "in one example" or "in an example," "in an exemplary example," and the like in singular or plural form throughout this specification are not necessarily all referring to the same example. Furthermore, the particular features, structures, or characteristics of one or more examples may be combined in any suitable manner.

The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications, variations and alternative embodiments will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. The example was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the claims is not to be limited to the specific examples of the disclosed embodiments and that modifications and other embodiments are intended to be included within the scope of the appended claims.

Claims

1. A method of decoding video data, comprising:

receiving an Adaptive Parameter Set (APS) identifier from the video data, the APS identifier associated with a plurality of previously used cross-component sample adaptive offset (CCSAO) filter offset sets stored in the APS,

receiving a syntax in a Picture Header (PH) or a Slice Header (SH) from the video data, the syntax indicating the APS identifier for a current picture or slice,

decoding a filter set index for a current Coding Tree Unit (CTU), the filter set index indicating a particular previously used CCSAO filter offset set of a plurality of offset sets of the APS associated with the APS identifier; and

applying the particular set of previously used CCSAO filter offsets to the current CTU of the video data.

2. The method of claim 1, wherein the video data includes a first component and a second component, and the particular set of previously used CCSAO filter offsets is obtained by:

determining a respective classifier for the second component from a set of one or more samples of the first component associated with a respective sample of the second component,

determining a respective sample offset for the respective sample of the second component according to the respective classifier to modify the respective sample of the second component based on the determined respective sample offset, and

storing the set of respective sample offsets determined for each respective classifier as the particular set of previously used CCSAO filter offsets.

3. The method of claim 1, wherein the APS identifier is associated with a maximum number of the previously used CCSAO filter offset sets, and in response to determining that the number of previously used CCSAO filter offset sets reaches the maximum number, a newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets with a first-in-first-out (FIFO) mechanism.

4. The method of claim 1, wherein the APS identifier is associated with a maximum number of the previously used CCSAO filter offset sets, and in response to determining that the number of previously used CCSAO filter offset sets reaches the maximum number, a newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets with a Least Recently Used (LRU) mechanism.

5. The method of claim 3, wherein in the FIFO mechanism, the newly added CCSAO filter offset set cyclically replaces one of the previously used CCSAO filter offset sets at one or more of the Sequence Parameter Set (SPS), APS, Picture Parameter Set (PPS), PH, SH, region, Coding Tree Unit (CTU), Coding Unit (CU), sub-block, and/or sample levels.

6. The method of claim 4, wherein in the LRU mechanism, the newly added CCSAO filter offset set replaces a least recently used offset set, as identified by a count table of the previously used CCSAO filter offset sets, at one or more of the SPS, APS, group of pictures (GOP) structure, PPS, PH, SH, region, CTU, CU, sub-block, and/or sample levels.

7. The method of claim 3, wherein the maximum number is predefined or signaled in one or more of SPS, APS, PPS, PH, SH, region, CTU, CU, sub-block and/or sample levels.

8. The method of claim 2, wherein the plurality of previously used CCSAO filter offset sets includes one or more of candidate locations, band information, edge information, and coding information associated with the respective classifiers.

9. The method of claim 3, wherein if none of the previously used CCSAO filter offset sets associated with the APS identifier is the same as the newly added CCSAO filter offset set, the newly added CCSAO filter offset set replaces one of the previously used CCSAO filter offset sets.

10. The method of claim 2, wherein the APS identifier is for the first component and the second component in the current picture or slice, wherein the first component uses a first CCSAO filter offset set replacement mechanism and the second component uses a second CCSAO filter offset set replacement mechanism.

11. The method of claim 2, wherein a first syntax in PH or SH indicates a first APS identifier for the first component in the current picture or slice, and a second syntax in PH or SH indicates a second APS identifier for a second component in the current picture or slice.

12. An electronic device, comprising:

one or more processing units;

a memory coupled to the one or more processing units; and

a plurality of programs stored in the memory, which when executed by the one or more processing units, cause the electronic device to perform the method of any of claims 1-11.

13. A computer readable storage medium having stored therein a bitstream comprising video information generated in accordance with the method for decoding video data of any of claims 1-11.

14. A non-transitory computer readable storage medium storing a plurality of programs for execution by an electronic device with one or more processing units, wherein the plurality of programs, when executed by the one or more processing units, cause the electronic device to perform the method of any of claims 1-11.