CN114846807A - Coding and decoding of chroma residual

Publication number: CN114846807A
Application number: CN202080088818.7A
Authority: CN (China)
Legal status: Pending
Prior art keywords: chroma, residual, video, dynamic range, block
Other languages: Chinese (zh)
Inventors: 修晓宇, 陈漪纹, 马宗全, 朱弘正, 王祥林, 于冰
Current and original assignee: Beijing Dajia Internet Information Technology Co., Ltd.

Classifications

    • H04N19/60 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/186 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/98 — Adaptive-dynamic-range coding [ADRC]


Abstract

The present application relates to decoding video data. The electronic device reconstructs a plurality of residual blocks of a video frame from a video bitstream. The luma and chroma residuals of the plurality of residual blocks of the video frame are clipped into a first dynamic range and a second dynamic range, respectively. For each residual block, a luma residual for a luma component and chroma residuals for two chroma components are determined from the clipped luma and chroma residuals of the plurality of residual blocks of the video frame. In some embodiments, the electronic device determines whether a joint coding of chroma residuals (JCCR) mode is enabled. In accordance with a determination that the JCCR mode is enabled, the chroma residuals for the two chroma components are jointly determined according to the JCCR scheme.

Description

Coding and decoding of chroma residual
RELATED APPLICATIONS
This application claims priority from U.S. Provisional Application No. 62/955,319, entitled "Joint Coding of Chroma Residuals," filed on December 30, 2019, which is incorporated herein by reference in its entirety.
This application is related to PCT Application No. PCT/US2020/030743, entitled "Methods and Apparatus of Joint Coding of Chroma Residuals," filed on April 30, 2020, which claims priority from a U.S. provisional patent application entitled "Joint Coding of Chroma Residuals," Application No. 62/841,158, filed on April 30, 2019, all of which are incorporated herein by reference in their entirety.
Technical Field
The present application relates generally to video data codec and compression and, in particular, to methods and apparatus for decoding chroma residuals.
Background
Various electronic devices (e.g., digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smart phones, video teleconferencing devices, video streaming devices, etc.) support digital video. Electronic devices transmit, receive, encode, decode, and/or store digital video data by implementing video compression/decompression standards as defined by the MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC) standards. Video compression typically includes performing spatial (intra) prediction and/or temporal (inter) prediction to reduce or remove redundancy inherent in the video data. For block-based video coding, a video frame is partitioned into one or more slices, each slice having a plurality of video blocks, which may also be referred to as Coding Tree Units (CTUs). Each CTU may contain one Coding Unit (CU) or be recursively divided into smaller CUs until a predefined minimum CU size is reached. Each CU (also referred to as a leaf CU) contains one or more Transform Units (TUs), and each CU also contains one or more Prediction Units (PUs). Each CU may be coded in intra mode, inter mode, or intra block copy (IBC) mode. Video blocks in an intra-coded (I) slice of a video frame are encoded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in an inter-coded (P or B) slice of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame, or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
A prediction block for a current video block to be encoded is derived based on spatial prediction or temporal prediction of a reference block (e.g., a neighboring block) that has been previously encoded. The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between the current block to be encoded and the prediction block is referred to as a residual block or prediction error. An inter-coded block is coded according to a motion vector and a residual block, the motion vector pointing to a reference block forming a predicted block in a reference frame. The process of determining motion vectors is commonly referred to as motion estimation. The intra-coded block is coded according to the intra-prediction mode and the residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain (e.g., frequency domain), resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce one-dimensional vectors of transform coefficients, and then entropy encoded into a video bitstream to achieve even greater compression.
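As a toy illustration of these steps (not any standard's exact pipeline, which would also apply a transform such as the DCT before quantization), the following Python sketch forms a residual block, applies a uniform quantizer with an illustrative step size, and scans the coefficients into a one-dimensional vector:

```python
import numpy as np

# Toy values for a 2x2 block; real codecs operate on larger blocks.
current = np.array([[52, 55], [61, 59]], dtype=np.int32)      # block being encoded
prediction = np.array([[50, 54], [60, 60]], dtype=np.int32)   # from intra/inter prediction

residual = current - prediction                 # residual block (prediction error)
qstep = 2                                       # illustrative quantization step
quantized = np.round(residual / qstep).astype(np.int32)

# Scan the 2-D quantized coefficients into a 1-D vector for entropy coding
# (real codecs use zigzag/diagonal scans; raster order shown for brevity).
coeff_vector = quantized.flatten()

# Decoder side: inverse quantization and reconstruction.
dequantized = coeff_vector.reshape(residual.shape) * qstep
reconstructed = prediction + dequantized
```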
The encoded video bitstream is then stored in a computer readable storage medium (e.g., flash memory) for access by another electronic device having digital video capabilities or for direct transmission to the electronic device, either wired or wirelessly. The electronic device then performs video decompression (which is the inverse of the video compression described above) by, for example, parsing the encoded video bitstream to obtain syntax elements from the bitstream, and reconstructing the digital video data from the encoded video bitstream to its original format based at least in part on the syntax elements obtained from the bitstream, and presents the reconstructed digital video data on a display of the electronic device.
To maintain flexibility and scalability, video codec standards typically define options for the syntax of the coded video bitstream that specify the parameters allowed by the syntax in the bitstream. In many cases, these options also provide details about the decoding operations that a decoder should perform to obtain the syntax parameters from the bitstream and produce correct decoding results. As digital video quality moves from high definition to 4K × 2K or even 8K × 4K, the amount of video data to be encoded/decoded grows exponentially. How to encode/decode video data more efficiently while maintaining the image quality of the decoded video data is a long-standing challenge.
Disclosure of Invention
The present application describes embodiments related to video data encoding and decoding, and more particularly, embodiments related to methods and apparatuses for decoding of a luma residual and chroma residuals. In one aspect of the present application, a method of decoding video data includes: reconstructing a plurality of residual blocks of a video frame from a video bitstream. The method further includes: clipping luma and chroma residuals of the plurality of residual blocks of the video frame into a first dynamic range and a second dynamic range, respectively, and determining a luma residual for a luma component and chroma residuals for two chroma components of each residual block of the video frame from the clipped luma and chroma residuals of the plurality of residual blocks of the video frame. In some embodiments, the method further includes: determining whether a joint coding of chroma residuals (JCCR) mode is enabled. In accordance with a determination that the JCCR mode is enabled, the chroma residuals for the two chroma components of the video frame are jointly determined according to the JCCR scheme. In an example, one or both of the first dynamic range and the second dynamic range are defined as [-2^(K-1), 2^(K-1) - 1] or [-(2^(K-1) - 1), 2^(K-1) - 1], where K is an integer (e.g., 16). K may correspond to a predefined bit depth or a coded bit depth of the video frame.
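A minimal Python sketch of the clipping operation above, assuming K = 16 so that the first form of the range is [-32768, 32767]; the function name is illustrative:

```python
def clip_residual(value: int, k: int = 16) -> int:
    """Clip a residual sample into [-2^(K-1), 2^(K-1) - 1]."""
    low = -(1 << (k - 1))          # -32768 for K = 16
    high = (1 << (k - 1)) - 1      #  32767 for K = 16
    return max(low, min(high, value))

assert clip_residual(40000) == 32767
assert clip_residual(-40000) == -32768
```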
In another aspect of the present application, an electronic device includes one or more processing units, a memory, and a plurality of programs stored in the memory. The programs, when executed by the one or more processing units, cause the electronic device to perform the method of decoding video data as described above.
In yet another aspect of the application, a non-transitory computer readable storage medium stores a plurality of programs for execution by an electronic device having one or more processing units. The programs, when executed by the one or more processing units, cause the electronic device to perform the method of decoding video data as described above.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification, illustrate described embodiments and together with the description serve to explain the principles. Like reference numerals designate corresponding parts.
Fig. 1 is a block diagram illustrating an exemplary video encoding and decoding system according to some embodiments of the present disclosure.
Fig. 2 is a block diagram illustrating an exemplary video encoder according to some embodiments of the present disclosure.
Fig. 3 is a block diagram illustrating an exemplary video decoder according to some embodiments of the present disclosure.
Fig. 4A-4E are block diagrams illustrating how a frame is recursively partitioned into multiple video blocks of different sizes and shapes according to some embodiments of the disclosure.
Fig. 5A and 5B are flow diagrams illustrating exemplary processes by which a video encoder according to some embodiments of the present invention implements techniques for encoding video data using a joint coding scheme for chroma residuals.
Fig. 6A-6C are flow diagrams illustrating exemplary processes by which a video decoder implements techniques for decoding video data using a joint coding scheme for chroma residuals, according to some embodiments of the present invention.
Fig. 7 is a block diagram of an exemplary video decoder configured for luma mapping and chroma scaling according to some embodiments.
Fig. 8 is a block diagram of an exemplary video decoder configured to implement Adaptive Color Transform (ACT), according to some embodiments.
Fig. 9A and 9B are block diagrams of two example residual decoding subsystems that may be applied in a video decoder, according to some embodiments.
Fig. 10A and 10B are block diagrams of two other example residual decoding subsystems, in accordance with some embodiments.
Fig. 11A and 11B are flow diagrams of video decoding methods implemented by an electronic device, according to some embodiments.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent, however, to one skilled in the art that various alternatives may be used and the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Fig. 1 is a block diagram illustrating an example system 10 for encoding and decoding video blocks in parallel according to some embodiments of the present disclosure. As shown in fig. 1, system 10 includes a source device 12, source device 12 generating and encoding video data to be later decoded by a target device 14. Source device 12 and target device 14 may comprise any of a wide variety of electronic devices, including desktop or laptop computers, tablet computers, smart phones, set-top boxes, digital televisions, cameras, display devices, digital media players, video game machines, video streaming devices, and the like. In some embodiments, source device 12 and target device 14 are equipped with wireless communication capabilities.
In some embodiments, target device 14 may receive encoded video data to be decoded via link 16. Link 16 may include any type of communication medium or device capable of moving encoded video data from source device 12 to target device 14. In one example, link 16 may include a communication medium that enables source device 12 to transmit encoded video data directly to target device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to target device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include a router, switch, base station, or any other device that may facilitate communication from source device 12 to target device 14.
In other embodiments, the encoded video data may be transmitted from output interface 22 to storage device 32. The encoded video data in storage device 32 may then be accessed by target device 14 via input interface 28. Storage device 32 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, storage device 32 may correspond to a file server or another intermediate storage device that may hold encoded video data generated by source device 12. Target device 14 may access the stored video data from storage device 32 via streaming or downloading. The file server may be any type of computer capable of storing encoded video data and transmitting the encoded video data to target device 14. Exemplary file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Target device 14 may access the encoded video data through any standard data connection suitable for accessing encoded video data stored on a file server, including a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both. The transmission of the encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.
As shown in fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include sources such as the following or a combination of such sources: a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video. As one example, if video source 18 is a video camera of a security monitoring system, source device 12 and destination device 14 may form a camera phone or video phone. However, embodiments described herein may be generally applicable to video codecs, and may be applied to wireless and/or wired applications.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to the target device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on storage device 32 for later access by target device 14 or other devices for decoding and/or playback. The output interface 22 may further include a modem and/or a transmitter.
The target device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and/or a modem and receives encoded video data over link 16. The encoded video data transmitted over link 16 or provided on storage device 32 may include various syntax elements generated by video encoder 20 for use by video decoder 30 in decoding the video data. Such syntax elements may be included within encoded video data sent over a communication medium, stored on a storage medium, or stored on a file server.
In some embodiments, the target device 14 may include a display device 34, and the display device 34 may be an integrated display device and an external display device configured to communicate with the target device 14. Display device 34 displays the decoded video data to a user and may include any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate according to a proprietary or industry standard (e.g., VVC, HEVC, MPEG-4 Part 10 Advanced Video Coding (AVC)) or an extension of such a standard. It should be understood that the present application is not limited to a particular video encoding/decoding standard and may be applicable to other video encoding/decoding standards. It is generally recognized that video encoder 20 of source device 12 may be configured to encode video data according to any of these current or future standards. Similarly, it is also generally contemplated that video decoder 30 of target device 14 may be configured to decode video data in accordance with any of these current or future standards.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic devices, software, hardware, firmware or any combinations thereof. When implemented in part in software, the electronic device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the video encoding/decoding operations disclosed in this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Fig. 2 is a block diagram illustrating an exemplary video encoder 20 according to some embodiments described in the present application. Video encoder 20 may perform intra-prediction encoding and inter-prediction encoding on video blocks within video frames. Intra-prediction coding relies on spatial prediction to reduce or remove spatial redundancy in video data within a given video frame or picture. Inter-prediction coding relies on temporal prediction to reduce or remove temporal redundancy in video data within adjacent video frames or pictures of a video sequence.
As shown in fig. 2, video encoder 20 includes a video data memory 40, a prediction processing unit 41, a Decoded Picture Buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. Prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a partition unit 45, an intra prediction processing unit 46, and an intra Block Copy (BC) unit 48. In some embodiments, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. Loop filter 66 may be located between adder 62 and DPB 64 and include a deblocking filter to filter block boundaries and remove blockiness from reconstructed video. Loop filter 66 also includes a Sample Adaptive Offset (SAO) filter and/or an Adaptive Loop Filter (ALF) to filter the output of adder 62 before that output is input to DPB 64 and used to codec other video blocks. Video encoder 20 may take the form of fixed or programmable hardware units, or its functions may be distributed among one or more of the illustrated fixed or programmable hardware units.
Video data memory 40 may store video data to be encoded by components of video encoder 20. The video data in video data storage 40 may be obtained, for example, from video source 18. DPB 64 is a buffer that stores reference video data for use by video encoder 20 in encoding video data (e.g., in intra or inter prediction encoding modes). Video data memory 40 and DPB 64 may be formed from any of a variety of memory devices. In various examples, video data memory 40 may be on-chip with other components of video encoder 20, or off-chip with respect to those components.
As shown in fig. 2, upon receiving the video data, partition unit 45 within prediction processing unit 41 partitions the video data into video blocks. This partitioning may also include partitioning the video frame into slices, tiles, or other larger Coding Units (CUs) according to a predefined partitioning structure, such as a quadtree structure, associated with the video data. A video frame may be divided into multiple video blocks (or sets of video blocks called tiles). Prediction processing unit 41 may select one of a plurality of possible predictive coding modes, such as one of a plurality of intra-predictive coding modes or one of a plurality of inter-predictive coding modes, for the current video block based on error results (e.g., coding rate and distortion level). Prediction processing unit 41 may provide the resulting intra-predicted or inter-predicted encoded block to adder 50 to generate a residual block, and to adder 62 to reconstruct the encoded block for subsequent use as part of a reference frame. Prediction processing unit 41 also provides syntax elements (such as motion vectors, intra-mode indicators, partition information, and other such syntax information) to entropy encoding unit 56.
To select a suitable intra-prediction encoding mode for the current video block, intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block in relation to one or more neighboring blocks in the same frame as the current block to be encoded to provide spatial prediction. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-prediction encoding of the current video block in relation to one or more prediction blocks in one or more reference frames to provide temporal prediction. Video encoder 20 may perform multiple encoding passes, e.g., to select an appropriate encoding mode for each block of video data.
In some implementations, motion estimation unit 42 determines the inter-prediction mode for the current video frame by generating motion vectors according to predetermined patterns within the sequence of video frames, the motion vectors indicating the displacement of Prediction Units (PUs) of video blocks within the current video frame relative to prediction blocks within the reference video frame. The motion estimation performed by motion estimation unit 42 is a process of generating motion vectors that estimate motion for video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a prediction block within a reference frame (or other coding unit) associated with the current block being encoded within the current frame (or other coding unit). The predetermined pattern may designate video frames in the sequence as P-frames or B-frames. Intra BC unit 48 may determine vectors (e.g., block vectors) for intra BC encoding in a similar manner as the motion vectors determined by motion estimation unit 42 for inter prediction, or may determine block vectors using motion estimation unit 42.
A prediction block is a block of a reference frame that is considered to closely match the PU of the video block to be encoded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics. In some implementations, video encoder 20 may calculate values for sub-integer pixel positions of reference frames stored in DPB 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of the reference frame. Thus, motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel accuracy.
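Illustrative Python implementations of the two block-matching metrics named above; a motion search would evaluate such a metric over candidate reference positions:

```python
import numpy as np

def sad(block: np.ndarray, ref: np.ndarray) -> int:
    """Sum of Absolute Differences between a block and a candidate reference block."""
    return int(np.abs(block.astype(np.int64) - ref.astype(np.int64)).sum())

def ssd(block: np.ndarray, ref: np.ndarray) -> int:
    """Sum of Squared Differences between a block and a candidate reference block."""
    diff = block.astype(np.int64) - ref.astype(np.int64)
    return int((diff * diff).sum())
```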
Motion estimation unit 42 calculates motion vectors for PUs of video blocks in inter-prediction coded frames by: the location of the PU is compared to locations of prediction blocks of reference frames selected from a first reference frame list (list 0) or a second reference frame list (list 1), each of which identifies one or more reference frames stored in the DPB 64. The motion estimation unit 42 sends the calculated motion vector to the motion compensation unit 44 and then to the entropy coding unit 56.
The motion compensation performed by motion compensation unit 44 may involve obtaining or generating a prediction block based on the motion vector determined by motion estimation unit 42. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the prediction block to which the motion vector points in one of the reference frame lists, retrieve the prediction block from DPB 64, and forward the prediction block to adder 50. Adder 50 then forms a residual video block of pixel difference values by subtracting the pixel values of the prediction block provided by motion compensation unit 44 from the pixel values of the current video block being encoded. The pixel difference values forming the residual video block may comprise a luminance difference component or a chrominance difference component or both. Motion compensation unit 44 may also generate syntax elements associated with video blocks of the video frame for use by video decoder 30 in decoding the video blocks of the video frame. The syntax elements may include, for example, syntax elements that define motion vectors used to identify the prediction blocks, any flag indicating a prediction mode, or any other syntax information described herein. It should be noted that motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes.
In some embodiments, intra BC unit 48 may generate vectors and obtain prediction blocks in a manner similar to that described above in connection with motion estimation unit 42 and motion compensation unit 44, but in the same frame as the current block being encoded, and these vectors are referred to as block vectors rather than motion vectors. In particular, the intra BC unit 48 may determine an intra prediction mode to be used for encoding the current block. In some examples, intra BC unit 48 may encode current blocks using various intra prediction modes, e.g., during separate encoding passes, and test their performance through rate-distortion analysis. Next, intra BC unit 48 may select an appropriate intra prediction mode among the various tested intra prediction modes to use and generate an intra mode indicator accordingly. For example, the intra BC unit 48 may calculate rate-distortion values for various tested intra prediction modes using rate-distortion analysis, and select an intra prediction mode having the best rate-distortion characteristics among the tested modes as a suitable intra prediction mode to use. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra BC unit 48 may calculate ratios from the distortion and rate for various encoded blocks to determine which intra prediction mode exhibits the best rate-distortion value for the block.
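A hedged Python sketch of this rate-distortion selection, where the cost J = D + λ·R trades distortion D against bit rate R; the candidate modes, Lagrange multiplier, and numbers below are purely illustrative:

```python
def select_best_mode(candidates, lagrange_multiplier=0.85):
    # candidates: iterable of (mode_name, distortion, rate_in_bits);
    # pick the mode minimizing J = D + lambda * R.
    return min(candidates, key=lambda c: c[1] + lagrange_multiplier * c[2])

best = select_best_mode([("mode_a", 120.0, 30), ("mode_b", 100.0, 42)])
# mode_b wins: 100 + 0.85 * 42 = 135.7 < 120 + 0.85 * 30 = 145.5
```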
In other examples, intra BC unit 48 may use, in whole or in part, motion estimation unit 42 and motion compensation unit 44 to perform such functions for intra BC prediction according to embodiments described herein. In either case, for intra block copying, the prediction block may be a block that is considered to closely match the block to be encoded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics, and identifying the prediction block may include calculating values for sub-integer pixel locations.
Whether the prediction block is from the same frame according to intra prediction or from a different frame according to inter prediction, video encoder 20 may form a residual video block by subtracting pixel values of the prediction block from pixel values of the current video block being encoded to form pixel difference values. The pixel difference values forming the residual video block may include both a luminance component difference and a chrominance component difference.
As an alternative to inter prediction performed by motion estimation unit 42 and motion compensation unit 44 or intra block copy prediction performed by intra BC unit 48 as described above, intra prediction processing unit 46 may intra predict the current video block. In particular, the intra prediction processing unit 46 may determine an intra prediction mode to use for encoding the current block. To this end, the intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and the intra-prediction processing unit 46 (or in some examples, a mode selection unit) may select an appropriate intra-prediction mode from the tested intra-prediction modes to use. Intra-prediction processing unit 46 may provide information indicating the intra-prediction mode selected for the block to entropy encoding unit 56. The entropy encoding unit 56 may encode information indicating the selected intra prediction mode into a bitstream.
After prediction processing unit 41 determines a prediction block for the current video block via inter prediction or intra prediction, adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more Transform Units (TUs) and provided to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform.
Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may also reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix including quantized transform coefficients. Alternatively, the entropy encoding unit 56 may perform scanning.
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients into a video bitstream using, for example, Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. The encoded bitstream may then be transmitted to video decoder 30, or archived in storage device 32 for later transmission to video decoder 30 or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements for the current video frame being encoded.
Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual video block in the pixel domain for use in generating reference blocks for predicting other video blocks. As noted above, motion compensation unit 44 may generate a motion compensated prediction block from one or more reference blocks of a frame stored in DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the prediction blocks to calculate sub-integer pixel values for use in motion estimation.
Adder 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in DPB 64. The reference block may then be used by intra BC unit 48, motion estimation unit 42, and motion compensation unit 44 as a prediction block to inter-predict another video block in a subsequent video frame.
Fig. 3 is a block diagram illustrating an exemplary video decoder 30 according to some embodiments of the present application. The video decoder 30 includes a video data memory 79, an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform processing unit 88, an adder 90, and a DPB 92. Prediction processing unit 81 further includes a motion compensation unit 82, an intra prediction processing unit 84, and an intra BC unit 85. Video decoder 30 may perform a decoding process that is substantially reciprocal to the encoding process described above with respect to video encoder 20 in connection with fig. 2. For example, motion compensation unit 82 may generate prediction data based on motion vectors received from entropy decoding unit 80, and intra-prediction unit 84 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 80.
In some examples, the units of video decoder 30 may be tasked to perform embodiments of the present application. Furthermore, in some examples, embodiments of the disclosure may be dispersed in one or more of the units of video decoder 30. For example, intra BC unit 85 may perform embodiments of the present application alone or in combination with other units of video decoder 30 (e.g., motion compensation unit 82, intra prediction processing unit 84, and entropy decoding unit 80). In some examples, video decoder 30 may not include intra BC unit 85, and the functions of intra BC unit 85 may be performed by other components of prediction processing unit 81 (e.g., motion compensation unit 82).
Video data memory 79 may store video data, such as an encoded video bitstream, to be decoded by other components of video decoder 30. The video data stored in video data memory 79 may be obtained, for example, from storage device 32, from a local video source (e.g., a camera), via wired or wireless network communication of the video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). Video data memory 79 may include a Coded Picture Buffer (CPB) that stores coded video data from a coded video bitstream. Decoded Picture Buffer (DPB) 92 of video decoder 30 stores reference video data for use by video decoder 30 when decoding the video data (e.g., in intra or inter prediction encoding modes). Video data memory 79 and DPB 92 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) (including Synchronous DRAM (SDRAM)), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. For illustrative purposes, video data memory 79 and DPB 92 are depicted in fig. 3 as two different components of video decoder 30. It will be apparent to those skilled in the art that video data memory 79 and DPB 92 may be provided by the same memory device or separate memory devices. In some examples, video data memory 79 may be on-chip with other components of video decoder 30, or off-chip with respect to those components.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks and associated syntax elements of an encoded video frame. Video decoder 30 may receive syntax elements at the video frame level and/or the video block level. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra prediction mode indicators, and other syntax elements. The entropy decoding unit 80 then forwards the motion vectors and other syntax elements to the prediction processing unit 81.
When a video frame is encoded as an intra-prediction encoded (I) frame or as an intra-coded prediction block for use in other types of frames, intra-prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video frame based on the signaled intra-prediction mode and reference data from previously decoded blocks of the current frame.
When a video frame is encoded as an inter-prediction encoded (i.e., B or P) frame, motion compensation unit 82 of prediction processing unit 81 generates one or more prediction blocks for the video block of the current video frame based on the motion vectors and other syntax elements received from entropy decoding unit 80. Each of the prediction blocks may be generated from a reference frame within one of the reference frame lists. Video decoder 30 may use a default construction technique to construct reference frame lists, i.e., list 0 and list 1, based on the reference frames stored in DPB 92.
In some examples, when encoding a video block according to the intra BC mode described herein, intra BC unit 85 of prediction processing unit 81 generates a prediction block for the current video block based on the block vector and other syntax elements received from entropy decoding unit 80. The prediction block may be within a reconstruction region of the same picture as the current video block defined by video encoder 20.
Motion compensation unit 82 and/or intra BC unit 85 determine prediction information for the video block of the current video frame by parsing the motion vectors and other syntax elements and then use the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) for encoding a video block of a video frame, an inter-prediction frame type (e.g., B or P), construction information for one or more of a list of reference frames for the frame, a motion vector for each inter-prediction encoded video block of the frame, an inter-prediction state for each inter-prediction encoded video block of the frame, and other information for decoding a video block in the current video frame.
Similarly, some of the received syntax elements, such as flags, may be used by intra BC unit 85 to determine that the current video block is predicted using an intra BC mode, build information for which video blocks of the frame are within the reconstruction region and should be stored in DPB 92, a block vector for each intra BC predicted video block of the frame, intra BC prediction status for each intra BC predicted video block of the frame, and other information for decoding the video blocks in the current video frame.
Motion compensation unit 82 may also perform interpolation using interpolation filters as used by video encoder 20 during encoding of video blocks to calculate interpolated values for sub-integer pixels of a reference block. In this case, motion compensation unit 82 may determine interpolation filters used by video encoder 20 from the received syntax elements and use these interpolation filters to generate prediction blocks.
Inverse quantization unit 86 inverse quantizes the quantized transform coefficients provided in the bitstream and entropy decoded by entropy decoding unit 80 using the same quantization parameter calculated by video encoder 20 for each video block in the video frame to determine the degree of quantization. Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual block in the pixel domain.
After motion compensation unit 82 or intra BC unit 85 generates a prediction block for the current video block based on the vector and other syntax elements, adder 90 reconstructs the decoded video block for the current video block by adding the residual block from inverse transform processing unit 88 to the corresponding prediction block generated by motion compensation unit 82 and intra BC unit 85. A loop filter 94 may be located between adder 90 and DPB 92 and include a deblocking filter to filter block boundaries and remove blockiness from decoded video blocks. Loop filter 94 also includes an SAO filter and an ALF to filter the decoded video blocks output by adder 90. The decoded video blocks in a given frame are then stored in DPB 92, and DPB 92 stores reference frames for subsequent motion compensation of subsequent video blocks. DPB 92, or a memory device separate from DPB 92, may also store decoded video for later presentation on a display device (e.g., display device 34 of fig. 1).
In a typical video encoding process, a video sequence typically comprises an ordered set of frames or pictures. Each frame may include three arrays of samples, denoted SL, SCb, and SCr. SL is a two-dimensional array of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. In other examples, a frame may be monochromatic and thus include only one two-dimensional array of luma samples.
As shown in fig. 4A, video encoder 20 (or, more specifically, partition unit 45) generates an encoded representation of a frame by first partitioning the frame into a set of Coding Tree Units (CTUs). A video frame may include an integer number of CTUs ordered sequentially from left to right and top to bottom in raster scan order. Each CTU is the largest logical coding unit, and the width and height of the CTU are signaled by video encoder 20 in a sequence parameter set such that all CTUs in a video sequence have the same size, which is one of 128 × 128, 64 × 64, 32 × 32, and 16 × 16. It should be noted, however, that the present application is not necessarily limited to a particular size. As shown in fig. 4B, each CTU may include one Coding Tree Block (CTB) of luma samples, two corresponding coding tree blocks of chroma samples, and syntax elements for encoding samples of the coding tree blocks. The syntax elements describe the properties of the different types of units that encode the pixel blocks and how the video sequence can be reconstructed at video decoder 30, including inter or intra prediction, intra prediction modes, motion vectors, and other parameters. In a monochrome picture or a picture with three separate color planes, a CTU may comprise a single coding tree block and syntax elements for coding samples of the coding tree block. The coding tree block may be an N × N block of samples.
To achieve better performance, video encoder 20 may recursively perform tree partitioning, e.g., binary tree partitioning, ternary tree partitioning, quadtree partitioning, or a combination thereof, on the coding tree blocks of the CTUs and partition the CTUs into smaller Coding Units (CUs). As depicted in fig. 4C, a 64 × 64 CTU 400 is first divided into four smaller CUs, each having a block size of 32 × 32. Of the four smaller CUs, CU 410 and CU 420 are each divided into four CUs having a block size of 16 × 16. Two of the 16 × 16 CUs, CU 430 and CU 440, are each further divided into four CUs having a block size of 8 × 8. Fig. 4D depicts a quadtree data structure showing the final result of the partitioning process of the CTU 400 as depicted in fig. 4C, each leaf node of the quadtree corresponding to one CU of a respective size ranging from 32 × 32 to 8 × 8. Similar to the CTU depicted in fig. 4B, each CU may include a Coded Block (CB) of luma samples and two corresponding coded blocks of chroma samples of the same size frame, as well as syntax elements for coding the samples of the coded blocks. In a monochrome picture or a picture with three separate color planes, a CU may comprise a single coding block and a syntax structure for coding the samples of the coding block. It should be noted that the quadtree partitioning depicted in fig. 4C and 4D is for illustrative purposes only, and one CTU may be split into multiple CUs based on quadtree/ternary tree/binary tree partitioning to adapt to varying local characteristics. In the multi-type tree structure, one CTU is divided in a quadtree structure, and each quadtree-leaf CU can be further divided in binary and ternary tree structures. As shown in fig. 4E, there are five partition types, i.e., quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
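As a simple illustration of the quadtree part of this partitioning (binary and ternary splits omitted for brevity), the following hypothetical Python sketch recursively splits a CTU into leaf CUs; the should_split decision is a placeholder for the encoder's actual rate-distortion-driven choice:

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Recursively split a square block; returns a list of (x, y, size) leaf CUs."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(x + dx, y + dy, half, min_size, should_split)
    return leaves

# Example: split a 64x64 CTU down to 16x16 everywhere (cf. Fig. 4C).
cus = quadtree_partition(0, 0, 64, 16, lambda x, y, s: True)
assert len(cus) == 16 and all(s == 16 for _, _, s in cus)
```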
In some embodiments, video encoder 20 may further partition the coding block of the CU into one or more (M × N) Prediction Blocks (PB). A prediction block is a block of rectangular (square or non-square) samples to which the same prediction (inter or intra) is applied. A Prediction Unit (PU) of a CU may include a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax elements for predicting the prediction block. In a monochrome picture or a picture with three separate color planes, a PU may include a single prediction block and syntax structures for predicting the prediction block. Video encoder 20 may generate predicted luma, predicted Cb, and predicted Cr blocks for the luma, Cb, and Cr predicted blocks for each PU of the CU.
Video encoder 20 may generate the prediction block for the PU using intra prediction or inter prediction. If video encoder 20 uses intra-prediction to generate the prediction block for the PU, video encoder 20 may generate the prediction block for the PU based on the decoding samples of the frame associated with the PU. If video encoder 20 uses inter-prediction to generate the prediction block for the PU, video encoder 20 may generate the prediction block for the PU based on decoding samples of one or more frames other than the frame associated with the PU.
After video encoder 20 generates the predicted luma block, the predicted Cb block, and the predicted Cr block for one or more PUs of the CU, video encoder 20 may generate a luma residual block for the CU by subtracting the predicted luma block of the CU from the original luma coding block of the CU, such that each sample in the luma residual block of the CU indicates a difference between a luma sample in one of the predicted luma blocks of the CU and a corresponding sample in the original luma coding block of the CU. Similarly, video encoder 20 may generate the Cb residual block and the Cr residual block for the CU, respectively, such that each sample in the Cb residual block of the CU indicates a difference between a Cb sample in one of the predicted Cb blocks of the CU and a corresponding sample in the original Cb coding block of the CU, and each sample in the Cr residual block of the CU may indicate a difference between a Cr sample in one of the predicted Cr blocks of the CU and a corresponding sample in the original Cr coding block of the CU.
Further, as shown in fig. 4C, video encoder 20 may decompose the luma, Cb, and Cr residual blocks of the CU into one or more luma, Cb, and Cr transform blocks using quadtree partitioning. A transform block is a block of rectangular (square or non-square) samples to which the same transform is applied. A transform unit (TU) of a CU may include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax elements for transforming the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. In some examples, the luma transform block associated with a TU may be a sub-block of a luma residual block of a CU. The Cb transform block may be a sub-block of a Cb residual block of the CU. The Cr transform block may be a sub-block of the Cr residual block of the CU. In a monochrome picture or a picture with three separate color planes, a TU may include a single transform block and syntax structures for transforming samples of the transform block.
Video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. The coefficient block may be a two-dimensional array of transform coefficients. The transform coefficients may be scalars. Video encoder 20 may apply one or more transforms to Cb transform blocks of a TU to generate Cb coefficient blocks for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU.
After generating the coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to the process by which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, thereby providing further compression. After video encoder 20 quantizes the coefficient block, video encoder 20 may entropy encode syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 may perform Context Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements indicating the quantized transform coefficients. Finally, video encoder 20 may output a bitstream that includes the bit sequence forming a representation of the encoded frames and associated data, the bitstream being stored in storage device 32 or transmitted to target device 14.
Upon receiving the bitstream generated by video encoder 20, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the frames of video data based at least in part on syntax elements obtained from the bitstream. The process of reconstructing the video data is generally reciprocal to the encoding process performed by video encoder 20. For example, video decoder 30 may perform inverse transforms on coefficient blocks associated with TUs of the current CU to reconstruct residual blocks associated with the TUs of the current CU. Video decoder 30 also reconstructs the coding block for the current CU by adding samples of the prediction block for PUs of the current CU to corresponding samples of the transform block for TUs of the current CU. After reconstructing the encoded blocks for each CU of a frame, video decoder 30 may reconstruct the frame.
Recent studies have shown that there appears to be a correlation between the Cb residual and the Cr residual of a CU. In some cases, the two chroma residuals appear inversely correlated to each other. In this case, a mode for joint coding of chroma residuals is proposed to signal only one chroma residual block (e.g., the Cb residual block) of a CU and a flag indicating that joint coding of chroma residuals is enabled, thereby improving coding efficiency. In some embodiments, the average of the positive Cb residual and the negative Cr residual is used as the joint residual of the two components, thereby improving accuracy when the two chroma residuals are not exactly inversely correlated, as follows:
resJoint=(resCb–resCr)/2,
where resCb represents the Cb residual block of the CU and resCr represents the Cr residual block of the CU.
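The following Python sketch, with illustrative residual values and floor division standing in for the exact integer rounding, shows this single joint-residual formulation and the decoder-side approximation it implies:

```python
import numpy as np

res_cb = np.array([[4, -2], [6, 0]], dtype=np.int32)
res_cr = np.array([[-3, 2], [-5, 1]], dtype=np.int32)   # roughly -res_cb

res_joint = (res_cb - res_cr) // 2   # resJoint = (resCb - resCr) / 2

# Decoder-side approximation (exact only when resCr == -resCb):
rec_cb = res_joint
rec_cr = -res_joint
```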
In some implementations, the video codec calculates both the average sum block and the average difference block between two chroma residuals as follows:
resJointCb=(resCb–resCr)/2;
resJointCr=(resCb+resCr)/2
the values in the average difference block resJointCr have smaller magnitudes than the two residual blocks resCb and resCr and can be quantized with a small number of bits with the same or similar level of precision.
In some embodiments, different modes for joint coding of chroma residuals are proposed, wherein each mode targets a specific relation between Cb and Cr residuals as follows:
mode 1: cb is coded and decoded and Cr is obtained according to Cr-CSign-Cb/2;
mode 2: cb is coded and decoded and Cr is obtained according to Cr-CSign-Cb;
mode 3: cr is coded and Cb is derived from Cb ═ CSign × Cr/2,
where CSign represents the sign used to derive the second chroma residual block from the first chroma residual block. CSign is signaled as a tile group header syntax element and has a value of -1 or 1.
In some embodiments, the mode of joint coding of chroma residuals is signaled by a TU-level flag (i.e., tu_cb_cr_join_residual). If tu_cb_cr_join_residual is equal to 1, one of the above three modes is used. The particular mode used is derived from the signaled chroma Coded Block Flags (CBFs) according to the following table:
Cb CBF    Cr CBF    Derived joint chroma coding mode
1         0         mode 1
1         1         mode 2
0         1         mode 3

Table 1: CBF-based joint chroma residual coding mode derivation
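A sketch of this derivation follows (names are illustrative; the mode-1 row is stated explicitly in the description above, and the remaining rows follow the mode definitions):

/* Map the signaled chroma CBFs to a joint chroma coding mode when
   tu_cb_cr_join_residual is equal to 1. Returns 0 if no mode applies. */
int jccr_mode_from_cbf(int cbfCb, int cbfCr) {
    if (cbfCb == 1 && cbfCr == 0) return 1;  /* Cb coded, Cr derived */
    if (cbfCb == 1 && cbfCr == 1) return 2;  /* Cb coded, Cr = CSign * Cb */
    if (cbfCb == 0 && cbfCr == 1) return 3;  /* Cr coded, Cb derived */
    return 0;
}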
In some embodiments, if a joint chroma coding mode is selected, the Quantization Parameter (QP) used to code the joint chroma residual component is decremented by 1 (for modes 1 and 3) or by 2 (for mode 2).
In summary, video encoder 20 derives the joint chroma residual by a corresponding blending operation of the Cb residual and the Cr residual, and selects one of the three supported joint chroma coding modes (including CSign) based on a distortion analysis (e.g., the distortion obtained by first blending the Cb and Cr residuals into the joint chroma residual and then reconstructing the Cb and Cr residuals from the joint chroma residual, without quantization). The selected mode is then tested in an additional mode decision process (i.e., using transform, quantization, and entropy coding). In some embodiments, the tile group header syntax element indicating the sign (CSign) used to derive the second chroma component is determined by analyzing the correlation between the original Cb component and a high-pass filtered version of the Cr component of the tile group.
In some embodiments, the correlation between the first chroma residual and the second chroma residual means that the signaling of the tu_cb_cr_join_residual flag can depend on the signaling of one, but not both, of the chroma coded block flags. For example, if the chroma coded block flag signaled first has a value of 1, the tu_cb_cr_join_residual flag is signaled, and the second chroma coded block flag need not be signaled due to the correlation between the first chroma residual block and the second chroma residual block. The second chroma coded block flag is signaled only if the tu_cb_cr_join_residual flag has a value of 0 (i.e., there is no correlation between the first chroma residual block and the second chroma residual block).
In some embodiments, one or two contexts are used for CABAC coding of the tu_cb_cr_join_residual flag. For example, one of the two contexts is selected based on the value of the Cr coded block flag: when the Cr coded block flag is 1, one context is used; otherwise (i.e., the Cr coded block flag is equal to 0), the other context is used. If the Cb coded block flag is equal to 1, the TU-level flag tu_cb_cr_join_residual is signaled, and the two contexts are used to code the tu_cb_cr_join_residual flag.
In some embodiments, the TU-level flag tu_cb_cr_join_residual is signaled only when both chroma CBFs are 1. When the tu_cb_cr_join_residual flag has a value of 1, an additional syntax element is signaled to indicate which of the three modes is selected. For CABAC coding of the mode syntax, different codeword binarizations may be used. One exemplary codeword binarization is a truncated unary codeword with a maximum codeword index of 2, as shown in Table 2 below.
Mode index    Codeword
0             0
1             10
2             11

Table 2: Codeword binarization for joint chroma coding mode signaling
In some embodiments, an additional syntax element is proposed to control the syntax signaling of the modes of joint coding of chroma residuals at different levels. For example, the syntax element may be signaled at the video sequence level, picture level, tile group level, tile level, or slice level. When this syntax element is signaled at a particular level with a value of 1, the TU-level control flag (i.e., tu_cb_cr_join_residual) at or below that level is also signaled to indicate the use of joint coding of chroma residuals. When the syntax element is signaled with a value of 0, joint coding of chroma residuals is disabled at that level, and when a CU is coded at or below the level where the flag is signaled with a value of 0, the TU-level control flag is not coded.
Fig. 5A and 5B are flow diagrams illustrating an exemplary process 500 for video encoder 20 to implement techniques for encoding video data using a joint coding scheme for chroma residuals, according to some embodiments of the present disclosure. Video encoder 20 obtains a first syntax element associated with a first layer of a hierarchy from a video bitstream having the hierarchy (510). As explained above, there are multiple options for the first layer, and thus the first syntax element may be in one of a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a tile group header, a tile header, a slice header, and the like. The plurality of chroma components of each of the one or more blocks corresponds to a transform unit of the video data, which in turn is associated with a particular coding unit.
Next, video encoder 20 examines the value of the first syntax element (e.g., in the form of a one-bit flag) and determines whether the syntax element indicates that the joint coding mode for chroma residuals is enabled (530). For example, a value of 1 indicates that the joint coding mode for chroma residuals is enabled, and a value of 0 indicates that it is disabled. If the joint coding mode for chroma residuals is enabled (530 - yes), video encoder 20 jointly encodes the chroma residuals for the multiple chroma components of the one or more blocks below the first layer into the video bitstream according to a predefined scheme for joint coding of chroma residuals (550). As described above, at least three different schemes of joint coding of chroma residuals are proposed to handle different types of video data. One of the multiple chroma components is processed accordingly using various syntax elements and CABAC contexts, while the other chroma component is derived from the processed chroma component according to a correlation under the predefined scheme of joint coding of chroma residuals. If the joint coding mode for chroma residuals is disabled (530 - no), video encoder 20 encodes the chroma residuals for the plurality of chroma components of the one or more blocks below the first layer into the video bitstream separately (570). In other words, each of the multiple chroma components of the one or more blocks is encoded into the video bitstream, and the TU-level control flag tu_cb_cr_join_residual is set to zero for each CU.
Finally, video encoder 20 outputs the video bitstream, which includes the encoded chroma residuals for the plurality of chroma components of the one or more blocks and the first syntax element (590).
In some implementations, as shown in fig. 5B, after the first syntax element indicates that the joint coding mode for chroma residuals is enabled, video encoder 20 obtains a second syntax element associated with each of the one or more blocks (550-1), and determines whether the second syntax element indicates that the block-level joint coding mode for chroma residuals is enabled (550-3). If so (550-3, yes), video encoder 20 jointly encodes the chroma residuals for the plurality of chroma components of the block into the video bitstream according to a predefined scheme for joint coding of chroma residuals (550-5); otherwise (550-3, no), video encoder 20 encodes the chroma residuals for the multiple chroma components of the block into the video bitstream separately (550-7). In other words, a value of 0 for the first syntax element disables the application of joint coding of chroma residuals for all blocks below the first layer, so that the second syntax element need not be signaled at the block level. However, a value of 1 for the first syntax element does not mean that every block below the first layer must be encoded using one of the schemes of joint coding of chroma residuals. Through the second syntax element, each block retains its own control, enhancing the flexibility of implementing a video encoder.
In some implementations, video encoder 20 selects a mode from a plurality of modes (see, e.g., Table 1 above) based on the values of the chroma coding flags of the plurality of chroma components of the block, which may require a rate-distortion analysis. Video encoder 20 then encodes, according to the selected mode, the chroma residual of one of the plurality of chroma components of the block and the values of the chroma coding flags of the plurality of chroma components of the block into the video bitstream.
Fig. 6A-6C are flow diagrams illustrating an exemplary process 600 for a video decoder implementing a technique for decoding video data using a joint coding scheme for chroma residuals, according to some embodiments of the present disclosure. Video decoder 30 receives a first syntax element associated with a first layer of the hierarchy from the video bitstream having the hierarchy (610), and then checks whether the first syntax element indicates that the joint coding mode for chroma residuals is enabled (630). If so (630-yes), video decoder 30 jointly reconstructs the chroma residuals for the multiple chroma components of one or more blocks below the first layer from the video bitstream according to a predefined scheme for joint coding of chroma residuals (650). Otherwise (630-no), video decoder 30 reconstructs the chroma residuals for the multiple chroma components of the one or more blocks below the first layer from the video bitstream separately (670). As explained above, there are multiple options for the first layer, and thus the first syntax element may be in one of a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a tile group header, a tile header, a slice header, and the like. The plurality of chroma components of each of the one or more blocks corresponds to a transform unit of the video data, which in turn is associated with a particular coding unit.
In some implementations, as shown in fig. 6B, after the first syntax element indicates that the joint coding mode for chroma residuals is enabled, video decoder 30 receives a second syntax element associated with each of the one or more blocks from the video bitstream (650-1), and determines whether the second syntax element indicates that the block-level joint coding mode for chroma residuals is enabled (650-3). If so (650-3, yes), video decoder 30 jointly reconstructs the chroma residuals for the multiple chroma components of the block from the video bitstream according to a predefined scheme for joint coding of chroma residuals (650-5); otherwise (650-3, no), video decoder 30 reconstructs the chroma residuals for the multiple chroma components of the block from the video bitstream separately (650-7). In other words, a value of 0 for the first syntax element disables the application of joint coding of chroma residuals for all blocks below the first layer, so that the second syntax element need not be signaled at the block level. However, a value of 1 for the first syntax element does not mean that every block below the first layer must be coded using one of the schemes of joint coding of chroma residuals. Through the second syntax element, each block retains its own control, enhancing the flexibility of implementing a video decoder.
In some embodiments, as shown in fig. 6C and described above in connection with table 1, each of the plurality of chroma components of the block has a chroma coding flag, and the predefined scheme for joint coding of the chroma residuals has a plurality of modes (650-11). Video decoder 30 selects a mode from a plurality of modes (see, e.g., table 1 above) based on the values of chroma coding flags for a plurality of chroma components of the block (650-13), and then reconstructs chroma residuals for the plurality of chroma components of the block from the video bitstream according to the selected mode (650-15). Assuming that the plurality of chroma components of the block includes a first chroma component (e.g., Cb component) and a second chroma component (e.g., Cr component) (650-15-1), video decoder 30 reconstructs a chroma residual for the first chroma component of the block from the video bitstream (650-15-3), and derives the chroma residual for the second chroma component directly from the chroma residual for the first chroma component of the block (650-15-5) as described above with respect to various modes of joint coding of chroma residuals.
As shown in Table 1 above, when the TU-level tu_cb_cr_join_residual flag is 1, the Cb CBF is 1, and the Cr CBF is 0, mode 1 is selected. It is still possible that both chroma blocks (the Cb and Cr blocks) actually have non-zero residuals, resulting in a difference between the signaled chroma block CBF values and the actual corresponding chroma block residuals. When such chroma CBF values are subsequently used for other purposes (e.g., as context for coding other syntax), such differences may affect coding performance. As shown in fig. 6C, video decoder 30 may reset the chroma coding flags for the plurality of chroma components of the block to predefined values (650-17). For example, in a scheme of joint chroma residual coding with multiple modes, when the TU-level flag tu_cb_cr_join_residual is signaled as 1, regardless of which of the three modes is used, the Cb and Cr chroma Coded Block Flag (CBF) syntax elements are reset to 1 after the current block is reconstructed. For example, in mode 1, even if the signaled Cr CBF is 0, it is reset to 1 after the current block is reconstructed.
Fig. 7 is a block diagram of an exemplary video decoder 30 configured for Luma Mapping and Chroma Scaling (LMCS), in accordance with some embodiments. In video decoder 30, a first plurality of decoding modules (shaded) are implemented in the mapped domain and include entropy decoding unit 80, inverse quantization unit 86, inverse transform processing unit 88, luma intra prediction unit 84A, and luma sample reconstruction unit 90A. The luma prediction samples and the luma residual samples are added in the mapped domain. A second plurality of decoding modules (not shaded) are implemented in the non-mapped domain (also referred to as the original domain) and include a motion compensation prediction unit 82, a chroma intra prediction unit 84B, a chroma sample reconstruction unit 90B, and a loop filter 94, which further includes one or more of a deblocking filter, an SAO filter, and an ALF filter. The chroma prediction samples and chroma residual samples are added in the non-mapped domain. The luma and chroma components of reference pictures are output in the unmapped domain and stored in a Decoded Picture Buffer (DPB) 92. LMCS is applied before the loop filters 94A and 94B configured to process the luma and chroma components of the reconstructed video data, respectively. LMCS is implemented jointly by a luma component loop mapping module 96 and a luma-dependent chroma residual scaling unit 98. The loop mapping module 96 is implemented based on an adaptive piecewise linear model and further includes a forward mapping unit 96A and an inverse mapping unit 96B.
The loop mapping module 96 is configured to adjust the dynamic range of the input video data to improve coding efficiency. The forward mapping unit 96A is based on a forward mapping function FwdMap, and the inverse mapping unit 96B is based on a corresponding inverse mapping function InvMap. The forward mapping function FwdMap is signaled from video encoder 20 to video decoder 30 using a luma mapping model 702 (e.g., a piecewise linear model having 16 equally sized segments). In some embodiments, the inverse mapping function InvMap is derived directly from the forward mapping function FwdMap and need not be signaled.
In some embodiments, one or more parameters of the loop mapping module 96 (e.g., parameters of the luma mapping model 702) are signaled at the picture level. For example, a presence flag is signaled to indicate whether the luma mapping model 702 is present for the current slice. If the luma mapping model 702 exists for the current slice, a plurality of piecewise linear model parameters are further signaled to facilitate the implementation of the piecewise linear model. Based on the piecewise linear model, the dynamic range of the input video data is partitioned into 16 segments of equal size in the original domain, and each segment of the dynamic range of the input video data is mapped to a corresponding segment in the mapped domain. For a given segment in the original domain, its corresponding segment in the mapped domain may have the same size as or a different size than the given segment. The size of each segment in the mapped domain is indicated by the number of codewords (i.e., codewords that store the mapped sample values) of the respective segment. For each segment in the original domain, the piecewise linear mapping parameters may be derived based on the number of codewords in its corresponding segment in the mapped domain. In an example, the input video data has a resolution of 10 bits. If each segment in the mapped domain has 64 codewords assigned to it, each of the 16 segments in the original domain has 64 pixel values according to a one-to-one mapping (i.e., a mapping in which each sample value is unchanged). The number of signaled codewords for each segment in the mapped domain is used to calculate a scaling factor and adjust the forward or inverse mapping function accordingly for that segment. Further, in some embodiments, another LMCS control flag is signaled at the picture level to enable/disable LMCS for the slices of the picture.
In an example, for the i-th segment (i = 0 … 15), the corresponding piecewise linear model applied by the forward mapping function FwdMap of the forward mapping unit 96A is defined by two input pivot points InputPivot[i] and InputPivot[i+1] and two output (mapped) pivot points MappedPivot[i] and MappedPivot[i+1]. Further, if the input video data has a resolution of 10 bits, the values of InputPivot[i] and MappedPivot[i] (i = 0 … 16) are calculated as follows:
1. Set the variable OrgCW = 64.
2. For i = 0 … 16, InputPivot[i] = i × OrgCW.
3. For i = 0 … 16, MappedPivot[i] is calculated as follows:
MappedPivot[0] = 0;
for (i = 0; i < 16; i++)
MappedPivot[i+1] = MappedPivot[i] + SignaledCW[i]
where SignaledCW[i] is the number of signaled codewords for the i-th segment.
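Given these pivot arrays, the forward mapping of a single 10-bit luma sample can be sketched as follows (names are illustrative; the actual specification performs the equivalent computation with fixed-point scale factors):

/* Piecewise-linear forward mapping of a 10-bit luma sample x in [0, 1023].
   With OrgCW = 64, the input segment index is simply x / 64. */
int fwd_map_luma(int x, const int InputPivot[17], const int MappedPivot[17]) {
    int i = x >> 6;                                  /* segment index 0..15 */
    int num = MappedPivot[i + 1] - MappedPivot[i];   /* SignaledCW[i] */
    int den = InputPivot[i + 1] - InputPivot[i];     /* OrgCW = 64 */
    return MappedPivot[i] + (num * (x - InputPivot[i])) / den;
}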
A Chroma Residual Scaling (CRS) unit 98 is connected to the inverse transform processing unit 88 and is configured to compensate for the interaction of quantization precision between the luma component and the corresponding chroma components when loop mapping is applied to the luma component. A first CRS flag is signaled in the picture header to indicate whether chroma residual scaling is enabled or disabled. If luma mapping is enabled and dual-tree partitioning of the luma and chroma components is disabled for the current picture, a second CRS flag is signaled to indicate whether luma-dependent chroma residual scaling is applied. If luma mapping is not used, or dual-tree partitioning is enabled for the current picture, chroma residual scaling is disabled via the first CRS flag. Furthermore, in some embodiments, chroma residual scaling is disabled for CUs that contain less than or equal to four chroma samples.
For both intra CUs and inter CUs, the scaling parameter used to scale the chroma residual is determined based on the average of the corresponding mapped luma prediction samples. Let avg'_Y be the average of the luma prediction samples in the mapped domain. The scaling parameter C_ScaleInv is determined according to the following steps:
1. Find the segment index Y_Idx of the piecewise linear model with which avg'_Y is associated in the mapped domain. Here, Y_Idx has an integer value ranging from 0 to 15.
2. C_ScaleInv = cScaleInv[Y_Idx], where cScaleInv[i] (i = 0 … 15) is a pre-computed 16-entry look-up table (LUT).
Intra prediction is performed in the mapped domain in LMCS. When a CU is coded in intra prediction mode, Combined Intra and Inter Prediction (CIIP) mode, or Intra Block Copy (IBC) mode, avg'_Y is calculated as the average of the intra-predicted luma samples. Otherwise, avg'_Y is calculated as the average of the forward-mapped inter-predicted luma samples. Furthermore, unlike luma mapping, which is performed on a per-sample basis, C_ScaleInv is fixed for the entire chroma CU. Given the original chroma residual sample value C_Res, chroma residual scaling is applied as follows:
On the encoder side: C_ResScale = C_Res / C_ScaleInv;
On the decoder side: C_Res = C_ResScale × C_ScaleInv,
where C_ResScale represents the scaled chroma residual sample value.
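A decoder-side sketch of this scaling follows (names are illustrative; cScaleInv is shown as a floating-point LUT for readability, whereas a real codec would use fixed-point arithmetic):

/* Scale the chroma residual samples of a CU by the luma-dependent factor.
   avgYmapped is the average of the luma prediction samples in the mapped domain. */
void crs_scale_chroma_residual(int *res, int n, int avgYmapped,
                               const int MappedPivot[17],
                               const double cScaleInv[16]) {
    int yIdx = 0;                       /* step 1: find the segment of avgYmapped */
    while (yIdx < 15 && avgYmapped >= MappedPivot[yIdx + 1])
        yIdx++;
    double s = cScaleInv[yIdx];         /* step 2: C_ScaleInv = cScaleInv[Y_Idx] */
    for (int i = 0; i < n; i++)
        res[i] = (int)(res[i] * s);     /* decoder side: C_Res = C_ResScale * C_ScaleInv */
}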
Fig. 8 is a block diagram of an exemplary video decoder 30 configured to implement inverse Adaptive Color Transform (ACT), according to some embodiments. In the HEVC SCC extension, ACT is applied to reduce redundancy between the three color components in the 444 chroma format. ACT is also adopted in the VVC standard to enhance the codec efficiency of 444 chroma format codecs. For example, ACT performs loop color space conversion in the prediction residual domain by adaptively converting the residual from the input color space to the YCgCo color space. Referring to fig. 8, in some embodiments, two color spaces are adaptively selected by signaling the ACT flag at the CU level. When the ACT flag is equal to a first value ("1"), the residual of the CU is encoded in the YCgCo color space and decoded using an inverse ACT unit 802. When the ACT flag is equal to the second value ("0"), the residual of the CU is encoded in the original color space of the input video data and decoded without using the inverse ACT unit 802. Furthermore, in some embodiments, ACT is enabled only for a CU in inter-prediction or IBC mode when there is at least one non-zero coefficient in the CU, and the inverse ACT unit 802 is enabled to decode the residual of the CU in inter-prediction or IBC mode. In some embodiments, when the chroma component selects the same intra prediction mode as the luma component, ACT is enabled for the CU only in the intra prediction mode, and the inverse ACT unit 802 and the intra prediction unit 84 are enabled to decode the residual of the CU in the intra prediction mode.
In some embodiments, the color space conversion for ACT is based on a YCgCo transform matrix. Specifically, a forward YCgCo color transform matrix and an inverse YCgCo color transform matrix are applied to convert a GBR vector into a YCgCo vector and to convert a YCgCo vector back into a GBR vector, as follows:

[ Y  ]   [ 1/2   1/4   1/4 ] [ G ]
[ Cg ] = [ 1/2  -1/4  -1/4 ] [ B ]
[ Co ]   [ 0    -1/2   1/2 ] [ R ]

[ G ]   [ 1    1    0 ] [ Y  ]
[ B ] = [ 1   -1   -1 ] [ Cg ]
[ R ]   [ 1   -1    1 ] [ Co ]
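For illustration, the integer form typically used for these (non-normalized) conversions can be sketched as follows; the forward direction matches the matrix above up to the rounding of the >> 2 shifts, and the inverse direction exactly inverts the matrix form given Y, Cg, and Co:

/* Forward GBR -> YCgCo conversion of one residual sample (encoder side). */
void gbr_to_ycgco(int g, int b, int r, int *y, int *cg, int *co) {
    *y  = (2 * g + b + r) >> 2;
    *cg = (2 * g - b - r) >> 2;
    *co = (2 * r - 2 * b) >> 2;
}

/* Inverse YCgCo -> GBR conversion (decoder side, inverse ACT unit). */
void ycgco_to_gbr(int y, int cg, int co, int *g, int *b, int *r) {
    int t = y - cg;
    *g = y + cg;
    *b = t - co;
    *r = t + co;
}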
in some embodiments, the ACT transform matrices (e.g., the forward YCgCo color transform matrix and the inverse YCgCo color transform matrix in the above equations) are not normalized. QP adjustment (-5, -5, -3) is applied to transform Y, Cg the residual of the Co component to compensate for the dynamic range change of the residual signal before and after color transform. The adjusted QP affects quantization and dequantization of the residual in the CU. For other codec processes (e.g., deblocking), the original and unadjusted QP are still applied. Furthermore, in some embodiments, the forward and inverse color transforms require access to the residuals of all three color components, and ACT is always disabled for different split tree partitions and ISP modes for the prediction block sizes of the different color components.
Fig. 9A and 9B are block diagrams of two example residual decoding subsystems 900 and 950, respectively, that may be applied in video decoder 30, according to some embodiments. Each of the residual decoding subsystems 900 and 950 includes an inverse quantization unit 86, an inverse transform processing unit 88, an inverse JCCR unit 902, an inverse ACT unit 802, and an inverse CRS unit 98. The inverse JCCR unit 902 and the inverse ACT unit 802 may be selectively arranged between the inverse transform processing unit 88 and the CRS unit 98 of fig. 7. The quantized transform coefficients are provided to video decoder 30 in the video bitstream and entropy decoded by the entropy decoding unit 80. Inverse quantization unit 86 inverse quantizes the decoded quantized transform coefficients using the same quantization parameter calculated by video encoder 20 for each video block in the video frame, which determines the degree of quantization. Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual blocks in the pixel domain. The reconstructed residual blocks include luma and chroma components for a plurality of pixels in the video frame. The inverse JCCR unit 902 is enabled in JCCR mode to jointly determine the chroma residuals for the chroma components of the video frame according to a scheme of joint coding of chroma residuals. For example, in JCCR mode, only one chroma residual block (e.g., the Cb residual block) of a CU and a flag indicating that joint coding of chroma residuals is enabled are signaled, and the two chroma residuals (e.g., the Cb and Cr residuals) are jointly determined from that one chroma residual block. The inverse ACT unit 802 is connected to the inverse JCCR unit 902 to apply an adaptive color transform to the chroma residuals (e.g., both the Cb and Cr residuals) of the CU. The inverse CRS unit 98 is connected to the inverse ACT unit 802 and is configured to compensate for the interaction of quantization precision between the luma component of the CU and the corresponding chroma components when loop mapping is applied to the luma component of the CU.
In some embodiments, the residuals generated by each unit of the residual decoding subsystems 900 and 950 are controlled to be within one or more respective dynamic ranges (e.g., corresponding to a 16-bit resolution including the sign bit). In the example residual decoding subsystems 900 and 950, a first clipping operation is applied, i.e., a first clipping unit 904 is connected between the inverse transform processing unit 88 and the inverse JCCR unit 902 to maintain a first dynamic range DR_A and a second dynamic range DR_B for the luma and chroma residuals, respectively. In an example, each of the dynamic ranges DR_A and DR_B corresponds to a first bit depth equal to 16 bits and is defined as [-2^15, 2^15 - 1] (including -2^15 and 2^15 - 1). The first clipping unit 904 applies a clipping function Clip(-2^15, 2^15 - 1, M) to clip the corresponding ones of the luma and chroma residuals M of the residual block output by the inverse transform processing unit 88 into the respective dynamic range DR_A and/or DR_B. In yet another example, each of the dynamic ranges DR_A and DR_B corresponds to a first bit depth equal to 16 bits and is defined as [-(2^15 - 1), 2^15 - 1] (including -(2^15 - 1) and 2^15 - 1). The first clipping unit 904 applies another clipping function Clip(-(2^15 - 1), 2^15 - 1, M) to clip the corresponding ones of the luma and chroma residuals M of the residual block output by the inverse transform processing unit 88 into the respective dynamic range DR_A and/or DR_B. Furthermore, in another example, one or both of the dynamic ranges DR_A and DR_B are defined as [-2^B, 2^B - 1] (including -2^B and 2^B - 1), and the first clipping unit 904 uses a clipping function Clip(-2^B, 2^B - 1, M) or Clip(-(2^B - 1), 2^B - 1, M) to clip the corresponding ones of the luma and chroma residuals M of the residual block output by the inverse transform processing unit 88 into the respective dynamic range DR_A and/or DR_B. The integer B is the coded bit depth of the input video data, obtained from the video bitstream and applied by the inverse transform processing unit 88 when reconstructing the residual blocks of the corresponding video frame.
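A sketch of the clipping operation itself, with illustrative names:

/* Clip(min, max, M): constrain the input value M to the range [min, max]. */
static inline int clip3(int minVal, int maxVal, int m) {
    return m < minVal ? minVal : (m > maxVal ? maxVal : m);
}

/* Clip every sample of a residual block to the dynamic range associated
   with bit depth bd, e.g. bd = 16 gives [-2^15, 2^15 - 1]. */
void clip_residual_block(int *res, int n, int bd) {
    int hi = (1 << (bd - 1)) - 1;
    int lo = -(1 << (bd - 1));      /* use -hi instead for a symmetric range */
    for (int i = 0; i < n; i++)
        res[i] = clip3(lo, hi, res[i]);
}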
Furthermore, in some embodiments, a bitstream conformance constraint is applied to ensure that the parsed residuals (i.e., the residuals fed to the inverse quantization unit 86) are within the 16-bit dynamic range (i.e., [-2^15, 2^15 - 1]). Referring to fig. 9A, no clipping unit is applied after the inverse JCCR unit 902. When JCCR mode 2 with a signaled negative sign (i.e., Cr = -Cb) is selected and a decoded Cb residual sample is equal to the minimum value of a 16-bit signed integer (i.e., -2^15), the decoded residual output by the inverse JCCR unit 902 exceeds the 16-bit dynamic range: in this case, the corresponding Cr residual sample is equal to 2^15, which falls outside the 16-bit range [-2^15, 2^15 - 1].
In the example residual decoding subsystem 950, two different clipping operations are applied. In addition to the first clipping unit 904, a second clipping unit 906 is connected to the output of the inverse JCCR unit 902, e.g., between the inverse JCCR unit 902 and the inverse ACT unit 802, to maintain a chroma dynamic range DR_CR. The chroma dynamic range DR_CR corresponds to a chroma bit depth that is selectively equal to or less than the second bit depth corresponding to the second dynamic range DR_B, and the second dynamic range DR_B is the same as, or includes, the chroma dynamic range DR_CR. For example, the second bit depth and the chroma bit depth are both equal to K, where K is an integer (e.g., 16, or a coded bit depth received in the video bitstream), and the second dynamic range DR_B and the chroma dynamic range DR_CR are identical and selected from [-(2^K - 1), 2^K - 1], [-2^K, 2^K - 1], [-(2^K - 1), 2^K], and [-2^K, 2^K].
Fig. 10A and 10B are block diagrams of two further example residual decoding subsystems 1000 and 1050, respectively, that may be applied in video decoder 30, according to some embodiments. Each of the residual decoding subsystems 1000 and 1050 includes an inverse quantization unit 86, an inverse transform processing unit 88, an inverse JCCR unit 902, an inverse ACT unit 802, and an inverse CRS unit 98. In both example residual decoding subsystems 1000 and 1050, at least three clipping operations are applied. A first clipping unit 904 is connected between the inverse transform processing unit 88 and the inverse JCCR unit 902 to maintain the first dynamic range DR_A and the second dynamic range DR_B of the luma and chroma residuals, corresponding to the first and second bit depths. A third clipping unit 1002 is connected between the inverse quantization unit 86 and the inverse transform processing unit 88 to maintain a third dynamic range DR_C of residual information corresponding to a third bit depth. A fourth clipping unit 1004 is connected between the inverse ACT unit 802 and the inverse CRS unit 98 to maintain a fourth dynamic range DR_D of residual information corresponding to a fourth bit depth.
In some embodiments associated with the example residual decoding subsystem 1050, in addition to the first clipping unit 904, the third clipping unit 1002, and the fourth clipping unit 1004, a second clipping unit 906 is connected at the output of the inverse JCCR unit 902, e.g., between the inverse JCCR unit 902 and the inverse ACT unit 802, to maintain a chroma dynamic range DR_CR corresponding to a chroma bit depth. In an example, the chroma dynamic range DR_CR corresponds to a bit depth of 16 bits, and the second clipping unit 906 is applied such that the reconstructed JCCR chroma residual output by the inverse JCCR unit 902 does not exceed the 16-bit dynamic range DR_CR.
The clipping units 1002, 904, 906, and 1004 are arranged in an ordered sequence on the video decoding path. For either the luma or the chroma residual, the dynamic range corresponding to an upstream clipping unit in the video decoding path is equal to or contains the dynamic range of a downstream clipping unit in the video decoding path. In particular, the third bit depth is selectively equal to or greater than the first bit depth or the second bit depth, and the third dynamic range DR_C is equal to or includes the first dynamic range DR_A or the second dynamic range DR_B for the luma residual or the chroma residual, respectively. The second bit depth is selectively equal to or greater than the fourth bit depth, and the second dynamic range DR_B is equal to or includes the fourth dynamic range DR_D. In the residual decoding subsystem 1050, the second bit depth of the chroma residual is selectively equal to or greater than the chroma bit depth, and the second dynamic range DR_B is equal to or includes the chroma dynamic range DR_CR. The chroma bit depth is selectively equal to or greater than the fourth bit depth, and the chroma dynamic range DR_CR is equal to or includes the fourth dynamic range DR_D.
Referring to fig. 10B, in an example, the first clipping unit 904, the second clipping unit 906, and the third clipping unit 1002 have the same clipping function Clip(-2^15, 2^15 - 1, M), i.e., the same bit depth of 16 bits, while the fourth clipping unit 1004 has a different dynamic range DR_D defined by a different clipping function Clip(-2^B, 2^B - 1, M). Note that the clipping function Clip(min, max, M) clips the input value M into [min, max], and B is the coded bit depth of the input video data. Furthermore, in some embodiments, there is a bitstream conformance constraint to ensure that the parsed residuals (i.e., the residuals fed to the inverse quantization unit 86) are within the 16-bit dynamic range (i.e., [-2^15, 2^15 - 1]). In some embodiments, in fig. 10A, no clipping operation is applied to the output of the inverse JCCR unit 902. When JCCR mode 2 with a signaled negative sign (i.e., Cr = -Cb) is selected and a decoded Cb residual sample is equal to the minimum value of a 16-bit signed integer (i.e., -2^15), this design may result in the decoded residual exceeding the 16-bit dynamic range after the inverse JCCR unit 902. In this case, the calculated Cr residual sample is equal to 2^15, which falls outside the 16-bit dynamic range [-2^15, 2^15 - 1].
In some embodiments, the clipping operation of the first clipping unit 904 applied before the inverse JCCR unit 902 is symmetric, e.g., using the clipping function Clip(-(2^15 - 1), 2^15 - 1, M), which is configured to clip the residual output by the inverse transform processing unit 88 into a symmetric dynamic range [-(2^15 - 1), 2^15 - 1]. In an example, the symmetric clipping function clips both the luma residual and the chroma residual after the inverse transform processing unit 88, and the first dynamic range DR_A and the second dynamic range DR_B are both symmetric, e.g., [-(2^15 - 1), 2^15 - 1]. In another example, such a symmetric clipping function is applied to clip the chroma residual into the second dynamic range DR_B, while an asymmetric clipping function (e.g., Clip(-2^15, 2^15 - 1, M)) is applied to clip the associated luma residual into the first dynamic range DR_A. The first dynamic range DR_A and the second dynamic range DR_B are then [-2^15, 2^15 - 1] and [-(2^15 - 1), 2^15 - 1], respectively.
As explained above, in some embodiments, the first clipping unit 904, the second clipping unit 906, and the third clipping unit 1002 are configured to implement the same clipping operation that clips the respective input residuals into the same dynamic range associated with the same bit depth, while the fourth clipping unit 1004 is configured to implement a different clipping operation that clips the respective input residuals into a different dynamic range associated with a different bit depth. Furthermore, in some embodiments, the first clipping unit 904, the second clipping unit 906, the third clipping unit 1002, and the fourth clipping unit 1004 are unified to implement the same clipping operation that clips the respective input residuals into the same dynamic range associated with the same bit depth. For example, the first clipping unit 904, the second clipping unit 906, the third clipping unit 1002, and the fourth clipping unit 1004 all apply the clipping function Clip(-2^15, 2^15 - 1, M).
Fig. 11A and 11B are flow diagrams of a video decoding method 1100 implemented by an electronic device (e.g., including video decoder 30) according to some embodiments. The electronic device reconstructs a plurality of residual blocks of the video frame from the video bitstream (1102), clips the luma and chroma residuals of the plurality of residual blocks of the video frame into a first dynamic range DR_A and a second dynamic range DR_B, respectively (1104), and, for each residual block, determines a luma residual for a luma component and chroma residuals for two chroma components of the video frame from the clipped luma and chroma residuals of the plurality of residual blocks of the video frame (1106). In some embodiments, the clipped luma and chroma residuals of the plurality of residual blocks are stored in a buffer according to the first dynamic range DR_A and the second dynamic range DR_B, respectively. The clipping operation helps to save the storage space of the buffer and speeds up the corresponding video decoding rate. In an example, the first dynamic range DR_A and/or the second dynamic range DR_B is asymmetric and is defined as [-2^15, 2^15 - 1], i.e., one or both of the dynamic ranges DR_A and DR_B are [-2^15, 2^15 - 1]. In another example, the first dynamic range DR_A and/or the second dynamic range DR_B is symmetric and is defined as [-(2^15 - 1), 2^15 - 1], i.e., one or both of the dynamic ranges DR_A and DR_B are [-(2^15 - 1), 2^15 - 1]. In yet another example, the first dynamic range DR_A and/or the second dynamic range DR_B is asymmetric and is defined as [-2^B, 2^B - 1], i.e., one or both of the dynamic ranges DR_A and DR_B are [-2^B, 2^B - 1], where B is the coded bit depth of the video frame and is obtained from the video bitstream. In yet another example, the first dynamic range DR_A and/or the second dynamic range DR_B is symmetric and is defined as [-(2^B - 1), 2^B - 1], i.e., one or both of the dynamic ranges DR_A and DR_B are [-(2^B - 1), 2^B - 1], where B is the coded bit depth obtained from the video bitstream.
In some embodiments, the plurality of residual blocks is reconstructed (1102) by inverse quantizing a plurality of quantized transform coefficients in the video bitstream into a plurality of transform coefficients for the video frame (1108) and applying an inverse transform to the plurality of transform coefficients to reconstruct the plurality of residual blocks (1110). The plurality of residual blocks are then clipped into the first dynamic range. Furthermore, in some embodiments, before the inverse transform is applied, each transform coefficient of the plurality of transform coefficients is clipped into the third dynamic range DR_C (1112).
In some embodiments, the electronic device determines whether a joint coding of chroma residuals (JCCR) mode is enabled (1114). In accordance with a determination that the JCCR mode is enabled, the chroma residuals for the two chroma components of the video frame are jointly determined, e.g., by the inverse JCCR unit 902, according to a scheme of joint coding of chroma residuals (1116). In some embodiments, the chroma residuals for the two chroma components of the video frame are then clipped into the chroma dynamic range DR_CR associated with the chroma bit depth (1118). Further, in some embodiments, the video bitstream has a hierarchical structure, and the electronic device obtains a syntax element associated with a first layer of the hierarchical structure to jointly determine the chroma residuals for the two chroma components of the video frame below the first layer (1120). Conversely, in some embodiments, in accordance with a determination that the syntax element indicates that the JCCR mode is disabled, the electronic device reconstructs the chroma residuals for the two chroma components of the video frame based on the clipped chroma residuals of the plurality of residual blocks, respectively (1122).
In some embodiments, an inverse adaptive color transform is also applied to the two chroma components of the video frame to obtain substitute chroma residuals for the two chroma components (1124). The substitute chroma residuals for the two chroma components of the video frame are clipped into a fourth dynamic range DR_D (1126) and scaled for the two chroma components of the video frame (1128).
In some embodiments, each residual block includes one or more respective luma residuals and one or more respective chroma residuals. The first dynamic range DR_A is equal to the second dynamic range DR_B and is symmetric with respect to zero. For each residual block, the one or more respective luma residuals are clipped into the first dynamic range DR_A, and the one or more respective chroma residuals are clipped into the second dynamic range DR_B. Conversely, in some embodiments, each residual block includes one or more respective luma residuals and one or more respective chroma residuals, the first dynamic range DR_A is asymmetric with respect to zero, and the second dynamic range DR_B is symmetric with respect to zero. The one or more respective luma residuals of each residual block are clipped into the first dynamic range DR_A, and the one or more respective chroma residuals of each residual block are clipped into the second dynamic range DR_B.
In some embodiments associated with unified clipping, the first dynamic range is the same as the second dynamic range. The plurality of residual blocks is reconstructed by inverse quantizing a plurality of quantized transform coefficients in the video bitstream into a plurality of transform coefficients for the video frame and applying an inverse transform to the plurality of transform coefficients to reconstruct the plurality of residual blocks. Before applying the inverse transform, the electronic device clips each transform coefficient of the plurality of transform coefficients into the first dynamic range. After determining the chroma residuals, the electronic device clips the chroma residuals for the two chroma components of the video frame into the first dynamic range, applies an inverse adaptive color transform to the clipped chroma residuals of the two chroma components of the video frame to obtain substitute chroma residuals for the two chroma components, and clips the substitute chroma residuals for the two chroma components of the video frame into a fourth dynamic range. In some embodiments, the fourth dynamic range is the same as the first dynamic range. In some embodiments, the first dynamic range is [-2^15, 2^15 - 1] and the fourth dynamic range is [-2^B, 2^B - 1], where B (i.e., the coded bit depth) is an integer less than 15.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media (such as data storage media), or communication media, including any medium that facilitates transfer of a computer program (e.g., according to a communication protocol) from one place to another. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the embodiments described herein. The computer program product may include a computer-readable medium.
The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the claims. As used in the description of the embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first electrode may be referred to as a second electrode, and similarly, a second electrode may be referred to as a first electrode, without departing from the scope of embodiments. The first electrode and the second electrode are both electrodes, but they are not the same electrode.
The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications, variations, and alternative embodiments will become apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the claims is not to be limited to the specific examples of the embodiments disclosed, and that modifications and other embodiments are intended to be included within the scope of the appended claims.

Claims (21)

1. A method for decoding video data, comprising:
reconstructing a plurality of residual blocks of a video frame from a video bitstream;
clipping luma and chroma residuals of the plurality of residual blocks of the video frame into a first dynamic range and a second dynamic range, respectively; and
for each residual block, determining a luma residual for a luma component and chroma residuals for two chroma components from the clipped luma and chroma residuals of the plurality of residual blocks.
2. The method of claim 1, wherein at least one of the first dynamic range and the second dynamic range is asymmetric with respect to zero and is defined as [-2^15, 2^15 - 1].
3. The method of claim 1, wherein at least one of the first dynamic range and the second dynamic range is symmetric with respect to zero and is defined as [-(2^15 - 1), 2^15 - 1].
4. The method of claim 1, wherein at least one of the first dynamic range and the second dynamic range is asymmetric with respect to zero and is defined as [-2^B, 2^B - 1], wherein B is a coded bit depth of the video frame, the method further comprising:
obtaining the coded bit depth from the video bitstream.
5. The method of claim 1, wherein at least one of the first dynamic range and the second dynamic range is symmetric with respect to zero and is defined as [-(2^B - 1), 2^B - 1], wherein B is a coded bit depth of the video frame, the method further comprising:
obtaining the coded bit depth from the video bitstream.
6. The method of any one of claims 1 to 5, wherein:
each residual block comprises one or more respective luma residuals and one or more respective chroma residuals;
the first dynamic range is equal to the second dynamic range and is symmetric with respect to zero; and
clipping luma and chroma residuals of the plurality of residual blocks of the video frame further comprises:
for each residual block, clipping the one or more respective luma residuals into the first dynamic range and clipping the one or more respective chroma residuals into the second dynamic range.
7. The method of any one of claims 1 to 5, wherein:
each residual block comprises one or more respective luma residuals and one or more respective chroma residuals;
the first dynamic range is asymmetric with respect to zero and the second dynamic range is symmetric with respect to zero; and
clipping luma and chroma residuals of the plurality of residual blocks of the video frame further comprises:
clipping the one or more respective luma residuals for each residual block into the first dynamic range and clipping the one or more respective chroma residuals for each residual block into the second dynamic range.
8. The method of any one of claims 1-7, wherein the second dynamic range is less than or included in the first dynamic range.
9. The method of any one of claims 1-8, wherein reconstructing the plurality of residual blocks from the video bitstream comprises:
inverse quantizing a plurality of quantized transform coefficients in the video bitstream to a plurality of transform coefficients for the video frame; and
applying an inverse transform to the plurality of transform coefficients to reconstruct the plurality of residual blocks.
10. The method of claim 9, further comprising:
clipping each transform coefficient of the plurality of transform coefficients into a third dynamic range prior to applying the inverse transform.
11. The method of any of claims 1-10, further comprising:
storing the clipped luma and chroma residuals of the plurality of residual blocks in a cache according to the first and second dynamic ranges, respectively.
12. The method of any of claims 1-11, further comprising:
applying an inverse adaptive color transform to the two chroma components of the video frame to obtain a substitute chroma residual for the two chroma components;
clipping the substitute chroma residuals of the two chroma components of the video frame into a fourth dynamic range; and
scaling the clipped substitute chroma residuals of the two chroma components of the video frame.
13. The method of any of claims 1-12, further comprising:
determining whether a joint coding of chroma residuals (JCCR) mode is enabled, wherein, in accordance with a determination that the JCCR mode is enabled, the chroma residuals of the video frame for the two chroma components are jointly determined in accordance with a scheme of joint coding of chroma residuals.
14. The method of claim 13, further comprising:
for each residual block, clipping chroma residuals of the two chroma components of the video frame into a chroma dynamic range.
15. The method of claim 13, wherein the video bitstream has a hierarchical structure, the method further comprising:
obtaining a syntax element associated with a first layer of the hierarchical structure, wherein the chroma residuals of the two chroma components of the video frame are jointly determined below the first layer.
16. The method of claim 15, wherein determining the luma residual for the luma component and the chroma residuals for the two chroma components further comprises:
in accordance with a determination that the syntax element indicates that the JCCR mode is disabled, reconstructing chroma residuals for the video frame for the two chroma components based on the clipped chroma residuals of the plurality of residual blocks, respectively.
17. The method of claim 1, wherein:
the first dynamic range and the second dynamic range are the same;
reconstructing the plurality of residual blocks from the video bitstream comprises:
inverse quantizing a plurality of quantized transform coefficients in the video bitstream to a plurality of transform coefficients for the video frame; and
applying an inverse transform to the plurality of transform coefficients to reconstruct the plurality of residual blocks;
the method further comprising:
clipping each transform coefficient of the plurality of transform coefficients into the first dynamic range prior to applying the inverse transform;
clipping chroma residuals of the video frame for the two chroma components into the first dynamic range;
applying an inverse adaptive color transform to the clipped chroma residuals of the two chroma components of the video frame to obtain substitute chroma residuals for the two chroma components; and
clipping the substitute chroma residuals of the video frame for the two chroma components into a fourth dynamic range.
18. The method of claim 17, wherein the fourth dynamic range is the same as the first dynamic range.
19. The method of claim 17, wherein the first dynamic range is [-2^15, 2^15 - 1] and the fourth dynamic range is [-2^B, 2^B - 1], wherein B is an integer less than 15.
20. An electronic device, comprising:
one or more processors; and
a memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform the method of any one of claims 1-19.
21. A non-transitory computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause the processors to perform the method of any one of claims 1-19.
CN202080088818.7A 2019-12-30 2020-12-30 Coding and decoding of chroma residual Pending CN114846807A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962955319P 2019-12-30 2019-12-30
US62/955,319 2019-12-30
PCT/US2020/067547 WO2021138476A1 (en) 2019-12-30 2020-12-30 Coding of chrominance residuals

Publications (1)

Publication Number Publication Date
CN114846807A true CN114846807A (en) 2022-08-02

Family

ID=76687454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080088818.7A Pending CN114846807A (en) 2019-12-30 2020-12-30 Coding and decoding of chroma residual

Country Status (2)

Country Link
CN (1) CN114846807A (en)
WO (1) WO2021138476A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023024712A1 (en) * 2021-08-27 2023-03-02 Mediatek Inc. Method and apparatus of joint coding for multi-colour components in video coding system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7929610B2 (en) * 2001-03-26 2011-04-19 Sharp Kabushiki Kaisha Methods and systems for reducing blocking artifacts with reduced complexity for spatially-scalable video coding
JP6338469B2 (en) * 2014-06-23 2018-06-06 キヤノン株式会社 Image processing apparatus and image processing method
US10158836B2 (en) * 2015-01-30 2018-12-18 Qualcomm Incorporated Clipping for cross-component prediction and adaptive color transform for video coding
WO2018070914A1 (en) * 2016-10-12 2018-04-19 Telefonaktiebolaget Lm Ericsson (Publ) Residual refinement of color components

Also Published As

Publication number Publication date
WO2021138476A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN114710679B (en) Small chroma block size limitation in video coding and decoding
US20220201301A1 (en) Methods and apparatus of video coding in 4:4:4 chroma format
CN113906749B (en) Chroma residual joint coding method and device
WO2021247881A1 (en) Chroma coding enhancement in the prediction from multiple cross-components (pmc) mode
US20220286673A1 (en) Deblocking filtering for video coding
CN114762329A (en) Method and apparatus for video encoding and decoding using palette mode
WO2020252270A1 (en) Methods and system of subblock transform for video coding
WO2021046509A1 (en) Prediction mode signaling in video coding
US20230109849A1 (en) Methods and apparatus of video coding in 4:4:4 color format
US20220303580A1 (en) Methods and apparatus of video coding in 4:4:4 chroma format
CN114846807A (en) Coding and decoding of chroma residual
WO2021062017A1 (en) Methods and apparatus of performing rate-distortion analysis for palette mode
WO2021207731A1 (en) Methods and apparatus for high-level syntax in video coding
CN118075461A (en) Video decoding method, device and medium
CN118118672A (en) Video decoding method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination