CN110301134B - Integrated image shaping and video coding - Google Patents

Integrated image shaping and video coding

Info

Publication number
CN110301134B
CN110301134B (application CN201880012069.2A)
Authority
CN
China
Prior art keywords
shaping
region
sample
generating
pixels
Prior art date
Legal status
Active
Application number
CN201880012069.2A
Other languages
Chinese (zh)
Other versions
CN110301134A
Inventor
吕陶然
浦方君
尹鹏
陈涛
W·J·胡萨克
Current Assignee
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date
Filing date
Publication date
Priority to CN202310078787.2A (published as CN116095313A)
Priority to CN202410005772.8A (published as CN117793377A)
Priority to CN202310078791.9A (published as CN116095315A)
Priority to CN202410015264.8A (published as CN117793380A)
Priority to CN202410006213.9A (published as CN117793379A)
Application filed by Dolby Laboratories Licensing Corp
Priority to CN202410005914.0A (published as CN117793378A)
Priority to CN202310078790.4A (published as CN116095314A)
Publication of CN110301134A
Application granted
Publication of CN110301134B
Legal status: Active


Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (parent class of all entries below)
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/124 Quantisation
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/174 Adaptive coding characterised by the coding unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 Adaptive coding characterised by the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/182 Adaptive coding characterised by the coding unit being a pixel
    • H04N19/34 Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H04N19/423 Implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/45 Decoders performing compensation of the inverse transform mismatch, e.g. Inverse Discrete Cosine Transform [IDCT] mismatch
    • H04N19/463 Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N19/503 Predictive coding involving temporal prediction
    • H04N19/60 Transform coding
    • H04N19/61 Transform coding in combination with predictive coding
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/82 Filtering operations specially adapted for video compression, involving filtering within a prediction loop
    • H04N19/98 Adaptive-dynamic-range coding [ADRC]

Abstract

Given a sequence of images represented by a first codeword, methods, processes and systems are presented for integrating shaping into a next generation video codec for encoding and decoding images, wherein shaping allows a portion of the images to be encoded with a second codeword representation that allows for more efficient compression than using the first codeword representation. Various architectures are discussed, including: an out-of-loop shaping architecture, an in-loop shaping architecture for intra pictures only, an in-loop architecture for prediction residuals, and a hybrid in-loop shaping architecture. Also proposed are syntax methods for signaling shaping parameters and image coding methods optimized for shaping.

Description

Integrated image shaping and video coding
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Patent Application Ser. No. 62/686,738, filed on June 19, 2018; Ser. No. 62/680,710, filed on June 5, 2018; Ser. No. 62/629,313, filed on February 12, 2018; and Ser. No. 62/526,577, filed on June 29, 2017; each of which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates generally to image and video coding. More particularly, embodiments of the present invention relate to integrated image shaping and video coding.
Background
In 2013, the MPEG group in the International Organization for Standardization (ISO), jointly with the International Telecommunication Union (ITU), released the first draft of the HEVC (also known as H.265) video coding standard. More recently, the same group has released a call for evidence to support the development of a next generation coding standard that provides improved coding performance over existing video coding technologies.
As used herein, the term 'bit depth' denotes the number of bits used to represent one of the color components of a pixel in an image. Traditionally, images were encoded with 8 bits per color component per pixel (e.g., 24 bits per pixel); modern architectures, however, can now support higher bit depths, such as 10 bits, 12 bits, or more.
In a conventional image pipeline, captured images are quantized using a non-linear opto-electronic transfer function (OETF), which converts linear scene light into a non-linear video signal (e.g., gamma-coded RGB or YCbCr). Then, on the receiver, before being displayed on the display, the signal is processed by an electro-optical transfer function (EOTF), which translates video signal values into output screen color values. Such non-linear functions include the traditional "gamma" curves described in ITU-R Rec. BT.709 and BT.2020, and the "PQ" (perceptual quantization) curve described in SMPTE ST 2084 and Rec. ITU-R BT.2100.
As used herein, the term "forward shaping (forward reshaping)" refers to the process of sample-to-sample mapping or codeword-to-codeword mapping of a digital image from its original bit depth and original codeword distribution or representation (e.g., gamma or PQ, etc.) to an image of the same or different bit depths and different codeword distributions or representations. Shaping allows improving the compressibility or improving the image quality at a fixed bit rate. For example, without limitation, shaping may be applied to HDR video encoded with 10-bit or 12-bit PQ to improve coding efficiency in a 10-bit video coding architecture. In the receiver, after decompressing the shaped signal, the receiver may apply an "inverse shaping function" to restore the signal to its original codeword distribution. As understood herein by the inventors, as developments for next generation video coding standards begin, improved techniques for integrated shaping and coding of images are desired. The method of the present invention may be applied to a variety of video content including, but not limited to, content in Standard Dynamic Range (SDR) and/or High Dynamic Range (HDR).
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Thus, unless otherwise indicated, any approaches described in this section are not to be construed so as to qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, questions identified with respect to one or more methods should not be deemed to be recognized in any prior art based on this section.
Disclosure of Invention
A first aspect of the present disclosure relates to a method for encoding an image with a processor, the method may include: accessing, with a processor, an input image represented by a first codeword; generating a forward shaping function mapping pixels of the input image to a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation; generating an inverse shaping function based on the forward shaping function, wherein the inverse shaping function maps pixels from the second codeword representation to the first codeword representation; and, for an input pixel region in the input image: calculating a prediction region based on pixel data in a reference frame buffer or in a previously encoded spatial neighborhood; generating a shaped residual region based on the input pixel region, the prediction region, and the forward shaping function; generating a quantized residual region based on the shaped residual region; generating a dequantized residual region based on the quantized residual region; generating a reconstructed pixel region based on the dequantized residual region, the prediction region, the forward shaping function, and the inverse shaping function; and generating a reference pixel region to be stored in the reference frame buffer based on the reconstructed pixel region.
A second aspect of the present disclosure relates to a method for decoding an encoded bitstream with a processor to generate an output image represented by a first codeword, the method may include: receiving at least a portion of an encoded image that employs a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation; receiving shaping information of the encoded image; generating, based on the shaping information, a forward shaping function mapping pixels from the first codeword representation to the second codeword representation; generating an inverse shaping function based on the shaping information, wherein the inverse shaping function maps pixels from the second codeword representation to the first codeword representation; and, for a region of the encoded image: generating a decoded shaped residual region; generating a prediction region based on pixels in a reference pixel buffer or in a previously decoded spatial neighborhood; generating a reconstructed pixel region based on the decoded shaped residual region, the prediction region, the forward shaping function, and the inverse shaping function; generating an output pixel region of the output image based on the reconstructed pixel region; and storing the output pixel region in the reference pixel buffer.
A third aspect of the present disclosure relates to a method for decoding an encoded bitstream with a processor to generate an output image represented by a first codeword, the method may include: receiving at least a portion of an encoded image that employs a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation; receiving shaping information of the encoded image; generating a shaping scaling function based on the shaping information; and, for a region of the encoded image: generating a decoded shaped residual region; generating a prediction region based on pixels in a reference pixel buffer or in a previously decoded spatial neighborhood; generating a reconstructed pixel region based on the decoded shaped residual region, the prediction region, and the shaping scaling function; generating an output pixel region of the output image based on the reconstructed pixel region; and storing the output pixel region in the reference pixel buffer.
A fourth aspect of the present disclosure relates to a method for encoding an image with a processor, the method may include: accessing, with a processor, an input image represented by a first codeword; selecting a shaping architecture from two or more candidate coding architectures for compressing the input image with a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation, wherein the two or more candidate coding architectures include an out-of-loop shaping architecture, an in-loop shaping architecture for intra-frames only, and an in-loop architecture for prediction residuals; and compressing the input image according to the selected shaping architecture.
A fifth aspect of the present disclosure relates to a method for decoding an encoded bitstream with a processor to generate an output image represented by a first codeword, the method may include: receiving an encoded bitstream comprising one or more encoded images, wherein at least a portion of the encoded images are represented with a second codeword, wherein the second codeword representation allows for more efficient compression than the first codeword representation; determining a shaping decoder architecture based on metadata in the encoded bitstream, wherein the shaping decoder architecture comprises one of an out-of-loop shaping architecture, an in-loop shaping architecture for intra-frames only, or an in-loop architecture for prediction residuals; receiving shaping information of an encoded image in the encoded bit stream; and decompressing the encoded image according to the shaping decoder architecture to generate the output image.
A sixth aspect of the present disclosure relates to an apparatus for image shaping, the apparatus may comprise: one or more processors, and memory having software instructions stored thereon that, when executed by the one or more processors, cause performance of a method according to the present disclosure.
A seventh aspect of the present disclosure relates to a non-transitory computer-readable storage medium that may have stored thereon computer-executable instructions for performing a method according to the present disclosure.
Drawings
Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIG. 1A depicts an example process of a video transmission pipeline;
FIG. 1B depicts an example process for data compression using signal shaping according to the prior art;
FIG. 2A depicts an example architecture of an encoder using canonical out-of-loop shaping, in accordance with an embodiment of the invention;
FIG. 2B depicts an example architecture of a decoder using canonical out-of-loop shaping, in accordance with an embodiment of the invention;
FIG. 2C depicts an example architecture of an encoder using canonical in-loop intra-only shaping, according to an embodiment of the invention;
FIG. 2D depicts an example architecture of a decoder using canonical in-loop intra-only shaping, according to an embodiment of the invention;
fig. 2E depicts an example architecture of an encoder using in-loop shaping for prediction residuals, according to an embodiment of the invention;
FIG. 2F depicts an example architecture of a decoder using in-loop shaping for prediction residuals, according to an embodiment of the invention;
FIG. 2G depicts an example architecture of an encoder using hybrid in-loop shaping, according to an embodiment of the invention;
FIG. 2H depicts an example architecture of a decoder using hybrid in-loop shaping, according to an embodiment of the invention;
FIG. 3A depicts an example process for encoding video using an out-of-loop shaping architecture, according to an embodiment of the invention;
FIG. 3B depicts an example process for decoding video using an out-of-loop shaping architecture, according to an embodiment of the invention;
FIG. 3C depicts an example process for encoding video using an in-loop intra-only shaping architecture, according to an embodiment of the invention;
FIG. 3D depicts an example process for decoding video using an in-loop intra-only shaping architecture, according to an embodiment of the invention;
FIG. 3E depicts an example process for encoding video using an in-loop shaping architecture for prediction residuals, according to an embodiment of the invention;
FIG. 3F depicts an example process for decoding video using an in-loop shaping architecture for prediction residuals, according to an embodiment of the invention;
FIG. 4A depicts an example process for encoding video using any one of three shaping-based architectures, or a combination thereof, in accordance with an embodiment of the present invention;
FIG. 4B depicts an example process for decoding video using any one of three shaping-based architectures, or a combination thereof, in accordance with an embodiment of the invention;
FIGS. 5A and 5B depict a shaping function reconstruction process in a video decoder according to an embodiment of the present invention;
FIGS. 6A and 6B depict examples of how chroma QP offset values vary as a function of the luma quantization parameter (QP) for PQ-coded and HLG-coded signals, according to an embodiment of the present invention; and
FIG. 7 depicts an example of a pivot-based (pivot) representation of a shaping function according to an embodiment of the invention.
Detailed Description
Canonical out-of-loop and in-loop integrated signal shaping and coding techniques for compressing images are described herein. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
SUMMARY
Example embodiments described herein relate to integrated signal shaping and coding of video. In the encoder, a processor receives an input image in a first codeword representation, characterized by an input bit depth N and an input codeword mapping (e.g., gamma, PQ, etc.). The processor selects an encoder architecture from two or more candidate encoder architectures for compressing the input image using a second codeword representation that allows for more efficient compression than the first codeword representation (wherein the shaper is an integral part of the encoder), wherein the two or more candidate encoder architectures include an out-of-loop shaping architecture, an in-loop shaping architecture for intra pictures only, or an in-loop architecture for prediction residuals, and the processor compresses the input image according to the selected encoder architecture.
In another embodiment, a decoder for generating an output image represented by a first codeword receives an encoded bitstream, wherein at least a portion of the encoded image is compressed in a second codeword representation. The decoder also receives associated shaping information. The processor receives signaling indicating a decoder architecture selected from two or more candidate decoder architectures for decompressing the input encoded bitstream, wherein the two or more candidate decoder architectures include an out-of-loop shaping architecture, an in-loop shaping architecture for intra pictures only, or an in-loop architecture for prediction residuals, and decompresses the encoded image according to the signaled shaping architecture to generate the output image.
In another embodiment, in an encoder for compressing an image according to an in-loop architecture for prediction residuals, a processor accesses an input image that employs a first codeword representation and generates a forward shaping function that maps pixels of the input image from the first codeword representation to a second codeword representation. The processor generates, based on the forward shaping function, an inverse shaping function that maps pixels represented by the second codeword to pixels represented by the first codeword. Then, for an input pixel region in the input image, the processor performs the following operations:
calculating at least one prediction region based on pixel data in a reference frame buffer or in a previously encoded spatial neighborhood;
generating a shaped residual region based on the input pixel region, the prediction region, and the forward shaping function;
generating an encoded (transformed and quantized) residual region based on the shaped residual region;
generating a decoded (inverse quantized and inverse transformed) residual region based on the encoded residual region;
generating a reconstructed pixel region based on the decoded residual region, the prediction region, the forward shaping function, and the reverse shaping function; and
generating a reference pixel region to be stored in the reference frame buffer based on the reconstructed pixel region.
In another embodiment, in a decoder for generating an output image represented by the first codeword according to an in-loop architecture for prediction residuals, a processor receives an encoded bitstream in which at least a portion of the encoded images employs the second codeword representation. The processor also receives associated shaping information. The processor generates a forward shaping function and an inverse shaping function based on the shaping information, wherein the forward shaping function maps pixels from the first codeword representation to the second codeword representation and the inverse shaping function maps pixels from the second codeword representation to the first codeword representation. For a region of the encoded image, the processor performs the following operations:
generating a decoded shaped residual region based on the encoded image;
generating a prediction region based on pixels in a reference pixel buffer or in a previously decoded spatial neighborhood;
generating a reconstructed pixel region based on the decoded and shaped residual region, the prediction region, the forward shaping function, and the reverse shaping function;
generating an output pixel region based on the reconstructed pixel region; and storing the output pixel region in a reference pixel buffer.
Example video Transmission processing pipeline
FIG. 1A depicts an example process of a conventional video transmission pipeline (100) showing various stages from video capture to video content display. A sequence of video frames (102) is captured or generated using an image generation block (105). The video frames (102) may be captured digitally (e.g., by a digital camera) or generated by a computer (e.g., using a computer animation) to provide video data (107). Alternatively, the video frames (102) may be captured on film by a film camera. The film is converted to a digital format to provide video data (107). In a production phase (110), the video data (107) is edited to provide a video production stream (112).
The video production stream (112) is then provided to a processor at block (115) for post-production editing. The post-production editing at block (115) may include adjusting or modifying the color or brightness in particular regions of the image to enhance image quality or to achieve a particular appearance of the image according to the authoring intent of the video creator. This is sometimes referred to as "color adjustment" or "color grading". Other editing (e.g., scene selection and sequencing, image cropping, addition of computer-generated visual effects, etc.) may be performed at block (115) to yield a final version (117) of the production for distribution. During post-production editing (115), video images are viewed on a reference display (125).
After post-production (115), the video data of the final production (117) may be delivered to an encoding block (120) for downstream transmission to decoding and playback devices such as televisions, set-top boxes, movie theaters, and the like. In some embodiments, the encoding block (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-ray, and other delivery formats, to generate the encoded bitstream (122). In a receiver, the encoded bitstream (122) is decoded by a decoding unit (130) to generate a decoded signal (132) representing an identical or close approximation of the signal (117). The receiver may be attached to a target display (140) that may have entirely different characteristics than the reference display (125). In this case, a display management block (135) may be used to map the dynamic range of the decoded signal (132) to the characteristics of the target display (140) by generating a display-mapped signal (137).
Signal shaping
Fig. 1B depicts an example process for signal shaping according to the prior art (reference [1]). Given an input frame (117), a forward shaping block (150) analyzes the input and the coding constraints and generates a codeword mapping function that maps the input frame (117) to a re-quantized output frame (152). For example, the input (117) may be encoded according to a certain electro-optical transfer function (EOTF) (e.g., gamma). In some embodiments, metadata may be used to communicate information about the shaping process to downstream devices (such as decoders). As used herein, the term "metadata" relates to any auxiliary information that is transmitted as part of the encoded bitstream and that assists the decoder in rendering the decoded image. Such metadata may include, but is not limited to, color space or gamut information, reference display parameters, and auxiliary signal parameters, such as those described herein.
After encoding (120) and decoding (130), the decoded frames (132) may be processed by a backward (or inverse) shaping function (160), which converts the re-quantized frames (132) back to the original EOTF domain (e.g., gamma), for further downstream processing, such as the display management process (135) discussed earlier. In some embodiments, the backward shaping function (160) may be integrated with a dequantizer in the decoder (130), e.g., as part of the dequantizer in an AVC or HEVC video decoder.
As used herein, the term "shaper" may refer to a forward shaping function or an inverse shaping function to be used in encoding and/or decoding a digital image. Examples of shaping functions are discussed in references [1] and [2 ]. For the purposes of the present invention, it is assumed that a person skilled in the art can derive suitable forward and reverse shaping functions depending on the characteristics of the input video signal and the available bit depth of the encoding and decoding architecture.
In reference [1], a block-based in-loop image shaping method for high dynamic range video coding is proposed. This design allows block-based shaping within the encoding loop, but at the cost of increased complexity. Specifically, the design requires maintaining two sets of decoded picture buffers: one set of inverse-shaped (or non-shaped) decoded pictures, which can be used both for prediction without shaping and for output to a display; and another set of forward-shaped decoded pictures, which are used only for prediction with shaping. Although forward-shaped decoded pictures can be computed on the fly, the complexity cost is very high, especially for inter prediction (motion compensation with sub-pixel interpolation). In general, decoded picture buffer (DPB) management is complex and requires great care, and thus, as appreciated by the inventors, simplified methods for coding video are desired.
Embodiments of the shaping-based codec architectures presented herein may be divided as follows: an architecture with an out-of-loop shaper, an architecture with an in-loop intra-only shaper, and an architecture with an in-loop shaper for prediction residuals (also referred to simply as an 'in-loop residual shaper'). A video encoder or decoder may support any one of these architectures or a combination of them. Each of these architectures may also be applied on its own or in combination with any of the others, and each may be applied to the luma component, to a chroma component, or to a combination of luma and one or more chroma components.
In addition to these three architectures, additional embodiments describe efficient signaling methods for shaping-related metadata and several encoder-based optimization tools for improving coding efficiency when shaping is applied.
Canonical out-of-loop shaper
Fig. 2A and 2B depict the architectures of a video encoder (200a_e) and a corresponding video decoder (200a_d) with a "canonical" out-of-loop shaper. The term "canonical" means that, unlike previous designs in which shaping is considered a preprocessing step and therefore lies outside the normative description of a coding standard such as AVC, HEVC, etc., in this embodiment forward shaping and inverse shaping are part of the normative requirements. Unlike the architecture of fig. 1B, in which bitstream conformance is tested according to the standard after decoding (130), in fig. 2B conformance is tested after the inverse shaping block (265) (e.g., at output 162 in fig. 1B).
In the encoder (200a_e), two new blocks are added to a traditional block-based encoder (e.g., HEVC): a block (205) for estimating the forward shaping function, and a forward picture shaping block (210) that applies forward shaping to one or more color components of the input video (117). In some embodiments, these two operations may be performed as part of a single image shaping block. Parameters (207) related to determining the inverse shaping function in the decoder may be passed to the lossless encoder block of the video encoder (e.g., CABAC 220) so that they can be embedded in the encoded bitstream (122). Intra or inter prediction (225), transform and quantization (T and Q), inverse quantization and inverse transform (Q^-1 and T^-1), and all loop-filtering related operations are performed using the shaped pictures stored in the DPB (215).
In the decoder (200 a_d), two new canonical blocks are added to the traditional block-based decoder: a block (250) for reconstructing an inverse shaping function based on the encoded shaping function parameters (207), and a block (265) for applying the inverse shaping function to the decoded data (262) to generate a decoded video signal (162). In some embodiments, the operations associated with blocks 250 and 265 may be combined into a single processing block.
Fig. 3A depicts an example process (300a_e) for encoding video using the out-of-loop shaping architecture (200a_e), according to an embodiment of the invention. If shaping is not enabled (path 305), encoding proceeds as in prior-art encoders (e.g., HEVC). If shaping is enabled (path 310), the encoder has the following options: apply a predetermined (default) shaping function (315), or adaptively determine a new shaping function (325) based on a picture analysis (320) (e.g., as described in references [1] to [3]). After forward shaping (330), the remainder of the encoding follows the conventional encoding pipeline (335). If adaptive shaping is employed (312), metadata associated with the inverse shaping function is generated as part of the "code shaper" step (327).
Fig. 3B depicts an example process (300a_d) for decoding video using the out-of-loop shaping architecture (200a_d), according to an embodiment of the invention. If shaping is not enabled (path 355), then after decoding a picture (350) an output frame is generated (390) as in a conventional decoding pipeline. If shaping is enabled (path 360), then in step (370) the decoder determines whether to apply a predetermined (default) shaping function (375) or to adaptively determine the inverse shaping function (380) based on received parameters (e.g., 207). After inverse shaping (385), the remainder of the decoding follows the conventional decoding pipeline.
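A minimal sketch of this out-of-loop flow follows (assumed helper names such as encode_conventional, decode_conventional, code_shaper, and build_inverse are hypothetical stand-ins for an unmodified block-based codec and the shaper metadata handling); it only illustrates that forward shaping wraps the encoder and inverse shaping wraps the decoder, with conformance tested at the inverse-shaper output.

```python
def encode_out_of_loop(picture, fwd, encode_conventional, code_shaper):
    """Out-of-loop shaping: the whole picture is forward-shaped, then handed to an
    otherwise unmodified block-based encoder; the DPB, prediction, transform and
    quantization all operate on shaped pictures. Shaper parameters (207) travel in
    the bitstream so that the decoder can invert the mapping."""
    shaped = fwd(picture)                    # forward picture shaping (210)
    bitstream = encode_conventional(shaped)  # conventional encoding pipeline (335)
    metadata = code_shaper()                 # 'code shaper' step (327)
    return bitstream, metadata

def decode_out_of_loop(bitstream, metadata, decode_conventional, build_inverse):
    """Decoder mirror image: conventional decoding in the shaped domain, followed by
    inverse shaping (265/385); conformance is checked at the inverse-shaper output."""
    shaped_rec = decode_conventional(bitstream)
    inv = build_inverse(metadata)            # reconstruct the inverse shaping function (250)
    return inv(shaped_rec)
```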
Canonical in-loop intra-only shaper
Fig. 2C depicts an example architecture of an encoder (200b_e) using canonical in-loop intra-only shaping, according to an embodiment of the invention. The design is very similar to the design proposed in reference [1]; however, to reduce complexity, only intra pictures are encoded using this architecture, in particular as far as the use of the DPB memories (215 and 260) is concerned.
The main difference of the encoder 200b_e compared to the out-of-loop shaping (200a_e) is: the DPB (215) stores the inversely shaped pictures instead of the shaped pictures. In other words, the decoded intra pictures need to be reverse shaped (by reverse shaping unit 265) before they are stored into the DPB. The reason behind this approach is that if intra pictures are encoded with shaping, the improved performance of encoding intra pictures will propagate to (implicitly) improve the encoding of inter pictures even if inter pictures are not encoded with shaping. In this way one can take advantage of shaping without having to deal with the complexity of in-loop shaping of inter pictures. Since the inverse shaping (265) is part of the inner loop, the inverse shaping may be performed before the in-loop filter (270). The advantage of adding the reverse shaping before the in-loop filter is that: in this case, the design of the in-loop filter may be optimized based on the characteristics of the original picture, rather than the characteristics of the forward shaped picture.
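The DPB handling can be summarized by the following sketch (an informal reading of this architecture, not normative text): reconstructed intra pictures are inverse-shaped and then in-loop filtered before being stored, so the DPB only ever holds unshaped pictures, while inter pictures are coded without shaping.

```python
def reconstruct_intra_with_shaping(decoded_shaped_intra, inv, in_loop_filter, dpb):
    """In-loop intra-only shaping: the intra picture is reconstructed in the shaped
    domain, inverse-shaped (265) while still inside the coding loop, and then
    in-loop filtered (270), so the DPB (215/260) only ever holds unshaped pictures
    and the filter sees original-domain statistics."""
    unshaped = inv(decoded_shaped_intra)
    filtered = in_loop_filter(unshaped)
    dpb.append(filtered)
    return filtered

def reconstruct_inter_without_shaping(decoded_inter, in_loop_filter, dpb):
    """Inter pictures are coded without shaping; they still benefit indirectly from
    the better-coded intra references already stored in the DPB."""
    filtered = in_loop_filter(decoded_inter)
    dpb.append(filtered)
    return filtered
```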
Fig. 2D depicts an example architecture of a decoder (200b_d) using canonical in-loop intra-only shaping, according to an embodiment of the invention. As depicted in fig. 2D, determining the inverse shaping function (250) and applying inverse shaping (265) are now performed prior to in-loop filtering (270).
Fig. 3C depicts an example process (300b_e) for encoding video using the in-loop intra-only shaping architecture, according to an embodiment of the invention. As depicted, the operational flow in fig. 3C shares many elements with the operational flow in fig. 3A. By default, no shaping is now applied to inter coding. For intra-coded pictures, if shaping is enabled, the encoder again has the option to use a default shaping curve or to apply adaptive shaping (312). If a picture is shaped, inverse shaping (385) is part of the process and the associated parameters are encoded in step (327). The corresponding decoding process (300b_d) is depicted in fig. 3D.
As depicted in fig. 3D, shaping-related operations are enabled only for received intra pictures, and only when intra shaping was applied at the encoder.
In-loop shaper for prediction residual
In encoding, the term 'residual' denotes the difference between a prediction of a sample or data element and its original or decoded value. For example, given an original sample (denoted original_sample) from the input video (117), intra or inter prediction (225) may generate a corresponding prediction sample (227) denoted pred_sample. If not shaped, the unshaped residual (Res_u) may be defined as:
Res_u = Orig_sample - Pred_sample. (1)
In some embodiments, it may be beneficial to apply shaping to the residual domain. Fig. 2E depicts an example architecture for an encoder (200 c_e) using in-loop shaping for prediction residuals according to an embodiment of the invention. Let Fwd () represent the forward shaping function and Inv () represent the corresponding reverse shaping function. In an embodiment, the shaped residual (232) may be defined as:
Res_r = Fwd(Orig_sample) - Fwd(Pred_sample). (2)
Accordingly, at the output (267) of the inverse shaper (265), the reconstructed sample, denoted Reco_sample, may be expressed as:
Reco_sample = Inv(Res_d + Fwd(Pred_sample)), (3)
where Res_d represents the residual (234) after in-loop encoding and decoding in 200c_e, i.e., a close approximation of Res_r.
Note that while shaping is applied to the residual, the actual input video pixels are not shaped. Fig. 2F depicts the corresponding decoder (200c_d). Note that, as depicted in fig. 2F and based on equation (3), the decoder needs to access both the forward and the inverse shaping function, which can be extracted using the received metadata (207) and the "shaper decode" block (250).
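Read as per-sample operations, equations (2) and (3) amount to the following sketch, where fwd and inv stand for the signaled forward and inverse shaping functions (the function names are illustrative only):

```python
def shaped_residual(orig_sample, pred_sample, fwd):
    """Equation (2): the residual is formed in the shaped domain even though the
    input video pixels themselves are never shaped."""
    return fwd(orig_sample) - fwd(pred_sample)

def reconstruct_sample(res_d, pred_sample, fwd, inv):
    """Equation (3): add the decoded residual to the shaped prediction and map the
    result back to the original codeword representation."""
    return inv(res_d + fwd(pred_sample))
```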
In an embodiment, equations (2) and (3) may be simplified in order to reduce complexity. For example, assuming that the forward shaping function can be approximated by a piecewise linear function and that the absolute difference between Pred_sample and Orig_sample is relatively small, equation (2) can be approximated as:
Res_r = a(Pred_sample) * (Orig_sample - Pred_sample), (4)
where a(Pred_sample) denotes a scaling factor that depends on the value of Pred_sample. According to equations (3) and (4), equation (3) may be approximated as:
Reco_sample = Pred_sample + (1/a(Pred_sample)) * Res_r. (5)
Thus, in an embodiment, only the scaling factors a(Pred_sample) of the piecewise linear model need to be transmitted to the decoder.
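Under this piecewise-linear assumption, only per-band scaling factors a(Pred_sample) are needed. The sketch below uses an illustrative, made-up table of scale factors indexed by the luma band of the prediction sample for a 10-bit signal:

```python
# Illustrative (made-up) per-band scale factors a(Pred_sample) for a 10-bit signal,
# one factor per band of 128 codewords.
SCALE = [0.6, 0.8, 1.0, 1.1, 1.2, 1.2, 1.1, 0.9]

def a_of(pred_sample):
    """Look up the scaling factor of the luma band containing Pred_sample."""
    return SCALE[min(pred_sample >> 7, len(SCALE) - 1)]

def shaped_residual_approx(orig_sample, pred_sample):
    """Equation (4): Res_r ~= a(Pred_sample) * (Orig_sample - Pred_sample)."""
    return a_of(pred_sample) * (orig_sample - pred_sample)

def reconstruct_sample_approx(res_r, pred_sample):
    """Equation (5): Reco_sample ~= Pred_sample + (1/a(Pred_sample)) * Res_r."""
    return pred_sample + res_r / a_of(pred_sample)
```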
Fig. 3E and 3F depict example process flows for encoding (300c_e) and decoding (300c_d) video using intra-loop shaping of prediction residuals. The processes are very similar to those described in fig. 3A and 3B and therefore need not be described.
Table 1 summarizes the key features of the three architectures proposed.
Table 1: key features of the shaping architecture under consideration
Fig. 4A and 4B depict example encoding and decoding processes for encoding and decoding using a combination of the three proposed architectures. As depicted in fig. 4A, if shaping is not enabled, the input video is encoded according to known video encoding techniques (e.g., HEVC, etc.) without using any shaping. Otherwise, the encoder may select any of the three proposed methods, typically according to the capabilities of the target receiver and/or the characteristics of the input. For example, in an embodiment, the encoder may switch between these methods at the scene level, where a 'scene' denotes a sequence of consecutive frames with similar luminance characteristics. In another embodiment, high-level parameters are defined at the Sequence Parameter Set (SPS) level.
As depicted in fig. 4B, the decoder may invoke any of the respective decoding processes for decoding the incoming encoded bitstream in accordance with the received shaping information signaling.
Hybrid in-loop shaping
Fig. 2G depicts an example architecture (200d_e) of an encoder using the hybrid in-loop shaping architecture. This architecture combines elements of the in-loop intra-only shaping architecture (200b_e) and the in-loop residual architecture (200c_e) discussed previously. Under this architecture, intra slices are encoded according to the in-loop intra-only shaping encoding architecture (e.g., 200b_e in fig. 2C), with a few differences: for intra slices, inverse picture shaping (265-1) is performed after loop filtering (270-1). In another embodiment, in-loop filtering may be performed on intra slices after inverse shaping; however, experimental results indicate that such an arrangement may yield worse coding efficiency than performing the inverse shaping after loop filtering. The remaining operations remain the same as discussed previously.
As previously discussed, inter slices are encoded according to the in-loop residual coding architecture (e.g., 200c_e in fig. 2E). As depicted in fig. 2G, intra/inter slice switching allows switching between the two architectures depending on the type of the slice to be encoded.
Fig. 2H depicts an example architecture (200d_d) of a decoder using hybrid in-loop shaping. Again, intra slices are decoded according to the in-loop intra-only shaping decoder architecture (e.g., 200b_d in fig. 2D), where, again for intra slices, loop filtering (270-1) precedes inverse picture shaping (265-1). Inter slices are decoded according to the in-loop residual coding architecture (e.g., 200c_d in fig. 2F). As depicted in fig. 2H, intra/inter slice switching allows switching between these two architectures depending on the slice type in the encoded video pictures.
By invoking the encoding process 300D_E (depicted in FIG. 2G), FIG. 4A can readily be extended to also include the hybrid in-loop shaping encoding method. Similarly, by invoking the decoding process 300D_D (depicted in FIG. 2H), FIG. 4B can easily be extended to also include the hybrid in-loop shaping decoding method.
Slice-level shaping
Embodiments of the present invention allow for shaping adaptation at various slice levels. For example, to reduce computation, shaping may be enabled only for intra slices or only for inter slices. In another embodiment, shaping may be allowed based on the value of a temporal ID (e.g., the HEVC variable TemporalId (reference [11]), where TemporalId = nuh_temporal_id_plus1 - 1). For example, if the TemporalId of the current slice is less than or equal to a predefined value, the slice_reshaper_enable_flag of the current slice may be set to 1; otherwise, slice_reshaper_enable_flag will be 0. To avoid sending the slice_reshaper_enable_flag parameter for each slice, an sps_reshaper_temporal_id parameter may be specified at the SPS level, so that the value of the slice_reshaper_enable_flag parameter can be inferred.
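One possible reading of this TemporalId-based inference, using the syntax-element names from this section (the normative behaviour is defined by the syntax tables that follow), is sketched below:

```python
def infer_slice_reshaper_enable_flag(temporal_id,
                                     sps_reshaper_enable_flag,
                                     sps_reshaper_temporal_id):
    """Infer slice_reshaper_enable_flag instead of signaling it per slice: shaping
    is enabled for the slice only when shaping is enabled in the SPS and the
    slice's TemporalId does not exceed the signaled threshold."""
    if not sps_reshaper_enable_flag:
        return 0
    return 1 if temporal_id <= sps_reshaper_temporal_id else 0
```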
For slices with shaping enabled, the decoder needs to know which shaping model to use. In one embodiment, the shaping model defined at the SPS level may be used throughout. In another embodiment, the shaping model defined in the slice header may be used throughout. If no shaping model is defined in the current slice, the shaping model used in the most recently decoded slice that applied shaping may be used. In another embodiment, the shaping model may always be specified in intra slices, whether or not shaping is used for the intra slices. In such an embodiment, the parameters slice_reshaper_enable_flag and slice_reshaper_model_present_flag need to be decoupled. An example of such a slice syntax is depicted in Table 5.
Signaling of shaping information
Information related to forward and/or inverse shaping may be present at different information layers, for example, in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a supplemental enhancement information (SEI) message, or any other high-level syntax. By way of example and not limitation, Table 2 provides an example of high-level syntax in the SPS for signaling whether shaping is enabled, whether shaping is adaptive, and which of the three architectures is being used.
Table 2: examples of shaping information in SPS
Additional information may also be carried at other layers, such as in the slice header. The shaping function may be described by a look-up table (LUT), a piecewise polynomial, or another kind of parametric model. The type of shaping model used to transmit the shaping function may be signaled by an additional syntax element (e.g., a reshaping_model_type flag). For example, consider a system using two different representations: in model_A (e.g., reshaping_model_type = 0), the shaping function is expressed as a set of piecewise polynomials (e.g., see reference [4]), whereas in model_B (e.g., reshaping_model_type = 1), the shaping function is derived adaptively by assigning codewords to different luma bands based on picture luma characteristics and visual importance (e.g., see reference [3]). Table 3 provides an example of syntax elements in the slice header of a picture to assist the decoder in determining the appropriate shaping model being used.
Table 3: example syntax for shaping signaling in slice header
The following three tables describe alternative examples of bit stream syntax for signal shaping at the sequence layer, slice layer, or Coding Tree Unit (CTU) layer.
Table 4: examples of shaping information in SPS
Table 5: example syntax for shaping signaling in slice header
Table 6: example syntax for shaping signaling in CTU
For tables 4 through 6, example semantics may be expressed as:
the sps_reshaper_enable_flag equal to 1 specifies that a shaper is used in the Coded Video Sequence (CVS). The sps_higher_enabled_flag being equal to 0 specifies that no shaper is used in the CVS.
slice_reshaper_enable_flag equal to 1 specifies that the shaper is enabled for the current slice. slice_reshaper_enable_flag equal to 0 specifies that the shaper is not enabled for the current slice.
sps_reshaper_signal_type indicates the original codeword distribution or representation. By way of example and not limitation, sps_reshaper_signal_type equal to 0 specifies SDR (gamma); sps_reshaper_signal_type equal to 1 specifies PQ; and sps_reshaper_signal_type equal to 2 specifies HLG.
reshaper_ctu_control_flag equal to 1 indicates that the shaper is allowed to be adapted for each CTU. reshaper_ctu_control_flag equal to 0 indicates that the shaper is not allowed to be adapted for each CTU. When reshaper_ctu_control_flag is not present, its value shall be inferred to be 0.
reshaper_ctu_flag equal to 1 specifies that the shaper is used for the current CTU. reshaper_ctu_flag equal to 0 specifies that the shaper is not used for the current CTU. When reshaper_ctu_flag is not present, its value shall be inferred to be equal to slice_reshaper_enable_flag.
sps_reshaper_model_present_flag equal to 1 indicates that sps_reshaper_model() is present in the SPS. sps_reshaper_model_present_flag equal to 0 indicates that sps_reshaper_model() is not present.
slice_reshaper_model_present_flag equal to 1 indicates that slice_reshaper_model() is present in the slice header. slice_reshaper_model_present_flag equal to 0 indicates that slice_reshaper_model() is not present in the slice header.
sps_reshaper_chromaAdj equal to 1 indicates that chroma QP adjustment is done using chromaDQP. sps_reshaper_chromaAdj equal to 2 indicates that chroma QP adjustment is done using chroma scaling.
sps_reshaper_ILF_opt indicates whether, for intra and inter slices, the in-loop filter is applied in the original domain or in the shaping domain. For example, a two-bit syntax may be used, where the least significant bit refers to intra slices:
sps_reshaper_ILF_opt in-loop filter operation
0 0 In the original domain for both intra and inter frames
0 1 In the original domain for inter frames, in the shaped domain for intra frames
1 0 In the shaping domain for inter frames, in the original domain for intra frames
1 1 In the shaping domain for both intra and inter frames
In some embodiments, this parameter may be adjusted at the slice level. For example, in an embodiment, when slice_reshaper_enable_flag is set to 1, a slice may include a slice_reshaper_ILFOpt_flag. In another embodiment, in the SPS, if sps_reshaper_ILF_opt is enabled, the sps_reshaper_ILF_Tid parameter may be included. If the current slice's TemporalId <= sps_reshaper_ILF_Tid and slice_reshaper_enable_flag is set to 1, then the in-loop filter is applied in the shaping domain. Otherwise, the in-loop filter is applied in the unshaped domain.
In Table 4, chroma QP adjustment is controlled at the SPS level. In an embodiment, chroma QP adjustment may also be controlled at the slice level. For example, in each slice, when slice_reshaper_enable_flag is set to 1, a syntax element slice_reshaper_chromaAdj_flag may be added. In another embodiment, in the SPS, if sps_reshaper_chromaAdj is enabled, a syntax element sps_reshaper_chromaAdj_Tid may be added. If the current slice's TemporalId <= sps_reshaper_chromaAdj_Tid and slice_reshaper_enable_flag is set to 1, then chroma adjustment is applied. Otherwise, no chroma adjustment is applied. Table 4B depicts an example variation of Table 4 using the syntax previously described.
Table 4B: example syntax for shaping signaling in SPS using time ID
sps_reshaper_ILF_Tid specifies the highest TemporalId for which the in-loop filter is applied in the shaping domain for a shaped slice. sps_reshaper_chromaAdj_Tid specifies the highest TemporalId for which chroma adjustment is applied to a shaped slice.
In another embodiment, a shaping model ID (e.g., reshape_model_id) may be used to define the shaping model, e.g., as part of a slice_reshape_model() function. The shaping model may be signaled at the SPS level, PPS level, or slice header level. If signaled in the SPS or PPS, the value of reshape_model_id can also be inferred from sps_seq_parameter_set_id or pps_pic_parameter_set_id. An example of how to use reshape_model_id for slices that do not carry slice_reshape_model() (e.g., slice_reshaper_model_present_flag equal to 0) is shown in Table 5B below, where Table 5B is a variation of Table 5.
Table 5B: example syntax for shaping signaling using reshape_model_id in slice header
In the example syntax, the parameter reshape_model_id specifies the value of reshape_model being used. The value of reshape_model_id should be in the range of 0 to 15.
As an example of using the proposed syntax, consider an HDR signal encoded using the PQ EOTF, where shaping is signaled at the SPS level, no slice-specific shaping is used (that is, shaping is used for all slices), and CTU adaptation is allowed only for inter slices. Then:
sps_reshaper_signal_type=1(PQ);
sps_reshaper_model_present_flag=1;
// Note: for inter slices, the slice_reshaper_enable_flag may be manipulated to enable and disable the shaper.
In another example, consider an SDR signal where shaping is applied only at the slice level and only for intra slices. The CTU shaping adaptation is allowed only for inter slices. Then:
At the CTU level, in an embodiment, CTU-level shaping may be enabled based on the brightness characteristics of the CTU. For example, for each CTU, an average brightness (e.g., CTU_avg_lum_value) may be calculated and compared to one or more thresholds, and the decision to turn shaping on or off may be based on the results of these comparisons. For example:
if CTU_avg_lum_value < THR1, or
If CTU_avg_lum_value > THR2, or
If THR3 < CTU_avg_lum_value < THR4,
then for this CTU, reshaper_ctu_flag = 1.
In an embodiment, instead of using the average brightness, some other brightness characteristic of the CTU may be used, such as the minimum brightness, the maximum brightness, the variance, etc. Chroma-based characteristics of the CTU may also be used, or the luma and chroma characteristics may be combined and thresholded.
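A minimal Python sketch of such a CTU-level decision is given below; the specific threshold values, the use of the average luma, and the use of numpy are illustrative assumptions.

import numpy as np

def reshaper_ctu_flag(ctu_luma, thr1, thr2, thr3, thr4):
    # Decide whether to shape this CTU based on its average brightness.
    avg = float(np.mean(ctu_luma))
    if avg < thr1 or avg > thr2 or (thr3 < avg < thr4):
        return 1
    return 0

# Example for 10-bit content (threshold values are illustrative assumptions).
ctu = np.full((64, 64), 900)
print(reshaper_ctu_flag(ctu, thr1=64, thr2=800, thr3=300, thr4=400))  # prints 1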
As previously described (e.g., with respect to the steps in FIGS. 3A, 3B, and 3C), embodiments may support default or static shaping functions, or adaptive shaping. A "default shaper" applies a predefined shaping function, thus reducing the complexity of analyzing each picture or scene to obtain a shaping curve. In this case, the inverse shaping function does not need to be signaled at the scene, picture, or slice level. The default shaper may be implemented using a fixed mapping curve stored in the decoder to avoid any signaling, or it may be signaled once as part of a sequence-level parameter set. In another embodiment, a previously decoded adaptive shaping function may be reused for later pictures in coding order. In another embodiment, the shaping curves may be signaled differentially relative to previously decoded shaping curves. In other embodiments, only one of the Inv() or Fwd() functions may be signaled in the bitstream (e.g., for in-loop residual shaping, which requires both the Inv() and Fwd() functions to perform shaping and reverse shaping), or alternatively both functions may be signaled to reduce decoder complexity. Tables 7 and 8 provide two examples for signaling shaping information.
In Table 7, the shaping function is transmitted as a set of second-order polynomials. It is a simplified version of the syntax of the Exploratory Test Model (ETM) (reference [5]). Earlier variants can also be found in reference [4].
Table 7: example syntax for a segment representation (model_A) of a shaping function
reshape_input_luma_bit_depth_minus8 specifies the sample bit depth of the input luma component of the shaping process.
coeff_log2_offset_minus2 specifies the number of fractional bits used for shaping-related coefficient calculations for the luma component. The value of coeff_log2_offset_minus2 should be in the range of 0 to 3 (inclusive).
reshape_num_ranges_minus1 plus 1 specifies the number of ranges in the piecewise shaping function. When reshape_num_ranges_minus1 is not present, its value is inferred to be 0. reshape_num_ranges_minus1 should be in the range of 0 to 7, inclusive, for the luma component.
reshape_equal_ranges_flag equal to 1 specifies that the piecewise shaping function is partitioned into NumberRanges segments of nearly equal length, and that the length of each range is not explicitly signaled. reshape_equal_ranges_flag equal to 0 specifies that the length of each range is explicitly signaled.
reshape_global_offset_val is used to obtain an offset value for specifying the start point of the 0 th range.
reshape_range_val [ i ] is used to obtain the length of the i-th range of the luminance component.
reshape_continuity_flag specifies the continuity attribute of the shaping function of the luma component. If reshape_continuity_flag is equal to 0, zero-order continuity is applied to the piecewise linear inverse shaping function between successive pivot points. If reshape_continuity_flag is equal to 1, first-order smoothness is used to derive the full second-order polynomial inverse shaping function between successive pivot points.
reshape_poly_coeff_order0_int [ i ] specifies the integer value of the i-th segment 0 th order polynomial coefficient of the luminance component.
reshape_poly_coeff_order0_frac [ i ] specifies the fractional value of the i-th segment 0 th order polynomial coefficient of the luminance component.
reshape_poly_coeff_order1_int specifies the integer values of the 1 st order polynomial coefficients of the luminance component.
reshape_poly_coeff_order1_frac specifies the fractional value of the 1 st order polynomial coefficient of the luminance component.
Table 8 depicts an example embodiment of an alternative parameterized representation of model_B (reference [3 ]) in accordance with the previous discussion.
Table 8: example syntax for parameterized representation of shaping function (model_b)
In table 8, in an embodiment, the syntax parameters may be defined as: the reshape_model_profile_type specifies the type of distribution to be used in the shaper construction process.
reshape_model_scale_idx specifies the index value of the scaling factor (denoted ScaleFactor) to be used in the shaper construction process. The value of ScaleFactor allows improved control of the shaping function to increase overall coding efficiency. Additional details regarding the use of this ScaleFactor are provided with respect to the discussion of the shaping function reconstruction process (e.g., as depicted in FIGS. 5A and 5B). By way of example and not limitation, the value of reshape_model_scale_idx should be in the range of 0 to 3 (inclusive). In an embodiment, the mapping between scale_idx and ScaleFactor, as shown in the following table, is given by:
ScaleFactor=1.0-0.05*reshape_model_scale_idx.
reshape_model_scale_idx ScaleFactor
0 1.0
1 0.95
2 0.9
3 0.85
In another example, for a more efficient fixed-point implementation,
ScaleFactor=1-1/16*reshape_model_scale_idx.
reshape_model_scale_idx ScaleFactor
0 1.0
1 0.9375
2 0.875
3 0.8125
reshape_model_min_bin_idx specifies the minimum interval index to be used in the shaper construction process. The value of reshape_model_min_bin_idx should be in the range of 0 to 31, inclusive.
reshape_model_max_bin_idx specifies the maximum interval index to be used in the shaper construction process. The value of reshape_model_max_bin_idx should be in the range of 0 to 31, inclusive.
reshape_model_num_band specifies the number of bands to be used in the shaper construction process. The value of reshape_model_num_band should be in the range of 0 to 15, inclusive.
The reshape_model_band_profile_delta [ i ] specifies the delta value to be used in the shaper construction process to adjust the distribution of the ith band. The value of reshape_model_band_profile_delta [ i ] should be in the range of 0 to 1, inclusive.
The syntax in Table 8 is far more bit-efficient than that of reference [3] by defining a set of "default distribution types" (such as bright, dark, and mid-tones). In an embodiment, each type has a predefined visual band-importance distribution. The predefined bands and corresponding distributions may be implemented as fixed values in the decoder, or they may be signaled using a high-level syntax, such as a sequence parameter set. At the encoder, each image is first analyzed and classified as one of the distribution types. The distribution type is signaled by the syntax element "reshape_model_profile_type". In adaptive shaping, to capture the full range of image dynamics, the default distribution is further adjusted by an increment for each luma band or subset of luma bands. The delta values are derived based on the visual importance of the luma bands and are signaled by the syntax element "reshape_model_band_profile_delta".
In one embodiment, the increment value may only take the value 0 or 1. At the encoder, visual importance is determined by comparing the percentage of band pixels in the entire image to the percentage of band pixels within the "dominant band", where the dominant band can be detected using a local histogram. If the pixels within a band are concentrated in a small local block, the band is likely to be visually important at that block. The counts of the dominant bands are summed and normalized to form a meaningful comparison from which an increment value for each band is obtained.
In the decoder, the shaper function reconstruction procedure has to be invoked to derive the shaping LUT based on the method described in reference [3 ]. Thus, the complexity is higher than a simpler piecewise approximation model that only requires evaluation of the piecewise polynomial function to calculate the LUT. The benefit of using a parameterized model syntax is that the bit rate of using a shaper can be significantly reduced. For example, based on typical test content, the model depicted in table 7 requires 200 to 300 bits to signal the shaper, while the parameterized model (as shown in table 8) uses only about 40 bits.
In another embodiment, as depicted in table 9, the forward shaping look-up table may be derived from a parameterized model of dQP values. In an embodiment, for example,
dQP=clip3(min,max,scale*X+offset),
Where min and max represent boundaries of dQP, scale and offset are two parameters of the model, and X represents a parameter derived based on signal brightness (e.g., a brightness value of a pixel, or, for a frame, a measure of frame brightness (e.g., minimum, maximum, average, variance, standard deviation, etc.) thereof). For example, and without limitation,
dQP = clip3(-3, 6, 0.015*X - 7.5).
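As a quick numeric check of this example model (illustrative Python, not part of the bitstream syntax):

def clip3(lo, hi, x):
    return lo if x < lo else hi if x > hi else x

def dqp_model_c(x, scale=0.015, offset=-7.5, lo=-3, hi=6):
    return clip3(lo, hi, scale * x + offset)

print(dqp_model_c(100))   # 0.015*100 - 7.5 = -6.0, clipped to -3
print(dqp_model_c(500))   # 0.0
print(dqp_model_c(1000))  # 7.5, clipped to 6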
table 9: example syntax for parameterized representation of shaping function (model C)
In an embodiment, the parameters in table 9 may be defined as follows:
full_range_input_flag specifies the range of the input video signal. A full_range_input_flag of 0 corresponds to a standard dynamic range input video signal. A full_range_input_flag of 1 corresponds to a full-range input video signal. When full_range_input_flag does not exist, it is inferred to be 0.
Note that: as used herein, the term "full-range video" means that valid codewords in the video are not "limited". For example, for 10-bit full range video, the valid codeword is between 0 and 1023, where 0 is mapped to the lowest brightness level. In contrast, for 10-bit "standard range video", the valid codeword is between 64 and 940, and 64 is mapped to the lowest brightness level.
For example, "full range" and "standard range" values may be calculated as follows:
For a normalized luminance value Ey' in [0, 1], encoded with BD bits (e.g., BD = 10, 12, etc.):
Full range: Y = Clip3(0, (1 << BD) - 1, Ey' * ((1 << BD) - 1))
Standard range: Y = Clip3(0, (1 << BD) - 1, Round((1 << (BD - 8)) * (219 * Ey' + 16)))
This syntax is similar to the "video_full_range_flag" syntax in the HEVC VUI parameter as described in section e.2.1 of the HEVC (h.265) specification (reference [11 ]).
dQP_model_scale_int_prec specifies the number of bits used to represent dQP_model_scale_int. dQP_model_scale_int_prec equal to 0 indicates that dQP_model_scale_int is not signaled and is inferred to be 0.
dQP_model_scale_int specifies the integer value of the dQP model scale.
dQP_model_scale_frac_prec_minus16 plus 16 specifies the number of bits used to represent dQP_model_scale_frac.
dQP_model_scale_frac specifies the fractional value of the dQP model scale.
The variable dQPModelScaleAbs is derived as follows:
dQPModelScaleAbs = (dQP_model_scale_int << (dQP_model_scale_frac_prec_minus16 + 16)) + dQP_model_scale_frac
dQP_model_scale_sign specifies the sign of the dQP model scale. When dQPModelScaleAbs is equal to 0, dQP_model_scale_sign is not signaled and is inferred to be 0.
dQP_model_offset_int_prec_minus3 plus 3 specifies the number of bits used to represent dQP_model_offset_int. dQP_model_offset_int specifies the integer value of the dQP model offset.
dQP_model_offset_frac_prec_minus1 plus 1 specifies the number of bits used to represent dQP_model_offset_frac.
dQP_model_offset_frac specifies the fractional value of the dQP model offset.
The variable dQPModelOffsetAbs is derived as follows:
dQPModelOffsetAbs = (dQP_model_offset_int << (dQP_model_offset_frac_prec_minus1 + 1)) + dQP_model_offset_frac
dqp_model_offset_sign specifies the sign of the dQP model offset. When dQPModelOffsetAbs equals 0, dqp_model_offset_sign is not signaled and is inferred to be 0.
dQP_model_abs_prec_minus3 plus 3 specifies the number of bits used to represent dQP_model_max_abs and dQP_model_min_abs.
dqp_model_max_abs specifies the integer value of the dQP model maximum.
dqp_model_max_sign specifies the sign of the dQP model maximum. When dqp_model_max_abs is equal to 0, dqp_model_max_sign is not signaled and is inferred to be 0.
dqp_model_min_abs specifies the integer value of the minimum value of the dQP model.
dqp_model_min_sign specifies the sign of the minimum value of the dQP model. When dqp_model_min_abs is equal to 0, dqp_model_min_sign is not signaled and is inferred to be 0.
Model C decoding process
Given the syntax elements of table 9, the shaping LUT may be derived as follows.
The variable dQPModelScaleFP is derived as follows:
dQPModelScaleFP = ((1 - 2*dQP_model_scale_sign) * dQPModelScaleAbs) << (dQP_model_offset_frac_prec_minus1 + 16).
The variable dQPModelOffsetFP is derived as follows:
dQPModelOffsetFP = ((1 - 2*dQP_model_offset_sign) * dQPModelOffsetAbs) << (dQP_model_scale_frac_prec_minus16 + 16).
The variable dQPModelShift is derived as follows:
dQPModelShift = (dQP_model_offset_frac_prec_minus1 + 1) + (dQP_model_scale_frac_prec_minus16 + 16).
The variable dQPModelMaxFP is derived as follows:
dQPModelMaxFP = ((1 - 2*dQP_model_max_sign) * dQP_model_max_abs) << dQPModelShift.
The variable dQPModelMinFP is derived as follows:
dQPModelMinFP = ((1 - 2*dQP_model_min_sign) * dQP_model_min_abs) << dQPModelShift.
for Y = 0:maxY   // for example, for 10-bit video, maxY = 1023
{
dQP[Y] = clip3(dQPModelMinFP, dQPModelMaxFP, dQPModelScaleFP*Y + dQPModelOffsetFP);
slope[Y] = exp2((dQP[Y] + 3)/6);   // exp2 fixed-point implementation, where exp2(x) = 2^x
}
if (full_range_input_flag == 0)   // if the input is standard-range video
   set slope[Y] = 0 for Y outside the standard range (i.e., Y = [0:63] and [940:1023]);
CDF[0] = slope[0];
for Y = 0:maxY-1
{
CDF[Y+1] = CDF[Y] + slope[Y];   // CDF[Y] is the integral of slope[Y]
}
for Y = 0:maxY
{
FwdLUT[Y] = round(CDF[Y]*maxY/CDF[maxY]);   // FwdLUT is obtained by rounding and normalization
}
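For clarity, a floating-point Python sketch of this derivation is shown below. It is not the normative fixed-point process: the fixed-point shifts and the "+3" rounding offset above are simplified to slope = 2^(dQP/6), and the function name and defaults are illustrative assumptions.

def build_fwd_lut_model_c(scale, offset, dqp_min, dqp_max, bit_depth=10, full_range=False):
    max_y = (1 << bit_depth) - 1
    # dQP value per luma codeword, then the corresponding local slope 2^(dQP/6)
    slope = []
    for y in range(max_y + 1):
        dqp = min(max(scale * y + offset, dqp_min), dqp_max)
        slope.append(2.0 ** (dqp / 6.0))
    if not full_range:
        # zero the slope outside the standard range, per the pseudocode above
        # (10-bit: Y in [0:63] and [940:1023])
        for y in list(range(0, 64)) + list(range(940, max_y + 1)):
            slope[y] = 0.0
    # integrate the slope (CDF), then round and normalize to the output range
    cdf = [slope[0]]
    for y in range(max_y):
        cdf.append(cdf[-1] + slope[y])
    return [round(c * max_y / cdf[max_y]) for c in cdf]

fwd_lut = build_fwd_lut_model_c(scale=0.015, offset=-7.5, dqp_min=-3, dqp_max=6)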
In another embodiment, as depicted in Table 10, the forward shaping function may be represented as a set of luma pivot points (in_Y) and their corresponding codewords (out_Y). To simplify the encoding, the input luma range is described using a piecewise-linear representation consisting of a starting pivot followed by equally spaced subsequent pivots. An example of a forward shaping function for 10-bit input data is depicted in FIG. 7.
Table 10: example syntax for pivot-based representation of shaping function (model D)
In an embodiment, the parameters in table 10 may be defined as follows:
full_range_input_flag specifies the range of the input video signal. A full_range_input_flag of 0 corresponds to a standard range input video signal. A full_range_input_flag of 1 corresponds to a full-range input video signal. When full_range_input_flag does not exist, it is inferred to be 0.
bin_pivot_start specifies the pivot value of the first equal length interval (710). When full_range_input_flag is equal to 0, bin_pivot_start should be greater than or equal to the minimum standard range input and should be less than the maximum standard range input. (e.g., for a 10 bit SDR input, bin_pivot_start (710) should be between 64 and 940).
bin_cw_start specifies a mapped value (715) of bin_pivot_start (710) (e.g., bin_cw_start=fwdlt [ bin_pivot_start ]).
log2_num_equal_bins_minus3 plus 3 specifies the number of equal-length intervals after the starting pivot (710). The variables NumEqualBins and NumTotalBins are defined by:
NumEqualBins=1<<(log2_num_equal_bins_minus3+3)
if full_range_input_flag= 0
NumTotalBins=NumEqualBins+4
Otherwise
NumTotalBins=NumEqualBins+2
Note that: experimental results indicate that most forward shaping functions can be represented using eight equal length segments; however, a complex shaping function may require more segments (e.g., 16 segments or more).
equal_bin_pivot_delta specifies the length of each equal-length interval (e.g., 720-1, 720-N). NumEqualBins * equal_bin_pivot_delta should be less than or equal to the valid input range. (For example, if full_range_input_flag is 0, then for a 10-bit input the valid input range is 940 - 64 = 876; if full_range_input_flag is 1, the valid input range is 0 to 1023 for a 10-bit input.)
bin_cw_in_first_equal_bin specifies the number of mapped codewords (725) in the first equal-length interval (720-1).
bin_cw_delta_abs_prec_minus4 plus 4 specifies the number of bits used to represent bin_cw_delta_abs[i] for each subsequent equal-length interval.
bin_cw_delta_abs[i] specifies the absolute value of bin_cw_delta[i] for each subsequent equal-length interval. bin_cw_delta[i] (e.g., 735) is the difference between the number of codewords (e.g., 740) in the current equal-length interval i (e.g., 720-N) and the number of codewords (e.g., 730) in the previous equal-length interval i-1.
bin_cw_delta_sign[i] specifies the sign of bin_cw_delta_abs[i]. When bin_cw_delta_abs[i] is equal to 0, bin_cw_delta_sign[i] is not signaled and is inferred to be 0. The variable bin_cw_delta[i] = (1 - 2*bin_cw_delta_sign[i]) * bin_cw_delta_abs[i].
Decoding process for model D
Given the syntax elements of Table 10, for a 10-bit input, the shaping LUT may be derived as follows. Define the constants:
minIN = minOUT = 0;
maxIN = maxOUT = 2^BD - 1 = 1023 for the 10-bit case   // BD = bit depth
minStdIN = 64 for the 10-bit case
maxStdIN = 940 for the 10-bit case
Step 1: For j = 0 to NumTotalBins, obtain the pivot values in_Y[j]
Step 2: For j = 0 to NumTotalBins, obtain the mapped values out_Y[j]
Step 3: linear interpolation to obtain all LUT entries
Initialization of FwdLUT
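Because the pivot (in_Y[j]) and mapped-value (out_Y[j]) derivations of steps 1 and 2 are given in the tables above (not reproduced here), the Python sketch below assumes those arrays are already available and only illustrates step 3, the linear interpolation that fills the forward LUT; the example pivot values are made up for illustration.

def interpolate_fwd_lut(in_y, out_y, max_in=1023):
    # Step 3: fill FwdLUT by linear interpolation between consecutive pivots.
    fwd_lut = [0] * (max_in + 1)
    for j in range(len(in_y) - 1):
        x0, x1 = in_y[j], in_y[j + 1]
        y0, y1 = out_y[j], out_y[j + 1]
        for x in range(x0, x1 + 1):
            t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
            fwd_lut[x] = int(round(y0 + t * (y1 - y0)))
    return fwd_lut

# Illustrative pivots for a standard-range 10-bit input (NumEqualBins = 8,
# bin_pivot_start = 96, equal_bin_pivot_delta = 105); the values are assumptions.
in_y  = [0, 64, 96, 201, 306, 411, 516, 621, 726, 831, 936, 940, 1023]
out_y = [0,  0, 30, 140, 250, 360, 470, 580, 690, 800, 905, 910, 1023]
fwd_lut = interpolate_fwd_lut(in_y, out_y)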
In general, shaping can be turned on or off for each slice. For example, shaping may be enabled for intra slices only and disabled for inter slices. In another example, shaping may be disabled for the inter slices with the highest temporal level. (Note: as used herein, temporal sub-layers may match the definition of temporal sub-layers in HEVC.) In defining the shaper model, in one example the shaper model may be signaled only in the SPS, while in another example the slice shaper model may be signaled in intra slices. Alternatively, the shaper model may be signaled in the SPS and all slices may be allowed to update the SPS shaper model, or only intra slices may be allowed to update the SPS shaper model. For inter slices following an intra slice, either the SPS shaper model or the intra-slice shaper model may be applied.
As another example, FIGS. 5A and 5B depict a shaping function reconstruction process in a decoder according to an embodiment. The process uses the method described herein and in reference [3], wherein the visual rating scale is [0, 5].
As shown in FIG. 5A, first (step 510), the decoder extracts the reshape_model_profile_type variable and sets an appropriate initial profile for each band (steps 515, 520, and 525). For example, in pseudocode:
if (reshape_model_profile_type == 0), R[b_i] = R_bright[b_i];
else if (reshape_model_profile_type == 1), R[b_i] = R_dark[b_i];
else R[b_i] = R_mid[b_i].
In step 530, the decoder uses the received reshape_model_band_profile_delta[b_i] values to adjust each band profile, as follows:
for (i = 0 : reshape_model_num_band - 1)
{ R[b_i] = R[b_i] + reshape_model_band_profile_delta[b_i] }.
In step 535, the decoder propagates the adjusted values to each bin profile, as follows: if bin[j] belongs to band b_i, then R_bin[j] = R[b_i].
In step 540, the interval distribution is modified as follows:
if (j > reshape_model_max_bin_idx) or (j < reshape_model_min_bin_idx),
then { R_bin[j] = 0 }.
In parallel, in steps 545 and 550, the decoder may extract parameters to calculate scaling factor values and candidate codewords for each bin [ j ], as follows:
ScaleFactor=1.0–0.05*reshape_model_scale_idx
CW_dft[j] = number of codewords in bin j when default shaping is used
CW_PQ[j]=TotalCW/TotalNumBins.
When calculating the ScaleFactor value, instead of using a scaling factor of 0.05, 1/16 = 0.0625 may be used for a fixed-point implementation.
Continuing with fig. 5B, in step 560, the decoder starts pre-assignment of Codewords (CWs) for each interval based on the interval distribution, as follows:
if R_bin[j] = 0, then CW[j] = 0;
if R_bin[j] = 1, then CW[j] = CW_dft[j]/2;
if R_bin[j] = 2, then CW[j] = min(CW_PQ[j], CW_dft[j]);
if R_bin[j] = 3, then CW[j] = (CW_PQ[j] + CW_dft[j])/2;
if R_bin[j] >= 4, then CW[j] = max(CW_PQ[j], CW_dft[j]);
In step 565, the total number of used codewords is calculated and the codeword (CW) assignment is refined/completed, as follows: CW_used = Sum(CW[j]);
if CW_used > TotalCW, rebalance CW[j] = CW[j]/(CW_used/TotalCW);
otherwise
{
CW_remain = TotalCW - CW_used;
CW_remain is assigned to the bins with the largest R_bin[j];
}
Finally, in step 565, the decoder: a) generates a forward shaping function (e.g., FwdLUT) by accumulating the CW[j] values, b) multiplies the ScaleFactor value with the FwdLUT values to form the final FwdLUT (FFwdLUT), and c) generates an inverse shaping function InvLUT based on the FFwdLUT.
In a fixed point implementation, the computation of the ScaleFactor and FFwdLUT may be expressed as:
ScaleFactor=(1<<SF_PREC)-reshape_model_scale_idx
FFwdLUT=(FwdLUT*ScaleFactor+(1<<(FP_PREC+SF_PREC-1)))>>(FP_PREC+SF_PREC),
where SF_PREC and FP_PREC are predefined precision-related variables (e.g., SF_PREC = 4 and FP_PREC = 14), "c = a << n" denotes a binary left shift of a by n bits (i.e., c = a * 2^n), and "c = a >> n" denotes a binary right shift of a by n bits (i.e., c = a / 2^n).
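A Python sketch of the codeword pre-assignment and refinement of steps 560 and 565 is given below; R_bin[], CW_dft[], CW_PQ[], TotalCW, and the final ScaleFactor multiplication follow the description above, while the function names and the choice of giving all remaining codewords to a single bin are illustrative assumptions.

def assign_codewords(r_bin, cw_dft, cw_pq, total_cw):
    # Step 560: pre-assign codewords per bin based on the bin profile R_bin[j].
    cw = []
    for j, r in enumerate(r_bin):
        if r == 0:
            cw.append(0.0)
        elif r == 1:
            cw.append(cw_dft[j] / 2.0)
        elif r == 2:
            cw.append(min(cw_pq[j], cw_dft[j]))
        elif r == 3:
            cw.append((cw_pq[j] + cw_dft[j]) / 2.0)
        else:  # r >= 4
            cw.append(max(cw_pq[j], cw_dft[j]))
    # Step 565: refine so that the total codeword budget TotalCW is respected.
    cw_used = sum(cw)
    if cw_used > total_cw:
        cw = [c / (cw_used / total_cw) for c in cw]
    else:
        j_max = max(range(len(cw)), key=lambda j: r_bin[j])
        cw[j_max] += total_cw - cw_used   # assign CW_remain to the largest-profile bin
    return cw

def build_ffwd_lut(cw, scale_factor):
    # Accumulate per-bin codewords into forward shaping pivot values, then apply ScaleFactor.
    fwd = [0.0]
    for c in cw:
        fwd.append(fwd[-1] + c)
    return [scale_factor * v for v in fwd]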
Chroma QP derivation
The chroma coding performance is closely related to the luma coding performance. For example, in AVC and HEVC, a table is defined to specify the relationship between the quantization parameters (QP) of the luma and chroma components. The specifications also allow the use of one or more chroma QP offsets for more flexibility in defining the QP relationship between luma and chroma. When shaping is used, the luma values are modified, and thus the relationship between luma and chroma may also be modified. In order to maintain and further improve coding efficiency under shaping, in an embodiment, a chroma QP offset is derived at the coding unit (CU) level based on the shaping curve. This operation needs to be performed at both the decoder and the encoder.
As used herein, the term "coding unit" (CU) refers to a coding block (e.g., a macroblock, etc.). For example, and without limitation, in HEVC, a CU is defined as "a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples."
In an embodiment, the chroma Quantization Parameter (QP) (chromaQP) value may be obtained as follows:
1) Based on the shaping curve, an equivalent luminance dQP map, dQPLUT, is obtained:
for CW=0:MAX_CW_VALUE-1
dQPLUT[CW]=-6*log2(slope[CW]);
where slope[CW] represents the slope of the forward shaping curve at each CW (codeword) point, and MAX_CW_VALUE is the maximum codeword value for a given bit depth, e.g., MAX_CW_VALUE = 1024 (2^10) for a 10-bit signal.
Then, for each Coding Unit (CU):
2) The average brightness of the coding units is calculated, denoted AvgY:
3) The chromaDQP value is calculated based on dQPLUT[], AvgY, the shaping architecture, the inverse shaping function Inv(), and the slice type, as shown in Table 11 below:
table 11: example chromaDQP values according to shaping architecture
4) chromaQP is calculated as:
chromaQP=QP_luma+chromaQPOffset+chromaDQP;
where chromaQPOffset represents the chroma QP offset and qp_luma represents the luma QP of the coding unit. Note that the value of the chroma QP offset may be different for each chroma component (e.g., cb and Cr), and the chroma QP offset value is transmitted to the decoder as part of the encoded bitstream.
In an embodiment, dQPLUT[] may be implemented as a predefined LUT. Assume that all codewords are divided into N intervals (e.g., N = 32), and that each interval contains M = MAX_CW_VALUE/N codewords (e.g., M = 1024/32 = 32). When new codewords are assigned to each interval, one can limit the number of codewords to 1 to 2*M, so dQPLUT[1 ... 2*M] can be pre-calculated and stored as a LUT. This approach avoids any floating-point calculations or approximations in fixed-point calculations, and can also save encoding/decoding time. For each interval, one fixed chromaQPOffset is used for all codewords in that interval. The dQP value is set equal to dQPLUT[L], where L is the number of codewords assigned to the interval, with 1 <= L <= 2*M.
The dQPLUT values may be pre-calculated as follows:
for i = 1:2*M
slope[i] = i/M;
dQPLUT[i] = -6*log2(slope[i]);
end
When calculating dQPLUT[x], different quantization schemes may be used to obtain integer QP values, such as round(), ceil(), floor(), or a mixture thereof. For example, a threshold TH may be set: if Y < TH, the dQP value is quantized using floor(); otherwise, when Y >= TH, the dQP value is quantized using ceil(). The use of such quantization schemes and the corresponding parameters may be predefined in the codec, or may be signaled in the bitstream for adaptation. An example syntax that allows mixing quantization schemes with a threshold, as discussed above, is as follows:
The quant_scheme_signal_table() function may be defined at different levels of the shaping syntax (e.g., sequence level, slice level, etc.) according to the adaptation needs.
In another embodiment, the chromaDQP value may be calculated by applying a scaling factor to the residual signal in each coding unit (or, more specifically, each transform unit). This scaling factor may be a luma-dependent value and may be calculated: a) numerically, for example, as the first derivative (slope) of the forward shaping LUT (see, for example, equation (6) in the next section), or b) as:
When dQP(x) is used to calculate Slope(x), dQP can maintain floating-point precision without integer quantization. Alternatively, various quantization schemes may be used to calculate a quantized integer dQP value. In some embodiments, such scaling may be performed at the pixel level rather than at the frame level, where each chroma residual may be scaled by a different scaling factor derived using the co-located luma prediction values of the chroma samples.
table 12: example chroma dQP values using scaling for hybrid in-loop shaping architecture
For example, if CSCALE_FP_PREC = 16:
Forward scaling: after generating the chrominance residual, before transforming and quantizing:
-C_Res=C_orig-C_pred
- C_Res_scaled = (C_Res*S + (1 << (CSCALE_FP_PREC-1))) >> CSCALE_FP_PREC
inverse scaling: after the chroma inverse quantization and inverse transformation, but prior to reconstruction:
- C_Res_inv = (C_Res_scaled << CSCALE_FP_PREC) / S
-C_Reco=C_Pred+C_Res_inv;
where S is s_cu or s_px.
Note: in Table 12, when Scu is calculated, the average luma (AvgY) of the frame is computed before applying inverse shaping. Alternatively, inverse shaping may be applied before computing the average luma, for example, Scu = SlopeLUT[Avg(Inv[Y])]. This alternative order of computation also applies to the values in Table 11; that is, computing Inv(AvgY) may be replaced with computing Avg(Inv[Y]). The latter approach may be considered more accurate, but increases computational complexity.
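A minimal fixed-point Python sketch of the chroma residual scaling above (with CSCALE_FP_PREC = 16) is shown below; S is the per-CU or per-pixel integer scale factor from Table 12, and the handling of negative residuals is simplified relative to an actual codec implementation.

CSCALE_FP_PREC = 16

def fwd_scale_chroma_residual(c_orig, c_pred, s):
    # Forward scaling: after generating the chroma residual, before transform/quantization.
    c_res = c_orig - c_pred
    return (c_res * s + (1 << (CSCALE_FP_PREC - 1))) >> CSCALE_FP_PREC

def inv_scale_chroma_residual(c_res_scaled, c_pred, s):
    # Inverse scaling: after inverse quantization/transform, before reconstruction.
    c_res_inv = (c_res_scaled << CSCALE_FP_PREC) // s
    return c_pred + c_res_inv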
Encoder optimization for shaping
This section discusses various techniques for improving the coding efficiency of an encoder by jointly optimizing the shaping parameters and the encoder parameters when shaping is part of the normative decoding process (as described in one of the three candidate architectures). In general, encoder optimization and shaping have their own limitations in addressing coding problems at different stages. In conventional imaging and coding systems there are two types of quantization: a) quantization of samples in the baseband signal (e.g., gamma or PQ coding), and b) transform-related quantization (part of compression). Shaping sits between the two. Picture-based shaping is typically updated on a picture basis and only maps sample values based on their brightness level, without taking any spatial information into account. In block-based codecs, such as HEVC, transform quantization (e.g., for luma) is applied within the spatial frame and can be spatially adjusted, so the encoder optimization method must apply the same parameter set to an entire frame containing samples with different luma values. As appreciated by the inventors and described herein, joint shaping and encoder optimization may further improve coding efficiency.
Inter/intra mode decision
In conventional coding, the inter/intra mode decision is based on calculating a distortion function (dfunc()) between the original samples and the predicted samples. Examples of such functions include the sum of squared errors (SSE), the sum of absolute differences (SAD), and the like. In an embodiment, such distortion metrics may be computed using shaped pixel values. For example, if the original dfunc() uses original_sample(i) and pred_sample(i), then, when shaping is applied, dfunc() may use the respective shaped values Fwd(original_sample(i)) and Fwd(pred_sample(i)). This approach allows for more accurate inter/intra mode decisions, thereby improving coding efficiency.
lumaDQP with shaping
In the JCT-VC HDR common test conditions (CTC) document (reference [6]), lumaDQP and chromaQPOffsets are two encoder settings used to modify the quantization (QP) parameters of the luma and chroma components to improve HDR coding efficiency. In the present invention, several new encoder algorithms are presented to further refine the original proposal. For each lumaDQP adaptation unit (e.g., a 64x64 CTU), a dQP value is calculated based on the average input luma value of the unit (e.g., as shown in Table 3 of reference [6]). The final quantization parameter QP for each coding unit within the lumaDQP adaptation unit should be adjusted by subtracting this dQP. The dQP mapping is configurable in the encoder input configuration and is denoted dQP_inp.
As discussed in references [6] and [7], in prior coding schemes the same lumaDQP LUT dQP_inp is used for both intra pictures and inter pictures. Intra pictures and inter pictures may have different properties and quality characteristics. In the present invention, it is proposed to adjust the lumaDQP settings based on the picture coding type. Thus, there are two dQP mappings in the encoder input configuration, which are configurable and are denoted dQP_inpIntra and dQP_inpInter.
As discussed previously, when using the in-loop intra shaping method, since shaping is not performed on inter pictures, it is important to apply appropriate lumaDQP settings to inter-coded pictures to achieve quality similar to that obtained if the inter pictures were shaped by the same shaper used for the intra pictures. In one embodiment, the lumaDQP setting for inter pictures should match the characteristics of the shaping curve used for intra pictures.
Let
Slope(x) = Fwd'(x) = (Fwd(x+dx) - Fwd(x-dx)) / (2dx),   (6)
denote the first derivative of the forward shaping function; then, in an embodiment, the automatically derived dQP_auto(x) values may be calculated as follows:
if Slope(x) = 0, then dQP_auto(x) = 0; otherwise
dQP_auto(x) = 6 log2(Slope(x)),   (7)
where dQP_auto(x) can be limited to a reasonable range, e.g., [-6, 6].
If lumaDQP is enabled for intra pictures with shaping (i.e., dQP_inpIntra is set externally), the lumaDQP for inter pictures should take this into account. In an embodiment, the final inter dQP_final may be calculated by adding the dQP_auto derived from the shaper (equation (7)) and the dQP_inpIntra setting used for intra pictures. In another embodiment, to take advantage of intra quality propagation, the dQP_final for inter pictures can be set to dQP_auto, or only a small increment (set via dQP_inpInter) may be added to dQP_auto.
In an embodiment, when shaping is enabled, the following general rule for setting the luminance dQP value may apply:
(1) A luma dQP map table (based on picture coding type) can be set independently for intra pictures and inter pictures;
(2) If the pictures within the coding loop are in the shaping domain (e.g., intra pictures in the in-loop intra shaping architecture, or all pictures in the out-of-loop shaping architecture), then the input luma-to-delta-QP mapping dQP_inp also needs to be converted to the shaping domain, dQP_rsp. That is,
dQP_rsp(x) = dQP_inp[Inv(x)].   (8)
(3) If the pictures within the coding loop are in the unshaped domain (e.g., inverse-shaped or never shaped, such as inter pictures in the in-loop intra shaping architecture or all pictures in the in-loop residual shaping architecture), then the input luma-to-delta-QP mapping does not need to be converted and can be used directly.
(4) The automatic inter delta QP derivation is only valid for the in-loop intra shaping architecture. The actual delta QP for an inter picture in this case is the sum of the automatically derived and the input values:
dQP_final[x] = dQP_inp[x] + dQP_auto[x],   (9)
and dQP_final[x] can be limited to a reasonable range, such as [-12, 12];
(5) The luminance to dQP map can be updated in each picture, or when the shaping LUT changes. The actual dQP adaptation (obtaining the corresponding dQP for quantization of a block from its average luminance value) can occur at the CU level (encoder configurable).
Table 13 summarizes the dQP settings for each of the three architectures proposed.
Table 13: dQP setting
Rate-distortion optimization (RDO)
In the JEM 6.0 software (reference [8]), when lumaDQP is enabled, RDO (rate-distortion optimization) uses pixel-based weighted distortion. The weight table is fixed, based on the luma value. In an embodiment, the weight table should be adjusted adaptively based on the lumaDQP settings computed as described in the previous section. Weights for the sum of squared errors (SSE) and the sum of absolute differences (SAD) are proposed as follows:
the weight calculated by equation (10 a) or equation (10 b) is based on the total weight of the final dQP, which includes both the input lumaDQP and the dQP derived from the forward shaping function. For example, based on equation (9), equation (10 a) can be written as:
The total weight can be separated into weights calculated by the input lumaDQP:
weights from shaping:
When the total weight is computed from the total dQP by first calculating the weight from shaping, accuracy is lost because an integer dQP_auto is obtained due to the clipping operation. Conversely, directly using the slope function to calculate the weight from shaping keeps higher precision in the weight, and is therefore preferable.
The weight derived from the input lumaDQP is denoted W_dQP. Let f'(x) denote the first derivative (or slope) of the forward shaping curve. In an embodiment, the total weight considers both the dQP value and the shape of the shaping curve, so the total weight value can be expressed as:
weight_total = Clip3(0.0, 30.0, W_dQP * f'(x)^2).   (11)
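A Python sketch of equation (11) is shown below; the mapping from the input lumaDQP to W_dQP (here 2^(dQP/3), a common SSE weighting) is stated as an assumption for illustration only.

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def total_rdo_weight(dqp_inp, fwd_slope):
    # W_dQP from the input lumaDQP (assumed SSE weighting), times the squared
    # slope f'(x) of the forward shaping curve, clipped per equation (11).
    w_dqp = 2.0 ** (dqp_inp / 3.0)
    return clip3(0.0, 30.0, w_dqp * fwd_slope ** 2)

print(total_rdo_weight(dqp_inp=3, fwd_slope=1.2))  # 2.0 * 1.44 = 2.88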
a similar approach may also be applied to the chrominance components. For example, in an embodiment, dQP [ x ] may be defined according to Table 13 for chroma.
Interaction with other coding tools
This section provides several examples of changes that may be needed in other coding tools when shaping is enabled. There may be interactions with any existing or future coding tool to be included in a next-generation video coding standard; the examples given below are not limiting. In general, the domain of the video signal (shaped, unshaped, inverse-shaped) at each encoding step needs to be identified, and the operations processing the video signal at each step need to take the shaping effect into account.
Cross-component linear model prediction
In CCLM (cross-component linear model prediction) (reference [8]), the reconstructed luma signal rec_L'(i, j) can be used to obtain the predicted chroma samples pred_C(i, j):
pred_C(i, j) = alpha * rec_L'(i, j) + beta.   (12)
When shaping is enabled, in an embodiment, it may be necessary to determine whether the reconstructed luma signal is in the shaping domain (e.g., out-of-loop shaper or in-loop intra shaper) or in the unshaped domain (e.g., in-loop residual shaper). In one embodiment, the reconstructed luma signal may be used implicitly as-is, without any additional signaling or operation. In other embodiments, if the reconstructed signal is in the shaping domain, the reconstructed luma signal may be converted to the unshaped domain as follows:
pred_C(i, j) = alpha * Inv(rec_L'(i, j)) + beta.   (13)
In other embodiments, a bitstream syntax element may be added to signal which domain is desired (shaped or unshaped); this may be decided by the RDO process, or the decision may be derived from decoded information, thereby saving the overhead required for explicit signaling. A corresponding operation may then be performed on the reconstructed signal based on the decision.
Shaper with residual prediction tool
In the HEVC range extension profiles, a residual prediction tool is included. The chroma residual signal is predicted from the luma residual signal at the encoder side as:
Delta_r_C(x, y) = r_C(x, y) - (alpha * r'_L(x, y)) >> 3,   (14)
and the chroma residual signal is compensated at the decoder side as:
r'_C(x, y) = Delta_r'_C(x, y) + (alpha * r'_L(x, y)) >> 3,   (15)
where r_C denotes the chroma residual sample at position (x, y), r'_L denotes the reconstructed residual sample of the luma component, Delta_r_C denotes the prediction signal using inter-color prediction, Delta_r'_C denotes the reconstructed signal after encoding and decoding Delta_r_C, and r'_C denotes the reconstructed chroma residual.
When shaping is enabled, it may be necessary to consider which luma residual is used for chroma residual prediction. In one embodiment, the "residual" may be used as is (may be shaped or unshaped based on the shaper architecture). In another embodiment, the luminance residual may be forced in one domain (such as in an unshaped domain) and the appropriate mapping performed. In another embodiment, the appropriate processing may be obtained by the decoder, or may be explicitly signaled as described previously.
Shaper with adaptive clipping
Adaptive clipping (reference [8]) is a tool introduced to signal the original data range of the content dynamics and to perform adaptive clipping, instead of fixed clipping (based on the internal bit-depth information), at each step of the compression workflow where clipping occurs (e.g., in transform/quantization, loop filtering, and output). Let
T_clip = Clip_BD(T, bitdepth, C) = Clip3(min_C, max_C, T),   (16)
where x = Clip3(min, max, c) denotes the usual clipping operation, and
- C is the component ID (typically Y, Cb, or Cr)
- min_C is the lower clipping bound used in the current slice for component ID C
- max_C is the upper clipping bound used in the current slice for component ID C
When shaping is enabled, in an embodiment, it may be necessary to determine the domain in which the data currently lies and perform clipping accordingly. For example, if clipping is applied to data in the shaping domain, the original clipping bounds need to be translated to the shaping domain:
T_clip = Clip_BD(T, bitdepth, C)
= Clip3(Fwd(min_C), Fwd(max_C), T).   (17)
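A small Python sketch of equation (17): when the samples being clipped are in the shaping domain, the signaled clipping bounds (defined in the original domain) are first mapped through the forward shaping function.

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def adaptive_clip_reshaped(sample, min_c, max_c, fwd_lut):
    # min_c / max_c are the original-domain bounds for the current component;
    # fwd_lut is the forward shaping LUT.
    return clip3(fwd_lut[min_c], fwd_lut[max_c], sample)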
In general, each clipping step needs to be handled correctly with respect to the shaping architecture.
Shaper and loop filtering
In HEVC and JEM 6.0 software, loop filters such as ALF and SAO require the use of reconstructed luma samples and uncompressed "raw" luma samples to estimate the optimal filter parameters. When shaping is enabled, in an embodiment, the domain for which filter optimization is desired to be performed may be specified (explicitly or implicitly). In one embodiment, filter parameters over the shaping domain (relative to the shaped original when the reconstruction is in the shaping domain) may be estimated. In other embodiments, filter parameters over the unshaped domain may be estimated (relative to the original when the reconstruction is in the unshaped domain or in the reverse-shaped domain).
For example, depending on the in-loop shaping architecture, in-loop filter optimization (ILFOPT) options and operations may be described by table 14 and table 15.
TABLE 14 Loop Filter optimization in intra-only and hybrid in-loop shaping architecture
TABLE 15 Loop Filter optimization in-loop residual shaping architecture
Although most of the detailed discussion herein relates to methods performed on the luma component, those skilled in the art will appreciate that similar methods may be performed on the chroma color component and chroma related parameters such as chromaQPOffset (e.g., see reference [9 ]).
In-loop shaping and region of interest (ROI)
Given an image, the term 'region of interest' (ROI), as used herein, represents an image region that is considered to be of particular interest. In this section, a novel embodiment is presented that only supports in-loop shaping of the region of interest. That is, in an embodiment, shaping may be applied only inside the ROI, not outside. In another embodiment, different shaping curves may be applied in and outside the region of interest.
The use of ROIs is driven by the need to balance bit rate against image quality. Consider, for example, a sunset video sequence. In the upper half of the image, the sun may be positioned against a sky of relatively uniform color (so pixels in the sky background can have very low variance). In contrast, the lower half of the image may depict moving waves. From the viewer's perspective, the upper portion may be considered far more important than the lower portion. On the other hand, since the variance of the moving-wave pixels is large, the waves are difficult to compress and require more bits per pixel; nevertheless, it may be desirable to allocate more bits to the sun portion than to the wave portion. In this case, the upper half may be designated as a region of interest.
ROI description
Today, most codecs (e.g., AVC, HEVC, etc.) are block-based. To simplify the embodiment, a region may be specified in units of blocks. Using HEVC as an example, and without limitation, a region may be defined as multiple coding units (CUs) or coding tree units (CTUs). One ROI or multiple ROIs may be specified. Multiple ROIs may be distinct or overlapping, and an ROI is not necessarily rectangular. The syntax for the ROI may be provided at any level of interest, such as the slice level, picture level, video stream level, etc. In an embodiment, the ROI is first specified in a sequence parameter set (SPS); then, in the slice header, small ROI changes may be allowed. Table 16 depicts an example of syntax in which one ROI is specified as a set of CTUs in a rectangular region. Table 17 describes the syntax for modifying the ROI at the slice level.
Table 16: SPS syntax for ROI
Table 17: slice header syntax for ROI
sps_reshaper_active_ROI_flag equal to 1 specifies that an ROI is present in the Coded Video Sequence (CVS). sps_reshaper_active_ROI_flag equal to 0 specifies that no ROI is present in the CVS.
reshaper_active_ROI_in_CTUsize_left, reshaper_active_ROI_in_CTUsize_right, reshaper_active_ROI_in_CTUsize_top, and reshaper_active_ROI_in_CTUsize_bottom each specify the picture samples in the ROI in terms of a rectangular region specified in picture coordinates. For the left and top, the coordinate equals offset * CTUsize, and for the right and bottom, the coordinate equals offset * CTUsize - 1. reshape_model_ROI_modification_flag equal to 1 specifies that the ROI is modified in the current slice. reshape_model_ROI_modification_flag equal to 0 specifies that the ROI is not modified in the current slice.
reshaper_ROI_mod_offset_left, reshaper_ROI_mod_offset_right, reshaper_ROI_mod_offset_top, and reshaper_ROI_mod_offset_bottom each specify the left/right/top/bottom offset values that are applied to reshaper_active_ROI_in_CTUsize_left, reshaper_active_ROI_in_CTUsize_right, reshaper_active_ROI_in_CTUsize_top, and reshaper_active_ROI_in_CTUsize_bottom, respectively.
For multiple ROIs, the single-ROI example syntax of Tables 16 and 17 can be extended using an index (or ID) for each ROI, similar to the scheme used in HEVC to define multiple pan-scan rectangles using SEI messages (see the HEVC specification, reference [11], Section D.2.4).
ROI processing in in-loop intra-only shaping
For intra-only shaping, the ROI portion of the picture is first shaped and then encoded. Since shaping is applied only to the ROI, the boundary between the ROI and non-ROI portions of the picture may be seen. Since loop filters (e.g., 270 in fig. 2C or fig. 2D) may cross boundaries, special care must be taken with the ROI to perform loop filter optimization (ILFOPT). In an embodiment it is proposed that the loop filter is applied only if the entire decoded picture is in the same domain. I.e. the whole picture is either entirely in the reshaped domain or entirely in the unshaped domain. In one embodiment, on the decoder side, if loop filtering is applied over the unshaped domain, inverse shaping should first be applied to the ROI portion of the decoded picture, and then the loop filter is applied. Next, the decoded picture is stored into the DPB. In another embodiment, if the loop filter is applied over the shaping domain, the shaping should be applied first to the non-ROI portion of the decoded picture, and then the loop filter is applied, and then the entire picture is inverse shaped. Next, the decoded picture is stored into the DPB. In yet another embodiment, if loop filtering is applied over the shaping domain, the ROI portion of the decoded picture may be first reverse shaped, then the entire picture is shaped, then the loop filter is applied, and then the entire picture is reverse shaped. Next, the decoded picture is stored into the DPB. These three methods are summarized in table 18. From a computational point of view, method "a" is simpler. In an embodiment, the enablement of the ROI may be used to specify the order in which reverse shaping and Loop Filtering (LF) are performed. For example, if the ROI is actively used (e.g., SPS syntax flag = true), the LF is performed (block 270 in fig. 2C and 2D) after the reverse shaping (block 265 in fig. 2C and 2D). If the ROI is not actively used, the LF is performed before the reverse shaping.
TABLE 18 Loop Filter (LF) options Using ROI
ROI processing in in-loop prediction residual shaping
For an in-loop (prediction) residual shaping architecture (see, e.g., 200c_d in fig. 2F), at the decoder, using equation (3), the process can be expressed as:
if (the current CTU belongs to the ROI)
    Reco_sample = Inv(Res_d + Fwd(Pred_sample))   (see equation (3))
else
    Reco_sample = Res_d + Pred_sample
end
ROI and encoder considerations
In the encoder, it is necessary to check whether each CTU belongs to the ROI. For example, for in-loop prediction residual shaping, a simple check based on equation (3) may proceed as follows:
if (the current CTU belongs to the ROI)
    apply weighted distortion to luma in the RDO; the weight is based on equation (10)
else
    apply unweighted distortion to luma in the RDO
end
An example encoding workflow that considers the ROI during shaping may include the steps of:
-for intra pictures:
applying forward shaping to ROI areas of original pictures
-encoding intra frames
-applying inverse shaping to the ROI-areas of the reconstructed picture before the Loop Filter (LF)
Loop filtering is performed in the shaping domain as follows (see, e.g., method "C" in Table 18), comprising the steps of:
applying forward shaping to non-ROI areas of the original picture (so that the whole original picture is shaped for loop filter reference)
Applying forward shaping to the entire picture region of the reconstructed picture
Deriving loop filter parameters and applying loop filtering
Applying inverse shaping to the entire picture region of the reconstructed picture and storing it in the DPB. On the encoder side, since the LF needs uncompressed reference pictures for filter parameter estimation, the processing of the LF reference for each method is as shown in Table 19:
TABLE 19 treatment of LF references for ROI
-for inter pictures:
-applying prediction residual shaping and weighted distortion to the luminance for each CU within the ROI when encoding the inter frame; for each CU outside the ROI, no shaping is applied
Loop filter optimization (option 1) is performed as before (as if ROI were not used):
forward shaping of the entire picture region of the original picture
Forward shaping of the entire picture region of the reconstructed picture
Deriving loop filter parameters and applying loop filtering
Applying inverse shaping to the entire picture region of the reconstructed picture and storing it in the DPB
Shaping HLG encoded content
The term hybrid log-gamma, or HLG, denotes another transfer function defined in Rec. BT.2100 for mapping high dynamic range signals. HLG was developed to maintain backward compatibility with conventional standard dynamic range signals encoded using conventional gamma functions. When comparing the codeword distributions of PQ-encoded content and HLG-encoded content, the PQ mapping tends to allocate more codewords in the dark and bright areas, while most HLG content codewords appear to be allocated in the mid-range. Two methods can be used for HLG luma shaping. In one embodiment, the HLG content may simply be converted to PQ content, and then all the PQ-related shaping techniques discussed previously may be applied. For example, the following steps may be applied:
1) Map the HLG luma (e.g., Y) to PQ luma. Let the conversion function or LUT be denoted HLG2PQLUT(Y).
2) Analyze the PQ luma values and obtain a PQ-based forward shaping function or LUT, denoted PQAdpFLUT(Y).
3) Combine the two functions or LUTs into a single function or LUT: HLGAdpFLUT[i] = PQAdpFLUT[HLG2PQLUT[i]].
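As the Python sketch below shows, step 3 amounts to a simple LUT composition (function and variable names are illustrative).

def compose_hlg_forward_lut(hlg2pq_lut, pq_adp_flut):
    # HLGAdpFLUT[i] = PQAdpFLUT[HLG2PQLUT[i]]
    return [pq_adp_flut[v] for v in hlg2pq_lut]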
This approach may yield suboptimal shaping results because the HLG codeword distribution is quite different from the PQ codeword distribution. In another embodiment, the HLG shaping function is derived directly from HLG samples. The same framework as for the PQ signal may be applied, but the CW_bins_dft table is changed to reflect the characteristics of the HLG signal. In an embodiment, based on the tone distribution of HLG signals, several CW_bins_dft tables can be designed according to user preference. For example, when it is preferable to preserve the highlights, for alpha = 1.4,
g_DftHLGCWBin0={8,14,17,19,21,23,24,26,27,28,29,30,31,32,33,34,35,36,36,37,38,39,39,40,41,41,42,43,43,44,44,30}。
When it is preferable to preserve the mid-tones (or mid-range):
g_DftHLGCWBin1={12,16,16,20,24,28,32,32,32,32,36,36,40,44,48,52,56,52,48,44,40,36,36,32,32,32,26,26,20,16,16,12}。
when it is preferable to preserve skin tone:
g_DftHLGCWBin2={12,16,16,24,28,32,56,64,64,64,64,56,48,40,32,32,32,32,32,32,28,28,24,24,20,20,20,20,20,16,16,12};
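The sketch below shows one way these tables could be used: select a table according to the user preference and expand the per-bin codeword budgets into a forward shaping LUT. The expansion itself (linear interpolation inside each of the 32 equal-width input bins, 10-bit codewords, final clipping) is an assumption that follows the PQ-based framework described earlier, not a normative procedure.

import numpy as np

# Per-bin codeword tables from above, keyed by an illustrative preference label.
HLG_CW_TABLES = {
    "highlights": [8, 14, 17, 19, 21, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                   35, 36, 36, 37, 38, 39, 39, 40, 41, 41, 42, 43, 43, 44, 44, 30],
    "mid_range":  [12, 16, 16, 20, 24, 28, 32, 32, 32, 32, 36, 36, 40, 44, 48, 52,
                   56, 52, 48, 44, 40, 36, 36, 32, 32, 32, 26, 26, 20, 16, 16, 12],
    "skin_tone":  [12, 16, 16, 24, 28, 32, 56, 64, 64, 64, 64, 56, 48, 40, 32, 32,
                   32, 32, 32, 32, 28, 28, 24, 24, 20, 20, 20, 20, 20, 16, 16, 12],
}

def build_hlg_forward_lut(preference, bit_depth=10):
    """Expand a 32-entry per-bin codeword table into a forward shaping LUT."""
    cw = np.asarray(HLG_CW_TABLES[preference], dtype=np.float64)
    num_bins = cw.size
    bin_size = (1 << bit_depth) // num_bins          # 32 input codewords per bin
    starts = np.concatenate(([0.0], np.cumsum(cw)[:-1]))
    lut = np.empty(1 << bit_depth, dtype=np.int32)
    for b in range(num_bins):
        x = np.arange(bin_size, dtype=np.float64)
        lut[b * bin_size:(b + 1) * bin_size] = np.round(starts[b] + cw[b] * x / bin_size)
    # Guard against tables whose codeword budget slightly exceeds 2**bit_depth.
    return np.clip(lut, 0, (1 << bit_depth) - 1)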
From the viewpoint of the bitstream syntax, in order to distinguish between PQ-based shaping and HLG-based shaping, a new parameter denoted sps_reshaper_signal_type is added, whose value indicates the type of the signal being shaped (e.g., 0 for a gamma-based SDR signal, 1 for a PQ-encoded signal, and 2 for an HLG-encoded signal).
Examples of syntax tables for HDR shaping in the SPS and in slice headers, covering both PQ and HLG, are shown in Tables 20 and 21, together with all of the features discussed previously (e.g., ROI, in-loop filter optimization (ILFOPT), and chroma dQP adjustment).
Table 20: example SPS syntax for shaping
sps_in_loop_filter_opt_flag equal to 1 specifies that in-loop filter optimization is performed in the shaping domain in the coded video sequence (CVS).
sps_in_loop_filter_opt_flag equal to 0 specifies that in-loop filter optimization is performed in the unshaped domain in the CVS. sps_luma_based_chroma_qp_offset_flag equal to 1 specifies that a luma-based chroma QP offset is derived (e.g., according to Table 11 or Table 12) and applied to the chroma coding of each CU in the CVS. sps_luma_based_chroma_qp_offset_flag equal to 0 specifies that the luma-based chroma QP offset is not enabled in the CVS.
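As an illustration of how a decoder might read these fields, here is a hedged Python sketch. The bit-field widths, the enabling flag name (sps_reshaper_enable_flag), and the exact field order are assumptions, since Table 20 itself is not reproduced here; only the semantics described above are taken from the text.

from dataclasses import dataclass

@dataclass
class ReshaperSpsParams:
    sps_reshaper_enable_flag: bool
    sps_reshaper_signal_type: int        # 0: gamma SDR, 1: PQ, 2: HLG
    sps_in_loop_filter_opt_flag: bool    # 1: in-loop filter optimization in the shaping domain
    sps_luma_based_chroma_qp_offset_flag: bool

def parse_reshaper_sps(reader):
    """Read the reshaper-related SPS fields with a hypothetical bit reader
    exposing reader.u(n) for an n-bit unsigned field."""
    enable = bool(reader.u(1))
    signal_type = reader.u(2) if enable else 0
    ilf_opt_flag = bool(reader.u(1)) if enable else False
    luma_chroma_qp_flag = bool(reader.u(1)) if enable else False
    return ReshaperSpsParams(enable, signal_type, ilf_opt_flag, luma_chroma_qp_flag)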
Table 21: example syntax for shaping at the slice level
Improving chroma quality
HLG-based coding is considered to provide better backward compatibility with SDR signals. Thus, in theory, HLG-encoded signals may employ the same coding settings as conventional SDR signals. However, when viewing an HLG-encoded signal in HDR mode, some color artifacts can still be observed, especially in achromatic areas (such as whites and grays). In an embodiment, such artifacts may be reduced by adjusting the chromaQPOffset values during encoding. It is suggested that, for HLG content, less aggressive chromaQP adjustments are applied than those used when encoding PQ signals. For example, reference [10] describes a model that assigns QP offsets for Cb and Cr based on the luma QP and on factors determined by the capture color primaries and the representation color primaries:
QPoffsetCb = Clip3(-12, 0, Round(c_cb * (k*QP + l))),    (18a)
QPoffsetCr = Clip3(-12, 0, Round(c_cr * (k*QP + l))),    (18b)
where c_cb = 1 if the capture color primaries are the same as the representation color primaries, c_cb = 1.04 if the capture color primaries are the P3D65 primaries and the representation color primaries are the Rec. ITU-R BT.2020 primaries, and c_cb = 1.14 if the capture color primaries are the Rec. ITU-R BT.709 primaries and the representation primaries are the Rec. ITU-R BT.2020 primaries. Similarly, c_cr = 1 if the capture color primaries are the same as the representation color primaries, c_cr = 1.39 if the capture color primaries are the P3D65 primaries and the representation color primaries are the Rec. ITU-R BT.2020 primaries, and c_cr = 1.78 if the capture color primaries are the Rec. ITU-R BT.709 primaries and the representation primaries are the Rec. ITU-R BT.2020 primaries. Finally, k = -0.46 and l = 0.26.
In an embodiment, it is suggested to use the same model but with different parameters to produce less aggressive chromaQPOffset changes. For example, without limitation, in an embodiment, for Cb, in equation (18a), c_cb = 1, k = -0.2, and l = 7, and for Cr, in equation (18b), c_cr = 1, k = -0.2, and l = 7. Figs. 6A and 6B depict examples of how the chromaQPOffset value changes as a function of the luma quantization parameter (QP) for PQ (Rec. 709) and HLG content. The variation of the PQ-related values is more pronounced than that of the HLG-related values. Fig. 6A corresponds to Cb (equation (18a)), and Fig. 6B corresponds to Cr (equation (18b)).
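The following minimal sketch evaluates equations (18a) and (18b) with the Rec. 709-in-BT.2020 parameters from reference [10] and with the less aggressive parameters suggested above for HLG; the luma QP of 40 is just an example value.

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def chroma_qp_offset(luma_qp, c, k, l):
    """QPoffset = Clip3(-12, 0, Round(c * (k*QP + l)))  -- equations (18a)/(18b)."""
    return clip3(-12, 0, round(c * (k * luma_qp + l)))

luma_qp = 40

# PQ content, Rec. 709 capture primaries in a BT.2020 container (reference [10]):
pq_offset_cb = chroma_qp_offset(luma_qp, c=1.14, k=-0.46, l=0.26)   # clipped to -12
pq_offset_cr = chroma_qp_offset(luma_qp, c=1.78, k=-0.46, l=0.26)   # clipped to -12

# Less aggressive settings suggested above for HLG content (Cb and Cr alike):
hlg_offset_cb = chroma_qp_offset(luma_qp, c=1.0, k=-0.2, l=7)       # -1
hlg_offset_cr = chroma_qp_offset(luma_qp, c=1.0, k=-0.2, l=7)       # -1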
References
Each of the references listed herein is incorporated by reference in its entirety.
[1] G-M. Su, "In-Loop Block-Based Image Reshaping in High Dynamic Range Video Coding," PCT Application PCT/US2016/025082, filed Mar. 30, 2016, also published as WO 2016/164235.
[2] Baylon, Z. Gu, A. Luthra, K. Minoo, P. Yin, F. Pu, T. Lu, T. Chen, W. Husak, Y. He, L. Kerofsky, Y. Ye, B. Yi, "Response to Call for Evidence for HDR and WCG Video Coding: Arris, Dolby and InterDigital," document m36264, Warsaw, Poland, July 2015.
[3] T. Lu et al., "Content-Adaptive Reshaping for High Codeword Representation Images," U.S. Patent Application 15/410,563, filed Jan. 19, 2017.
[4] P. Yin et al., "Signal Reshaping and Coding for HDR and Wide Color Gamut Signals," PCT Application PCT/US2016/042229, filed Jul. 14, 2016, also published as WO 2017/01636.
[5] Minoo et al., "Exploratory Test Model for HDR Extension of HEVC," MPEG output document, JCTVC-W0092 (m37732), San Diego, USA, 2016.
[6] E. Francois, J. Sole, J. Ström, P. Yin, "Common Test Conditions for HDR/WCG Video Coding Experiments," JCTVC document Z1020, Geneva, January 2017.
[7] Segall, E. Francois, and D. Rusanovskyy, "JVET Common Test Conditions and Evaluation Procedures for HDR/WCG Video," JVET-E1020, Geneva, January 2017.
[8] JEM 6.0 software: https://jvet.hhi.fraunhofer.de/svn/svnHMJEMSoftware/tags/HM-16.6-JEM-6.0
[9] Lu et al., "Adaptive Chroma Quantization in Video Coding for Multiple Color Imaging Formats," U.S. Provisional Patent Application Ser. No. 62/406,483, filed Oct. 11, also filed as U.S. Patent Application Ser. No. 15/728,939 and published as U.S. Patent Application Publication US 2018/0103253.
[10] J. Samuelsson et al., "Conversion and Coding Practices for HDR/WCG Y'CbCr 4:2:0 Video with PQ Transfer Characteristics," JCTVC-Y1017, Chengdu, October 2016.
[11] ITU-T H.265, "High Efficiency Video Coding," ITU, version 4.0, Dec. 2016.
Example computer System embodiment
Embodiments of the invention may be implemented using a computer system, a system configured with electronic circuits and components, an Integrated Circuit (IC) device such as a microcontroller, a Field Programmable Gate Array (FPGA), or other configurable or Programmable Logic Device (PLD), a discrete-time or Digital Signal Processor (DSP), an application-specific IC (ASIC), and/or an apparatus including one or more such systems, devices, or components. The computer and/or IC may execute, control or carry out instructions related to integrated signal shaping and image encoding, such as those described herein. The computer and/or IC may calculate any of a variety of parameters or values related to the signal shaping and encoding processes described herein. Image and video embodiments may be implemented in hardware, software, firmware, and various combinations thereof.
Certain embodiments of the invention include a computer processor executing software instructions that cause the processor to perform the method of the invention. For example, one or more processors in a display, encoder, set-top box, transcoder, etc. may implement the methods related to integrated signal shaping and image coding as described above by executing software instructions in a program memory accessible to the processors. The present invention may also be provided in the form of a program product. The program product may comprise any non-transitory medium carrying a set of computer readable signals comprising instructions which, when executed by a data processor, cause the data processor to perform the method of the invention. The program product according to the invention may take any of a variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy disks, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, etc. The computer readable signal on the program product may optionally be compressed or encrypted.
Where a component (e.g., a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, references to such component (including references to "means") are to be interpreted as including as equivalents of any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated exemplary embodiments of the invention.
Equivalents, extensions, alternatives
Example embodiments related to efficient integrated signal shaping and image coding are thus described. In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Thus, no limitation, element, feature, characteristic, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (17)

1. A method for encoding an image with a processor, the method comprising:
accessing, with the processor, an input image in a first codeword representation;
generating a forward shaping function mapping pixels of the input image to a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation;
generating an inverse shaping function based on the forward shaping function, wherein the inverse shaping function maps pixels from the second codeword representation to the first codeword representation;
for an input pixel region in the input image:
calculating a prediction region based on pixel data in a reference frame buffer or in a previously encoded spatial neighborhood;
generating a shaped residual region based on the input pixel region, the prediction region, and the forward shaping function;
generating a quantized residual region based on the shaped residual region;
generating a dequantized residual region based on the quantized residual region;
generating a reconstructed pixel region based on the dequantized residual region, the prediction region, the forward shaping function, and the inverse shaping function, wherein generating the reconstructed pixel region comprises calculating:
Reco_sample(i) = Inv(Res_d(i) + Fwd(Pred_sample(i))), wherein Reco_sample(i) represents pixels of the reconstructed pixel region, Res_d(i) represents pixels of the dequantized residual region, Inv() represents the inverse shaping function, Fwd() represents the forward shaping function, and Pred_sample(i) represents pixels of the prediction region; and
a reference pixel region to be stored on the reference frame buffer is generated based on the reconstructed pixel region.
2. The method of claim 1, further comprising:
generating a shaper signaling bitstream, said shaper signaling bitstream characterizing said forward shaping function and/or said inverse shaping function; and
multiplexing the shaper signaling bitstream with an encoded bitstream generated based on the input image to generate an output bitstream.
3. The method of claim 1, wherein generating the quantized residual region comprises:
applying a forward encoding transform to the shaped residual region to generate transformed data; and
a forward coding quantizer is applied to the transformed data to generate quantized data.
4. The method of claim 3, wherein generating the dequantized residual region comprises:
applying an inverse coding quantizer to the quantized data to generate inverse quantized data; and
an inverse coding transform is applied to the inverse quantized data to generate the dequantized residual region.
5. The method of claim 1, wherein generating the reference pixel region to be stored on the reference frame buffer comprises applying a loop filter to the reconstructed pixel region.
6. The method of claim 1, wherein generating the shaped residual region comprises computing:
Res_r(i) = Fwd(Orig_sample(i)) - Fwd(Pred_sample(i)),
where Res_r(i) represents the pixels of the shaped residual region and Orig_sample(i) represents the pixels of the input pixel region.
7. The method of claim 6, wherein generating the shaped residual region comprises simplifying res_r (i) pixels by calculating:
Res_r(i)=a(Pred_sample(i))*(Orig_sample(i)-Pred_sample(i));
where a (pred_sample (i)) represents a scaling factor based on the value of pred_sample (i).
8. The method of claim 7, wherein generating the reconstructed pixel region comprises simplifying the Reco_sample(i) pixels by calculating:
Reco_sample(i) = Pred_sample(i) + (1/a(Pred_sample(i))) * Res_d(i).
9. The method of claim 1, wherein the input pixel region comprises an image region of interest.
10. The method of any of claims 1 to 9, further comprising a method for optimizing coding related decisions based on the forward shaping function, wherein the coding related decisions comprise one or more of: inter/intra mode decision, dQP optimization, rate distortion optimization, cross-component linear model prediction, residual prediction, adaptive clipping, or loop filtering.
11. A method for decoding an encoded bitstream with a processor to generate an output image in a first codeword representation, the method comprising:
receiving a portion of an encoded image in a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation;
receiving shaping information of the encoded image;
generating a forward shaping function mapping pixels from the first codeword representation to the second codeword representation based on the shaping information;
generating an inverse shaping function based on the shaping information, wherein the inverse shaping function maps pixels from the second codeword representation to the first codeword representation;
for a region of the encoded image:
generating a decoded shaped residual region;
generating a prediction region based on pixels in a reference pixel buffer or in a previously decoded spatial neighborhood;
generating a reconstructed pixel region based on the decoded shaped residual region, the prediction region, the forward shaping function, and the inverse shaping function, wherein generating the reconstructed pixel region comprises computing:
Reco_sample(i) = Inv(Res_d(i) + Fwd(Pred_sample(i))), wherein Reco_sample(i) represents pixels of the reconstructed pixel region, Res_d(i) represents pixels of the decoded shaped residual region, Inv() represents the inverse shaping function, Fwd() represents the forward shaping function, and Pred_sample(i) represents pixels of the prediction region;
generating an output pixel region of the output image based on the reconstructed pixel region; and
the output pixel region is stored in the reference pixel buffer.
12. The method of claim 11, wherein the region of the encoded image comprises an image region of interest.
13. A method for decoding an encoded bitstream with a processor to generate an output image in a first codeword representation, the method comprising:
receiving a portion of an encoded image in a second codeword representation, wherein the second codeword representation allows for more efficient compression than the first codeword representation;
receiving shaping information of the encoded image;
generating a shaping scaling function based on the shaping information;
for a region of the encoded image:
generating a decoded shaped residual region;
generating a prediction region based on pixels in a reference pixel buffer or in a previously decoded spatial neighborhood;
generating a reconstructed pixel region based on the decoded shaped residual region, the prediction region, and the shaping scaling function;
generating an output pixel region of the output image based on the reconstructed pixel region; and
the output pixel region is stored in the reference pixel buffer.
14. The method of claim 13, wherein generating the reconstructed pixel region comprises computing:
Reco_sample(i) = Pred_sample(i) + (1/a(Pred_sample(i))) * Res_d(i),
where Reco_sample(i) represents the pixels of the reconstructed pixel region, Res_d(i) represents the pixels of the decoded shaped residual region, a() represents the shaping scaling function, and Pred_sample(i) represents the pixels of the prediction region.
15. The method of claim 13, wherein the region of the encoded image comprises an image region of interest.
16. An apparatus for image shaping, comprising:
one or more processors; and
a memory having software instructions stored thereon that, when executed by the one or more processors, cause performance of the method recited in any one of claims 1 to 15.
17. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for performing the method of any one of claims 1 to 15.
CN201880012069.2A 2017-06-29 2018-06-29 Integrated image shaping and video coding Active CN110301134B (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
CN202410005772.8A CN117793377A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078791.9A CN116095315A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410015264.8A CN117793380A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410006213.9A CN117793379A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078787.2A CN116095313A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005914.0A CN117793378A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078790.4A CN116095314A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US201762526577P 2017-06-29 2017-06-29
US62/526,577 2017-06-29
US201762561561P 2017-09-21 2017-09-21
US62/561,561 2017-09-21
US201862629313P 2018-02-12 2018-02-12
US62/629,313 2018-02-12
US201862680710P 2018-06-05 2018-06-05
US62/680,710 2018-06-05
US201862686738P 2018-06-19 2018-06-19
US62/686,738 2018-06-19
PCT/US2018/040287 WO2019006300A1 (en) 2017-06-29 2018-06-29 Integrated image reshaping and video coding

Related Child Applications (7)

Application Number Title Priority Date Filing Date
CN202310078791.9A Division CN116095315A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005914.0A Division CN117793378A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078790.4A Division CN116095314A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410015264.8A Division CN117793380A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078787.2A Division CN116095313A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005772.8A Division CN117793377A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410006213.9A Division CN117793379A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding

Publications (2)

Publication Number Publication Date
CN110301134A CN110301134A (en) 2019-10-01
CN110301134B true CN110301134B (en) 2024-02-06

Family

ID=62976335

Family Applications (8)

Application Number Title Priority Date Filing Date
CN201880012069.2A Active CN110301134B (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005914.0A Pending CN117793378A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410015264.8A Pending CN117793380A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078790.4A Pending CN116095314A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078787.2A Pending CN116095313A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078791.9A Pending CN116095315A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005772.8A Pending CN117793377A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410006213.9A Pending CN117793379A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding

Family Applications After (7)

Application Number Title Priority Date Filing Date
CN202410005914.0A Pending CN117793378A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410015264.8A Pending CN117793380A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078790.4A Pending CN116095314A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078787.2A Pending CN116095313A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202310078791.9A Pending CN116095315A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410005772.8A Pending CN117793377A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding
CN202410006213.9A Pending CN117793379A (en) 2017-06-29 2018-06-29 Integrated image shaping and video coding

Country Status (8)

Country Link
US (3) US10992941B2 (en)
EP (2) EP4064701A1 (en)
JP (2) JP7164535B2 (en)
KR (2) KR20230135182A (en)
CN (8) CN110301134B (en)
BR (1) BR112019016885A2 (en)
RU (2) RU2727100C1 (en)
WO (1) WO2019006300A1 (en)

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10728549B2 (en) 2017-11-06 2020-07-28 Dolby Laboratories Licensing Corporation Adaptive loop filtering for high-dynamic range video
IL305463A (en) * 2018-02-14 2023-10-01 Dolby Laboratories Licensing Corp Image reshaping in video coding using rate distortion optimization
CN113016182B (en) * 2018-10-03 2023-07-11 杜比实验室特许公司 Reducing banding artifacts in backward compatible HDR imaging
CN112956199B (en) 2018-11-06 2023-07-28 北京字节跳动网络技术有限公司 Simplified parameter derivation for intra prediction
WO2020108591A1 (en) 2018-12-01 2020-06-04 Beijing Bytedance Network Technology Co., Ltd. Parameter derivation for intra prediction
BR112021010428A2 (en) 2018-12-07 2021-08-24 Beijing Bytedance Network Technology Co., Ltd. Method for processing video, apparatus in a video system, and computer program product
EP3700217A1 (en) * 2019-02-22 2020-08-26 InterDigital VC Holdings, Inc. Inverse mapping simplification
US11647188B2 (en) * 2019-01-04 2023-05-09 Interdigital Vc Holdings, Inc. Inverse mapping simplification
WO2020156534A1 (en) * 2019-02-01 2020-08-06 Beijing Bytedance Network Technology Co., Ltd. Interactions between in-loop reshaping and intra block copy
CN113383547A (en) * 2019-02-01 2021-09-10 北京字节跳动网络技术有限公司 Interaction between loop shaping and interframe coding and decoding tools
US11956475B2 (en) * 2019-02-06 2024-04-09 Qualcomm Incorporated Harmonization of prediction-domain filters with interpolation filtering
SG11202108209YA (en) * 2019-02-22 2021-08-30 Beijing Bytedance Network Technology Co Ltd Neighbouring sample selection for intra prediction
MX2021009894A (en) 2019-02-24 2022-05-18 Beijing Bytedance Network Tech Co Ltd Parameter derivation for intra prediction.
JP7418459B2 (en) 2019-02-27 2024-01-19 華為技術有限公司 Encoders, decoders and corresponding methods
WO2020175893A1 (en) * 2019-02-28 2020-09-03 엘지전자 주식회사 Aps signaling-based video or image coding
EP3915265A4 (en) 2019-03-01 2022-06-22 Beijing Bytedance Network Technology Co., Ltd. Direction-based prediction for intra block copy in video coding
WO2020177703A1 (en) * 2019-03-04 2020-09-10 Beijing Bytedance Network Technology Co., Ltd. Signaling of filtering information in video processing
WO2020180120A1 (en) * 2019-03-05 2020-09-10 엘지전자 주식회사 Method and device for image coding based on lmcs
KR20230004921A (en) * 2019-03-07 2023-01-06 엘지전자 주식회사 Video or image coding based on luma mapping with chroma scaling
CN113545044A (en) * 2019-03-08 2021-10-22 北京字节跳动网络技术有限公司 Shaping model in video processing
KR102491959B1 (en) * 2019-03-11 2023-01-27 엘지전자 주식회사 Video or picture coding based on luma mapping and chroma scaling
WO2020185984A1 (en) * 2019-03-13 2020-09-17 Interdigital Vc Holdings, Inc. In-loop reshaping adaptive reshaper direction
CN117499644A (en) * 2019-03-14 2024-02-02 北京字节跳动网络技术有限公司 Signaling and syntax of loop shaping information
CN113632462B (en) 2019-03-23 2023-08-22 北京字节跳动网络技术有限公司 Default in-loop shaping parameters
CA3134075A1 (en) * 2019-03-24 2020-10-01 Beijing Bytedance Network Technology Co., Ltd. Nonlinear adaptive loop filtering in video processing
WO2020192642A1 (en) 2019-03-24 2020-10-01 Beijing Bytedance Network Technology Co., Ltd. Conditions in parameter derivation for intra prediction
CN117528068A (en) * 2019-04-18 2024-02-06 北京字节跳动网络技术有限公司 Selective use in cross-component modes in video coding
MX2021012674A (en) 2019-04-23 2021-11-12 Beijing Bytedance Network Tech Co Ltd Methods for cross component dependency reduction.
US11843791B2 (en) * 2019-05-03 2023-12-12 Interdigital Madison Patent Holdings, Sas Chroma processing for video encoding and decoding
JP7407206B2 (en) 2019-05-08 2023-12-28 北京字節跳動網絡技術有限公司 Applicability conditions for cross-component coding
US11930191B2 (en) 2019-05-16 2024-03-12 Lg Electronics Inc. Luma mapping—and chroma scaling-based video or image coding
KR20210096281A (en) * 2019-06-17 2021-08-04 엘지전자 주식회사 Luma mapping-based video or image coding
CN114009050B (en) 2019-06-21 2023-12-22 北京字节跳动网络技术有限公司 Adaptive intra-annular color space transform for video coding and decoding
WO2020257787A1 (en) * 2019-06-21 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for prediction dependent residual scaling for video coding
CN114270849A (en) * 2019-06-21 2022-04-01 交互数字Vc控股法国公司 Luminance Mapping (LMCS) LUT extension and clipping with chroma scaling
US11516472B2 (en) * 2019-06-21 2022-11-29 Hyundai Motor Company Method and apparatus for controlling coding tools
KR20200145773A (en) * 2019-06-21 2020-12-30 현대자동차주식회사 Method and apparatus for controlling coding tools
KR20220024006A (en) 2019-06-22 2022-03-03 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Syntax Elements for Scaling Chroma Residuals
WO2020262915A1 (en) * 2019-06-24 2020-12-30 엘지전자 주식회사 Video or image coding using alf or lmcs
EP3977738A4 (en) 2019-07-07 2022-08-17 Beijing Bytedance Network Technology Co., Ltd. Signaling of chroma residual scaling
JP7359934B2 (en) 2019-07-10 2023-10-11 北京字節跳動網絡技術有限公司 Sample identification for intra block copying in video coding
KR20220044278A (en) 2019-08-15 2022-04-07 바이트댄스 아이엔씨 Palette mode with different partition structures
CN117395420A (en) 2019-08-15 2024-01-12 字节跳动有限公司 Entropy coding for palette escape symbols
US11172237B2 (en) 2019-09-11 2021-11-09 Dolby Laboratories Licensing Corporation Inter-layer dynamic range scalability for HDR video
KR102624438B1 (en) * 2019-09-19 2024-01-15 바이트댄스 아이엔씨 Derivation of quantization parameters for palette mode
CN117336478A (en) 2019-11-07 2024-01-02 抖音视界有限公司 Quantization characteristics of adaptive intra-annular color space transform for video codec
CN115315948A (en) * 2020-03-27 2022-11-08 北京达佳互联信息技术有限公司 Method and apparatus for prediction dependent residual scaling for video coding
WO2021222871A1 (en) * 2020-04-30 2021-11-04 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for prediction dependent residual scaling for video coding
US20210409683A1 (en) * 2020-06-24 2021-12-30 Qualcomm Incorporated Model parameter derivation of local illumination compensation in the luma mapping with chroma scaling-mapped domain in video coding
CN116708802A (en) * 2020-06-24 2023-09-05 北京达佳互联信息技术有限公司 Method and apparatus for prediction related residual scaling for video coding
MX2022003464A (en) * 2020-12-17 2022-10-21 Japan Broadcasting Corp Decoding device, program, and decoding method.
WO2022217474A1 (en) * 2021-04-13 2022-10-20 深圳市大疆创新科技有限公司 Video encoding and decoding methods, apparatus, system, and storage medium
CN115225900A (en) * 2021-04-19 2022-10-21 中兴通讯股份有限公司 Image encoding method, image decoding method, image encoding device, image decoding device, electronic equipment and storage medium
WO2023150482A1 (en) * 2022-02-01 2023-08-10 Dolby Laboratories Licensing Corporation Volumetric immersive experience with multiple views

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104054338A (en) * 2011-03-10 2014-09-17 杜比实验室特许公司 Bitdepth And Color Scalable Video Coding
CN105324997A (en) * 2013-06-17 2016-02-10 杜比实验室特许公司 Adaptive reshaping for layered coding of enhanced dynamic range signals
WO2016164235A1 (en) * 2015-04-06 2016-10-13 Dolby Laboratories Licensing Corporation In-loop block-based image reshaping in high dynamic range video coding

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100261253B1 (en) 1997-04-02 2000-07-01 윤종용 Scalable audio encoder/decoder and audio encoding/decoding method
FI117845B (en) * 2004-08-12 2007-03-15 Gurulogic Microsystems Oy Video image processing
US7876833B2 (en) * 2005-04-11 2011-01-25 Sharp Laboratories Of America, Inc. Method and apparatus for adaptive up-scaling for spatially scalable coding
JP5697301B2 (en) * 2008-10-01 2015-04-08 株式会社Nttドコモ Moving picture encoding apparatus, moving picture decoding apparatus, moving picture encoding method, moving picture decoding method, moving picture encoding program, moving picture decoding program, and moving picture encoding / decoding system
US8437053B2 (en) * 2010-04-15 2013-05-07 Eastman Kodak Company Gamut mapping using hue-preserving color space
US9161041B2 (en) * 2011-01-09 2015-10-13 Mediatek Inc. Apparatus and method of efficient sample adaptive offset
US9036042B2 (en) * 2011-04-15 2015-05-19 Dolby Laboratories Licensing Corporation Encoding, decoding, and representing high dynamic range images
US8334911B2 (en) * 2011-04-15 2012-12-18 Dolby Laboratories Licensing Corporation Encoding, decoding, and representing high dynamic range images
JP5601333B2 (en) * 2012-02-09 2014-10-08 コニカミノルタ株式会社 Image processing apparatus, threshold matrix shaping method and program
WO2016049327A1 (en) * 2014-09-26 2016-03-31 Dolby Laboratories Licensing Corporation Encoding and decoding perceptually-quantized video content
US10136133B2 (en) 2014-11-11 2018-11-20 Dolby Laboratories Licensing Corporation Rate control adaptation for high-dynamic range images
EP3275190B1 (en) * 2015-03-25 2024-04-17 Dolby Laboratories Licensing Corporation Chroma subsampling and gamut reshaping
US10484684B2 (en) * 2015-04-22 2019-11-19 Dolby Laboratories Licensing Corporation Signal reshaping and coding in the IPT-PQ color space
EP4020995A1 (en) 2015-07-16 2022-06-29 Dolby Laboratories Licensing Corporation Signal reshaping and coding for hdr and wide color gamut signals
US10165275B2 (en) 2016-01-26 2018-12-25 Dolby Laboratories Licensing Corporation Content-adaptive reshaping for high dynamic range images
US11228770B2 (en) 2016-05-16 2022-01-18 Qualcomm Incorporated Loop sample processing for high dynamic range and wide color gamut video coding
US10477212B2 (en) 2016-10-11 2019-11-12 Dolby Laboratories Licensing Corporation Adaptive chroma quantization in video coding for multiple color imaging formats
US10757428B2 (en) * 2018-10-10 2020-08-25 Apple Inc. Luma and chroma reshaping of HDR video encoding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104054338A (en) * 2011-03-10 2014-09-17 杜比实验室特许公司 Bitdepth And Color Scalable Video Coding
CN105324997A (en) * 2013-06-17 2016-02-10 杜比实验室特许公司 Adaptive reshaping for layered coding of enhanced dynamic range signals
WO2016164235A1 (en) * 2015-04-06 2016-10-13 Dolby Laboratories Licensing Corporation In-loop block-based image reshaping in high dynamic range video coding

Also Published As

Publication number Publication date
CN117793378A (en) 2024-03-29
EP3571838B1 (en) 2022-04-27
CN116095314A (en) 2023-05-09
KR20200021913A (en) 2020-03-02
CN116095315A (en) 2023-05-09
CN116095313A (en) 2023-05-09
CN117793377A (en) 2024-03-29
CN117793379A (en) 2024-03-29
EP4064701A1 (en) 2022-09-28
WO2019006300A1 (en) 2019-01-03
US11490095B1 (en) 2022-11-01
RU2020122372A3 (en) 2021-02-17
BR112019016885A2 (en) 2020-04-14
CN110301134A (en) 2019-10-01
US20230021624A1 (en) 2023-01-26
KR102580314B1 (en) 2023-09-19
JP2023015093A (en) 2023-01-31
US10992941B2 (en) 2021-04-27
RU2727100C1 (en) 2020-07-17
RU2746981C2 (en) 2021-04-22
JP2020526942A (en) 2020-08-31
RU2020122372A (en) 2020-09-24
US20200267392A1 (en) 2020-08-20
EP3571838A1 (en) 2019-11-27
JP7164535B2 (en) 2022-11-01
KR20230135182A (en) 2023-09-22
CN117793380A (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN110301134B (en) Integrated image shaping and video coding
US20220224946A1 (en) Image reshaping in video coding using rate distortion optimization
US11363264B2 (en) Sample adaptive offset control
CN110612724B (en) Quantization parameter prediction using luma information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant