OA20172A

OA20172A - Image reshaping in video coding using rate distortion optimization.

Info

Publication number: OA20172A
Application number: OA1202000277
Authority: OA
Inventors: Peng Yin; Fangjun PU; Taoran Lu; Tao Chen; Walter J. Husak; Sean Thomas MCCARTHY
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2018-01-14
Filing date: 2019-02-13
Publication date: 2021-12-30

Abstract

Given a sequence of images in a first codeword representation, methods, processes, and systems are presented for image reshaping using rate distortion optimization, wherein reshaping allows the images to be coded in a second codeword representation which allows more efficient compression than using the first codeword representation. Syntax methods for signaling reshaping parameters are also presented.

Description

IMAGE RESHAPING IN VIDEO CODING USING RATE DISTORTION OPTIMIZATION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Patent Applications Ser. No. 62/792,122, filed on Jan. 14, 2019, Ser. No. 62/782,659, filed on Dec. 20, 2018, Ser. No. 62/772,228, filed on Nov. 28, 2018, Ser. No. 62/739,402, filed on Oct. 1, 2018, Ser. No. 62/726,608, filed on Sept. 4, 2018, Ser. No. 62/691,366, filed on June 28, 2018, and Ser. No. 62/630,385, filed on Feb. 14, 2018, each of which is incorporated herein by reference in its entirety.

TECHNOLOGY

The present invention relates generally to images and video coding. More particularly, an embodiment of the present invention relates to image reshaping in video coding.

BACKGROUND

In 2013, the MPEG group in the International Standardization Organization (ISO), jointly with the International Telecommunications Union (ITU), released the first draft of the HEVC (also known as H.265) video coding standard (Ref. [4]). More recently, the same group has released a call for evidence to support the development of a next generation coding standard that provides improved coding performance over existing video coding technologies.

As used herein, the term ‘bit depth’ denotes the number of bits used to represent one of the color components of an image. Traditionally, images were coded at 8 bits per color component, per pixel (e.g., 24 bits per pixel); however, modern architectures may now support higher bit depths, such as 10 bits, 12 bits or more.

In a traditional image pipeline, captured images are quantized using a non-linear opto-electronic transfer function (OETF), which converts linear scene light into a non-linear video signal (e.g., gamma-coded RGB or YCbCr). Then, on the receiver, before being displayed on the display, the signal is processed by an electro-optical transfer function (EOTF) which translates video signal values to output screen color values. Such non-linear functions include the traditional “gamma” curve, documented in ITU-R Rec. BT.709 and BT.2020, the “PQ” (perceptual quantization) curve described in SMPTE ST 2084, and the “Hybrid Log-gamma” or “HLG” curve described in Rec. ITU-R BT.2100.

As used herein, the term “forward reshaping” denotes a process of sample-to-sample or codeword-to-codeword mapping of a digital image from its original bit depth and original codewords distribution or representation (e.g., gamma or PQ or HLG, and the like) to an image of the same or different bit depth and a different codewords distribution or representation.

Reshaping allows for improved compressibility or improved image quality at a fixed bit rate. For example, without limitation, reshaping may be applied to 10-bit or 12-bit PQ-coded HDR video to improve coding efficiency in a 10-bit video coding architecture. In a receiver, after decompressing the reshaped signal, the receiver may apply an “inverse reshaping function” to restore the signal to its original codeword distribution. As appreciated by the inventors here, as development begins for the next generation of a video coding standard, improved techniques for the integrated reshaping and coding of images are desired. Methods of this invention can be applicable to a variety of video content, including, but not limited to, content in standard dynamic range (SDR) and/or high-dynamic range (HDR).

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1A depicts an example process for a video delivery pipeline;

FIG. 1B depicts an example process for data compression using signal reshaping according to prior art;

FIG. 2A depicts an example architecture for an encoder using hybrid in-loop reshaping according to an embodiment of this invention;

FIG. 2B depicts an example architecture for a decoder using hybrid in-loop reshaping according to an embodiment of this invention;

FIG. 2C depicts an example architecture for intra-CU decoding using reshaping according to an embodiment;

FIG. 2D depicts an example architecture for inter-CU decoding using reshaping according to an embodiment;

FIG. 2E depicts an example architecture for intra-CU decoding within inter-coded slices according to an embodiment for luma or chroma processing;


FIG. 2F depicts an example architecture for intra-CU decoding within inter-coded slices according to an embodiment for chroma processing;

FIG. 3A depicts an example process for encoding video using a reshaping architecture according to an embodiment of this invention;

FIG. 3B depicts an example process for decoding video using a reshaping architecture according to an embodiment of this invention;

FIG. 4 depicts an example process for reassigning codewords in the reshaped domain according to an embodiment of this invention;

FIG. 5 depicts an example process for deriving reshaping thresholds according to an embodiment of this invention;

FIG. 6A, 6B, 6C, and 6D depict example data plots for deriving reshaping thresholds according to the process depicted in FIG. 5 and an embodiment of this invention; and

FIG. 6E depicts examples of codeword allocation according to bin variance according to embodiments of this invention.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Signal reshaping and coding techniques for compressing images using rate-distortion optimization (RDO) are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.

OVERVIEW

Example embodiments described herein relate to signal reshaping and coding for video. In an encoder, a processor receives an input image in a first codeword representation to be reshaped to a second codeword representation, wherein the second codeword representation allows for a more efficient compression than the first codeword representation, and generates a forward reshaping function mapping pixels of the input image to a second codeword representation, wherein to generate the forward reshaping function, the encoder: divides the input image into multiple pixel regions, assigns each of the pixel regions to one of multiple codeword bins according to a first luminance characteristic of each pixel region, computes a bin metric for each one of the multiple codeword bins according to a second luminance characteristic of each of the pixel regions assigned to each codeword bin, allocates a number of codewords in the second codeword representation to each codeword bin according to the bin metric of each codeword bin and a rate distortion optimization criterion, and generates the forward reshaping function in response to the allocation of codewords in the second codeword representation to each of the multiple codeword bins.

In another embodiment, in a decoder, a processor receives coded bitstream syntax elements characterizing a reshaping model, wherein the syntax elements include one or more of a flag indicating a minimum codeword bin index value to be used in a reshaping construction process, a flag indicating a maximum codeword bin index value to be used in a reshaping construction process, a flag indicating a reshaping model profile type, wherein the model profile type is associated with default bin-related parameters, including bin importance values, or a flag indicating one or more delta bin importance values to be used to adjust the default bin importance values defined in the reshaping model profile. The processor determines based on the reshaping model profile the default bin importance values for each bin and an allocation list of default numbers of codewords to be allocated to each bin according to the bin’s importance value. Then, for each codeword bin, the processor:

determines its bin importance value by adding its default bin importance value to its delta bin importance value;

determines the number of codewords to be allocated to the codeword bin based on the bin’s bin importance value and the allocation list; and generates a forward reshaping function based on the number of codewords allocated to each codeword bin.

In another embodiment, in a decoder, a processor receives a coded bitstream comprising one or more coded reshaped images in a first codeword representation and metadata related to reshaping information for the coded reshaped images. The processor generates, based on the metadata related to the reshaping information, an inverse reshaping function and a forward reshaping function, wherein the inverse reshaping function maps pixels of the reshaped image from the first codeword representation to a second codeword representation, and the forward reshaping function maps pixels of an image from the second codeword representation to the first codeword representation. The processor extracts from the coded bitstream a coded reshaped image comprising one or more coded units, wherein for one or more coded units in the coded reshaped image:

for a reshaped intra-coded coding unit (CU) in the coded reshaped image, the processor:

generates first reshaped reconstructed samples of the CU based on reshaped residuals in the CU and first reshaped prediction samples;

generates a reshaped loop filter output based on the first reshaped reconstructed samples and loop-filter parameters;

applies the inverse reshaping function to the reshaped loop filter output to generate decoded samples of the coding unit in the second codeword representation; and stores the decoded samples of the coding unit in the second codeword representation in a reference buffer;

for a reshaped inter-coded coding unit in the coded reshaped image, the processor: applies the forward reshaping function to prediction samples stored in the reference buffer in the second codeword representation to generate second reshaped prediction samples;

generates second reshaped reconstructed samples of the coding unit based on reshaped residuals in the coded CU and the second reshaped prediction samples;

generates a reshaped loop filter output based on the second reshaped reconstructed samples and loop-filter parameters;

applies the inverse reshaping function to the reshaped loop filter output to generate samples of the coding unit in the second codeword representation; and stores the samples of the coding unit in the second codeword representation in a reference buffer. Finally, the processor generates a decoded image based on the stored samples in the reference buffer.

In another embodiment, in a decoder, a processor receives a coded bitstream comprising one or more coded reshaped images in an input codeword representation and reshaping metadata (207) for the one or more coded reshaped images in the coded bitstream. The processor generates a forward reshaping function (282) based on the reshaping metadata, wherein the forward reshaping function maps pixels of an image from a first codeword representation to the input codeword representation. The processor generates an inverse reshaping function (265-3) based on the reshaping metadata or the forward reshaping function, wherein the inverse reshaping function maps pixels of a reshaped image from the input codeword representation to the first codeword representation. The processor extracts from the coded bitstream a coded reshaped image comprising one or more coded units, wherein:

for an intra-coded coding unit (intra-CU) in the coded reshaped image, the processor: generates reshaped reconstructed samples of the intra-CU (285) based on reshaped residuals in the intra-CU and intra-predicted reshaped prediction samples;

applies the inverse reshaping function (265-3) to the reshaped reconstructed samples of the intra-CU to generate decoded samples of the intra-CU in the first codeword representation;

applies a loop filter (270) to the decoded samples of the intra-CU to generate output samples of the intra-CU; and stores the output samples of the intra-CU in a reference buffer;

for an inter-coded CU (inter-CU) in the coded reshaped image, the processor: applies the forward reshaping function (282) to inter-prediction samples stored in the reference buffer in the first codeword representation to generate reshaped prediction samples for the inter-CU in the input codeword representation;

generates reshaped reconstructed samples of the inter-CU based on reshaped residuals in the inter-CU and the reshaped prediction samples for the inter-CU;

applies the inverse reshaping function (265-3) to the reshaped reconstructed samples of the inter-CU to generate decoded samples of the inter-CU in the first codeword representation;

applies the loop filter (270) to the decoded samples of the inter-CU to generate output samples of the inter-CU; and stores the output samples of the inter-CU in the reference buffer; and generates a decoded image in the first codeword representation based on output samples in the reference buffer.

Example Video Delivery Processing Pipeline

FIG. 1A depicts an example process of a conventional video delivery pipeline (100) showing various stages from video capture to video content display. A sequence of video frames (102) is captured or generated using image generation block (105). Video frames (102) may be digitally captured (e.g. by a digital camera) or generated by a computer (e.g. using computer animation) to provide video data (107). Alternatively, video frames (102) may be captured on film by a film camera. The film is converted to a digital format to provide video data (107). In a production phase (110), video data (107) is edited to provide a video production stream (112).

The video data of production stream (112) is then provided to a processor at block (115) for post-production editing. Block (115) post-production editing may include adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator's creative intent. This is sometimes called “color timing” or “color grading.” Other editing (e.g. scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.) may be performed at block (115) to yield a final version (117) of the production for distribution. During post-production editing (115), video images are viewed on a reference display (125).

Following post-production (115), video data of final production (117) may be delivered to encoding block (120) for delivering downstream to decoding and playback devices such as television sets, set-top boxes, movie theaters, and the like. In some embodiments, coding block (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-Ray, and other delivery formats, to generate coded bit stream (122). In a receiver, the coded bit stream (122) is decoded by decoding unit (130) to generate a decoded signal (132) representing an identical or close approximation of signal (117). The receiver may be attached to a target display (140) which may have completely different characteristics than the reference display (125). In that case, a display management block (135) may be used to map the dynamic range of decoded signal (132) to the characteristics of the target display (140) by generating display-mapped signal (137).

Signal Reshaping

FIG. 1B depicts an example process for signal reshaping according to prior art [2]. Given input frames (117), a forward reshaping block (150) analyzes the input and the coding constraints and generates codeword mapping functions which map input frames (117) to re-quantized output frames (152). For example, input (117) may be encoded according to a certain electro-optical transfer function (EOTF) (e.g., gamma). In some embodiments, information about the reshaping process may be communicated to downstream devices (such as decoders) using metadata. As used herein, the term “metadata” relates to any auxiliary information that is transmitted as part of the coded bitstream and assists a decoder to render a decoded image. Such metadata may include, but are not limited to, color space or gamut information, reference display parameters, and auxiliary signal parameters, as those described herein.

Following coding (120) and decoding (130), decoded frames (132) may be processed by a backward (or inverse) reshaping function (160), which converts the re-quantized frames (132) back to the original EOTF domain (e.g., gamma), for further downstream processing, such as the display management process (135) discussed earlier. In some embodiments, the backward reshaping function (160) may be integrated with a de-quantizer in decoder (130), e.g., as part of the de-quantizer in an AVC or HEVC video decoder.


As used herein, the term “reshaper” may denote a forward or an inverse reshaping function to be used when coding and/or decoding digital images. Examples of reshaping functions are discussed in Ref. [2]. In Ref. [2], an in-loop block-based image reshaping method for high dynamic range video coding was proposed. That design allows block-based reshaping inside the coding loop, but at a cost of increased complexity. To be specific, the design requires maintaining two sets of decoded-image buffers: one set for inverse-reshaped (or non-reshaped) decoded pictures, which can be used for both prediction without reshaping and for output to a display, and another set for forward-reshaped decoded pictures, which is used only for prediction with reshaping. Though forward-reshaped decoded pictures can be computed on the fly, the complexity cost is very high, especially for inter-prediction (motion compensation with sub-pixel interpolation). In general, decoded-picture-buffer (DPB) management is complicated and requires very careful attention; thus, as appreciated by the inventors, simplified methods for coding video are desired.

In Ref. [6], additional reshaping-based codec architectures were presented, including an external, out-of-loop reshaper, an architecture with an in-loop intra only reshaper, an architecture with an in-loop reshaper for prediction residuals, and a hybrid architecture which combines both intra, in-loop, reshaping and inter, residual reshaping. The main goal of those proposed reshaping architectures is to improve subjective visual quality. Thus, many of these approaches will yield worse objective metrics, in particular the well-known Peak Signal to Noise Ratio (PSNR) metric.

In this invention, a new reshaper is proposed based on Rate-Distortion Optimization (RDO). In particular, when the targeted distortion metric is MSE (Mean Square Error), the proposed reshaper will improve both subjective visual quality and well-used objective metrics based on PSNR, Bjontegaard PSNR (BD-PSNR), or Bjontegaard Rate (BD-Rate). Note that any of the proposed reshaping architectures, without loss of generality, may be applied for the luminance component, one or more of the chroma components, or a combination of luma and chroma components.

Reshaping based on Rate-Distortion Optimization

Consider a reshaped video signal represented by a bit-depth of B bits in a color component (e.g., B = 10 for Y, Cb, and/or Cr), thus there are a total of 2^B available codewords. Consider dividing the desired codeword range [0 2^B] into N segments or bins, and let M_k represent the number of codewords in the k-th segment or bin, after a reshaping mapping, so that given a target bit rate R, the distortion D between the source picture and the decoded or reconstructed picture is minimal. Without loss of generality, D may be expressed as a measure of the sum of square error (SSE) between corresponding pixel values of the source input (Source(i,j)) and the reconstructed picture (Recon(i,j))

D = SSE = \sum_{i,j} Diff(i,j)^2,    (1)

where

Diff(i,j) = Source(i,j) - Recon(i,j).

The optimization reshaping problem may be re-written as: find M_k (k = 0, 1, ..., N-1), such that given a bitrate R, D is minimal, where \sum_{k=0}^{N-1} M_k <= 2^B.
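As an illustration only, the distortion measure of equation (1) can be sketched in C as follows. The function name picture_sse and the flat-array picture layout are assumptions made for this example and are not part of the described embodiments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared errors between source and reconstructed samples,
 * as in equation (1). Both pictures are assumed (for this sketch) to be
 * stored as flat arrays holding n samples each. */
uint64_t picture_sse(const uint16_t *source, const uint16_t *recon, size_t n)
{
    assert(source != NULL && recon != NULL);
    uint64_t sse = 0;
    for (size_t idx = 0; idx < n; idx++) {
        /* Diff(i,j) = Source(i,j) - Recon(i,j) */
        int64_t diff = (int64_t)source[idx] - (int64_t)recon[idx];
        sse += (uint64_t)(diff * diff);
    }
    return sse;
}
```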

Various optimization methods can be used to find a solution, but the optimal solution could be very complicated for real-time encoding. In this invention, a suboptimal, but more practical analytical solution is proposed.

Without losing generality, consider an input signal represented by a bit depth of B bits (e.g., B = 10), where the codewords are uniformly divided into N bins (e.g., N = 32). By default, each bin is assigned M_a = 2^B/N codewords (e.g., for N = 32 and B = 10, M_a = 32). Next, a more efficient codeword allocation, based on RDO, will be demonstrated through an example.

As used herein, the term “narrow range” [CW1, CW2] denotes a continuous range of codewords between codewords CW1 and CW2 which is a subset of the full dynamic range [0 2^B-1]. For example, in an embodiment, a narrow range may be defined as [16*2^(B-8) 235*2^(B-8)] (e.g., for B = 10, the narrow range comprises values [64 940]). Assuming the bit depth of the output signal is B_o, if the dynamic range of an input signal is within a narrow range, then, in what will be denoted as “default” reshaping, one can stretch the signal into the full range [0 2^B_o-1]. Then, each bin will have about M_f = CEIL((2^B_o/(CW2-CW1))*M_a) codewords, or, for our example, if B_o = B = 10, M_f = CEIL((1024/(940-64))*32) = 38 codewords, where CEIL(x) denotes the ceiling function, which maps x to the least integer that is greater than or equal to x. Without losing generality, in the examples below, for simplicity, it is assumed that B_o = B.
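The arithmetic of the “default” reshaping example may be checked with a short C sketch. The function names narrow_range and stretched_codewords_per_bin are hypothetical, and the ceiling is evaluated in integer arithmetic, which is one possible choice rather than a prescribed one.

```c
#include <assert.h>

/* Narrow range [16*2^(B-8), 235*2^(B-8)] for a bit depth B >= 8. */
void narrow_range(int bit_depth, int *cw1, int *cw2)
{
    assert(bit_depth >= 8 && bit_depth < 24);
    *cw1 = 16 << (bit_depth - 8);
    *cw2 = 235 << (bit_depth - 8);
}

/* M_f = CEIL((2^Bo / (CW2 - CW1)) * M_a), computed as an integer ceiling of
 * (2^Bo * M_a) / (CW2 - CW1) to avoid floating-point rounding. */
int stretched_codewords_per_bin(int bit_depth_out, int cw1, int cw2, int m_a)
{
    assert(bit_depth_out > 0 && bit_depth_out < 31);
    assert(cw2 > cw1 && m_a > 0);
    long long num = (1LL << bit_depth_out) * m_a;
    long long den = (long long)cw2 - cw1;
    return (int)((num + den - 1) / den);
}
```

For B_o = B = 10 and M_a = 32, the narrow range is [64 940] and the sketch reproduces the value M_f = 38 given in the text.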

For the same quantization parameter (QP), the effect of increasing the number of codewords in a bin is equivalent to allocating more bits to code the signal within the bin, therefore it is equivalent to reducing SSE or improving PSNR; however, a uniform increase of codeword allocation in each bin may not give better results than coding without reshaping, because PSNR gain may not beat the increase of bitrate, i.e., this is not a good tradeoff in terms of RDO. Ideally, one would like to assign more codewords only to the bins which yield the best tradeoff on RDO, i.e., generate significant SSE decrease (PSNR increase) at the expense of a small bitrate increase.

In an embodiment, RDO performance is improved through an adaptive piecewise reshaping mapping. The method can be applied to any type of a signal, including standard dynamic range (SDR) and high-dynamic range (HDR) signals. Using the previous simple case as an example, the goal of this invention is to assign either M_a or M_f codewords for each codeword segment or codeword bin.

At an encoder, given N codeword bins for the input signal, the average luminance variance of each bin can be approximated as follows:

Initialize to zero the sum of block variance (var_bin(k)) and a counter (c_bin(k)) for each bin, e.g., var_bin(k) = 0 and c_bin(k) = 0, for k = 0, 1, ..., N-1.

Divide the picture into L * L non-overlapped blocks (e.g., L = 16).

For each picture block, compute the luma mean of the block and the luma variance of the block i (e.g., Luma_mean(i) and Luma_var(i)).

Based on the mean luma of the block, assign that block to one of the N bins. In an embodiment, if Luma_mean(i) is within the k-th segment in the input dynamic range, the total bin luminance variance for the k-th bin is incremented by the luma variance of the newly assigned block, and the counter for that bin is increased by one. That is, if the i-th pixel region belongs to the k-th bin:

var_bin(k) = var_bin(k) + Luma_var(i);    (2)
c_bin(k) = c_bin(k) + 1.

For each bin, compute the average luminance variance for that bin by dividing the sum of the block variances in that bin by the counter, assuming the counter is not equal to 0; or if c_bin(k) is not 0, then

var_bin(k) = var_bin(k) / c_bin(k).    (3)

A person skilled in the art would appreciate that one may apply alternative metrics than luminance variance to characterize the sub-blocks. For example, one may use the standard deviation of luminance values, a weighted luminance variance or luminance value, a peak luminance, and the like.
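The per-bin variance computation of equations (2) and (3) may be sketched in C as shown below. This is a minimal illustration under stated assumptions: the function name bin_variances, the row-major luma array, the use of the population variance, and the skipping of partial blocks at picture edges are choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Average block luma variance per bin, following equations (2) and (3).
 * luma: width*height samples of bit depth B, row-major (assumed layout);
 * block: block size L; var_bin, c_bin: output arrays of n_bins entries. */
void bin_variances(const uint16_t *luma, int width, int height, int bit_depth,
                   int block, int n_bins, double *var_bin, int *c_bin)
{
    assert(block > 0 && n_bins > 0);
    int bin_size = (1 << bit_depth) / n_bins;   /* input codewords per bin */
    assert(bin_size > 0);
    for (int k = 0; k < n_bins; k++) { var_bin[k] = 0.0; c_bin[k] = 0; }

    /* Partial blocks at the right and bottom edges are skipped in this sketch. */
    for (int by = 0; by + block <= height; by += block) {
        for (int bx = 0; bx + block <= width; bx += block) {
            double sum = 0.0, sum_sq = 0.0;
            for (int y = by; y < by + block; y++) {
                for (int x = bx; x < bx + block; x++) {
                    double v = luma[y * width + x];
                    sum += v;
                    sum_sq += v * v;
                }
            }
            double n = (double)block * block;
            double mean = sum / n;                  /* Luma_mean(i) */
            double var = sum_sq / n - mean * mean;  /* Luma_var(i), population variance */
            int k = (int)mean / bin_size;           /* bin containing the block mean */
            if (k >= n_bins) k = n_bins - 1;
            var_bin[k] += var;                      /* equation (2) */
            c_bin[k] += 1;
        }
    }
    for (int k = 0; k < n_bins; k++)
        if (c_bin[k] != 0) var_bin[k] /= c_bin[k];  /* equation (3) */
}
```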

In an embodiment, the following pseudo code depicts an example of how an encoder may adjust the bin allocation using the computed metrics for each bin.

For the k-th bin,
    if no pixels are in the bin
        M_k = 0;
    else if var_bin(k) < TH_U        (4)
        M_k = M_f;
    else
        M_k = M_a;   // (note: this is to make sure that each bin will have at least M_a codewords.
                     // Alternatively, one may also allocate M_a+1 codewords)
    end

where TH_U denotes a predetermined upper threshold.

In another embodiment, the allocation may be performed as follows:

For the k-th bin,
    if no pixels are in the bin
        M_k = 0;
    else if TH_0 < var_bin(k) < TH_1        (5)
        M_k = M_f;
    else
        M_k = M_a;
    end

where TH_0 and TH_1 denote predetermined lower and upper thresholds.

In another embodiment:

For the k-th bin,
    if no pixels are in the bin
        M_k = 0;
    else if var_bin(k) > TH_L        (6)
        M_k = M_f;
    else
        M_k = M_a;
    end

where TH_L denotes a predetermined lower threshold.

The above examples show how to select the number of codewords for each bin from two pre-selected numbers M_f and M_a. Thresholds (e.g., TH_U or TH_L) can be determined based on optimizing the rate distortion, e.g., through exhaustive search. Thresholds may also be adjusted based on the quantization parameter values (QP). In an embodiment, for B = 10, thresholds may range between 1,000 and 10,000.
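A minimal C sketch of the two-level allocation is given below, written in the form of equation (5). The function name allocate_two_level and its argument layout are assumptions for illustration. Because variances are non-negative, passing a negative lower threshold gives the behavior of equation (4), and passing a very large upper threshold gives the behavior of equation (6).

```c
#include <assert.h>

/* Two-level allocation in the style of equation (5): a non-empty bin whose
 * average variance lies strictly between th_lo and th_hi receives m_f
 * codewords, any other non-empty bin receives m_a, and an empty bin receives 0. */
void allocate_two_level(const double *var_bin, const int *c_bin, int n_bins,
                        double th_lo, double th_hi, int m_a, int m_f, int *m)
{
    assert(n_bins > 0 && th_lo < th_hi);
    for (int k = 0; k < n_bins; k++) {
        if (c_bin[k] == 0)
            m[k] = 0;                 /* no pixels in the bin */
        else if (var_bin[k] > th_lo && var_bin[k] < th_hi)
            m[k] = m_f;
        else
            m[k] = m_a;
    }
}
```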

In an embodiment, to expedite processing, a threshold may be determined from a fixed set of values, say, {2,000, 3,000, 4,000, 5,000, 6,000, 7,000}, using a Lagrangian optimization method. For example, for each TH(i) value in the set, using pre-defined training clips, one can run compression tests with fixed QP, and compute values of an objective function J defined as

J(i) = D + \lambda R.    (7)

Then, the optimal threshold value may be defined as the TH(i) value in the set for which J(i) is minimum.
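The selection step may be sketched in C as follows. The distortion and rate arrays are assumed to have been measured beforehand by running the compression tests for each candidate threshold; the function name best_threshold_index is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index i minimizing J(i) = D(i) + lambda * R(i), equation (7).
 * dist[i] and rate[i] are inputs assumed to come from compression tests run
 * with candidate threshold TH(i); they are not computed in this sketch. */
size_t best_threshold_index(const double *dist, const double *rate,
                            size_t n, double lambda)
{
    assert(n > 0);
    size_t best = 0;
    double best_j = dist[0] + lambda * rate[0];
    for (size_t i = 1; i < n; i++) {
        double j = dist[i] + lambda * rate[i];
        if (j < best_j) {   /* ties keep the earliest candidate */
            best_j = j;
            best = i;
        }
    }
    return best;
}
```

The candidate set {2,000, ..., 7,000} from the text would then be indexed by the returned value.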

In a more general example, one can predefine a look-up table (LUT). For example, in Table 1, the first row defines a set of thresholds dividing the full range of possible bin metrics (e.g., var_bin(k) values) into segments, and the second row defines the corresponding number of codewords (CW) to be assigned in each segment. In an embodiment, one rule to build such a LUT is: if the bin variance is too big, one may need to spend lots of bits to reduce the SSE, therefore one can assign codeword (CW) values less than M_a. If the bin variance is very small, one can assign a CW value larger than M_a.

Table 1: Example LUT of codeword allocation based on bin variance thresholds

TH_0	...	TH_{p-1}	TH_p	TH_{p+1}	...	TH_{q-1}
CW_0	...	CW_{p-1}	CW_p	CW_{p+1}	...	CW_{q-1}	CW_q

Using Table 1, the mapping of thresholds into codewords may be generated as follows:

For the k-th bin,
    if there are no pixels in the bin
        M_k = 0;
    else if var_bin(k) < TH_0
        M_k = CW_0;
    else if TH_0 < var_bin(k) < TH_1        (8)
        M_k = CW_1;
    ...
    else if TH_{p-1} < var_bin(k) < TH_p
        M_k = CW_p;
    ...
    else if var_bin(k) > TH_{q-1}
        M_k = CW_q;
    end

For example, given two thresholds and three codeword allocations, for B = 10, in an embodiment, TH_0 = 3,000, CW_0 = 38, TH_1 = 10,000, CW_1 = 32, and CW_2 = 28.
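A table-driven allocation in the style of equation (8) might look like the C sketch below. The function name codewords_from_lut is hypothetical. Equation (8) uses strict inequalities and so leaves a variance exactly equal to a threshold unspecified; this sketch assigns such a value to the upper segment, which is an assumption.

```c
#include <assert.h>

/* Looks up the codeword count for one bin from a threshold table.
 * th: q thresholds in ascending order (TH_0 .. TH_{q-1});
 * cw: q+1 codeword counts (CW_0 .. CW_q);
 * count: number of blocks assigned to the bin (0 means the bin is empty). */
int codewords_from_lut(int count, double var, const double *th,
                       const int *cw, int q)
{
    assert(q > 0);
    if (count == 0)
        return 0;              /* no pixels in the bin */
    for (int p = 0; p < q; p++)
        if (var < th[p])
            return cw[p];      /* first segment whose upper threshold exceeds var */
    return cw[q];              /* var at or above the last threshold */
}
```

With the example values TH_0 = 3,000, TH_1 = 10,000 and codeword counts 38, 32, and 28, a bin variance of 5,000 yields 32 codewords.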

In another embodiment, the two thresholds TH_0 and TH_1 may be selected as follows: a) consider TH_1 to be a very large number (even infinity) and select TH_0 from a set of predetermined values, e.g., using the RDO optimization in equation (7). b) Given TH_0, now define a second set of possible values for TH_1, e.g., set {10,000, 15,000, 20,000, 25,000, 30,000}, and apply equation (7) to identify the optimum value. The approach can be iteratively performed with a limited number of threshold values or until it converges.

One may note that after allocating codewords to bins according to any of the schemes defined earlier, either the sum of M_k values may exceed the maximum of available codewords (2^B) or there are unused codewords. If there are unused codewords, one may simply decide to do nothing, or allocate them to specific bins. On the other hand, if the algorithm assigned more codewords than available, then one may want to readjust the M_k values, e.g., by renormalizing the CW values. Alternatively, one may generate the forward reshaping function using the existing M_k values, but then readjust the output value of the reshaping function by scaling with (\sum_k M_k)/2^B. Examples of codeword reallocation techniques are also described in Ref. [7].
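The text does not specify how the CW values are renormalized. One simple possibility, shown here purely as an assumption, is to scale every M_k by 2^B divided by the current sum and round down, so that the new sum cannot exceed the available codewords. The function name renormalize_codewords is hypothetical.

```c
#include <assert.h>

/* If the allocated codewords exceed total_cw (e.g., 2^B), scale every M_k
 * proportionally and round down. Rounding down guarantees the new sum does
 * not exceed total_cw; a few codewords may remain unused. */
void renormalize_codewords(int *m, int n_bins, int total_cw)
{
    assert(n_bins > 0 && total_cw > 0);
    long long sum = 0;
    for (int k = 0; k < n_bins; k++)
        sum += m[k];
    if (sum <= total_cw)
        return;                /* allocation already fits, leave it unchanged */
    for (int k = 0; k < n_bins; k++)
        m[k] = (int)((long long)m[k] * total_cw / sum);
}
```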

FIG. 4 depicts an example process for allocating codewords into the reshaping domain according to the RDO technique described earlier. In step 405, the desired reshaped dynamic range is divided into N bins. After the input image is divided into non-overlapping blocks (step 410), for each block:

• Step 415 computes its luminance characteristics (e.g., mean and variance)
• Step 420 assigns each image block to one of the N bins
• Step 425 computes the average luminance variance in each bin

Given the values computed in step 425, in step 430, each bin is assigned a number of codewords according to one or more thresholds, for example, using any of the codeword allocation algorithms depicted in equations (4) to (8). Finally, in step 435, the final codeword allocation may be used to generate a forward reshaping function and/or an inverse reshaping function.

In an embodiment, as an example and without limitation, the forward LUT (FLUT) can be built using the following C code.

tot_cw = 1 << B;            // 2^B codewords in total
hist_lens = tot_cw / N;     // input codewords per bin
for (i = 0; i < N; i++) {
    // M[i] corresponds to M_k: slope of the mapping inside bin i
    double temp = (double) M[i] / (double) hist_lens;
    for (j = 0; j < hist_lens; j++) {
        CW_bins_LUT_all[i*hist_lens + j] = temp;
    }
}

// cumulative sum of the per-codeword slopes
Y_LUT_all[0] = CW_bins_LUT_all[0];
for (i = 1; i < tot_cw; i++) {
    Y_LUT_all[i] = Y_LUT_all[i - 1] + CW_bins_LUT_all[i];
}

// round and clip to the valid codeword range
for (i = 0; i < tot_cw; i++) {
    FLUT[i] = Clip3(0, tot_cw - 1, (Int)(Y_LUT_all[i] + 0.5));
}

In an embodiment, the inverse LUT can be built as follows:

low = FLUT[0];
high = FLUT[tot_cw - 1];
first = 0;
last = tot_cw - 1;
for (i = 1; i < tot_cw; i++)
    if (FLUT[0] < FLUT[i]) {
        first = i - 1;
        break;
    }
for (i = tot_cw - 2; i >= 0; i--)
    if (FLUT[tot_cw - 1] > FLUT[i]) {
        last = i + 1;
        break;
    }
for (i = 0; i < tot_cw; i++) {
    if (i <= low) {
        ILUT[i] = first;
    } else if (i >= high) {
        ILUT[i] = last;
    } else {
        for (j = 0; j < tot_cw - 1; j++)
            if (FLUT[j] >= i) {
                ILUT[i] = j;
                break;
            }
    }
}
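For readers who wish to experiment with the two listings, the following is a self-contained sketch that wraps them into functions. The function names, the CLIP3 macro, and the single running accumulator (used in place of the intermediate CW_bins_LUT_all and Y_LUT_all arrays) are choices made for this example; the arithmetic is intended to match the listings above.

```c
#include <assert.h>

#define CLIP3(lo, hi, x) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Forward LUT: accumulate the per-codeword slope M[bin] / hist_lens. */
void build_flut(const int *M, int n_bins, int tot_cw, int *flut)
{
    assert(n_bins > 0 && tot_cw % n_bins == 0);
    const int hist_lens = tot_cw / n_bins;
    double acc = 0.0;
    for (int i = 0; i < tot_cw; i++) {
        acc += (double)M[i / hist_lens] / hist_lens;
        flut[i] = CLIP3(0, tot_cw - 1, (int)(acc + 0.5));
    }
}

/* Inverse LUT: backward search through the forward LUT. */
void build_ilut(const int *flut, int tot_cw, int *ilut)
{
    const int low = flut[0], high = flut[tot_cw - 1];
    int first = 0, last = tot_cw - 1;
    for (int i = 1; i < tot_cw; i++)
        if (flut[0] < flut[i]) { first = i - 1; break; }
    for (int i = tot_cw - 2; i >= 0; i--)
        if (flut[tot_cw - 1] > flut[i]) { last = i + 1; break; }
    for (int i = 0; i < tot_cw; i++) {
        if (i <= low) {
            ilut[i] = first;
        } else if (i >= high) {
            ilut[i] = last;
        } else {
            for (int j = 0; j < tot_cw - 1; j++)
                if (flut[j] >= i) { ilut[i] = j; break; }
        }
    }
}
```

With a toy configuration of 16 codewords, 4 bins, and M = {0, 6, 6, 0}, the forward LUT is flat over the first and last bins, and the inverse LUT maps every reshaped value at or below FLUT[0] to the last input codeword of the leading flat region.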

Syntax-wise, one can re-use the syntax proposed in previous applications, such as the piecewise polynomial mode or parametric model in References [5] and [6]. Table 2 shows such an example for N=32 for equation (4).


Table 2: Syntax of reshaping using a first parametric model

reshaping_model() {	Descriptor
  reshaper_model_profile_type	ue(v)
  reshaper_model_scale_idx	u(2)
  reshaper_model_min_bin_idx	u(5)
  reshaper_model_max_bin_idx	u(5)
  for ( i = reshaper_model_min_bin_idx; i <= reshaper_model_max_bin_idx; i++ ) {
    reshaper_model_bin_profile_delta[ i ]	u(1)
  }
}

where reshaper_model_profile_type specifies the profile type to be used in the reshaper construction process. A given profile may provide information about default values being used, such as the number of bins, default bin importance or priority values, and default codeword allocations (e.g., M_a and/or M_f values).

reshaper_model_scale_idx specifies the index value of a scale factor (denoted as ScaleFactor) to be used in the reshaper construction process. The value of the ScaleFactor allows for improved control of the reshaping function for improved overall coding efficiency.

reshaper_model_min_bin_idx specifies the minimum bin index to be used in the reshaper construction process. The value of reshaper_model_min_bin_idx shall be in the range of 0 to 31, inclusive.

reshaper_model_max_bin_idx specifies the maximum bin index to be used in the reshaper construction process. The value of reshaper_model_max_bin_idx shall be in the range of 0 to 31, inclusive.

reshaper_model_bin_profile_delta[ i ] specifies the delta value to be used to adjust the profile of the i-th bin in the reshaper construction process. The value of reshaper_model_bin_profile_delta[ i ] shall be in the range of 0 to 1, inclusive.

Table 3 depicts another embodiment with an alternative, more efficient, syntax representation.

Table 3: Syntax of reshaping using a second parametric model

reshaping_model() {	Descriptor
  reshaper_model_profile_type	ue(v)
  reshaper_model_scale_idx	u(2)
  reshaper_model_min_bin_idx	ue(v)
  reshaper_model_delta_max_bin_idx	ue(v)
  reshaper_model_num_cw_minus1	u(1)
  for ( i = 0; i < reshaper_model_num_cw_minus1 + 1; i++ ) {
    reshaper_model_delta_abs_CW[ i ]	u(5)
    if ( reshaper_model_delta_abs_CW[ i ] > 0 )
      reshaper_model_delta_sign_CW[ i ]	u(1)
  }
  for ( i = reshaper_model_min_bin_idx; i <= reshaper_model_max_bin_idx; i++ ) {
    reshaper_model_bin_profile_delta[ i ]	u(v)
  }
}

where reshaper_model_delta_max_bin_idx is set equal to the maximum allowed bin index (e.g., 31) minus the maximum bin index to be used in the reshaper construction process.

reshaper_model_num_cw_minus1 plus 1 specifies the number of codewords to be signalled.

reshaper_model_delta_abs_CW[ i ] specifies the i-th absolute delta codeword value.

reshaper_model_delta_sign_CW[ i ] specifies the sign for the i-th delta codeword.

Then:

reshaper_model_delta_CW[ i ] = (1 - 2 * reshaper_model_delta_sign_CW[ i ]) * reshaper_model_delta_abs_CW[ i ];

reshaper_model_CW[ i ] = 32 + reshaper_model_delta_CW[ i ].

reshaper_model_bin_profile_delta[ i ] specifies the delta value to be used to adjust the profile of the i-th bin in the reshaper construction process. The value of reshaper_model_bin_profile_delta[ i ] shall be in the range of 0 to 1 when reshaper_model_num_cw_minus1 is equal to 0. The value of reshaper_model_bin_profile_delta[ i ] shall be in the range of 0 to 2 when reshaper_model_num_cw_minus1 is equal to 1.

CW = 32 when reshaper_model_bin_profile_delta[ i ] is set equal to 0; CW = reshaper_model_CW[ 0 ] when reshaper_model_bin_profile_delta[ i ] is set equal to 1; CW = reshaper_model_CW[ 1 ] when reshaper_model_bin_profile_delta[ i ] is set equal to 2. In an embodiment, reshaper_model_num_cw_minus1 is allowed to be larger than 1, allowing reshaper_model_num_cw_minus1 and reshaper_model_bin_profile_delta[ i ] to be signaled with ue(v) for more efficient coding.
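The Table 3 semantics may be illustrated with the following C sketch. The two function names are hypothetical; the constant 32 and the mapping from profile delta to codeword count follow the text above.

```c
#include <assert.h>

/* reshaper_model_CW[i] = 32 + (1 - 2*sign) * abs, as given in the text. */
int reshaper_model_cw(int delta_abs_cw, int delta_sign_cw)
{
    assert(delta_sign_cw == 0 || delta_sign_cw == 1);
    return 32 + (1 - 2 * delta_sign_cw) * delta_abs_cw;
}

/* Map reshaper_model_bin_profile_delta[i] to a codeword count:
 * 0 selects the default 32, n > 0 selects model_cw[n - 1]. */
int bin_codewords(int bin_profile_delta, const int *model_cw, int num_cw)
{
    assert(bin_profile_delta >= 0 && bin_profile_delta <= num_cw);
    return bin_profile_delta == 0 ? 32 : model_cw[bin_profile_delta - 1];
}
```

For example, signalling absolute deltas 4 and 6 with signs 0 and 1 yields the two codeword values 36 and 26, which a bin selects with profile delta 1 or 2, respectively.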

In another embodiment, as described in Table 4, the number of codewords per bin may be defined explicitly.

Table 4: Syntax of reshaping using a third model

slice_reshaper_model() {	Descriptor
  reshaper_model_number_bins_minus1	ue(v)
  reshaper_model_min_bin_idx	ue(v)
  reshaper_model_delta_max_bin_idx	ue(v)
  reshaper_model_bin_delta_abs_cw_prec_minus1	ue(v)
  for ( i = reshaper_model_min_bin_idx; i <= reshaper_model_max_bin_idx; i++ ) {
    reshaper_model_bin_delta_abs_CW[ i ]	u(v)
    if ( reshaper_model_bin_delta_abs_CW[ i ] > 0 )
      reshaper_model_bin_delta_sign_CW_flag[ i ]	u(1)
  }
}

reshaper_model_number_bins_minus1 plus 1 specifies the number of bins used for the luma component. In some embodiments it may be more efficient that the number of bins is a power of two. Then, the total number of bins may be represented by its log2 representation, e.g., using an alternative parameter like log2_reshaper_model_number_bins_minus1. For example, for 32 bins log2_reshaper_model_number_bins_minus1 = 4.

reshaper_model_bin_delta_abs_cw_prec_minus1 plus 1 specifies the number of bits used for the representation of the syntax reshaper_model_bin_delta_abs_CW[ i ].

reshaper_model_bin_delta_abs_CW[ i ] specifies the absolute delta codeword value for the i-th bin.

reshaper_model_bin_delta_sign_CW_flag[ i ] specifies the sign of reshaper_model_bin_delta_abs_CW[ i ] as follows:

- If reshaper_model_bin_delta_sign_CW_flag[ i ] is equal to 0, the corresponding variable RspDeltaCW[ i ] has a positive value.

- Otherwise ( reshaper_model_bin_delta_sign_CW_flag[ i ] is not equal to 0 ), the corresponding variable RspDeltaCW[ i ] has a negative value.

When reshaper_model_bin_delta_sign_CW_flag[ i ] is not present, it is inferred to be equal to 0.

The variable RspDeltaCW[ i ] = (1 - 2 * reshaper_model_bin_delta_sign_CW_flag[ i ]) * reshaper_model_bin_delta_abs_CW[ i ];

The variable OrgCW is set equal to (1 << BitDepth_Y) / (reshaper_model_number_bins_minus1 + 1);

The variable RspCW[ i ] is derived as follows:

if reshaper_model_min_bin_idx <= i <= reshaper_model_max_bin_idx, then RspCW[ i ] = OrgCW + RspDeltaCW[ i ];

else, RspCW[ i ] = 0.
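The derivation of RspCW[ i ] may be sketched in C as follows. The function and parameter names are chosen for this example; num_bins stands for reshaper_model_number_bins_minus1 + 1, and the maximum bin index is taken as an already-derived input.

```c
#include <assert.h>

/* Derive RspCW[i] for every bin from the Table 4 syntax elements. */
void derive_rsp_cw(int bit_depth, int num_bins, int min_bin_idx, int max_bin_idx,
                   const int *delta_abs_cw, const int *delta_sign_flag, int *rsp_cw)
{
    assert(num_bins > 0 && min_bin_idx >= 0 && max_bin_idx < num_bins);
    const int org_cw = (1 << bit_depth) / num_bins;        /* OrgCW */
    for (int i = 0; i < num_bins; i++) {
        if (i >= min_bin_idx && i <= max_bin_idx) {
            /* RspDeltaCW[i]: a sign flag of 1 makes the delta negative */
            int delta = (1 - 2 * delta_sign_flag[i]) * delta_abs_cw[i];
            rsp_cw[i] = org_cw + delta;
        } else {
            rsp_cw[i] = 0;                                 /* bin not used */
        }
    }
}
```

For a 10-bit signal with 16 bins, OrgCW is 64, so an absolute delta of 5 with a sign flag of 0 gives 69 codewords and an absolute delta of 3 with a sign flag of 1 gives 61.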


In an embodiment, assuming the codeword allocation according to one of the earlier examples, e.g., equation (4), an example of how to define the parameters in Table 2 comprises:

First assume one assigns “bin importance” as follows:

For the k-th bin,
    if M_k == 0
        bin_importance = 0;
    else if M_k == M_f
        bin_importance = 2;        (9)
    else
        bin_importance = 1;
    end

As used herein, the term “bin importance” is a value assigned to each of the N codeword bins to indicate the importance of all codewords in that bin in the reshaping process with respect to other bins.

In an embodiment, one may set the default_bin_importance from reshaper_model_min_bin_idx to reshaper_model_max_bin_idx to 1. The value of reshaper_model_min_bin_idx is set to the smallest bin index which has M_k not equal to 0. The value of reshaper_model_max_bin_idx is set to the largest bin index which has M_k not equal to 0. reshaper_model_bin_profile_delta for each bin within [reshaper_model_min_bin_idx, reshaper_model_max_bin_idx] is the difference between bin_importance and the default_bin_importance.
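An encoder-side sketch of this parameter derivation is given below in C. The function name is hypothetical, and the sketch assumes that a bin importance of 2 corresponds to a bin holding M_f codewords. Note that the result is a signed difference: a bin inside the signalled range with M_k equal to 0 would produce a delta of -1, which the u(1) descriptor of Table 2 cannot represent, so the sketch leaves that case to the caller.

```c
#include <assert.h>

/* Derive reshaper_model_min_bin_idx, reshaper_model_max_bin_idx and the
 * per-bin profile deltas from a codeword allocation M[0..n_bins-1].
 * Returns 0 on success, -1 if every bin has zero codewords. */
int derive_profile_params(const int *M, int n_bins, int M_f,
                          int *min_bin_idx, int *max_bin_idx, int *profile_delta)
{
    assert(n_bins > 0);
    const int default_bin_importance = 1;
    int lo = -1, hi = -1;
    for (int k = 0; k < n_bins; k++) {
        profile_delta[k] = 0;
        if (M[k] != 0) {
            if (lo < 0) lo = k;     /* smallest index with M_k != 0 */
            hi = k;                 /* largest index with M_k != 0 */
        }
    }
    if (lo < 0)
        return -1;
    for (int k = lo; k <= hi; k++) {
        /* assumed importance rule: 0 codewords -> 0, M_f -> 2, otherwise 1 */
        int importance = (M[k] == 0) ? 0 : (M[k] == M_f) ? 2 : 1;
        profile_delta[k] = importance - default_bin_importance;
    }
    *min_bin_idx = lo;
    *max_bin_idx = hi;
    return 0;
}
```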

An example of how to use the proposed parametric model to construct a Forward Reshaping LUT (FLUT) and an Inverse Reshaping LUT (ILUT) is shown as follows.

1) Divide the luminance range into N bins (e.g., N=32).

2) Derive the bin-importance index for each bin from the syntax. E.g.:

For the k-th bin,
    if reshaper_model_min_bin_idx <= k <= reshaper_model_max_bin_idx
        bin_importance[k] = default_bin_importance[k] + reshaper_model_bin_profile_delta[k];
    else
        bin_importance[k] = 0;

3) Automatically pre-assign codewords based on bin importance:

For the k-th bin,
    if bin_importance[k] == 0
        M_k = 0;
    else if bin_importance[k] == 2
        M_k = M_f;
    else
        M_k = M_a;
    end

4) Build the forward reshaping LUT based on the codeword assignment for each bin, by accumulating the codewords assigned to each bin. The sum should be less than or equal to the total codeword budget (e.g., 1024 for 10-bit full range). (E.g., see the earlier C code.)

5) Build the inverse reshaping LUT (e.g., see the earlier C code).
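Steps 2 to 4 above may be sketched in C as follows, up to the budget check that precedes LUT construction. The function name and return convention are assumptions of this example, as is the reading that a bin importance of 2 selects M_f codewords; the default bin importance is taken as 1 for every bin in the signalled range.

```c
#include <assert.h>

/* Derive bin importance from the signalled profile deltas, pre-assign
 * codewords, and check the total against the codeword budget.
 * Returns 1 if the allocation fits the budget, 0 otherwise. */
int assign_codewords(const int *profile_delta, int n_bins,
                     int min_bin_idx, int max_bin_idx,
                     int M_a, int M_f, int budget, int *M)
{
    assert(n_bins > 0 && min_bin_idx <= max_bin_idx);
    const int default_bin_importance = 1;
    long long sum = 0;
    for (int k = 0; k < n_bins; k++) {
        int importance = 0;                           /* bins outside the range */
        if (k >= min_bin_idx && k <= max_bin_idx)
            importance = default_bin_importance + profile_delta[k];
        M[k] = (importance == 0) ? 0 : (importance == 2) ? M_f : M_a;
        sum += M[k];
    }
    return sum <= budget;
}
```

When the function returns 0, the allocation exceeds the budget and would need to be readjusted before the forward LUT is built.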

From a syntax point of view, alternative methods can also be applied. The key is to specify the number of codewords in each bin (e.g., M_k, for k = 0, 1, 2, ..., N-1) either explicitly or implicitly. In one embodiment, one can specify explicitly the number of codewords in each bin. In another embodiment, one can specify the codewords differentially. For example, the number of codewords in a bin can be determined using the difference of the number of codewords in the current bin and the previous bin (e.g., M_Delta(k) = M(k) - M(k-1)). In another embodiment, one can specify the most commonly used number of codewords (say, M_m) and express the number of codewords in each bin as the difference of the codeword number in each bin from this number (e.g., M_Delta(k) = M(k) - M_m).
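The differential option, M_Delta(k) = M(k) - M(k-1), may be sketched in C as follows. The function names are hypothetical, and treating the predecessor of the first bin as 0 is an assumption of this example, since the text does not state how the first bin is initialized.

```c
#include <assert.h>

/* Encoder side: express each bin's codeword count relative to the previous bin.
 * Assumption: the "previous" value for bin 0 is 0. */
void encode_cw_deltas(const int *M, int n, int *delta)
{
    assert(n > 0);
    int prev = 0;
    for (int k = 0; k < n; k++) {
        delta[k] = M[k] - prev;
        prev = M[k];
    }
}

/* Decoder side: recover the codeword counts by accumulating the deltas. */
void decode_cw_deltas(const int *delta, int n, int *M)
{
    assert(n > 0);
    int prev = 0;
    for (int k = 0; k < n; k++) {
        M[k] = prev + delta[k];
        prev = M[k];
    }
}
```

Differences tend to be small when neighboring bins receive similar allocations, which is what makes this form attractive for variable-length coding.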

In an embodiment, two reshaping methods are supported. One is denoted as the “default reshaper,” where M_f is assigned to all bins. The second, denoted as “adaptive reshaper,” applies the adaptive reshaper described earlier. The two methods can be signaled to a decoder as in Ref. [6] using a special flag, e.g., sps_reshaper_adaptive_flag (e.g., use sps_reshaper_adaptive_flag = 0 for the default reshaper and use sps_reshaper_adaptive_flag = 1 for the adaptive reshaper).

The invention is applicable to any reshaping architecture proposed in Ref. [6], such as: an external reshaper, in-loop intra only reshaper, in-loop residue reshaper, or in-loop hybrid reshaper. As an example, FIG. 2A and 2B depict example architectures for hybrid in-loop reshaping according to embodiments of this invention. In FIG. 2A, the architecture combines elements from both an in-loop intra only reshaping architecture (top of the Figure) and an in-loop residual architecture (bottom part of the Figure). Under this architecture, for intra slices, reshaping is applied to the picture pixels, while for inter slices, reshaping is applied to the prediction residuals. In the encoder (200_E), two new blocks are added to a traditional block-based encoder (e.g., HEVC): a block (205) to estimate the forward reshaping function (e.g., according to FIG. 4), the forward picture reshaping block (210-1), and the forward residue reshaping block (210-2), which applies the forward reshaping to one or more of the color components of the input video (117) or prediction residuals. In some embodiments, these two operations may be performed as part of a single image reshaping block. Parameters (207) related to determining the inverse reshaping function in the decoder may be passed to the lossless encoder block of the video encoder (e.g., CABAC 220) so that they can be embedded into the coded bitstream (122). In intra-mode, intra prediction (225-1), transform and quantization (T&Q), and inverse transform and inverse quantization (Q⁻¹ & T⁻¹) all use reshaped pictures. In both modes, stored pictures in the DPB (215) are always in inverse-reshaped mode, which requires an inverse picture reshaping block (e.g., 265-1) or an inverse residual reshaping block (e.g., 265-2) before the loop filter (270-1, 270-2). As depicted in FIG. 2A, an Intra/Inter Slice switch allows switching between the two architectures depending on the slice type to be encoded. In another embodiment, in-loop filtering for Intra slices may be performed before inverse reshaping.

In the decoder (200_D), the following new normative blocks are added to a traditional block-based decoder: a block (250) (reshaper decoding) to reconstruct a forward reshaping function and an inverse reshaping function based on the encoded reshaping function parameters (207), a block (265-1) to apply the inverse reshaping function to the decoded data, and a block (265-2) to apply both the forward reshaping function and inverse reshaping function to generate the decoded video signal (162). For example, in (265-2) the reconstructed value is given by Rec = ILUT(FLUT(Pred)+Res), where FLUT denotes the forward reshaping LUT and ILUT denotes the inverse reshaping LUT.
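The reconstruction performed in block (265-2) may be sketched in C as follows. The function name is hypothetical, and clipping the intermediate sum to the valid LUT index range is an assumption added so that the lookup stays in bounds; the text gives only Rec = ILUT(FLUT(Pred)+Res).

```c
#include <assert.h>

/* Rec = ILUT(FLUT(Pred) + Res) for one sample of an inter-coded block. */
int reconstruct_inter_sample(const int *flut, const int *ilut, int lut_size,
                             int pred, int res)
{
    assert(lut_size > 0 && pred >= 0 && pred < lut_size);
    int v = flut[pred] + res;              /* residual is added in the reshaped domain */
    if (v < 0) v = 0;                      /* assumed clipping to the LUT range */
    if (v > lut_size - 1) v = lut_size - 1;
    return ilut[v];                        /* back to the original domain */
}
```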

In some embodiments, operations related to blocks 250 and 265 may be combined into a single processing block. As depicted in FIG. 2B, an Intra/Inter Slice switch allows switching between the two modes depending on the slice types in the encoded video pictures.

FIG. 3A depicts an example process (300_E) for encoding video using a reshaping architecture (e.g., 200_E) according to an embodiment of this invention. If there is no reshaping enabled (path 305), then encoding (335) proceeds as known in prior-art encoders (e.g., HEVC). If reshaping is enabled (path 310), then an encoder may have the options to either apply a predetermined (default) reshaping function (315), or adaptively determine a new reshaping function (325) based on a picture analysis (320) (e.g., as described in FIG. 4). Following encoding a picture using a reshaping architecture (330), the rest of the encoding follows the same steps as the traditional coding pipeline (335). If adaptive reshaping (312) is employed, metadata related to the reshaping function are generated as part of the “Encode Reshaper” step (327).

FIG. 3B depicts an example process (300_D) for decoding video using a reshaping architecture (e.g., 200_D) according to an embodiment of this invention. If there is no reshaping enabled (path 340), then after decoding a picture (350), output frames are generated (390) as in a traditional decoding pipeline. If reshaping is enabled (path 360), then the decoder determines whether to apply a pre-determined (default) reshaping function (375), or adaptively determine the reshaping function (380) based on received parameters (e.g., 207). Following decoding using a reshaping architecture (385), the rest of the decoding follows the traditional decoding pipeline.

As described in Ref. [6] and earlier in this specification, the forward reshaping LUT FwdLUT may be built by integration, while the inverse reshaping LUT may be built based on a backward mapping using the forward reshaping LUT (FwdLUT). In an embodiment, the forward LUT may be built using piecewise linear interpolation. At the decoder, inverse reshaping can be done by using the backward LUT directly or again by linear interpolation. The piece-wise linear LUT is built based on input pivot points and output pivot points.

Let (X1, Y1), (X2, Y2) be two input pivot points and their corresponding output values for each bin. Any input value X between X1 and X2 can be interpolated by the following equation:

Y = ((Y2 - Y1)/(X2 - X1)) * (X - X1) + Y1.

In a fixed-point implementation, the above equation can be rewritten as

Y = ((m * X + 2^(FP_PREC-1)) >> FP_PREC) + c,

where m and c denote the scalar and offset for linear interpolation and FP_PREC is a constant related to the fixed-point precision.
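As an illustration of the fixed-point form, the following C sketch interpolates between two pivots. It is written with the offset taken relative to X1 (that is, Y = Y1 + ((m * (X - X1) + rounding) >> FP_PREC)), which is one possible way to realize the scalar-and-offset form; the function name, the value FP_PREC = 14, and the use of an integer division to obtain m are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

#define FP_PREC 14   /* assumed fixed-point precision */

/* Interpolate Y at X between pivots (x1, y1) and (x2, y2) in fixed point.
 * The forward mapping is assumed non-decreasing (y2 >= y1). */
int32_t interp_fixed(int32_t x1, int32_t y1, int32_t x2, int32_t y2, int32_t x)
{
    assert(x2 > x1 && x >= x1 && x <= x2 && y2 >= y1);
    const int32_t dx = x2 - x1;
    /* slope m scaled by 2^FP_PREC, rounded to nearest */
    const int64_t m = ((int64_t)(y2 - y1) * (1 << FP_PREC) + dx / 2) / dx;
    /* add 2^(FP_PREC-1) before the shift so the result is rounded, not truncated */
    return y1 + (int32_t)((m * (x - x1) + (1 << (FP_PREC - 1))) >> FP_PREC);
}
```

For pivots (32, 40) and (64, 76) the slope is 1.125, so X = 48 maps to 58 and X = 33 maps to 41, matching the rounded floating-point result.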

As an example, FwdLUT may be built as follows: Let the variable lutSize = ( 1 << BitDepth_Y ).

Let variables binNum = reshaper_model_number_bins_minus1 + 1, and binLen = lutSize / binNum.

For the i-th bin, its two bracketing pivots (e.g., X1 and X2) may be derived as X1 = i*binLen and X2 = (i+1)*binLen. Then:

binsLUT[ 0 ] = 0;
for ( i = 0; i < reshaper_model_number_bins_minus1 + 1; i++ ) {
    binsLUT[ (i + 1) * binLen ] = binsLUT[ i * binLen ] + RspCW[ i ];
    Y1 = binsLUT[ i * binLen ];
    Y2 = binsLUT[ (i + 1) * binLen ];
    scale = ((Y2 - Y1) * (1 << FP_PREC) + (1 << (log2(binLen) - 1))) >> (log2(binLen));
    for ( j = 1; j < binLen; j++ ) {
        binsLUT[ i * binLen + j ] = Y1 + ((scale * j + (1 << (FP_PREC - 1))) >> FP_PREC);
    }
}

FP_PREC defines the fixed-point precision of the fractional part of the variables (e.g., FP_PREC = 14). In an embodiment, binsLUT[] may be computed in higher precision than the precision of FwdLUT. For example, binsLUT[] values may be computed as 32-bit integers, but FwdLUT may be the binsLUT values clipped at 16 bits.

Adaptive Threshold Derivation

As described earlier, during reshaping, the codeword allocation may be adjusted using one or more thresholds (e.g., TH, TH_U, TH_L, and the like). In an embodiment, such thresholds may be generated adaptively based on the content characteristics. FIG. 5 depicts an example process for deriving such thresholds according to an embodiment.

1) In step 505, the luminance range of an input image is divided into N bins (e.g., N=32). For example, let N also be denoted as PIC_ANALYZE_CW_BINS.

2) In step 510, one performs an image analysis to calculate luminance characteristics for each bin. For example, one may compute the percentage of pixels in each bin (to be denoted as BinHist[b], b = 1, 2, ..., N), where

BinHist[b] = 100 * (total pixels in bin b) / (total pixels in the picture). (10)

As discussed before, another good metric of image characteristics is the average variance (or standard deviation) of pixels in each bin, to be denoted BinVar[b]. BinVar[b] may be computed in “block mode” as var_bin(k) in the steps described in the section leading to equations (2) and (3). Alternatively, the block-based calculation could be refined with pixel-based calculations. For example, denote as vf(i) the variance associated with a group of pixels surrounding the i-th pixel in an m x m neighborhood window (e.g., m = 5) with the i-th pixel at its center. For example, if

$\mu(i) = \frac{1}{W_N} \sum_{k=1}^{W_N} x(k)$ (11)

denotes the mean value of pixels in a $W_N = m \times m$ window (e.g., m = 5) surrounding the i-th pixel with value x(i), then

$vf(i) = \frac{1}{W_N} \sum_{k=1}^{W_N} \left( x(k) - \mu(i) \right)^2$. (12)

An optional non-linear mapping, such as vf(i) = log10(vf(i)+1), can be used to suppress the dynamic range of raw variance values. Then, the variance factor may be used in calculating the average variance in each bin as

$\mathrm{BinVar}[b] = \frac{1}{K_b} \sum_{i=1}^{K_b} vf(i)$, (13)

where K_b denotes the number of pixels in bin b.

3) In step 515, the average bin variances (and their corresponding indices) are sorted, for example and without limitation in descending order. For example, sorted BinVar values may be stored in BinVarSortDsd[b] and sorted bin indices may be stored in BinIdxSortDsd[b]. As an example, using C code, the process may be described as:

for (int b = 0; b < PIC_ANALYZE_CW_BINS; b++) {  // initialize (unsorted)
    BinVarSortDsd[b] = BinVar[b];
    BinIdxSortDsd[b] = b;
}
// sort (see example code in Appendix 1)
bubbleSortDsd(BinVarSortDsd, BinIdxSortDsd, PIC_ANALYZE_CW_BINS);

An example plot of sorted average bin variance factors is depicted in FIG. 6A.

4) Given the bin histogram values computed in step 510, in step 520, one computes and stores a cumulative density function (CDF) according to the order of sorted average bin variances. For example, if the CDF is stored in array BinVarSortDsdCDF[b], in an embodiment:

BinVarSortDsdCDF[0] = BinHist[BinIdxSortDsd[0]];
for (int b = 1; b < PIC_ANALYZE_CW_BINS; b++) {
    BinVarSortDsdCDF[b] = BinVarSortDsdCDF[b - 1] + BinHist[BinIdxSortDsd[b]];
}

An example plot (605) of a computed CDF, based on the data of FIG. 6A, is depicted in FIG. 6B. The pairs of CDF values versus sorted average bin variances, {x = BinVarSortDsd[b], y = BinVarSortDsdCDF[b]}, can be interpreted as: “there are y% pixels in the picture having variance greater than or equal to x” or “there are (100-y)% pixels in the picture having variance less than x.”

5) Finally, in step 525, given the CDF BinVarSortDsdCDF[BinVarSortDsd[b]] as a function of the sorted average bin-variance values, one can define thresholds based on bin variances and the accumulated percentages.
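The pixel-based variance mentioned in step 2 (the mean and variance of the pixels in an m x m window centered on a pixel) may be sketched in C as follows. The function names are hypothetical, and replicating edge pixels for windows that extend past the picture boundary is an assumption of this example; the text does not specify boundary handling.

```c
#include <assert.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Variance of the m x m window centered at (cx, cy) of a width x height luma
 * image stored row by row. Out-of-picture positions reuse the nearest edge
 * pixel (assumption). */
double local_variance(const int *img, int width, int height, int cx, int cy, int m)
{
    assert(m > 0 && m % 2 == 1 && width > 0 && height > 0);
    const int r = m / 2;
    const int wn = m * m;                 /* number of pixels in the window */
    double sum = 0.0;
    for (int dy = -r; dy <= r; dy++)
        for (int dx = -r; dx <= r; dx++)
            sum += img[clamp_int(cy + dy, 0, height - 1) * width +
                       clamp_int(cx + dx, 0, width - 1)];
    const double mean = sum / wn;         /* window mean */
    double acc = 0.0;
    for (int dy = -r; dy <= r; dy++)
        for (int dx = -r; dx <= r; dx++) {
            double d = img[clamp_int(cy + dy, 0, height - 1) * width +
                           clamp_int(cx + dx, 0, width - 1)] - mean;
            acc += d * d;
        }
    return acc / wn;                      /* window variance */
}
```

Averaging these per-pixel values over all pixels that fall in a bin then gives that bin's BinVar[b].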

Examples for determining a single threshold or two thresholds are shown in FIG. 6C and 6D respectively. When only one threshold is being used (e.g., TH), as an example, TH may be defined as “the average variance where k % of the pixels have vf > TH.” Then TH can be calculated by finding the intersection of the CDF plot (605) at k % (e.g., 610) (e.g., the BinVarSortDsd[b] value where BinVarSortDsdCDF = k %). For example, as depicted in FIG. 6C, for k = 50, TH = 2.5. Then, one can assign M_f codewords for bins having BinVar[b] < TH and M_a codewords for bins having BinVar[b] > TH. As a rule of thumb, it is preferable to assign a larger number of codewords to bins with smaller variance (e.g., M_f > 32 > M_a, for a 10-bit video signal with 32 bins).

When using two thresholds, an example of selecting TH_L and TH_U is depicted in FIG. 6D. For example, without loss of generality, TH_L may be defined as the variance where 80% of pixels have vf > TH_L (then, in our example, TH_L = 2.3), and TH_U may be defined as the variance where 10% of all pixels have vf > TH_U (then, in our example, TH_U = 3.5). Given these thresholds, one can assign M_f codewords for bins having BinVar[b] < TH_L and M_a codewords for bins having BinVar[b] > TH_U. For bins having BinVar in between TH_L and TH_U, one may use the original numbers of codewords per bin (e.g., 32 for B=10).
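The threshold selection and the two-threshold codeword assignment may be sketched in C as follows. The function names are hypothetical, and picking the first sorted bin whose CDF value reaches the target percentage (without interpolating between bins) is a simplification assumed for this example.

```c
#include <assert.h>

/* Return the sorted (descending) bin variance at which the CDF first reaches
 * pct percent. No interpolation between bins is performed (assumption). */
double threshold_at_percent(const double *var_sorted_dsd, const double *cdf,
                            int n_bins, double pct)
{
    assert(n_bins > 0);
    for (int b = 0; b < n_bins; b++)
        if (cdf[b] >= pct)
            return var_sorted_dsd[b];
    return var_sorted_dsd[n_bins - 1];
}

/* Assign M_f below TH_L, M_a above TH_U, and the original count in between. */
void assign_by_thresholds(const double *bin_var, int n_bins,
                          double th_l, double th_u,
                          int M_f, int M_a, int M_orig, int *M)
{
    assert(th_l <= th_u);
    for (int b = 0; b < n_bins; b++) {
        if (bin_var[b] < th_l)
            M[b] = M_f;        /* low variance: more codewords */
        else if (bin_var[b] > th_u)
            M[b] = M_a;        /* high variance: fewer codewords */
        else
            M[b] = M_orig;     /* in between: keep the original count */
    }
}
```

With a sorted variance list {4.0, 3.5, 3.0, 2.3, 1.0} and CDF values {5, 10, 40, 80, 100}, the 10% and 80% targets give TH_U = 3.5 and TH_L = 2.3, the values used in the example above.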

The techniques above can be easily extended to cases with more than two thresholds. The relationship can also be used to adjust the number of codewords (M_f, M_a, etc.). As a rule of thumb, in low-variance bins, one should assign more codewords to boost PSNR (and reduce MSE); for high-variance bins, one should assign fewer codewords to save bits.

In an embodiment, if the set of parameters (e.g., TH_L, TH_U, M_a, M_f, and the like) were obtained manually for specific content, for example, through an exhaustive manual parameter tuning, this automatic method may be applied to design a decision tree to categorize each content in order to set the optimum manual parameters automatically. For example, content categories include: film, television, SDR, HDR, cartoons, nature, action, and the like.


To reduce complexity, in-loop reshaping may be constrained using a variety of schemes. If in-loop reshaping is adopted in a video coding standard, then these constraints should be normative to guarantee decoder simplifications. For example, in an embodiment, luma reshaping may be disabled for certain block coding sizes. For example, one could disable intra and inter reshaper mode in an inter slice when nTbW * nTbH < TH, where the variable nTbW specifies the transform block width and variable nTbH specifies the transform block height. For example, for TH=64, blocks with sizes 4x4, 4x8, and 8x4 are disabled for both intra and inter mode reshaping in inter-coded slices (or tiles).
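The block-size constraint may be expressed as a one-line check. The function name below is hypothetical; the comparison follows the condition nTbW * nTbH < TH given above.

```c
#include <assert.h>

/* Returns 1 if luma reshaping may be used for a transform block of size
 * nTbW x nTbH in an inter slice, 0 if it is disabled by the size constraint. */
int luma_reshaping_allowed(int nTbW, int nTbH, int th)
{
    assert(nTbW > 0 && nTbH > 0);
    return nTbW * nTbH >= th;
}
```

With TH = 64, the check disables 4x4, 4x8, and 8x4 blocks, while an 8x8 block (64 samples) remains eligible.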

Similarly, in another embodiment, one may disable luma-based, chroma residue scaling in intra mode in inter-coded slices (or tiles), or when having separate luma and chroma partitioning trees is enabled.

Interaction with other coding tools

Loop Filtering

In Ref. [6], it was described that a loop filter can operate either in the original pixel domain or in the reshaped pixel domain. In one embodiment it is suggested that loop filtering is performed in the original pixel domain (after picture reshaping). For example, in a hybrid in-loop reshaping architecture (200_E and 200_D), for intra picture, one will need to apply inverse reshaping (265-1) before the loop filter (270-1).

FIG. 2C and FIG. 2D depict alternative decoder architectures (200B_D and 200C_D) where inverse reshaping (265) is performed after the loop filtering (270), just before storing the decoded data into the decoded picture buffer (DPB) (260). In the proposed embodiments, compared to the architecture in 200_D, the inverse residue reshaping formula for inter slices is modified, and inverse reshaping (e.g., via an InvLUT() function or look-up-table) is performed after loop filtering (270). In this way, inverse reshaping is performed for both intra slices and inter slices after loop filtering, and the reconstructed pixels before loop filtering for both intra-coded CU and inter-coded CU are in the reshaped domain. After inverse reshaping (265), the output samples which are stored in the Reference DPB are all in the original domain. Such an architecture allows for both slice-based adaption and CTU-based adaption for in-loop reshaping.

As depicted in FIG. 2C and FIG. 2D, in an embodiment, loop filtering (270) is performed in the reshaped domain for both intra-coded and inter-coded CUs, and inverse picture reshaping (265) happens only once, thus presenting a unified, and simpler architecture for both intra and inter-coded CUs.


For decoding intra-coded CUs (200B_D), intra prediction (225) is performed on reshaped neighboring pixels. Given residual Res, and a predicted sample PredSample, the reconstructed sample (227) is derived as:

RecSample = Res + PredSample. (14)

Given the reconstructed samples (227), loop filtering (270) and inverse picture reshaping (265) are applied to derive RecSampleInDPB samples to be stored in DPB (260), where

RecSampleInDPB = InvLUT(LPF(RecSample)) = InvLUT(LPF(Res + PredSample)), (15)

where InvLUT() denotes the inverse reshaping function or inverse reshaping look-up table, and LPF() denotes the loop-filtering operations.

In traditional coding, inter/intra-mode decisions are based on computing a distortion function (dfunc()) between the original samples and the predicted samples. Examples of such functions include the sum of square errors (SSE), the sum of absolute differences (SAD), and others. When using reshaping, at the encoder side (not shown), CU prediction and mode decision are performed on the reshaped domain. That is, for mode decision,

distortion = dfunc(FwdLUT(SrcSample) - RecSample), (16)

where FwdLUT() denotes the forward reshaping function (or LUT) and SrcSample denotes the original image samples.
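As an illustration of a mode-decision distortion computed in the reshaped domain, the following C sketch uses the sum of square errors as dfunc(). The function name is hypothetical, and SSE is only one of the distortion functions mentioned above.

```c
#include <assert.h>

/* SSE between forward-reshaped source samples and reconstructed samples that
 * are already in the reshaped domain: dfunc(FwdLUT(SrcSample) - RecSample). */
long long sse_reshaped(const int *fwd_lut, const int *src, const int *rec, int n)
{
    assert(n >= 0);
    long long sse = 0;
    for (int i = 0; i < n; i++) {
        long long d = (long long)fwd_lut[src[i]] - rec[i];
        sse += d * d;
    }
    return sse;
}
```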

For inter-coded CUs, at the decoder side (e.g., 200C_D), inter prediction is performed using reference pictures in the non-reshaped domain in the DPB. Then in reconstruction block 275, the reconstructed pixels (267) are derived as:

RecSample = (Res + FwdLUT(PredSample)). (17)

Given the reconstructed samples (267), loop filtering (270) and inverse picture reshaping (265) are applied to derive RecSampleInDPB samples to be stored in DPB, where

RecSampleInDPB = InvLUT(LPF(RecSample)) = InvLUT(LPF(Res + FwdLUT(PredSample))). (18)


At the encoder side (not shown), intra prediction is performed in the reshaped domain as

Res = FwdLUT(SrcSample) - PredSample, (19a)

under the assumption that all neighbor samples (PredSample) used for prediction are already in the reshaped domain. Inter prediction (e.g., using motion compensation) is performed in the non-reshaped domain (i.e., using reference pictures from the DPB directly), i.e.,

PredSample = MC(RecSampleInDPB), (19b)

where MC() denotes the motion compensation function. For motion estimation and fast mode decision, where residue is not generated, one can compute distortion using

distortion = dfunc(SrcSample - PredSample).

However, for full mode decision where residue is generated, mode decision is performed in the reshaped domain. That is, for full mode decision, distortion = dfunc(FwdLUT(SrcSample)-RecSample). (20)

Block level adaptation

As explained before, the proposed in-loop reshaper allows reshaping to be adapted at the CU level, e.g., to set the variable CU_reshaper on or off as needed. Under the same architecture, for an inter-coded CU, the reconstructed pixels need to be in the reshaped domain even when CU_reshaper = off for this CU:

RecSample = FwdLUT(Res + PredSample), (21)

so that intra-prediction always has neighboring pixels in the reshaped domain. The DPB pixels can be derived as:

RecSampleInDPB = InvLUT(LPF(RecSample)) = InvLUT(LPF(FwdLUT(Res + PredSample))). (22)


For an intra-coded CU, depending on the encoding process, two alternative methods are proposed:

1) All intra-coded CUs are coded with CU_reshaper = on. In this case, no additional processing is needed because all pixels are already in the reshaped domain.

2) Some intra-coded CUs can be coded using CU_reshaper = off. In this case, for CU_reshaper = off, when applying intra prediction, one needs to apply inverse reshaping to the neighboring pixels so that intra prediction is performed in the original domain and the final reconstructed pixels need to be in the reshaped domain, i.e.,

RecSample = FwdLUT(Res + InvLUT(PredSample)), (23)

then

RecSampleInDPB = InvLUT(LPF(RecSample)) = InvLUT(LPF(FwdLUT(Res + InvLUT(PredSample)))). (24)
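The four reconstruction cases of this section (inter or intra CU, with CU_reshaper on or off) may be gathered into one C sketch. The Reshaper structure, the function names, and the clipping of intermediate values to the LUT range are assumptions of this example; in every case the returned sample is in the reshaped domain, ready for loop filtering and inverse reshaping.

```c
#include <assert.h>

typedef struct {
    const int *fwd;   /* forward reshaping LUT (FwdLUT) */
    const int *inv;   /* inverse reshaping LUT (InvLUT) */
    int size;         /* number of LUT entries */
} Reshaper;

static int clip_to_lut(int v, int size)
{
    return v < 0 ? 0 : (v > size - 1 ? size - 1 : v);
}

/* Reconstruct one sample in the reshaped domain, per equations (14), (17),
 * (21) and (23). pred is the predictor as delivered to the reconstruction:
 * original domain for inter CUs, reshaped domain for intra CUs. */
int reconstruct_sample(const Reshaper *r, int is_intra, int cu_reshaper_on,
                       int res, int pred)
{
    assert(pred >= 0 && pred < r->size);
    if (!is_intra) {
        if (cu_reshaper_on)
            return clip_to_lut(res + r->fwd[pred], r->size);      /* eq. (17) */
        return r->fwd[clip_to_lut(res + pred, r->size)];          /* eq. (21) */
    }
    if (cu_reshaper_on)
        return clip_to_lut(res + pred, r->size);                  /* eq. (14) */
    return r->fwd[clip_to_lut(res + r->inv[pred], r->size)];      /* eq. (23) */
}
```

Applying InvLUT(LPF(.)) to the returned value then yields the sample stored in the DPB in each case.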

In general, the proposed architectures may be used in a variety of combinations, such as in-loop intra-only reshaping, in-loop reshaping only for prediction residuals, or a hybrid architecture which combines both intra, in-loop, reshaping and inter, residual reshaping. For example, to reduce the latency in the hardware decoding pipeline, for inter slice decoding, one can perform intra prediction (that is, decode intra CUs in an inter slice) before inverse reshaping. An example architecture (200D_D) of such an embodiment is depicted in FIG. 2E. In the reconstruction module (285), for Inter CUs (e.g., the Mux enables the output from 280 and 282), from equation (17),

RecSample = (Res + FwdLUT(PredSample)),

where FwdLUT(PredSample) denotes the output of the inter predictor (280) followed by forward reshaping (282). Otherwise, for Intra CUs (e.g., the Mux enables the output from 284), the output of the reconstruction module (285) is

RecSample = (Res + IPredSample),

where IPredSample denotes the output of the Intra Prediction block (284). The inverse reshaping block (265-3) generates

Y_CU = InvLUT[RecSample].

Applying intra prediction for inter slices in the reshaped domain is applicable to other embodiments as well, including those depicted in FIG. 2C (where inverse reshaping is performed after loop filtering) and FIG. 2D. In all such embodiments, special care needs to be taken in the combined inter/intra prediction mode (that is, when during reconstruction, some samples are from inter-coded blocks and some are from intra-coded blocks), since inter-prediction is in the original domain, but intra-prediction is in the reshaped domain. When combining data from both inter- and intra-predicted coded units, the prediction may be performed in either of the two domains. For example, when the combined inter/intra prediction mode is done in the reshaped domain, then

PredSampleCombined = PredSampleIntra + FwdLUT(PredSampleInter)
RecSample = Res + PredSampleCombined,

that is, inter-coded samples in the original domain are reshaped before the addition. Otherwise, when the combined inter/intra prediction mode is done in the original domain, then:

PredSampleCombined = InvLUT(PredSampleIntra) + PredSampleInter
RecSample = Res + FwdLUT(PredSampleCombined),

that is, intra-predicted samples are inverse-reshaped to be in the original domain.

Similar considerations are applicable to the corresponding encoding embodiments as well, since encoders (e.g., 200_E) include a decoder loop that matches the corresponding decoder. As discussed earlier, equation (20) describes an embodiment where mode decision is performed in the reshaped domain. In another embodiment, mode decision may be performed in the original domain, that is:

distortion = dfunc(SrcSample - InvLUT(RecSample)).

For luma-based chroma QP offset or chroma residue scaling, the average CU luma value (Ȳ_CU) can always be calculated using the predicted value (instead of the reconstructed value) for minimum latency.


Chroma QP derivations

As in Ref. [6], one may apply the same proposed chromaDQP derivation process to balance the luma and chroma relationship caused by the reshaping curve. In an embodiment, one can derive a piece-wise chromaDQP value based on the codeword assignment for each bin. For example:

for the k-th bin,
    scale_k = …;        (25)
    chromaDQP = 6*log2(scale_k);
end

Encoder Optimization

As described in Ref. [6], it is recommended to use pixel-based weighted distortion when lumaDQP is enabled. When reshaping is used, in an example, the weight needed is adjusted based on the reshaping function (/(x)). For example:

$W_{rsp} = f'(x)^2$, (26)

where f'(x) denotes the slope of the reshaping function f(x).

In another embodiment, one can derive piecewise weights directly based on the codeword assignment for each bin. For example: for the k-th bin,

$W_{rsp}(k) = (M_k / M_a)^2$. (27)

For a chroma component, the weight can be set to 1 or to some scaling factor sf. To reduce chroma distortion, sf can be set larger than 1. To increase chroma distortion, sf can be set smaller than 1. In one embodiment, sf can be used to compensate for equation (25). Since chromaDQP can only be set to an integer, sf can be used to accommodate the fractional part of chromaDQP; thus, $sf = 2^{(\text{chromaDQP} - \text{INT}(\text{chromaDQP}))/3}$.
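
As a small numeric illustration of the compensation described above, the following Python sketch splits a fractional chromaDQP into an integer part and the scaling factor sf that absorbs the remainder. The function name is hypothetical, and INT() is taken here as truncation toward zero, which is an assumption; the text does not fix the rounding rule.

```python
import math


def split_chroma_dqp(chroma_dqp):
    """Split a fractional chromaDQP into (integer QP offset, weight factor sf)."""
    dqp_int = int(chroma_dqp)  # assumption: INT() truncates toward zero
    # sf accommodates the part of chromaDQP that an integer QP cannot express
    sf = 2.0 ** ((chroma_dqp - dqp_int) / 3.0)
    return dqp_int, sf


if __name__ == "__main__":
    for dqp in (2.0, 1.5, -1.5):
        print(dqp, split_chroma_dqp(dqp))
```

With a different choice of INT() (for example FLOOR), the integer part and sf change together, so the pair still represents the same fractional chromaDQP.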

In another embodiment, one can explicitly set the chromaQPOffset value in the Picture Parameter Set (PPS) or a slice header to control chroma distortion.

The reshaper curve or mapping function does not need to be fixed for the whole video sequence. For example, it can be adapted based on the quantization parameter (QP) or the target bit rate. In one embodiment, one can use a more aggressive reshaper curve when the bit rate is low and use less aggressive reshaping when the bit rate is relatively high. For example, given 32 bins in 10-bit sequences, each bin has initially 32 codewords. When the bit rate is relatively low, one can choose the number of codewords for each bin in the range [28, 40]. When the bit rate is high, one can choose codewords between [31, 33] for each bin or one can simply use an identity reshaper curve.

Given a slice (or a tile), reshaping at the slice (tile) level can be performed in a variety of ways that may trade off coding efficiency with complexity, including: 1) disable reshaping in intra slices only; 2) disable reshaping in specific inter slices, such as inter slices on particular temporal level(s), or in inter slices which are not used for reference pictures, or in inter slices which are considered to be less important reference pictures. Such slice adaption can also be QP/rate dependent, so that different adaption rules can be applied for different QPs or bit rates.

In an encoder, under the proposed algorithm, a variance is computed for each bin (e.g., BinVar(b) in equation (13)). Based on that information, one can allocate codewords based on each bin variance. In one embodiment, BinVar(b) may be inversely linearly mapped to the number of codewords in each bin b. In another embodiment, non-linear mappings such as (BinVar(b))^2, sqrt(BinVar(b)), and the like, may be used to inversely map the number of codewords in bin b. In essence, this approach allows an encoder to apply arbitrary codewords to each bin, beyond the simpler mapping used earlier, where the encoder allocated codewords in each bin using the two upper-range values M_f and M_a (e.g., see FIG. 6C), or the three upper-range values, M_f, 32, or M_a (e.g., see FIG. 6D).

As an example, FIG. 6E depicts two codeword allocation schemes based on BinVar(b) values: plot 610 depicts the codeword allocation using two thresholds, while plot 620 depicts codeword allocation using inverse linear mapping, where the codeword allocation for a bin is inversely proportional to its BinVar(b) value. For example, in an embodiment, the following code may be applied to derive the number of codewords (bin_cw) in a specific bin:

alpha = (minCW - maxCW) / (maxVar - minVar);

beta = (maxCW*maxVar - minCW*minVar) / (maxVar - minVar);

bin_cw = round(alpha * bin_var + beta);

where minVar denotes the minimum variance across all bins, maxVar denotes the maximum variance across all bins, and minCW, maxCW denote the minimum and maximum number of codewords per bin, as determined by the reshaping model.
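
The inverse linear mapping above can be sketched in Python as follows. The function name and the handling of the degenerate case where all bins have the same variance are assumptions made for illustration; the mapping itself follows the alpha and beta expressions given in the text, so that the bin with the smallest variance receives maxCW codewords and the bin with the largest variance receives minCW.

```python
def allocate_codewords(bin_vars, min_cw, max_cw):
    """Map each bin variance to a codeword count by inverse linear mapping."""
    min_var, max_var = min(bin_vars), max(bin_vars)
    if max_var == min_var:
        # Assumption: with no variance spread, give every bin the midpoint.
        return [round((min_cw + max_cw) / 2)] * len(bin_vars)
    alpha = (min_cw - max_cw) / (max_var - min_var)
    beta = (max_cw * max_var - min_cw * min_var) / (max_var - min_var)
    # Python's round() rounds exact halves to the nearest even integer.
    return [round(alpha * v + beta) for v in bin_vars]


if __name__ == "__main__":
    # Three bins, codewords constrained to [28, 40] as in the low-bit-rate example.
    print(allocate_codewords([1.0, 3.0, 5.0], 28, 40))
```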

Luma-based Chroma QP offset refinement


In Ref. [6], to compensate for the interaction between luma and chroma, an additional chroma QP offset (denoted as chromaDQP or cQPO) and a luma-based chroma residual scaler (cScale) were defined. For example:

chromaQP = QP_luma + chromaQPOffset + cQPO, (28)

where chromaQPOffset denotes a chroma QP offset, and QP_luma denotes the luma QP for the coding unit. As presented in Ref. [6], in an embodiment

cQPO = -6 * log2(FwdLUT'[Ȳ_CU]) = -dQP(Ȳ_CU), (29)

where FwdLUT' denotes the slope (first-order derivative) of FwdLUT(). For an inter slice, Ȳ_CU denotes the average predicted luma value of the CU. For an intra slice, Ȳ_CU denotes the inverse reshaped value of the average predicted luma value of the CU. When dual tree coding is used for a CU (that is, the luma and chroma components have two separate coding trees and therefore luma reconstruction is available before chroma coding starts), the average reconstructed luma value of the CU can be used to derive the cQPO value. The cScale scaling factor was defined as

cScale = FwdLUT'[Ȳ_CU] = pow(2, -cQPO/6), (30)

where y = pow(2, x) denotes the y = 2^x function.

Given the non-linear relationship between luma-derived QP values (denoted as qPi) and the final chroma QP values (denoted as Qp_C) (for example, see Table 8-10, “Specification of Qp_C as a function of qPi for ChromaArrayType equal to 1” in Ref. [4]), in an embodiment cQPO and cScale may be further adjusted as follows.

Denote as f_QPi2QPc() a mapping between adjusted luma and chroma QP values, e.g., as in Table 8-10 of Ref. [4], then

chromaQP_actual = f_QPi2QPc[chromaQP] = f_QPi2QPc[QP_luma + chromaQPOffset + cQPO]. (31)

For scaling the chroma residual, the scale needs to be calculated based on the real difference between the actual chroma coding QPs before and after applying cQPO:


QPcBase = f_QPi2QPc[QP_luma + chromaQPOffset];

QPcFinal = f_QPi2QPc[QP_luma + chromaQPOffset + cQPO]; (32)

cQPO_refine = QPcFinal - QPcBase;

cScale = pow(2, -cQPO_refine/6).

In another embodiment, one can absorb chromaQPOffset into cScale too. For example,

QPcBase = f_QPi2QPc[QP_luma];

QPcFinal = f_QPi2QPc[QP_luma + chromaQPOffset + cQPO]; (33)

cTotalQPO_refine = QPcFinal - QPcBase;

cScale = pow(2, -cTotalQPO_refine/6).
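
Equations (32) and (33) can be illustrated with the following Python sketch. The mapping f_QPi2QPc is filled in with the values of Table 8-10 of Ref. [4] for ChromaArrayType equal to 1; the function names, the integer-valued cQPO, and the omission of any clipping to the valid QP range are assumptions made for this illustration.

```python
# Qp_C as a function of qPi for 30 <= qPi <= 43 (Table 8-10 of Ref. [4]).
_QPC_MID = {30: 29, 31: 30, 32: 31, 33: 32, 34: 33, 35: 33, 36: 34,
            37: 34, 38: 35, 39: 35, 40: 36, 41: 36, 42: 37, 43: 37}


def f_qpi2qpc(qpi):
    """Map a luma-derived QP index qPi to the chroma QP Qp_C."""
    if qpi < 30:
        return qpi
    if qpi > 43:
        return qpi - 6
    return _QPC_MID[qpi]


def refined_cscale(qp_luma, chroma_qp_offset, cqpo, absorb_offset=False):
    """Return (refined offset, cScale) per equation (32), or (33) if absorb_offset."""
    base_in = qp_luma if absorb_offset else qp_luma + chroma_qp_offset
    qpc_base = f_qpi2qpc(base_in)
    qpc_final = f_qpi2qpc(qp_luma + chroma_qp_offset + cqpo)
    refine = qpc_final - qpc_base
    return refine, 2.0 ** (-refine / 6.0)


if __name__ == "__main__":
    # A nominal cQPO of -3 becomes an effective chroma QP change of -2 at QP 35.
    print(refined_cscale(35, 0, -3))
```

The example shows why the refinement matters: in the non-linear part of the table, the nominal cQPO and the QP difference actually seen by the chroma residual are not the same.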

As an example, as described in Ref. [6], in an embodiment:

Let CSCALE_FP_PREC = 16 denote a precision parameter.

• Forward scaling: after the chroma residual is generated, before transformation and quantization:

- C_Res = C_orig - C_pred

- C_Res_scaled = (C_Res * cScale + (1 << (CSCALE_FP_PREC - 1))) >> CSCALE_FP_PREC

• Inverse scaling: after chroma inverse quantization and inverse transformation, but before reconstruction:

- C_Res_inv = (C_Res_scaled << CSCALE_FP_PREC) / cScale

- C_Reco = C_Pred + C_Res_inv
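
The forward and inverse scaling steps above can be sketched with integer arithmetic in Python as follows. The helper to_fixed() and the function names are hypothetical. Note also that Python's right shift and floor division round toward negative infinity, which may differ from the rounding of negative residuals in a C implementation.

```python
CSCALE_FP_PREC = 16  # precision parameter from the text


def to_fixed(scale):
    """Convert a floating-point cScale to a fixed-point integer (hypothetical helper)."""
    return int(round(scale * (1 << CSCALE_FP_PREC)))


def forward_scale(c_res, c_scale_fp):
    """Scale a chroma residual before transformation and quantization."""
    return (c_res * c_scale_fp + (1 << (CSCALE_FP_PREC - 1))) >> CSCALE_FP_PREC


def inverse_scale(c_res_scaled, c_scale_fp):
    """Undo the scaling after inverse quantization and inverse transformation."""
    return (c_res_scaled << CSCALE_FP_PREC) // c_scale_fp


if __name__ == "__main__":
    fp = to_fixed(1.5)
    scaled = forward_scale(10, fp)
    print(scaled, inverse_scale(scaled, fp))
```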

In an alternative embodiment, the operations for in-loop chroma reshaping may be expressed as follows. At the encoder side, for the residue (CxRes = CxOrg - CxPred) of chroma component Cx (e.g., Cb or Cr) of each CU or TU,

CxResScaled = CxRes * cScale[Ȳ_CU], (34)

where CxResScaled is the scaled Cb or Cr residue signal of the CU to be transformed and quantized. At the decoder side, CxResScaled is the scaled chroma residue signal after inverse quantization and transform, and

CxRes = CxResScaled / cScale[Ȳ_CU]. (35)


The final reconstruction of chroma component is

CxRec = CxPred + CxRes. (36)

This approach allows the decoder to start inverse quantization and transform operations for chroma decoding immediately after syntax parsing. The cScale value being used for a CU may be shared by the Cb and Cr components, and from equations (29) and (30), it may be derived as:

$\text{cQPO}[\bar{Y}_{CU}] = -6 \cdot \log_2(\text{FwdLUT}'[\bar{Y}_{CU}])$, (37)

$\text{cScale}[\bar{Y}_{CU}] = \text{FwdLUT}'[\bar{Y}_{CU}] = 2^{-\text{cQPO}[\bar{Y}_{CU}]/6}$,

where Ȳ_CU is the average predicted luma value of the current CU in inter slices (where dual tree coding is not used and therefore reconstructed luma is not available), and Ȳ_CU is the average reconstructed luma value of the current CU in intra slices (where dual tree coding is used). In an embodiment, the scales are calculated and stored as 16-bit fixed-point integers, and the scaling operations at both the encoder and decoder side are implemented with fixed-point integer arithmetic. FwdLUT'[Ȳ_CU] denotes the first derivative of the forward reshaping function.

Assuming a piece-wise linear representation of the curve, then FwdLUT'(Y) = (CW[k]/32) when Y belongs to the k-th bin. To reduce hardware latency, in another embodiment (see FIG. 2E), Ȳ_CU can use the average predicted luma value of the current CU for both intra and inter modes, regardless of the slice type and whether dual trees are used or not. In another embodiment, Ȳ_CU can be derived using reconstructed CUs (such as those in the upper row and/or left column of the current CU) for intra and/or inter mode. In another embodiment, a region-based average, median, and the like, luma value or cScale value can be sent in the bitstream explicitly using high-level syntax.

Using cScale is not limited to chroma residue scaling for in-loop reshaping. The same method can be applied for out-of-loop reshaping as well. In out-of-loop reshaping, cScale may be used for chroma sample scaling. The operations are the same as in the in-loop approach.

At the encoder side, when computing the chroma RDOQ, the lambda modifier for chroma adjustment (either when using QP offset or when using chroma residue scaling) also needs to be calculated based on the refined offset:

Modifier = pow(2, -cQPO_refine/3);

New_lambda = Old_lambda / Modifier. (38)


As noted in equation (35), using cScale may require a division in the decoder. To simplify the decoder implementation, one may decide to implement the same functionality using a division in the encoder and apply a simpler multiplication in the decoder. For example, let

cScaleInv = (1/cScale),

then, as an example, on an encoder

cResScale = CxRes * cScale = CxRes / (1/cScale) = CxRes / cScaleInv,

and on the decoder

CxRes = cResScale / cScale = cResScale * (1/cScale) = cResScale * cScaleInv.

In an embodiment, each luma-dependent chroma scaling factor may be calculated for a corresponding luma range in the piece-wise linear (PWL) representation instead of for each luma codeword value. Thus, chroma scaling factors may be stored in a smaller LUT (e.g., with 16 or 32 entries), say, cScaleInv[binIdx], instead of the 1024-entry LUT (for 10-bit luma codewords) (say, cScale[Y]). The scaling operations at both the encoder and the decoder side may be implemented with fixed-point integer arithmetic as follows:

c' = sign(c) * ((abs(c) * s + 2^(CSCALE_FP_PREC - 1)) >> CSCALE_FP_PREC),

where c is the chroma residual, s is the chroma residual scaling factor from cScaleInv[binIdx], binIdx is decided by the corresponding average luma value, and CSCALE_FP_PREC is a constant value related to precision.
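
As an illustration only, the per-bin LUT and the sign/magnitude scaling above may be sketched in Python as follows. Several details are assumptions and are not taken from the text: CSCALE_FP_PREC is set to 16, cScale for a bin is taken as CW[k] divided by the default number of codewords per bin, empty bins are given a unity factor, and binIdx is found by uniform division of the average luma value (the text also describes a lookup in the inverse-mapping PWL).

```python
CSCALE_FP_PREC = 16  # assumed precision


def build_cscale_inv_lut(bin_cw, default_cw):
    """Fixed-point 1/cScale per bin, with cScale assumed to be CW[k]/default_cw."""
    one = 1 << CSCALE_FP_PREC
    # Assumption: bins without codewords use a unity scaling factor.
    return [one if cw == 0 else (one * default_cw + cw // 2) // cw
            for cw in bin_cw]


def scale_residual(c, s):
    """c' = sign(c) * ((abs(c) * s + 2^(PREC - 1)) >> PREC)."""
    sign = -1 if c < 0 else 1
    return sign * ((abs(c) * s + (1 << (CSCALE_FP_PREC - 1))) >> CSCALE_FP_PREC)


def decoder_chroma_residual(c_res_scaled, avg_luma, lut, bin_size):
    """Pick the bin from the average luma value and rescale by multiplication."""
    bin_idx = min(avg_luma // bin_size, len(lut) - 1)
    return scale_residual(c_res_scaled, lut[bin_idx])


if __name__ == "__main__":
    lut = build_cscale_inv_lut([0, 64, 32, 16], default_cw=32)
    print(lut, decoder_chroma_residual(-10, 100, lut, bin_size=32))
```

Because the sign is handled separately, positive and negative residuals of the same magnitude are rounded symmetrically, and the decoder needs only a multiplication and a shift.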

In an embodiment, while the forward reshaping function may be represented using N equal segments (e.g., N = 8, 16, 32, and the like), the inverse representation will comprise nonlinear segments. From an implementation point of view, it is desirable to have a representation of the inverse reshaping function using equal segments as well; however, forcing such a representation may cause loss in coding efficiency. As a compromise, in an embodiment one may be able to construct an inverse reshaping function with a “mixed” PWL representation, combining both equal and unequal segments. For example, when using 8 segments, one may first divide the whole range into two equal segments, and then subdivide each of these into 4 unequal segments. Alternatively, one may divide the whole range into 4 equal segments and then subdivide each one into two unequal segments. Alternatively, one may first divide the whole range into several unequal segments, then subdivide each unequal segment into multiple equal segments. Alternatively, one may first divide the whole range into two equal segments, and then subdivide each equal segment into equal sub-segments, where the segment length in each group of sub-segments is not the same.

For example, without limitation, with 1,024 codewords, one could have: a) 4 segments with 150 codewords each and two segments with 212 codewords each, or b) 8 segments with 64 codewords each and 4 segments with 128 codewords each. The general purpose of such a combination of segments is to reduce the number of comparisons required to identify the PWL-piece index given a code value, thus simplifying hardware and software implementations.
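
To illustrate how such a mixed layout reduces comparisons, the following Python sketch finds the PWL piece index for option b) with a single comparison followed by a shift. The ordering of the two groups (the eight 64-codeword segments first, then the four 128-codeword segments) is an assumption made for this example; the text does not specify it.

```python
def pwl_piece_index(code):
    """Piece index for 8 segments of 64 codewords followed by 4 segments of 128."""
    if not 0 <= code < 1024:
        raise ValueError("code value outside the 10-bit range")
    if code < 512:
        return code >> 6            # pieces 0..7, 64 codewords each
    return 8 + ((code - 512) >> 7)  # pieces 8..11, 128 codewords each


if __name__ == "__main__":
    for value in (0, 63, 64, 511, 512, 640, 1023):
        print(value, pwl_piece_index(value))
```

With twelve fully unequal segments, the same lookup would need a search over the segment boundaries instead of one comparison.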

In an embodiment, for a more efficient implementation related to chroma residue scaling, the following variations may be enabled:

• Disable the chroma residual scaling when separate luma/chroma trees are used;

• Disable the chroma residual scaling for 2x2 chroma; and

• Use the prediction signal rather than the reconstruction signal for intra as well as inter coded units.

As an example, given the decoder depicted in FIG. 2E (200D_D) to process the luma component, FIG. 2F depicts an example architecture (200D_DC) for processing the corresponding chroma samples.

As depicted in FIG. 2F, compared to FIG. 2E, the following changes are made when processing chroma:

• The forward and reverse reshaping blocks (282 and 265-3) are not used;

• There is a new Chroma residual scaling block (288), in effect replacing the inverse reshaping block for luma (265-3); and

• The reconstruction block (285-C) is modified to handle color residuals in the original domain, as described in equation (36): CxRec = CxPred + CxRes.

From equation (34), at the decoder side, let CxResScaled denote the extracted scaled chroma residual signal after inverse quantization and transform (before block 288), and let

CxRes = CxResScaled * C_ScaleInv

denote the rescaled chroma residual generated by the Chroma Residual scaling block (288) to be used by the reconstruction unit (285-C) to compute CxRec = CxPred + CxRes, where CxPred is generated either by the Intra (284) or Inter (280) Prediction blocks.

The C_ScaleInv value being used for a Transform Unit (TU) may be shared by the Cb and Cr components and can be computed as follows:

• If in intra mode, then compute the average of intra predicted luma values;

• If in inter mode, then compute the average of forward reshaped inter-predicted luma values. That is, the average luma value avgY'_TU is computed in the reshaped domain; and

• If in combined merge and intra prediction, then compute the average of combined predicted luma values. For example, the combined predicted luma values may be computed according to Appendix 2, section 8.4.6.6.

• In an embodiment, one can apply a LUT to compute C_ScaleInv based on avgY'_TU. Alternatively, given a piece-wise-linear (PWL) representation of the reshaping function one may find the index idx where the value avgY'_TU belongs to in the inverse-mapping PWL.

• Then, C_ScaleInv = cScaleInv[idx].

An example implementation, as it is applicable to the Versatile Video Coding codec (Ref. [8]), currently under development by ITU and ISO, can be found in Appendix 2 (e.g., see Section 8.5.5.1.2).

Disabling luma-based chroma residual scaling for intra slices with dual trees may cause some loss in coding efficiency. To improve the effects of chroma reshaping, the following methods may be used:

1. The chroma scaling factor may be kept the same for the entire frame depending on the average or median of the luma sample values. This will remove the TU-level dependency on luma for chroma residue scaling.

2. Chroma scaling factors can be derived using reconstructed luma values from the neighboring CTUs.

3. An encoder can derive the chroma scaling factor based on source luma pixels and send it in the bitstream at the CU/CTU level (e.g., as an index to the piece-wise representation of the reshaping function). Then, the decoder may extract the chroma scaling factor from the reshaping function without depending on luma data.

• The scale factor for a CTU can be derived and sent only for Intra slices; but can be used for Inter slices as well. The additional signaling cost occurs only for Intra slices, thus having no impact in coding efficiency in random access.

4. Chroma can be reshaped at the frame level as luma, with the chroma reshaping curve being derived from the luma reshaping curve based on a correlation analysis between luma and chroma. This eliminates chroma residue scaling completely.

delta_qp Application

In AVC and HEVC, the parameter delta_qp is allowed to modify the QP value for a coding block. In an embodiment, one can use the luma curve in the reshaper to derive the delta_qp value. One can derive a piece-wise lumaDQP value based on the codeword assignment for each bin. For example:

for the k-th bin,

$\text{scale}_k = M_k / M_a$; (39)

$\text{lumaDQP}_k = \text{INT}(6 \cdot \log_2(\text{scale}_k))$,

where INT() can be CEIL(), ROUND() or FLOOR(). The encoder can use a function of luma, e.g., average(luma), min(luma), max(luma), and the like, to find the luma value for that block, then use the corresponding lumaDQP value for that block. To get the rate-distortion benefit, from equation (27), one can use weighted distortion in mode decision and set

$W_{rsp}(k) = \text{scale}_k^2$.
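
A minimal Python sketch of the per-bin derivation follows. It assumes that the scale of the k-th bin is the ratio of its codeword count M_k to the default count M_a, and that bins without codewords are simply skipped; both points, as well as the function name, are assumptions made for illustration.

```python
import math


def luma_dqp_table(bin_cw, m_a, to_int=round):
    """Per-bin (lumaDQP_k, W_rsp(k)) pairs; None marks a bin with no codewords."""
    table = []
    for m_k in bin_cw:
        if m_k == 0:
            table.append(None)  # assumption: unused bins get no entry
            continue
        scale_k = m_k / m_a     # assumed form of the per-bin scale
        luma_dqp = to_int(6.0 * math.log2(scale_k))
        table.append((luma_dqp, scale_k ** 2))
    return table


if __name__ == "__main__":
    print(luma_dqp_table([0, 32, 64, 16, 38], 32))
    print(luma_dqp_table([38], 32, to_int=math.ceil))
```

The to_int argument stands in for INT(), so that CEIL, ROUND or FLOOR behavior can be compared on the same codeword allocation.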

Reshaping and considerations for the number of bins

In typical 10-bit video coding, it is preferable to use at least 32 bins for the reshaping mapping; however, to simplify the decoder implementation, in an embodiment, one may use fewer bins, say 16, or even 8 bins. Given that an encoder may already be using 32 bins to analyze the sequence and derive the codeword distribution, one can reuse the original 32-bin codeword distribution and derive the 16-bin codewords by adding the two corresponding 32-bin values inside each of the 16 bins, i.e., for i = 0 to 15

CWIn16Bin[i] = CWIn32Bin[2i] + CWIn32Bin[2i+1].

For the chroma residue scaling factor, one can simply divide the codeword by 2, and point to the 32-bins chromaScalingFactorLUT. For example, given

CWIn32Bin[32]={ 0 0 33 38 38 38 38 38 38 38 38 38 38 38 38 38 38 33 33 33 33 33 33 33 33 33 33 33 33 33 0 0},

the corresponding 16-bins CW allocation is

CWIn16Bin[16] = { 0 71 76 76 76 76 76 76 71 66 66 66 66 66 66 0 }.

This approach can be extended to handle even fewer bins, say 8, then, for i = 0 to 7

CWIn8Bin[i] = CWIn16Bin[2i] + CWIn16Bin[2i+1].
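
The bin-merging rule can be checked with a short Python sketch that applies the pairwise sum to the 32-bin example given above, first to obtain 16 bins and then 8 bins. The function name is hypothetical.

```python
def merge_bins(cw):
    """Halve the number of bins by summing each pair of adjacent bins."""
    if len(cw) % 2 != 0:
        raise ValueError("the number of bins must be even")
    return [cw[2 * i] + cw[2 * i + 1] for i in range(len(cw) // 2)]


# The 32-bin allocation from the example in the text.
CW_IN_32_BIN = [0, 0, 33] + [38] * 14 + [33] * 13 + [0, 0]

if __name__ == "__main__":
    cw16 = merge_bins(CW_IN_32_BIN)
    cw8 = merge_bins(cw16)
    print(cw16)
    print(cw8)
```

Merging preserves the total number of allocated codewords, so the overall codeword budget is unchanged while the number of PWL segments is halved at each step.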

When using a narrow range of valid codewords (e.g., [64, 940] for 10-bit signals and [16, 235] for 8-bit signals), care should be taken that the first and last bin do not consider mapping to reserved codewords. For example, for a 10-bit signal, with 8 bins, each bin will have 1024/8 = 128 codewords, and the first bin will be [0, 127]; however, since the standard codeword range is [64, 940], the first bin should only consider codewords [64, 127]. A special flag (e.g., video_full_range_flag = 0) may be used to notify the decoder that the input video has a narrower range than the full range [0, 2^bitdepth - 1] and that special care should be taken to not generate illegal codewords when processing the first and last bins. This is applicable to both luma and chroma reshaping.

As an example, and without limitation, Appendix 2 provides an example syntax structure and associated syntax elements to support reshaping in the ISO/ITU Versatile Video Codec (VVC) (Ref. [8]) according to an embodiment using the architectures depicted in FIG. 2C, FIG. 2E, and FIG. 2F, where the forward reshaping function comprises 16 segments.

References

Each one of the references listed herein is incorporated by reference in its entirety.

[1] Exploratory Test Model for HDR extension of HEVC, K. Minoo et al., MPEG output document, JCTVC-W0092 (m37732), 2016, San Diego, USA.

[2] PCT Application PCT/US2016/025082, In-Loop Block-Based Image Reshaping in High Dynamic Range Video Coding, filed on March 30, 2016, also published as WO 2016/164235, by G-M. Su.

[3] U.S. Patent Application 15/410,563, Content-Adaptive Reshaping for High Codeword representation Images, filed on Jan. 19, 2017, by T. Lu et al.


[4] ITU-T H.265, “High efficiency video coding,” ITU, Dec. 2016.

[5] PCT Application PCT/US2016/042229, Signal Reshaping and Coding for HDR and Wide Color Gamut Signals, filed on July 14, 2016, also published as WO 2017/011636, by P. Yin et al.

[6] PCT Patent Application PCT/US2018/040287, Integrated Image Reshaping and Video Coding, filed on June 29, 2018, by T. Lu et al.

[7] J. Froehlich et al., “Content-Adaptive Perceptual Quantizer for High Dynamic Range Images,” U.S. Patent Application Publication Ser. No. 2018/0041759, Feb. 08, 2018.

[8] B. Bross, J. Chen, and S. Liu, “Versatile Video Coding (Draft 3),” JVET output document, JVET-L1001, v9, uploaded, Jan. 8, 2019.

Example Computer System Implementation

Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control, or execute instructions relating to signal reshaping and coding of images, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to the signal reshaping and coding processes described herein. The image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.

Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods related to signal reshaping and coding of images as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any non-transitory and tangible medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of non-transitory and tangible forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.

Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a means) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.

Equivalents, Extensions, Alternatives and Miscellaneous

Example embodiments that relate to the efficient signal reshaping and coding of images are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Enumerated Exemplary Embodiments

The invention may be embodied in any of the forms described herein, including, but not limited to the following Enumerated Example Embodiments (EEEs) which describe structure, features, and functionality of some portions of the present invention.

EEE 1. A method for adaptive reshaping of a video sequence with a processor, the method comprising:

accessing with a processor an input image in a first codeword representation; and generating a forward reshaping function mapping pixels of the input image to a second codeword representation, wherein the second codeword representation allows for a more efficient compression than the first codeword representation, wherein generating the forward reshaping function comprises:

dividing the input image into multiple pixel regions;

assigning each of the pixel regions to one of multiple codeword bins according to a first luminance characteristic of each pixel region;

computing a bin metric for each one of the multiple codeword bins according to a second luminance characteristic of each of the pixel regions assigned to each codeword bin;

allocating a number of codewords in the second codeword representation to each codeword bin according to the bin metric of each codeword bin and a rate distortion optimization criterion;

and generating the forward reshaping function in response to the allocation of codewords in the second codeword representation to each of the multiple codeword bins.

EEE 2. The method of EEE 1, wherein the first luminance characteristic of a pixel region comprises the average luminance pixel value in the pixel region.

EEE 3. The method of EEE 1, wherein the second luminance characteristic of a pixel region comprises the variance of luminance pixel values in the pixel region.

EEE 4. The method of EEE 3, wherein computing a bin metric for a codeword bin comprises computing the average of the variances of luminance pixel values for all pixel regions assigned to the codeword bin.

EEE 5. The method of EEE 1, wherein allocating a number of codewords in the second codeword representation to a codeword bin according to its bin metric comprises: assigning no codewords to the codeword bin, if no pixel regions are assigned to the codeword bin;

assigning a first number of codewords if the bin metric of the codeword bin is lower than an upper threshold value; and assigning a second number of codewords to the codeword bin otherwise.

EEE 6. The method of EEE 5, wherein for a first codeword representation with a depth of B bits and a second codeword representation with a depth of B_o bits and N codeword bins, the first number of codewords comprises $M_f = \text{CEIL}((2^B/(CW2 - CW1)) \cdot M_a)$ and the second number of codewords comprises $M_a = 2^{B_o}/N$, where CW1 < CW2 denote two codewords in $[0, 2^B - 1]$.

EEE 7. The method of EEE 6, wherein $CW1 = 16 \cdot 2^{(B-8)}$ and $CW2 = 235 \cdot 2^{(B-8)}$.


EEE 8. The method of EEE 5, wherein determining the upper threshold comprises: defining a set of potential threshold values;

for each threshold in the set of threshold values:

generating a forward reshaping function based on the threshold;

encoding and decoding a set of input test frames according to the reshaping function and a bit rate R to generate an output set of decoded test frames; and computing an overall rate-distortion optimization (RDO) metric based on the input test frames and the decoded test frames;

and selecting as the upper threshold the threshold value in the set of potential threshold values for which the RDO metric is minimum.

EEE 9. The method of EEE 8, wherein computing the RDO metric comprises computing J = D + λ R, where D denotes a measure of distortion between pixel values of the input test frames and corresponding pixel values in the decoded test frames, and λ denotes a Lagrangian multiplier.

EEE 10. The method of EEE 9, where D is a measure of the sum of square differences between corresponding pixel values of the input test frames and the decoded test frames.

EEE 11. The method of EEE 1, wherein allocating a number of codewords in the second codeword representation to a codeword bin according to its bin metric is based on a codeword allocation look-up table, wherein the codeword allocation look-up table defines two or more thresholds dividing a range of bin metric values into segments and provides the number of codewords to be allocated to a bin with a bin metric within each segment.

EEE 12. The method of EEE 11, wherein given a default codeword allocation to a bin, bins with large bin metrics are assigned fewer codewords than the default codeword allocation and bins with small bin metrics are assigned more codewords than the default codeword allocation.

EEE 13. The method of EEE 12, wherein for a first codeword representation with B bits and N bins, the default codeword allocation per bin is given by M_a = 2^B/N.

EEE 14. The method of EEE 1, further comprising generating reshaping information in response to the forward reshaping function, wherein the reshaping information comprises one or more of:

a flag indicating a minimum codeword bin index value to be used in a reshaping reconstruction process, a flag indicating a maximum codeword bin index value to be used in the reshaping construction process, a flag indicating a reshaping model profile type, wherein each model profile type is associated with default bin-related parameters, or one or more delta values to be used to adjust the default bin-related parameters.

EEE 15. The method of EEE 5, further comprising assigning to each codeword bin a bin importance value, wherein the bin importance value is:

0 if no codewords are assigned to the codeword bin;

2 if the first value of codewords is assigned to the codeword bin; and 1 otherwise.

EEE 16. The method of EEE 5, wherein determining the upper threshold comprises: dividing the luminance range of the pixel values in the input image into bins; for each bin, determining a bin-histogram value and an average bin-variance value, wherein for a bin, the bin-histogram value comprises the number of pixels in the bin over the total number of pixels in the image and the average bin-variance value provides a metric of the average pixel variance of the pixels in the bin;

sorting the average bin variance values to generate a sorted list of average bin variance values and a sorted list of average bin variance-value indices;

computing a cumulative density function as a function of the sorted average bin variance values based on the bin-histogram values and the sorted list of average bin variance-value indices; and determining the upper threshold based on a criterion satisfied by values of the cumulative density function.

EEE 17. The method of EEE 16, wherein computing the cumulative density function comprises computing:

BinVarSortDsdCDF[0] = BinHist[BinIdxSortDsd[0]];

for (int b = 1; b < PIC_ANALYZE_CW_BINS; b++) {

BinVarSortDsdCDF[b] = BinVarSortDsdCDF[b - 1] + BinHist[BinIdxSortDsd[b]];

},

where b denotes a bin number, PIC_ANALYZE_CW_BINS denotes the total number of bins, BinVarSortDsdCDF[b] denotes the output of the CDF function for bin b, BinHist[i] denotes the bin-histogram value for bin i, and BinIdxSortDsd[] denotes the sorted list of average bin variance-value indices.

EEE 18. The method of EEE 16, wherein under a criterion that for k % of the pixels in the input image the average bin variance is larger than or equal to the upper threshold, the upper threshold is determined as the average bin variance value for which the CDF output is k %.

EEE 19. The method of EEE 18, wherein k = 50.
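
By way of illustration, the threshold selection of EEEs 16 to 19 may be sketched in Python as follows. The descending sort order is an assumption inferred from the suffix "Dsd" in the variable names, and returning the variance of the first bin at which the CDF reaches k % is one possible reading of the criterion; the function name is hypothetical.

```python
def upper_threshold(bin_hist, bin_var, k_percent=50.0):
    """Pick the bin variance at which the variance-sorted CDF reaches k percent."""
    if not bin_var or len(bin_hist) != len(bin_var):
        raise ValueError("bin_hist and bin_var must be non-empty and equal in length")
    # Bin indices sorted by average bin variance, largest first (assumed order).
    order = sorted(range(len(bin_var)), key=lambda b: bin_var[b], reverse=True)
    cdf = 0.0
    for b in order:
        cdf += bin_hist[b]  # fraction of image pixels falling in bin b
        if cdf * 100.0 >= k_percent:
            return bin_var[b]
    return bin_var[order[-1]]


if __name__ == "__main__":
    hist = [0.25, 0.25, 0.25, 0.25]
    var = [4.0, 1.0, 3.0, 2.0]
    print(upper_threshold(hist, var, 50.0))
```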

EEE 20. In a decoder, a method to reconstruct a reshaping function, the method comprising: receiving in a coded bitstream syntax elements characterizing a reshaping model, wherein the syntax elements include one or more of a flag indicating a minimum codeword bin index value to be used in a reshaping construction process, a flag indicating a maximum codeword bin index value to be used in a reshaping construction process, a flag indicating a reshaping model profile type, wherein the model profile type is associated with default bin-related parameters, including bin importance values, or a flag indicating one or more delta bin importance values to be used to adjust the default bin importance values defined in the reshaping model profile;

determining based on the reshaping model profile the default bin importance values for each bin and an allocation list of default numbers of codewords to be allocated to each bin according to the bin’s importance value;

for each codeword bin:

determining its bin importance value by adding its default bin importance value to its delta bin importance value;

determining the number of codewords to be allocated to the codeword bin based on the bin’s bin importance value and the allocation list; and generating a forward reshaping function based on the number of codewords allocated to each codeword bin.

EEE 21. The method of EEE 20, wherein determining M_k, the number of codewords allocated to the k-th codeword bin, using the allocation list further comprises:

for the k-th bin:

if bin_importance[k] == 0 then M_k = 0;

else if bin_importance[k] == 2 then M_k = M_f;

else M_k = M_a,

where M_a and M_f are elements of the allocation list and bin_importance[k] denotes the bin importance value of the k-th bin.
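
The per-bin allocation of EEEs 20 and 21 can be sketched in C as follows. The function name `allocate_codewords` and its parameter names are assumptions made for this illustration; M_a and M_f stand for the two allocation-list entries named in EEE 21.

```c
/* Hypothetical helper: derives M[k] for every codeword bin.
 * The bin importance is the default value from the model profile
 * plus the signalled delta (EEE 20). */
void allocate_codewords(const int *default_importance,
                        const int *delta_importance,
                        int num_bins, int M_a, int M_f, int *M)
{
    for (int k = 0; k < num_bins; k++) {
        int bin_importance = default_importance[k] + delta_importance[k];
        if (bin_importance == 0)
            M[k] = 0;          /* unused bin: no codewords */
        else if (bin_importance == 2)
            M[k] = M_f;
        else
            M[k] = M_a;
    }
}
```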

EEE 22. In a decoder comprising one or more processors, a method to reconstruct coded data, the method comprising:

receiving a coded bitstream (122) comprising one or more coded reshaped images in a first codeword representation and metadata (207) related to reshaping information for the coded reshaped images;

generating (250) an inverse reshaping function based on the metadata related to the reshaping information, wherein the inverse reshaping function maps pixels of the reshaped image from the first codeword representation to a second codeword representation;

generating (250) a forward reshaping function based on the metadata related to the reshaping information, wherein the forward reshaping function maps pixels of an image from the second codeword representation to the first codeword representation;

extracting from the coded bitstream a coded reshaped image comprising one or more coded units, wherein for one or more coded units in the coded reshaped image:

for an intra-coded coding unit (CU) in the coded reshaped image:

generating first reshaped reconstructed samples of the CU (227) based on reshaped residuals in the CU and first reshaped prediction samples;

generating (270) a reshaped loop filter output based on the first reshaped reconstructed samples and loop-filter parameters;

applying (265) the inverse reshaping function to the reshaped loop filter output to generate decoded samples of the coding unit in the second codeword representation; and

storing the decoded samples of the coding unit in the second codeword representation in a reference buffer;

for an inter-coded coding unit in the coded reshaped image:

applying the forward reshaping function to prediction samples stored in the reference buffer in the second codeword representation to generate second reshaped prediction samples;

generating second reshaped reconstructed samples of the coding unit based on reshaped residuals in the coded CU and the second reshaped prediction samples;

generating a reshaped loop filter output based on the second reshaped reconstructed samples and loop-filter parameters;

applying the inverse reshaping function to the reshaped loop filter output to generate samples of the coding unit in the second codeword representation; and

storing the samples of the coding unit in the second codeword representation in a reference buffer;

and generating a decoded image based on the stored samples in the reference buffer.

EEE 23. An apparatus comprising a processor and configured to perform a method as recited in any one of the EEEs 1-22.

EEE 24. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing a method with one or more processors in accordance with any one of the EEEs 1-22.

Appendix 1

Example implementation of bubble sort.

/* Sorts array in descending order and applies the same swaps to idx. */
void bubbleSortDsd(double* array, int* idx, int n)
{
    int i, j;
    bool swapped;
    for (i = 0; i < n - 1; i++) {
        swapped = false;
        for (j = 0; j < n - i - 1; j++) {
            if (array[j] < array[j + 1]) {
                swap(&array[j], &array[j + 1]);
                swap(&idx[j], &idx[j + 1]);
                swapped = true;
            }
        }
        /* No swap in this pass: the array is already sorted. */
        if (swapped == false)
            break;
    }
}

Appendix 2

As an example, this Appendix provides an example syntax structure and associated syntax elements according to an embodiment to support reshaping in the Versatile Video Codec (VVC) (Ref. [8]), currently under joint development by ISO and ITU. New syntax elements in the existing draft version are either highlighted or explicitly noted. Equation numbers like (8-xxx) denote placeholders to be updated, as needed, in the final specification.

In 7.3.2.1 Sequence parameter set RBSP syntax

seq_parameter_set_rbsp( ) {	Descriptor
  sps_seq_parameter_set_id	ue(v)
  intra_only_constraint_flag	u(1)
  max_bitdepth_constraint_idc	u(4)
  max_chroma_format_constraint_idc	u(2)
  frame_only_constraint_flag	u(1)
  no_qtbtt_dual_tree_intra_constraint_flag	u(1)
  no_sao_constraint_flag	u(1)
  no_alf_constraint_flag	u(1)
  no_pcm_constraint_flag	u(1)
  no_temporal_mvp_constraint_flag	u(1)
  no_sbtmvp_constraint_flag	u(1)
  no_amvr_constraint_flag	u(1)
  no_cclm_constraint_flag	u(1)
  no_affine_motion_constraint_flag	u(1)
  no_ladf_constraint_flag	u(1)
  no_dep_quant_constraint_flag	u(1)
  no_sign_data_hiding_constraint_flag	u(1)
  chroma_format_idc	ue(v)
  if( chroma_format_idc = = 3 )
    separate_colour_plane_flag	u(1)
  pic_width_in_luma_samples	ue(v)
  pic_height_in_luma_samples	ue(v)
  bit_depth_luma_minus8	ue(v)
  bit_depth_chroma_minus8	ue(v)
  log2_max_pic_order_cnt_lsb_minus4	ue(v)
  qtbtt_dual_tree_intra_flag	ue(v)
  log2_ctu_size_minus2	ue(v)
  log2_min_luma_coding_block_size_minus2	ue(v)
  partition_constraints_override_enabled_flag	ue(v)
  sps_log2_diff_min_qt_min_cb_intra_tile_group_luma	ue(v)
  sps_log2_diff_min_qt_min_cb_inter_tile_group	ue(v)
  sps_max_mtt_hierarchy_depth_inter_tile_groups	ue(v)
  sps_max_mtt_hierarchy_depth_intra_tile_groups_luma	ue(v)
  if( sps_max_mtt_hierarchy_depth_intra_tile_groups_luma != 0 ) {
    sps_log2_diff_max_bt_min_qt_intra_tile_group_luma	ue(v)
    sps_log2_diff_max_tt_min_qt_intra_tile_group_luma	ue(v)
  }
  if( sps_max_mtt_hierarchy_depth_inter_tile_groups != 0 ) {
    sps_log2_diff_max_bt_min_qt_inter_tile_group	ue(v)
    sps_log2_diff_max_tt_min_qt_inter_tile_group	ue(v)
  }
  if( qtbtt_dual_tree_intra_flag ) {
    sps_log2_diff_min_qt_min_cb_intra_tile_group_chroma	ue(v)
    sps_max_mtt_hierarchy_depth_intra_tile_groups_chroma	ue(v)
    if( sps_max_mtt_hierarchy_depth_intra_tile_groups_chroma != 0 ) {
      sps_log2_diff_max_bt_min_qt_intra_tile_group_chroma	ue(v)
      sps_log2_diff_max_tt_min_qt_intra_tile_group_chroma	ue(v)
    }
  }
  sps_sao_enabled_flag	u(1)
  sps_alf_enabled_flag	u(1)
  pcm_enabled_flag	u(1)
  if( pcm_enabled_flag ) {
    pcm_sample_bit_depth_luma_minus1	u(4)
    pcm_sample_bit_depth_chroma_minus1	u(4)
    log2_min_pcm_luma_coding_block_size_minus3	ue(v)
    log2_diff_max_min_pcm_luma_coding_block_size	ue(v)
    pcm_loop_filter_disabled_flag	u(1)
  }
  sps_ref_wraparound_enabled_flag	u(1)
  if( sps_ref_wraparound_enabled_flag )
    sps_ref_wraparound_offset	ue(v)
  sps_temporal_mvp_enabled_flag	u(1)
  if( sps_temporal_mvp_enabled_flag )
    sps_sbtmvp_enabled_flag	u(1)
  sps_amvr_enabled_flag	u(1)
  sps_bdof_enabled_flag	u(1)
  sps_cclm_enabled_flag	u(1)
  sps_mts_intra_enabled_flag	u(1)
  sps_mts_inter_enabled_flag	u(1)
  sps_affine_enabled_flag	u(1)
  if( sps_affine_enabled_flag )
    sps_affine_type_flag	u(1)
  sps_gbi_enabled_flag	u(1)
  sps_cpr_enabled_flag	u(1)
  sps_ciip_enabled_flag	u(1)
  sps_triangle_enabled_flag	u(1)
  sps_ladf_enabled_flag	u(1)
  if ( sps_ladf_enabled_flag ) {
    sps_num_ladf_intervals_minus2	u(2)
    sps_ladf_lowest_interval_qp_offset	se(v)
    for( i = 0; i < sps_num_ladf_intervals_minus2 + 1; i++ ) {
      sps_ladf_qp_offset[ i ]	se(v)
      sps_ladf_delta_threshold_minus1[ i ]	ue(v)
    }
  }
  sps_reshaper_enabled_flag	u(1)
  rbsp_trailing_bits( )
}

In 7.3.3.1 General tile group header syntax

tile_group_header( ) {	Descriptor
  tile_group_pic_parameter_set_id	ue(v)
  if( NumTilesInPic > 1 ) {
    tile_group_address	u(v)
    num_tiles_in_tile_group_minus1	ue(v)
  }
  tile_group_type	ue(v)
  tile_group_pic_order_cnt_lsb	u(v)
  if( partition_constraints_override_enabled_flag ) {
    partition_constraints_override_flag	ue(v)
    if( partition_constraints_override_flag ) {
      tile_group_log2_diff_min_qt_min_cb_luma	ue(v)
      tile_group_max_mtt_hierarchy_depth_luma	ue(v)
      if( tile_group_max_mtt_hierarchy_depth_luma != 0 ) {
        tile_group_log2_diff_max_bt_min_qt_luma	ue(v)
        tile_group_log2_diff_max_tt_min_qt_luma	ue(v)
      }
      if( tile_group_type = = I && qtbtt_dual_tree_intra_flag ) {
        tile_group_log2_diff_min_qt_min_cb_chroma	ue(v)
        tile_group_max_mtt_hierarchy_depth_chroma	ue(v)
        if( tile_group_max_mtt_hierarchy_depth_chroma != 0 ) {
          tile_group_log2_diff_max_bt_min_qt_chroma	ue(v)
          tile_group_log2_diff_max_tt_min_qt_chroma	ue(v)
        }
      }
    }
  }
  if ( tile_group_type != I ) {
    if( sps_temporal_mvp_enabled_flag )
      tile_group_temporal_mvp_enabled_flag	u(1)
    if( tile_group_type = = B )
      mvd_l1_zero_flag	u(1)
    if( tile_group_temporal_mvp_enabled_flag ) {
      if( tile_group_type = = B )
        collocated_from_l0_flag	u(1)
    }
    six_minus_max_num_merge_cand	ue(v)
    if( sps_affine_enable_flag )
      five_minus_max_num_subblock_merge_cand	ue(v)
  }
  tile_group_qp_delta	se(v)
  if( pps_tile_group_chroma_qp_offsets_present_flag ) {
    tile_group_cb_qp_offset	se(v)
    tile_group_cr_qp_offset	se(v)
  }
  if( sps_sao_enabled_flag ) {
    tile_group_sao_luma_flag	u(1)
    if( ChromaArrayType != 0 )
      tile_group_sao_chroma_flag	u(1)
  }
  if( sps_alf_enabled_flag ) {
    tile_group_alf_enabled_flag	u(1)
    if( tile_group_alf_enabled_flag )
      alf_data( )
  }
  if( tile_group_type = = P | | tile_group_type = = B ) {
    num_ref_idx_l0_active_minus1	ue(v)
    if( tile_group_type = = B )
      num_ref_idx_l1_active_minus1	ue(v)
  }
  dep_quant_enabled_flag	u(1)
  if( !dep_quant_enabled_flag )
    sign_data_hiding_enabled_flag	u(1)
  if( deblocking_filter_override_enabled_flag )
    deblocking_filter_override_flag	u(1)
  if( deblocking_filter_override_flag ) {
    tile_group_deblocking_filter_disabled_flag	u(1)
    if( !tile_group_deblocking_filter_disabled_flag ) {
      tile_group_beta_offset_div2	se(v)
      tile_group_tc_offset_div2	se(v)
    }
  }
  if( num_tiles_in_tile_group_minus1 > 0 ) {
    offset_len_minus1	ue(v)
    for( i = 0; i < num_tiles_in_tile_group_minus1; i++ )
      entry_point_offset_minus1[ i ]	u(v)
  }
  if ( sps_reshaper_enabled_flag ) {
    tile_group_reshaper_model_present_flag	u(1)
    if ( tile_group_reshaper_model_present_flag )
      tile_group_reshaper_model ( )
    tile_group_reshaper_enable_flag	u(1)
    if ( tile_group_reshaper_enable_flag && ( !( qtbtt_dual_tree_intra_flag && tile_group_type = = I ) ) )
      tile_group_reshaper_chroma_residual_scale_flag	u(1)
  }
  byte_alignment( )
}

Add a new syntax table tile_group_reshaper_model( ):

tile_group_reshaper_model ( ) {	Descriptor
  reshaper_model_min_bin_idx	ue(v)
  reshaper_model_delta_max_bin_idx	ue(v)
  reshaper_model_bin_delta_abs_cw_prec_minus1	ue(v)
  for ( i = reshaper_model_min_bin_idx; i <= reshaper_model_max_bin_idx; i++ ) {
    reshaper_model_bin_delta_abs_CW[ i ]	u(v)
    if ( reshaper_model_bin_delta_abs_CW[ i ] > 0 )
      reshaper_model_bin_delta_sign_CW_flag[ i ]	u(1)
  }
}

In General sequence parameter set RBSP semantics, add the following semantics:

sps_reshaper_enabled_flag equal to 1 specifies that reshaper is used in the coded video sequence (CVS). sps_reshaper_enabled_flag equal to 0 specifies that reshaper is not used in the CVS.

In tile group header syntax, add the following semantics:

tile_group_reshaper_model_present_flag equal to 1 specifies that tile_group_reshaper_model() is present in the tile group header. tile_group_reshaper_model_present_flag equal to 0 specifies that tile_group_reshaper_model() is not present in the tile group header. When tile_group_reshaper_model_present_flag is not present, it is inferred to be equal to 0.

tile_group_reshaper_enabled_flag equal to 1 specifies that reshaper is enabled for the current tile group. tile_group_reshaper_enabled_flag equal to 0 specifies that reshaper is not enabled for the current tile group. When tile_group_reshaper_enable_flag is not present, it is inferred to be equal to 0.

tile_group_reshaper_chroma_residual_scale_flag equal to 1 specifies that chroma residual scaling is enabled for the current tile group. tile_group_reshaper_chroma_residual_scale_flag equal to 0 specifies that chroma residual scaling is not enabled for the current tile group. When tile_group_reshaper_chroma_residual_scale_flag is not present, it is inferred to be equal to 0.

Add the following semantics for the tile_group_reshaper_model() syntax:

reshaper_model_min_bin_idx specifies the minimum bin (or piece) index to be used in the reshaper construction process. The value of reshaper_model_min_bin_idx shall be in the range of 0 to MaxBinIdx, inclusive. The value of MaxBinIdx shall be equal to 15.

reshaper_model_delta_max_bin_idx specifies the maximum allowed bin (or piece) index MaxBinIdx minus the maximum bin index to be used in the reshaper construction process. The value of reshaper_model_max_bin_idx is set equal to MaxBinIdx - reshaper_model_delta_max_bin_idx.

reshaper_model_bin_delta_abs_cw_prec_minus1 plus 1 specifies the number of bits used for the representation of the syntax element reshaper_model_bin_delta_abs_CW[ i ].

reshaper_model_bin_delta_abs_CW[ i ] specifies the absolute delta codeword value for the i-th bin. reshaper_model_bin_delta_sign_CW_flag[ i ] specifies the sign of reshaper_model_bin_delta_abs_CW[ i ] as follows:

- If reshaper_model_bin_delta_sign_CW_flag[ i ] is equal to 0, the corresponding variable RspDeltaCW[ i ] is a positive value.

- Otherwise ( reshaper_model_bin_delta_sign_CW_flag[ i ] is not equal to 0 ), the corresponding variable RspDeltaCW[ i ] is a negative value.

The variable RspDeltaCW[ i ] = ( 1 - 2 * reshaper_model_bin_delta_sign_CW_flag[ i ] ) * reshaper_model_bin_delta_abs_CW[ i ];
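
As an illustration of how a decoder might read the tile_group_reshaper_model( ) syntax and form RspDeltaCW[ i ], the following C sketch uses a small MSB-first bit reader. The `BitReader` type, the function names, the rejection of out-of-range indices, and the 16-bit cap on the precision field are all assumptions made for this example; they are not taken from the specification text.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BIN_IDX 15

/* Hypothetical MSB-first bit reader; reads past the end yield 0 bits. */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
} BitReader;

static unsigned read_bits(BitReader *br, int n)        /* u(n) */
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

static unsigned read_ue(BitReader *br)                 /* ue(v), Exp-Golomb */
{
    int zeros = 0;
    while (zeros < 31 && br->pos < br->size_bits && read_bits(br, 1) == 0)
        zeros++;
    return (1u << zeros) - 1u + read_bits(br, zeros);
}

typedef struct {
    int min_bin_idx;
    int max_bin_idx;
    int rsp_delta_cw[MAX_BIN_IDX + 1];
} ReshaperModel;

/* Returns 0 on success, -1 if a parsed value is out of range. */
int parse_reshaper_model(BitReader *br, ReshaperModel *m)
{
    unsigned min_idx = read_ue(br);
    unsigned delta_max = read_ue(br);
    unsigned prec = read_ue(br) + 1u;      /* bits per absolute delta */
    if (min_idx > (unsigned)MAX_BIN_IDX || delta_max > (unsigned)MAX_BIN_IDX
        || prec > 16u)                     /* 16-bit cap is an assumption */
        return -1;

    m->min_bin_idx = (int)min_idx;
    m->max_bin_idx = MAX_BIN_IDX - (int)delta_max;
    for (int i = 0; i <= MAX_BIN_IDX; i++)
        m->rsp_delta_cw[i] = 0;
    for (int i = m->min_bin_idx; i <= m->max_bin_idx; i++) {
        int abs_cw = (int)read_bits(br, (int)prec);
        int sign = abs_cw > 0 ? (int)read_bits(br, 1) : 0;
        m->rsp_delta_cw[i] = (1 - 2 * sign) * abs_cw;   /* RspDeltaCW[ i ] */
    }
    return 0;
}
```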

The variable RspCW[ i ] is derived in the following steps:

The variable OrgCW is set equal to ( 1 << BitDepth_Y ) / ( MaxBinIdx + 1 ).

- If reshaper_model_min_bin_idx <= i <= reshaper_model_max_bin_idx, RspCW[ i ] = OrgCW + RspDeltaCW[ i ].

- Otherwise, RspCW[ i ] = 0.

The value of RspCW[ i ] shall be in the range of 32 to 2 * OrgCW - 1 if the value of BitDepth_Y is equal to 10.
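
The derivation of OrgCW and RspCW[ i ] can be sketched in C as follows. The function name `derive_rsp_cw` and its parameter names are hypothetical; the sketch does not check the range constraint on RspCW[ i ].

```c
#define MAX_BIN_IDX 15

/* Hypothetical helper: fills rsp_cw[0..MAX_BIN_IDX] and returns OrgCW.
 * delta_abs_cw / delta_sign_flag hold the parsed syntax element values. */
int derive_rsp_cw(int bit_depth_y, int min_bin_idx, int max_bin_idx,
                  const int *delta_abs_cw, const int *delta_sign_flag,
                  int rsp_cw[MAX_BIN_IDX + 1])
{
    int org_cw = (1 << bit_depth_y) / (MAX_BIN_IDX + 1);
    for (int i = 0; i <= MAX_BIN_IDX; i++) {
        if (i >= min_bin_idx && i <= max_bin_idx) {
            int rsp_delta_cw = (1 - 2 * delta_sign_flag[i]) * delta_abs_cw[i];
            rsp_cw[i] = org_cw + rsp_delta_cw;
        } else {
            rsp_cw[i] = 0;     /* bin outside the signalled range */
        }
    }
    return org_cw;
}
```

For a 10-bit signal, OrgCW is 1024 / 16 = 64, so a bin with an absolute delta of 8 and a sign flag of 1 receives 56 codewords.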

The variables InputPivot[ i ], with i in the range of 0 to MaxBinIdx + 1, inclusive, are derived as follows:

InputPivot[ i ] = i * OrgCW

The variables ReshapePivot[ i ], with i in the range of 0 to MaxBinIdx + 1, inclusive, and the variables ScaleCoeff[ i ] and InvScaleCoeff[ i ], with i in the range of 0 to MaxBinIdx, inclusive, are derived as follows:

shiftY = 14

ReshapePivot[ 0 ] = 0;

for( i = 0; i <= MaxBinIdx; i++ ) {

ReshapePivot[ i + 1 ] = ReshapePivot[ i ] + RspCW[ i ]

ScaleCoeff[ i ] = ( RspCW[ i ] * ( 1 << shiftY ) + ( 1 << ( Log2( OrgCW ) - 1 ) ) ) >> ( Log2( OrgCW ) )

if( RspCW[ i ] = = 0 )

InvScaleCoeff[ i ] = 0

else

InvScaleCoeff[ i ] = OrgCW * ( 1 << shiftY ) / RspCW[ i ]

}
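
The pivot and scale-coefficient derivation above can be sketched in C as follows. The function and parameter names are assumptions for this illustration, and OrgCW is assumed to be a power of two so that Log2( OrgCW ) is an integer.

```c
#define MAX_BIN_IDX 15
#define SHIFT_Y 14

/* Integer log2 for a positive power of two. */
static int log2_int(int v)
{
    int n = 0;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: builds the forward and inverse mapping tables. */
void derive_luma_tables(const int rsp_cw[MAX_BIN_IDX + 1], int org_cw,
                        int input_pivot[MAX_BIN_IDX + 2],
                        int reshape_pivot[MAX_BIN_IDX + 2],
                        int scale_coeff[MAX_BIN_IDX + 1],
                        int inv_scale_coeff[MAX_BIN_IDX + 1])
{
    int log2_org = log2_int(org_cw);

    for (int i = 0; i <= MAX_BIN_IDX + 1; i++)
        input_pivot[i] = i * org_cw;

    reshape_pivot[0] = 0;
    for (int i = 0; i <= MAX_BIN_IDX; i++) {
        reshape_pivot[i + 1] = reshape_pivot[i] + rsp_cw[i];
        /* forward slope RspCW / OrgCW in SHIFT_Y-bit fixed point, rounded */
        scale_coeff[i] = (rsp_cw[i] * (1 << SHIFT_Y) + (1 << (log2_org - 1)))
                         >> log2_org;
        /* inverse slope OrgCW / RspCW; empty bins get 0 */
        inv_scale_coeff[i] =
            rsp_cw[i] == 0 ? 0 : org_cw * (1 << SHIFT_Y) / rsp_cw[i];
    }
}
```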

The variables ChromaScaleCoef[ i ], with i in the range of 0 to MaxBinIdx, inclusive, are derived as follows:

ChromaResidualScaleLut[64] = {16384, 16384, 16384, 16384, 16384, 16384, 16384, 8192, 8192, 8192, 8192, 5461, 5461, 5461, 5461, 4096, 4096, 4096, 4096, 3277, 3277, 3277, 3277, 2731,

2731, 2731, 2731, 2341, 2341, 2341, 2048, 2048, 2048, 1820, 1820, 1820, 1638, 1638, 1638,

1638, 1489, 1489, 1489, 1489, 1365, 1365, 1365, 1365, 1260, 1260, 1260, 1260, 1170, 1170,

1170, 1170, 1092, 1092, 1092, 1092, 1024, 1024, 1024, 1024};

shiftC = 11

- If ( RspCW[ i ] = = 0 ),

ChromaScaleCoef[ i ] = ( 1 << shiftC )

- Otherwise ( RspCW[ i ] != 0 ),

ChromaScaleCoef[ i ] = ChromaResidualScaleLut[ Clip3( 1, 64, RspCW[ i ] >> 1 ) - 1 ]
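
The table lookup above can be sketched in C as follows. The function names `clip3` and `chroma_scale_coef` are chosen for this illustration; the table values are copied from the text.

```c
#define SHIFT_C 11

static const int ChromaResidualScaleLut[64] = {
    16384, 16384, 16384, 16384, 16384, 16384, 16384, 8192,
    8192,  8192,  8192,  5461,  5461,  5461,  5461,  4096,
    4096,  4096,  4096,  3277,  3277,  3277,  3277,  2731,
    2731,  2731,  2731,  2341,  2341,  2341,  2048,  2048,
    2048,  1820,  1820,  1820,  1638,  1638,  1638,  1638,
    1489,  1489,  1489,  1489,  1365,  1365,  1365,  1365,
    1260,  1260,  1260,  1260,  1170,  1170,  1170,  1170,
    1092,  1092,  1092,  1092,  1024,  1024,  1024,  1024
};

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: ChromaScaleCoef[ i ] for one bin with rsp_cw codewords. */
int chroma_scale_coef(int rsp_cw)
{
    if (rsp_cw == 0)
        return 1 << SHIFT_C;               /* neutral scale for empty bins */
    return ChromaResidualScaleLut[clip3(1, 64, rsp_cw >> 1) - 1];
}
```

A bin with 64 codewords maps to table index 31, whose value 2048 equals 1 << shiftC, so residuals in such a bin are left unscaled.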

Note: In an alternative implementation, one may unify the scaling for luma and chroma, thus eliminating the need for the ChromaResidualScaleLut[ ]. Then chroma scaling may be implemented as:

shiftC = 11

- If ( RspCW[ i ] = = 0 ),

ChromaScaleCoef[ i ] = ( 1 << shiftC )

- Otherwise ( RspCW[ i ] != 0 ), the following applies:

BinCW = BitDepth_Y > 10 ? ( RspCW[ i ] >> ( BitDepth_Y - 10 ) ) : BitDepth_Y < 10 ? ( RspCW[ i ] << ( 10 - BitDepth_Y ) ) : RspCW[ i ];

ChromaScaleCoef[ i ] = OrgCW * ( 1 << shiftC ) / BinCW.

Add the following in Weighted sample prediction process for combined merge and intra prediction. The addition is highlighted.

8.4.6.6 Weighted sample prediction process for combined merge and intra prediction

Inputs to this process are:

- the width of the current coding block cbWidth,

- the height of the current coding block cbHeight,

- two (cbWidth)x(cbHeight) arrays predSamplesInter and predSamplesIntra,

- the intra prediction mode predModeIntra,

- a variable cIdx specifying the colour component index.

Output of this process is the (cbWidth)x(cbHeight) array predSamplesComb of prediction sample values.

The variable bitDepth is derived as follows:

- If cIdx is equal to 0, bitDepth is set equal to BitDepth_Y.

- Otherwise, bitDepth is set equal to BitDepth_C.

The prediction samples predSamplesComb[ x ][ y ] with x = 0..cbWidth - 1 and y = 0..cbHeight - 1 are derived as follows:

- The weight w is derived as follows:

- If predModeIntra is INTRA_ANGULAR50, w is specified in Table 8-10 with nPos equal to y and nSize equal to cbHeight.

- Otherwise, if predModeIntra is INTRA_ANGULAR18, w is specified in Table 8-10 with nPos equal to x and nSize equal to cbWidth.

- Otherwise, w is set equal to 4.

- If cIdx is equal to 0, predSamplesInter is derived as follows:

- If tile_group_reshaper_enabled_flag is equal to 1,

shiftY = 14

idxY = predSamplesInter[ x ][ y ] >> Log2( OrgCW )

predSamplesInter[ x ][ y ] = Clip1_Y( ReshapePivot[ idxY ] + ( ( ScaleCoeff[ idxY ] * ( predSamplesInter[ x ][ y ] - InputPivot[ idxY ] ) + ( 1 << ( shiftY - 1 ) ) ) >> shiftY ) ) (8-xxx)

- Otherwise ( tile_group_reshaper_enabled_flag is equal to 0 ),

predSamplesInter[ x ][ y ] = predSamplesInter[ x ][ y ]

- The prediction samples predSamplesComb[ x ][ y ] are derived as follows:

predSamplesComb[ x ][ y ] = ( w * predSamplesIntra[ x ][ y ] + ( 8 - w ) * predSamplesInter[ x ][ y ] ) >> 3 (8-740)

Table 8-10 - Specification of w as a function of the position nP and the size nS

0 <= nP < ( nS / 4 )	( nS / 4 ) <= nP < ( nS / 2 )	( nS / 2 ) <= nP < ( 3 * nS / 4 )	( 3 * nS / 4 ) <= nP < nS
6	5	3	2
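
The weight selection of Table 8-10 and the combination in (8-740) can be sketched in C as follows. The enum values and function names are assumptions for this illustration, and pred_inter is assumed to have already been forward-reshaped when the reshaper is enabled.

```c
/* Hypothetical mode constants; only the two angular modes matter here. */
enum { INTRA_OTHER = 0, INTRA_ANGULAR18 = 18, INTRA_ANGULAR50 = 50 };

/* Table 8-10: weight as a function of position n_pos within size n_size. */
static int weight_from_table(int n_pos, int n_size)
{
    if (n_pos < n_size / 4)
        return 6;
    if (n_pos < n_size / 2)
        return 5;
    if (n_pos < 3 * n_size / 4)
        return 3;
    return 2;
}

/* Combined prediction sample at position (x, y) of a cb_width x cb_height block. */
int combine_sample(int pred_intra, int pred_inter, int pred_mode_intra,
                   int x, int y, int cb_width, int cb_height)
{
    int w;
    if (pred_mode_intra == INTRA_ANGULAR50)
        w = weight_from_table(y, cb_height);
    else if (pred_mode_intra == INTRA_ANGULAR18)
        w = weight_from_table(x, cb_width);
    else
        w = 4;
    return (w * pred_intra + (8 - w) * pred_inter) >> 3;
}
```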

Add the following in Picture reconstruction process

8.5.5 Picture reconstruction process

Inputs to this process are:

- a location ( xCurr, yCurr ) specifying the top-left sample of the current block relative to the top-left sample of the current picture component,

- the variables nCurrSw and nCurrSh specifying the width and height, respectively, of the current block,

- a variable cIdx specifying the colour component of the current block,

- an (nCurrSw)x(nCurrSh) array predSamples specifying the predicted samples of the current block,

- an (nCurrSw)x(nCurrSh) array resSamples specifying the residual samples of the current block.

Depending on the value of the colour component cIdx, the following assignments are made:

- If cIdx is equal to 0, recSamples corresponds to the reconstructed picture sample array S_L and the function clipCidx1 corresponds to Clip1_Y.

- Otherwise, if cIdx is equal to 1, recSamples corresponds to the reconstructed chroma sample array S_Cb and the function clipCidx1 corresponds to Clip1_C.

- Otherwise (cIdx is equal to 2), recSamples corresponds to the reconstructed chroma sample array S_Cr and the function clipCidx1 corresponds to Clip1_C.

When the value of tile_group_reshaper_enabled_flag is equal to 1, the (nCurrSw)x(nCurrSh) block of the reconstructed sample array recSamples at location ( xCurr, yCurr ) is derived by the mapping process specified in clause 8.5.5.1. Otherwise, the (nCurrSw)x(nCurrSh) block of the reconstructed sample array recSamples at location ( xCurr, yCurr ) is derived as follows:

recSamples[ xCurr + i ][ yCurr + j ] = clipCidx1( predSamples[ i ][ j ] + resSamples[ i ][ j ] ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1

8.5.5.1 Picture reconstruction with mapping process

This clause specifies picture reconstruction with mapping process. The picture reconstruction with mapping process for luma sample value is specified in 8.5.5.1.1. The picture reconstruction with mapping process for chroma sample value is specified in 8.5.5.1.2.

8.5.5.1.1 Picture reconstruction with mapping process for luma sample value

Inputs to this process are:

- an (nCurrSw)x(nCurrSh) array predSamples specifying the luma predicted samples of the current block,

- an (nCurrSw)x(nCurrSh) array resSamples specifying the luma residual samples of the current block.

The outputs of this process are:

- an (nCurrSw)x(nCurrSh) mapped luma prediction sample array predMapSamples,

- an (nCurrSw)x(nCurrSh) reconstructed luma sample array recSamples.

The predMapSamples is derived as follows:

- If ( CuPredMode[ xCurr ][ yCurr ] = = MODE_INTRA ) || ( CuPredMode[ xCurr ][ yCurr ] = = MODE_INTER && mh_intra_flag[ xCurr ][ yCurr ] ),

predMapSamples[ xCurr + i ][ yCurr + j ] = predSamples[ i ][ j ] (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1

- Otherwise ( ( CuPredMode[ xCurr ][ yCurr ] = = MODE_INTER && !mh_intra_flag[ xCurr ][ yCurr ] ) ), the following applies:

shiftY = 14

idxY = predSamples[ i ][ j ] >> Log2( OrgCW )

predMapSamples[ xCurr + i ][ yCurr + j ] = ReshapePivot[ idxY ] + ( ( ScaleCoeff[ idxY ] * ( predSamples[ i ][ j ] - InputPivot[ idxY ] ) + ( 1 << ( shiftY - 1 ) ) ) >> shiftY ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1

The recSamples is derived as follows:

recSamples[ xCurr + i ][ yCurr + j ] = Clip1_Y( predMapSamples[ xCurr + i ][ yCurr + j ] + resSamples[ i ][ j ] ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1
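
The forward mapping of an inter-predicted luma sample and the subsequent reconstruction can be sketched per sample in C as follows. The function names are hypothetical, and the pivot and scale tables are assumed to have been derived as described in the tile_group_reshaper_model( ) semantics.

```c
#define SHIFT_Y 14

static int clip1(int v, int bit_depth)
{
    int max_val = (1 << bit_depth) - 1;
    return v < 0 ? 0 : (v > max_val ? max_val : v);
}

/* Maps one inter-predicted luma sample into the reshaped domain.
 * log2_org_cw is Log2( OrgCW ), for example 6 for 10-bit video. */
int forward_map_luma(int pred, int log2_org_cw, const int *input_pivot,
                     const int *reshape_pivot, const int *scale_coeff)
{
    int idx = pred >> log2_org_cw;          /* uniform input bins */
    return reshape_pivot[idx]
         + ((scale_coeff[idx] * (pred - input_pivot[idx])
             + (1 << (SHIFT_Y - 1))) >> SHIFT_Y);
}

/* Adds the reshaped-domain residual and clips to the luma bit depth. */
int reconstruct_luma(int pred_map, int res, int bit_depth_y)
{
    return clip1(pred_map + res, bit_depth_y);
}
```

For intra and combined merge/intra blocks the prediction is already in the reshaped domain, so only `reconstruct_luma` would apply.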

8.5.5.1.2 Picture reconstruction with mapping process for chroma sample value

Inputs to this process are:

- an (nCurrSw x 2)x(nCurrSh x 2) array predMapSamples specifying the mapped luma predicted samples of the current block,

- an (nCurrSw)x(nCurrSh) array predSamples specifying the chroma predicted samples of the current block,

- an (nCurrSw)x(nCurrSh) array resSamples specifying the chroma residual samples of the current block.

The output for this process is reconstructed chroma sample array recSamples.

The recSamples is derived as follows:

- If ( !tile_group_reshaper_chroma_residual_scale_flag || ( (nCurrSw)x(nCurrSh) <= 4 ) ),

recSamples[ xCurr + i ][ yCurr + j ] = Clip1_C( predSamples[ i ][ j ] + resSamples[ i ][ j ] ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1

- Otherwise ( tile_group_reshaper_chroma_residual_scale_flag && ( (nCurrSw)x(nCurrSh) > 4 ) ), the following applies:

The variable varScale is derived as follows:

1. invAvgLuma = Clip1_Y( ( Σ_i Σ_j predMapSamples[ ( xCurr << 1 ) + i ][ ( yCurr << 1 ) + j ] + nCurrSw * nCurrSh * 2 ) / ( nCurrSw * nCurrSh * 4 ) )

2. The variable idxYInv is derived by invoking the identification of piecewise function index as specified in clause 8.5.6.2 with the input of sample value invAvgLuma.

3. varScale = ChromaScaleCoef[ idxYInv ]

The recSamples is derived as follows:

- If tu_cbf_cIdx[ xCurr ][ yCurr ] is equal to 1, the following applies:

shiftC = 11

recSamples[ xCurr + i ][ yCurr + j ] = ClipCidx1( predSamples[ i ][ j ] + Sign( resSamples[ i ][ j ] ) * ( ( Abs( resSamples[ i ][ j ] ) * varScale + ( 1 << ( shiftC - 1 ) ) ) >> shiftC ) ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1

- Otherwise ( tu_cbf_cIdx[ xCurr ][ yCurr ] is equal to 0 ),

recSamples[ xCurr + i ][ yCurr + j ] = ClipCidx1( predSamples[ i ][ j ] ) (8-xxx)

with i = 0..nCurrSw - 1, j = 0..nCurrSh - 1
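
The chroma residual scaling steps can be sketched in C as follows. The function names, the row-major layout with a `stride` parameter, and the assumption of a 4:2:0 block (luma area four times the chroma area) are choices made for this illustration only.

```c
#include <stdlib.h>

#define SHIFT_C 11

static int clip1(int v, int bit_depth)
{
    int max_val = (1 << bit_depth) - 1;
    return v < 0 ? 0 : (v > max_val ? max_val : v);
}

/* invAvgLuma: rounded average of the (2*w) x (2*h) mapped luma predictions
 * that cover a w x h chroma block. */
int avg_mapped_luma(const int *pred_map, int stride, int w, int h,
                    int bit_depth_y)
{
    long sum = 0;
    for (int j = 0; j < 2 * h; j++)
        for (int i = 0; i < 2 * w; i++)
            sum += pred_map[j * stride + i];
    return clip1((int)((sum + (long)w * h * 2) / ((long)w * h * 4)),
                 bit_depth_y);
}

/* Sign( res ) * ( ( Abs( res ) * varScale + rounding ) >> shiftC ) */
int scale_chroma_residual(int res, int var_scale)
{
    int sign = (res > 0) - (res < 0);
    return sign * ((abs(res) * var_scale + (1 << (SHIFT_C - 1))) >> SHIFT_C);
}

int reconstruct_chroma(int pred, int res, int var_scale, int bit_depth_c)
{
    return clip1(pred + scale_chroma_residual(res, var_scale), bit_depth_c);
}
```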

8.5.6 Picture inverse mapping process

This clause is invoked when the value of tile_group_reshaper_enabled_flag is equal to 1. The input is the reconstructed picture luma sample array S_L and the output is the modified reconstructed picture luma sample array S'_L after the inverse mapping process.

The inverse mapping process for luma sample value is specified in 8.5.6.1.

8.5.6.1 Picture inverse mapping process of luma sample values

Input to this process is a luma location ( xP, yP ) specifying the luma sample location relative to the top-left luma sample of the current picture.

Output of this process is an inverse mapped luma sample value invLumaSample.

The value of invLumaSample is derived by applying the following ordered steps:

1. The variable idxYInv is derived by invoking the identification of piecewise function index as specified in clause 8.5.6.2 with the input of luma sample value S_L[ xP ][ yP ].

2. The value of invLumaSample is derived as follows:

shiftY = 14

invLumaSample = InputPivot[ idxYInv ] + ( ( InvScaleCoeff[ idxYInv ] * ( S_L[ xP ][ yP ] - ReshapePivot[ idxYInv ] ) + ( 1 << ( shiftY - 1 ) ) ) >> shiftY ) (8-xxx)

3. clipRange = ( ( reshaper_model_min_bin_idx > 0 ) && ( reshaper_model_max_bin_idx < MaxBinIdx ) );

When clipRange is equal to 1, the following applies:

minVal = 16 << ( BitDepth_Y - 8 )

maxVal = 235 << ( BitDepth_Y - 8 )

invLumaSample = Clip3( minVal, maxVal, invLumaSample )

Otherwise ( clipRange is equal to 0 ), the following applies:

invLumaSample = ClipCidx1( invLumaSample )
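
The three ordered steps above can be sketched per sample in C as follows. The function names are hypothetical; the piece search is a simple linear scan that clamps the result to MaxBinIdx, which is an assumption of this sketch for samples at or above the last pivot.

```c
#define SHIFT_Y 14
#define MAX_BIN_IDX 15

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Linear search for the piece containing reshaped sample s. */
static int find_piece(int s, const int *reshape_pivot)
{
    int idx = 0;
    while (idx < MAX_BIN_IDX && s >= reshape_pivot[idx + 1])
        idx++;
    return idx;
}

/* Maps one reconstructed luma sample back to the original domain.
 * clip_range selects the 16..235 (scaled) range instead of the full range. */
int inverse_map_luma(int s, const int *input_pivot, const int *reshape_pivot,
                     const int *inv_scale_coeff, int bit_depth_y,
                     int clip_range)
{
    int idx = find_piece(s, reshape_pivot);
    int v = input_pivot[idx]
          + ((inv_scale_coeff[idx] * (s - reshape_pivot[idx])
              + (1 << (SHIFT_Y - 1))) >> SHIFT_Y);
    if (clip_range)
        return clip3(16 << (bit_depth_y - 8), 235 << (bit_depth_y - 8), v);
    return clip3(0, (1 << bit_depth_y) - 1, v);
}
```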

8.5.6.2 Identification of piecewise function index for luma components

Input to this process is a luma sample value S.

Output of this process is an index idxS identifying the piece to which the sample S belongs. The variable idxS is derived as follows:

for( idxS = 0, idxFound = 0; idxS <= MaxBinIdx; idxS++ ) {
  if( S < ReshapePivot[ idxS + 1 ] ) {
    idxFound = 1
    break
  }
}

Note: an alternative implementation to find the identification idxS is as follows:

if( S < ReshapePivot[ reshaper_model_min_bin_idx ] )
  idxS = 0
else if( S >= ReshapePivot[ reshaper_model_max_bin_idx ] )
  idxS = MaxBinIdx
else
  idxS = findIdx( S, 0, MaxBinIdx + 1, ReshapePivot[ ] )

function idx = findIdx( val, low, high, pivot[ ] ) {
  if( high - low <= 1 )
    idx = low
  else {
    mid = ( low + high ) >> 1
    if( val < pivot[ mid ] )
      high = mid
    else
      low = mid
    idx = findIdx( val, low, high, pivot[ ] )
  }
}
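
The recursive findIdx function above is a binary search over the pivot array. A minimal iterative equivalent in C is sketched below; the name `find_idx` is chosen for this illustration, and the pivot array is assumed to be non-decreasing.

```c
/* Returns the largest index idx in [low, high) with pivot[idx] <= val,
 * or low when val is below pivot[low + 1]. */
int find_idx(int val, int low, int high, const int *pivot)
{
    while (high - low > 1) {
        int mid = (low + high) >> 1;
        if (val < pivot[mid])
            high = mid;
        else
            low = mid;
    }
    return low;
}
```

Called as find_idx( S, 0, MaxBinIdx + 1, ReshapePivot ), this needs at most four comparisons for 16 bins, compared with up to 16 for the linear loop.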

Claims

CLAIMS What is claimed is:

1. A method to reconstruct coded video data with one or more processors, the method comprising:

receiving a coded bitstream comprising one or more coded reshaped images in an input codeword representation;

receiving reshaping metadata for the one or more coded reshaped images in the coded bitstream, wherein the reshaping metadata comprise parameters to generate a forward reshaping function based on the reshaping metadata, wherein the forward reshaping function maps pixels of an image from a first codeword representation to the input codeword representation, wherein the reshaping metadata comprise:

a first parameter to determine a minimum bin index being used in reshaping;

a second parameter to determine a maximum bin index being used in reshaping;

absolute delta codeword values for each bin in the input codeword representation; and

signs of the absolute delta codeword values for each bin in the input codeword representation;

generating a forward reshaping function based on the reshaping metadata;

generating an inverse reshaping function based on the reshaping metadata or the forward reshaping function, wherein the inverse reshaping function maps pixels of a reshaped image from the input codeword representation to the first codeword representation; and

decoding the coded bitstream based on the forward reshaping function and the inverse reshaping function.

2. The method of claim 1, wherein the forward reshaping function is reconstructed as a piecewise linear function with linear segments derived by the reshaping metadata.

3. A method for adaptive reshaping of a video sequence with a processor, the method comprising:

accessing with a processor an input image in a first codeword representation; and

generating a forward reshaping function mapping pixels of the input image to a second codeword representation, wherein generating the forward reshaping function comprises:

dividing the input image into multiple pixel regions;

assigning each of the pixel regions to one of multiple codeword bins according to a first luminance characteristic of each pixel region;

computing a bin metric for each of the multiple codeword bins according to a second luminance characteristic of each of the pixel regions assigned to each of the multiple codeword bins;

allocating a number of codewords in the second codeword representation to each of the multiple codeword bins according to the bin metric of each of the multiple codeword bins and a rate distortion optimization criterion;

4. The method of claim 3, wherein the first luminance characteristic of a pixel region comprises the average luminance pixel value in the pixel region.

5. The method of claim 3, wherein the second luminance characteristic of a pixel region comprises the variance of luminance pixel values in the pixel region.

6. The method of claim 5, wherein computing a bin metric for a codeword bin comprises computing the average of the variances of luminance pixel values for all pixel regions assigned to the codeword bin.

7. The method of claim 3, wherein allocating a number of codewords in the second codeword representation to a codeword bin according to its bin metric comprises:

assigning no codewords to the codeword bin, if no pixel regions are assigned to the codeword bin;

8. A method to reconstruct coded video data, the method comprising:

receiving a coded bitstream comprising one or more coded reshaped pictures in an input codeword representation;

receiving reshaping metadata for the one or more coded reshaped pictures in the coded bitstream;

generating a forward reshaping function based on the reshaping metadata, wherein the forward reshaping function maps pixels of a picture from a first codeword representation to the input codeword representation;

generating an inverse reshaping function based on the reshaping metadata or the forward reshaping function, wherein the inverse reshaping function maps pixels of a reshaped picture from the input codeword representation to the first codeword representation;

extracting from the coded bitstream a coded reshaped picture comprising one or more coded units, wherein:

for an inter-coded CU (inter-CU) in the coded reshaped picture:

applying the forward reshaping function to inter-prediction samples stored in a reference buffer in the first codeword representation to generate reshaped prediction samples for the inter-CU in the input codeword representation;

generating reshaped reconstructed samples of the inter-CU based on reshaped residuals in the inter-CU and the reshaped prediction samples for the inter-CU;

applying the inverse reshaping function to the reshaped reconstructed samples of the inter-CU to generate decoded samples of the inter-CU in the first codeword representation;

applying a loop filter to the decoded samples of the inter-CU to generate output samples of the inter-CU; and

storing the output samples of the inter-CU in the reference buffer; and

generating a decoded picture in the first codeword representation based on output samples in the reference buffer.

9. The method of claim 8, wherein generating a reshaped reconstructed sample (RecSample) of the inter-CU comprises computing:

RecSample = (Res + Fwd(PredSample)),

wherein Res denotes a reshaped residual in the inter-CU in the input codeword representation, Fwd() denotes the forward reshaping function, and PredSample denotes an inter-prediction sample in the first codeword representation.

10. The method of claim 9, wherein generating an output sample to be stored in the reference buffer (RecSampleInDPB) comprises computing:

RecSampleInDPB = LPF(Inv(RecSample)),

wherein Inv() denotes the inverse reshaping function and LPF() denotes a loop filter.

11. The method of claim 8, wherein for chroma residual samples in the inter-coded CU (inter-CU) in the input codeword representation, further comprising:

determining a chroma scaling factor based on luma pixel values in the input codeword representation and the reshaping metadata;

multiplying the chroma residual samples in the inter-CU with the chroma scaling factor to generate scaled chroma residual samples in the inter-CU in the first codeword representation;

generating reconstructed chroma samples of the inter-CU based on the scaled chroma residuals in the inter-CU and chroma inter-prediction samples stored in the reference buffer to generate decoded chroma samples of the inter-CU;

applying the loop filter to the decoded chroma samples of the inter-CU to generate output chroma samples of the inter-CU; and

storing the output chroma samples of the inter-CU in the reference buffer.

12. The method of claim 11, wherein in inter mode the chroma scaling factor is based on an average of inter-predicted luma values in the input codeword representation.

13. The method of claim 8 wherein the reshaping metadata comprise:

a first parameter to determine a minimum bin index being used in reshaping;

a second parameter to determine a maximum bin index being used in reshaping;

a first set of parameters indicating absolute delta codeword values for each bin in the input codeword representation; and

a second set of parameters indicating signs of the delta codeword values for each bin in the input codeword representation.

14. The method of claim 8, wherein the forward reshaping function is reconstructed as a piecewise linear function with linear segments derived by the reshaping metadata.

15. An apparatus comprising a processor and configured to perform a method as recited in claim 1.

16. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing a method with one or more processors in accordance with claim 1.