CN113632481B

CN113632481B - Delay reduction method and device for chroma residual scaling

Info

Publication number: CN113632481B
Application number: CN202080021613.7A
Authority: CN
Inventors: 林芷仪; 庄子德; 陈庆晔; 徐志玮
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2019-03-15
Filing date: 2020-03-13
Publication date: 2023-12-26
Anticipated expiration: 2040-03-13
Also published as: CN113632481A

Abstract

A method and apparatus for video decoding are disclosed. According to one method, a chroma residual scaling factor is derived based on neighboring predicted or reconstructed luma samples of the collocated luma block, wherein the neighboring predicted or reconstructed luma samples of the collocated luma block correspond to M samples along a top boundary of the collocated luma block and N samples along a left boundary of the collocated luma block. Chroma scaling is applied to chroma residual samples of the chroma residual block according to the derived chroma residual scaling factor. In another method, the chroma residual scaling factor is derived based on one or more reconstructed luma samples outside the collocated luma processing data unit. In yet another method, the chroma residual scaling factor is signaled in or parsed from the APS (adaptation parameter set) of the data stream.

Description

Delay reduction method and device for chroma residual scaling

Cross-reference to related applications

The present invention claims priority to U.S. Provisional Patent Application Serial No. 62/818,799, filed on March 15, 2019; No. 62/822,866, filed on March 23, 2019; No. 62/837,773, filed on April 24, 2019; No. 62/863,333, filed on June 19, 2019; No. 62/866,710, filed on June 26, 2019; and No. 62/870,757, filed on July 4, 2019. The contents of these U.S. provisional patent applications are incorporated by reference into this specification.

[ Technical Field ]

The present invention relates to video coding for color video data, wherein luma mapping is applied to the luma component. In particular, techniques are disclosed for deriving and/or signaling one or more chroma scaling factors for chroma residual scaling.

[ Background Art ]

Versatile Video Coding (VVC) is an emerging video coding standard developed by the Joint Video Experts Team (JVET), formed by the ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC JTC1 SC29/WG11 (Moving Picture Experts Group, MPEG). VVC is based on the HEVC (High Efficiency Video Coding) standard, with improved and new coding tools. For example, the reshaping process is a new coding tool adopted in VTM-4.0 (VVC Test Model version 4.0). The reshaping process is also known as LMCS (Luma Mapping with Chroma Scaling). When reshaping is applied, the video samples are coded and reconstructed in the reshaped domain before loop filtering. The reshaped-domain reconstructed samples are converted to the original domain using inverse reshaping. The loop-filtered original-domain reconstructed samples are stored in the decoded picture buffer. For inter mode, the motion compensated (MC) predictor is converted to the reshaped domain using forward reshaping. Fig. 1 shows an example of the reshaping process at the decoder side.

As shown in Fig. 1, the data stream is processed in block 110 by CABAC (context-adaptive binary arithmetic coding) decoding (i.e., CABAC$^{-1}$), inverse quantization (i.e., Q$^{-1}$) and inverse transform (i.e., T$^{-1}$) to derive the reconstructed luma residual $Y_{res}$. The reconstructed luma residual is provided to luma reconstruction block 120 to generate the reconstructed luma signal. For intra mode, the predictor comes from intra prediction block 130. For inter mode, the predictor comes from motion compensation block 140. Since reshaping is applied to the luma signal at the encoder side, forward reshaping 150 is applied to the predictor from motion compensation block 140 before it is provided to reconstruction block 120. Inverse reshaping 160 is applied to the reconstructed luma signal from reconstruction block 120 to recover the un-reshaped reconstructed luma signal. Loop filter 170 is then applied to the un-reshaped reconstructed luma signal before the signal is stored in decoded picture buffer (DPB) 180.

When applying the reshaping, the chroma residual scaling is also applied. The chroma residual scaling may compensate for the interaction between the luma signal and the chroma signal as shown in fig. 2. In fig. 2, the upper part corresponds to luminance decoding and the lower part corresponds to chrominance decoding.

Chroma residual scaling is applied at the TU level at the encoder side and the decoder side according to the following equations, respectively:

Encoder side: $C_{ResScale} = C_{Res} \cdot C_{Scale} = C_{Res} / C_{ScaleInv}$ (1)

Decoder side: $C_{Res} = C_{ResScale} / C_{Scale} = C_{ResScale} \cdot C_{ScaleInv}$ (2)

In the above equations, $C_{Res}$ is the original chroma residual signal and $C_{ResScale}$ is the scaled chroma residual signal. $C_{Scale}$ is a scaling factor calculated for the inter-mode predictor using FwdLUT (i.e., the forward look-up table) and converted to its reciprocal $C_{ScaleInv}$, so that multiplication rather than division is performed at the decoder side, thereby reducing implementation complexity. The scaling operations at both the encoder and decoder sides are implemented with fixed-point integer arithmetic by the following formula:

$c' = \mathrm{sign}(c) \cdot \left( \left( \mathrm{abs}(c) \cdot s + 2^{\mathrm{CSCALE\_FP\_PREC} - 1} \right) \gg \mathrm{CSCALE\_FP\_PREC} \right)$ (3)

In the above formula, c is the chroma residual, s is the chroma residual scaling factor taken from cScaleInv[idx], where idx is determined by the corresponding average luma value of the TU, and CSCALE_FP_PREC is a constant specifying the precision. The predictor of the entire TU is used to derive the scaling factor. The value of $C_{ScaleInv}$ is calculated as follows:

If the mode is intra, the average of the intra-predicted luma values is calculated; if the mode is inter, the average of the forward-reshaped inter-predicted luma values is calculated. In other words, the average luma value $avgY'_{TU}$ is calculated in the reshaped domain.

Find the index idx of the piece of the inverse-mapping PWL model to which $avgY'_{TU}$ belongs.

$C_{ScaleInv} = \mathrm{cScaleInv}[idx]$
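The lookup of the scaling factor and the fixed-point operation of equation (3) can be sketched as follows. This is only an illustration under stated assumptions, not the VTM implementation: the function names `find_piece_index` and `scale_chroma_residual`, the pivot list, the table values and the precision value 11 are all hypothetical and chosen for the example.

```python
from bisect import bisect_right

CSCALE_FP_PREC = 11  # assumed fixed-point precision, for illustration only


def find_piece_index(avg_luma, pivots):
    """Return idx such that pivots[idx] <= avg_luma < pivots[idx + 1].

    `pivots` is a hypothetical ascending list of PWL piece boundaries in the
    reshaped domain; values past the last boundary fall into the last piece.
    """
    idx = bisect_right(pivots, avg_luma) - 1
    return max(0, min(idx, len(pivots) - 2))


def scale_chroma_residual(c, s, prec=CSCALE_FP_PREC):
    """Equation (3): c' = sign(c) * ((abs(c) * s + 2^(prec-1)) >> prec)."""
    sign = -1 if c < 0 else 1
    return sign * ((abs(c) * s + (1 << (prec - 1))) >> prec)


# Hypothetical 3-piece model with one inverse scaling factor per piece.
pivots = [0, 170, 850, 1024]
c_scale_inv_table = [4096, 1024, 4096]  # 2.0, 0.5, 2.0 at 11-bit precision

avg_y_tu = 500                              # average reshaped luma of the TU
idx = find_piece_index(avg_y_tu, pivots)    # piece containing avg_y_tu
c_scale_inv = c_scale_inv_table[idx]        # C_ScaleInv = cScaleInv[idx]
restored = [scale_chroma_residual(c, c_scale_inv) for c in (10, -10, 3)]
```

Handling the sign separately keeps the rounding symmetric around zero, which a plain arithmetic right shift of a negative value would not.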

The chroma scaling factor $C_{ScaleInv}$ is derived by block 210 in Fig. 2 and is input to chroma residual scaling block 250. The derived chroma scaling factor $C_{ScaleInv}$ is used to convert the scaled chroma residual, which is reconstructed by CABAC (context-adaptive binary arithmetic coding) decoding (i.e., CABAC$^{-1}$), inverse quantization (i.e., Q$^{-1}$) and inverse transform (i.e., T$^{-1}$). Reconstruction block 220 reconstructs the chroma signal by adding the predictor to the reconstructed chroma residual. For intra mode, the predictor comes from intra prediction block 230. For inter mode, the predictor comes from motion compensation block 240. Loop filter 270 is then applied to the reconstructed chroma signal before the signal is stored in chroma decoded picture buffer (DPB) 280.

Fig. 3 shows an example of luma mapping. Fig. 3A shows a 1:1 mapping, where the output (i.e., the reshaped luma) is the same as the input. Since the histogram of luma samples is typically not flat, intensity reshaping may help improve performance in terms of RDO (rate-distortion optimization). Statistics of the luma samples are calculated for an image region (e.g., a picture). The mapping curve is then determined based on the statistics. Typically, a piecewise linear (PWL) mapping curve is used. Fig. 3B shows an example of a PWL mapping with 3 segments, where two adjacent segments have different slopes. Dashed line 340 corresponds to the 1:1 mapping. If the samples in the range 0 to 340 have a larger spatial variance and occur less frequently, the input range 0-340 is mapped to a smaller output range (i.e., 0-170), as indicated by line segment 310 of Fig. 3B. If the samples in the range 340 to 680 have a smaller spatial variance and occur more frequently, the input range 340-680 is mapped to a larger output range (i.e., 170-850), as indicated by line segment 320 of Fig. 3B. If the samples in the range 680 to 1023 have a larger spatial variance and occur less frequently, the input range 680-1023 is mapped to a smaller output range (i.e., 850-1023), as indicated by line segment 330 of Fig. 3B. Fig. 3B is intended to illustrate a simple PWL mapping. In practice, the PWL mapping may have more or fewer segments.
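The 3-segment mapping described for Fig. 3B can be sketched as a small forward-mapping function. The segment end points are taken from the text; the function name `forward_map` and the use of integer interpolation are assumptions made for this illustration.

```python
# Segment end points from the Fig. 3B description: (input, output).
PIVOTS = [(0, 0), (340, 170), (680, 850), (1023, 1023)]


def forward_map(x, pivots=PIVOTS):
    """Map an input luma value through the piecewise linear curve."""
    for (x0, y0), (x1, y1) in zip(pivots, pivots[1:]):
        if x0 <= x <= x1:
            # Linear interpolation inside the segment, integer arithmetic.
            return y0 + (x - x0) * (y1 - y0) // (x1 - x0)
    raise ValueError("input outside the mapped range")


mapped = [forward_map(v) for v in (0, 170, 340, 510, 680, 1023)]
# mapped == [0, 85, 170, 510, 850, 1023]
```

The middle segment has slope 2 and therefore spreads its input range over more codewords, while the two outer segments compress their ranges.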

Intra sub-block partitioning (ISP) and sub-block transform (SBT)

To generate better intra mode predictors, intra sub-block partitioning (ISP) may be applied. When ISP is applied, the luma component is divided into a plurality of sub-TBs. The sub-TBs are reconstructed one by one. For each sub-TB, reconstructed samples of neighboring sub-TBs may be used as the neighboring reconstructed samples for intra prediction. The chroma TB is not divided into a plurality of sub-TBs as luma is.

Similar to ISP, sub-block transform (SBT) may be applied to inter modes. When SBT is applied, only part of the CU data is transformed. For example, the current block may be divided into two partitions by a horizontal split or a vertical split. Only one of the partitions is transform coded. The residual of the other partition is set to zero. For example, a CU is split into two TUs or four TUs, and only one of the TUs has non-zero coefficients.

Signaling of LMCS parameters

Table 1 shows a syntax table of LMCS parameters under consideration for VVC.

Table 1.

In the above syntax table, the semantics of the syntax elements are defined as follows:

lmcs_min_bin_idx specifies the minimum bin index of the PWL (piecewise linear) model of the luma mapping.

lmcs_delta_max_bin_idx specifies the delta value between 15 and the maximum bin index LmcsMaxBinIdx used in LMCS. This value should be in the range of 1 to 15, inclusive.

lmcs_delta_cw_prec_minus1 plus 1 specifies the number of bits used to represent the syntax element lmcs_delta_abs_cw[i].

lmcs_delta_abs_cw[i] specifies the absolute delta codeword value for the i-th bin.

lmcs_delta_sign_cw_flag[i] specifies the sign of the variable lmcsDeltaCW[i].

The variable lmcsDeltaCW[i] is derived as follows:

lmcsDeltaCW[i] = (1 - 2 * lmcs_delta_sign_cw_flag[i]) * lmcs_delta_abs_cw[i]

The variable lmcsCW[i], with i = 0 …, specifies the number of codewords for each bin in the mapped domain. It is derived as follows:

OrgCW = (1 << BitDepthY) / 16

For i = 0 … lmcs_min_bin_idx - 1, lmcsCW[i] is set equal to 0.

For i = lmcs_min_bin_idx … LmcsMaxBinIdx, the following applies:

lmcsCW[i] = OrgCW + lmcsDeltaCW[i]

The value of lmcsCW[i] should be in the range of (OrgCW >> 3) to (OrgCW << 3 - 1), inclusive.

For i = LmcsMaxBinIdx + 1 …, lmcsCW[i] is set equal to 0.
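The codeword derivation above can be sketched as follows. The sketch assumes 16 bins (suggested by OrgCW = (1 << BitDepthY) / 16 and by the delta being taken from 15) and reads the upper range bound as (OrgCW << 3) - 1; both are interpretations made for this example, and the function name `derive_lmcs_cw` is hypothetical.

```python
NUM_BINS = 16  # assumption: 16 bins, matching OrgCW = (1 << BitDepthY) / 16


def derive_lmcs_cw(bit_depth_y, lmcs_min_bin_idx, lmcs_delta_max_bin_idx,
                   lmcs_delta_abs_cw, lmcs_delta_sign_cw_flag):
    """Return the per-bin codeword counts lmcsCW[i]."""
    org_cw = (1 << bit_depth_y) // NUM_BINS
    lmcs_max_bin_idx = (NUM_BINS - 1) - lmcs_delta_max_bin_idx
    lmcs_cw = [0] * NUM_BINS  # bins outside [min, max] keep 0 codewords
    for i in range(lmcs_min_bin_idx, lmcs_max_bin_idx + 1):
        delta = (1 - 2 * lmcs_delta_sign_cw_flag[i]) * lmcs_delta_abs_cw[i]
        cw = org_cw + delta
        # Range constraint from the text; the upper bound is read here as
        # (OrgCW << 3) - 1, which is an interpretation.
        if not (org_cw >> 3) <= cw <= (org_cw << 3) - 1:
            raise ValueError(f"lmcsCW[{i}] = {cw} is out of range")
        lmcs_cw[i] = cw
    return lmcs_cw


abs_cw = [0] * NUM_BINS
sign_flag = [0] * NUM_BINS
abs_cw[2], sign_flag[2] = 16, 1   # bin 2 gets 16 fewer codewords
abs_cw[3], sign_flag[3] = 16, 0   # bin 3 gets 16 more codewords
cw = derive_lmcs_cw(10, 1, 1, abs_cw, sign_flag)
```

With a 10-bit luma depth, OrgCW is 64, so the example yields 48 codewords for bin 2, 80 for bin 3, 64 for the other active bins, and 0 for bins 0 and 15.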

To represent the PWL model of the reshaping curve, three variables LmcsPivot[i] (i = 0 …), ScaleCoeff[i] (i = 0 …), and InvScaleCoeff[i] (i = 0 …) are derived as follows:

In the above derivation, SCALE_FP_PREC is a constant specifying the precision.

In LMCS, the delay of chroma residual scaling may negatively impact processing speed due to the dependence on the corresponding luminance data. Accordingly, it is desirable to develop methods and apparatus that reduce the delay of chroma residual scaling.

[ Summary of the Invention ]

Methods and apparatus for video decoding are disclosed. According to one method of the present invention, a current chroma residual block is received. One or more chroma residual scaling factors are derived based on neighboring predicted or reconstructed luma samples of the collocated luma block associated with the current chroma residual block, wherein the neighboring predicted or reconstructed luma samples of the collocated luma block correspond to M samples along a top boundary of the collocated luma block and N samples along a left boundary of the collocated luma block, where M and N are positive integers. Chroma scaling is applied to chroma residual samples of the current chroma residual block according to the one or more derived chroma residual scaling factors.

In one embodiment, the neighboring predicted or reconstructed luma samples of the collocated luma block correspond to M samples along the top boundary of the collocated luma block. In another embodiment, the adjacent predicted or reconstructed luma samples of the collocated luma block correspond to N samples along the left boundary of the collocated luma block. In yet another embodiment, the neighboring predicted or reconstructed luma samples of the collocated luma block correspond to both the M samples along the top boundary of the collocated luma block and the N samples along the left boundary of the collocated luma block.

In one embodiment, if a boundary sample at an upper left position of the collocated luma block is available, the boundary sample at the upper left position of the collocated luma block is used to derive the one or more chroma residual scaling factors. If the boundary samples at the upper left position of the collocated luma block are not available, the left boundary samples along the left boundary of the collocated luma block or the top boundary samples along the top boundary of the collocated luma block are used to derive the one or more chroma residual scaling factors.

According to another method, chroma residual data associated with a current chroma processing data unit in a picture is received, wherein the picture is divided into a plurality of non-overlapping processing data units, and each processing data unit includes a luma processing data unit and one or more chroma processing data units. One or more chroma residual scaling factors are derived based on one or more reconstructed luma samples outside the collocated luma processing data unit associated with the current chroma processing data unit. Chroma scaling is then applied to chroma residual samples of the current chroma processing data unit according to the one or more derived chroma residual scaling factors. According to a variant of the method, a chroma residual scaling factor is derived based on one or more reconstructed luma samples outside a first coding unit (CU) covering an upper-left position of the collocated luma processing data unit.

In one embodiment, the one or more reconstructed luma samples outside the first coding unit (CU) that covers the collocated luma processing data unit correspond to one or more reconstructed luma samples of one or more previously coded luma processing data units. In another embodiment, the one or more reconstructed luma samples of the one or more previously coded luma processing data units correspond to one or more reconstructed luma samples along a top boundary of the first CU covering the collocated luma processing data unit, one or more reconstructed luma samples along a left boundary of that first CU, or both.

In one embodiment, the reconstructed luma samples outside the collocated luma processing data unit correspond to one or more reconstructed luma samples of one or more previously decoded luma processing data units. For example, the reconstructed luma samples of the one or more previously decoded luma processing data units correspond to one or more reconstructed luma samples along a top boundary of the collocated luma processing data unit, one or more reconstructed luma samples along a left boundary of the collocated luma processing data unit, or both.

In yet another method, one or more chroma residual scaling factors are signaled at the APS (adaptation parameter set) level of the video data stream at the encoder side or parsed from the APS level of the video data stream at the decoder side.

[ Brief Description of the Drawings ]

Fig. 1 shows an exemplary block diagram of a video decoder incorporating a luma reshaping process.

Fig. 2 shows an exemplary block diagram of a video decoder that combines a luma reshaping process and chroma scaling.

Fig. 3A shows a 1:1 luma mapping, wherein the output (i.e., the reshaped luma) is the same as the input.

Fig. 3B shows an example of a piecewise linear (PWL) luma mapping with 3 segments.

Fig. 4 shows an example of deriving a chroma scaling factor based on reconstructed luma samples along the top boundary, the left boundary, or both, of a reference VPDU according to an embodiment of the invention.

Fig. 5 shows an example of deriving a chroma scaling factor based on reference reconstructed luma samples at position TL, A or L according to an embodiment of the present invention.

Fig. 6 illustrates a flow diagram of an exemplary decoding system for deriving one or more chroma residual scaling factors based on neighboring predicted or reconstructed luma samples of a collocated luma block in accordance with an embodiment of the invention.

Fig. 7 illustrates a flow diagram of another exemplary decoding system for deriving one or more chroma residual scaling factors based on one or more reconstructed luma samples outside a collocated luma processing data unit according to one embodiment of the present invention.

Fig. 8 shows a flowchart of an exemplary coding system in which one or more chroma residual scaling factors are signaled at the APS (adaptation parameter set) level of the video data stream at the encoder side or parsed from the APS level of the video data stream at the decoder side according to an embodiment of the present invention.

[ Detailed Description of the Invention ]

The following description is of the best mode contemplated for carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

In chroma residual scaling, all corresponding luma predictors of a chroma TU are used to derive a single scaling factor. Chroma sample reconstruction cannot proceed until the scaling factor is derived. This introduces a new cross-component data dependency, resulting in a longer delay in the reconstruction of chroma samples. In VVC, some decoder-side tools are introduced to refine the luma predictors and improve coding efficiency. Such coding tools also lengthen the critical path of the reconstruction loop. In inter and intra mode prediction, the prediction samples of a CU/PU/TU may be divided into multiple MxN blocks, and the blocks may be processed sequentially or in parallel.

In one embodiment, to reduce the delay of chroma sample reconstruction, only the upper-left KxL luma samples (e.g., luma predictors, luma reconstructed samples, or luma residuals) or the upper-left M luma samples are used to derive one or more chroma residual scaling factors for the CU/PU/TU. K and L may be equal to 1, 2, 4, 8, 16, 32 or 64. The one or more scaling factors are used for the entire chroma TU. For example, the upper-left 16x16 luma samples are used. In another example, the upper-left 1x1 luma sample is used. In another example, the upper-left 256 luma samples are used. In another example, the upper-left 1 luma sample is used. In another example, if the width and height of a luma CU/TU are both greater than or equal to 16, the upper-left 16x16 luma samples are used; otherwise, at most 256 upper-left luma samples are used. In one example, when ISP is applied, only the upper-left KxL block or the upper-left M samples of the first ISP sub-TB are used to derive the chroma residual scaling factor. In another example, when SBT is applied, only the upper-left KxL block or the upper-left M samples of the TU with non-zero coefficients are used to derive the scaling factor. In another embodiment, only a portion of the corresponding luma samples is used to derive the chroma residual scaling factor. For example, a portion of the boundary samples inside the collocated luma CU/TU/PU, e.g., a portion of the top row and a portion of the left column, is used to derive the chroma residual scaling factor.
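As an illustration of restricting the derivation to the upper-left KxL samples, the following sketch averages only that window of a luma block. The function name `top_left_average`, the rounded integer average and the clipping of the window to the block size are assumptions made for this example; the text does not prescribe how the selected samples are combined.

```python
def top_left_average(luma, k, l):
    """Average the top-left k x l samples (k wide, l high) of a luma block.

    `luma` is a list of rows. The window is clipped to the block size, so a
    block smaller than k x l simply contributes all of its samples.
    """
    rows = luma[:l]
    window = [sample for row in rows for sample in row[:k]]
    # Rounded integer average.
    return (sum(window) + len(window) // 2) // len(window)


# 4x4 luma block; only the top-left 2x2 samples are read.
block = [
    [100, 104, 900, 900],
    [108, 112, 900, 900],
    [900, 900, 900, 900],
    [900, 900, 900, 900],
]
avg_2x2 = top_left_average(block, 2, 2)
avg_1x1 = top_left_average(block, 1, 1)
```

Because only a fixed number of samples is read, the scaling factor can be available before the rest of the luma block has been predicted or reconstructed.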

In another embodiment, to reduce the delay of chroma sample reconstruction of a CU/PU/TU, one or more chroma residual scaling factors are derived using only the neighboring boundary samples along the current TB (i.e., the corresponding luma block, also called the collocated luma block). The samples may be prediction samples or reconstructed samples of neighboring blocks. In an embodiment, the M samples along the top boundary are used to derive one or more chroma residual scaling factors. In an embodiment, the N samples along the left boundary are used to derive one or more chroma residual scaling factors. In an embodiment, the M samples along the top boundary and the N samples along the left boundary are used to derive one or more chroma residual scaling factors. Here, M and N may be 1, 2, 4, 8, 16, 32, or 64. In another embodiment, one or more chroma residual scaling factors are derived using the sample located at the upper-left corner of the L-shaped boundary. In another embodiment, if the upper-left neighboring sample is available, that sample is used. Otherwise, one of the top neighboring samples or one of the left neighboring samples is used. In one example, if none of the above samples is available, the upper-left sample in the collocated luma block is used. The one or more scaling factors are used for the entire chroma TU.
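The availability-based fallback described above can be sketched as a single selection function. The function name `pick_reference_luma`, the preference for the top neighbors over the left neighbors, and the choice of the first sample in each list are assumptions made for this example; the text only states that one of the top or left neighboring samples is used.

```python
def pick_reference_luma(top_left, top, left, collocated):
    """Pick one reference luma sample following the fallback order.

    top_left   : the upper-left neighboring sample, or None if unavailable
    top, left  : lists of neighboring samples (empty if unavailable)
    collocated : collocated luma block as a list of rows
    """
    if top_left is not None:
        return top_left
    if top:
        return top[0]        # assumption: use the first top neighbor
    if left:
        return left[0]       # assumption: use the first left neighbor
    return collocated[0][0]  # last resort: upper-left sample in the block


inside = [[500, 510], [520, 530]]
with_corner = pick_reference_luma(400, [410, 420], [430, 440], inside)
top_only = pick_reference_luma(None, [410, 420], [], inside)
left_only = pick_reference_luma(None, [], [430, 440], inside)
picture_corner = pick_reference_luma(None, [], [], inside)
```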

In another embodiment, to reduce the delay of chroma sample reconstruction when chroma residual scaling is applied, it is proposed to divide the chroma TU into sub-blocks, e.g., KxL sub-blocks or sub-blocks with a block size equal to M. K and L may be 2, 4, 8, 16 or 32; M may be 4, 8, 16, 32, 64, 128, 256, 512, or 1024. For each KxL chroma residual sub-block, one or more scaling factors are derived. Different KxL chroma residual sub-blocks may have different scaling factors. For example, an MxN block (where M and N here denote the block width and height), with M greater than K (i.e., the width threshold) and N less than L (i.e., the height threshold), is divided into M/K blocks of size KxN.
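The sub-block division, including the case where one dimension is below its threshold, can be sketched as follows. The function name `split_into_sub_blocks` and the (x, y, width, height) tuple layout are assumptions made for this illustration.

```python
def split_into_sub_blocks(width, height, k, l):
    """Return (x, y, w, h) sub-blocks covering a width x height chroma TU.

    Sub-blocks are at most k wide and l high; a dimension smaller than its
    threshold is left undivided.
    """
    sub_w = min(width, k)
    sub_h = min(height, l)
    return [
        (x, y, min(sub_w, width - x), min(sub_h, height - y))
        for y in range(0, height, sub_h)
        for x in range(0, width, sub_w)
    ]


# M x N = 32 x 4 with K = L = 8: M > K and N < L, so M/K = 4 blocks of 8 x 4.
blocks = split_into_sub_blocks(32, 4, 8, 8)
```

Each returned sub-block would then get its own scaling factor, derived from the luma samples that correspond to it.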

In another embodiment, when the size/area/width/height of the chroma residual TU is less than the first threshold or greater than the second threshold, no chroma residual scaling is applied. For example, chroma residual scaling will be disabled when the TU size is less than or equal to 8 or 16 or 64. In another example, chroma residual scaling will be disabled when TU width or height is less than or equal to 2 or 4 or 8 or 16. In another example, chroma residual scaling is disabled when the TU size is greater than or equal to 16, 64, 256, or 1024. In another example, chroma residual scaling is disabled when the TU width or height is greater than or equal to 8 or 16 or 32. In another example, for some prediction modes, chroma residual scaling is disabled. For example, chroma residual scaling is disabled for blocks that have DMVR mode, BIO mode, LIC mode, diffusion mode, or a combination of these modes enabled.
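A size-based and mode-based gate of this kind can be sketched as a simple predicate. The function name `chroma_scaling_enabled`, the mode labels, and the default thresholds (area at most 16 or at least 1024) are example choices picked from the values listed above, not a normative rule.

```python
DISABLED_MODES = {"DMVR", "BIO", "LIC"}  # example subset of the listed modes


def chroma_scaling_enabled(width, height, modes=(),
                           min_area=16, max_area=1024):
    """Return False when the TU is too small, too large, or uses a listed mode.

    min_area and max_area are example thresholds picked from the values
    mentioned in the text; they are not normative.
    """
    area = width * height
    if area <= min_area or area >= max_area:
        return False
    return not (DISABLED_MODES & set(modes))


small_tu = chroma_scaling_enabled(4, 4)                    # area 16: disabled
medium_tu = chroma_scaling_enabled(8, 8)                   # area 64: enabled
large_tu = chroma_scaling_enabled(32, 32)                  # area 1024: disabled
dmvr_tu = chroma_scaling_enabled(8, 8, modes=["DMVR"])     # mode: disabled
```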

DMVR (decoder-side motion vector refinement) is a coding tool developed in recent years. DMVR derives MV refinement information at the decoder side to improve coding performance. BIO (bi-directional optical flow) is another coding tool developed in recent years. BIO derives sample-level motion refinement based on the assumptions of optical flow and steady motion, where the current pixel in a B slice (bi-predictive slice) is predicted from one pixel in reference picture 0 and one pixel in reference picture 1. LIC (Local Illumination Compensation) is a method of performing inter prediction using neighboring samples of the current block and of the reference block. It is based on a linear model using a scaling factor and an offset.

In one embodiment, when ISP is applied, only a portion of the luma sub-TBs is used to derive the chroma residual scaling factor. For example, only the first luma sub-TB is used to derive the chroma residual scaling factor. Generating the scaling factor from the first luma sub-TB may reduce the delay of chroma sample reconstruction. In another example, only the last luma sub-TB is used to derive the chroma residual scaling factor.

In another embodiment, each luma sub-TB may be treated as a separate TB when ISP is applied. A chroma residual scaling factor may be calculated for each sub-TB. The methods proposed above may also be applied, for example, dividing each luma sub-TB into several KxL sub-blocks and deriving a scaling factor for each sub-block. Even though the chroma TB is not divided into a plurality of sub-TBs as luma is when the transform is performed, the chroma TB is divided into a plurality of sub-regions when chroma residual scaling is performed. Each sub-region corresponds to one luma sub-TB; each sub-region corresponds to one or more luma sub-TBs; or one or more chroma sub-regions correspond to one luma sub-TB. Each chroma sub-region may be further divided into a plurality of sub-blocks if the luma sub-TB is divided into a plurality of sub-blocks to derive scaling factors.

In another embodiment, when SBT is applied, the chroma residual scaling factor is derived using only luma partitions with non-zero coefficients. The luminance partition used may be divided into sub-blocks to derive the scaling factor. In another embodiment, when SBT is applied, the luminance samples of the entire CU may be used to derive one or more scaling factors.

In another embodiment, luma samples of a CU (not a TU or TB) are used to derive a chroma residual scaling factor. When ISP is applied, the entire luma CU samples will be used to derive the chroma residual scaling factor. For example, the luma CU samples may be divided into sub-blocks to derive different scaling factors for the different sub-blocks. The sub-blocks may cross ISP sub-TB boundaries.

In another embodiment, the chroma residual scaling factor derivation may differ depending on whether a transform is applied (e.g., transform skip). The value/factor/constant or square procedure may be different for the chroma residual scaling factor derivation. In another embodiment, the chroma residual scaling factor derivation may differ for different prediction modes or different residual energy levels.

On the encoder side, the scaling factor derivation typically includes deriving λ for the quantization parameter. In one embodiment, the entire TU prediction data is used to derive the λ value. For chroma residual scaling, the TU is still divided into sub-blocks. Each sub-block may derive its own scaling factor.

The same kind of processing issue is encountered in the BIO and DMVR processes. For example, for the BIO process, a TU/PU/CU-level SAD (sum of absolute differences) calculation is performed. The BIO process may be disabled if the calculated cost is small enough. For the DMVR process, using the entire CU/PU/TU to derive one MV difference (MVD) is not a friendly design. It is therefore proposed to align BIO with DMVR, or even to align BIO and/or DMVR with the chroma residual scaling process, which divides the current block into KxL blocks. For example, for the BIO and DMVR processes, the current block is divided into KxL blocks. Each KxL block can calculate its own cost for the BIO early termination decision, or derive its own MVD using the DMVR process. In another example, for a BIO or DMVR process, the current block is divided into KxL blocks for performing the BIO and DMVR processes, where KxL (in units of luma sample precision) is the same size as the basic unit of the chroma residual scaling process.

In another embodiment, different modes may use reference luminance samples at different locations.

In one embodiment, for blocks that can reference neighboring reconstructed samples in the prediction process, the reference luma samples used for the scaling value derivation are the neighboring reconstructed samples or the reference boundary samples used to generate the predictor of the current CU or TU. For example, if the current block is coded in intra prediction mode, the reference luma samples are the upper-left, upper, or left reference boundary samples of the current CU. Thus, for intra sub-partition prediction (ISP) mode, the chroma residual scaling value is derived using the upper-left, upper, or left samples of the L-shaped boundary reconstructed samples of the current CU/TU (not the sub-partition TU). In another example, if the current block is coded in intra prediction mode, the reference luma samples are the upper-left reference boundary samples of the current TU. Thus, for ISP mode, the upper-left, upper, or left samples of the L-shaped boundary reconstructed samples of the current TU (sub-partition) are used to derive the chroma residual scaling value. The upper-left L-shaped boundary reconstructed sample may be a single sample.

In another example, if the current CU is coded in inter prediction mode but predicted by the combined inter/intra prediction (CIIP) mode or another prediction method requiring neighboring reconstructed samples, the reference luma samples may be the reference boundary reconstructed samples or the reference boundary samples (e.g., the upper-left neighboring reconstructed samples) used to generate the predictor of the current CU or TU, as described above. As known in the art, CIIP is yet another coding tool developed in recent years. CIIP uses a weighted average of the inter and intra prediction signals to obtain the CIIP prediction.

In another embodiment, if the current block is coded in inter prediction mode, the reference luma samples may be the upper-left luma prediction samples of the current CU or TU.

In one embodiment, if the current block is coded in CIIP mode, the reference luma samples are the upper-left luma prediction samples of the inter predictor.

In another embodiment, if the current block is an inter prediction mode other than the CIIP mode, the reference luma sample may be an upper left luma prediction sample of the current CU or TU. In this embodiment, a block encoded in the CIIP mode is regarded as an intra prediction mode, and any of the above methods relating to the intra prediction mode may be applied.

In another embodiment, if the current block is coded in IBC mode, the determination of the reference luma samples is the same as for the inter prediction mode. As known in the art, IBC (intra block copy) is a coding tool developed in recent years. IBC is similar to inter prediction mode. However, IBC uses reference pixels in the current frame instead of reference pixels in previously coded frames.

In another embodiment, if the current block is IBC mode, the determination of the reference luma samples is the same as the intra prediction mode.

When one or more reference luma samples are prediction samples of the current CU or TU, a different number of samples may be used as described in the above embodiments.

In one embodiment, the above embodiments of intra and inter prediction modes may be combined.

In one embodiment, for intra prediction mode and CIIP mode, the reference luma samples are upper left boundary reference samples used to generate intra predictors, and for inter prediction modes other than CIIP mode, the reference luma samples are upper left corner luma prediction samples.

In one embodiment, for intra prediction mode, the reference luma samples are the upper-left boundary reference samples used to generate the intra predictor, and for inter prediction modes other than CIIP mode, the reference luma samples are the upper-left luma prediction samples. For CIIP mode, the reference luma samples are the upper-left luma prediction samples of the inter predictor. In other words, the prediction samples before blending with the intra prediction samples are used.

In one embodiment, for intra prediction mode and CIIP mode, the reference luma samples are the top-left, top or left (first available) boundary reconstructed samples, and for inter prediction mode (other than CIIP mode), the reference luma samples are the top-left luma prediction samples. In other words, the prediction samples are mixed with the intra prediction samples prior to use.

In another example, only the upper left reconstructed sample is used.

If the reference sample is not available, the scaling factor is set to a default value. In one embodiment, the default value is equal to (1 << PREC), where PREC is the precision of chroma scaling.
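The fallback described above can be sketched as follows. This is only an illustration: the value PREC = 11, the function name, and the per-bin scale table indexed by uniform luma bins are assumptions, not details given in the text.

```python
from typing import Optional, Sequence

PREC = 11  # assumed fixed-point precision of the chroma scaling factor


def chroma_scale_or_default(ref_luma: Optional[int],
                            bin_scales: Sequence[int],
                            bit_depth: int = 10) -> int:
    """Return the scale of the bin containing ref_luma, or the default (1 << PREC)."""
    if ref_luma is None:                      # reference sample not available
        return 1 << PREC                      # unity scale in fixed point
    bin_width = (1 << bit_depth) // len(bin_scales)   # assumed uniform bins
    idx = min(ref_luma // bin_width, len(bin_scales) - 1)
    return bin_scales[idx]
```

With this default, a residual multiplied by the factor and shifted right by PREC is left unchanged, which is why (1 << PREC) is a natural neutral value.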

In one embodiment, for intra prediction modes, the reference luma samples are the top-left, top or left (first available) boundary reconstructed samples, and for inter prediction modes (other than CIIP mode), the reference luma samples are the top-left luma prediction samples. For CIIP mode, the reference luma samples are the top-left luma prediction samples of the inter predictor. In other words, the inter prediction samples are used before they are blended with the intra prediction samples.

Mode constraint and conditional prohibition of chroma splitting within a root block

In another embodiment, a root block is determined and the luminance component of the root block may be further divided into smaller blocks. According to this embodiment, it is determined whether the chrominance components of the root block can be further divided according to the prediction mode of the luminance block within the same root block.

In the previous method, three cases of the definition of "same mode" are listed below:

Case 1. The same mode means that all blocks within a root block must be intra prediction mode, or all blocks within a root block must be inter prediction mode, or all blocks within a root block must be IBC mode.

Case 2. The same mode means that all blocks within the root block must be intra prediction mode, or that all blocks within the root block must be one of inter prediction mode and IBC prediction mode (inter/IBC mode).

Case 3. The same mode means that all blocks within a root block must be one of intra prediction mode and IBC prediction mode (intra/IBC mode), or that all blocks within a root block must be inter prediction mode.

In one embodiment, the partitioning of the chroma components follows the luma block if all blocks within the current root block are inter prediction mode, inter/IBC mode, and inter prediction mode for case 1, case 2, and case 3, respectively. If all blocks within the current root block are intra prediction mode, intra prediction mode and intra/IBC mode for case 1, case 2 and case 3, respectively, the chrominance components of this root block cannot be further split, thus resulting in a plurality of luminance blocks corresponding to one chrominance block.
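The three definitions of "same mode" and the resulting chroma splitting decision can be written as a small lookup. The sketch below is illustrative only; the enum, the table layout, and the function name are assumptions, and combinations that the embodiment does not address (for example, an all-IBC root block under case 1) are reported as None rather than guessed.

```python
from enum import Enum
from typing import Iterable, Optional


class Mode(Enum):
    INTRA = "intra"
    INTER = "inter"
    IBC = "ibc"


# Per case: modes for which chroma follows the luma partition, and modes
# for which the chroma components of the root block cannot be split further.
SAME_MODE_GROUPS = {
    1: {"follow_luma": {Mode.INTER}, "no_split": {Mode.INTRA}},
    2: {"follow_luma": {Mode.INTER, Mode.IBC}, "no_split": {Mode.INTRA}},
    3: {"follow_luma": {Mode.INTER}, "no_split": {Mode.INTRA, Mode.IBC}},
}


def chroma_may_split(luma_modes: Iterable[Mode], case: int) -> Optional[bool]:
    """True: chroma follows luma; False: no further chroma split; None: not covered."""
    modes = set(luma_modes)
    groups = SAME_MODE_GROUPS[case]
    if modes and modes <= groups["follow_luma"]:
        return True
    if modes and modes <= groups["no_split"]:
        return False
    return None
```

When the function returns False, several luma blocks in the root block correspond to a single chroma block, as stated above.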

In another embodiment, a root block is determined and the luma component of the root block may be further divided into smaller blocks. According to this embodiment, it is determined whether further partitioning of the chroma components of the root block is disallowed. In this region, the luma blocks may be of the same mode or of different modes.

In one embodiment, chroma residual scaling cannot be applied when the chroma components are not allowed to be further partitioned. In another embodiment, chroma residual scaling may still be applied when the chroma components are not allowed to split further. The locations of the reference luminance samples may be different. In one embodiment, the top left NxM luma prediction samples of the collocated luma block are used. N and M may be 1, 2, 4, 8, 16, 32, 64 and 128. In another embodiment, reconstructed top boundary K reference samples of the current root block are used. In another embodiment, the reconstructed left boundary K reference samples of the current root block are used. K may be 1, 2, 4, 8, 16, 32, 64, and 128. In another embodiment, the reconstructed upper left reference sample of the current root block is used.

In another embodiment, chroma residual scaling cannot be applied when the chroma components are not allowed to be further partitioned and the chroma root block is coded in intra mode. In another example, chroma residual scaling cannot be applied when the chroma components are not allowed to be further split and the chroma root block is coded in IBC mode. In another embodiment, chroma residual scaling may still be applied when the chroma components are not allowed to be further split and the chroma root block is coded in intra mode. In another example, chroma residual scaling may still be applied when the chroma components are not allowed to be further split and the chroma root block is coded in IBC mode. The locations of the reference luma samples may be different. In one embodiment, the top-left NxM luma prediction samples of the collocated luma block are used. N and M may be 1, 2, 4, 8, 16, 32, 64 and 128. In another embodiment, the K reconstructed reference samples along the top boundary of the current root block are used. In another embodiment, the K reconstructed reference samples along the left boundary of the current root block are used. K may be 1, 2, 4, 8, 16, 32, 64, and 128. In another embodiment, the reconstructed upper-left reference sample of the current root block is used.

In another embodiment, chroma residual scaling cannot be applied when the chroma block is in the chroma root block. In another embodiment, chroma residual scaling may still be applied when the chroma block is in the chroma root block. The locations of the reference luminance samples may be different. In one embodiment, the top left NxM luma prediction samples of the collocated luma block are used. N and M may be 1, 2, 4, 8, 16, 32, 64 and 128. In another embodiment, the reconstructed top boundary K reference samples of the current root block are used. In another embodiment, K reference samples of the reconstructed left boundary of the current root block are used. K may be 1, 2, 4, 8, 16, 32, 64, and 128. In another embodiment, the reconstructed upper left reference sample of the current root block is used.

LMCS maps samples in the original domain to the reshaped domain for better data estimation. The mapping curve is approximated by a piecewise linear (PWL) model. To convert sample values from the original domain to the reshaped domain, a look-up table (LUT) can be used. The number of LUT entries is the same as the dynamic range of the input samples. For example, if a 10-bit input is used, a 1024-entry LUT is needed; if a 14-bit input is used, a 16384-entry LUT is needed. In a hardware implementation, the cost of such a LUT is high. Thus, a piecewise linear model may be used instead: the input is compared against the segments to find the segment to which it belongs, and the corresponding output value is calculated from the characteristics of that segment.
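To illustrate the trade-off between a full LUT and a piecewise linear model, the following sketch evaluates a forward mapping from a short list of per-segment codeword counts instead of a per-sample table. The function names, the equal-width input segments, and the floor rounding are assumptions made for the example.

```python
from typing import List


def build_pivots(codewords: List[int]) -> List[int]:
    """pivots[i] is the mapped value at the start of segment i."""
    pivots = [0]
    for cw in codewords:
        pivots.append(pivots[-1] + cw)
    return pivots


def pwl_forward(x: int, codewords: List[int], bit_depth: int = 10) -> int:
    """Map x with a piecewise linear model over equal-width input segments."""
    seg_width = (1 << bit_depth) // len(codewords)   # input width of each segment
    i = min(x // seg_width, len(codewords) - 1)      # segment that contains x
    pivots = build_pivots(codewords)
    # Linear interpolation inside the segment, in integer arithmetic.
    return pivots[i] + (x - i * seg_width) * codewords[i] // seg_width
```

For a 10-bit input with 16 segments, the model stores 16 codeword counts rather than 1024 table entries, at the cost of one segment search and one multiply per sample.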

Various methods of LMCS are disclosed in accordance with embodiments of the present invention.

Method 1-PCM mode with LMCS

LMCS maps samples in the original domain to the reshaped domain for better data estimation. The mapping curve is approximated by a piecewise linear model. A look-up table (LUT) can be used to convert sample values from the original domain to the reshaped domain. The number of LUT entries is the same as the dynamic range of the input samples. For example, if a 10-bit input is used, a 1024-entry LUT is needed; if a 14-bit input is used, a 16384-entry LUT is needed.

In one embodiment, when pulse code modulation (PCM) coding is used, LMCS is disabled, which makes lossless coding possible. This is because the mapping process may introduce rounding errors, or may not map a value exactly back to its original value after forward and backward mapping, which results in lossy coding. According to one embodiment of the invention, one or more high-level syntax elements for PCM coding are signaled at the SPS/PPS/APS/slice/tile group/tile/picture level, before the LMCS syntax. When a tile/tile group/picture/slice/sequence is determined to use PCM coding, the syntax elements related to LMCS (reshaping tools or reshaping models) may be skipped, inferred to be unused, or constrained to be unused (e.g., an encoder constraint that prohibits LMCS from being used with PCM coding).

In another embodiment, if PCM coding mode is applied in a tile/tile group/slice/picture/sequence level region, reshaping may still be applied. However, the mapping tables for forward and inverse reshaping should be identity mappings, e.g., the output equals the input, or the mapping function is a line with slope equal to 1.

In one example, the mapping table may be signaled, but it should be an identity mapping table. In another example, the mapping table is not signaled and a default identity mapping table, in which the output equals the input, is used.
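The reason an identity table is required for PCM can be checked with a short round-trip test. The sketch below is illustrative; the helper names and the slope-1/2 example tables are hypothetical and are not taken from the text.

```python
from typing import List


def identity_table(bit_depth: int) -> List[int]:
    """Mapping table in which the output equals the input."""
    return list(range(1 << bit_depth))


def round_trip_is_lossless(fwd: List[int], inv: List[int]) -> bool:
    """True if inverse(forward(x)) == x for every input value x."""
    return all(inv[fwd[x]] == x for x in range(len(fwd)))


ident = identity_table(10)
halved_fwd = [x // 2 for x in range(1024)]            # hypothetical slope-1/2 forward map
halved_inv = [min(2 * y, 1023) for y in range(1024)]  # its approximate inverse
```

Here the identity table passes the round-trip check, while the slope-1/2 pair does not, because odd input values are rounded away by the forward mapping.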

In another embodiment, if CU/PU/TU level PCM coding and/or transform quantization bypass mode is applied, the residual or transformed residual should be coded in the original domain to achieve PCM coding. For example, predictors (e.g., inter mode predictors, intra block copy mode predictors, palette mode predictors) should also be in the original domain. For intra prediction or any other prediction mode that uses neighboring reconstructed samples to generate predictors (e.g., combined inter/intra prediction), the neighboring reconstructed samples are converted to the original domain before the predictors are generated. In another example, if a predictor (e.g., an intra mode predictor) is generated in the reshaped domain, the generated predictor is converted to the original domain. In this example, when PCM mode is used, an inter mode predictor does not pass through forward reshaping to become a reshaped-domain predictor. The residual data is coded in the original domain. A syntax element is used to specify the domain of the reconstructed CU samples. Thus, when CU/PU/TU level PCM coding and/or transform quantization bypass mode is applied, the only inverse mapping needed is that of neighboring reconstructed samples in the reshaped domain to the original domain, and only if intra prediction is used.

When a current intra CU is coded with lossy coding, forward mapping is required if the neighboring reconstructed samples are in the original domain. After the neighboring reconstructed samples are mapped to the reshaped domain, the intra prediction samples are generated from the reshaped neighboring reconstructed samples.

In another embodiment, if the current intra CU is coded with lossy coding, the neighboring reconstructed samples are treated as reshaped samples, regardless of which domain they belong to.

In another embodiment, if CU/PU/TU level PCM coding and/or transform quantization bypass mode is applied, predictors may still be generated in the reshaped domain, but the reconstructed samples are not inversely mapped to the original domain. However, the reconstructed sample in the reshaped domain should be the value obtained by PCM coding of the original sample. For example, for intra prediction or any other prediction mode that uses neighboring reconstructed samples to generate predictors, it is not necessary to convert the neighboring samples back to the original domain; the reshaped-domain neighboring samples may be used to generate the predictors. For inter prediction, the predictor may or may not be converted by forward mapping as in lossy coding. In another embodiment, the backward mapping may still be applied. However, the mapping table of the backward mapping is an identity mapping, i.e., a one-to-one mapping with slope equal to 1, in which the output equals the input.

In another embodiment, if CU/PU/TU level PCM coding and/or transform quantization bypass mode is applied, the forward and backward mappings are disabled, or identity mappings are used (for all prediction modes). In another embodiment, the residual/predictor/reconstructed samples may still be coded in the reshaped domain. However, there is an encoder constraint or data stream conformance requirement that, when PCM mode is applied, the reconstructed samples can be converted to the original domain and the original-domain reconstructed samples are identical to the input samples.

In one embodiment, if CU/PU/TU level PCM codec and/or transform quantization bypass mode is applied, then no chroma residual scaling is applied, or the scaling factor is set to 1, or the scaling factor is limited to a range. For example, the scaling factor should be no greater than 1 or no less than 1. In another embodiment, when the transform skip mode is applied, no chroma residual scaling is applied. In another embodiment, when the transform skip mode is applied to the chroma components, no chroma residual scaling is applied.

In another embodiment, if CU/PU/TU level PCM coding and/or transform quantization bypass mode is applied, the residual or the transformed residual should be coded in the reshaped domain, with the output of the mapping table being the same as the input. Thus, the mapping process does not make the coding lossy.

In another embodiment, if CU/PU/TU level PCM mode and/or transform quantization bypass mode is used, the neighboring reconstructed samples are converted to the original domain. The prediction samples of the current block may still be converted by reshaping. However, the mapping tables for forward and inverse reshaping should be one-to-one mappings, e.g., the output equals the input, or the mapping function corresponds to a line with slope equal to 1.

Method 2-derivation of inverse scaling factor

In one embodiment, the inverse scaling factor may be derived as follows:

InvScaleCoeff[i] = OrgCW * ((1 << SCALE_FP_PREC) / lmcsCW[i]).

Since the number of possible values of the denominator (e.g., lmcsCW[i]) is limited, a lookup table can be used to implement division by a value that is not a power of 2 (e.g., lmcsCW[i]). The lookup table contains the values of (1 << SCALE_FP_PREC) / lmcsCW[i].
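A table-based version of this derivation might look like the sketch below. SCALE_FP_PREC = 11, ORG_CW = 64 (10-bit video with 16 bins), the table bound, and the treatment of an empty bin as a zero coefficient are all assumptions of the example.

```python
from typing import List

SCALE_FP_PREC = 11   # assumed fixed-point precision
ORG_CW = 64          # assumed: (1 << 10) // 16 for 10-bit video and 16 bins


def build_reciprocal_table(max_cw: int) -> List[int]:
    """table[cw] = (1 << SCALE_FP_PREC) // cw for cw in 1..max_cw.

    Index 0 is set to 0 so that an empty bin yields a zero coefficient
    (an assumption of this sketch).
    """
    return [0] + [(1 << SCALE_FP_PREC) // cw for cw in range(1, max_cw + 1)]


def inv_scale_coeff(lmcs_cw: int, table: List[int]) -> int:
    """InvScaleCoeff = OrgCW * ((1 << SCALE_FP_PREC) / lmcsCW), division via table."""
    return ORG_CW * table[lmcs_cw]
```

The table is built once, so the per-bin derivation needs only a table read and a multiplication, with no divider for values such as 48 that are not powers of 2.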

Method 3-LMCS with default number of codewords

In one embodiment, the number of codewords per bin in the mapped domain (e.g., lmcsCW[i]) can be derived using a default number of codewords instead of OrgCW (which depends only on the bit depth of the input data).

In the proposed method, the variable lmcsCW [ i ] is derived according to the following formula, where i=lmcs_min_bin_idx to LmcsMaxBinIdx:

lmcsCW[i] = default_CW + lmcsDeltaCW[i],

where default_CW is derived at the decoder side or signaled by the encoder.

In one embodiment, if default_CW is derived at the decoder side, it may be derived from lmcs_min_bin_idx and LmcsMaxBinIdx. If the sum of the number of bins below lmcs_min_bin_idx and the number of bins above LmcsMaxBinIdx is greater than lmcs_min_bin_idx, default_CW may be adjusted to a value greater than OrgCW.

For example, if the sum of the number of bins below lmcs_min_bin_idx and the number of bins above LmcsMaxBinIdx is equal to 2, default_CW is derived as default_CW = OrgCW + A, where A is a positive integer (e.g., 1, 2, 3, …).

If the sum of the number of bins below lmcs_min_bin_idx and the number of bins above LmcsMaxBinIdx is equal to 0, default_CW is equal to OrgCW.

In one embodiment, if default_CW is signaled, two syntax elements, default_delta_abs_CW and default_delta_sign_CW_flag, are signaled before lmcs_delta_cw_pre_minus1.

The syntax element default_delta_abs_CW represents the absolute difference between default_CW and OrgCW, and default_delta_sign_CW_flag indicates whether the delta value is positive or negative. default_delta_sign_CW_flag is signaled only if default_delta_abs_CW is greater than 0.

In one embodiment, if default_CW is signaled, the syntax element default_delta_CW is signaled before lmcs_delta_cw_pre_minus1.

The syntax element default_delta_CW represents the difference between default_CW and OrgCW.
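The derivation of default_CW from the signaled delta and of lmcsCW[i] from default_CW can be sketched as follows. The function names are hypothetical. Two points are assumptions of the example rather than statements of the text: a sign flag equal to 1 is taken to mean a negative delta, and bins outside the range lmcs_min_bin_idx to LmcsMaxBinIdx are given zero codewords.

```python
from typing import List


def default_cw_from_syntax(org_cw: int, delta_abs: int, sign_flag: int = 0) -> int:
    """default_CW = OrgCW +/- default_delta_abs_CW.

    Assumption: sign_flag == 1 means the delta is negative. The flag is only
    meaningful (and only signaled) when delta_abs is greater than 0.
    """
    delta = -delta_abs if (delta_abs > 0 and sign_flag) else delta_abs
    return org_cw + delta


def derive_lmcs_cw(default_cw: int, lmcs_delta_cw: List[int],
                   min_bin_idx: int, max_bin_idx: int) -> List[int]:
    """lmcsCW[i] = default_CW + lmcsDeltaCW[i] for min_bin_idx <= i <= max_bin_idx.

    Assumption: bins outside that range receive 0 codewords.
    """
    return [default_cw + d if min_bin_idx <= i <= max_bin_idx else 0
            for i, d in enumerate(lmcs_delta_cw)]
```

For instance, with OrgCW = 64, a delta of 2 and the outermost two of 16 bins unused, each of the 14 active bins starts from 66 codewords before its own lmcsDeltaCW[i] is added.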

Method 4-Reshaping curve update

In one embodiment, the reshaping curve is updated in each frame or every other frame.

Chroma scaling with VPDU constraints

A picture may be divided into several non-overlapping MxN blocks. These MxN non-overlapping blocks as processing data units are called VPDUs. M and N may be 64, or any predefined or signaled value, or a value related to the maximum transform block size.

In one embodiment, for the chroma component, the chroma residual scaling uses reference luma reconstructed samples outside the current VPDU, such as the previously encoded VPDU.

In one embodiment, the reference luma samples may come from one or more regions. For example, the reference samples are a KxL block outside the current VPDU, where K and L may be 2, 4, 8, 16 or 32. In detail, according to the present embodiment, the size of the current VPDU is equal to Min(CtbSizeY, 64), and the numbers of reference luma samples along the top boundary and the left boundary are each equal to Min(CtbSizeY, 64). The variable CtbSizeY specifies the luma width and the luma height of the luma coding tree block.

In another embodiment, the reference reconstructed luma samples are along the top boundary of the VPDU, the left boundary, or both, as shown in Fig. 4. The number of reference luma samples is a power of 2.

In another embodiment, the reference reconstructed luma sample is a single sample value. In one embodiment, its location may be the upper-left location of the L-shaped boundary of the current VPDU, such as the TL location in Fig. 5. In another embodiment, the location of the reference sample may be a location above the current VPDU, such as the A location in Fig. 5. In another embodiment, the location of the reference sample may be a location to the left of the current VPDU, such as the L location in Fig. 5.

In another embodiment, the chroma scaling factor is derived only once in each VPDU, by the first CU in the VPDU. In detail, according to the present embodiment, the size of the VPDU is equal to Min(CtbSizeY, 64), and the reference luma samples used to derive the chroma scaling factor are the same for all blocks in the same Min(CtbSizeY, 64) x Min(CtbSizeY, 64) region (i.e., in the same VPDU). The variable CtbSizeY specifies the luma width and the luma height of the luma coding tree block.
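Deriving the factor once per VPDU amounts to caching it under the VPDU origin. The following sketch shows one way to organize this; the class name and the derive callback, which stands in for whatever reference-sample-based derivation is used, are hypothetical.

```python
from typing import Callable, Dict, Tuple


def vpdu_size(ctb_size_y: int) -> int:
    """VPDU width and height in luma samples: Min(CtbSizeY, 64)."""
    return min(ctb_size_y, 64)


class VpduScaleCache:
    """Derives the chroma scaling factor once per VPDU and reuses it."""

    def __init__(self, ctb_size_y: int, derive: Callable[[int, int], int]) -> None:
        self._size = vpdu_size(ctb_size_y)
        self._derive = derive            # hypothetical: derive(x0, y0) -> scale
        self._cache: Dict[Tuple[int, int], int] = {}

    def scale_for_block(self, x: int, y: int) -> int:
        """Return the scale shared by all blocks in the VPDU containing (x, y)."""
        origin = (x // self._size * self._size, y // self._size * self._size)
        if origin not in self._cache:    # first CU of this VPDU triggers derivation
            self._cache[origin] = self._derive(*origin)
        return self._cache[origin]
```

Because every later block in the same VPDU only reads the cached value, chroma processing does not have to wait for the luma reconstruction of blocks inside the current VPDU.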

In another embodiment, a reference coding unit (CU) for chroma scaling is derived from the VPDU corresponding to the current block (e.g., chroma scaling is always derived by the first CU in the VPDU, even if chroma scaling is not applicable to that CU). In detail, according to this embodiment, the size of the VPDU is equal to Min(CtbSizeY, 64), and the reference CU covers the upper-left position of the current VPDU. The reference luma samples include Min(CtbSizeY, 64) reconstructed luma samples along the top boundary of the reference CU and Min(CtbSizeY, 64) reconstructed luma samples along the left boundary of the reference CU. In another embodiment, if chroma scaling is not applied to the first CU in the current VPDU, the scaling factor is set to a default value. In one embodiment, the default value is equal to (1 << PREC), where PREC is the precision of chroma scaling.

In another embodiment, the chroma scaling factors are shared at the picture/slice level. In another embodiment, the chroma scaling factors are shared at the APS level. In other words, one chroma scaling factor is derived for each signaled mapping curve. In one example, the chroma scaling factor for each reshaping curve is derived by averaging the scaling factors of all intervals (segments). In another embodiment, the scaling factor is derived by selecting the scaling factor that occurs most often among all intervals (segments). In another embodiment, the scaling factor is derived directly by dividing the difference between the maximum luma sample and the minimum luma sample by the difference between the maximum luma sample in the reshaped domain and the minimum luma sample in the reshaped domain.
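The three ways of reducing a mapping curve to a single chroma scaling factor can be compared with a few lines of code. The function names, the rounding in the average, the fixed-point precision of 11 bits, and the reading of the second option as "the most frequent per-interval factor" are assumptions of this sketch.

```python
from collections import Counter
from typing import Sequence


def average_scale(scales: Sequence[int]) -> int:
    """Rounded integer mean of the per-interval scaling factors."""
    return (sum(scales) + len(scales) // 2) // len(scales)


def majority_scale(scales: Sequence[int]) -> int:
    """Scaling factor that occurs most often among the intervals."""
    return Counter(scales).most_common(1)[0][0]


def range_ratio_scale(min_org: int, max_org: int,
                      min_mapped: int, max_mapped: int, prec: int = 11) -> int:
    """(max - min) in the original domain divided by (max - min) in the mapped domain."""
    return ((max_org - min_org) << prec) // (max_mapped - min_mapped)
```

All three return one value per curve, so the factor no longer depends on any luma sample of the current block.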

Luma residual scaling with reduced delay

Instead of mapping luma prediction samples, the mapping may be applied only to luma residuals. In other words, the predicted samples of the luminance component are in the original domain, and the residual of the luminance component will be scaled by a scaling factor. The scaling factor is derived by referencing the luma prediction samples at different locations or in different ways. The above method suggested for chroma scaling may also be applied to luma residual scaling.

In another embodiment, the scaling factor is an average of two scaling factors of two consecutive intervals.

In one embodiment, the scaling factors for luminance residual scaling and chrominance residual scaling are the same.

Signaling chroma scaling factors

Instead of deriving the chroma scaling factor implicitly at the decoder side, embodiments of the present invention signal the chroma scaling factor at TB, TU, CU, CTU, VPDU, slice level, block level or APS level.

In one embodiment, one or more chroma scaling factors are signaled in an APS.

In one embodiment, if the chroma scaling factor is signaled at the TU level and if Cbfs (coded block flag) for Cb and Cr are both equal to 0, then the chroma scaling factor is not signaled.

In another embodiment, if the chroma scaling factor is signaled at the TU level and if the root Cbf is equal to 0, the chroma scaling factor is not signaled.

In one embodiment, if the chroma scaling factor is signaled at the TB level for the chroma Cb component and if Cbf of Cb is equal to 0, the chroma scaling factor is not signaled; for the chroma Cr component, if Cbf of Cr is equal to 0, the chroma scaling factor is not signaled.

In some embodiments, the video encoder must follow the aforementioned syntax design in order to generate a legitimate data stream, and the video decoder can only correctly decode the data stream if the parsing process complies with the aforementioned syntax design. When skipping syntax in the data stream, the encoder and decoder should set the syntax value to an inferred value to ensure that the encoding and decoding results match.
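The TU-level signaling condition and the use of an inferred value can be sketched as a small parsing helper. The function name, the read_scale callback that stands in for reading the syntax element from the data stream, and the inferred value of (1 << 11) are assumptions of the example.

```python
from typing import Callable


def parse_tu_chroma_scale(read_scale: Callable[[], int],
                          cbf_cb: int, cbf_cr: int,
                          inferred: int = 1 << 11) -> int:
    """Read the TU-level chroma scaling factor only if Cb or Cr has coded residual.

    When both coded block flags are 0 the syntax element is absent, and the
    encoder and decoder both use the same inferred value (assumed here).
    """
    if cbf_cb or cbf_cr:
        return read_scale()      # syntax element present in the data stream
    return inferred              # syntax element skipped
```

Using the same inferred value on both sides is what keeps the encoding and decoding results matched when the syntax element is skipped.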

Fig. 6 illustrates a flowchart of an exemplary decoding system that derives one or more chroma residual scaling factors based on neighboring predicted or reconstructed luma samples of a collocated luma block in accordance with an embodiment of the invention. The steps shown in this flowchart, and in the other flowcharts of this disclosure, may be implemented as program code executable on one or more processors (e.g., one or more CPUs) at the encoder side and/or the decoder side. The steps shown in the flowcharts may also be implemented in hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowcharts. According to the method, a current chroma residual block is received in step 610. In step 620, one or more chroma residual scaling factors are derived based on the neighboring predicted or reconstructed luma samples of the collocated luma block associated with the current chroma residual block, wherein the neighboring predicted or reconstructed luma samples of the collocated luma block correspond to M samples along a top boundary of the collocated luma block and N samples along a left boundary of the collocated luma block, and wherein M and N are positive integers. Then, in step 630, chroma scaling is applied to the chroma residual samples of the current chroma residual block in accordance with the one or more chroma residual scaling factors.
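Steps 620 and 630 can be illustrated with the following sketch. It is not the method of the disclosure in detail: averaging the M top and N left samples, the uniform-bin lookup, the precision of 11 bits, the rounding, and all function names are assumptions made for the example.

```python
from typing import List, Sequence

PREC = 11  # assumed fixed-point precision of the scaling factor


def derive_scale(top: Sequence[int], left: Sequence[int],
                 bin_scales: Sequence[int], bit_depth: int = 10) -> int:
    """Step 620: derive a scale from M top and N left neighboring luma samples."""
    ref = list(top) + list(left)                 # requires at least one sample
    avg = (sum(ref) + len(ref) // 2) // len(ref)
    bin_width = (1 << bit_depth) // len(bin_scales)   # assumed uniform bins
    return bin_scales[min(avg // bin_width, len(bin_scales) - 1)]


def scale_chroma_residual(residuals: Sequence[int], scale: int) -> List[int]:
    """Step 630: multiply each residual by the scale, rounding on the magnitude."""
    out = []
    for r in residuals:
        mag = (abs(r) * scale + (1 << (PREC - 1))) >> PREC
        out.append(mag if r >= 0 else -mag)
    return out
```

Because only boundary samples of the collocated luma block are read, the scale is available before the luma samples inside the block have been reconstructed.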

Fig. 7 shows a flow chart of another exemplary decoding system for deriving one or more chroma residual scaling factors based on one or more reconstructed luma samples outside a collocated luma processing data unit according to an embodiment of the present invention. According to the method, chroma residual data associated with a current chroma processing data unit in a picture is received in step 710, wherein the picture is divided into a plurality of non-overlapping processing data units and each processing data unit comprises a luma processing data unit and one or more chroma processing data units. In step 720, one or more chroma residual scaling factors are derived based on one or more reconstructed luma samples outside of the collocated luma processing data unit associated with the current chroma processing data unit. In step 730, chroma scaling is applied to chroma residual samples of the current chroma processing data unit in accordance with the one or more chroma residual scaling factors.

Fig. 8 shows a flowchart of an exemplary codec system in which one or more chroma residual scaling factors are signaled in the APS (adaptive parameter set) level of a video data stream at the encoder side, or parsed from the APS level of the video data stream at the decoder side, according to an embodiment of the present invention. According to the method, a current chroma residual block is received in step 810. In step 820, one or more chroma residual scaling factors are signaled in the APS level of the video data stream at the encoder side or parsed from the APS level of the video data stream at the decoder side. In step 830, chroma scaling is applied to the chroma residual samples of the current chroma residual block.

The flowcharts shown are intended to illustrate examples of video coding according to the present invention. One skilled in the art may modify each step, rearrange steps, split steps, or combine steps to practice the invention without departing from the spirit of the invention. In this disclosure, specific syntax and semantics have been used to illustrate examples for implementing embodiments of the present invention. Those skilled in the art can practice the invention by replacing the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the invention.

The previous description is presented to enable any person skilled in the art to practice the invention provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. In the previous detailed description, numerous specific details were set forth in order to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced.

Embodiments of the invention as described above may be implemented in various hardware, software code, or a combination of both. For example, embodiments of the invention may be one or more electronic circuits integrated into a video compression chip, or program code integrated into video compression software, to perform the processes described herein. Embodiments of the invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processes described herein. The invention may also relate to a number of functions performed by a computer processor, a digital signal processor, a microprocessor, or a Field Programmable Gate Array (FPGA). The processors may be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods in which the invention is embodied. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software code, and other means of configuring code to perform tasks in accordance with the invention, will not depart from the spirit and scope of the invention.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A video decoding method, the method comprising:

receiving chroma residual data related to a current chroma processing data unit in a picture, wherein the picture is divided into a plurality of non-overlapping processing data units, and each processing data unit comprises a luma processing data unit and one or more chroma processing data units;

deriving one or more chroma residual scaling factors based on one or more reconstructed luma samples outside of a collocated luma processing data unit associated with the current chroma processing data unit; and

applying chroma scaling to chroma residual samples of the current chroma processing data unit based on the derived one or more chroma residual scaling factors,

wherein the luminance size of the one or more non-overlapping processed data units is equal to Min (CtbSizeY, 64) times Min (CtbSizeY, 64), and wherein CtbSizeY specifies the luminance width and the luminance height of the luminance coding tree block.

2. The method of claim 1, wherein the one or more reconstructed luma samples outside the collocated luma processed data unit correspond to one or more reconstructed luma samples of one or more previously encoded luma processed data units.

3. The method of claim 2, wherein the one or more reconstructed luma samples of the one or more previously encoded luma processed data units correspond to one or more reconstructed luma samples along a top boundary of the collocated luma processed data unit, one or more reconstructed luma samples along a left boundary of the collocated luma processed data unit, or both.

4. A video decoding apparatus comprising one or more electronic circuits or processors arranged to:

5. A video decoding method, the method comprising:

deriving one or more chroma residual scaling factors from a first codec unit based on the one or more reconstructed luma samples, wherein the first codec unit overlays an upper left position of a collocated luma processed data unit associated with the current chroma processed data unit; and

applying chroma scaling to chroma residual samples of the chroma residual data associated with the current chroma processing data unit in accordance with the derived one or more chroma residual scaling factors,

6. The method of claim 5, wherein the one or more reconstructed luma samples outside the first codec unit that covers the collocated luma processed data unit correspond to one or more reconstructed luma samples of one or more previously-encoded luma processed data units.

7. The method of claim 5, wherein the one or more reconstructed luma samples of the one or more previously encoded luma processed data units correspond to one or more reconstructed luma samples along a top boundary of the first Codec Unit (CU), one or more reconstructed luma samples along a left boundary of the first Codec Unit (CU), or both, wherein the first codec unit overlays the collocated luma processed data unit.

8. The method of claim 7, wherein the number of reference reconstructed luma samples along the top boundary of the first codec unit that covers the collocated luma processed data unit is equal to a width or a height of the luma processed data unit.

9. The method of claim 7, wherein the number of reference reconstructed luma samples along the left boundary of the first codec unit that covers the collocated luma processed data unit is equal to the width or the height of the luma processed data unit.

10. A video encoding and decoding method, the method comprising:

receiving a current chroma residual block;

signaling one or more chroma residual scaling factors in an adaptive parameter set level of a video data stream or parsing the one or more chroma residual scaling factors in the adaptive parameter set level of the video data stream; and

applying chroma scaling to chroma residual samples of the current chroma residual block.

11. A video codec device, the device comprising one or more electronic circuits or processors arranged to:

receiving a current chroma residual block;