WO2022026480A1 - Weighted ac prediction for video coding - Google Patents

Weighted ac prediction for video coding Download PDF

Info

Publication number
WO2022026480A1
WO2022026480A1 (PCT/US2021/043335)
Authority
WO
WIPO (PCT)
Prior art keywords
prediction
block
frequency signal
video
signal
Prior art date
Application number
PCT/US2021/043335
Other languages
French (fr)
Inventor
Xiaoyu XIU
Wei Chen
Che-Wei Kuo
Yi-Wen Chen
Xianglin Wang
Tsung-Chuan MA
Hong-Jheng Jhu
Bing Yu
Original Assignee
Beijing Dajia Internet Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co., Ltd. filed Critical Beijing Dajia Internet Information Technology Co., Ltd.
Priority to CN202180059215.9A priority Critical patent/CN116158079A/en
Publication of WO2022026480A1 publication Critical patent/WO2022026480A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/537Motion estimation other than block-based
    • H04N19/54Motion estimation other than block-based using feature points or meshes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/577Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Definitions

  • As shown in FIGS. 3A, 3B, 3C, 3D, and 3E, there are five splitting types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quad-tree partitioning, and vertical extended quad-tree partitioning.
  • FIG. 3A shows a diagram illustrating block quaternary partition, in accordance with the present disclosure.
  • FIG. 3B shows a diagram illustrating block vertical binary partition, in accordance with the present disclosure.
  • FIG. 3C shows a diagram illustrating block horizontal binary partition, in accordance with the present disclosure.
  • FIG. 3D shows a diagram illustrating block vertical extended quaternary partition, in accordance with the present disclosure.
  • FIG. 3E shows a diagram illustrating block horizontal extended quaternary partition, in accordance with the present disclosure.
  • spatial prediction and/or temporal prediction may be performed.
  • Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
  • Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal.
  • Temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs) which indicate the amount and the direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture store the temporal prediction signal comes.
  • the mode decision block in the encoder chooses the best prediction mode, for example based on the rate- distortion optimization method. The prediction block is then subtracted from the current video block; and the prediction residual is de-correlated using transform and then quantized.
  • the quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering, such as deblocking filter, sample adaptive offset (SAO) and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture store and used as reference to code future video blocks.
  • To form the output bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed.
  • FIG. 2 shows a general block diagram of a video decoder for the VVC.
  • FIG. 2 shows a typical decoder 200 block diagram.
  • Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.
  • Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1. In the decoder 200, an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information.
  • the quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual.
  • a block predictor mechanism implemented in an Intra/inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information.
  • a set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
  • the reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture store.
  • the reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks.
  • a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.
  • FIG. 2 gives a general block diagram of a block-based video decoder.
  • The video bitstream is first entropy decoded at the entropy decoding unit.
  • The coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter coded) to form the prediction block.
  • The residual transform coefficients are sent to the inverse quantization unit and the inverse transform unit to reconstruct the residual block.
  • the prediction block and the residual block are then added together.
  • the reconstructed block may further go through in-loop filtering before it is stored in reference picture store.
  • the reconstructed video in reference picture store is then sent out for display, as well as used to predict future video blocks.
  • one weighted AC prediction (WACP) approach is proposed to enhance the efficiency of motion compensated prediction.
  • The proposed scheme aims at predicting the alternating current (AC) components of one video block from the weighted combination of the AC components of one or more of its temporal reference blocks. Because a better prediction can be achieved, the corresponding overhead of signaling the AC coefficients can be reduced by the proposed WACP scheme.
  • In the following, some existing inter coding technologies in the current VVC and AVS3 standards that are closely related to the proposed method are briefly reviewed. After that, some shortcomings in the current inter prediction design are analyzed. Finally, the details of the proposed WACP scheme are discussed.
  • Weighted prediction (WP) is a coding tool that is primarily used to compensate for illuminance changes, such as fade-in and fade-out, between one current picture and its temporal reference pictures at the motion compensation stage.
  • The WP was first adopted in the AVC and reused by the HEVC and the VVC. Specifically, when the WP is enabled, one multiplicative weight and one additive offset are signaled for each picture in each of the L0 and L1 reference lists in the slice header. For P slices, the prediction of one current block is generated by weighting the prediction samples obtained from one single reference picture. Specifically, letting P(i, j) denote the original prediction sample (i.e., prior to the WP) at coordinate (i, j), the final prediction sample is calculated by applying the signaled weight and offset to P(i, j).
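  • The weighting equation itself is not reproduced in this text. The following is a minimal sketch of such per-sample weighted prediction, assuming an AVC/HEVC-style formulation in which the signaled weight and offset are applied to each prediction sample and the result is clipped to the valid sample range; the function and parameter names are illustrative only.

```python
import numpy as np

def weighted_prediction(pred, weight, offset, bit_depth=8):
    """Apply a signaled multiplicative weight and additive offset to a
    uni-prediction block, in the spirit of AVC/HEVC-style weighted prediction.

    pred   : 2-D array of prediction samples P(i, j) prior to the WP
    weight : multiplicative weight signaled for the reference picture
    offset : additive offset signaled for the reference picture
    """
    max_val = (1 << bit_depth) - 1
    out = weight * pred.astype(np.int64) + offset
    return np.clip(out, 0, max_val)  # keep samples within the valid range
```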
  • Bi-prediction with CU-level weight: when the WP is not applied, the bi-prediction signal is generated by averaging the uni-prediction signals obtained from two reference pictures.
  • One coding tool, namely bi-prediction with CU-level weight (BCW), is introduced.
  • With the BCW, the bi-prediction is extended by allowing a weighted averaging of the two prediction signals.
  • The weight of one BCW coding block is allowed to be selected from a set of predefined weight values $w \in \{-2, 3, 4, 5, 10\}$, where the weight of 4 represents the traditional bi-prediction case in which the two uni-prediction signals are equally weighted. For low-delay pictures, only 3 weights $w \in \{3, 4, 5\}$ are allowed.
  • The two coding tools target the illumination change challenge at different granularities. However, because the interaction between the WP and the BCW could potentially complicate the VVC design, the two tools are disallowed from being enabled simultaneously. Specifically, when the WP is enabled for one slice, the BCW weights for all the bi-prediction CUs in the slice are not signaled but are inferred to be 4 (i.e., the equal weight is applied).
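  • The weighted-averaging equation for the BCW is not reproduced in this text. The sketch below assumes the VVC-style formulation in which the two weights sum to 8 (so w = 4 is the equal-weight case) and rounding is applied before the right shift; clipping to the valid sample range is omitted for brevity.

```python
def bcw_biprediction(p0, p1, w):
    """Bi-prediction with CU-level weight (BCW) for one integer sample pair.

    p0, p1 : co-located prediction samples from the L0 and L1 reference pictures
    w      : BCW weight, e.g., selected from {-2, 3, 4, 5, 10}
    """
    return ((8 - w) * p0 + w * p1 + 4) >> 3  # rounded weighted average
```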
  • the MMVD/UMVE mode is introduced in both the VVC and AVS standards as one special merge mode. Specifically, in both the VVC and AVS3, the mode is signaled by one MMVD flag at coding block level.
  • In the MMVD mode, the first two candidates in the merge list for regular merge mode are selected as the two base merge candidates for MMVD. After one base merge candidate is selected and signaled, additional syntax elements are signaled to indicate the motion vector differences (MVDs) that are added to the motion of the selected merge candidate.
  • the MMVD syntax elements include a merge candidate flag to select the base merge candidate, a distance index to specify the MVD magnitude and a direction index to indicate the MVD direction.
  • The distance index specifies the MVD magnitude, which is defined based on one set of pre-defined offsets from the starting point. As shown in FIG. 6, the offset is added to either the horizontal or the vertical component of the starting MV (i.e., the MV of the selected base merge candidate). Table 1 illustrates the MVD offsets that are applied in the AVS3.
  • The direction index is used to specify the sign of the signaled MVD. It is noted that the meaning of the MVD sign can vary according to the starting MVs.
  • When the starting MV is a uni-prediction MV, or bi-prediction MVs pointing to two reference pictures whose POCs are both larger than the POC of the current picture or both smaller than the POC of the current picture, the signaled sign is the sign of the MVD added to the starting MV.
  • Otherwise (i.e., when the two reference pictures straddle the current picture in POC order), the signaled sign is applied to the L0 MVD and the opposite of the signaled sign is applied to the L1 MVD.
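  • Since Table 1 (the AVS3 MVD offsets) is not reproduced in this text, the sketch below uses placeholder offsets; it only illustrates how a distance index and a direction index are mapped to an MVD that is added to the starting MV of the selected base merge candidate (the simple case in which the signaled sign applies directly).

```python
# Placeholder MVD offset table (Table 1 is not reproduced here); values are in
# quarter-sample units and are illustrative only.
MVD_OFFSETS = [1, 2, 4, 8, 16, 32]

# Direction index -> (sign on the horizontal MVD, sign on the vertical MVD)
MVD_DIRECTIONS = [(+1, 0), (-1, 0), (0, +1), (0, -1)]

def mmvd_motion(base_mv, distance_idx, direction_idx):
    """Derive an MMVD motion vector from a base merge candidate MV plus a
    signaled distance/direction pair."""
    sign_x, sign_y = MVD_DIRECTIONS[direction_idx]
    offset = MVD_OFFSETS[distance_idx]
    return (base_mv[0] + sign_x * offset, base_mv[1] + sign_y * offset)
```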
  • the derivation of the refined motion vector for each sample in one block is based on the classical optical flow model.
  • Let $I^{(k)}(x, y)$ be the sample value at coordinate $(x, y)$ of the prediction block derived from reference picture list $k$ ($k = 0, 1$), and let $\partial I^{(k)}(x, y)/\partial x$ and $\partial I^{(k)}(x, y)/\partial y$ denote the horizontal and vertical gradients of the sample.
  • $(MV_{x0}, MV_{y0})$ and $(MV_{x1}, MV_{y1})$ indicate the block-level motion vectors that are used to generate the two prediction blocks $I^{(0)}$ and $I^{(1)}$.
  • The motion refinement $(v_x, v_y)$ at the sample location $(x, y)$ is derived by minimizing the difference $\Delta$ between the values of the samples after motion refinement compensation (i.e., A and B in FIG. 4).
  • the BIO is only applied to bi-prediction blocks which are predicted by two reference blocks from temporal neighboring pictures. Additionally, the BIO is enabled without sending additional information from encoder to decoder. Specifically, the BIO is applied to all the bi-directional predicted blocks which have both the forward and backward prediction signals.
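  • The BDOF equations are not reproduced in this text. For reference, one common formulation of the final BDOF prediction sample (as used in VVC-style designs, given here only as an illustration rather than the exact equations of this disclosure) combines the two prediction signals, their gradients and the derived refinement $(v_x, v_y)$ as
$$\mathrm{pred}_{BDOF}(x,y) = \frac{1}{2}\left(I^{(0)}(x,y) + I^{(1)}(x,y) + \frac{v_x}{2}\left(\frac{\partial I^{(1)}}{\partial x} - \frac{\partial I^{(0)}}{\partial x}\right) + \frac{v_y}{2}\left(\frac{\partial I^{(1)}}{\partial y} - \frac{\partial I^{(0)}}{\partial y}\right)\right).$$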
  • affine motion compensated prediction is applied by signaling one flag for each inter coding block to indicate whether the translation motion or the affine motion model is applied for inter prediction.
  • two affine modes including 4-parameter affine mode and 6-parameter affine mode, are supported for one affine coding block.
  • the 4-parameter affine model has the following parameters: two parameters for translation movement in horizontal and vertical directions respectively, one parameter for zoom motion and one parameter for rotation motion for both directions.
  • Horizontal zoom parameter is equal to vertical zoom parameter.
  • Horizontal rotation parameter is equal to vertical rotation parameter.
  • FIG. 5A shows an illustration of a 4-parameter affine model, in accordance with the present disclosure.
  • FIG. 5B shows an illustration of a 4-parameter affine model, in accordance with the present disclosure.
  • The 6-parameter affine mode has the following parameters: two parameters for translation movement in the horizontal and vertical directions respectively, one parameter for zoom motion and one parameter for rotation motion in the horizontal direction, and one parameter for zoom motion and one parameter for rotation motion in the vertical direction.
  • The 6-parameter affine motion model is coded with three MVs at three control points (CPMVs).
  • FIG. 6 shows an illustration of a 6-parameter affine model, in accordance with the present disclosure.
  • The three control points of one 6-parameter affine block are located at the top-left, top-right and bottom-left corners of the block.
  • The motion at the top-left control point is related to the translation motion.
  • The motion at the top-right control point is related to the rotation and zoom motion in the horizontal direction.
  • The motion at the bottom-left control point is related to the rotation and zoom motion in the vertical direction.
  • For the 6-parameter model, the rotation and zoom motion in the horizontal direction may not be the same as the rotation and zoom motion in the vertical direction.
  • The motion vector of each sub-block $(v_x, v_y)$ is derived using the three MVs at the control points.
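  • The sub-block MV equation is not reproduced in this text. A commonly used form of the 6-parameter derivation, assuming $(v_{0x}, v_{0y})$, $(v_{1x}, v_{1y})$ and $(v_{2x}, v_{2y})$ are the top-left, top-right and bottom-left control-point MVs and $w$ and $h$ are the block width and height, is
$$v_x = \frac{v_{1x}-v_{0x}}{w}\,x + \frac{v_{2x}-v_{0x}}{h}\,y + v_{0x}, \qquad v_y = \frac{v_{1y}-v_{0y}}{w}\,x + \frac{v_{2y}-v_{0y}}{h}\,y + v_{0y},$$
where $(x, y)$ is the position (e.g., the center) of the sub-block relative to the top-left corner of the block.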
  • Transform coding is one of the most important compression technologies and is widely used in all mainstream video codecs. It improves the coding efficiency by compacting most of the signal energy into a few low-frequency coefficients and distributing the remaining energy into high-frequency coefficients. Therefore, with quantization applied, the coefficients having the highest energy over the block (i.e., the low-frequency coefficients) are finely quantized and allocated more bits, while the low-energy coefficients (i.e., the high-frequency coefficients) are coarsely quantized and allocated fewer bits. For this reason, in most scenarios (especially low bit-rate applications), the reconstructed video signal is usually dominated by low-frequency information, and some high-frequency information that is present in the original video is missing and/or distorted in the reconstructed video signal.
  • the WP and the BCW are efficient tools to improve the efficiency of motion compensated prediction when there are global or local illumination variations among different pictures.
  • The improvement is achieved by estimating the brightness variations with a linear model, i.e., one multiplicative weight and one additive offset.
  • The weight and the offset are usually optimized by minimizing the mean squared error (MSE) between the current block and its prediction block.
  • one weighted AC prediction (WACP) scheme is proposed to improve the prediction efficiency of AC components at motion compensation stage.
  • The AC components of one video block are predicted from the weighted combination of the AC components of one or more of its temporal reference blocks. Because a better AC prediction can be achieved, the signaling overhead of the AC coefficients is expected to be reduced, and therefore the overall motion compensation efficiency is expected to improve when the WACP scheme is applied.
  • the idea of the WACP can be regarded as one extension of the famous multi-hypothesis prediction to estimate the value of the AC component at each sample of the current block based on the linear combination of the AC component of the collocated sample from multiple motion compensated prediction blocks.
  • The general idea of the proposed WACP can be formulated as
$$P_{WACP}(i,j) = P_{DC}(i,j) + \sum_{k=0}^{N-1} w_k \cdot P^{k}_{AC}(i,j),$$
where $P_{DC}(i,j)$ is the average (i.e., the DC component) at coordinate $(i,j)$ of the multiple prediction blocks; $P^{k}_{AC}(i,j)$ is the AC component at coordinate $(i,j)$ of the $k$-th prediction block; $w_k$ represents the weight that is applied to the AC component of the $k$-th prediction block; and $N$ is the total number of hypotheses that are applied.
  • The values of $P_{DC}(i,j)$ and $P^{k}_{AC}(i,j)$ can be further calculated as
$$P_{DC}(i,j) = \frac{1}{N}\sum_{k=0}^{N-1} P_k(i,j), \qquad P^{k}_{AC}(i,j) = P_k(i,j) - P_{DC}(i,j),$$
where $P_k(i,j)$ denotes the sample at coordinate $(i,j)$ in the $k$-th prediction block.
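  • A minimal sketch of this combination is given below: the DC (low-frequency) signal is taken as the average of the N hypothesis prediction blocks, the per-block AC (high-frequency) signals are the residues over that average, and the final prediction adds a weighted sum of the AC signals back to the DC signal. Function and variable names are illustrative only.

```python
import numpy as np

def wacp_prediction(pred_blocks, ac_weights):
    """Weighted AC prediction (WACP) sketch.

    pred_blocks : list of N prediction blocks P_k(i, j) with identical shapes
    ac_weights  : list of N weights w_k applied to the AC components
    """
    blocks = np.stack([b.astype(np.float64) for b in pred_blocks])
    p_dc = blocks.mean(axis=0)            # P_DC(i, j): average of the N blocks
    p_ac = blocks - p_dc                  # P_AC^k(i, j) = P_k(i, j) - P_DC(i, j)
    weighted_ac = sum(w * ac for w, ac in zip(ac_weights, p_ac))
    return p_dc + weighted_ac             # final WACP prediction signal
```

  • Note that for N = 2 the two AC components are opposite in sign, so equal weights reduce the result to the ordinary averaging bi-prediction, while unequal weights emphasize the high-frequency content of one hypothesis over the other.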
  • One essential problem of the proposed WACP scheme is how to balance the prediction efficiency gain of using more hypotheses against the required overhead of signaling multiple weights.
  • More hypothesis candidates imply a more accurate AC prediction, which, however, requires more bits to code the weight values.
  • The required overhead may outweigh the prediction accuracy benefit.
  • It is proposed to signal the number of the hypothesis prediction signals applied in the WACP scheme and to let the encoder adaptively choose the optimal number for the best rate-distortion (R-D) performance.
  • the number of the applied hypothesis for the WACP may be signaled at various coding levels, e.g., sequence level, picture level, tile/slice level and coding block level and so forth, to provide different tradeoff between coding efficiency and hardware/software implementation cost.
  • It is proposed to use one fixed number of hypothesis prediction blocks when the proposed WACP scheme is applied. Without loss of generality, N = 2 will be used as an example to explain the proposed WACP method.
  • FIG. 8 shows a method for video decoding in weighted alternating current prediction (WACP), in accordance with the present disclosure.
  • the decoder may obtain a plurality of inter prediction blocks from a number of temporal reference pictures associated with a video block.
  • the decoder may obtain a low-frequency signal based on the plurality of inter prediction blocks.
  • the decoder may obtain a plurality of high-frequency signals based on the plurality of inter prediction blocks. At least one of the plurality of high-frequency signals is associated with one prediction block. In some embodiments, the decoder may obtain a plurality of high-frequency signals, where each high-frequency signal is associated with one prediction block.
  • the decoder may determine at least one weight associated with the high- frequency signal of at least one of the inter prediction blocks. In some embodiments, the decoder may determine a plurality of weights associated with the high-frequency signal of each inter prediction block.
  • the decoder may calculate a final prediction signal of the video block based on a weighted sum of the low-frequency signal and the plurality of high-frequency signals using the at least one weight.
  • FIG. 9 shows a method for video decoding in WACP, in accordance with the present disclosure.
  • the decoder may obtain a combined high-frequency signal based on a weighted sum of the plurality of high-frequency signals. At least one of the plurality of high- frequency signals is weighted by a corresponding weight associated with the at least one of the plurality of high-frequency signals.
  • the decoder may calculate the final prediction signal of the video block as a sum of the low-frequency signal and the combined high-frequency signal of the video block.
  • In one example, the weights are constrained, e.g., $w_0 + w_1 = 0$, such that only one weight needs to be explicitly signaled.
  • a weight for a high-frequency signal can be identified, for example, by subtracting weights of all other high-frequency signals from one. In another example, a weight for a high-frequency signal can be identified by subtracting weights of all other high-frequency signals from zero.
  • The equation (13) can be rewritten using the integer weights as equation (14), where $w_{int}$ is the integer weight value, which is allowed to be selected from $\{-1, 0, 1\}$ in the first example and from $\{-6, -1, 0, 1, 6\}$ in the second example.
  • In another example, the integer weight values may be allowed to be selected from $\{5, 0, 3\}$ and a fixed number of bits for the right shift operation is set to 3.
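  • The exact discretization in equation (14) is not reproduced in this text. The sketch below only illustrates the general fixed-point pattern, assuming a single signaled integer weight that is applied with opposite signs to the two AC components and scaled back with a right shift; the mapping of the integer weight to an effective fractional weight is an assumption, not the disclosure's equation.

```python
def bd_wacp_fixed_point(p_dc, p_ac_0, p_ac_1, w_int, shift=3):
    """Fixed-point sketch of bi-directional WACP.

    p_dc           : DC (low-frequency) sample, integer
    p_ac_0, p_ac_1 : AC (high-frequency) samples of the two prediction blocks
    w_int          : signaled integer weight
    shift          : number of bits for the right shift (illustrative value)
    """
    rounding = 1 << (shift - 1)
    ac_term = (w_int * (p_ac_0 - p_ac_1) + rounding) >> shift
    return p_dc + ac_term
```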
  • the selected weight of the BD-WACP mode is explicitly signaled in bitstream if one coding block is bi-predicted.
  • merge mode is supported in both VVC and AVS3 where motion information of one coding block is not signaled but derived from one of a set of spatial/temporal merge candidates.
  • Methods are proposed in this section to apply the BD-WACP to merge modes. Firstly, in addition to the motion information (i.e., reference picture indices and motion vectors), it is proposed to store the associated WACP weight for each bi-predicted block. In this way, the BD-WACP weight can be inherited from block to block without signaling.
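  • A minimal sketch of this storage is shown below: the per-block motion data is simply extended with the BD-WACP weight so that a regular merge candidate carries the weight along with its motion information. Field and function names are illustrative and are not taken from any reference software.

```python
from dataclasses import dataclass

@dataclass
class MotionInfo:
    """Per-block motion data extended with the BD-WACP weight for inheritance."""
    ref_idx_l0: int
    ref_idx_l1: int
    mv_l0: tuple            # (mvx, mvy) for reference list 0
    mv_l1: tuple            # (mvx, mvy) for reference list 1
    wacp_weight: int = 0    # 0 corresponds to the default bi-prediction

def inherit_from_merge_candidate(cand: MotionInfo) -> MotionInfo:
    """Regular merge: both the motion information and the BD-WACP weight are copied."""
    return MotionInfo(cand.ref_idx_l0, cand.ref_idx_l1,
                      cand.mv_l0, cand.mv_l1, cand.wacp_weight)
```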
  • there are multiple types of merge modes including regular merge mode, inherited affine merge mode, constructed affine merge mode.
  • FIG. 7 illustrates an inheritance of the WACP mode.
  • FIG. 7 is one example to explain the inheritance scheme proposed for the WACP mode.
  • Spatial merge candidate B2, which is coded by the BD-WACP mode with a weight value of 1, is selected as the merge candidate of the current coding block.
  • both the BD-WACP weight and the motion information of B2 are inherited to generate the bi-prediction signal of the current block.
  • the motion information of constructed affine merge mode is generated from the motion information of multiple neighboring blocks.
  • Different methods may be applied to generate the BD-WACP weight for one constructed affine merge block.
  • In the first method, it is proposed to always disable the BD-WACP mode (i.e., forcibly setting the BD-WACP weight w to 0) when the current block is coded by the constructed affine merge mode.
  • In the second method, it is proposed to set the BD-WACP weight of one constructed affine merge block to be equal to the BD-WACP weight of the block that generates the first control-point motion vector (i.e., at the top-left corner of the current block).
  • In another method, the BD-WACP weight of one constructed affine merge block is set to be equal to the BD-WACP weight that is most frequently used by the neighboring blocks. Additionally, when there are not enough neighboring blocks that are coded by the BD-WACP mode, the BD-WACP weight of the current block is set to 0 (i.e., disabling the BD-WACP).
  • The BD-WACP and the WP are two coding tools with different flavors: the BD-WACP targets compensating the high-frequency information that is missing from the reference pictures, while the WP concentrates on compensating the illumination variations (i.e., low-frequency information) between the current picture and the reference pictures. Therefore, there is no obvious conflict that prevents the two coding tools from being used jointly. Specifically, when the WP is turned on, the WP parameters (i.e., weight and offset) are signaled at the picture/slice level.
  • One additional BD-WACP weight can be signaled when the current block is bi-predicted. Therefore, as one embodiment of the disclosure, it is proposed to apply the BD-WACP and the WP together.
  • The WP is firstly applied to adjust the illumination magnitudes of the prediction blocks, which are then combined by the WACP to generate the final bi-prediction. Assuming the WP weights and offsets are associated with the L0 and L1 reference pictures and $w_{BD\text{-}WACP}$ is the BD-WACP weight, the bi-prediction is generated as shown in equation (15).
  • In equation (15), the coordinate $(i,j)$ is omitted to simplify the presentation. Additionally, for ease of description, all the values of the weights and offsets are assumed to be floating-point. In practice, the parameter discretization method as depicted in equation (14) can easily be applied to implement equation (15) with fixed-point arithmetic. In another embodiment, it is proposed to always disable the BD-WACP mode for one bi-predicted coding block when the WP is enabled for the picture/slice to which the coding block belongs. In that case, the BD-WACP weight does not need to be signaled but is always inferred to be 0.
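  • A floating-point sketch of the described order of operations is given below: the WP weight/offset of each reference list is applied first, and the two adjusted prediction blocks are then combined by the bi-directional WACP. Since equation (15) is not reproduced in this text, the exact combination (in particular, the convention of applying the single signaled BD-WACP weight with opposite signs to the two AC parts) is an assumption of this sketch.

```python
def wp_then_bd_wacp(p0, p1, wp_w0, wp_o0, wp_w1, wp_o1, w_bd_wacp):
    """Joint WP + BD-WACP, floating-point sketch.

    p0, p1        : prediction samples from the L0 and L1 reference pictures
    wp_w*, wp_o*  : WP weight and offset signaled for each reference list
    w_bd_wacp     : BD-WACP weight for the current coding block
    """
    q0 = wp_w0 * p0 + wp_o0              # WP-adjusted L0 prediction
    q1 = wp_w1 * p1 + wp_o1              # WP-adjusted L1 prediction
    p_dc = 0.5 * (q0 + q1)               # low-frequency (DC) part
    ac0, ac1 = q0 - p_dc, q1 - p_dc      # high-frequency (AC) parts (ac1 == -ac0)
    return p_dc + w_bd_wacp * (ac0 - ac1)
```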
  • The BD-WACP can also be seamlessly combined with the BCW mode, because the two modes aim at improving different components of the motion compensated prediction signal. Therefore, in one or more embodiments, it is proposed to jointly apply the BD-WACP and the BCW at the same time for one bi-predicted coding block. Specifically, with this method, the BCW is firstly applied to adjust the local illumination magnitudes of the prediction blocks, which are then combined by the WACP to generate the final bi-prediction. Assuming $w_{BCW}$ is the BCW weight being applied, the bi-prediction is generated as shown in equation (16).
  • In that case, the BD-WACP weight does not need to be signaled but is always inferred to be 0.
  • The BD-WACP can also be freely combined with the BDOF. More specifically, when the two tools are combined, the original prediction signals $P_0$ and $P_1$ are still applied to estimate the sample-wise refinement $\Delta_{BDOF}$, as depicted in the bi-directional optical flow (BDOF) section, which is then added to the bi-prediction signal enhanced by the BD-WACP, i.e.,
$$P_{final}(i,j) = P_{BD\text{-}WACP}(i,j) + \Delta_{BDOF}(i,j).$$
  • variable-length code-words should be designed to accommodate the specific distribution of the weight values of the BD-WACP mode.
  • In general, the BD-WACP weight 0 (i.e., default bi-prediction) is the most frequently selected and should be assigned the shortest code-word.
  • The weight values with larger absolute values are less frequently selected, due to the relatively large modifications they make to the AC components in the reference blocks. Therefore, they should be assigned longer code-words.
  • Table 4 and Table 5 show two BD-WACP weight binarization methods when three weights $\{-1, 0, 1\}$ or five weights $\{-6, -1, 0, 1, 6\}$ are applied to the BD-WACP mode.
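  • Since Tables 4 and 5 are not reproduced in this text, the mapping below is only an illustrative binarization consistent with the stated design goal (the shortest code-word for weight 0, longer code-words for larger magnitudes); it is not the disclosure's actual tables.

```python
# Illustrative, prefix-free BD-WACP weight code-words (not Tables 4/5).
THREE_WEIGHT_CODEWORDS = {0: '0', 1: '10', -1: '11'}
FIVE_WEIGHT_CODEWORDS = {0: '0', 1: '100', -1: '101', 6: '110', -6: '111'}
```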
  • FIG. 10 shows a computing environment 1010 coupled with a user interface 1060.
  • the computing environment 1010 can be part of a data processing server.
  • the computing environment 1010 includes processor 1020, memory 1040, and I/O interface 1050.
  • the processor 1020 typically controls overall operations of the computing environment 1010, such as the operations associated with the display, data acquisition, data communications, and image processing.
  • the processor 1020 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods.
  • the processor 1020 may include one or more modules that facilitate the interaction between the processor 1020 and other components.
  • the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
  • the memory 1040 is configured to store various types of data to support the operation of the computing environment 1010.
  • Memory 1040 may include predetermined software 1042. Examples of such data comprise instructions for any applications or methods operated on the computing environment 1010, video datasets, image data, etc.
  • the memory 1040 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
  • the I/O interface 1050 provides an interface between the processor 1020 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like.
  • the buttons may include but are not limited to, a home button, a start scan button, and a stop scan button.
  • the I/O interface 1050 can be coupled with an encoder and decoder.
  • non-transitory computer-readable storage medium comprising a plurality of programs, such as comprised in the memory 1040, executable by the processor 1020 in the computing environment 1010, for performing the above-described methods.
  • the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device or the like.
  • the non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
  • the computing environment 1010 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field- programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, micro controllers, microprocessors, or other electronic components, for performing the above methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method, apparatus, and a non-transitory computer-readable storage medium for video decoding in weighted alternating current prediction (WACP) are provided. The method may include obtaining a plurality of inter prediction blocks from a number of temporal reference pictures associated with a video block. The method may also include obtaining a low-frequency signal based on the plurality of inter prediction blocks. The method may further include obtaining a plurality of high-frequency signals based on the plurality of inter prediction blocks. The method may also include determining at least one weight associated with the high-frequency signal of at least one of the inter prediction blocks. The method may further include calculating a final prediction signal of the video block based on a weighted sum of the low-frequency signal and the plurality of high-frequency signals using the at least one weight.

Description

WEIGHTED AC PREDICTION FOR VIDEO CODING
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application is based upon and claims priority to Provisional Application No. 63/057,290, filed on July 27, 2020, the entire content of which is incorporated herein by reference in its entirety for all purposes.
TECHNICAL FIELD
[0002] This disclosure is related to video coding and compression. More specifically, this disclosure relates to methods and apparatus for weighted alternating current (AC) prediction for video coding.
BACKGROUND
[0003] Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, nowadays, some well-known video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2) and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which are jointly developed by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video Coding (AVS), which refers to a digital audio and digital video compression standard, is another video compression standard series developed by the Audio and Video Coding Standard Workgroup of China. Most of the existing video coding standards are built upon the famous hybrid video coding framework, i.e., using block-based prediction methods (e.g., inter-prediction, intra-prediction) to reduce redundancy present in video images or sequences and using transform coding to compact the energy of the prediction errors. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradations to video quality.
SUMMARY
[0004] Examples of the present disclosure provide methods and apparatus for weighted alternating current (AC) prediction for video coding.
[0005] According to a first aspect of the present disclosure, a method for video decoding in weighted alternating current prediction (WACP) is provided. The method may include obtaining a plurality of inter prediction blocks from a number of temporal reference pictures associated with a video block. The method may also include obtaining a low-frequency signal based on the plurality of inter prediction blocks. The method may also include obtaining a plurality of high-frequency signals based on the plurality of inter prediction blocks, where at least one of the plurality of high-frequency signals is associated with one prediction block. The method may also include determining at least one weight associated with the high-frequency signal of at least one of the inter prediction blocks. The method may also include calculating a final prediction signal of the video block based on a weighted sum of the low-frequency signal and the plurality of high-frequency signals using the at least one weight.
[0006] It is to be understood that the above general descriptions and detailed descriptions below are only examples and explanatory and not intended to limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
[0008] FIG. 1 is a block diagram of an encoder, according to an example of the present disclosure.
[0009] FIG. 2 is a block diagram of a decoder, according to an example of the present disclosure.
[0010] FIG. 3A is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
[0011] FIG. 3B is a diagram illustrating block partitions in a multi -type tree structure, according to an example of the present disclosure.
[0012] FIG. 3C is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
[0013] FIG. 3D is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
[0014] FIG. 3E is a diagram illustrating block partitions in a multi-type tree structure, according to an example of the present disclosure.
[0015] FIG. 4 is an illustration of bi-directional optical flow (BDOF), according to an example of the present disclosure.
[0016] FIG. 5 A is an illustration of a 4-parameter affine mode, according to an example of the present disclosure.
[0017] FIG. 5B is an illustration of a 4-parameter affine mode, according to an example of the present disclosure.
[0018] FIG. 6 is an illustration of a 6-parameter affine mode, according to an example of the present disclosure.
[0019] FIG. 7 is an illustration of an inheritance of the WACP mode, according to an example of the present disclosure.
[0020] FIG. 8 is a method for video decoding, according to an example of the present disclosure.
[0021] FIG. 9 is a method for video decoding, according to an example of the present disclosure.
[0022] FIG. 10 is a diagram illustrating a computing environment coupled with a user interface, according to an example of the present disclosure.
DETAILED DESCRIPTION
[0023] Reference will now be made in detail to example embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of example embodiments do not represent all implementations consistent with the present disclosure. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the present disclosure, as recited in the appended claims.
[0024] The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the present disclosure and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It shall also be understood that the term “and/or” used herein is intended to signify and include any or all possible combinations of one or more of the associated listed items.
[0025] It shall be understood that, although the terms “first,” “second,” “third,” etc., may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed as second information; and similarly, second information may also be termed as first information. As used herein, the term “if” may be understood to mean “when” or “upon” or “in response to a judgment” depending on the context.
[0026] The first generation AVS standard includes Chinese national standard “Information Technology, Advanced Audio Video Coding, Part 2: Video” (known as AVS1) and “Information Technology, Advanced Audio Video Coding Part 16: Radio Television Video” (known as AVS+). It can offer around 50% bit-rate saving at the same perceptual quality compared to the MPEG-2 standard. The AVS1 standard video part was promulgated as the Chinese national standard in February 2006. The second generation AVS standard includes the series of Chinese national standard “Information Technology, Efficient Multimedia Coding” (known as AVS2), which is mainly targeted at the transmission of extra HD TV programs. The coding efficiency of the AVS2 is double that of the AVS+. In May 2016, the AVS2 was issued as the Chinese national standard. Meanwhile, the AVS2 standard video part was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as one international standard for applications. The AVS3 standard is one new generation video coding standard for UHD video application aiming at surpassing the coding efficiency of the latest international standard HEVC. In March 2019, at the 68-th AVS meeting, the AVS3-P2 baseline was finished, which provides approximately 30% bit-rate savings over the HEVC standard. Currently, one reference software, called the high performance model (HPM), is maintained by the AVS group to demonstrate a reference implementation of the AVS3 standard.
[0027] Like the HEVC, the AVS3 standard is built upon the block-based hybrid video coding framework.
[0028] FIG. 1 shows a general diagram of a block-based video encoder for the VVC. Specifically, FIG. 1 shows a typical encoder 100. The encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction related info 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.
[0029] In the encoder 100, a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.
[0030] A prediction residual, representing the difference between a current video block, part of video input 110, and its predictor, part of block predictor 140, is sent to a transform 130 from adder 128. Transform coefficients are then sent from the Transform 130 to a Quantization 132 for entropy reduction. Quantized coefficients are then fed to an Entropy Coding 138 to generate a compressed video bitstream. As shown in FIG. 1, prediction related information 142 from an intra/inter mode decision 116, such as video block partition info, motion vectors (MVs), reference picture index, and intra prediction mode, are also fed through the Entropy Coding 138 and saved into a compressed bitstream 144. Compressed bitstream 144 includes a video bitstream.
[0031] In the encoder 100, decoder-related circuitries are also needed in order to reconstruct pixels for the purpose of prediction. First, a prediction residual is reconstructed through an Inverse Quantization 134 and an Inverse Transform 136. This reconstructed prediction residual is combined with a Block Predictor 140 to generate un-filtered reconstructed pixels for a current video block.
[0032] Spatial prediction (or “intra prediction”) uses pixels from samples of already coded neighboring blocks (which are called reference samples) in the same video frame as the current video block to predict the current video block.
[0033] Temporal prediction (also referred to as “inter prediction”) uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. The temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs, which indicate the amount and the direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture storage the temporal prediction signal comes.
[0034] Motion estimation 114 intakes video input 110 and a signal from picture buffer 120 and outputs, to motion compensation 112, a motion estimation signal. Motion compensation 112 intakes video input 110, a signal from picture buffer 120, and the motion estimation signal from motion estimation 114 and outputs, to intra/inter mode decision 116, a motion compensation signal.
[0035] After spatial and/or temporal prediction is performed, an intra/inter mode decision 116 in the encoder 100 chooses the best prediction mode, for example, based on the rate-distortion optimization method. The block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform 130 and the quantization 132. The resulting quantized residual coefficients are inverse quantized by the inverse quantization 134 and inverse transformed by the inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering 122, such as a deblocking filter, a sample adaptive offset (SAO), and/or an adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture storage of the picture buffer 120 and used to code future video blocks. To form the output video bitstream 144, coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 138 to be further compressed and packed to form the bitstream.
[0036] FIG. 1 gives the block diagram of a generic block-based hybrid video encoding system. The input video signal is processed block by block (called coding units (CUs)). Different from the HEVC which partitions blocks only based on quad-trees, in the AVS3, one coding tree unit (CTU) is split into CUs to adapt to varying local characteristics based on quad/binary/extended-quad-tree. Additionally, the concept of multiple partition unit type in the HEVC is removed, i.e., the separation of CU, prediction unit (PU) and transform unit (TU) does not exist in the AVS3; instead, each CU is always used as the basic unit for both prediction and transform without further partitions. In the tree partition structure of the AVS3, one CTU is firstly partitioned based on a quad-tree structure. Then, each quad-tree leaf node can be further partitioned based on a binary and extended-quad-tree structure.
[0037] As shown in FIG. 3A, 3B, 3C, 3D, and 3E, there are five splitting types, quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal extended quad-tree partitioning, and vertical extended quad-tree partitioning.
[0038] FIG. 3 A shows a diagram illustrating block quaternary partition, in accordance with the present disclosure.
[0039] FIG. 3B shows a diagram illustrating block vertical binary partition, in accordance with the present disclosure.
[0040] FIG. 3C shows a diagram illustrating block horizontal binary partition, in accordance with the present disclosure.
[0041] FIG. 3D shows a diagram illustrating block vertical extended quaternary partition, in accordance with the present disclosure.
[0042] FIG. 3E shows a diagram illustrating block horizontal extended quaternary partition, in accordance with the present disclosure.

[0043] In FIG. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or “intra prediction”) uses pixels from the samples of already coded neighboring blocks (which are called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal. Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. Temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs) which indicate the amount and the direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision block in the encoder chooses the best prediction mode, for example, based on the rate-distortion optimization method. The prediction block is then subtracted from the current video block; and the prediction residual is de-correlated using transform and then quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further in-loop filtering, such as deblocking filter, sample adaptive offset (SAO) and adaptive in-loop filter (ALF) may be applied on the reconstructed CU before it is put in the reference picture store and used as reference to code future video blocks. To form the output video bitstream, coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed.
[0044] FIG. 2 shows a general block diagram of a video decoder for the VVC. Specifically, FIG. 2 shows a typical decoder 200 block diagram. Decoder 200 has bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra/inter mode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction related info 234, and video output 232.

[0045] Decoder 200 is similar to the reconstruction-related section residing in the encoder 100 of FIG. 1. In the decoder 200, an incoming video bitstream 210 is first decoded through an Entropy Decoding 212 to derive quantized coefficient levels and prediction-related information. The quantized coefficient levels are then processed through an Inverse Quantization 214 and an Inverse Transform 216 to obtain a reconstructed prediction residual. A block predictor mechanism, implemented in an Intra/inter Mode Selector 220, is configured to perform either an Intra Prediction 222 or a Motion Compensation 224, based on decoded prediction information. A set of unfiltered reconstructed pixels is obtained by summing up the reconstructed prediction residual from the Inverse Transform 216 and a predictive output generated by the block predictor mechanism, using a summer 218.
[0046] The reconstructed block may further go through an In-Loop Filter 228 before it is stored in a Picture Buffer 226, which functions as a reference picture store. The reconstructed video in the Picture Buffer 226 may be sent to drive a display device, as well as used to predict future video blocks. In situations where the In-Loop Filter 228 is turned on, a filtering operation is performed on these reconstructed pixels to derive a final reconstructed Video Output 232.

[0047] FIG. 2 gives a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at the entropy decoding unit. The coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter coded) to form the prediction block. The residual transform coefficients are sent to the inverse quantization unit and inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may further go through in-loop filtering before it is stored in the reference picture store. The reconstructed video in the reference picture store is then sent out for display, as well as used to predict future video blocks.
[0048] In one or more embodiments, one weighted AC prediction (WACP) approach is proposed to enhance the efficiency of motion compensated prediction. The proposed scheme aims at predicting the alternating current (AC) components of one video block from the weighted combination of the AC components from one or more of its temporal reference blocks. Because a better prediction can be achieved, the corresponding overhead of signaling the AC coefficients can be reduced by the proposed WACP scheme. To facilitate the description, in the following, some existing inter coding technologies in the current VVC and AVS3 standards, which are closely related to the proposed method, are briefly overviewed. After that, some shortcomings in the current inter prediction design are analyzed. Finally, the details of the proposed WACP scheme are discussed.
[0049] Weighted Prediction
[0050] Weighted prediction (WP) is a coding tool that is primarily used to compensate for illuminance changes, such as fade-in and fade-out, between a current picture and its temporal reference pictures at the motion compensation stage. The WP was first adopted in the AVC and reused by the HEVC and the VVC. Specifically, when the WP is enabled, a multiplicative weight and an additive offset are signaled for each picture in each of the L0 and L1 reference lists in the slice header. For P slices, the prediction of one current block is generated by weighting the prediction samples obtained from one single reference picture. Specifically, letting P(i, j) denote the original prediction sample (i.e., prior to the WP) at coordinate (i, j), the final prediction sample is calculated as
P'(i, j) = w · P(i, j) + o    (1)

where w and o are the WP weight and offset that are associated with the reference picture of the current block. Similarly, for bi-prediction, the final bi-prediction is calculated as

P'(i, j) = (w0 · P0(i, j) + o0 + w1 · P1(i, j) + o1) / 2    (2)

where (w0, o0) and (w1, o1) are the WP weights and offsets associated with the reference pictures in L0 and L1, respectively. In general, the WP works efficiently for global illumination changes that vary linearly from picture to picture.
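The per-sample operations in equations (1) and (2) can be illustrated with a short sketch. The code below is illustrative only and not part of the disclosure; the floating-point weights, the clipping range, and the 10-bit sample depth are assumptions made for readability, whereas an actual codec would use the fixed-point weight and offset representation signaled in the slice header.

```python
import numpy as np

def wp_uni(pred, w, o, bit_depth=10):
    """Weighted prediction for a uni-predicted block, equation (1): P' = w * P + o."""
    out = w * pred.astype(np.float64) + o
    return np.clip(np.rint(out), 0, (1 << bit_depth) - 1).astype(np.int32)

def wp_bi(pred0, pred1, w0, o0, w1, o1, bit_depth=10):
    """Weighted bi-prediction, equation (2): average of the two weighted/offset predictions."""
    out = (w0 * pred0.astype(np.float64) + o0 + w1 * pred1.astype(np.float64) + o1) / 2.0
    return np.clip(np.rint(out), 0, (1 << bit_depth) - 1).astype(np.int32)

# Example: compensate a global fade between the current picture and its reference.
p0 = np.full((4, 4), 500)          # prediction samples from the L0 reference
print(wp_uni(p0, w=0.8, o=12))     # scales the reference back toward the current picture
```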
[0051] Bi-Prediction With CU-Level Weight

[0052] In the preceding AVC and HEVC standards, when the WP is not applied, the bi-prediction signal is generated by averaging the uni-prediction signals obtained from two reference pictures. In the VVC, one coding tool, namely bi-prediction with CU-level weight (BCW), was introduced to improve the efficiency of bi-prediction. Specifically, instead of simple averaging, the bi-prediction in the BCW is extended by allowing weighted averaging of two prediction signals, as depicted as:
P'(i, j) = ((8 − w) · P0(i, j) + w · P1(i, j) + 4) >> 3    (3)
[0053] In the VVC, when the current picture is one low-delay picture, the weight of one BCW coding block is allowed to be selected from a set of predefined weight values w ∈ {−2, 3, 4, 5, 10}, and the weight of 4 represents the traditional bi-prediction case where the two uni-prediction signals are equally weighted. For non-low-delay pictures, only 3 weights w ∈ {3, 4, 5} are allowed. Generally speaking, though there are some design similarities between the WP and the BCW, the two coding tools target the illumination change problem at different granularities. However, because the interaction between the WP and the BCW could potentially complicate the VVC design, the two tools are not allowed to be enabled simultaneously. Specifically, when the WP is enabled for one slice, the BCW weights for all the bi-prediction CUs in the slice are not signaled and are inferred to be 4 (i.e., the equal weight is applied).
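A minimal sketch of the BCW weighted averaging is given below. It is illustrative only; the (8 − w, w) weighting with a right shift of 3 and a rounding offset of 4 is the commonly described form of equation (3) and should be checked against the normative specification text for an exact implementation.

```python
import numpy as np

BCW_WEIGHTS_LOW_DELAY = (-2, 3, 4, 5, 10)   # candidate weights for low-delay pictures
BCW_WEIGHTS_DEFAULT = (3, 4, 5)              # candidate weights otherwise

def bcw_bi_pred(pred0, pred1, w):
    """Weighted bi-prediction of equation (3): P' = ((8 - w) * P0 + w * P1 + 4) >> 3."""
    p0 = pred0.astype(np.int64)
    p1 = pred1.astype(np.int64)
    return ((8 - w) * p0 + w * p1 + 4) >> 3

p0 = np.array([[100, 120], [140, 160]])
p1 = np.array([[110, 130], [150, 170]])
print(bcw_bi_pred(p0, p1, w=4))   # w = 4 reproduces the conventional average
```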
[0054] Merge Mode With Motion Vector Differences (MMVD)
[0055] In addition to conventional merge mode which derives the motion information of one current block from its spatial/temporal neighbors, the MMVD/UMVE mode is introduced in both the VVC and AVS standards as one special merge mode. Specifically, in both the VVC and AVS3, the mode is signaled by one MMVD flag at coding block level. In the MMVD mode, the first two candidates in the merge list for regular merge mode are selected as the two base merge candidates for MMVD. After one base merge candidate is selected and signaled, additional syntax elements are signaled to indicate the motion vector differences (MVDs) that are added to the motion of the selected merge candidate. The MMVD syntax elements include a merge candidate flag to select the base merge candidate, a distance index to specify the MVD magnitude and a direction index to indicate the MVD direction.
[0056] In the existing MMVD design, the distance index specifies the MVD magnitude, which is defined based on one set of pre-defined offsets from the starting point. As shown in FIG. 6, the offset is added to either the horizontal or the vertical component of the starting MV (i.e., the MV of the selected base merge candidate). Table 1 illustrates the MVD offsets that are applied in the AVS3.
Table 1 The MVD offset used in the AVS3
[0057] As shown in Table 2, the direction index is used to specify the signs of the signaled MVD. It is noted that the meaning of the MVD sign could vary according to the starting MVs. When the starting MV is a uni-prediction MV, or the starting MVs are bi-prediction MVs pointing to two reference pictures whose POCs are both larger than the POC of the current picture or both smaller than the POC of the current picture, the signaled sign is the sign of the MVD added to the starting MV. When the starting MVs are bi-prediction MVs pointing to two reference pictures with one picture’s POC larger than that of the current picture and the other picture’s POC smaller than that of the current picture, the signaled sign is applied to the L0 MVD and the opposite of the signaled sign is applied to the L1 MVD.
Table 2 The MVD sign as specified by the direction index
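The way a distance index and a direction index expand a base merge candidate into the final MV can be sketched as follows. Because the contents of Table 1 and Table 2 are not reproduced in this text, the offset table and the direction mapping below are placeholder values chosen purely for illustration and do not necessarily match the values normatively defined in the AVS3.

```python
# Hypothetical MVD offset table (quarter-pel units) and direction table;
# the normative values are given in Table 1 and Table 2 of this disclosure.
MVD_OFFSETS_QPEL = [1, 2, 4, 8, 16, 32]
MVD_DIRECTIONS = [(+1, 0), (-1, 0), (0, +1), (0, -1)]  # (sign_x, sign_y)

def mmvd_expand(base_mv, distance_idx, direction_idx):
    """Add the signaled MVD to the starting MV of the selected base merge candidate."""
    offset = MVD_OFFSETS_QPEL[distance_idx]
    sign_x, sign_y = MVD_DIRECTIONS[direction_idx]
    mvx, mvy = base_mv
    return (mvx + sign_x * offset, mvy + sign_y * offset)

# Example: base candidate MV (12, -4) in quarter-pel, distance index 2, negative horizontal direction.
print(mmvd_expand((12, -4), distance_idx=2, direction_idx=1))   # -> (8, -4)
```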
[0058] Bidirectional Optical Flow (BIO) and Bi-Directional Optical Flow (BDOF)
[0059] Conventional bi-prediction in video coding is a simple combination of two temporal prediction blocks obtained from the reference pictures. However, due to the trade-off between the signaling cost and the accuracy of motion vectors, the motion vectors received at the decoder end may not be very accurate. As a result, there may still be small remaining motion observable between the two prediction blocks, which could reduce the efficiency of motion compensated prediction. To address this shortcoming, the BIO tool is adopted in both the VVC and AVS3 standards to compensate such motion for every sample inside one block. Specifically, the BIO is a sample-wise motion refinement that is performed on top of the block-based motion-compensated predictions when bi-prediction is used. In the existing BIO design, the derivation of the refined motion vector for each sample in one block is based on the classical optical flow model. Let I(k)(x, y) be the sample value at the coordinate (x, y) of the prediction block derived from the reference picture list k (k = 0, 1), and ∂I(k)(x, y)/∂x and ∂I(k)(x, y)/∂y be the horizontal and vertical gradients of the sample. Assuming the optical flow model is valid, the motion refinement (vx, vy) at (x, y) can be derived by
∂I(k)(x, y)/∂t + vx · ∂I(k)(x, y)/∂x + vy · ∂I(k)(x, y)/∂y = 0    (4)
[0060] With the combination of the optical flow equation (4) and the interpolation of the prediction blocks along the motion trajectory (as shown in FIG. 4), the BIO prediction is obtained as
pred_BIO(x, y) = 1/2 · ( I(0)(x, y) + I(1)(x, y) + vx/2 · (∂I(1)(x, y)/∂x − ∂I(0)(x, y)/∂x) + vy/2 · (∂I(1)(x, y)/∂y − ∂I(0)(x, y)/∂y) )    (5)
[0061] In FIG. 4, (MVx0, MVy0) and (MVx1, MVy1) indicate the block-level motion vectors that are used to generate the two prediction blocks I(0) and I(1).
[0062] Further, the motion refinement (vx, vy) at the sample location (x, y) is calculated by minimizing the difference Δ between the values of the samples after motion refinement compensation (i.e., A and B in FIG. 4), as shown as
Δ(x, y) = I(0)(x, y) − I(1)(x, y) + vx · (∂I(0)(x, y)/∂x + ∂I(1)(x, y)/∂x) + vy · (∂I(0)(x, y)/∂y + ∂I(1)(x, y)/∂y)    (6)
[0063] Additionally, in order to ensure the regularity of the derived motion refinement, it is assumed that the motion refinement is consistent within a local surrounding area centered at (x, y); therefore, in the current BIO design in the AVS3, the values of (vx, vy) are derived by minimizing Δ inside the 4×4 window W around the current sample at (x, y) as
(vx*, vy*) = argmin(vx, vy) Σ(i, j)∈W Δ²(i, j)    (7)
[0064] As shown in equations (4) and (5), in addition to the block-level MC, gradients need to be derived in the BIO for every sample of each motion compensated block (i.e., I(0) and I(1)) in order to derive the local motion refinement and generate the final prediction at that sample location. In the AVS3, the gradients are calculated by a 2D separable finite impulse response (FIR) filtering process, which defines a set of 8-tap filters and applies different filters to derive the horizontal and vertical gradients according to the precision of the block-level motion vectors (e.g., (MVx0, MVy0) and (MVx1, MVy1) in FIG. 4). Table 3 illustrates the coefficients of the gradient filters that are used by the BIO.
Table 3 Gradient filters used in BIO
[0065] Finally, the BIO is only applied to bi-prediction blocks which are predicted by two reference blocks from temporal neighboring pictures. Additionally, the BIO is enabled without sending additional information from encoder to decoder. Specifically, the BIO is applied to all the bi-directional predicted blocks which have both the forward and backward prediction signals.
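Once the per-sample motion refinement (vx, vy) and the horizontal/vertical gradients are available, the final BIO prediction of equation (5) is a simple per-sample combination. The sketch below assumes the gradients and the refinement have already been derived (e.g., by the 8-tap FIR filters of Table 3 and the minimization of equation (7)); it only illustrates the combination step and is not a normative implementation.

```python
import numpy as np

def bio_prediction(i0, i1, gx0, gy0, gx1, gy1, vx, vy):
    """Per-sample BIO prediction of equation (5).

    i0, i1   : motion-compensated predictions I(0), I(1)
    gx*, gy* : horizontal/vertical gradients of each prediction
    vx, vy   : motion refinement derived per sample (or per 4x4 window)
    """
    return 0.5 * (i0 + i1
                  + 0.5 * vx * (gx1 - gx0)
                  + 0.5 * vy * (gy1 - gy0))

# Toy example with constant gradients and a small refinement.
i0 = np.full((4, 4), 100.0)
i1 = np.full((4, 4), 104.0)
gx0 = gy0 = np.full((4, 4), 2.0)
gx1 = gy1 = np.full((4, 4), 2.5)
print(bio_prediction(i0, i1, gx0, gy0, gx1, gy1, vx=0.25, vy=-0.25)[0, 0])
```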
[0066] Affine Mode
[0067] In AVC and HEVC, only the translational motion model is applied for motion compensated prediction, while in the real world there are many kinds of motion, e.g., zoom in/out, rotation, perspective motion, and other irregular motions. In the VVC, affine motion compensated prediction is applied by signaling one flag for each inter coding block to indicate whether the translational motion model or the affine motion model is applied for inter prediction. In the current VVC design, two affine modes, including the 4-parameter affine mode and the 6-parameter affine mode, are supported for one affine coding block.
[0068] The 4-parameter affine model has the following parameters: two parameters for translational movement in the horizontal and vertical directions respectively, one parameter for zoom motion and one parameter for rotation motion shared by both directions. The horizontal zoom parameter is equal to the vertical zoom parameter, and the horizontal rotation parameter is equal to the vertical rotation parameter. To better accommodate the motion vectors and affine parameters, in the VVC, those affine parameters are translated into two MVs (which are also called control point motion vectors (CPMVs)) located at the top-left corner and top-right corner of a current block. As shown in FIGS. 5A and 5B, the affine motion field of the block is described by two control point MVs (V0, V1).
[0069] FIG. 5A shows an illustration of a 4-parameter affine model, in accordance with the present disclosure.
[0070] FIG. 5B shows an illustration of a 4-parameter affine model, in accordance with the present disclosure.
[0071] Based on the control point motion, the motion field (vx, vy) of one affine coded block is described as
vx = ((v1x − v0x) / w) · x − ((v1y − v0y) / w) · y + v0x
vy = ((v1y − v0y) / w) · x + ((v1x − v0x) / w) · y + v0y    (8)
[0072] The 6-parameter affine mode has the following parameters: two parameters for translational movement in the horizontal and vertical directions respectively, one parameter for zoom motion and one parameter for rotation motion in the horizontal direction, and one parameter for zoom motion and one parameter for rotation motion in the vertical direction. The 6-parameter affine motion model is coded with three MVs, i.e., three CPMVs.

[0073] FIG. 6 shows an illustration of a 6-parameter affine model, in accordance with the present disclosure.
[0074] As shown in FIG. 6, the three control points of one 6-parameter affine block are located at the top-left, top-right and bottom-left corners of the block. The motion at the top-left control point is related to the translational motion, the motion at the top-right control point is related to the rotation and zoom motion in the horizontal direction, and the motion at the bottom-left control point is related to the rotation and zoom motion in the vertical direction. Compared to the 4-parameter affine motion model, the rotation and zoom motion of the 6-parameter model in the horizontal direction may not be the same as those in the vertical direction. Assuming (V0, V1, V2) are the MVs of the top-left, top-right and bottom-left corners of the current block in FIG. 6, the motion vector of each sub-block (vx, vy) is derived using the three MVs at the control points as:
vx = ((v1x − v0x) / w) · x + ((v2x − v0x) / h) · y + v0x
vy = ((v1y − v0y) / w) · x + ((v2y − v0y) / h) · y + v0y    (9)
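A sketch of how per-position MVs follow from the CPMVs in equations (8) and (9) is given below. In an actual codec the MVs would be computed once per 4×4 sub-block at its center position and stored in fixed-point precision; the floating-point evaluation here is an assumption made only to keep the illustration compact.

```python
def affine_mv_4param(v0, v1, w, x, y):
    """4-parameter affine model, equation (8); v0, v1 are the top-left / top-right CPMVs."""
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return vx, vy

def affine_mv_6param(v0, v1, v2, w, h, x, y):
    """6-parameter affine model, equation (9); v2 is the bottom-left CPMV."""
    vx = (v1[0] - v0[0]) / w * x + (v2[0] - v0[0]) / h * y + v0[0]
    vy = (v1[1] - v0[1]) / w * x + (v2[1] - v0[1]) / h * y + v0[1]
    return vx, vy

# Example: 16x16 block, MV evaluated at the sub-block center located at (6, 10).
print(affine_mv_4param((1.0, 0.5), (2.0, 0.0), w=16, x=6, y=10))
print(affine_mv_6param((1.0, 0.5), (2.0, 0.0), (1.5, 2.0), w=16, h=16, x=6, y=10))
```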
[0075] Improvements to Video Decoding
[0076] Transform coding is one of the most important compression technologies and is widely used in all mainstream video codecs. It improves the coding efficiency by compacting most of the signal energy into a few low-frequency coefficients and distributing the remaining energy into high-frequency coefficients. Therefore, with quantization applied, the coefficients having the highest energy over the block (i.e., the low-frequency coefficients) are finely quantized and allocated more bits, while the low-energy coefficients (i.e., the high-frequency coefficients) are coarsely quantized and allocated fewer bits. For this reason, in most scenarios (especially low bit-rate applications), the reconstructed video signal is usually dominated by low-frequency information, and some high-frequency information that is present in the original video is missing and/or distorted in the reconstructed video signal.
Given that the reconstructed video signal is used as reference for inter prediction, such distorted high-frequency information could potentially result in severe performance drop for both the current picture and the subsequent pictures that are predicted from the current picture.
[0077] The WP and the BCW are efficient tools to improve the efficiency of motion compensated prediction when there are global or local illumination variations among different pictures. However, such improvement is achieved by estimating brightness variations by a linear model, i.e., one multiplicative weight and one additive offset. In practice, the weight and the offset are usually optimized by minimizing the mean squared error (MSE) between the current block and its prediction block, i.e.,
(w*, o*) = argmin(w, o) Σ(i, j) (S(i, j) − (w · P(i, j) + o))²    (10)

where S(i, j) and P(i, j) represent the samples at coordinate (i, j) in the current block and the prediction block, respectively. Due to the dominant energy of low-frequency information in the reconstructed signal, the WP and the BCW can only compensate for the differences between the low-frequency components (for instance, the direct current (DC) component) of the current block and its reference block, but cannot recover the high-frequency information that may be missing from the reference samples.
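The minimization in equation (10) is an ordinary least-squares fit of one weight and one offset. The following sketch shows how an encoder could estimate (w, o) for one block pair; it illustrates the optimization only and is not a normative encoder procedure.

```python
import numpy as np

def estimate_wp_params(current, prediction):
    """Solve (w*, o*) = argmin sum((S - (w * P + o))^2) in closed form (equation (10))."""
    s = current.astype(np.float64).ravel()
    p = prediction.astype(np.float64).ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)   # design matrix [P, 1]
    (w, o), *_ = np.linalg.lstsq(A, s, rcond=None)
    return w, o

# Example: the reference block is a darkened, offset copy of the current block.
cur = np.arange(64, dtype=np.float64).reshape(8, 8) + 100.0
ref = 0.9 * cur - 5.0
print(estimate_wp_params(cur, ref))   # recovers w ~ 1.11, o ~ 5.56, undoing the fade
```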
[0078] Proposed Methods
[0079] In this disclosure, one weighted AC prediction (WACP) scheme is proposed to improve the prediction efficiency of the AC components at the motion compensation stage. In short, in the proposed method, the AC components of one video block are predicted from the weighted combination of the AC components from one or more of its temporal reference blocks. Because a better AC prediction can be achieved, the signaling overhead of the AC coefficients is expected to be reduced, and therefore the overall motion compensation efficiency is expected to improve when the WACP scheme is applied.
[0080] Generalized Weighted AC Prediction
[0081] Conceptually, the idea of the WACP can be regarded as one extension of the well-known multi-hypothesis prediction, in which the value of the AC component at each sample of the current block is estimated based on the linear combination of the AC components of the collocated samples from multiple motion compensated prediction blocks. Specifically, the proposed WACP can be formulated as follows:
P_WACP(i, j) = P_DC(i, j) + Σ_{k=0}^{N−1} w_k · P_k^AC(i, j)    (11)
where P_DC(i, j) is the average (i.e., the DC component) at coordinate (i, j) of the multiple prediction blocks; P_k^AC(i, j) is the AC component at the coordinate (i, j) of the k-th prediction block; w_k represents the weight that is applied to the AC component of the k-th prediction block; and N is the total number of hypotheses that are applied. The values of P_DC(i, j) and P_k^AC(i, j) can be further calculated as:
P_DC(i, j) = (1 / N) · Σ_{k=0}^{N−1} P_k(i, j),    P_k^AC(i, j) = P_k(i, j) − P_DC(i, j)    (12)
[0082] In equation (12), P_k(i, j) denotes the sample at coordinate (i, j) in the k-th prediction block.
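Equations (11) and (12) can be summarized by the following sketch, which forms the DC (average) signal of the N hypotheses, removes it from each hypothesis to obtain the per-hypothesis AC signals, and recombines them with per-hypothesis weights. It is a floating-point illustration of the concept only; an actual implementation would operate on integer samples with the fixed-point weights discussed later.

```python
import numpy as np

def wacp_prediction(preds, weights):
    """Weighted AC prediction of equations (11)-(12).

    preds   : list of N motion-compensated prediction blocks P_k (same shape)
    weights : list of N weights w_k applied to the AC component of each block
    """
    preds = [p.astype(np.float64) for p in preds]
    p_dc = sum(preds) / len(preds)                    # equation (12): DC = average of hypotheses
    p_ac = [p - p_dc for p in preds]                  # equation (12): AC_k = P_k - DC
    return p_dc + sum(w * ac for w, ac in zip(weights, p_ac))   # equation (11)

p0 = np.array([[100.0, 120.0], [140.0, 160.0]])
p1 = np.array([[108.0, 112.0], [150.0, 150.0]])
print(wacp_prediction([p0, p1], [0.125, -0.125]))    # boosts the AC detail of P0 relative to P1
```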
[0083] Similar to multi-hypothesis prediction, one essential design issue of the proposed WACP scheme is how to balance the prediction efficiency gain of using more hypotheses against the overhead required to signal multiple weights. Here, more hypothesis candidates imply a more accurate AC prediction, which, however, requires more bits to code the weight values. Sometimes, the required overhead may outweigh the prediction accuracy benefit. In one or more embodiments, it is proposed to signal the number of the hypothesis prediction signals applied in the WACP scheme and let the encoder adaptively choose the optimal number for the best rate-distortion (R-D) performance. The number of the applied hypotheses for the WACP may be signaled at various coding levels, e.g., sequence level, picture level, tile/slice level, coding block level and so forth, to provide different tradeoffs between coding efficiency and hardware/software implementation cost. In some embodiments, it is proposed to use one fixed number of hypothesis prediction blocks when the proposed WACP scheme is applied. Without loss of generality, N = 2 will be used as an example to explain the proposed WACP method.

[0084] FIG. 8 shows a method for video decoding in weighted alternating current prediction (WACP), in accordance with the present disclosure.
[0085] In step 810, the decoder may obtain a plurality of inter prediction blocks from a number of temporal reference pictures associated with a video block.
[0086] In step 812, the decoder may obtain a low-frequency signal based on the plurality of inter prediction blocks.
[0087] In step 814, the decoder may obtain a plurality of high-frequency signals based on the plurality of inter prediction blocks. At least one of the plurality of high-frequency signals is associated with one prediction block. In some embodiments, the decoder may obtain a plurality of high-frequency signals, where each high-frequency signal is associated with one prediction block.
[0088] In step 816, the decoder may determine at least one weight associated with the high-frequency signal of at least one of the inter prediction blocks. In some embodiments, the decoder may determine a plurality of weights associated with the high-frequency signal of each inter prediction block.
[0089] In step 818, the decoder may calculate a final prediction signal of the video block based on a weighted sum of the low-frequency signal and the plurality of high-frequency signals using the at least one weight.
[0090] FIG. 9 shows a method for video decoding in WACP, in accordance with the present disclosure.
[0091] In step 910, the decoder may obtain a combined high-frequency signal based on a weighted sum of the plurality of high-frequency signals. At least one of the plurality of high-frequency signals is weighted by a corresponding weight associated with the at least one of the plurality of high-frequency signals.
[0092] In step 912, the decoder may calculate the final prediction signal of the video block as a sum of the low-frequency signal and the combined high-frequency signal of the video block.
[0093] Bi-Directional Weighted AC Prediction

[0094] Bi-directional weighted AC prediction (BD-WACP) is a special case of the generalized WACP where the number of motion compensated prediction blocks that are used is limited to 2, i.e., N = 2. Therefore, based on equation (11), the bi-prediction sample at coordinate (i, j) can be calculated by
P_WACP(i, j) = P_DC(i, j) + w_0 · P_0^AC(i, j) + w_1 · P_1^AC(i, j)    (13)
where w_0 and w_1 are the weights associated with the AC samples of the prediction signals P_0 and P_1. As shown in equation (13), when w_0 is equal to w_1, the proposed WACP degrades to the traditional bi-prediction.
[0095] Assuming the BD-WACP can be adaptively switched at the coding block level, according to equation (13), two different weights need to be signaled for each bi-prediction block, which is costly considering the coding bits this may produce. To reduce the signaling overhead, one additional constraint can be applied to enforce the summation of the two weights to be a constant such as zero, i.e., w_0 + w_1 = 0, such that only one weight needs to be explicitly signaled. A weight for a high-frequency signal can be identified, for example, by subtracting the weights of all other high-frequency signals from one. In another example, a weight for a high-frequency signal can be identified by subtracting the weights of all other high-frequency signals from zero.

[0096] As shown in equation (13), when the BD-WACP is applied, only one single weight w needs to be signaled in the bitstream. However, the weight in equation (13) is assumed to be a floating-point number, which needs to be quantized before transmission. As the errors resulting from the quantization may significantly degrade the WACP performance, it is important to carefully choose the allowed WACP weights. In one specific example, three weights w ∈ {−1/8, 0, 1/8} are proposed to be used for the BD-WACP. In another specific example, five weights w ∈ {−6/8, −1/8, 0, 1/8, 6/8} are proposed to be used for the BD-WACP. When either of the two methods is applied, the corresponding absolute weight value can be represented by 3 bits. Therefore, equation (13) can be rewritten using the integer weights as
P_WACP(i, j) = P_DC(i, j) + (w_int · (P_0(i, j) − P_1(i, j))) >> 3    (14)
where w_int is the integer weight value, which is allowed to be selected from {−1, 0, 1} in the first example and from {−6, −1, 0, 1, 6} in the second example. In another example, the integer weight values may be allowed to be selected from {5, 0, 3} and the fixed number of bits for the right shift operation is set to 3. In another embodiment, instead of using a fixed set of WACP weights, it is proposed to directly signal the allowed weights in the bitstream (e.g., in the sequence parameter set, picture parameter set, slice header and so forth). Such a method gives the encoder more flexibility to select the desirable WACP weights according to the specific characteristics of the current sequence/picture/slice on the fly.
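A fixed-point sketch of the bi-directional case under the constraint w_0 + w_1 = 0 is shown below. It assumes weights expressed in units of 1/8 (so that a right shift by 3 replaces the division) and a simple rounding offset; the exact rounding and clipping behavior of equation (14) follows the specification and is only approximated here.

```python
import numpy as np

WACP_INT_WEIGHTS_3 = (-1, 0, 1)          # example three-weight set (in units of 1/8)
WACP_INT_WEIGHTS_5 = (-6, -1, 0, 1, 6)   # example five-weight set (in units of 1/8)

def bd_wacp_fixed_point(pred0, pred1, w_int, shift=3):
    """Bi-directional WACP with integer weights, approximating equation (14).

    The DC part is the normal average (P0 + P1) / 2; the AC correction
    w * (P0 - P1) is realized as (w_int * (P0 - P1)) >> shift with w = w_int / 8.
    """
    p0 = pred0.astype(np.int64)
    p1 = pred1.astype(np.int64)
    dc = (p0 + p1 + 1) >> 1                               # rounded average
    ac = (w_int * (p0 - p1) + (1 << (shift - 1))) >> shift
    return dc + ac

p0 = np.array([[100, 130], [90, 200]])
p1 = np.array([[104, 120], [96, 180]])
print(bd_wacp_fixed_point(p0, p1, w_int=0))   # w_int = 0 falls back to conventional bi-prediction
print(bd_wacp_fixed_point(p0, p1, w_int=1))   # emphasizes the L0 high-frequency detail
```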
[0097] Inherited BD-WACP mode
[0098] In the above methods, the selected weight of the BD-WACP mode is explicitly signaled in the bitstream if one coding block is bi-predicted. However, as discussed in the introduction section, merge mode is supported in both the VVC and AVS3, where the motion information of one coding block is not signaled but derived from one of a set of spatial/temporal merge candidates. To reduce the signaling overhead of the BD-WACP weights, methods are proposed in this section to apply the BD-WACP to merge modes. Firstly, in addition to the motion information (i.e., reference picture indices and motion vectors), it is proposed to store the associated WACP weight for each bi-predicted block. In this way, the BD-WACP weight can be inherited from block to block without signaling. In the existing VVC and AVS3 designs, there are multiple types of merge modes, including regular merge mode, inherited affine merge mode, and constructed affine merge mode.
[0099] Firstly, when the current coding block is coded with regular merge mode or inherited affine merge mode, the corresponding WACP weight can be directly copied from the weight of the selected merge candidate (as indicated by the signaled merge index). FIG. 7 shows one example to explain the inheritance scheme proposed for the WACP mode. In FIG. 7, the spatial merge candidate B2, which is coded by the BD-WACP mode with a weight value of 1, is selected as the merge candidate of the current coding block. In this case, both the BD-WACP weight and the motion information of B2 are inherited to generate the bi-prediction signal of the current block.
[00100] Different from regular merge mode and inherited affine merge mode, the motion information of constructed affine merge mode is generated from the motion information of multiple neighboring blocks. Different methods may be applied to generate the BD-WACP weight for one constructed affine merge block. In the first method, it is proposed to always disable the BD-WACP mode (i.e., forcibly setting the BD-WACP weight w to 0) when the current block is coded by the constructed affine merge mode. In the second method, it is proposed to set the BD-WACP weight of one constructed affine merge block to be equal to the BD-WACP weight of the block that generates the first control-point motion vector (i.e., at the top-left corner of the current block). In the third method, it is proposed to set the BD-WACP weight of one constructed affine merge block to be equal to the BD-WACP weight that is most used by the neighboring blocks. Additionally, when there are not enough neighboring blocks that are coded by the BD-WACP mode, the BD-WACP weight of the current block is set to 0 (i.e., the BD-WACP is disabled).
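The inheritance rules above can be summarized by a small sketch in which the BD-WACP weight is stored alongside the motion information of every bi-predicted block and is propagated through the merge list. The data layout and the fallback rule (the majority vote for constructed affine merge) are simplified illustrations of the described methods, not a normative derivation process.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MotionInfo:
    mv_l0: tuple
    mv_l1: tuple
    ref_idx_l0: int
    ref_idx_l1: int
    wacp_weight: int = 0         # stored together with the motion data

def inherit_regular_merge(candidates: List[MotionInfo], merge_idx: int) -> MotionInfo:
    """Regular / inherited affine merge: copy both the motion and the BD-WACP weight."""
    return candidates[merge_idx]

def weight_for_constructed_affine(neighbors: List[Optional[MotionInfo]]) -> int:
    """Third method for constructed affine merge: take the most used neighboring weight,
    falling back to 0 (BD-WACP disabled) when no neighbor uses BD-WACP."""
    weights = [n.wacp_weight for n in neighbors if n is not None and n.wacp_weight != 0]
    return Counter(weights).most_common(1)[0][0] if weights else 0

b2 = MotionInfo(mv_l0=(4, 0), mv_l1=(-4, 0), ref_idx_l0=0, ref_idx_l1=0, wacp_weight=1)
print(inherit_regular_merge([b2], merge_idx=0).wacp_weight)   # weight 1 inherited, as in FIG. 7
```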
[00101] Harmonization of the BD-WACP With Other Inter Coding Techniques

[00102] Harmonization Between the BD-WACP and the WP: Conceptually, the BD-WACP and the WP are two coding tools with different flavors: the BD-WACP targets compensating the high-frequency information that is missing in the reference pictures, while the WP concentrates on compensating the illumination variations (i.e., low-frequency information) between the current picture and the reference pictures. Therefore, there are no obvious conflicts that prevent the two coding tools from being used jointly. Specifically, when the WP is turned on, the WP parameters (i.e., weight and offset) are signaled at the picture/slice level. At the coding block level, one additional BD-WACP weight can be signaled when the current block is bi-predicted. Therefore, as one embodiment of the disclosure, it is proposed to apply the BD-WACP and the WP together. In detail, in this method, the WP is firstly applied to adjust the illumination magnitudes of the prediction blocks, which are then combined by the WACP to generate the final bi-prediction. Assuming
(w_0, o_0) and (w_1, o_1) are the WP weights and offsets associated with the L0 and L1 reference pictures, respectively, and w_BD-WACP is the BD-WACP weight. When the proposed method is applied, the bi-prediction is generated as
P_0^WP = w_0 · P_0 + o_0,    P_1^WP = w_1 · P_1 + o_1
P_WACP = (P_0^WP + P_1^WP) / 2 + w_BD-WACP · (P_0^WP − P_1^WP)    (15)
[00103] Note that in equation (15) the coordinate (i, j) is omitted to facilitate the presentation. Additionally, for ease of description, all the weight and offset values are assumed to be floating-point. In practice, the parameter discretization method as depicted in equation (14) can be readily applied to implement equation (15) with fixed-point arithmetic. In another embodiment, it is proposed to always disable the BD-WACP mode for one bi-predicted coding block when the WP is enabled for the picture/slice that the coding block belongs to. In this case, the BD-WACP weight does not need to be signaled but is always inferred to be 0.
[00104] Harmonization between the BD-WACP and the BCW: Similar to the WP, the BD-WACP can also be seamlessly combined with the BCW mode, because the two modes aim at improving different components of the motion compensated prediction signal. Therefore, in one or more embodiments, it is proposed to jointly apply the BD-WACP and the BCW at the same time for one bi-predicted coding block. Specifically, in this method, the BCW is firstly applied to adjust the local illumination magnitudes of the prediction blocks, which are then combined by the WACP to generate the final bi-prediction. Assuming w_BCW is the BCW weight being applied, the bi-prediction is generated as
P_WACP = ((8 − w_BCW) · P_0 + w_BCW · P_1) / 8 + w_BD-WACP · (P_0 − P_1)    (16)
[00105] In some other embodiments, it is proposed to always disable the BD-WACP mode for one bi-predicted coding block when the BCW is enabled for the block. In this case, the BD-WACP weight does not need to be signaled but is always inferred to be 0.
[00106] Harmonization Between the BD-WACP and BDOF: The BD-WACP can also be freely combined with the BDOF. More specifically, when the two tools are combined, the original prediction signals P_0 and P_1 are still applied to estimate the sample-wise refinement Δ_BDOF, as depicted in the “bi-directional optical flow” (BDOF) section, which is then added to the bi-prediction signal enhanced by the BD-WACP, as shown as
P(i, j) = P_WACP(i, j) + Δ_BDOF(i, j)    (17)
[00107] In some embodiments, it is proposed to always disable the BDOF when the BD-WACP is applied to one bi-predicted coding block.
[00108] BD-WACP weight signaling
[00109] As shown above, for the explicit mode, one BD-WACP weight needs to be signaled from the encoder to the decoder to reconstruct the bi-prediction signal of one BD-WACP coding block. To save the overhead of signaling those weight values, variable-length code-words should be designed to accommodate the specific distribution of the weight values of the BD-WACP mode. In general, the BD-WACP weight 0 (i.e., default bi-prediction) is considered to be the most frequently selected weight and should be assigned the shortest code-word. The weight values with larger absolute values are selected less often due to the relatively large modifications they make to the AC components in the reference blocks. Therefore, they should be assigned longer code-words. Based on this spirit, Table 4 and Table 5 show two BD-WACP weight binarization methods when three weights {−1, 0, 1} or five weights {−6, −1, 0, 1, 6} are applied to the BD-WACP mode.
Table 4 Binarization of three BD-WACP weights
Table 5 Binarization of five BD-WACP weights
[00110] In practice, other binarization methods may also be applied. For instance, the digits 0 and 1 in Table 4 and Table 5 can be switched based on the same design spirit.
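One plausible variable-length binarization following the principle above (the shortest codeword for weight 0 and longer codewords for larger magnitudes) is sketched below. The concrete codewords of Table 4 and Table 5 are not reproduced in this text, so the mapping used here is purely illustrative and does not claim to match those tables.

```python
def binarize_weight(w_int, allowed=(-6, -1, 0, 1, 6)):
    """Illustrative truncated-unary-plus-sign binarization of a BD-WACP weight.

    '0'                    -> weight 0 (most frequent, shortest codeword)
    '1' + sign + level bin -> non-zero weights, longer for larger magnitudes.
    """
    if w_int == 0:
        return "0"
    sign_bit = "0" if w_int > 0 else "1"
    magnitudes = sorted({abs(w) for w in allowed if w != 0})
    level = magnitudes.index(abs(w_int))   # 0 for the smaller magnitude, 1 for the larger
    suffix = "1" * level + "0" if len(magnitudes) > 1 else ""
    return "1" + sign_bit + suffix

for w in (-6, -1, 0, 1, 6):
    print(w, binarize_weight(w))
```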
[00111] FIG. 10 shows a computing environment 1010 coupled with a user interface 1060. The computing environment 1010 can be part of a data processing server. The computing environment 1010 includes processor 1020, memory 1040, and I/O interface 1050.
[00112] The processor 1020 typically controls overall operations of the computing environment 1010, such as the operations associated with the display, data acquisition, data communications, and image processing. The processor 1020 may include one or more processors to execute instructions to perform all or some of the steps in the above-described methods. Moreover, the processor 1020 may include one or more modules that facilitate the interaction between the processor 1020 and other components. The processor may be a Central Processing Unit (CPU), a microprocessor, a single chip machine, a GPU, or the like.
[00113] The memory 1040 is configured to store various types of data to support the operation of the computing environment 1010. Memory 1040 may include predetermined software 1042. Examples of such data comprise instructions for any applications or methods operated on the computing environment 1010, video datasets, image data, etc. The memory 1040 may be implemented by using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk.
[00114] The I/O interface 1050 provides an interface between the processor 1020 and peripheral interface modules, such as a keyboard, a click wheel, buttons, and the like. The buttons may include but are not limited to, a home button, a start scan button, and a stop scan button. The I/O interface 1050 can be coupled with an encoder and decoder.
[00115] In some embodiments, there is also provided a non-transitory computer-readable storage medium comprising a plurality of programs, such as comprised in the memory 1040, executable by the processor 1020 in the computing environment 1010, for performing the above-described methods. For example, the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device or the like.
[00116] The non-transitory computer-readable storage medium has stored therein a plurality of programs for execution by a computing device having one or more processors, where the plurality of programs when executed by the one or more processors, cause the computing device to perform the above-described method for motion prediction.
[00117] In some embodiments, the computing environment 1010 may be implemented with one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field- programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, micro controllers, microprocessors, or other electronic components, for performing the above methods.
[00118] The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the present disclosure. Many modifications, variations, and alternative implementations will be apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
[00119] The examples were chosen and described in order to explain the principles of the disclosure and to enable others skilled in the art to understand the disclosure for various implementations and to best utilize the underlying principles and various implementations with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the implementations disclosed and that modifications and other implementations are intended to be included within the scope of the present disclosure.

Claims

CLAIMS What is claimed is:
1. A method for video decoding in weighted alternating current prediction (WACP), comprising: obtaining a plurality of inter prediction blocks from a number of temporal reference pictures associated with a video block; obtaining a low-frequency signal based on the plurality of inter prediction blocks; obtaining a plurality of high-frequency signals based on the plurality of inter prediction blocks, wherein at least one of the plurality of high-frequency signals is associated with one prediction block; determining at least one weight associated with the high-frequency signal of at least one of the inter prediction blocks; and calculating a final prediction signal of the video block based on a weighted sum of the low-frequency signal and the plurality of high-frequency signals using the at least one weight.
2. The method of claim 1, wherein calculating the final prediction signal of the video block comprises: obtaining a combined high-frequency signal based on a weighted sum of the plurality of high-frequency signals, wherein at least one of the plurality of high-frequency signals is weighted by a corresponding weight associated with the at least one of the plurality of high-frequency signals; and calculating the final prediction signal of the video block as a sum of the low-frequency signal and the combined high-frequency signal of the video block.
3. The method of claim 1, wherein the low-frequency signal comprises a direct current (DC) component of the plurality of inter prediction blocks; and wherein at least one of the high-frequency signals comprises an alternating current (AC) component of a corresponding inter prediction block of the plurality of inter prediction blocks.
4. The method of claim 1, further comprising: receiving, from a bitstream, a total number of inter prediction blocks used for calculating the final prediction signal of the video block.
5. The method of claim 1, wherein determining the at least one weight associated with the high-frequency signal of the at least one of the prediction blocks comprises: identifying, when a current block is coded with merge mode, a candidate block from a plurality of merge candidate blocks; and determining a plurality of weights based on the candidate block.
6. The method of claim 1, wherein determining the at least one weight associated with the high-frequency signal of the at least one of the prediction blocks comprises: receiving, when the video block is not coded with merge mode, an index identifying the plurality of weights from a predefined set of weights for the block; and determining a plurality of weights based on the index.
7. The method of claim 6, further comprising: identifying a weight of a high-frequency signal by subtracting weights of all other high-frequency signals from one.
8. The method of claim 6, further comprising: identifying a weight of a high-frequency signal by subtracting weights of all other high-frequency signals from zero.
9. The method of claim 7, further comprising: quantizing the plurality of weights in the predefined set as one integer value right shifted by a fixed number.
10. The method of claim 9, wherein the predefined set of weights comprise {5, 0, 3} and the fixed number is set to 3.
11. The method of claim 1, wherein a total number of inter prediction blocks of a current block is equal to 2.
12. The method of claim 11, further comprising: obtaining, when the video block is coded with bi-directional optical flow (BDOF), sample refinements based on samples of the plurality of the inter prediction blocks; and obtaining the final prediction signal based on the low-frequency signal, the plurality of high-frequency signals, and the sample refinements.
13. The method of claim 12, wherein obtaining the final prediction signal based on the low-frequency signal, the plurality of high-frequency signals, and the sample refinements comprises: obtaining a combined high-frequency signal based on a weighted sum of the plurality of high-frequency signals, wherein at least one of the plurality of high-frequency signals is weighted by a corresponding weight associated with the at least one of the plurality of high-frequency signals; and calculating the final prediction signal of the video block as a sum of the low-frequency signal, the combined high-frequency signal of the video block, and the sample refinements.
14. A computing device, comprising: one or more processors; and a non-transitory computer-readable storage medium storing instructions executable by the one or more processors, wherein the one or more processors are configured to perform the method in any of claims 1-13.
15. A non-transitory computer-readable storage medium storing a plurality of programs for execution by a computing device having one or more processors, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform the method in any of claims 1-13.
PCT/US2021/043335 2020-07-27 2021-07-27 Weighted ac prediction for video coding WO2022026480A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180059215.9A CN116158079A (en) 2020-07-27 2021-07-27 Weighted AC prediction for video codec

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063057290P 2020-07-27 2020-07-27
US63/057,290 2020-07-27

Publications (1)

Publication Number Publication Date
WO2022026480A1 true WO2022026480A1 (en) 2022-02-03

Family

ID=80036689

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/043335 WO2022026480A1 (en) 2020-07-27 2021-07-27 Weighted ac prediction for video coding

Country Status (2)

Country Link
CN (1) CN116158079A (en)
WO (1) WO2022026480A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080240238A1 (en) * 2007-03-28 2008-10-02 Tomonobu Yoshino Intra prediction system of video encoder and video decoder
US7440629B2 (en) * 2003-12-19 2008-10-21 Matsushita Electric Industrial Co., Ltd. Image encoding apparatus and image encoding method
US8005143B2 (en) * 1997-10-23 2011-08-23 Mitsubishi Denki Kabushiki Kaisha Imaging decoding apparatus
US20140355674A1 (en) * 2011-06-22 2014-12-04 Blackberry Limited Compressing Image Data
WO2020150080A1 (en) * 2019-01-14 2020-07-23 Interdigital Vc Holdings, Inc. Method and apparatus for video encoding and decoding with bi-directional optical flow adapted to weighted prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8005143B2 (en) * 1997-10-23 2011-08-23 Mitsubishi Denki Kabushiki Kaisha Imaging decoding apparatus
US7440629B2 (en) * 2003-12-19 2008-10-21 Matsushita Electric Industrial Co., Ltd. Image encoding apparatus and image encoding method
US20080240238A1 (en) * 2007-03-28 2008-10-02 Tomonobu Yoshino Intra prediction system of video encoder and video decoder
US20140355674A1 (en) * 2011-06-22 2014-12-04 Blackberry Limited Compressing Image Data
WO2020150080A1 (en) * 2019-01-14 2020-07-23 Interdigital Vc Holdings, Inc. Method and apparatus for video encoding and decoding with bi-directional optical flow adapted to weighted prediction

Also Published As

Publication number Publication date
CN116158079A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
US11343541B2 (en) Signaling for illumination compensation
US11297348B2 (en) Implicit transform settings for coding a block of pixels
CN113678452A (en) Constraint on decoder-side motion vector refinement
Gao et al. An overview of AVS2 standard
CN111373749A (en) Method and apparatus for low complexity bi-directional intra prediction in video encoding and decoding
WO2021188598A1 (en) Methods and devices for affine motion-compensated prediction refinement
CN114128263A (en) Method and apparatus for adaptive motion vector resolution in video coding and decoding
CN117813816A (en) Method and apparatus for decoder-side intra mode derivation
EP4320863A1 (en) Geometric partition mode with explicit motion signaling
WO2022081878A1 (en) Methods and apparatuses for affine motion-compensated prediction refinement
WO2022032028A1 (en) Methods and apparatuses for affine motion-compensated prediction refinement
JP2023523839A (en) Entropy coding for motion accuracy syntax
WO2022026480A1 (en) Weighted ac prediction for video coding
CN114342390B (en) Method and apparatus for prediction refinement for affine motion compensation
WO2021188707A1 (en) Methods and apparatuses for simplification of bidirectional optical flow and decoder side motion vector refinement
WO2023192335A1 (en) Methods and devices for candidate derivation for affine merge mode in video coding
WO2023220444A1 (en) Methods and devices for candidate derivation for affine merge mode in video coding
WO2024010831A1 (en) Methods and devices for candidate derivation for affine merge mode in video coding
WO2023205185A1 (en) Methods and devices for candidate derivation for affine merge mode in video coding
WO2023205283A1 (en) Methods and devices for enhanced local illumination compensation
WO2024006231A1 (en) Methods and apparatus on chroma motion compensation using adaptive cross-component filtering
WO2023158766A1 (en) Methods and devices for candidate derivation for affine merge mode in video coding
WO2024044404A1 (en) Methods and devices using intra block copy for video coding
JP2024523534A (en) Geometric partition mode with motion vector refinement
WO2024010832A1 (en) Methods and apparatus on chroma motion compensation using adaptive cross-component filtering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21849703

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21849703

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/05/2023)