WO2024016106A1 - Low-complexity enhancement video coding using multiple reference frames - Google Patents

Low-complexity enhancement video coding using multiple reference frames

Info

Publication number
WO2024016106A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
temporal
layer
video
video frame
Prior art date
Application number
PCT/CN2022/106236
Other languages
French (fr)
Inventor
Renzhi JIANG
Xiaomin Chen
Jing Li
Yi Wang
Hua Zhang
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to PCT/CN2022/106236 priority Critical patent/WO2024016106A1/en
Publication of WO2024016106A1 publication Critical patent/WO2024016106A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/31 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Definitions

  • This disclosure generally relates to systems and methods for video coding and, more particularly, to Low-Complexity Enhancement Video Coding.
  • LCEVC: Low-Complexity Enhancement Video Coding
  • FIG. 1 is an example block diagram of a Low-Complexity Enhancement Video Coding decoder, according to some example embodiments of the present disclosure.
  • FIG. 2 illustrates example video decoding sequences using pyramid B-frames, according to some example embodiments of the present disclosure.
  • FIG. 3 is an example global configuration syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • FIG. 4 is an example picture configuration syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • FIG. 5 is an example temporal prediction type indicator of the syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • FIG. 6 is an example block diagram of a Low-Complexity Enhancement Video Coding encoder, according to some example embodiments of the present disclosure.
  • FIG. 7 illustrates a flow diagram of an illustrative process for Low-Complexity Enhancement Video Coding using multiple reference frames, in accordance with one or more example embodiments of the present disclosure.
  • FIG. 8 illustrates an embodiment of an exemplary system, in accordance with one or more example embodiments of the present disclosure.
  • The base codec may be, for example, AVC, HEVC, VP9, AV1, etc.
  • LCEVC is an enhancement codec, meaning that it not only up-samples well, but also encodes the residual information necessary for true fidelity to the source video and compresses that information (e.g., by transforming, quantizing, and coding it).
  • LCEVC residual layers encode the residual information necessary for true fidelity to the source video and compress that information (e.g., by transforming, quantizing, and coding it).
  • LCEVC can be used in higher resolutions or higher frame rates (e.g., 4Kp60, 8K, 12K) of video encoding, adaptive video streaming, and the like.
  • LCEVC supports a temporal layer in the layer-two (L-2) enhancement layer to improve the BD-rate (Bjontegaard rate difference). The temporal layer provides a temporal mask that indicates INTRA_PRED or INTER_PRED information per transform block (e.g., whether the block was coded using intra or inter prediction).
  • The transform block could be either 2x2 or 4x4, for example. If the temporal mask of a block is INTER_PRED, only the residual delta (e.g., the residual of the current L-2 enhancement layer minus the reconstructed residual of the previous L-2 enhancement layer) is encoded.
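  • To make the per-block behavior concrete, the following sketch illustrates how a temporal mask could gate whether a full residual or only a residual delta is encoded for each transform block. This is not the normative LCEVC process; the 4x4 block size, the array shapes, and the INTRA_PRED/INTER_PRED constants are assumptions for illustration only.

      import numpy as np

      INTRA_PRED, INTER_PRED = 0, 1  # assumed constants for this sketch

      def residuals_to_encode(curr_residual, prev_recon_residual, temporal_mask, bs=4):
          """Values actually encoded for each bs x bs block of the L-2 layer."""
          out = np.empty_like(curr_residual)
          h, w = curr_residual.shape
          for y in range(0, h, bs):
              for x in range(0, w, bs):
                  block = (slice(y, y + bs), slice(x, x + bs))
                  if temporal_mask[y // bs, x // bs] == INTER_PRED:
                      # Inter block: only the residual delta is encoded.
                      out[block] = curr_residual[block] - prev_recon_residual[block]
                  else:
                      # Intra block: the full residual is encoded.
                      out[block] = curr_residual[block]
          return out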
  • the current temporal layer implementation in the LCEVC standard only supports one reference frame at a time (e.g., a single reference frame in a reference frame buffer may be considered in the encoding or decoding of a given frame) .
  • Reference frames are used in video coding and decoding to reduce the size of data to be encoded and decoded by identifying matching blocks of pixels (e.g., blocks of pixels from a reference frame that match blocks of pixels in a frame being encoded or decoded). While some coding standards allow for multiple reference frames to be considered for the encoding and decoding of one frame, LCEVC does not, and allowing LCEVC to support multiple reference frames requires modification to the syntax of an LCEVC bitstream.
  • the reference frame buffer stores the previous L-2 reconstructed residuals in an encoded sequence as the reference frame (e.g., previous L-2 residual reconstructed picture) .
  • the default reference mechanism may not represent the best selection.
  • One reference frame may not be enough for a video encoder to achieve a better video quality and compression ratio.
  • One proposal is to apply the reference relationship of the base layer encoding to the residual layer.
  • it is necessary to extend the current LCEVC syntax to support multiple reference frames. In the current LCEVC standard, there is no mechanism in the bitstream syntax to extend the reference frame capability from one to multiple reference frames at a given time.
  • the enhancements presented herein to extend the LCEVC standard solve such a problem by providing a mechanism to select from multiple reference frames rather than being limited to only the one reference frame that is previous in encoding order.
  • the enhancements presented herein do not require any extra workload on the decoder’s side.
  • an LCEVC encoder and decoder may define a new syntax in a payload global configuration and payload picture configuration as an extension of the LCEVC bitstream to support the use of multiple reference frames at the temporal layer of the decoder.
  • the new syntax introduces a method to maintain the reference frame buffer of the decoder.
  • a refresh flag may be used to indicate whether the current frame is used as a reference frame and, if so, to indicate which stored reference frame is to be refreshed with the current frame.
  • a reference frame index may be used to indicate which stored reference frame is used for a current frame temporal prediction.
  • the enhancements herein make LCEVC encoding in the L-2 enhancement layer more efficient when the temporal layer is enabled because the temporal layer may leverage the frame reference relationship of the base layer.
  • MPEG-5 Part 2 LCEVC is a new video coding standard that was published in 2020. It can leverage existing and future codecs to enhance their performance whilst reducing their computational complexity.
  • the new syntax structure for multi-reference temporal prediction represents an extension of the current encode process, and previously encoded bitstreams can still be correctly decoded.
  • the present disclosure allows the producer (e.g., encoder) a choice to use more compute (CPU) cycles to achieve better quality without requiring extra workload on the consumer (e.g., decoder) side.
  • the proposed new syntax may change some bit values in the payload global configuration and payload picture configuration.
  • the enhancements of the present disclosure may be detected by parsing the encoded enhancement layer bitstream.
  • a new syntax is proposed.
  • the new syntax is not the replacement of the existing syntax, but rather serves as an extension to the existing syntax.
  • a new flag ‘temporal_multi_ref_flag’ may be added to the syntax to indicate whether multiple reference frames for temporal prediction are enabled. The newly added flag reuses a reserved syntax bit, so it does not affect the current decoding process.
  • two bytes of data may be added at the end of a picture configuration payload when the ‘temporal_multi_ref_flag’ is enabled.
  • the newly inserted syntax may include ‘refresh_frame_flags’ and ‘ref_frame_idx’.
  • the MAX_REF_FRAMES indicator may indicate the maximum number of reference frames that may be used by the temporal layer.
  • the refresh_frame_flags indicator may be used to indicate whether the current frame will be used as a reference frame; if bit [i] is not zero, reference frame buffer [i] may be updated after the current frame is decoded.
  • the ref_frame_idx [i] may be used to indicate which reference frame in the reference frame buffer is used for temporal prediction.
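  • As a rough illustration of how a decoder might read these fields, the following sketch parses the two bytes appended to the picture configuration payload when multiple reference frames are enabled. The bit widths and ordering of refresh_frame_flags, the reserved bits, and ref_frame_idx shown here are assumptions for illustration, not the normative layout.

      import struct

      def parse_multi_ref_extension(payload: bytes, offset: int = 0):
          # Assumed layout: byte 0 carries refresh_frame_flags (one bit per
          # reference slot); byte 1 carries ref_frame_idx in its low bits.
          refresh_frame_flags, second_byte = struct.unpack_from("BB", payload, offset)
          ref_frame_idx = second_byte & 0x0F  # index into the reference frame buffer
          reserved = second_byte >> 4         # remaining bits kept reserved
          return refresh_frame_flags, ref_frame_idx, reserved

      # Example: refresh reference slot 0, predict from buffer entry 2.
      print(parse_multi_ref_extension(bytes([0b00000001, 0x02])))  # -> (1, 2, 0)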
  • a new temporal prediction type may be added to the syntax for LCEVC.
  • TEMPORAL_PRED may indicate the nearest frame among the reference frames.
  • TEMPORAL_PREDi may indicate the ith nearest frame.
  • the LCEVC decode process flow after enabling the multiple reference temporal prediction is provided herein.
  • the most similar reference frame of the multiple reference frames may be selected among the stored reference frames (e.g., a previous L-2 residual reconstructed picture list) to achieve a better encoder BD-rate.
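  • One way an encoder could make that selection is a simple similarity search over the stored candidates, as in the sketch below. The sum of absolute differences (SAD) metric is an assumption for illustration; a real encoder could use any similarity or rate-distortion measure.

      import numpy as np

      def select_reference(curr_residual, ref_buffer):
          """Return (index, picture) of the stored reference most similar to the current residual."""
          sads = [np.abs(curr_residual - ref).sum() for ref in ref_buffer]
          best = int(np.argmin(sads))
          return best, ref_buffer[best]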
  • FIG. 1 is an example block diagram of a Low-Complexity Enhancement Video Coding (LCEVC) decoder 100 , according to some example embodiments of the present disclosure.
  • the LCEVC decoder 100 may receive a bitstream 102 (e.g., an encoded bitstream as generated by the LCEVC encoder 600 of FIG. 6) having multiple layers (e.g., a base layer and enhancement layers) .
  • Frames encoded at the base layer of the bitstream 102 may include base layer data 104.
  • Frames encoded at a first enhancement layer (e.g., Layer-1) may include Layer-1 coefficient data 106.
  • Frames encoded at a second enhancement layer (e.g., Layer-2) may include Layer-2 coefficient data 108.
  • Frames encoded at a temporal layer may include temporal data 110.
  • Headers 112 of the bitstream 102 may be input to a decoder configuration 114 used by the LCEVC decoder 100. The layers of the bitstream and the ways that they are encoded are explained below with respect to FIG. 6.
  • the base layer data 104 may be decoded by a base layer decoder 116 (e.g., a non-LCEVC decoder) , resulting in a decoded base layer frame 118.
  • An upscaler 120 may up-sample (e.g., increase the pixel count of the image) the decoded base layer frame 118, resulting in an up-sampled base layer frame 122 (e.g., a preliminary intermediate frame) .
  • the Layer-1 coefficient data 106 may be decoded using an entropy decoder 124, and inverse quantization 126 may be performed on the decoded Layer-1 coefficient data to identify the transform coefficients.
  • Inverse transformation 128 may apply the inverse of the transform used to generate the Layer-1 coefficient data 106.
  • the Layer-1 data that has been inversely transformed may pass through a Layer-1 filter 130 to generate a Layer-1 decoded frame 132.
  • the Layer-1 decoded frame 132 and the up-sampled base layer frame 122 may be added by an adder 134 to generate a combined intermediate frame 136.
  • the combined intermediate frame 136 may be up-sampled by an upscaler 138 to generate a combined intermediate frame 140 at full resolution.
  • the Layer-2 coefficient data 108 may be decoded using an entropy decoder 142, and inverse quantization 144 may be performed on the decoded Layer-2 coefficient data to identify the transform coefficients.
  • Inverse transformation 146 may apply the inverse of the transform used to generate the Layer-2 coefficient data 108.
  • the Layer-2 data that has been inversely transformed may generate Layer-2 residuals 148.
  • the temporal data 110 may be decoded using an entropy decoder 150. When inter prediction 152 was used (e.g., as indicated by the syntax of the bitstream 102) , only the Layer-2 residuals 148 may be decoded.
  • the decoded temporal data may be compared to a reference frame in a reference frame buffer 154.
  • the reference frame buffer 154 may store multiple reference frames 155 at a given time, allowing the LCEVC decoder 100 to select one of the reference frames (e.g., the best matching reference frame for the given frame being decoded) to combine with the decoded temporal data to generate an intermediate frame 156.
  • the intermediate frame 156 may be combined with the Layer-2 residuals 148 at an adder 158 to generate a combined intermediate frame 160.
  • the combined intermediate frame 160 and the combined intermediate frame 140 may be combined by an adder 162 to generate a combined output video frame.
  • the combined output video frames 164 of the LCEVC decoder 100 may be presented for playback.
  • a new syntax for the bitstream 102 is proposed and is shown in FIGs. 3-5.
  • the new syntax is not the replacement of the existing syntax, but rather serves as an extension to the existing syntax.
  • a new flag ‘temporal_multi_ref_flag’ may be added to the syntax to indicate whether multiple reference frames for temporal prediction are enabled. The newly added flag reuses a reserved syntax bit, so it does not affect the current decoding process.
  • two bytes of data may be added in the end of a picture configuration payload of the bitstream 102 when the ‘temporal_multi_ref_flag’ is enabled.
  • the new inserted syntax may include ‘refresh_frame_flags’ and ‘ref_frame_idx’ .
  • the MAX_REF_FRAMES indicator may indicate the maximum number of reference frames that may be used by the temporal layer.
  • the refresh_frame_flags indicator may be used to indicate whether the current frame will be used as a reference frame; if bit [i] is not zero, reference frame buffer [i] may be updated after the current frame is decoded.
  • the ref_frame_idx [i] may be used to indicate which reference frame in the reference frame buffer is used for temporal prediction.
  • a new temporal prediction type may be added to the syntax of the bitstream 102.
  • TEMPORAL_PRED may indicate the nearest frame among the reference frames.
  • TEMPORAL_PREDi may indicate the ith nearest frame.
  • the most similar reference frame of the multiple reference frames may be selected by the LCEVC decoder 100 among the stored reference frames (e.g., the multiple reference frames 155) to achieve a better encoder BD-rate.
  • FIG. 2 illustrates example video decoding sequences using pyramid B-frames, according to some example embodiments of the present disclosure.
  • a picture order count (POC) of frames 0-8 is shown for a video decoding sequence 200, along with their encoding/decoding orders and which encoding layer was used to encode the respective POC frames.
  • the video decoding sequence 200 represents a Layer-2 enhancement layer reference relationship according to the current LCEVC standard when temporal prediction is enabled.
  • the picture order count (POC) of frames 0-8 is shown for a video decoding sequence 250, along with their encoding/decoding orders and which encoding layer was used to encode the respective POC frames.
  • the L-2 layer of picture order count 6 (POC6) uses POC3 as a reference frame because the encode order of POC6 is 6 and the encode order of POC3 is 5 (e.g., POC6 uses the most recent previously encoded POC as a reference frame) .
  • POC3 is not a good selection as the reference frame for POC6 because POC4 is closer to POC6 than POC3 is; in an optimal encoder, if POC4 were used as the reference frame for POC6, the BD-rate would be improved (e.g., compared to using POC3 as the reference frame).
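  • The following small sketch works through that example, assuming the encode order shown in FIG. 2 for the pyramid-B group of pictures. The current single-reference rule picks the most recently encoded frame (POC3) for POC6, whereas choosing the nearest previously encoded past frame in display order (as the base layer would) picks POC4.

      # Encode order -> POC, assumed from the pyramid-B example of FIG. 2.
      encode_order = [0, 8, 4, 2, 1, 3, 6, 5, 7]

      def reference_choices(poc):
          already_encoded = encode_order[:encode_order.index(poc)]
          single_ref = already_encoded[-1]  # current rule: most recently encoded frame
          nearest_past = max(p for p in already_encoded if p < poc)  # nearest past frame
          return single_ref, nearest_past

      print(reference_choices(6))  # -> (3, 4)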
  • the mechanism presented herein to extend the LCEVC standard solves this problem by providing a way to select from multiple reference frames, rather than only the one reference frame previous in encoding order, and does not add workload on the decoder’s side.
  • FIG. 3 is an example global configuration syntax 300 for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • a portion 302 of the global configuration syntax 300 as currently defined by the LCEVC standard is shown, and includes a reserved bit 304.
  • the global configuration syntax 300 may be updated to use the reserved bit 304 as an indicator 306 (e.g., temporal_multi_ref_flag) for whether multiple reference frames are enabled for temporal prediction (e.g., the multiple reference frames 155 of FIG. 1) . Because the indicator 306 reuses the reserved bit 304, the decode process (e.g., of the LCEVC decoder 100 of FIG. 1) is not impacted by adding any bits to decode from the bitstream.
  • FIG. 4 is an example picture configuration syntax 400 for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • a portion 402 of the picture configuration syntax 400 is shown, and when the indicator 306 of FIG. 3 is present in the global configuration syntax 300, two bytes of data may be added to the picture configuration payload 404 of the picture configuration syntax 400, including a refresh_frame_flags indicator 406, reserved bits 408, a reference frame index 410 (ref_frame_idx [i] ) , and a maximum reference frames indicator 412 (MAX_REF_FRAMES) .
  • the maximum reference frames indicator 412 may indicate the maximum number of reference frames that may be stored (e.g., the maximum number of the multiple reference frames 155 of FIG. 1) .
  • the refresh_frame_flags indicator 406 may indicate whether the current frame will be used as a reference frame, and if bit [i] is not zero, the reference frame buffer [i] (e.g., the reference frame buffer 154 of FIG. 1) may be updated after the current frame is decoded.
  • the reference frame index 410 may be used to indicate which reference frame of the multiple reference frames in the reference frame buffer is to be used for temporal prediction.
  • FIG. 5 is an example temporal prediction type indicator 500 of the syntax for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
  • syntax 502 of the bitstream may indicate whether the temporal layer prediction type is inter or intra.
  • a temporal prediction type indicator 504 TEMPORAL_PRED indicates the nearest frame in the reference frames (e.g., the multiple reference frames 155 of FIG. 1) is used as the reference frame, and TEMPORAL_PREDi indicates the ith nearest frame is used as the reference frame.
  • a TEMPORAL_PRED1 indicator 506 indicates that an adjacent frame is the reference frame
  • a TEMPORAL_PRED2 indicator 508 indicates that a frame two frames away is the reference frame
  • a TEMPORAL_PRED3 indicator 510 indicates that a frame three frames away is the reference frame.
  • the following pseudo code may be used (e.g., in the syntax of the bitstream 102 of FIG. 1) to facilitate use of multiple reference frames (e.g., the multiple reference frames 155 of FIG. 1) for temporal prediction: Multiple Reference Temporal Decode process for Frame i:
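  • The listing itself is not reproduced in this text; a minimal sketch of such a decode step for frame i, assuming numpy-like residual planes and the syntax fields described above (the field names, the attribute access, and the MAX_REF_FRAMES value are assumptions for illustration), could look like the following:

      MAX_REF_FRAMES = 4  # assumed buffer size

      def temporal_decode_frame(cfg, ref_buffer, l2_residual_delta):
          if cfg.temporal_multi_ref_flag:
              # ref_frame_idx selects which stored L-2 reconstructed residual picture to use.
              reference = ref_buffer[cfg.ref_frame_idx]
          else:
              # Legacy behavior: the single most recently stored reference.
              reference = ref_buffer[0]

          # Inter-predicted data carries only a residual delta; add the reference back.
          reconstructed = reference + l2_residual_delta

          # Buffer maintenance: bit i of refresh_frame_flags marks slot i for refresh.
          for slot in range(MAX_REF_FRAMES):
              if cfg.refresh_frame_flags & (1 << slot):
                  ref_buffer[slot] = reconstructed
          return reconstructed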
  • FIG. 6 is an example block diagram of a LCEVC encoder 600, according to some example embodiments of the present disclosure.
  • the LCEVC encoder 600 may generate the bitstream 102 of FIG. 1.
  • An input sequence 602 of video frames may be used to generate the bitstream 102 using an encoder configuration 604.
  • the input sequence 602 may be down-sampled by a downscaler 606 to generate a downscaled frame 608, which may be down-sampled further by a downscaler 610 to generate a downscaled frame 612.
  • the downscaled frame 612 may be encoded by a base encoder 614 (e.g., a non-LCEVC encoder) to generate an encoded base 616 (e.g., the base layer of the bitstream 102 used for the base layer data 104 of FIG. 1) .
  • the encoded frame from the base encoder 614 may be up-sampled by an upscaler 618 and subtracted from the downscaled frame 608 by a subtractor 620 to generate Layer-1 residuals on which a transform 622 and quantization 624 may be performed (e.g., for reconstruction).
  • the transformed and quantized Layer-1 data may be used for inverse quantization 626 and inverse transform 628 (e.g., to reconstruct the pixel data) , and passed through a Layer-1 filter 630.
  • the quantized Layer-1 data from the quantization 624 may produce the Layer-1 coefficient layers 634 for the bitstream 102.
  • the filtered Layer-1 data may be added to the up-sampled frame from the upscaler 618 by an adder 636 to generate an intermediate frame, which may be up-sampled again by an upscaler 638.
  • a frame from the input sequence 602 may be subtracted from the intermediate frame by a subtractor 640 to generate Layer-2 residuals input for temporal prediction 642.
  • Transform 644 and quantization 646 may be performed on the temporal prediction 642, which then may be entropy encoded 648 to generate Layer-2 coefficient layers 650 for the bitstream 102.
  • the temporal prediction 642 data may be entropy encoded 652 to generate the temporal layer 654 of the bitstream 102.
  • the encoder configuration 604 may be indicated in the headers 656 of the bitstream 102 syntax.
  • Transform and quantization may generate and quantize transform units to facilitate encoding by a coder (e.g., entropy coder) .
  • Transform and quantized data may be inversely transformed and inversely quantized by an inverse transform and quantizer on the decoder side.
  • An adder may compare the inversely transformed and inversely quantized data to a prediction block generated by a prediction unit (e.g., temporal prediction) , resulting in reconstructed frames.
  • a filter e.g., in-loop filter for resizing/cropping, color conversion, de-interlacing, composition/blending, etc.
  • a control may manage many encoding aspects (e.g., parameters) including at least the setting of a quantization parameter (QP) but could also include setting bitrate, rate distortion or scene characteristics, prediction and/or transform partition or block sizes, available prediction mode types, and best mode selection parameters, for example, based at least partly on data from the prediction unit.
  • the transform and quantization processes may generate and quantize transform units to facilitate encoding by the coder, which may generate coded data that may be transmitted (e.g., an encoded bitstream) .
  • inverse transform and quantization may reconstruct pixel data based on the quantized residual coefficients and context data.
  • An adder may add the residual pixel data to a predicted block generated by a prediction unit.
  • a filter may filter the resulting data from the adder.
  • the filtered data may be output by a media output, and also may be stored as reconstructed frames in an image buffer (e.g., the reference frame buffer 154 of FIG. 1) for use by the prediction unit.
  • the LCEVC decoder 100 and encoder 600 perform the methods of intra prediction disclosed herein, and are arranged to perform at least one or more of the implementations described herein, including intra block copying.
  • the LCEVC decoder 100 and encoder 600 may be configured to undertake video coding and/or implement video codecs according to one or more standards.
  • LCEVC decoder 100 and encoder 600 may be implemented as part of an image processor, video processor, and/or media processor, and may undertake inter-prediction, intra-prediction, predictive coding, and residual prediction.
  • LCEVC decoder 100 and encoder 600 may undertake video compression and decompression and/or implement video codecs according to one or more standards or specifications, such as, for example, H.264 (Advanced Video Coding, or AVC), VP8, H.265 (High Efficiency Video Coding, or HEVC) and SCC extensions thereof, VP9, Alliance for Open Media Version 1 (AV1), H.266 (Versatile Video Coding, or VVC), DASH (Dynamic Adaptive Streaming over HTTP), and others.
  • coder may refer to an encoder and/or a decoder.
  • coding may refer to encoding via an encoder and/or decoding via a decoder.
  • a coder, encoder, or decoder may have components of both an encoder and decoder.
  • An encoder may have a decoder loop as described below.
  • the LCEVC encoder 600 may be an encoder where current video information in the form of data related to a sequence of video frames may be received to be compressed.
  • a video sequence may be formed of input frames of synthetic screen content, such as from, or for, business applications such as word processors, presentations, or spreadsheets, as well as computers, video games, virtual reality images, and so forth.
  • the images may be formed of a combination of synthetic screen content and natural camera captured images.
  • the video sequence only may be natural camera captured video.
  • a partitioner may partition each frame into smaller more manageable units, and then compare the frames to compute a prediction.
  • the LCEVC encoder 600 may receive an input frame from the input sequence 602.
  • the input frames may be frames sufficiently pre-processed for encoding.
  • the LCEVC encoder 600 also may manage many encoding aspects including at least the setting of a quantization parameter (QP) but could also include setting bitrate, rate distortion or scene characteristics, prediction and/or transform partition or block sizes, available prediction mode types, and best mode selection parameters to name a few examples.
  • the output of the transformed and quantized data may be provided to the inverse transform and quantization to generate the same reference or reconstructed blocks, frames, or other units as would be generated at a decoder such as the LCEVC decoder 100.
  • the prediction unit may use the inverse transform and quantization, adder, and filter to reconstruct the frames.
  • a prediction unit may perform inter-prediction including motion estimation and motion compensation, intra-prediction according to the description herein, and/or a combined inter-intra prediction.
  • the prediction unit may select the best prediction mode (including intra-modes) for a particular block, typically based on bit-cost and other factors.
  • the prediction unit may select an intra-prediction and/or inter-prediction mode when multiple such modes of each may be available.
  • the prediction output of the prediction unit in the form of a prediction block may be provided both to the subtractor to generate a residual, and in the decoding loop to the adder to add the prediction to the reconstructed residual from the inverse transform to reconstruct a frame.
  • the partitioner or other initial units not shown may place frames in order for encoding and assign classifications to the frames, such as I-frame, B-frame, P-frame and so forth, where I-frames are intra-predicted. Otherwise, frames may be divided into slices (such as an I-slice) where each slice may be predicted differently. Thus, for HEVC or AV1 coding of an entire I-frame or I-slice, spatial or intra-prediction is used, and in one form, only from data in the frame itself.
  • the prediction unit may perform an intra block copy (IBC) prediction mode, while a non-IBC mode operates any other available intra-prediction mode such as neighbor horizontal, diagonal, or direct coding (DC) prediction mode, palette mode, directional or angle modes, and so forth.
  • Other video coding standards such as HEVC or VP9 may have different sub-block dimensions but still may use the IBC search disclosed herein. It should be noted, however, that the foregoing are only example partition sizes and shapes, the present disclosure not being limited to any particular partition and partition shapes and/or sizes unless such a limit is mentioned or the context suggests such a limit, such as with the optional maximum efficiency size as mentioned. It should be noted that multiple alternative partitions may be provided as prediction candidates for the same image area as described below.
  • the prediction unit may select previously decoded reference blocks. Then comparisons may be performed to determine if any of the reference blocks match a current block being reconstructed. This may involve hash matching, SAD search, or other comparison of image data, and so forth. Once a match is found with a reference block, the prediction unit may use the image data of the one or more matching reference blocks to select a prediction mode.
  • previously reconstructed image data of the reference block is provided as the prediction, but alternatively, the original pixel image data of the reference block could be provided as the prediction instead. Either choice may be used regardless of the type of image data that was used to match the blocks.
  • the predicted block then may be subtracted at subtractor from the current block of original image data, and the resulting residual may be partitioned into one or more transform blocks (TUs) so that the transform and quantization can transform the divided residual data into transform coefficients using discrete cosine transform (DCT) for example.
  • the transform and quantization uses lossy resampling or quantization on the coefficients.
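  • For context, a toy 4x4 DCT-II transform with uniform quantization of a residual block is sketched below. The basis construction and the flat quantization step are assumptions for illustration and are not the normative transform of LCEVC or any other standard.

      import numpy as np

      def dct_matrix(n=4):
          # Orthonormal DCT-II basis: D[k, i] = c_k * cos(pi * (2i + 1) * k / (2n)).
          k = np.arange(n)
          c = np.sqrt(np.where(k == 0, 1.0 / n, 2.0 / n))
          return c[:, None] * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))

      def transform_and_quantize(residual_block, qstep=8):
          d = dct_matrix(residual_block.shape[0])
          coeffs = d @ residual_block @ d.T            # 2-D separable DCT
          return np.round(coeffs / qstep).astype(int)  # lossy uniform quantization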
  • the frames and residuals, along with supporting or context data such as block size and intra displacement vectors, may be entropy encoded by the LCEVC encoder 600 and transmitted to decoders.
  • the LCEVC decoder 100 may receive coded video data in the form of a bitstream that has the image data (chroma and luma pixel values) as well as context data, including residuals in the form of quantized transform coefficients and the identity of reference blocks (including at least the size of the reference blocks), for example.
  • the context also may include prediction modes for individual blocks, other partitions such as slices, inter-prediction motion vectors, partitions, quantization parameters, filter information, and so forth.
  • the LCEVC decoder 100 may process the bitstream with an entropy decoder to extract the quantized residual coefficients as well as the context data.
  • the LCEVC decoder 100 then may use the inverse transform and quantization to reconstruct the residual pixel data.
  • the LCEVC decoder 100 then may use an adder (along with assemblers not shown) to add the residual to a predicted block.
  • the LCEVC decoder 100 also may decode the resulting data using a decoding technique employed depending on the coding mode indicated in syntax of the bitstream, and either a first path including a prediction unit or a second path that includes a filter.
  • the prediction unit performs intra-prediction by using reference block sizes and the intra displacement or motion vectors extracted from the bitstream, and previously established at the encoder.
  • the prediction unit may utilize reconstructed frames as well as inter-prediction motion vectors from the bitstream to reconstruct a predicted block.
  • the prediction unit may set the correct prediction mode for each block, where the prediction mode may be extracted and decompressed from the compressed bitstream.
  • the coded data may include both video and audio data.
  • FIG. 7 illustrates a flow diagram of an illustrative process for LCEVC using multiple reference frames, in accordance with one or more example embodiments of the present disclosure.
  • a device may identify a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using a base encoder (e.g., the base layer data 104 of FIG. 1 of the encoded base 616 generated by the base encoder 614 of FIG. 6) and enhancement layers (e.g., the Layer-1 coefficient data 106 of FIG. 1 of the Layer-1 coefficient layers 634 of FIG. 6, the Layer-2 coefficient data 108 of FIG. 1 of the Layer-2 coefficient layers 650 of FIG. 6, and the temporal data 110 of FIG. 1 of the temporal layer 654 of FIG. 6).
  • the syntax of the bitstream may include one or more indicators, including the indicator 306 of FIG. 3 indicating whether multiple reference frames are enabled for temporal prediction, the refresh_frame_flags indicator 406 of FIG. 4 indicating whether the current frame will be used as a reference frame, the reference frame index 410 of FIG. 4 indicating which reference frame of the multiple reference frames in the reference frame buffer is to be used for temporal prediction, the maximum reference frames indicator 412 of FIG. 4 indicating the maximum number of reference frames that may be stored, and/or the TEMPORAL_PREDi indicator of FIG. 5 indicating that the ith nearest frame is used as the reference frame.
  • the device may decode a first video frame of the first layer of the bitstream using a base decoder (e.g., decode a frame having the base layer data 104 using the base layer decoder 116) .
  • the base decoder may be a non-LCEVC decoder (e.g., using a codec different than LCEVC) .
  • the device may up-sample the decoded first video frame (e.g., up-sample the decoded base layer frame 118 of FIG. 1 using the upscaler 120 of FIG. 1) .
  • the device may decode encoded video data of a first enhancement layer of the enhancement layers (e.g., decode the Layer-1 coefficient data 106 using the entropy decoding 124 of FIG. 1) .
  • the device may generate a first combined intermediate video frame (e.g., the combined intermediate frame 136 of FIG. 1) by combining the up-sampled first video frame and the decoded video data of the first enhancement layer.
  • the device may up-sample the first combined intermediate video frame (e.g., using the upscaler 138 of FIG. 1) .
  • the device may decode encoded video data of a second enhancement layer of the enhancement layers (e.g., decode the Layer-2 coefficient data 108 using the entropy decoding 142 of FIG. 1) .
  • the device may select, from among multiple temporal layer reference frames stored in a reference frame buffer (e.g., the multiple reference frames 155 of FIG. 1) , a reference frame with which to combine the decoded video data of the second enhancement layer.
  • the device may generate a second combined intermediate video frame (e.g., the combined intermediate frame 160 of FIG. 1) by combining the decoded video data of the second enhancement layer and the selected reference frame.
  • the device may generate a combined output video frame (e.g., the combined output video frames 164 of FIG. 1) by combining the first combined intermediate video frame and the second combined intermediate video frame.
  • Combined output video frames generated by the process 700 may represent the video frames used for playback.
  • the combined output video frames may be presented for playback.
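  • Putting the steps of this process together, a compact sketch of the decode flow is shown below. The 2x nearest-neighbour up-sampling and the decode_base/decode_enhancement stand-ins are assumptions used only to illustrate the data flow.

      import numpy as np

      def upsample2x(frame):
          # Nearest-neighbour 2x up-sampling (assumed; LCEVC defines its own up-samplers).
          return np.kron(frame, np.ones((2, 2), dtype=frame.dtype))

      def lcevc_decode_frame(base_payload, l1_payload, l2_payload, cfg, ref_buffer,
                             decode_base, decode_enhancement):
          base = decode_base(base_payload)                                  # base layer frame
          intermediate = upsample2x(base) + decode_enhancement(l1_payload)  # first combined intermediate frame
          l2 = decode_enhancement(l2_payload)                               # Layer-2 residual data
          reference = ref_buffer[cfg.ref_frame_idx]                         # select one of the stored references
          temporal = reference + l2                                         # second combined intermediate frame
          return upsample2x(intermediate) + temporal                        # combined output video frame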
  • FIG. 8 illustrates an embodiment of an exemplary system 800, in accordance with one or more example embodiments of the present disclosure.
  • the computing system 800 may comprise or be implemented as part of an electronic device.
  • the computing system 800 may be representative, for example, of a computer system that implements one or more components of FIG. 1.
  • the computing system 800 is configured to implement all logic, systems, processes, logic flows, methods, equations, apparatuses, and functionality described herein and with reference to FIGS. 1-7.
  • the system 800 may be a computer system with multiple processor cores such as a distributed computing system, supercomputer, high-performance computing system, computing cluster, mainframe computer, mini-computer, client-server system, personal computer (PC) , workstation, server, portable computer, laptop computer, tablet computer, a handheld device such as a personal digital assistant (PDA) , or other devices for processing, displaying, or transmitting information.
  • Similar embodiments may comprise, e.g., entertainment devices such as a portable music player or a portable video player, a smart phone or other cellular phones, a telephone, a digital video camera, a digital still camera, an external storage device, or the like. Further embodiments implement larger scale server configurations.
  • the system 800 may have a single processor with one core or more than one processor. Note that the term “processor” refers to a processor with a single core or a processor package with multiple processor cores.
  • the computing system 800 is representative of one or more components of FIG. 1 and FIG. 6. More generally, the computing system 800 is configured to implement all logic, systems, processes, logic flows, methods, apparatuses, and functionality described herein with reference to the above figures.
  • a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium) , an object, an executable, a thread of execution, a program, and/or a computer.
  • both an application running on a server and the server can be a component.
  • One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
  • components may be communicatively coupled to each other by various types of communications media to coordinate operations.
  • the coordination may involve the uni-directional or bi-directional exchange of information.
  • the components may communicate information in the form of signals communicated over the communications media.
  • the information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal.
  • Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
  • system 800 comprises a motherboard 805 for mounting platform components.
  • the motherboard 805 is a point-to-point interconnect platform that includes a processor 810 and a processor 830 coupled via a point-to-point interconnect such as an Ultra Path Interconnect (UPI), and an LCEVC device 819 (e.g., capable of performing the functions of FIGs. 1, 6, and 7).
  • the system 800 may be of another bus architecture, such as a multi-drop bus.
  • each of processors 810 and 830 may be processor packages with multiple processor cores.
  • processors 810 and 830 are shown to include processor core (s) 820 and 840, respectively.
  • system 800 is an example of a two-socket (2S) platform
  • other embodiments may include more than two sockets or one socket.
  • some embodiments may include a four-socket (4S) platform or an eight-socket (8S) platform.
  • Each socket is a mount for a processor and may have a socket identifier.
  • platform refers to the motherboard with certain components mounted such as the processors 810 and the chipset 860.
  • Some platforms may include additional components and some platforms may only include sockets to mount the processors and/or the chipset.
  • the processors 810 and 830 can be any of various commercially available processors. Dual microprocessors, multi-core processors, and other multi-processor architectures may also be employed as the processors 810 and 830.
  • the processor 810 includes an integrated memory controller (IMC) 814 and point-to-point (P-P) interfaces 818 and 852.
  • the processor 830 includes an IMC 834 and P-P interfaces 838 and 854.
  • the IMCs 814 and 834 couple the processors 810 and 830, respectively, to respective memories, a memory 812 and a memory 832.
  • the memories 812 and 832 may be portions of the main memory (e.g., a dynamic random-access memory (DRAM) ) for the platform such as double data rate type 3 (DDR3) or type 4 (DDR4) synchronous DRAM (SDRAM) .
  • the memories 812 and 832 locally attach to the respective processors 810 and 830.
  • the system 800 may include the LCEVC device 819.
  • the LCEVC device 819 may be connected to chipset 860 by means of P-P interfaces 829 and 869.
  • the LCEVC device 819 may also be connected to a memory 839.
  • the LCEVC device 819 may be connected to at least one of the processors 810 and 830.
  • the memories 812, 832, and 839 may couple with the processors 810 and 830, and the LCEVC device 819, via a bus and shared memory hub.
  • System 800 includes chipset 860 coupled to processors 810 and 830. Furthermore, chipset 860 can be coupled to storage medium 803, for example, via an interface (I/F) 866.
  • the I/F 866 may be, for example, a Peripheral Component Interconnect Express (PCIe) interface.
  • the processors 810, 830, and the LCEVC device 819 may access the storage medium 803 through chipset 860.
  • Storage medium 803 may comprise any non-transitory computer-readable storage medium or machine-readable storage medium, such as an optical, magnetic or semiconductor storage medium. In various embodiments, storage medium 803 may comprise an article of manufacture. In some embodiments, storage medium 803 may store computer-executable instructions, such as computer-executable instructions 802 to implement one or more of processes or operations described herein, (e.g., process 700 of FIG. 7) . The storage medium 803 may store computer-executable instructions for any equations depicted above. The storage medium 803 may further store computer-executable instructions for models and/or networks described herein, such as a neural network or the like.
  • Examples of a computer-readable storage medium or machine-readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • Examples of computer-executable instructions may include any suitable types of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. It should be understood that the embodiments are not limited in this context.
  • the processor 810 couples to a chipset 860 via P-P interfaces 852 and 862 and the processor 830 couples to a chipset 860 via P-P interfaces 854 and 864.
  • Direct Media Interfaces may couple the P-P interfaces 852 and 862 and the P-P interfaces 854 and 864, respectively.
  • the DMI may be a high-speed interconnect that facilitates, e.g., eight Giga Transfers per second (GT/s) such as DMI 3.0.
  • the processors 810 and 830 may interconnect via a bus.
  • the chipset 860 may comprise a controller hub such as a platform controller hub (PCH) .
  • the chipset 860 may include a system clock to perform clocking functions and include interfaces for an I/O bus such as a universal serial bus (USB), peripheral component interconnects (PCIs), serial peripheral interconnects (SPIs), inter-integrated circuits (I2Cs), and the like, to facilitate connection of peripheral devices on the platform.
  • the chipset 860 may comprise more than one controller hub such as a chipset with a memory controller hub, a graphics controller hub, and an input/output (I/O) controller hub.
  • the chipset 860 couples with a trusted platform module (TPM) 872 and the UEFI, BIOS, Flash component 874 via an interface (I/F) 870.
  • TPM 872 is a dedicated microcontroller designed to secure hardware by integrating cryptographic keys into devices.
  • the UEFI, BIOS, Flash component 874 may provide pre-boot code.
  • chipset 860 includes the I/F 866 to couple chipset 860 with a high-performance graphics engine, graphics card 865.
  • the system 800 may include a flexible display interface (FDI) between the processors 810 and 830 and the chipset 860.
  • the FDI interconnects a graphics processor core in a processor with the chipset 860.
  • Various I/O devices 892 couple to the bus 881, along with a bus bridge 880 which couples the bus 881 to a second bus 891 and an I/F 868 that connects the bus 881 with the chipset 860.
  • the second bus 891 may be a low pin count (LPC) bus.
  • Various devices may couple to the second bus 891 including, for example, a keyboard 882, a mouse 884, communication devices 886, a storage medium 801, and an audio I/O 890.
  • the artificial intelligence (AI) accelerator 867 may be circuitry arranged to perform computations related to AI.
  • the AI accelerator 867 may be connected to storage medium 803 and chipset 860.
  • the AI accelerator 867 may deliver the processing power and energy efficiency needed to enable abundant-data computing.
  • the AI accelerator 867 is a class of specialized hardware accelerators or computer systems designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision.
  • the AI accelerator 867 may be applicable to algorithms for robotics, internet of things, other data-intensive and/or sensor-driven tasks.
  • I/O devices 892, communication devices 886, and the storage medium 801 may reside on the motherboard 805 while the keyboard 882 and the mouse 884 may be add-on peripherals. In other embodiments, some or all the I/O devices 892, communication devices 886, and the storage medium 801 are add-on peripherals and do not reside on the motherboard 805.
  • The terms “coupled” and “connected,” along with their derivatives, may be used herein. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, yet still co-operate or interact with each other.
  • a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code to reduce the number of times code must be retrieved from bulk storage during execution.
  • code covers a broad range of software components and constructs, including applications, drivers, processes, routines, methods, modules, firmware, microcode, and subprograms. Thus, the term “code” may be used to refer to any collection of instructions that, when executed by a processing system, perform a desired operation or operations.
  • Circuitry is hardware and may refer to one or more circuits. Each circuit may perform a particular function.
  • a circuit of the circuitry may comprise discrete electrical components interconnected with one or more conductors, an integrated circuit, a chip package, a chipset, memory, or the like.
  • Integrated circuits include circuits created on a substrate such as a silicon wafer and may comprise components.
  • Integrated circuits, processor packages, chip packages, and chipsets may comprise one or more processors.
  • Processors may receive signals such as instructions and/or data at the input (s) and process the signals to generate at least one output. While executing code, the code changes the physical states and characteristics of transistors that make up a processor pipeline. The physical states of the transistors translate into logical bits of ones and zeros stored in registers within the processor. The processor can transfer the physical states of the transistors into registers and transfer the physical states of the transistors to another storage medium.
  • a processor may comprise circuits to perform one or more sub-functions implemented to perform the overall function of the processor.
  • One example of a processor is a state machine or an application-specific integrated circuit (ASIC) that includes at least one input and at least one output.
  • a state machine may manipulate the at least one input to generate the at least one output by performing a predetermined series of serial and/or parallel manipulations or transformations on the at least one input.
  • the logic as described above may be part of the design for an integrated circuit chip.
  • the chip design is created in a graphical computer programming language, and stored in a computer storage medium or data storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network) . If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly. The stored design is then converted into the appropriate format (e.g., GDSII) for the fabrication.
  • the resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips) , as a bare die, or in a packaged form.
  • the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher-level carrier) or in a multichip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections) .
  • the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a processor board, a server platform, or a motherboard, or (b) an end product.
  • the word “exemplary” is used herein to mean “serving as an example, instance, or illustration. ” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
  • the terms “computing device, ” “user device, ” “communication station, ” “station, ” “handheld device, ” “mobile device, ” “wireless device” and “user equipment” (UE) as used herein refer to a wireless communication device such as a cellular telephone, a smartphone, a tablet, a netbook, a wireless terminal, a laptop computer, a femtocell, a high data rate (HDR) subscriber station, an access point, a printer, a point of sale device, an access terminal, or other personal communication system (PCS) device.
  • the device may be either mobile or stationary.
  • the term “communicate” is intended to include transmitting, or receiving, or both transmitting and receiving. This may be particularly useful in claims when describing the organization of data that is being transmitted by one device and received by another, but only the functionality of one of those devices is required to infringe the claim. Similarly, the bidirectional exchange of data between two devices (both devices transmit and receive during the exchange) may be described as “communicating, ” when only the functionality of one of those devices is being claimed.
  • the term “communicating” as used herein with respect to a wireless communication signal includes transmitting the wireless communication signal and/or receiving the wireless communication signal.
  • a wireless communication unit which is capable of communicating a wireless communication signal, may include a wireless transmitter to transmit the wireless communication signal to at least one other wireless communication unit, and/or a wireless communication receiver to receive the wireless communication signal from at least one other wireless communication unit.
  • Some embodiments may be used in conjunction with various devices and systems, for example, a personal computer (PC), a desktop computer, a mobile computer, a laptop computer, a notebook computer, a tablet computer, a server computer, a handheld computer, a handheld device, a personal digital assistant (PDA) device, a handheld PDA device, an on-board device, an off-board device, a hybrid device, a vehicular device, a non-vehicular device, a mobile or portable device, a consumer device, a non-mobile or non-portable device, a wireless communication station, a wireless communication device, a wireless access point (AP), a wired or wireless router, a wired or wireless modem, a video device, an audio device, an audio-video (A/V) device, a wired or wireless network, a wireless area network, a wireless video area network (WVAN), a local area network (LAN), a wireless LAN (WLAN), a personal area network (PAN), and the like.
  • Some embodiments may be used in conjunction with one way and/or two-way radio communication systems, cellular radio-telephone communication systems, a mobile phone, a cellular telephone, a wireless telephone, a personal communication system (PCS) device, a PDA device which incorporates a wireless communication device, a mobile or portable global positioning system (GPS) device, a device which incorporates a GPS receiver or transceiver or chip, a device which incorporates an RFID element or chip, a multiple input multiple output (MIMO) transceiver or device, a single input multiple output (SIMO) transceiver or device, a multiple input single output (MISO) transceiver or device, a device having one or more internal antennas and/or external antennas, digital video broadcast (DVB) devices or systems, multi-standard radio devices or systems, a wired or wireless handheld device, e.g., a smartphone, a wireless application protocol (WAP) device, or the like.
  • Some embodiments may be used in conjunction with one or more types of wireless communication signals and/or systems following one or more wireless communication protocols, for example, radio frequency (RF) , infrared (IR) , frequency-division multiplexing (FDM) , orthogonal FDM (OFDM) , time-division multiplexing (TDM) , time-division multiple access (TDMA) , extended TDMA (E-TDMA) , general packet radio service (GPRS) , extended GPRS, code-division multiple access (CDMA) , wideband CDMA (WCDMA) , CDMA 2000, single-carrier CDMA, multi-carrier CDMA, multi-carrier modulation (MDM) , discrete multi-tone (DMT) , global positioning system (GPS) , Wi-Fi, Wi-Max, ZigBee, ultra-wideband (UWB) , global system for mobile communications (GSM) , 2G, 2.5G, 3G, 3.5G, 4G, fifth generation (5G) , or the like.
  • Example 1 may be an apparatus of a device for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the apparatus comprising processing circuitry configured to: identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decode a first video frame of the first layer of the bitstream using a base decoder; up-sample the decoded first video frame; decode encoded video data of a first enhancement layer of the enhancement layers; generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sample the first combined intermediate video frame; decode encoded video data of a second enhancement layer of the enhancement layers; select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  • Example 2 may include the apparatus of example 1 and/or some other example herein, wherein the processing circuitry is further configured to: identify a global configuration syntax of the bitstream; and identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the apparatus is enabled to select from among the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  • Example 3 may include the apparatus of example 2 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  • Example 4 may include the apparatus of example 2 or example 3 and/or some other example herein, wherein the processing circuitry is further configured to: identify a picture configuration syntax of the bitstream; identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decode temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
  • Example 5 may include the apparatus of example 4 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  • Example 6 may include the apparatus of example 4 and/or some other example herein, wherein the processing circuitry is further configured to: identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
  • Example 7 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
  • Example 8 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
  • Example 9 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
  • Example 10 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is three video frames from the nearest video frame.
  • Example 11 may include a computer-readable storage medium comprising instructions to cause processing circuitry of a device for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , upon execution of the instructions by the processing circuitry, to: identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decode a first video frame of the first layer of the bitstream using a base decoder; up-sample the decoded first video frame; decode encoded video data of a first enhancement layer of the enhancement layers; generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sample the first combined intermediate video frame; decode encoded video data of a second enhancement layer of the enhancement layers; select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  • Example 12 may include the computer-readable medium of example 11 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a global configuration syntax of the bitstream; and identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  • Example 13 may include the computer-readable medium of example 12 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  • Example 14 may include the computer-readable medium of example 12 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a picture configuration syntax of the bitstream; identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decode temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
  • Example 15 may include the computer-readable medium of example 14 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  • Example 16 may include the computer-readable medium of example 14 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
  • Example 17 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
  • Example 18 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
  • Example 19 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
  • Example 20 may include a method for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the method comprising: identifying, by a first device, a bitstream received from a second device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decoding a first video frame of the first layer of the bitstream using a base decoder; up-sampling the decoded first video frame; decoding encoded video data of a first enhancement layer of the enhancement layers; generating a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sampling the first combined intermediate video frame; decoding encoded video data of a second enhancement layer of the enhancement layers; selecting, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generating a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generating a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  • Example 21 may include the method of example 20 and/or some other example herein, further comprising: identifying a global configuration syntax of the bitstream; and identifying a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  • Example 22 may include the method of example 21 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  • Example 23 may include the method of example 21 and/or some other example herein, further comprising: identifying a picture configuration syntax of the bitstream; identifying a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identifying a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decoding temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
  • Example 24 may include the method of example 23 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  • Example 25 may include the method of example 23 and/or some other example herein, further comprising: identifying a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
  • Example 26 may include an apparatus comprising means for: identifying a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decoding a first video frame of the first layer of the bitstream using a base decoder; up-sampling the decoded first video frame; decoding encoded video data of a first enhancement layer of the enhancement layers; generating a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sampling the first combined intermediate video frame; decoding encoded video data of a second enhancement layer of the enhancement layers; selecting, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generating a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generating a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  • Example 27 may include one or more non-transitory computer-readable media comprising instructions to cause an electronic device, upon execution of the instructions by one or more processors of the electronic device, to perform one or more elements of a method described in or related to any of examples 1-26, or any other method or process described herein.
  • Example 28 may include an apparatus comprising logic, modules, and/or circuitry to perform one or more elements of a method described in or related to any of examples 1-26, or any other method or process described herein.
  • Example 29 may include a method, technique, or process as described in or related to any of examples 1-26, or portions or parts thereof.
  • Example 30 may include an apparatus comprising: one or more processors and one or more computer readable media comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform the method, techniques, or process as described in or related to any of examples 1-26, or portions thereof.
  • Embodiments according to the disclosure are in particular disclosed in the attached claims directed to a method, a storage medium, a device and a computer program product, wherein any feature mentioned in one claim category, e.g., method, can be claimed in another claim category, e.g., system, as well.
  • the dependencies or references back in the attached claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof are disclosed and can be claimed regardless of the dependencies chosen in the attached claims.
  • These computer-executable program instructions may be loaded onto a special-purpose computer or other particular machine, a processor, or other programmable data processing apparatus to produce a particular machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks.
  • These computer program instructions may also be stored in a computer-readable storage media or memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable storage media produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks.
  • certain implementations may provide for a computer program product, comprising a computer-readable storage medium having a computer-readable program code or program instructions implemented therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
  • blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, may be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.
  • Conditional language such as, among others, “can, ” “could, ” “might, ” or “may, ” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain implementations could include, while other implementations do not include, certain features, elements, and/or operations. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more implementations or that one or more implementations necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular implementation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This disclosure describes systems, methods, and devices related to decoding low-complexity enhancement video coding (LCEVC) video data. A device may receive a bitstream including a first layer and enhancement layers; decode a video frame of the first layer of the bitstream using a base decoder; up-sample the decoded video frame; decode encoded video data of a first enhancement layer; generate a first combined intermediate video frame using the up-sampled video frame and the decoded video data of the first enhancement layer; up-sample the first combined intermediate video frame; decode encoded video data of a second enhancement layer; select, from among multiple reference frames, a reference frame; generate a second combined intermediate video frame using the decoded video data of the second enhancement layer and the selected reference frame; and generate a combined output video frame using the first combined intermediate video frame and the second combined intermediate video frame.

Description

LOW-COMPLEXITY ENHANCEMENT VIDEO CODING USING MULTIPLE REFERENCE FRAMES
TECHNICAL FIELD
This disclosure generally relates to systems and methods for video coding and, more particularly, to Low-Complexity Enhancement Video Coding.
BACKGROUND
LCEVC (Low-Complexity Enhancement Video Coding) encodes a lower resolution version of a source image using any existing codec, and encodes the difference between the reconstructed lower resolution image and the source image using a different compression method. However, LCEVC is limited to using only one reference frame in the temporal layer.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example block diagram of a Low-Complexity Enhancement Video Coding decoder, according to some example embodiments of the present disclosure.
FIG. 2 illustrates example video decoding sequences using pyramid B-frames, according to some example embodiments of the present disclosure.
FIG. 3 is an example global configuration syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
FIG. 4 is an example picture configuration syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
FIG. 5 is an example temporal prediction type indicator of the syntax for a bitstream encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
FIG. 6 is an example block diagram of a Low-Complexity Enhancement Video Coding encoder, according to some example embodiments of the present disclosure.
FIG. 7 illustrates a flow diagram of an illustrative process for Low-Complexity Enhancement Video Coding using multiple reference frames, in accordance with one or more example embodiments of the present disclosure.
FIG. 8 illustrates an embodiment of an exemplary system, in accordance with one or more example embodiments of the present disclosure.
DETAILED DESCRIPTION
The following description and the drawings sufficiently illustrate specific embodiments to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, algorithm, and other changes. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. Embodiments set forth in the claims encompass all available equivalents of those claims.
LCEVC (Low-Complexity Enhancement Video Coding) works by encoding a lower resolution version of a source image using any existing codec (e.g., the base codec, such as AVC, HEVC, VP9, AV1, etc. ) , and encoding the difference between the reconstructed lower resolution image and the source image using a different compression method (e.g., the enhancement) . LCEVC is an enhancement codec, meaning that it not only up-samples well, but also encodes the residual information necessary for true fidelity to the source video and compresses that information (e.g., transforming, quantizing, and coding the information) .
LCEVC residual layers (e.g., enhancement sub-layers) encode the residual information necessary for true fidelity to the source video and compress the information (e.g., transforming, quantizing, and coding it) . LCEVC can be used in higher resolutions or higher frame rates (e.g., 4Kp60, 8K, 12K) of video encoding, adaptive video streaming, and the like. LCEVC supports a temporal layer in a layer-two (L-2) enhancement layer to improve the BD-Rate (Bjontegaard rate difference) , which gives a temporal mask to indicate INTRA_PRED or INTER_PRED information per transform block (e.g., whether the block was coded using intra or inter prediction) . The transform block could be either 2x2 or 4x4, for example. If the temporal mask of the block is INTER_PRED, only the residual delta (e.g., the residual of the current L-2 enhancement layer minus the reconstructed residual of the previous L-2 enhancement layer) is encoded.
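By way of illustration, the per-block temporal handling described above can be sketched as follows. This is a simplified sketch in C, not the normative LCEVC process; the 4x4 block size, the mask constants, and the function name are assumptions made for illustration only.

    #include <stdint.h>

    #define BLOCK 4                      /* assumed transform block size (2x2 or 4x4 in LCEVC) */

    enum { INTRA_PRED = 0, INTER_PRED = 1 };

    /* For an INTER_PRED block, only the delta between the current L-2 residual and
     * the reconstructed residual of the reference is passed on to transform and
     * quantization; for an INTRA_PRED block, the residual is encoded directly. */
    static void build_block_to_encode(const int16_t *cur_residual,
                                      const int16_t *ref_reconstructed,
                                      int temporal_mask,
                                      int16_t *out)
    {
        for (int i = 0; i < BLOCK * BLOCK; i++) {
            if (temporal_mask == INTER_PRED)
                out[i] = (int16_t)(cur_residual[i] - ref_reconstructed[i]);  /* residual delta */
            else
                out[i] = cur_residual[i];                                    /* full residual */
        }
    }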
The current temporal layer implementation in the LCEVC standard only supports one reference frame at a time (e.g., a single reference frame in a reference frame buffer may be considered in the encoding or decoding of a given frame) . Reference frames are used in video coding and decoding to reduce the size of data to be encoded and decoded by identifying matching blocks of pixels (e.g., blocks of pixels from a reference frame that match blocks of pixels in a frame being encoded or decoded) . While some coding standards allow for multiple reference frames to be considered for the encoding and decoding of one frame, LCEVC does not, and allowing LCEVC to support multiple reference frames requires modification to the syntax of an LCEVC bitstream.
In LCEVC, the reference frame buffer stores the previous L-2 reconstructed residuals in an encoded sequence as the reference frame (e.g., previous L-2 residual reconstructed picture) .  However, for sequences with B frames (e.g., video frames that may use forward and previous video frames as reference frames) , the default reference mechanism may not represent the best selection.
One reference frame may not be enough for a video encoder to achieve a better video quality and compression ratio. There may be techniques to apply multiple reference frames in existing non-LCEVC encoders, which can also be used to encode the base layer. Multiple reference frames can be used for the residual layer as well. One proposal is to apply the reference relationship of the base layer encoding to the residual layer. However, it is necessary to extend the current LCEVC syntax to support multiple reference frames. In the current LCEVC standard, there is no mechanism in the bitstream syntax to extend the reference frame capability from one to multiple reference frames at a given time.
The enhancements presented herein to extend the LCEVC standard solve such a problem by providing a mechanism to select from among multiple reference frames rather than being limited to only the one reference frame from the previous encoding order. The enhancements presented herein do not require any extra workload on the decoder’s side.
In one or more embodiments, an LCEVC encoder and decoder may define a new syntax in a payload global configuration and payload picture configuration as an extension of the LCEVC bitstream to support the use of multiple reference frames at the temporal layer of the decoder. The new syntax introduces a method to maintain the reference frame buffer of the decoder. A refresh flag may be used to indicate whether the current frame is used as a reference frame and, if so, which stored reference frame is to be refreshed with the current frame. A reference frame index may be used to indicate which stored reference frame is used for temporal prediction of a current frame.
In one or more embodiments, the enhancements herein make LCEVC encoding in the L-2 enhancement layer more efficient if the temporal layer is enabled because the temporal layer may leverage the frame reference relationship of the base layer. MPEG-5 Part 2 LCEVC is a new video coding standard and was published in 2020. It can leverage existing and future codecs to enhance their performance while reducing their computational complexity. The new syntax structure for multi-reference temporal prediction represents an extension of the current encode process, and a previously encoded bitstream can also be correctly decoded. The present disclosure allows the producer (e.g., encoder) a choice to use more processor (CPU) cycles to achieve better quality without requiring extra workload on the consumer (e.g., decoder) side.
In one or more embodiments, the proposed new syntax may change some bit values in the payload global configuration and payload picture configuration. Thus, the enhancements of the present disclosure may be detected by parsing the encoded enhancement layer bitstream.
In one or more embodiments, to support the multiple reference frame feature for the LCEVC temporal layer, a new syntax is proposed. The new syntax is not the replacement of the existing syntax, but rather serves as an extension to the existing syntax. Based on a current payload global configuration syntax, a new flag ‘temporal_multi_ref_flag’ may be added to the syntax to indicate whether multiple reference frames for temporal prediction are enabled. The newly added flag reuses a reserved syntax bit, so it does not affect the current decoding process.
In one or more embodiments, to support temporal prediction using multiple reference frames in LCEVC, two bytes of data may be added at the end of a picture configuration payload when the ‘temporal_multi_ref_flag’ is enabled. The newly inserted syntax may include ‘refresh_frame_flags’ and ‘ref_frame_idx’ . In the newly proposed syntax, the MAX_REF_FRAMES indicator may indicate the maximum number of reference frames that may be used by the temporal layer. The refresh_frame_flags indicator may be used to indicate whether the current frame will be used as a reference frame; if bit [i] is not zero, the reference frame buffer [i] may be updated after the current frame is decoded. The ref_frame_idx [i] may be used to indicate which reference frame in the reference frame buffer is used for temporal prediction.
In one or more embodiments, a new temporal prediction type may be added to the syntax for LCEVC. In multiple reference temporal prediction mode, TEMPORAL_PRED may indicate the nearest frame in the reference frames, and TEMPORAL_PREDi may indicate the ith nearest frame.
In one or more embodiments, the LCEVC decode process flow after enabling multiple reference temporal prediction is provided herein. When temporal prediction is enabled for LCEVC with multiple reference frame temporal prediction enabled, the most similar of the multiple reference frames may be selected from among the stored reference frames (e.g., a previous L-2 residual reconstructed picture list) to achieve a better encoder BD-rate.
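For instance, an encoder might rank the stored reference residual pictures by a simple distortion measure such as the sum of absolute differences (SAD). The SAD criterion, the buffer depth, and the flat sample layout below are illustrative assumptions; the standard does not dictate how an encoder makes this choice, only how the choice is signalled.

    #include <stdint.h>
    #include <stdlib.h>

    #define MAX_REF_FRAMES 4             /* assumed reference buffer depth */

    /* Return the index of the stored L-2 reconstructed residual picture that is
     * most similar (lowest SAD) to the current L-2 residual picture; the chosen
     * index would then be signalled to the decoder via ref_frame_idx. */
    static int select_reference(const int16_t *cur,
                                int16_t *const refs[MAX_REF_FRAMES],
                                int num_refs, size_t num_samples)
    {
        int best = 0;
        uint64_t best_sad = UINT64_MAX;
        for (int r = 0; r < num_refs; r++) {
            uint64_t sad = 0;
            for (size_t i = 0; i < num_samples; i++)
                sad += (uint64_t)llabs((long long)cur[i] - refs[r][i]);
            if (sad < best_sad) { best_sad = sad; best = r; }
        }
        return best;
    }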
The above descriptions are for purposes of illustration and are not meant to be limiting. Numerous other examples, configurations, processes, algorithms, etc., may exist, some of which are described in greater detail below. Example embodiments will now be described with reference to the accompanying figures.
FIG. 1 is an example block diagram of a Low-Complexity Enhancement Video Coding (LCEVC) decoder 100 , according to some example embodiments of the present disclosure.
Referring to FIG. 1, the LCEVC decoder 100 may receive a bitstream 102 (e.g., an encoded bitstream as generated by the LCEVC encoder 600 of FIG. 6) having multiple layers (e.g., a base layer and enhancement layers) . Frames encoded at the base layer of the bitstream 102 may include base layer data 104. Frames encoded at a first enhancement layer (e.g., Layer-1) may include Layer-1 coefficient data 106. Frames encoded at a second enhancement layer (e.g., Layer-2) may include Layer-2 coefficient data 108. Frames encoded at a temporal layer may include temporal data 110. Headers 112 of the bitstream 102 may be input to a decoder configuration 114 used by the LCEVC decoder 100. The layers of the bitstream and the ways that they are encoded are explained below with respect to FIG. 6.
Still referring to FIG. 1, the base layer data 104 may be decoded by a base layer decoder 116 (e.g., a non-LCEVC decoder) , resulting in a decoded base layer frame 118. An upscaler 120 may up-sample (e.g., increase the pixel count of the image) the decoded base layer frame 118, resulting in an up-sampled base layer frame 122 (e.g., a preliminary intermediate frame) . The Layer-1 coefficient data 106 may be decoded using an entropy decoder 124, and inverse quantization 126 may be performed on the decoded Layer-1 coefficient data to identify the transform coefficients. Inverse transformation 128 may determine the transform used to generate the Layer-1 coefficient data 106. The Layer-1 data that has been inversely transformed may pass through a Layer-1 filter 130 to generate a Layer-1 decoded frame 132. The Layer-1 decoded frame 132 and the up-sampled base layer frame 122 may be added by an adder 134 to generate a combined intermediate frame 136. The combined intermediate frame 136 may be up-sampled by an upscaler 138 to generate a combined intermediate frame 140 at full resolution.
Still referring to FIG. 1, the Layer-2 coefficient data 108 may be decoded using an entropy decoder 142, and inverse quantization 144 may be performed on the decoded Layer-2 coefficient data to identify the transform coefficients. Inverse transformation 146 may determine the transform used to generate the Layer-2 coefficient data 108. The Layer-2 data that has been inversely transformed may generate Layer-2 residuals 148. The temporal data 110 may be decoded using an entropy decoder 150. When inter prediction 152 was used (e.g., as indicated by the syntax of the bitstream 102) , only the Layer-2 residuals 148 may be decoded. The decoded temporal data may be compared to a reference frame in a reference frame buffer 154. The reference frame buffer 154 may store multiple reference frames 155 at a given time, allowing the LCEVC decoder 100 to select one of the reference frames (e.g., the best matching reference frame for the given frame being decoded) to combine with the decoded temporal data to generate an intermediate frame 156. The intermediate frame 156 may be combined with the Layer-2 residuals 148 at an adder 158 to generate a combined intermediate frame 160. The combined intermediate frame 160 and the combined intermediate frame 140 may be combined by an adder 162 to generate a combined output video frame. The combined output video frames 164 of the LCEVC decoder 100 may be presented for playback.
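Expressed as arithmetic on already-decoded sample planes, the final combination performed by the adders 158 and 162 reduces to elementwise addition. The sketch below assumes flat 16-bit planes of equal (full) resolution and omits clipping for brevity; the helper name is hypothetical and only illustrates the combination step.

    #include <stddef.h>
    #include <stdint.h>

    /* out = up-sampled intermediate frame (140) + temporal prediction (156)
     *       + Layer-2 residuals (148), i.e. the additions producing frames 160 and 164. */
    static void combine_output_frame(const int16_t *upsampled_intermediate,
                                     const int16_t *temporal_prediction,
                                     const int16_t *layer2_residuals,
                                     int16_t *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = (int16_t)(upsampled_intermediate[i]
                             + temporal_prediction[i]
                             + layer2_residuals[i]);
    }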
In one or more embodiments, to support the multiple reference frame feature for the LCEVC temporal layer, a new syntax for the bitstream 102 is proposed and is shown in FIGs. 3-5. The new syntax is not the replacement of the existing syntax, but rather serves as an extension to the existing syntax. Based on a current payload global configuration syntax, a new flag ‘temporal_multi_ref_flag’ may be added to the syntax to indicate whether multiple reference frames for temporal prediction are enabled. The newly added flag reuses a reserved syntax bit, so it does not affect the current decoding process.
In one or more embodiments, to support temporal prediction using multiple reference frames in LCEVC, two bytes of data may be added at the end of a picture configuration payload of the bitstream 102 when the ‘temporal_multi_ref_flag’ is enabled. The newly inserted syntax may include ‘refresh_frame_flags’ and ‘ref_frame_idx’ . In the newly proposed syntax, the MAX_REF_FRAMES indicator may indicate the maximum number of reference frames that may be used by the temporal layer. The refresh_frame_flags indicator may be used to indicate whether the current frame will be used as a reference frame; if bit [i] is not zero, the reference frame buffer [i] may be updated after the current frame is decoded. The ref_frame_idx [i] may be used to indicate which reference frame in the reference frame buffer is used for temporal prediction.
In one or more embodiments, a new temporal prediction type may be added to the syntax of the bitstream 102. In multiple reference temporal prediction mode, TEMPORAL_PRED may indicate the nearest frame in the reference frames, and TEMPORAL_PREDi may indicate the ith nearest frame.
In one or more embodiments, when temporal prediction is enabled for LCEVC with the multiple reference frame temporal prediction enabled, the most similar of the multiple reference frames may be selected by the LCEVC decoder 100 from among the stored reference frames (e.g., the multiple reference frames 155) to achieve a better encoder BD-rate.
FIG. 2 illustrates example video decoding sequences using pyramid B-frames, according to some example embodiments of the present disclosure.
Referring to FIG. 2, a picture order count (POC) of frames 0-8 is shown for a video decoding sequence 200, along with their encoding/decoding orders and which encoding layer was used to encode the respective POC frames. The video decoding sequence 200 represents a Layer-2 enhancement layer reference relationship according to the current LCEVC standard when temporal prediction is enabled.
Still referring to FIG. 2, the picture order count (POC) of frames 0-8 is shown for a video decoding sequence 250, along with their encoding/decoding orders and which encoding layer was used to encode the respective POC frames. The L-2 layer of picture order count 6 (POC6) uses POC3 as a reference frame because the encode order of POC6 is 6 and the encode order of POC3 is 5 (e.g., POC6 uses the most recent previously encoded POC as a reference frame) . However, POC3 is not a good selection as the reference frame for POC6, as POC4 is closer to POC6 than POC3, and in an optimal encoder, if POC4 were used as the reference frame for POC6, the BD-Rate would be improved (e.g., compared to using POC3 as the reference frame) . The mechanism presented herein to extend the LCEVC standard solves this problem by providing a mechanism to select from among multiple reference frames, not only the one reference frame in a previous encoding order, and does not add workload to the decoder’s side.
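The following small program works through the POC6 example numerically. It assumes the reference buffer holds the last four reconstructed pictures in encode order and that "most similar" is approximated by the smallest picture order count distance; both are simplifying assumptions for illustration.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_REF_FRAMES 4

    int main(void)
    {
        /* Assumed buffer contents when POC 6 is encoded (last four pictures in encode order). */
        const int buffered_pocs[MAX_REF_FRAMES] = { 4, 2, 1, 3 };
        const int current_poc = 6;

        /* Single-reference rule: the most recently encoded picture (POC 3). */
        int single_ref = buffered_pocs[MAX_REF_FRAMES - 1];

        /* Multi-reference rule: the buffered picture nearest in display order. */
        int multi_ref = buffered_pocs[0];
        for (int i = 1; i < MAX_REF_FRAMES; i++)
            if (abs(buffered_pocs[i] - current_poc) < abs(multi_ref - current_poc))
                multi_ref = buffered_pocs[i];

        printf("single-reference picks POC %d, multi-reference picks POC %d\n",
               single_ref, multi_ref);   /* prints: POC 3 and POC 4 */
        return 0;
    }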
FIG. 3 is an example global configuration syntax 300 for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
Referring to FIG. 3, a portion 302 of the global configuration syntax 300 as currently defined by the LCEVC standard is shown, and includes a reserved bit 304. To enable the use of multiple reference frames for LCEVC (e.g., as shown in FIG. 1) , the global configuration syntax 300 may be updated to use the reserved bit 304 as an indicator 306 (e.g., temporal_multi_ref_flag) for whether multiple reference frames are enabled for temporal prediction (e.g., the multiple reference frames 155 of FIG. 1) . Because the indicator 306 reuses the reserved bit 304, the decode process (e.g., of the LCEVC decoder 100 of FIG. 1) is not impacted by adding any bits to decode from the bitstream.
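A decoder-side sketch of reading the flag is shown below. The bit reader is a made-up helper, and the flag's exact position within the payload global configuration is defined by the syntax table of FIG. 3, so treat the layout here as an assumption.

    #include <stddef.h>
    #include <stdint.h>

    /* Minimal MSB-first bit reader (assumed helper, not part of the LCEVC specification). */
    typedef struct { const uint8_t *data; size_t bit_pos; } bitreader_t;

    static unsigned read_bit(bitreader_t *br)
    {
        unsigned bit = (br->data[br->bit_pos >> 3] >> (7u - (br->bit_pos & 7u))) & 1u;
        br->bit_pos++;
        return bit;
    }

    /* temporal_multi_ref_flag reuses a formerly reserved bit, so a legacy
     * bitstream simply carries 0 here and decodes exactly as before. */
    static int parse_temporal_multi_ref_flag(bitreader_t *br)
    {
        return (int)read_bit(br);   /* 1: multiple temporal reference frames enabled */
    }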
FIG. 4 is an example picture configuration syntax 400 for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
Referring to FIG. 4, a portion 402 of the picture configuration syntax 400 is shown, and when the indicator 306 of FIG. 3 is present in the global configuration syntax 300, two bytes of data may be added to the picture configuration payload 404 of the picture configuration syntax 400, including a refresh_frame_flags indicator 406, reserved bits 408, a reference frame  index 410 (ref_frame_idx [i] ) , and a maximum reference frames indicator 412 (MAX_REF_FRAMES) . The maximum reference frames indicator 412 may indicate the maximum number of reference frames that may be stored (e.g., the maximum number of the multiple reference frames 155 of FIG. 1) . The refresh_frame_flags indicator 406 may indicate whether the current frame will be used as reference frame, and if bit [i] is not zero, the reference frame buffer [i] (e.g., the reference frame buffer 154 of FIG. 1) may be updated after the current frame is decoded. The reference frame index 410 may be used to indicate which reference frame of the multiple reference frames in the reference frame buffer is to be used for temporal prediction.
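A sketch of consuming the two added bytes and maintaining the buffer might look like the following. The exact field widths (four refresh bits, four reserved bits, and two bits per reference index) are assumptions chosen to fit two bytes with MAX_REF_FRAMES equal to four; the authoritative layout is the one shown in FIG. 4.

    #include <stdint.h>

    #define MAX_REF_FRAMES 4                        /* assumed maximum buffer depth */

    typedef struct {
        uint8_t refresh_frame_flags;                /* bit i set: refresh reference buffer [i] */
        uint8_t ref_frame_idx[MAX_REF_FRAMES];      /* stored frame to use for temporal prediction */
    } picture_multi_ref_t;

    /* Read the two bytes appended to the picture configuration payload when
     * temporal_multi_ref_flag is set (field widths assumed for illustration). */
    static void parse_multi_ref_fields(const uint8_t bytes[2], picture_multi_ref_t *out)
    {
        out->refresh_frame_flags = (uint8_t)(bytes[0] >> 4);          /* high nibble */
        for (int i = 0; i < MAX_REF_FRAMES; i++)
            out->ref_frame_idx[i] = (uint8_t)((bytes[1] >> (6 - 2 * i)) & 0x3);
    }

    /* After the current frame is decoded, update the flagged buffer slots. */
    static void refresh_reference_buffer(int16_t *ref_buffer[MAX_REF_FRAMES],
                                         int16_t *current_reconstructed,
                                         uint8_t refresh_frame_flags)
    {
        for (int i = 0; i < MAX_REF_FRAMES; i++)
            if (refresh_frame_flags & (1u << i))
                ref_buffer[i] = current_reconstructed;
    }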
FIG. 5 is an example temporal prediction type indicator 500 of the syntax for a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using Low-Complexity Enhancement Video Coding, according to some example embodiments of the present disclosure.
Referring to FIG. 5, syntax 502 of the bitstream may indicate whether the temporal layer prediction type is inter or intra. A temporal prediction type indicator 504 TEMPORAL_PRED indicates the nearest frame in the reference frames (e.g., the multiple reference frames 155 of FIG. 1) is used as the reference frame, and TEMPORAL_PREDi indicates the ith nearest frame is used as the reference frame. For example, a TEMPORAL_PRED1 indicator 506 indicates that an adjacent frame is the reference frame, a TEMPORAL_PRED2 indicator 508 indicates that a frame two frames away is the reference frame, and a TEMPORAL_PRED3 indicator 510 indicates that a frame three frames away is the reference frame.
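One plausible way for a decoder to interpret these indicators is as an offset into a reference list ordered nearest-first; the enum values and the ordering assumption below are illustrative, not taken from the specification.

    /* TEMPORAL_PRED selects the nearest stored reference frame; TEMPORAL_PREDi
     * selects the i-th nearest (assuming a nearest-first ordered reference list). */
    enum temporal_pred_type {
        TEMPORAL_PRED  = 0,   /* nearest reference frame */
        TEMPORAL_PRED1 = 1,   /* adjacent to the nearest */
        TEMPORAL_PRED2 = 2,   /* two frames from the nearest */
        TEMPORAL_PRED3 = 3    /* three frames from the nearest */
    };

    static int temporal_pred_to_ref_offset(enum temporal_pred_type type)
    {
        return (int)type;     /* offset into the nearest-first reference list */
    }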
In one or more embodiments, with reference to FIGs. 1 and 3-5, the following pseudo code may be used (e.g., in the syntax of the bitstream 102 of FIG. 1) to facilitate use of multiple reference frames (e.g., the multiple reference frames 155 of FIG. 1) for temporal prediction: Multiple Reference Temporal Decode process for Frame i:
[The pseudo code is reproduced in the source as figures PCTCN2022106236-appb-000001 and PCTCN2022106236-appb-000002.]
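Because the pseudo code itself appears only as figures in the source, the sketch below is a reconstruction from the surrounding description (temporal prediction from the slot named by ref_frame_idx, then refreshing the slots named by refresh_frame_flags). The helper names and flat residual planes are assumptions, and the per-block temporal mask handling is omitted.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_REF_FRAMES 4                               /* assumed buffer depth */

    typedef struct {
        int16_t *planes[MAX_REF_FRAMES];                   /* stored L-2 reconstructed residual pictures */
    } ref_buffer_t;

    /* Reconstructed sketch of the multi-reference temporal decode step for frame i. */
    static void decode_temporal_layer(ref_buffer_t *refs,
                                      uint8_t ref_frame_idx,        /* from the picture configuration */
                                      uint8_t refresh_frame_flags,  /* from the picture configuration */
                                      const int16_t *decoded_delta, /* entropy-decoded temporal data */
                                      int16_t *reconstructed,       /* output: L-2 reconstructed residuals */
                                      size_t num_samples)
    {
        /* 1. Temporal prediction from the signalled reference slot. */
        const int16_t *pred = refs->planes[ref_frame_idx];
        for (size_t i = 0; i < num_samples; i++)
            reconstructed[i] = (int16_t)(pred[i] + decoded_delta[i]);

        /* 2. Refresh the flagged slots with the newly reconstructed picture. */
        for (int slot = 0; slot < MAX_REF_FRAMES; slot++)
            if (refresh_frame_flags & (1u << slot))
                memcpy(refs->planes[slot], reconstructed, num_samples * sizeof(int16_t));
    }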
FIG. 6 is an example block diagram of a LCEVC encoder 600, according to some example embodiments of the present disclosure.
Referring to FIG. 6, the LCEVC encoder 600 may generate the bitstream 102 of FIG. 1. An input sequence 602 of video frames may be used to generate the bitstream 102 using an encoder configuration 604. The input sequence 602 may be down-sampled by a downscaler 606 to generate a downscaled frame 608, which may be down-sampled further by a downscaler 610 to generate a downscaled frame 612. The downscaled frame 612 may be encoded by a base encoder 614 (e.g., a non-LCEVC encoder) to generate an encoded base 616 (e.g., the base layer of the bitstream 102 used for the base layer data 104 of FIG. 1) . The encoded frame from the base encoder 614 may be up-sampled by an upscaler 618 and subtracted from the downscaled frame 608 by a subtractor 620 to generate Layer-1 residuals on which a transform 622 and quantization 624 may be performed (e.g., for reconstruction) . The transformed and quantized Layer-1 data may be used for inverse quantization 626 and inverse transform 628 (e.g., to reconstruct the pixel data) , and passed through a Layer-1 filter 630. The quantized Layer-1 data from the quantization 624 may produce the Layer-1 coefficient layers 634 for the bitstream 102. The filtered Layer-1 data may be added to the up-sampled frame from the upscaler 618 by an adder 636 to generate an intermediate frame, which may be up-sampled again by an upscaler 638. A frame from the input sequence 602 may be subtracted from the intermediate frame by a subtractor 640 to generate Layer-2 residuals input for temporal prediction 642. Transform 644 and quantization 646 may be performed on the temporal prediction 642, which then may be entropy encoded 648 to generate Layer-2 coefficient layers 650 for the bitstream 102. The temporal prediction 642 data may be entropy encoded 652 to generate the temporal layer 654 of the bitstream 102. The encoder configuration 604 may be indicated in the headers 656 of the bitstream 102 syntax.
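As a small concrete counterpart to the block diagram, the residual formation at the subtractor 640 is a per-sample difference between the full-resolution frame and the up-sampled intermediate reconstruction. The flat 16-bit plane layout, the helper name, and the sign convention of the subtraction are assumptions for illustration.

    #include <stddef.h>
    #include <stdint.h>

    /* Layer-2 residuals formed as the difference between the source frame and the
     * up-sampled intermediate reconstruction (input to temporal prediction 642). */
    static void form_layer2_residuals(const int16_t *source_frame,
                                      const int16_t *upsampled_intermediate,
                                      int16_t *layer2_residuals, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            layer2_residuals[i] = (int16_t)(source_frame[i] - upsampled_intermediate[i]);
    }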
Referring to FIGs. 1 and 6, a subtractor may generate a residual as explained further herein. Transform and quantization may generate and quantize transform units to facilitate encoding by a coder (e.g., entropy coder) . Transform and quantized data may be inversely transformed and inversely quantized by an inverse transform and quantizer on the decoder side. An adder may compare the inversely transformed and inversely quantized data to a prediction block generated by a prediction unit (e.g., temporal prediction) , resulting in reconstructed frames. A filter (e.g., in-loop filter for resizing/cropping, color conversion, de-interlacing, composition/blending, etc. ) may revise reconstructed frames from an adder, and may store the reconstructed frames in an image buffer (e.g., the reference frame buffer 154 of FIG. 1) . A control may manage many encoding aspects (e.g., parameters) including at least the setting of a quantization parameter (QP) but could also include setting bitrate, rate distortion or scene characteristics, prediction and/or transform partition or block sizes, available prediction mode types, and best mode selection parameters, for example, based at least partly on data from the prediction unit. Using the encoding aspects, the transform and quantization processes may generate and quantize transform units to facilitate encoding by the coder, which may generate coded data that may be transmitted (e.g., an encoded bitstream) .
Still referring to FIGs. 1 and 6, inverse transform and quantization may reconstruct pixel data based on the quantized residual coefficients and context data. An adder may add the residual pixel data to a predicted block generated by a prediction unit. A filter may filter the resulting data from the adder. The filtered data may be output by a media output, and also may be stored as reconstructed frames in an image buffer (e.g., the reference frame buffer 154 of FIG. 1) for use by the prediction unit.
Referring to FIGs. 1 and 6, the LCEVC decoder 100 and encoder 600 perform the methods of intra prediction disclosed herein, and are arranged to perform at least one or more of the implementations described herein including intra block copying. In various implementations, the LCEVC decoder 100 and encoder 600 may be configured to undertake video coding and/or implement video codecs according to one or more standards. Further, in various forms, LCEVC decoder 100 and encoder 600 may be implemented as part of an image processor, video processor, and/or media processor and undertake inter-prediction, intra-prediction, predictive coding, and residual prediction. In various implementations, LCEVC decoder 100 and encoder 600 may undertake video compression and decompression and/or implement video codecs according to one or more standards or specifications, such as, for example, H. 264 (Advanced Video Coding, or AVC) , VP8, H. 265 (High Efficiency Video Coding or HEVC) and SCC extensions thereof, VP9, Alliance Open Media Version 1 (AV1) , H. 266 (Versatile Video Coding, or VVC) , DASH (Dynamic Adaptive Streaming over HTTP) , and others.
As used herein, the term “coder” may refer to an encoder and/or a decoder. Similarly, as used herein, the term “coding” may refer to encoding via an encoder and/or decoding via a decoder. A coder, encoder, or decoder may have components of both an encoder and decoder. An encoder may have a decoder loop as described below.
For example, the LCEVC encoder 600 may be an encoder where current video information in the form of data related to a sequence of video frames may be received to be compressed. By one form, a video sequence is formed of input frames of synthetic screen content such as from, or for, business applications such as word processors, presentation programs, or spreadsheets, computers, video games, virtual reality images, and so forth. By other forms, the images may be formed of a combination of synthetic screen content and natural camera captured images. By yet another form, the video sequence only may be natural camera captured video. A partitioner may partition each frame into smaller more manageable units, and then compare the frames to compute a prediction. If a difference or residual is determined between an original block and prediction, that resulting residual is transformed and quantized, and then entropy encoded and transmitted in a bitstream, along with reconstructed frames, out to decoders or storage. To perform these operations, the LCEVC encoder 600 may receive an input frame from the input sequence 602. The input frames may be frames sufficiently pre-processed for encoding.
The LCEVC encoder 600 also may manage many encoding aspects including at least the setting of a quantization parameter (QP) but could also include setting bitrate, rate distortion or scene characteristics, prediction and/or transform partition or block sizes, available prediction mode types, and best mode selection parameters to name a few examples.
The output of the transformed and quantized data may be provided to the inverse transform and quantization to generate the same reference or reconstructed blocks, frames, or other units as would be generated at a decoder such as the LCEVC decoder 100. Thus, the prediction unit may use the inverse transform and quantization, adder, and filter to reconstruct the frames.
A prediction unit may perform inter-prediction including motion estimation and motion compensation, intra-prediction according to the description herein, and/or a combined inter-intra prediction. The prediction unit may select the best prediction mode (including intra-modes) for a particular block, typically based on bit-cost and other factors. The prediction unit may select an intra-prediction and/or inter-prediction mode when multiple such modes of each may be available. The prediction output of the prediction unit in the form of a prediction block may be provided both to the subtractor to generate a residual, and in the decoding loop to the adder to add the prediction to the reconstructed residual from the inverse transform to reconstruct a frame.
The partitioner or other initial units not shown may place frames in order for encoding and assign classifications to the frames, such as I-frame, B-frame, P-frame and so forth, where I-frames are intra-predicted. Otherwise, frames may be divided into slices (such as an I-slice) where each slice may be predicted differently. Thus, for HEVC or AV1 coding of an entire I-frame or I-slice, spatial or intra-prediction is used, and in one form, only from data in the frame itself.
In various implementations, the prediction unit may perform an intra block copy (IBC) prediction mode and a non-IBC mode operates any other available intra-prediction mode such as neighbor horizontal, diagonal, or direct coding (DC) prediction mode, palette mode, directional or angle modes, and any other available intra-prediction mode. Other video coding standards, such as HEVC or VP9 may have different sub-block dimensions but still may use the IBC search disclosed herein. It should be noted, however, that the foregoing are only example partition sizes and shapes, the present disclosure not being limited to any particular partition and partition shapes and/or sizes unless such a limit is mentioned or the context suggests such a limit, such as with the optional maximum efficiency size as mentioned. It should be noted that multiple alternative partitions may be provided as prediction candidates for the same image area as described below.
The prediction unit may select previously decoded reference blocks. Then comparisons may be performed to determine if any of the reference blocks match a current block being reconstructed. This may involve hash matching, SAD search, or other comparison of image  data, and so forth. Once a match is found with a reference block, the prediction unit may use the image data of the one or more matching reference blocks to select a prediction mode. By one form, previously reconstructed image data of the reference block is provided as the prediction, but alternatively, the original pixel image data of the reference block could be provided as the prediction instead. Either choice may be used regardless of the type of image data that was used to match the blocks.
The predicted block then may be subtracted at the subtractor from the current block of original image data, and the resulting residual may be partitioned into one or more transform blocks (TUs) so that the transform and quantization can transform the divided residual data into transform coefficients using a discrete cosine transform (DCT) , for example. Using the quantization parameter (QP) set by the LCEVC encoder 600, the transform and quantization then uses lossy resampling or quantization on the coefficients. The frames and residuals, along with supporting or context data (block size, intra displacement vectors, and so forth) , may be entropy encoded by the LCEVC encoder 600 and transmitted to decoders.
In one or more embodiments, the LCEVC decoder 100 may receive coded video data in the form of a bitstream that has the image data (chroma and luma pixel values) as well as context data including residuals in the form of quantized transform coefficients and the identity of reference blocks including at least the size of the reference blocks, for example. The context also may include prediction modes for individual blocks, other partitions such as slices, inter-prediction motion vectors, partitions, quantization parameters, filter information, and so forth. The LCEVC decoder 100 may process the bitstream with an entropy decoder to extract the quantized residual coefficients as well as the context data. The LCEVC decoder 100 then may use the inverse transform and quantization to reconstruct the residual pixel data.
The LCEVC decoder 100 then may use an adder (along with assemblers not shown) to add the residual to a predicted block. The LCEVC decoder 100 also may decode the resulting data using a decoding technique employed depending on the coding mode indicated in syntax of the bitstream, and either a first path including a prediction unit or a second path that includes a filter. The prediction unit performs intra-prediction by using reference block sizes and the intra displacement or motion vectors extracted from the bitstream, and previously established at the encoder. The prediction unit may utilize reconstructed frames as well as inter-prediction motion vectors from the bitstream to reconstruct a predicted block. The prediction unit may set the correct prediction mode for each block, where the prediction mode may be extracted and decompressed from the compressed bitstream.
In one or more embodiments, the coded data (e.g., the bitstream 102) may include both video and audio data.
It is understood that the above descriptions are for purposes of illustration and are not meant to be limiting.
FIG. 7 illustrates a flow diagram of an illustrative process for LCEVC using multiple reference frames, in accordance with one or more example embodiments of the present disclosure.
At block 702, a device (e.g., the LCEVC decoder 100 of FIG. 1, the graphics card 865 of FIG. 8, the LCEVC device 819 of FIG. 8) may identify a bitstream (e.g., the bitstream 102 of FIG. 1) encoded using a base encoder (e.g., the base layer data 104 of FIG. 1 of the encoded base 616 generated by the base encoder 614 of FIG. 6) and enhancement layers (e.g., the Layer-1 coefficient data 106 of FIG. 1 of the Layer-1 coefficient layers 634 of FIG. 6, the Layer-2 coefficient data 108 of FIG. 1 of the Layer-2 coefficient layers 650 of FIG. 6, the temporal data 110 of the temporal layer 654 of FIG. 6) , and headers (e.g., the headers 112 of FIG. 1 and the headers 656 of FIG. 6) . The syntax of the bitstream may include one or more indicators, including the indicator 306 of FIG. 3 indicating whether multiple reference frames are enabled for temporal prediction, the refresh_frame_flags indicator 406 of FIG. 4 indicating whether the current frame will be used as a reference frame, the reference frame index 410 of FIG. 4 to indicate which reference frame of the multiple reference frames in the reference frame buffer is to be used for temporal prediction, the maximum reference frames indicator 412 of FIG. 4 to indicate the maximum number of reference frames that may be stored, and/or the TEMPORAL_PREDi indicator of FIG. 5 indicating that the ith nearest frame is used as the reference frame.
At block 704, the device may decode a first video frame of the first layer of the bitstream using a base decoder (e.g., decode a frame having the base layer data 104 using the base layer decoder 116) . The base decoder may be a non-LCEVC decoder (e.g., using a codec different than LCEVC) .
At block 706, the device may up-sample the decoded first video frame (e.g., up-sample the decoded base layer frame 118 of FIG. 1 using the upscaler 120 of FIG. 1) .
At block 708, the device may decode encoded video data of a first enhancement layer of the enhancement layers (e.g., decode the Layer-1 coefficient data 106 using the entropy decoding 124 of FIG. 1) .
At block 710, the device may generate a first combined intermediate video frame (e.g., the combined intermediate frame 136 of FIG. 1) by combining the up-sampled first video frame and the decoded video data of the first enhancement layer.
At block 712, the device may up-sample the first combined intermediate video frame (e.g., using the upscaler 138 of FIG. 1) .
At block 714, the device may decode encoded video data of a second enhancement layer of the enhancement layers (e.g., decode the Layer-2 coefficient data 108 using the entropy decoding 142 of FIG. 1) .
At block 716, the device may select, from among multiple temporal layer reference frames stored in a reference frame buffer (e.g., the multiple reference frames 155 of FIG. 1) , a reference frame with which to combine the decoded video data of the second enhancement layer.
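For block 716, a minimal sketch of a buffer that concurrently holds several temporal layer reference frames is shown below. The class name, the deque-based storage, and the nearest-first indexing are assumptions for illustration and are not the structure of the multiple reference frames 155.

```python
# Illustrative buffer; the deque storage and nearest-first indexing are assumptions.
from collections import deque

class ReferenceFrameBuffer:
    def __init__(self, max_frames: int):
        # Oldest frames are evicted automatically once max_frames is exceeded.
        self.frames = deque(maxlen=max_frames)

    def store(self, frame) -> None:
        """Store a frame flagged (e.g., by refresh_frame_flags) as a future reference."""
        self.frames.append(frame)

    def select(self, frames_from_nearest: int = 0):
        """Return the ith nearest stored frame (0 = nearest), in the spirit of a
        TEMPORAL_PREDi-style indicator."""
        if frames_from_nearest >= len(self.frames):
            raise IndexError("requested reference frame is not in the buffer")
        return self.frames[-1 - frames_from_nearest]
```

Under this sketch, a decoder would call select(0) for the nearest stored frame and, for example, select(2) for a frame that is two frames from the nearest.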
At block 718, the device may generate a second combined intermediate video frame (e.g., the combined intermediate frame 160 of FIG. 1) by combining the decoded video data of the second enhancement layer and the selected reference frame.
At block 720, the device may generate a combined output video frame (e.g., the combined output video frames 164 of FIG. 1) by combining the first combined intermediate video frame and the second combined intermediate video frame. Combined output video frames generated by the process 700 may represent the video frames used for playback. At block 722, the combined output video frames may be presented for playback.
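The following end-to-end sketch strings blocks 702 through 722 of the process 700 together in Python. Every injected callable (base_decode, upsample, entropy_decode, combine, present), the attribute names on the bitstream object, and the choice of which frame to store back into the reference frame buffer are placeholders standing in for the components described above, not an actual LCEVC API; the sketch reuses the ReferenceFrameBuffer and TemporalSyntax sketches shown earlier.

```python
# Hedged sketch of process 700; all helpers are injected placeholders.
from typing import Any, Callable

def lcevc_decode_frame(bitstream: Any,
                       ref_buffer: "ReferenceFrameBuffer",
                       base_decode: Callable, upsample: Callable,
                       entropy_decode: Callable, combine: Callable,
                       present: Callable) -> Any:
    syntax = bitstream.headers                                   # block 702: syntax indicators
    base = base_decode(bitstream.base_layer)                     # block 704: non-LCEVC base decoder
    base_up = upsample(base)                                     # block 706
    layer1 = entropy_decode(bitstream.layer1_data)               # block 708
    intermediate1 = combine(base_up, layer1)                     # block 710: first combined frame
    intermediate1_up = upsample(intermediate1)                   # block 712
    layer2 = entropy_decode(bitstream.layer2_data)               # block 714
    reference = ref_buffer.select(syntax.reference_frame_index)  # block 716
    intermediate2 = combine(layer2, reference)                   # block 718: second combined frame
    output = combine(intermediate1_up, intermediate2)            # block 720
    if syntax.refresh_frame_flags:
        # Which frame is retained for later temporal prediction is an assumption here.
        ref_buffer.store(intermediate2)
    present(output)                                              # block 722: playback
    return output
```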
It is understood that the above descriptions are for purposes of illustration and are not meant to be limiting.
FIG. 8 illustrates an embodiment of an exemplary system 800, in accordance with one or more example embodiments of the present disclosure.
In various embodiments, the computing system 800 may comprise or be implemented as part of an electronic device.
In some embodiments, the computing system 800 may be representative, for example, of a computer system that implements one or more components of FIG. 1.
The embodiments are not limited in this context. More generally, the computing system 800 is configured to implement all logic, systems, processes, logic flows, methods, equations, apparatuses, and functionality described herein and with reference to FIGS. 1-7.
The system 800 may be a computer system with multiple processor cores such as a distributed computing system, supercomputer, high-performance computing system, computing cluster, mainframe computer, mini-computer, client-server system, personal  computer (PC) , workstation, server, portable computer, laptop computer, tablet computer, a handheld device such as a personal digital assistant (PDA) , or other devices for processing, displaying, or transmitting information. Similar embodiments may comprise, e.g., entertainment devices such as a portable music player or a portable video player, a smart phone or other cellular phones, a telephone, a digital video camera, a digital still camera, an external storage device, or the like. Further embodiments implement larger scale server configurations. In other embodiments, the system 800 may have a single processor with one core or more than one processor. Note that the term “processor” refers to a processor with a single core or a processor package with multiple processor cores.
In at least one embodiment, the computing system 800 is representative of one or more components of FIG. 1 and FIG. 6. More generally, the computing system 800 is configured to implement all logic, systems, processes, logic flows, methods, apparatuses, and functionality described herein with reference to the above figures.
As used in this application, the terms “system” and “component” and “module” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary system 800. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium) , an object, an executable, a thread of execution, a program, and/or a computer.
By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
As shown in this figure, the system 800 comprises a motherboard 805 for mounting platform components. The motherboard 805 is a point-to-point interconnect platform that includes a processor 810 and a processor 830 coupled via a point-to-point interconnect such as an Ultra Path Interconnect (UPI), and an LCEVC device 819 (e.g., capable of performing the functions of FIGS. 1, 6, and 7). In other embodiments, the system 800 may be of another bus architecture, such as a multi-drop bus. Furthermore, each of the processors 810 and 830 may be a processor package with multiple processor cores. As an example, the processors 810 and 830 are shown to include processor core(s) 820 and 840, respectively. While the system 800 is an example of a two-socket (2S) platform, other embodiments may include more than two sockets or one socket. For example, some embodiments may include a four-socket (4S) platform or an eight-socket (8S) platform. Each socket is a mount for a processor and may have a socket identifier. Note that the term platform refers to the motherboard with certain components mounted, such as the processors 810 and 830 and the chipset 860. Some platforms may include additional components, and some platforms may include only sockets to mount the processors and/or the chipset.
The processors 810 and 830 can be any of various commercially available processors, including, without limitation, Core (2) processors; application, embedded and secure processors; IBM and Cell processors; and similar processors (the remaining brand names appear only as image placeholders in the original publication). Dual microprocessors, multi-core processors, and other multi-processor architectures may also be employed as the processors 810 and 830.
The processor 810 includes an integrated memory controller (IMC) 814 and point-to-point (P-P) interfaces 818 and 852. Similarly, the processor 830 includes an IMC 834 and P-P interfaces 838 and 854. The IMCs 814 and 834 couple the processors 810 and 830, respectively, to respective memories, a memory 812 and a memory 832. The memories 812 and 832 may be portions of the main memory (e.g., a dynamic random-access memory (DRAM)) for the platform, such as double data rate type 3 (DDR3) or type 4 (DDR4) synchronous DRAM (SDRAM). In the present embodiment, the memories 812 and 832 locally attach to the respective processors 810 and 830.
In addition to the processors 810 and 830, the system 800 may include the LCEVC device 819. The LCEVC device 819 may be connected to the chipset 860 by means of P-P interfaces 829 and 869. The LCEVC device 819 may also be connected to a memory 839. In some embodiments, the LCEVC device 819 may be connected to at least one of the processors 810 and 830. In other embodiments, the memories 812, 832, and 839 may couple with the processors 810 and 830 and the LCEVC device 819 via a bus and shared memory hub.
System 800 includes chipset 860 coupled to  processors  810 and 830. Furthermore, chipset 860 can be coupled to storage medium 803, for example, via an interface (I/F) 866.  The I/F 866 may be, for example, a Peripheral Component Interconnect-enhanced (PCI-e) . The  processors  810, 830, and the LCEVC device 819 may access the storage medium 803 through chipset 860.
Storage medium 803 may comprise any non-transitory computer-readable storage medium or machine-readable storage medium, such as an optical, magnetic or semiconductor storage medium. In various embodiments, storage medium 803 may comprise an article of manufacture. In some embodiments, storage medium 803 may store computer-executable instructions, such as computer-executable instructions 802 to implement one or more of processes or operations described herein, (e.g., process 700 of FIG. 7) . The storage medium 803 may store computer-executable instructions for any equations depicted above. The storage medium 803 may further store computer-executable instructions for models and/or networks described herein, such as a neural network or the like. Examples of a computer-readable storage medium or machine-readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of computer-executable instructions may include any suitable types of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. It should be understood that the embodiments are not limited in this context.
The processor 810 couples to a chipset 860 via  P-P interfaces  852 and 862 and the processor 830 couples to a chipset 860 via  P-P interfaces  854 and 864. Direct Media Interfaces (DMIs) may couple the  P-P interfaces  852 and 862 and the P-P interfaces 854 and 864, respectively. The DMI may be a high-speed interconnect that facilitates, e.g., eight Giga Transfers per second (GT/s) such as DMI 3.0. In other embodiments, the  processors  810 and 830 may interconnect via a bus.
The chipset 860 may comprise a controller hub such as a platform controller hub (PCH) . The chipset 860 may include a system clock to perform clocking functions and include interfaces for an I/O bus such as a universal serial bus (USB) , peripheral component interconnects (PCIs) , serial peripheral interconnects (SPIs) , integrated interconnects (I2Cs) , and the like, to facilitate connection of peripheral devices on the platform. In other embodiments, the chipset 860 may comprise more than one controller hub such as a chipset with a memory controller hub, a graphics controller hub, and an input/output (I/O) controller hub.
In the present embodiment, the chipset 860 couples with a trusted platform module (TPM) 872 and the UEFI, BIOS, Flash component 874 via an interface (I/F) 870. The TPM 872 is a dedicated microcontroller designed to secure hardware by integrating cryptographic keys into devices. The UEFI, BIOS, Flash component 874 may provide pre-boot code.
Furthermore, chipset 860 includes the I/F 866 to couple chipset 860 with a high-performance graphics engine, graphics card 865. In other embodiments, the system 800 may include a flexible display interface (FDI) between the  processors  810 and 830 and the chipset 860. The FDI interconnects a graphics processor core in a processor with the chipset 860.
Various I/O devices 892 couple to the bus 881, along with a bus bridge 880 which couples the bus 881 to a second bus 891 and an I/F 868 that connects the bus 881 with the chipset 860. In one embodiment, the second bus 891 may be a low pin count (LPC) bus. Various devices may couple to the second bus 891 including, for example, a keyboard 882, a mouse 884, communication devices 886, a storage medium 801, and an audio I/O 890.
The artificial intelligence (AI) accelerator 867 may be circuitry arranged to perform computations related to AI. The AI accelerator 867 may be connected to the storage medium 803 and the chipset 860. The AI accelerator 867 may deliver the processing power and energy efficiency needed to enable abundant-data computing. The AI accelerator 867 is a class of specialized hardware accelerators or computer systems designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. The AI accelerator 867 may be applicable to algorithms for robotics, the internet of things, and other data-intensive and/or sensor-driven tasks.
Many of the I/O devices 892, communication devices 886, and the storage medium 801 may reside on the motherboard 805 while the keyboard 882 and the mouse 884 may be add-on peripherals. In other embodiments, some or all the I/O devices 892, communication devices 886, and the storage medium 801 are add-on peripherals and do not reside on the motherboard 805.
Some examples may be described using the expression “in one example” or “an example” along with their derivatives. These terms mean that a particular feature, structure, or characteristic described in connection with the example is included in at least one example. The appearances of the phrase “in one example” in various places in the specification are not necessarily all referring to the same example.
Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two  or more elements are in direct physical or electrical contact with each other. The term “coupled, ” however, may also mean that two or more elements are not in direct contact with each other, yet still co-operate or interact with each other.
In addition, in the foregoing Detailed Description, various features are grouped together in a single example to streamline the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed examples require more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive subject matter lies in less than all features of a single disclosed example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate example. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein, ” respectively. Moreover, the terms “first, ” “second, ” “third, ” and so forth, are used merely as labels and are not intended to impose numerical requirements on their objects.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code to reduce the number of times code must be retrieved from bulk storage during execution. The term “code” covers a broad range of software components and constructs, including applications, drivers, processes, routines, methods, modules, firmware, microcode, and subprograms. Thus, the term “code” may be used to refer to any collection of instructions that, when executed by a processing system, perform a desired operation or operations.
Logic circuitry, devices, and interfaces herein described may perform functions implemented in hardware and implemented with code executed on one or more processors. Logic circuitry refers to the hardware or the hardware and code that implements one or more logical functions. Circuitry is hardware and may refer to one or more circuits. Each circuit may perform a particular function. A circuit of the circuitry may comprise discrete electrical components interconnected with one or more conductors, an integrated circuit, a chip package,  a chipset, memory, or the like. Integrated circuits include circuits created on a substrate such as a silicon wafer and may comprise components. Integrated circuits, processor packages, chip packages, and chipsets may comprise one or more processors.
Processors may receive signals such as instructions and/or data at the input (s) and process the signals to generate at least one output. While executing code, the code changes the physical states and characteristics of transistors that make up a processor pipeline. The physical states of the transistors translate into logical bits of ones and zeros stored in registers within the processor. The processor can transfer the physical states of the transistors into registers and transfer the physical states of the transistors to another storage medium.
A processor may comprise circuits to perform one or more sub-functions implemented to perform the overall function of the processor. One example of a processor is a state machine or an application-specific integrated circuit (ASIC) that includes at least one input and at least one output. A state machine may manipulate the at least one input to generate the at least one output by performing a predetermined series of serial and/or parallel manipulations or transformations on the at least one input.
The logic as described above may be part of the design for an integrated circuit chip. The chip design is created in a graphical computer programming language, and stored in a computer storage medium or data storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network) . If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly. The stored design is then converted into the appropriate format (e.g., GDSII) for the fabrication.
The resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips) , as a bare die, or in a packaged form. In the latter case, the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher-level carrier) or in a multichip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections) . In any case, the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a processor board, a server platform, or a motherboard, or (b) an end product.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration. ” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. The terms “computing  device, ” “user device, ” “communication station, ” “station, ” “handheld device, ” “mobile device, ” “wireless device” and “user equipment” (UE) as used herein refers to a wireless communication device such as a cellular telephone, a smartphone, a tablet, a netbook, a wireless terminal, a laptop computer, a femtocell, a high data rate (HDR) subscriber station, an access point, a printer, a point of sale device, an access terminal, or other personal communication system (PCS) device. The device may be either mobile or stationary.
As used within this document, the term “communicate” is intended to include transmitting, or receiving, or both transmitting and receiving. This may be particularly useful in claims when describing the organization of data that is being transmitted by one device and received by another, but only the functionality of one of those devices is required to infringe the claim. Similarly, the bidirectional exchange of data between two devices (both devices transmit and receive during the exchange) may be described as “communicating, ” when only the functionality of one of those devices is being claimed. The term “communicating” as used herein with respect to a wireless communication signal includes transmitting the wireless communication signal and/or receiving the wireless communication signal. For example, a wireless communication unit, which is capable of communicating a wireless communication signal, may include a wireless transmitter to transmit the wireless communication signal to at least one other wireless communication unit, and/or a wireless communication receiver to receive the wireless communication signal from at least one other wireless communication unit.
As used herein, unless otherwise specified, the use of the ordinal adjectives “first, ” “second, ” “third, ” etc., to describe a common object, merely indicates that different instances of like objects are being referred to and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
Some embodiments may be used in conjunction with various devices and systems, for example, a personal computer (PC) , a desktop computer, a mobile computer, a laptop computer, a notebook computer, a tablet computer, a server computer, a handheld computer, a handheld device, a personal digital assistant (PDA) device, a handheld PDA device, an on-board device, an off-board device, a hybrid device, a vehicular device, a non-vehicular device, a mobile or portable device, a consumer device, a non-mobile or non-portable device, a wireless communication station, a wireless communication device, a wireless access point (AP) , a wired or wireless router, a wired or wireless modem, a video device, an audio device, an audio-video (A/V) device, a wired or wireless network, a wireless area network, a wireless video area network (WVAN) , a local area network (LAN) , a wireless LAN (WLAN) , a personal area network (PAN) , a wireless PAN (WPAN) , and the like.
Some embodiments may be used in conjunction with one way and/or two-way radio communication systems, cellular radio-telephone communication systems, a mobile phone, a cellular telephone, a wireless telephone, a personal communication system (PCS) device, a PDA device which incorporates a wireless communication device, a mobile or portable global positioning system (GPS) device, a device which incorporates a GPS receiver or transceiver or chip, a device which incorporates an RFID element or chip, a multiple input multiple output (MIMO) transceiver or device, a single input multiple output (SIMO) transceiver or device, a multiple input single output (MISO) transceiver or device, a device having one or more internal antennas and/or external antennas, digital video broadcast (DVB) devices or systems, multi-standard radio devices or systems, a wired or wireless handheld device, e.g., a smartphone, a wireless application protocol (WAP) device, or the like.
Some embodiments may be used in conjunction with one or more types of wireless communication signals and/or systems following one or more wireless communication protocols, for example, radio frequency (RF) , infrared (IR) , frequency-division multiplexing (FDM) , orthogonal FDM (OFDM) , time-division multiplexing (TDM) , time-division multiple access (TDMA) , extended TDMA (E-TDMA) , general packet radio service (GPRS) , extended GPRS, code-division multiple access (CDMA) , wideband CDMA (WCDMA) , CDMA 2000, single-carrier CDMA, multi-carrier CDMA, multi-carrier modulation (MDM) , discrete multi-tone (DMT) , 
Figure PCTCN2022106236-appb-000015
global positioning system (GPS) , Wi-Fi, Wi-Max, ZigBee, ultra-wideband (UWB) , global system for mobile communications (GSM) , 2G, 2.5G, 3G, 3.5G, 4G, fifth generation (5G) mobile networks, 3GPP, long term evolution (LTE) , LTE advanced, enhanced data rates for GSM Evolution (EDGE) , or the like. Other embodiments may be used in various other devices, systems, and/or networks.
The following examples pertain to further embodiments.
Example 1 may be an apparatus of a device for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the apparatus comprising processing circuitry coupled to memory, the processing circuitry configured to: identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decode a first video frame of the first layer of the bitstream using a base decoder; up-sample the decoded first video frame; decode encoded video data of a first enhancement layer of the enhancement layers; generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sample the first combined intermediate video frame; decode encoded video data of a second enhancement layer of the enhancement layers; select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
Example 2 may include the apparatus of example 1 and/or some other example herein, wherein the processing circuitry is further configured to: identify a global configuration syntax of the bitstream; and identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the apparatus is enabled to select from among the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
Example 3 may include the apparatus of example 2 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
Example 4 may include the apparatus of example 2 or example 3 and/or some other example herein, wherein the processing circuitry is further configured to: identify a picture configuration syntax of the bitstream; identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decode temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
Example 5 may include the apparatus of example 4 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
Example 6 may include the apparatus of example 4 and/or some other example herein, wherein the processing circuitry is further configured to: identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
Example 7 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
Example 8 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
Example 9 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
Example 10 may include the apparatus of example 6 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is three video frames from the nearest video frame.
Example 11 may include a computer-readable storage medium comprising instructions to cause processing circuitry of a device for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , upon execution of the instructions by the processing circuitry, to: identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decode a first video frame of the first layer of the bitstream using a base decoder; up-sample the decoded first video frame; decode encoded video data of a first enhancement layer of the enhancement layers; generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sample the first combined intermediate video frame; decode encoded video data of a second enhancement layer of the enhancement layers; select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
Example 12 may include the computer-readable medium of example 11 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a global configuration syntax of the bitstream; and identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among  the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
Example 13 may include computer-readable medium of example 12 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
Example 14 may include the computer-readable medium of example 12 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a picture configuration syntax of the bitstream; identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decode temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
Example 15 may include the computer-readable medium of example 14 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
Example 16 may include the computer-readable medium of example 14 and/or some other example herein, wherein execution of the instructions further causes the processing circuitry to: identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
Example 17 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
Example 18 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
Example 19 may include the computer-readable medium of example 16 and/or some other example herein, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
Example 20 may include a method for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the method comprising: identifying, by a  first device, a bitstream received from a second device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decoding a first video frame of the first layer of the bitstream using a base decoder; up-sampling the decoded first video frame; decoding encoded video data of a first enhancement layer of the enhancement layers; generating a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sampling the first combined intermediate video frame; decoding encoded video data of a second enhancement layer of the enhancement layers; selecting, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generating a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generating a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
Example 21 may include the method of example 20 and/or some other example herein, further comprising: identifying a global configuration syntax of the bitstream; and identifying a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among the multiple temporal layer reference frames, wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
Example 22 may include the method of example 21 and/or some other example herein, wherein the temporal layer multiple reference frame indicator consists of only one bit.
Example 23 may include the method of example 21 and/or some other example herein, further comprising: identifying a picture configuration syntax of the bitstream; identifying a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and identifying a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and decoding temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
Example 24 may include the method of example 23 and/or some other example herein, wherein the refresh frame indicator and the reference frame index consist of two bytes.
Example 25 may include the method of example 23 and/or some other example herein, further comprising: identifying a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
Example 26 may include an apparatus comprising means for: identifying a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC; decoding a first video frame of the first layer of the bitstream using a base decoder; up-sampling the decoded first video frame; decoding encoded video data of a first enhancement layer of the enhancement layers; generating a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer; up-sampling the first combined intermediate video frame; decoding encoded video data of a second enhancement layer of the enhancement layers; selecting, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer; generating a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and generating a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
Example 27 may include one or more non-transitory computer-readable media comprising instructions to cause an electronic device, upon execution of the instructions by one or more processors of the electronic device, to perform one or more elements of a method described in or related to any of examples 1-26, or any other method or process described herein.
Example 28 may include an apparatus comprising logic, modules, and/or circuitry to perform one or more elements of a method described in or related to any of examples 1-26, or any other method or process described herein.
Example 29 may include a method, technique, or process as described in or related to any of examples 1-26, or portions or parts thereof.
Example 30 may include an apparatus comprising: one or more processors and one or more computer readable media comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform the method, techniques, or process as described in or related to any of examples 1-26, or portions thereof.
Embodiments according to the disclosure are in particular disclosed in the attached claims directed to a method, a storage medium, a device and a computer program product, wherein any feature mentioned in one claim category, e.g., method, can be claimed in another claim category, e.g., system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof are disclosed and can be claimed regardless of the dependencies chosen in the attached claims. The subject-matter which can be claimed comprises not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein can be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.
The foregoing description of one or more implementations provides illustration and description, but is not intended to be exhaustive or to limit the scope of embodiments to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of various embodiments.
Certain aspects of the disclosure are described above with reference to block and flow diagrams of systems, methods, apparatuses, and/or computer program products according to various implementations. It will be understood that one or more blocks of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and the flow diagrams, respectively, may be implemented by computer-executable program instructions. Likewise, some blocks of the block diagrams and flow diagrams may not necessarily need to be performed in the order presented, or may not necessarily need to be performed at all, according to some implementations.
These computer-executable program instructions may be loaded onto a special-purpose computer or other particular machine, a processor, or other programmable data processing apparatus to produce a particular machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks. These computer program instructions may also be stored in a computer-readable storage media or memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable storage media produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks. As an example, certain  implementations may provide for a computer program product, comprising a computer-readable storage medium having a computer-readable program code or program instructions implemented therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
Accordingly, blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, may be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.
Conditional language, such as, among others, “can, ” “could, ” “might, ” or “may, ” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain implementations could include, while other implementations do not include, certain features, elements, and/or operations. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more implementations or that one or more implementations necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular implementation.
Many modifications and other implementations of the disclosure set forth herein will be apparent having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the disclosure is not to be limited to the specific implementations disclosed and that modifications and other implementations are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (25)

  1. An apparatus for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the apparatus comprising processing circuitry coupled to memory, the processing circuitry configured to:
    identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC;
    decode a first video frame of the first layer of the bitstream using a base decoder;
    up-sample the decoded first video frame;
    decode encoded video data of a first enhancement layer of the enhancement layers;
    generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer;
    up-sample the first combined intermediate video frame;
    decode encoded video data of a second enhancement layer of the enhancement layers;
    select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer;
    generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and
    generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  2. The apparatus of claim 1, wherein the processing circuitry is further configured to:
    identify a global configuration syntax of the bitstream; and
    identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the apparatus is enabled to select from among the multiple temporal layer reference frames,
    wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  3. The apparatus of claim 2, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  4. The apparatus of any of claims 2 or 3, wherein the processing circuitry is further configured to:
    identify a picture configuration syntax of the bitstream;
    identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and
    identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and
    decode temporal data of the bitstream based on the reference frame,
    wherein to decode the temporal data is based on the temporal prediction.
  5. The apparatus of claim 4, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  6. The apparatus of claim 4, wherein the processing circuitry is further configured to:
    identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
  7. The apparatus of claim 6, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
  8. The apparatus of claim 6, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
  9. The apparatus of claim 6, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
  10. The apparatus of claim 6, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is three video frames from the nearest video frame.
  11. A computer-readable storage medium comprising instructions to cause processing circuitry of a device for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , upon execution of the instructions by the processing circuitry, to:
    identify a bitstream received from a device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC;
    decode a first video frame of the first layer of the bitstream using a base decoder;
    up-sample the decoded first video frame;
    decode encoded video data of a first enhancement layer of the enhancement layers;
    generate a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer;
    up-sample the first combined intermediate video frame;
    decode encoded video data of a second enhancement layer of the enhancement layers;
    select, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer;
    generate a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and
    generate a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  12. The computer-readable medium of claim 11, wherein execution of the instructions further causes the processing circuitry to:
    identify a global configuration syntax of the bitstream; and
    identify a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among the multiple temporal layer reference frames,
    wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  13. The computer-readable medium of claim 12, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  14. The computer-readable medium of claim 12, wherein execution of the instructions further causes the processing circuitry to:
    identify a picture configuration syntax of the bitstream;
    identify a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and
    identify a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and
    decode temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
  15. The computer-readable medium of claim 14, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  16. The computer-readable medium of claim 14, wherein execution of the instructions further causes the processing circuitry to:
    identify a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.
  17. The computer-readable medium of claim 16, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a nearest video frame of the multiple temporal layer reference frames.
  18. The computer-readable medium of claim 16, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is one video frame from the nearest video frame.
  19. The computer-readable medium of claim 16, wherein the temporal prediction type indicator indicates that the temporal prediction is based on a video frame of the multiple temporal layer reference frames that is two video frames from the nearest video frame.
  20. A method for decoding video data encoded using low-complexity enhancement video coding (LCEVC) , the method comprising:
    identifying, by a first device, a bitstream received from a second device, the bitstream comprising a first layer encoded using a base encoder and enhancement layers encoded using LCEVC;
    decoding a first video frame of the first layer of the bitstream using a base decoder;
    up-sampling the decoded first video frame;
    decoding encoded video data of a first enhancement layer of the enhancement layers;
    generating a first combined intermediate video frame by combining the up-sampled first video frame and the decoded video data of the first enhancement layer;
    up-sampling the first combined intermediate video frame;
    decoding encoded video data of a second enhancement layer of the enhancement layers;
    selecting, from among multiple temporal layer reference frames concurrently stored in a reference frame buffer, a reference frame with which to combine the decoded video data of the second enhancement layer;
    generating a second combined intermediate video frame by combining the decoded video data of the second enhancement layer and the selected reference frame; and
    generating a combined output video frame by combining the first combined intermediate video frame and the second combined intermediate video frame.
  21. The method of claim 20, further comprising:
    identifying a global configuration syntax of the bitstream; and
    identifying a temporal layer multiple reference frame indicator in the global configuration syntax, the temporal layer multiple reference frame indicator indicating that the device is enabled to select from among the multiple temporal layer reference frames,
    wherein the selection of the reference frame is based on the identification of the temporal layer multiple reference frame indicator.
  22. The method of claim 21, wherein the temporal layer multiple reference frame indicator consists of only one bit.
  23. The method of claim 21, further comprising:
    identifying a picture configuration syntax of the bitstream;
    identifying a refresh frame indicator in the picture configuration syntax, the refresh frame indicator indicating that a current video frame, a previous video frame, or a future reference frame is the reference frame; and
    identifying a reference frame index in the picture configuration syntax, the reference frame index indicating which of the multiple temporal layer reference frames is to be used for temporal prediction; and
    decoding temporal data of the bitstream based on the reference frame, wherein to decode the temporal data is based on the temporal prediction.
  24. The method of claim 23, wherein the refresh frame indicator and the reference frame index consist of two bytes.
  25. The method of claim 23, further comprising:
    identifying a temporal prediction type indicator in syntax of the bitstream, wherein the temporal prediction is based on the temporal prediction type indicator.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/106236 WO2024016106A1 (en) 2022-07-18 2022-07-18 Low-complexity enhancement video coding using multiple reference frames

Publications (1)

Publication Number Publication Date
WO2024016106A1 true WO2024016106A1 (en) 2024-01-25

Family

ID=89616631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106236 WO2024016106A1 (en) 2022-07-18 2022-07-18 Low-complexity enhancement video coding using multiple reference frames

Country Status (1)

Country Link
WO (1) WO2024016106A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180077421A1 (en) * 2016-09-09 2018-03-15 Microsoft Technology Licensing, Llc Loss Detection for Encoded Video Transmission
WO2021064412A1 (en) * 2019-10-02 2021-04-08 V-Nova International Limited Use of embedded signalling to correct signal impairments
US20210297681A1 (en) * 2018-07-15 2021-09-23 V-Nova International Limited Low complexity enhancement video coding
CN113994685A (en) * 2019-04-16 2022-01-28 V-Nova International Limited Exchanging information in scalable video coding
WO2022079450A1 (en) * 2020-10-16 2022-04-21 V-Nova International Ltd Distributed analysis of a multi-layer signal encoding
CN114424547A (en) * 2019-07-05 2022-04-29 V-Nova International Limited Quantization of residual in video coding
CN114503573A (en) * 2019-03-20 2022-05-13 V-Nova International Limited Low complexity enhanced video coding

Similar Documents

Publication Publication Date Title
US11463709B2 (en) Encoder, a decoder and corresponding methods using intra block copy (IBC) dedicated buffer and default value refreshing for luma and chroma component
US9918082B2 (en) Continuous prediction domain
US8705624B2 (en) Parallel decoding for scalable video coding
US9407915B2 (en) Lossless video coding with sub-frame level optimal quantization values
US20210211657A1 (en) Apparatus and Method for Filtering in Video Coding with Look-Up Table Selected Based on Bitstream Information
US20210400304A1 (en) Encoder, a decoder and corresponding methods using ibc search range optimization for arbitrary ctu size
US20160007038A1 (en) Late-stage mode conversions in pipelined video encoders
JP2017515339A (en) Method and apparatus for lossless video coding signaling
JP2017515339A5 (en)
KR102390162B1 (en) Apparatus and method for encoding data
EP4229868A1 (en) Joint termination of bidirectional data blocks for parallel coding
US20210266602A1 (en) Separate merge list for subblock merge candidates and intra-inter techniques harmonization for video coding
CN117980916A (en) Transducer-based architecture for transform coding of media
US8576916B2 (en) Method and apparatus for reducing bus traffic of a texture decoding module in a video decoder
WO2024016106A1 (en) Low-complexity enhancement video coding using multiple reference frames
US20220116611A1 (en) Enhanced video coding using region-based adaptive quality tuning
WO2024065464A1 (en) Low-complexity enhancement video coding using tile-level quantization parameters
US20220109825A1 (en) Validation framework for media encode systems
WO2023184206A1 (en) Enhanced presentation of tiles of residual sub-layers in low complexity enhancement video coding encoded bitstream
US20220116595A1 (en) Enhanced video coding using a single mode decision engine for multiple codecs
US20220094931A1 (en) Low frequency non-separable transform and multiple transform selection deadlock prevention
US20230012862A1 (en) Bit-rate-based variable accuracy level of encoding
US20230027742A1 (en) Complexity aware encoding
US20230010681A1 (en) Bit-rate-based hybrid encoding on video hardware assisted central processing units
US20220094984A1 (en) Unrestricted intra content to improve video quality of real-time encoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22951385

Country of ref document: EP

Kind code of ref document: A1