GB2619186A - Low complexity enhancement video coding

Low complexity enhancement video coding

Info

Publication number
GB2619186A
GB2619186A (Application GB2312668.3A)
Authority
GB
United Kingdom
Prior art keywords
temporal
value
bit
residuals
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2312668.3A
Other versions
GB202312668D0 (en)
GB2619186B (en)
Inventor
Meardi Guido
Ferrara Simone
Ciccarelli Lorenzo
Damnjanovic Ivan
Clucas Richard
Littlewood Sam
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
V Nova International Ltd
Original Assignee
V Nova International Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB1903844.7A external-priority patent/GB201903844D0/en
Priority claimed from GBGB1904014.6A external-priority patent/GB201904014D0/en
Priority claimed from GBGB1904492.4A external-priority patent/GB201904492D0/en
Priority claimed from GBGB1905325.5A external-priority patent/GB201905325D0/en
Priority claimed from GBGB1909701.3A external-priority patent/GB201909701D0/en
Priority claimed from GBGB1909724.5A external-priority patent/GB201909724D0/en
Priority claimed from GBGB1909997.7A external-priority patent/GB201909997D0/en
Priority claimed from GBGB1910674.9A external-priority patent/GB201910674D0/en
Priority claimed from GBGB1911467.7A external-priority patent/GB201911467D0/en
Priority claimed from GBGB1911546.8A external-priority patent/GB201911546D0/en
Priority claimed from GB201914215A external-priority patent/GB201914215D0/en
Priority claimed from GB201914414A external-priority patent/GB201914414D0/en
Priority claimed from GB201914634A external-priority patent/GB201914634D0/en
Priority claimed from GB201915553A external-priority patent/GB201915553D0/en
Priority claimed from GBGB1916090.2A external-priority patent/GB201916090D0/en
Priority claimed from GBGB1918099.1A external-priority patent/GB201918099D0/en
Priority claimed from GBGB2000430.5A external-priority patent/GB202000430D0/en
Priority claimed from GBGB2000483.4A external-priority patent/GB202000483D0/en
Priority claimed from GBGB2000600.3A external-priority patent/GB202000600D0/en
Priority claimed from GBGB2001408.0A external-priority patent/GB202001408D0/en
Application filed by V Nova International Ltd filed Critical V Nova International Ltd
Priority claimed from GB2303563.7A external-priority patent/GB2614983B/en
Publication of GB202312668D0 publication Critical patent/GB202312668D0/en
Publication of GB2619186A publication Critical patent/GB2619186A/en
Application granted granted Critical
Publication of GB2619186B publication Critical patent/GB2619186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 Selection of coding mode or of prediction mode
    • H04N 19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N 19/117 Filters, e.g. for pre-processing or post-processing
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 Incoming video signal characteristics or properties
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/18 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a set of transform coefficients
    • H04N 19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H04N 19/34 Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H04N 19/46 Embedding additional information in the video signal during the compression process
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N 19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/124 Quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Examples of low complexity enhancement video coding are described. The claimed invention relates to a bitstream for transmitting one or more enhancement residual planes to be added to a set of preliminary pictures obtained from a decoder-reconstructed video. The application relates to a decoder configuration for controlling a decoding process of the bitstream comprising a no-enhancement variable, wherein a first value of the no-enhancement variable indicates that a decoder should perform dequantization and transformation processes on entropy-decoded quantized coefficients to obtain an array of residuals, and wherein a second value of the no-enhancement variable indicates that the decoder should set values of an array to a predetermined value, wherein the decoder should invoke a picture reconstruction process based on the array. A related decoding process is also claimed.

Description

LOW COMPLEXITY ENHANCEMENT VIDEO CODING
TECHNICAL FIELD
The present invention relates to a video coding technology. In particular, the present invention relates to methods and systems for encoding and decoding video data. In certain examples, the methods and systems may be used to generate a compressed representation for streaming and/or storage.
BACKGROUND
Typical comparative video codecs operate using a single-layer, block-based approach, whereby an original signal is processed using a number of coding tools in order to produce an encoded signal which can then be reconstructed by a corresponding decoding process. For simplicity, coding and decoding algorithms or processes are often referred to as "codecs"; the term "codec" being used to cover one or more of encoding and decoding processes that are designed according to a common framework. Such typical codecs include, but are not limited to, MPEG-2, AVC/H.264, HEVC/H.265, VP8, VP9 and AV1. There are also other codecs that are currently under development by international standards organizations, such as MPEG/ISO/ITU, as well as industry consortia such as the Alliance for Open Media (AoM).
In general, video service providers need to work with complex ecosystems. The selection of a video codec is often based on various factors, including maximum compatibility with existing ecosystems and the cost of deploying the technology (e.g. both resource and monetary costs). Once a selection is made, it is difficult to change codecs without further massive investments in the form of equipment and time. Currently, it is difficult to upgrade an ecosystem without needing to replace it completely. Further, the resource cost and complexity of delivering an increasing number of services, sometimes using decentralised infrastructures such as so-called "cloud" configurations, are becoming a key concern for service operators, small and big alike. This is compounded by the rise in low-resource battery-powered edge devices (e.g. nodes in the so-called Internet of Things).
All these factors need to be balanced with a need to reduce resource usage, e.g. to become more environmentally friendly, and a need to scale, e.g. to increase the number of users and provided services.
There is also a problem that many comparative codecs were developed in a time where large-scale commodity hardware was unavailable. This is not the case today. Large-scale data centres provide cheap generic data processing hardware. This is at odds with traditional video coding solutions that require bespoke hardware to operate efficiently.
SUMMARY
Aspects of the present invention are set out in the appended independent claims. Certain variations of the invention are then set out in the appended dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Examples of the invention will now be described, by way of example only, with reference to the accompanying drawings.
Figure 1 is a schematic illustration of an encoder according to a first example.
Figure 2 is a schematic illustration of a decoder according to a first example.
Figure 3A is a schematic illustration of an encoder according to a first variation of a second example.
Figure 3B is a schematic illustration of an encoder according to a second variation of the second example.
Figure 4 is a schematic illustration of an encoder according to a third example.
Figure 5A is a schematic illustration of a decoder according to a second example.
Figure 5B is a schematic illustration of a first variation of a decoder according to a third example.
Figure 5C is a schematic illustration of a second variation of the decoder according to the third example.
Figure 6A is a schematic illustration showing an example 4 by 4 coding unit of residuals.
Figure 6B is a schematic illustration showing how coding units may be arranged in tiles.
Figures 7A to 7C are schematic illustrations showing possible colour plane arrangements.
Figure 8 is a flow chart showing a method of configuring a bit stream.
Figure 9A is a schematic illustration showing how a colour plane may be decomposed into a plurality of layers.
Figures 9B to 9J are schematic illustrations showing various methods of up-sampling.
Figures 10A to 10I are schematic illustrations showing various methods of entropy encoding quantized data.
Figures 11A to 11C are schematic illustrations showing aspects of different temporal modes.
Figures 12A and 12B are schematic illustrations showing components for applying temporal prediction according to examples.
Figures 12C and 12D are schematic illustrations showing how temporal signalling relates to coding units and tiles.
Figure 12E is a schematic illustration showing an example state machine for run-length encoding.
Figures 13A and 13B are two halves of a flow chart that shows a method of applying temporal processing according to an example.
Figures 14A to 14C are schematic illustrations showing example aspects of cloud control.
Figure 15 is a schematic illustration showing residual weighting according to an example.
Figures 16A to 16D are schematic illustrations showing calculation of predicted average elements according to various examples.
Figures 17A and 17B are schematic illustrations showing a rate controller that may be applied to one or more of first and second level enhancement encoding.
Figure 18 is a schematic illustration showing a rate controller according to a first example.
Figure 19 is a schematic illustration showing a rate controller according to a second example.
Figures 20A to 20D are schematic illustrations showing various aspects of quantization that may be used in examples.
Figures 21A and 21B are schematic illustrations showing different bitstream configurations.
Figures 22A to 22D are schematic illustrations showing different aspects of an example neural network up-sampler.
Figure 23 is a schematic illustration showing an example of how a frame may be encoded.
Figure 24 is a schematic illustration of a decoder according to a fourth example.
Figure 25 is a schematic illustration of an encoder according to a fifth example.
Figure 26 is a schematic illustration of a decoder according to a fifth example.
Figure 27 is a flow chart indicating a decoding process according to an example.
Figures 28A to 28E show parsing trees for a prefix coding example.
Figure 29A shows two types of bitstreams that may be used to check conformance of decoders.
Figure 29B shows an example combined decoder.
Figure 30 shows example locations of chroma samples for top and bottom fields of an example frame.
DETAILED DESCRIPTION
Introduction
Certain examples described herein relate to a framework for a new video coding technology that is flexible, adaptable, highly efficient and computationally inexpensive. It combines a selectable base codec (e.g. AVC, HEVC, or any other present or future codec) with at least two enhancement levels of coded data. The framework offers an approach that is low complexity yet provides for flexible enhancement of video data. Certain examples described herein build on a new multi-layer approach that has been developed. Details of this approach are described, for example, in US Patent Nos. US8,977,065, US8,948,248, US8,711,943, US9,129,411, US8,531,321, US9,510,018, US9,300,980, and US9,626,772 and PCT Application Nos. PCT/EP2013/059833, PCT/EP2013/059847, PCT/EP2013/059880, PCT/EP2013/059853, PCT/EP2013/059885, PCT/EP2013/059886, and PCT/IB2014/060716, which are all included herein by reference. This new multi-layer approach uses a hierarchy of layers wherein each layer may relate to a different level of quality, such as a different video resolution.
Examples of a low complexity enhancement video coding are described. Encoding and decoding methods are described, as well as corresponding encoders and decoders. The enhancement coding may operate on top of a base layer, which may provide base encoding and decoding. Spatial scaling may be applied across different layers. Only the base layer encodes full video, which may be at a lower resolution. The enhancement coding instead operates on computed sets of residuals. The sets of residuals are computed for a plurality of layers, which may represent different levels of scaling in one or more dimensions. A number of encoding and decoding components or tools are described, which may involve the application of transformations, quantization, entropy encoding and temporal buffering.
At an example decoder, an encoded base stream and one or more encoded enhancement streams may be independently decoded and combined to reconstruct an original video. The general structure of an example encoding scheme presented herein uses a down-sampled source signal encoded with a base codec, adds a first level of correction data to the decoded output of the base codec to generate a corrected picture, and then adds a further level of enhancement data to an up-sampled version of the corrected picture. An encoded stream as described herein may be considered to comprise a base stream and an enhancement stream. The enhancement stream may have multiple layers (e.g. two are described in examples). The base stream may be decodable by a hardware decoder while the enhancement stream may be suitable for software processing implementation with suitable power consumption.
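Written schematically, the reconstruction implied by this structure may be summarised as follows (the symbols below are illustrative only and are not drawn from the claims):

\hat{x} = \mathrm{Upsample}\big(\mathrm{BaseDecode}(b) + r_{1}\big) + r_{2}

where b denotes the encoded base stream, r_1 the decoded first-level (correction) residuals and r_2 the decoded second-level (enhancement) residuals.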
Certain examples described herein have a structure that provides a plurality of degrees of freedom, which in turn allows great flexibility and adaptability to many situations. This means that the coding format is suitable for many use cases including OTT transmission, live streaming, live UHD broadcast, and so on.
Although the decoded output of the base codec is not intended for viewing, it is a fully decoded video at a lower resolution, making the output compatible with existing decoders and, where considered suitable, also usable as a lower resolution output.
In the following description, certain example architectures for video encoding and decoding are described. These architectures use a small number of simple coding tools to reduce complexity. When combined synergistically, they can provide visual quality improvements when compared with a full resolution picture encoded with the base codec whilst at the same time generating flexibility in the way they can be used.
The presently described examples provide a solution to the recent desire to use less and less power and contribute to reducing the computational cost of encoding and decoding whilst increasing performance. The presently described examples may operate as a software layer on top of existing infrastructures and deliver the desired performance. The present examples provide a solution that is compatible with existing (and future) video streaming and delivery ecosystems whilst delivering video coding at a lower computational cost than would otherwise be possible with a tout-court upgrade. Combining the coding efficiency of the latest codecs with the processing power reductions of the described examples may improve a technical case for the adoption of next-generation codecs. Certain examples described herein operate upon residuals. Residuals may be computed by comparing two images or video signals. In one case, residuals are computed by comparing frames from an input video stream with frames of a reconstructed video stream. In the case of the level 1 enhancement stream as described herein, the residuals may be computed by comparing a down-sampled input video stream with a first video stream that has been encoded by a base encoder and then decoded by a base decoder (e.g. the first video stream simulates decoding and reconstruction of the down-sampled input video stream at a decoder). In the case of the level 2 enhancement stream as described herein, the residuals may be computed by comparing the input video stream (e.g. at a level of quality or resolution higher than the down-sampled or base video stream) with a second video stream that is reconstructed from an up-sampled version of the first video stream plus a set of decoded level 1 residuals (e.g. the second video stream simulates decoding both a base stream and the level 1 enhancement stream, reconstructing a video stream at a lower or down-sampled level of quality, then up-sampling this reconstructed video stream). This is, for example, shown in Figures 1 to 5C.
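As an illustration only, the level 1 and level 2 residual computations described above can be sketched as follows. This is a minimal sketch using NumPy arrays; the simple averaging down-sampler, nearest-neighbour up-sampler and the function names are assumptions made for illustration, not the down-sampling, up-sampling or encoding methods defined elsewhere herein.

import numpy as np

def downsample(frame):
    # Assumption: 2x2 averaging stands in for the down-sampling component.
    return (frame[0::2, 0::2] + frame[1::2, 0::2] +
            frame[0::2, 1::2] + frame[1::2, 1::2]) / 4.0

def upsample(frame):
    # Assumption: nearest-neighbour scaling stands in for the up-sampling component.
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def compute_residual_sets(input_frame, base_encode, base_decode, encode_l1, decode_l1):
    down = downsample(input_frame)
    base_reco = base_decode(base_encode(down))         # simulates base decoding at the decoder
    residuals_l1 = down - base_reco                     # level 1: corrects base codec artefacts
    corrected = base_reco + decode_l1(encode_l1(residuals_l1))
    residuals_l2 = input_frame - upsample(corrected)    # level 2: adds detail at the higher resolution
    return residuals_l1, residuals_l2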
In certain examples, residuals may thus be considered to be errors or differences at a particular level of quality or resolution. In described examples, there are two levels of quality or resolutions and thus two sets of residuals (levels 1 and 2). Each set of residuals described herein models a different form of error or difference. The level 1 residuals, for example, typically correct for the characteristics of the base encoder, e.g. correct artefacts that are introduced by the base encoder as part of the encoding process. In contrast, the level 2 residuals, for example, typically correct complex effects introduced by the shifting in the levels of quality and differences introduced by the level 1 correction (e.g. artefacts generated over a wider spatial scale, such as areas of 4 or 16 pixels, by the level 1 encoding pipeline). This means it is not obvious that operations performed on one set of residuals will necessarily provide the same effect for another set of residuals, e.g. each set of residuals may have different statistical patterns and sets of correlations.
In the examples described herein residuals are encoded by an encoding pipeline. This may include transformation, quantization and entropy encoding operations. It may also include residual ranking, weighting and filtering, and temporal processing. These pipelines are shown in Figures 1, 3A and 3B. Residuals are then transmitted to a decoder, e.g. as level 1 and level 2 enhancement streams, which may be combined with a base stream as a hybrid stream (or transmitted separately). In one case, a bit rate is set for a hybrid data stream that comprises the base stream and both enhancement streams, and then different adaptive bit rates are applied to the individual streams based on the data being processed to meet the set bit rate (e.g. high-quality video that is perceived with low levels of artefacts may be constructed by adaptively assigning a bit rate to different individual streams, even at a frame-by-frame level, such that constrained data may be used by the most perceptually influential individual streams, which may change as the image data changes).
The sets of residuals as described herein may be seen as sparse data, e.g. in many cases there is no difference for a given pixel or area and the resultant residual value is zero. When looking at the distribution of residuals much of the probability mass is allocated to small residual values located near zero, e.g. for certain videos values of -2, -1, 0, 1, 2, etc. occur the most frequently. In certain cases, the distribution of residual values is symmetric or near symmetric about 0. In certain test video cases, the distribution of residual values was found to take a shape similar to logarithmic or exponential distributions (e.g. symmetrically or near symmetrically) about 0. The exact distribution of residual values may depend on the content of the input video stream.
Residuals may be treated as a two-dimensional image in themselves, e.g. a delta image of differences. Seen in this manner the sparsity of the data may be seen to relate to features like "dots", small "lines", "edges", "corners", etc. that are visible in the residual images. It has been found that these features are typically not fully correlated (e.g. in space and/or in time). They have characteristics that differ from the characteristics of the image data they are derived from (e.g. pixel characteristics of the original video signal).
As the characteristics of the present residuals, including transformed residuals in the form of coefficients, differ from the characteristics of the image data they are derived from it is generally not possible to apply standard encoding approaches, e.g. such as those found in traditional Moving Picture Experts Group (MPEG) encoding and decoding standards. For example, many comparative schemes use large transforms (e.g. transforms of large areas of pixels in a normal video frame). Due to the characteristics of residuals, e.g. as described herein, it would be very inefficient to use these comparative large transforms on residual images. For example, it would be very hard to encode a small dot in a residual image using a large block designed for an area of a normal image.
Certain examples described herein address these issues by instead using small and simple transform kernels (e.g. the 2x2 or 4x4 kernels of the Directional Decomposition and the Directional Decomposition Squared, as presented herein). This moves in a different direction from comparative video coding approaches. Applying these new approaches to blocks of residuals generates compression efficiency. For example, certain transforms generate uncorrelated coefficients (e.g. in space) that may be efficiently compressed. While correlations between coefficients may be exploited, e.g. for lines in residual images, these can lead to encoding complexity, which is difficult to implement on legacy and low-resource devices, and often generates other complex artefacts that need to be corrected. In the present examples, a different transform (Hadamard) is used to encode the correction data and the residuals than in comparative approaches. For example, the transforms presented herein may be much more efficient than transforming larger blocks of data using a Discrete Cosine Transform (DCT), which is the transform used in SVC/SHVC.
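By way of illustration only, a 2x2 Hadamard-type directional decomposition of a residual block may be sketched as below. The coefficient names and the absence of normalisation are assumptions made for readability; the normative transform matrices are defined elsewhere.

def dd_transform_2x2(block):
    # block is a 2x2 coding unit of residuals: [[r00, r01], [r10, r11]]
    r00, r01 = block[0]
    r10, r11 = block[1]
    average    = r00 + r01 + r10 + r11   # "A" coefficient
    horizontal = r00 - r01 + r10 - r11   # "H" coefficient
    vertical   = r00 + r01 - r10 - r11   # "V" coefficient
    diagonal   = r00 - r01 - r10 + r11   # "D" coefficient
    return average, horizontal, vertical, diagonal

def dd_inverse_2x2(a, h, v, d):
    # Inverse of the unnormalised transform above; the factor of 4 restores scale.
    return [[(a + h + v + d) / 4, (a - h + v - d) / 4],
            [(a + h - v - d) / 4, (a - h - v + d) / 4]]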
Certain examples described herein also consider the temporal characteristics of residuals, e.g. as well as spatial characteristics. For example, in residual images details like "edges" and "dots" that may be observed in residual "images" show little temporal correlation. This is because "edges" in residual images often don't translate or rotate like edges as perceived in a normal video stream. For example, within residual images, "edges" may actually change shape over time, e.g. a head turning may be captured within multiple residual image "edges" but may not move in a standard manner (as the "edge" reflects complex differences that depend on factors such as lighting, scale factors, encoding factors etc.). These temporal aspects of residual images, e.g. residual "video" comprising sequential residual "frames" or "pictures" typically differ from the temporal aspects of conventional images, e.g. normal video frames (e.g. in the Y, U or V planes). Hence, it is not obvious how to apply conventional encoding approaches to residual images; indeed, it has been found that motion compensation approaches from comparative video encoding schemes and standards cannot encode residual data (e.g. in a useful manner).
An AVC layer within SVC may involve calculating data that are referred to in that comparative standard as "residuals". However, these comparative "residuals" are the difference between a pixel block of the data stream of that layer and a corresponding pixel block determined using either inter-frame prediction or intra-frame prediction. These comparative "residuals" are, however, very different from residuals encoded in the present examples. In SVC, the "residuals" are the difference between a pixel block of a frame and a predicted pixel block for the frame (predicted using either inter-frame prediction or intra-frame prediction). In contrast, the present examples involve calculating residuals as a difference between a coding block and a reconstructed coding block (e.g. one which has undergone down-sampling and subsequent up-sampling, and has been corrected for encoding / decoding errors). Furthermore, many comparative video encoding approaches attempt to provide temporal prediction and motion-compensation as default to conventional video data. These "built-in" approaches may not only fail when applied to sequential residual images, they may take up unnecessary processing resources (e.g. these resources may be used while actually corrupting the video encoding). It may also generate unnecessary bits that take up an assigned bit rate. It is not obvious from conventional approaches how to address these problems.
Certain examples described herein, e.g. as described in the "Temporal Aspects" section and elsewhere, provide an efficient way of predicting temporal features within residual images. Certain examples use zero-motion vector prediction to efficiently predict temporal aspects and movement within residuals. These may be seen to predict movement for relatively static features (e.g. apply the second temporal mode, inter prediction, to residual features that persist over time) and then use the first temporal mode (e.g. intra prediction) for everything else. Hence, certain examples described herein do not attempt to waste scarce resources and bit rate predicting transient uncorrelated temporal features in residual "video".
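A zero-motion-vector prediction of this kind can be sketched per coding unit as follows. This is a minimal sketch: the sum-of-absolute-values cost and the greedy mode decision are assumptions for illustration and do not reproduce the signalled temporal modes described later.

import numpy as np

def choose_temporal_mode(current, co_located):
    # Zero-motion "inter" candidate: send only the change relative to the
    # co-located block in the temporal buffer. "Intra" candidate: send the block as-is.
    delta = current - co_located
    if np.abs(delta).sum() < np.abs(current).sum():
        return 'inter', delta
    return 'intra', current

def temporal_process(residual_frame, temporal_buffer, unit=4):
    out = np.empty_like(residual_frame)
    modes = {}
    for y in range(0, residual_frame.shape[0], unit):
        for x in range(0, residual_frame.shape[1], unit):
            cur = residual_frame[y:y+unit, x:x+unit]
            mode, data = choose_temporal_mode(cur, temporal_buffer[y:y+unit, x:x+unit])
            out[y:y+unit, x:x+unit] = data
            modes[(y, x)] = mode
            temporal_buffer[y:y+unit, x:x+unit] = cur  # sketch: buffer keeps the latest residuals
    return out, modes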
Certain examples described herein allow for legacy, existing and future codecs to be enhanced. The examples may thus leverage the capabilities of these codecs as part of a base layer and provide improvements in the form of an enhancement layer.
Certain examples described herein are low complexity. They enable a base codec to be enhanced with low computational complexity and/or in a manner that enables widespread parallelisation. If down-sampling is used prior to the base codec (e.g. an application of spatial scalability), then a video signal at the original input resolution may be provided with a reduced computational complexity as compared to using the base codec at the original input resolution. This allows wide adoption of ultra-high-resolution video. For example, by a combination of processing an input video at a lower resolution with a single-layer existing codec and using a simple and small set of highly specialised tools to add details to an up-sampled version of the processed video, many advantages may be realised.
Certain examples described herein implement a number of modular yet specialised video coding tools. The tools that make up the enhancement layer (including two levels of enhancement at two different points) are designed for a particular type of data: residual data. Residual data as described herein results from a comparison of an original data signal and a reconstructed data signal. The reconstructed data signal is generated in a manner that differs from comparative video coding schemes. For example, the reconstructed data signal relates to a particular small spatial portion of an input video frame, a coding unit. A set of coding units for a frame may be processed in parallel as the residual data is not generated using other coding units for the frame or other coding units for other frames, as opposed to inter- and intra-prediction in comparative video coding technologies. Although temporal processing may be applied, this is applied at the coding unit level, using previous data for a current coding unit. There is no interdependency between coding units.
Certain specialised video coding tools described herein are specifically adapted for sparse residual data processing. Due to the differing method of generation, residual data as used herein has different properties to that of comparative video coding technologies. As shown in the Figures, certain examples described herein provide an enhancement layer that processes one or two layers of residual data. The residual data is produced by taking differences between a reference video frame (e.g., a source video) and a base-decoded version of the video (e.g. with or without up-sampling depending on the layer). The resulting residual data is sparse information, typically edges, dots and details which are then processed using small transforms which are designed to deal with sparse information.
These small transforms may be scale invariant, e.g. have integer values within a limited range.
Certain examples described herein allow efficient use of existing codecs. For example, a base encoder is typically applied at a lower resolution (e.g. than an original input signal). A base decoder is then used to decode the output of the base encoder at the lower resolution and the resultant decoded signal is used to generate the decoded data.
Because of this, the base codec operates on a smaller number of pixels, thus allowing the codec to operate at a higher level of quality (e.g. a smaller quantization step size) and use its own internal coding tools in a more efficient manner. It may also consume less power.
Certain examples described herein provide a resilient and adaptive coding process.
For example, the configuration of the enhancement layer allows the overall coding process to be resilient to the typical coding artefacts introduced by traditional Discrete Cosine Transform (DCT) block-based codecs that may be used in the base layer. The first enhancement layer (level 1 residuals) enables the correction of artefacts introduced by the base codec, whereas the second enhancement layer (level 2 residuals) enables the addition of details and sharpness to a corrected up-sampled version of the signal. The level of correction may be adjusted by controlling a bit-rate up to a version that provides maximum fidelity and lossless encoding. Typically, the worse the base reconstruction, the more the first enhancement layer may contribute to a correction (e.g. in the form of encoded residual data output by that layer). Conversely, the better the base reconstruction, the more bit-rate can be allocated to the second enhancement layer (level 2 residuals) to sharpen the video and add fine details.
Certain examples described herein provide for agnostic base layer enhancement. For example, the examples may be used to enhance any base codec, from existing codecs such as MPEG-2, VP8, AVC, HEVC, VP9, AV1, etc. to future codecs including those under development such as EVC and VVC. This is possible because the enhancement layer operates on a decoded version of the base codec, and therefore it can be used on any format as it does not require any information on how the base layer has been encoded and/or decoded.
As described below, certain examples described herein allow for parallelization of enhancement layer encoding. For example, the enhancement layer does not implement any form of inter (i.e. between) block prediction. The image is processed applying small (2x2 or 4x4) independent transform kernels over the layers of residual data. Since no prediction is made between blocks, each 2x2 or 4x4 block can be processed independently and in a parallel manner. Moreover, each layer is processed separately, thus allowing decoding of the blocks and decoding of the layers to be done in a massively parallel manner.
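Because no prediction is made between blocks, the block loop can be distributed naively, for example as in the sketch below. The process_block callable is a placeholder (an assumption) for the per-block transform, quantization and entropy coding steps.

from concurrent.futures import ProcessPoolExecutor

def iter_blocks(plane, unit=4):
    # Yield independent (y, x, block) tuples; no block depends on any other block.
    for y in range(0, plane.shape[0], unit):
        for x in range(0, plane.shape[1], unit):
            yield y, x, plane[y:y+unit, x:x+unit]

def encode_plane_in_parallel(plane, process_block, unit=4):
    blocks = list(iter_blocks(plane, unit))
    with ProcessPoolExecutor() as pool:
        encoded = pool.map(process_block, (b for _, _, b in blocks))
    return {(y, x): e for (y, x, _), e in zip(blocks, encoded)}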
With the presently described examples, errors introduced by the encoding / decoding process and the down-sampling / up-sampling process may be corrected for separately, to regenerate the original video on the decoder side. The encoded residuals and the encoded correction data are thus smaller in size than the input video itself and can therefore be sent to the decoder more efficiently than the input video (and hence more efficiently than a comparative UHD stream of the SVC and SHVC approaches).
In further comparison with SVC and SHVC, certain described examples involve sending encoded residuals and correction data to a decoder, without sending an encoded UHD stream itself. In contrast, in SVC and SHVC, both the HD and UHD images are encoded as separate video streams and sent to the decoder. The presently described examples may allow for a significant reduction in the overall bit rate for sending the encoded data to the decoder, e.g. so that the total bandwidth BW_TOTAL is less than BW_UHD. In these cases, the total bandwidth for sending both an HD stream and a UHD stream may be less than the bandwidth required by comparative standards to send just the UHD stream.
The presently described examples further allow coding units or blocks to be processed in parallel rather than sequentially. This is because the presently described examples do not apply intra-prediction; there is very limited spatial correlation between the spatial coefficients of different blocks, whereas SVC/SHVC provides for intra-prediction. This is more efficient than the comparative approaches of SVC/SHVC, which involve processing blocks sequentially (e.g. as the UHD stream relies on the predictions from various pixels of the HD stream).
The enhancement coding described in examples herein may be considered an enhancement codec that encodes and decodes streams of residual data. This differs from comparative SVC and SHVC implementations where encoders receive video data as input at each spatial resolution level and decoders output video data at each spatial resolution level. As such, the comparative SVC and SHVC may be seen as the parallel implementation of a set of codecs, where each codec has a video-in / video-out coding structure. The enhancement codecs described herein on the other hand receive residual data and also output residual data at each spatial resolution level. For example, in SVC and SHVC the outputs of each spatial resolution level are not summed to generate an output video -this would not make sense.
It should be noted that in examples references to levels 1 and 2 are to be taken as an arbitrary labelling of enhancement sub-layers. These may alternatively be referred to by different names (e.g. with a reversed numbering system, with levels 1 and 2 being respectively labelled as level 1 and level 0, and with the "level 0" base layer below being level 2).
Definitions and Terms
In certain examples described herein the following terms are used.
"access unit" -this refers to a set of Network Abstraction Layer (NAL) units that are associated with each other according to a specified classification rule. They may be consecutive in decoding order and contain a coded picture (i.e. frame) of video (in certain cases exactly one).
"base layer" -this is a layer pertaining to a coded base picture, where the "base" refers to a codec that receives processed input video data. It may pertain to a portion of a bitstream that relates to the base.
"bitstream" -this is sequence of bits, which may be supplied in the form of a NAL unit stream or a byte stream. It may form a representation of coded pictures and associated data forming one or more coded video sequences (CVSs).
"block" -an MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients. The term "coding unit" or "coding block" is also used to refer to an MxN array of samples. These terms may be used to refer to sets of picture elements (e.g. 3' values for pixels of a particular colour channel), sets of residual elements, sets of values that represent processed residual elements and/or sets of encoded values. The term -coding unit" is sometimes used to refer to a coding block of luma samples or a coding block of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples "byte" -a sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and right-most bits represent the most and least significant bits, respectively.
"byte-aligned" -a position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when the position at which it appears in a bitstream is byte-aligned.
"byte stream" -this may be used to refer to an encapsulation of a NAL unit stream containing start code prefixes and NAL units.
"chroma" -this is used as an adjective to specify that a sample array or single sample is representing a colour signal. This may be one of the two colour difference signals related to the primary colours, e.g. as represented by the symbols Cb and Cr. It may also be used to refer to channels within a set of colour channels that provide information on the colouring of a picture. The term chroma is used rather than the term chrominance in order to avoid the implication of the use of linear light transfer characteristics that is often associated with the term chrominance.
"chunk" -this is used to refer to an entropy encoded portion of data containing a quantized transform coefficient belonging to a coefficient group.
"coded picture" -this is used to refer to a set of coding units that represent a coded representation of a picture.
"coded base picture" -this may refer to a coded representation of a picture encoded using a base encoding process that is separate (and often differs from) an enhancement encoding process.
"coded representation" -a data element as represented in its coded form "coefficient group (CG)" -is used to refer to a syntactical structure containing encoded data related to a specific set of transform coefficients (i.e a set of transformed residual values).
"component-or "colour component" -this is used to refer to an array or single sample from one of a set of colour component arrays. The colour components may comprise one luma and two chroma components and/or red, green, blue (RGB) components. The colour components may not have a one-to-one sampling frequency, e.g. the components may compose a picture in 4:2:0, 4:2:2, or 4:4:4 colour format. Certain examples described herein may also refer to just a single monochrome (e.g. luma or grayscale) picture, where there is a single array or a single sample of the array that composes a picture in monochrome format.
"data block" -this is used to refer to a syntax structure containing bytes corresponding to a type of data "decoded base picture" -this is used to refer to a decoded picture derived by decoding a coded base picture.
"decoded picture--a decoded picture may be derived by decoding a coded picture. A decoded picture may be either a decoded frame, or a decoded field. A decoded field may be either a decoded top field or a decoded bottom field.
"decoded picture buffer (DPB)" -this is used to refer to a buffer holding decoded pictures for reference or output reordering "decoder" -equipment or a device that embodies a decoding process.
"decoding order" -this may refer to an order in which syntax elements are processed by the decoding process "decoding process" -this is used to refer to a process that reads a bitstream and derives decoded pictures from it.
"emulation prevention byte" -this is used in certain examples to refer to a byte equal to 0x03 that may be present within a NAL unit. Emulation prevention bytes may be used to ensure that no sequence of consecutive byte-aligned bytes in the NAL unit contains a start code prefix.
"encoder" -equipment or a device that embodies a encoding process.
"encoding process" -this is used to refer to a process that produces a bitstream (i.e. an encoded bitstream) "enhancement layer" -this is a layer pertaining to a coded enhancement data, where the enhancement data is used to enhance the "base layer (sometimes referred to as the "base"). It may pertain to a portion of a bitstream that comprises planes of residual data. The singular term is used to refer to encoding and/or decoding processes that are distinguished from the "base" encoding and/or decoding processes.
"enhancement sub-layer" -in certain examples, the enhancement layer comprises multiple sub-layers. For example, the first and second levels described below are "enhancement sub-layers" that are seen as layers of the enhancement layer.
"field" -this term is used in certain examples to refer to an assembly of alternate rows of a frame. A frame is composed of two fields, a top field and a bottom field. The term field may be used in the context of interlaced video frames.
"video frame" -in certain examples a video frame may comprise a frame composed of an array of luma samples in monochrome format or an array of luma samples and two corresponding arrays of chroma samples. The luma and chroma samples may be supplied in 4:2:0, 4:2:2, and 4:4:4 colour formats (amongst others). A frame may consist of two fields, a top field and a bottom field (e.g. these terms may be used in the context of interlaced video).
"group of pictures (GOP)--this term is used to refer to a collection of successive coded base pictures starting with an intra picture. The coded base pictures may provide the reference ordering for enhancement data for those pictures.
"instantaneous decoding refresh (IDR) picture" -this is used to refer to a picture for which an NAL unit contains a global configuration data block.
"inverse transform" -this is used to refer to part of the decoding process by which a set of transform coefficients are converted into residuals.
"layer" -this term is used in certain examples to refer to one of a set of syntactical structures in a non-branching hierarchical relationship, e.g. as used when referring to the "base" and "enhancement" layers, or the two (sub-) "layers" of the enhancement layer. "luma" -this term is used as an adjective to specify a sample array or single sample that represents a lightness or monochrome signal, e.g, as related to the primary colours.
Luma samples may be represented by the symbol or subscript Y or L. The term "luma" is used rather than the term luminance in order to avoid the implication of the use of linear light transfer characteristics that is often associated with the term luminance. The symbol L is sometimes used instead of the symbol Y to avoid confusion with the symbol y as used for vertical location, "network abstraction layer (NAL) unit (NALU)" -this is a syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of a raw byte sequence payload (RBSP -see definition below).
"network abstraction layer (NAL) unit stream" -a sequence of NAL units.
"output order" -this is used in certain examples to refer to an order in which the decoded pictures are output from the decoded picture buffer (for the decoded pictures that are to be output from the decoded picture buffer).
"partitioning" -this term is used in certain examples to refer to the division of a set into subsets. It may be used to refer to cases where each element of the set is in exactly one of the subsets.
"plane" -this term is used to refer to a collection of data related to a colour component. For example, a plane may comprise a Y (luma) or Cx (chroma) plane. In certain cases, a monochrome video may have only one colour component and so a picture 10 or frame may comprise one or more planes.
"picture" -this is used as a collective term for a field or a frame. In certain cases, the terms frame and picture are used interchangeably.
"random access" -this is used in certain examples to refer to an act of starting the decoding process for a bitstream at a point other than the beginning of the stream.
"raw byte sequence payload (RBSP)" -the RBSP is a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit. An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bits equal to 0. The RBSP may be interspersed as necessary with emulation prevention bytes.
"raw byte sequence payload (RBSP) stop bit" -this is a bit that may be set to 1 and included within a raw byte sequence payload (RBSP) after a string of data bits. The location of the end of the string of data bits within an RBSP may be identified by searching from the end of the RBSP for the RBSP stop bit, which is the last non-zero bit in the RBSP.
"reserved" -this term may refer to values of syntax elements that are not used in the bitstreams described herein but are reserved for future use or extensions. The term "reserved zeros" may refer to reserved bit values that are set to zero in examples. "residual" -this term is defined in further examples below. It generally refers to a difference between a reconstructed version of a sample or data element and a reference of that same sample or data element.
"residual plane" -this term is used to refer to a collection of residuals, e.g. that are organised in a plane structure that is analogous to a colour component plane. A residual plane may comprise a plurality of residuals (i.e. residual picture elements) that may be array elements with a value (e.g. an integer value). F'
"run length encoding" -this is a method for encoding a sequence of values in which consecutive occurrences of the same value are represented as a single value together with its number of occurrences.
"source" -this term is used in certain examples to describe the video material or some of its attributes before encoding.
"start code prefix" -this is used to refer to a unique sequence of three bytes equal to 0x000001 embedded in the byte stream as a prefix to each NAL unit. The location of a start code prefix may be used by a decoder to identify the beginning of a new NAL unit and the end of a previous NAL unit. Emulation of start code prefixes may be prevented within NAL units by the inclusion of emulation prevention bytes.
"string of data bits (SODB)" -this term refers to a sequence of some number of bits representing syntax elements present within a raw byte sequence payload prior to the raw byte sequence payload stop bit. Within an SODB, the left-most bit is considered to be the first and most significant bit, and the right-most bit is considered to be the last and least significant bit.
"syntax element" -this term may be used to refer to an element of data represented in the bitstream.
"syntax structure" -this term may be used to refer to zero or more syntax elements present together in the bitstream in a specified order.
"tile" -this term is used in certain examples to refer to a rectangular region of blocks or coding units within a particular picture, e.g, it may refer to an area of a frame that contains a plurality of coding units where the size of the coding unit is set based on an applied transform.
"transform coefficient" (or just "coefficient") -this term is used to refer to a value that is produced when a transformation is applied to a residual or data derived from a residual (e.g. a processed residual). It may be a scalar quantity, that is considered to be in a transformed domain. In one case, an 1\4 by N coding unit may be flattened into an M*N one-dimensional array. In this case, a transformation may comprise a multiplication of the one-dimensional array with an M by N transformation matrix. In this case, an output may comprise another (flattened) M*N one-dimensional array. In this output, each element may relate to a different "coefficient", e.g. for a 2x2 coding unit there may be 4 different types of coefficient. As such, the term "coefficient" may also be associated with a particular index in an inverse transform part of the decoding process, e.g. a particular index in the aforementioned one-dimensional array that represented transformed residuals.
"video coding layer (VCL) NAL unit" -this is a collective term for NAL units that have reserved values of NalUnitType and that are classified as VCL NAL units in certain examples.
As well as the terms above, the following abbreviations are sometimes used: CG -Coefficient Group; CPB -Coded Picture Buffer; CPBB -Coded Picture Buffer of the Base; CPBL -Coded Picture Buffer of the Enhancement; CU -Coding Unit; CVS -Coded Video Sequence; DPB -Decoded Picture Buffer; DPBB -Decoded Picture Buffer of the Base; DUT -Decoder Under Test; HBD -Hypothetical Base Decoder; LID -Hypothetical Demuxer; HRD -Hypothetical Reference Decoder; HSS -Hypothetical Stream Scheduler; I -Intra; IDR -Instantaneous Decoding Refresh; LSB -Least Significant Bit; MSB -Most Significant Bit; NAL -Network Abstraction Layer; P -Predictive; RBSP -Raw Byte Sequence Payload; RGB -red, green, blue (may also be used as GBR -green, blue, red, i.e. reordered RGB); RLE -Run length encoding; SEI -Supplemental Enhancement Information; SODB -String of data bits; SPS -Sequence Parameter Set; and VCL -Video Coding Layer.
Example Encoders and Decoders
First Example Encoder -General Architecture
Figure 1 shows a first example encoder 100. The illustrated components may also be implemented as steps of a corresponding encoding process.
In the encoder 100, an input full resolution video 102 is received and is processed to generate various encoded streams. At a down-sampling component 104, the input video 102 is down-sampled. An output of the down-sampling component 104 is received by a base codec that comprises a base encoder 112 and a base decoder 114. A first encoded stream (encoded base stream) 116 is produced by feeding the base codec (e.g. AVC, HEVC, or any other codec) with a down-sampled version of the input video 102. At a first subtraction component 120, a first set of residuals is obtained by taking the difference between a reconstructed base codec video as output by the base decoder 114 and the down-sampled version of the input video (i.e. as output by the down-sampling component 104).
A level 1 encoding component 122 is applied to the first set of residuals that are output by the first subtraction component 120 to produce a second encoded stream (encoded level 1 stream) 126.
In the example of Figure 1, the level 1 encoding component 122 operates with an optional level 1 temporal buffer 124. This may be used to apply temporal processing as described later below. Following a first level of encoding by the level 1 encoding component 122, the first encoded stream 126 may be decoded by a level 1 decoding component 128. A deblocking filter 130 may be applied to the output of the level 1 decoding component 128. In Figure 1, an output of the deblocking filter 130 is added to the output of the base decoder 114 (i.e. is added to the reconstructed base codec video) by a summation component 132 to generate a corrected version of the reconstructed base coded video. The output of the summation component 132 is then up-sampled by an up-sampling component 134 to produce an up-sampled version of a corrected version of the reconstructed base coded video.
At a second subtraction component 136, a difference between the up-sampled version of a corrected version of the reconstructed base coded video (i.e. the output of the up-sampling component 134) and the input video 102 is taken. This produces a second set of residuals. The second set of residuals as output by the second subtraction component 136 is passed to a level 2 encoding component 142. The level 2 encoding component 142 produces a third encoded stream (encoded level 2 stream) 146 by encoding the second set of residuals. The level 2 encoding component 142 may operate together with a level 2 temporal buffer 144 to apply temporal processing. One or more of the level 1 encoding component 122 and the level 2 encoding component 142 may apply residual selection as described below. This is shown as being controlled by a residual mode selection component 150. The residual mode selection component 150 may receive the input video 102 and apply residual mode selection based on an analysis of the input video 102. Similarly, the level 1 temporal buffer 124 and the level 2 temporal buffer 144 may operate under the control of a temporal selection component 152. The temporal selection component 152 may receive one or more of the input video 102 and the output of the down-sampling component 104 to select a temporal mode. This is explained in more detail in later examples.
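By way of illustration only, the following Python sketch mirrors the two-level flow of Figure 1 using simple stand-ins: average pooling in place of the down-sampling component, nearest-neighbour repetition in place of the up-sampling component, and coarse rounding in place of the base encoder/decoder pair. None of these stand-ins are mandated by the examples above; the sketch merely shows how the first and second sets of residuals relate to the base reconstruction.

```python
import numpy as np

def downsample(frame):
    # 2x2 average pooling stands in for the down-sampling component 104.
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(frame):
    # Nearest-neighbour repetition stands in for the up-sampling component 134.
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def base_codec(frame):
    # Placeholder for the base encoder/decoder pair: coarse rounding mimics a
    # lossy base reconstruction.
    return np.round(frame / 8.0) * 8.0

def encode_two_level(input_frame):
    down = downsample(input_frame)
    base_recon = base_codec(down)                      # reconstructed base video
    l1_residuals = down - base_recon                   # first set of residuals (level 1)
    corrected = base_recon + l1_residuals              # corrected reconstruction
    l2_residuals = input_frame - upsample(corrected)   # second set of residuals (level 2)
    return base_recon, l1_residuals, l2_residuals

frame = np.arange(64, dtype=float).reshape(8, 8)
base, l1, l2 = encode_two_level(frame)
```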
First Example Decoder - General Architecture

Figure 2 shows a first example decoder 200. The illustrated components may also be implemented as steps of a corresponding decoding process. The decoder 200 receives three encoded streams: encoded base stream 216, encoded level 1 stream 226 and encoded level 2 stream 246. These three encoded streams correspond to the three streams generated by the encoder 100 of Figure 1. In the example of Figure 2, the three encoded streams are received together with headers 256 containing further decoding information.
The encoded base stream 216 is decoded by a base decoder 218 corresponding to the base codec used in the encoder 100 (e.g. corresponding to base decoder 114 in Figure 1). At a first summation component 220, the output of the base decoder 218 is combined with a decoded first set of residuals that are obtained from the encoded level 1 stream 226. In particular, a level 1 decoding component 228 receives the encoded level 1 stream 226 and decodes the stream to produce the decoded first set of residuals. The level 1 decoding component 228 may use a level 1 temporal buffer 230 to decode the encoded level 1 stream 226. In the example of Figure 2, the output of the level 1 decoding component 228 is passed to a deblocking filter 232. The level 1 decoding component 228 may be similar to the level 1 decoding component 128 used by the encoder 100 in Figure 1. The deblocking filter 232 may also be similar to the deblocking filter 130 used by the encoder 100. In Figure 2, the output of the deblocking filter 232 forms the decoded first set of residuals that are combined with the output of the base decoder 218 by the first summation component 220.
The output of the first summation component 220 may be seen as a corrected level 1 reconstruction, where the decoded first set of residuals correct an output of the base decoder 218 at a first resolution.
At an up-sampling component 234, the combined video is up-sampled. The up-sampling component 234 may implement a form of modified up-sampling as described with respect to later examples. The output of the up-sampling component 234 is further combined with a decoded second set of residuals that are obtained from the encoded level 2 stream 246. In particular, a level 2 decoding component 248 receives the encoded level 2 stream 246 and decodes the stream to produce the decoded second set of residuals. The decoded second set of residuals, as output by the level 2 decoding component 248, are combined with the output of the up-sampling component 234 by summation component 258 to produce a decoded video 260. The decoded video 260 comprises a decoded representation of the input video 102 in Figure 1. The level 2 decoding component 248 may also use a level 2 temporal buffer 250 to apply temporal processing. One or more of the level 1 temporal buffer 230 and the level 2 temporal buffer 250 may operate under the control of a temporal selection component 252. The temporal selection component 252 is shown receiving data from headers 256. This data may comprise data to implement temporal processing at one or more of the level 1 temporal buffer 230 and the level 2 temporal buffer 250. The data may indicate a temporal mode that is applied by the temporal selection component 252 as described with reference to later examples.

Second Example Encoder - Encoding Sub-Processes and Temporal Prediction

Figures 3A and 3B show different variations of a second example encoder 300, 360. The second example encoder 300, 360 may comprise an implementation of the first example encoder 100 of Figure 1. In the examples of Figures 3A and 3B, the encoding steps of the stream are expanded in more detail to provide an example of how the steps may be performed. Figure 3A illustrates a first variation with temporal prediction provided only in the second level of the enhancement process, i.e. with respect to the level 2 encoding. Figure 3B illustrates a second variation with temporal prediction performed in processes of both levels of enhancement (i.e. levels 1 and 2).
In Figure 3A, an encoded base stream 316 is substantially created by a process as explained with respect to Figure 1 above. That is, an input video 302 is down-sampled (i.e. a down-sampling operation is applied by a down-sampling component 304 to the input video 302 to generate a down-sampled input video). The down-sampled video is then encoded using a base codec, in particular by a base encoder 312 of the base codec. An encoding operation applied by the base encoder 312 to the down-sampled input video generates an encoded base stream 316. The base codec may also be referred to as a first codec, as it may differ from a second codec that is used to produce the enhancement streams (i.e. the encoded level 1 stream 326 and the encoded level 2 stream 346). Preferably the first or base codec is a codec suitable for hardware decoding. As per Figure 1, an output of the base encoder 312 (i.e. the encoded base stream 316) is received by a base decoder 314 (e.g. that forms part of, or provides a decoding operation for, the base codec) that outputs a decoded version of the encoded base stream. The operations performed by the base encoder 312 and the base decoder 314 may be referred to as the base layer or base level. The base layer or level may be implemented separately from an enhancement or second layer or level, and the enhancement layer or level instructs and/or controls the base layer or level (e.g. the base encoder 312 and the base decoder 314).
As noted with respect to Figure 1, the enhancement layer or level may comprise two levels that produce two corresponding streams. In this context, a first level of enhancement (described herein as "level 1") provides for a set of correction data which can be combined with a decoded version of the base stream to generate a corrected picture.
This first enhancement stream is illustrated in Figures 1 and 3 as the encoded level 1 stream 326. To generate the encoded level 1 stream, the encoded base stream is decoded, i.e. an output of the base decoder 314 provides a decoded base stream. As in Figure 1, at a first subtraction component, a difference between the decoded base stream and the down-sampled input video (i.e. the output of the down-sampling component 304) is then created (i.e. a subtraction operation is applied to the down-sampled input video and the decoded base stream to generate a first set of residuals). Here the term "residuals" is used in the same manner as that known in the art, that is, the error between a reference frame and a desired frame. Here the reference frame is the decoded base stream and the desired frame is the down-sampled input video. Thus, the residuals used in the first enhancement level can be considered as a corrected video as they 'correct' the decoded base stream to the down-sampled input video that was used in the base encoding operation.
In general, the term "residuals" as used herein refers to a difference between a value of a reference array or reference frame and an actual array or frame of data. The array may be a one or two-dimensional array that represents a coding unit. For example, a coding unit may be a 2x2 or 4x4 set of residual values that correspond to similar sized areas of an input video frame. It should be noted that this generalised example is agnostic as to the encoding operations performed and the nature of the input signal. Reference to "residual data" as used herein refers to data derived from a set of residuals, e.g. a set of residuals themselves or an output of a set of data processing operations that are performed on the set of residuals. Throughout the present description, generally a set of residuals includes a plurality of residuals or residual elements, each residual or residual element corresponding to a signal element, that is, an element of the signal or original data. The signal may be an image or video. In these examples, the set of residuals corresponds to an image or frame of the video, with each residual being associated with a pixel of the signal, the pixel being the signal element.
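As a purely illustrative sketch of this definition, the Python fragment below forms a 2x2 coding unit of residuals as the element-wise difference between a reference (reconstructed) array and the corresponding original elements; the values are arbitrary.

```python
import numpy as np

# Reference (reconstructed) 2x2 coding unit and the corresponding original elements.
reference = np.array([[10, 12],
                      [ 9, 11]])
original = np.array([[11, 12],
                     [ 8, 13]])

# One residual per signal element (e.g. per pixel of the frame).
residuals = original - reference   # [[1, 0], [-1, 2]]
```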
It should be noted that the "residuals" described herein are, however, very different from "residuals" that are generated in comparative technologies such as SVC and SHVC.
In SVC, the term "residuals" is used to refer to a difference between a pixel block of a frame and a predicted pixel block for the frame, where the predicted pixel block is predicted using either inter-frame prediction or intra-frame prediction. In contrast, the present examples involve calculating residuals as a difference between a coding unit and a reconstructed coding unit, e.g. a coding unit of elements that has undergone down-sampling and subsequent up-sampling, and has been corrected for encoding / decoding errors. In the described examples, the base codec (i.e. the base encoder 312 and the base decoder 314) may comprise a different codec from the enhancement codec, e.g. the base and enhancement streams are generated by different sets of processing steps. In one case, the base encoder 312 may comprise an AVC or HEVC encoder and thus internally generates residual data that is used to generate the encoded base stream 316. However, the processes that are used by the AVC or HEVC encoder differ from those that are used to generate the encoded level 1 and level 2 streams 326, 346.
Returning to Figures 3A and 3B, an output of the first subtraction component 320, i.e. a difference that corresponds to a first set of residuals, is then encoded to generate the encoded level 1 stream 326 (i.e. an encoding operation is applied to the first set of residuals to generate a first enhancement stream). In the example implementations of Figures 3A and 3B, the encoding operation comprises several sub-operations, each of which is optional and preferred and provides particular benefits. In Figures 3A and 3B, a series of components are shown that implement these sub-operations and these may be considered to implement the level 1 and level 2 encoding 122 and 142 as shown in Figure 1. In Figures 3A and 3B, the sub-operations, in general, include a residuals ranking mode step, a transform step, a quantization step and an entropy encoding step.
For the level 1 encoding, a level 1 residuals selection or ranking component 321 receives an output of the first subtraction component 320. The level 1 residuals selection or ranking component 321 is shown as being controlled by a residual mode ranking or selection component 350 (e.g. in a similar manner to the configuration of Figure 1). In Figure 3A, ranking is performed by the residual mode ranking component 350 and applied by the level 1 selection component 321, the latter selecting or filtering the first set of residuals based on a ranking performed by the residual mode ranking component 350 (e.g. based on an analysis of the input video 102 or other data). In Figure 3B this arrangement is reversed, such that a general residual mode selection control is performed by a residual mode selection component 350 but ranking is performed at each enhancement level (e.g. as opposed to ranking based on the input video 102). In the example of Figure 3B, the ranking may be performed by the level 1 residual mode ranking component 321 based on an analysis of the first set of residuals as output by the first subtraction component 320. In general, the second example encoder 300, 360 identifies if the residuals ranking mode is selected. This may be performed by the residual mode ranking or selection component 350. If a residuals ranking mode is selected, then this may be indicated by the residual mode ranking or selection component 350 to the level 1 residuals selection or ranking component 321 to perform a residuals ranking step. The residuals ranking operation may be performed on the first set of residuals to generate a ranked set of residuals. The ranked set of residuals may be filtered so that not all residuals are encoded into the first enhancement stream 326 (or correction stream). Residual selection may comprise selecting a subset of received residuals to pass through for further encoding. Although the present examples describe a "ranking" operation, this may be seen as a general filtering operation that is performed on the first set of residuals (e.g. the output of the first subtraction component 320), i.e. the level 1 residuals selection or ranking component 321 is an implementation of a general filtering component that may modify the first set of residuals.
Filtering may be seen as setting certain residual values to zero, i.e. such that an input residual value is filtered out and does not form part of the encoded level 1 stream 326. In Figures 3A and 3B, an output of the level 1 residuals selection or ranking component 321 is then received by a level 1 transform component 322. The level 1 transform component 322 applies a transform to the first set of residuals, or the ranked or filtered first set of residuals, to generate a transformed set of residuals. The transform operation may be applied to the first set of residuals or the filtered first set of residuals depending on whether or not ranking mode is selected to generate a transformed set of residuals. A level 1 quantize component 323 is then applied to an output of the level 1 transform component 322 (i.e. the transformed set of residuals) to generate a set of quantized residuals. Entropy encoding is applied by a level 1 entropy encoding component 325 that applies an entropy encoding operation to the quantized set of residuals (or data derived from this set) to generate the first level of enhancement stream, i.e. the encoded level 1 stream 326. Hence, in the level 1 layer a first set of residuals are transformed, quantized and entropy encoded to produce the encoded level 1 stream 326. Further details of possible implementations of the transformation, quantization and entropy encoding are described with respect to later examples. Preferably, the entropy encoding operation may be a Huffman encoding operation or a run-length encoding operation or both. Optionally a control operation may be applied to the quantized set of residuals so as to correct for the effects of the ranking operation. This may be applied by the level 1 residual mode control component 324, which may operate under the control of the residual mode ranking or selection component 350.
As noted above, the enhancement stream may comprise a first level of enhancement and a second level of enhancement (i.e. levels 1 and 2). The first level of enhancement may be considered to be a corrected stream. The second level of enhancement may be considered to be a further level of enhancement that converts the corrected stream to the original input video. The further or second level of enhancement is created by encoding a further or second set of residuals which are the difference between an up-sampled version of a reconstructed level 1 video as output by the summation component 332 and the input video 302. Up-sampling is performed by an up-sampling component 334. The second set of residuals result from a subtraction applied by a second subtraction component 336, which takes the input video 302 and the output of the up-sampling component 334 as inputs.
In Figures 3A and 3B, the first set of residuals are encoded by a level 1 encoding process. This process, in the example of Figures 3A and 3B, comprises the level 1 transform component 322 and the level 1 quantize component 323. Before up-sampling, the encoded first set of residuals are decoded using an inverse quantize component 327 and an inverse transform component 328. These components act to simulate (level 1) decoding components that may be implemented at a decoder. As such, the quantized (or controlled) set of residuals that are derived from the application of the level 1 transform component 322 and the level 1 quantize component 323 are inversely quantized and inversely transformed before a de-blocking filter 330 is applied to generate a decoded first set of residuals (i.e. an inverse quantization operation is applied to the quantized first set of residuals to generate a de-quantized first set of residuals; an inverse transform operation is applied to the de-quantized first set of residuals to generate a de-transformed first set of residuals; and, a de-blocking filter operation is applied to the de-transformed first set of residuals to generate a decoded first set of residuals). The de-blocking filter 330 is optional depending on the transform applied and may comprise applying a weighted mask to each block of the de-transformed first set of residuals.
At the summation component 332, the decoded base stream as output by the base decoder 314 is combined with the decoded first set of residuals as received from the deblocking filter 330 (i.e. a summing operation is performed on the decoded base stream and the decoded first set of residuals to generate a re-created first stream). As illustrated in Figures 3A and 3B, that combination is then up-sampled by the up-sampling component 334 (i.e. an up-sampling operation is applied to the re-created first stream to generate an up-sampled re-created stream). The up-sampled stream is then compared to the input video at the second subtraction component 336, which creates the second set of residuals (i.e. a difference operation is applied to the up-sampled re-created stream to generate a further set of residuals). The second set of residuals are then encoded as the encoded level 2 enhancement stream 346 (i.e. an encoding operation is then applied to the further or second set of residuals to generate an encoded further or second enhancement stream).
As with the encoded level 1 stream, the encoding applied to the second set of (level 2) residuals may comprise several operations. Figure 3A shows a level 2 residuals selection component 340, a level 2 transform component 341, a level 2 quantize component 343 and a level 2 entropy encoding component 344. Figure 3B shows a similar set of components but in this variation the level 2 residuals selection component 340 is implemented as a level 2 residuals ranking component 340, which is under control of the residual mode selection component 350. As discussed above, ranking and selection may be performed based on one or more of the input video 102 and the individual first and second sets of residuals. In Figure 3A, a level 2 temporal buffer 345 is also provided, the contents of which are subtracted from the output of the level 2 transform component 341 by a third subtraction component 342. In other examples, the third subtraction component 342 may be located in other positions, including after the level 2 quantize component 343. As such the level 2 encoding shown in Figures 3A and 3B has steps of ranking, temporal prediction, transform, quantization and entropy encoding. In particular, the second example encoder 300, 360 may identify if a residuals ranking mode is selected. This may be performed by one or more of the residual ranking or selection component 350 and the individual level 2 selection and ranking components 340. If a residuals ranking or filtering mode is selected the residuals ranking step may be performed by one or more of the residual ranking or selection component 350 and the individual level 2 selection and ranking components 340 (i.e. a residuals ranking operation may be performed on the second set of residuals to generate a second ranked set of residuals). The second ranked set of residuals may be filtered so that not all residuals are encoded into the second enhancement stream (i.e. the encoded level 2 stream 346). The second set of residuals or the second ranked set of residuals are subsequently transformed by the level 2 transform component 341 (i.e. a transform operation is performed on the second ranked set of residuals to generate a second transformed set of residuals). As illustrated by the coupling between the output of the summation component 332 and the level 2 transform component 341, the transform operation may utilise a predicted coefficient or predicted average derived from the re-created first stream, prior to up-sampling. Further information on this predicted average computation may be found elsewhere in this document with reference to other examples. In level 2, the transformed residuals (either temporally predicted or otherwise) are then quantized and entropy encoded in the manner described elsewhere (i.e. a quantization operation is applied to the transformed set of residuals to generate a second set of quantized residuals; and, an entropy encoding operation is applied to the quantized second set of residuals to generate the second level of enhancement stream).
Figure 3A shows a variation 300 of the second example encoder where temporal prediction is performed as part of the level 2 encoding process. Temporal prediction is performed using the temporal selection component 352 and the level 2 temporal buffer 345. The temporal selection component 352 may determine a temporal processing mode as described in more detail below and control the use of the level 2 temporal buffer 345 accordingly. For example, if no temporal processing is to be performed the temporal selection component 352 may indicate that the contents of the level 2 temporal buffer 345 are to be set to 0.
Figure 3B shows a variation 360 of the second example encoder where temporal prediction is performed as part of both the level 1 and the level 2 encoding processes. In Figure 3B, a level 1 temporal buffer 361 is provided in addition to the level 2 temporal buffer 345. Although not shown, further variations where temporal processing is performed at level 1 but not level 2 are also possible.
When temporal prediction is selected, the second example encoder 300, 360 may further modify the coefficients (i.e. the transformed residuals output by a transform component) by subtracting a corresponding set of coefficients derived from an appropriate temporal buffer. The corresponding set of coefficients may comprise a set of coefficients for a same spatial area (e.g. a same coding unit as located within a frame) that are derived from a previous frame (e.g. coefficients for the same area for a previous frame). The subtraction may be applied by a subtraction component such as the third subtraction components 342 and 362 (for respective levels 2 and 1). This temporal prediction step will be further described with respect to later examples. In summary, when temporal prediction is applied, the encoded coefficients correspond to a difference between the frame and another frame of the stream. The other frame may be an earlier or later frame (or block in the frame) in the stream. Thus, instead of encoding the residuals between the up-sampled re-created stream and the input video, the encoding process may encode the difference between a transformed frame in the stream and the transformed residuals of the frame. Thus, the entropy may be reduced. Temporal prediction may be applied selectively for groups of coding units (referred to herein as "tiles") based on control information and the application of temporal prediction at a decoder may be applied by sending additional control information along with the encoded streams (e.g. within headers or as a further surface as described with reference to later examples).
As shown in Figures 3A and 3B, when temporal prediction is active, each transformed coefficient may be:

Δ = C_current − C_buffer

where the temporal buffer may store data associated with a previous frame. Temporal prediction may be performed for one colour plane or for multiple colour planes. In general, the subtraction may be applied as an element wise subtraction for a "frame" of video where the elements of the frame represent transformed coefficients, where the transform is applied with respect to a particular n by n coding unit size (e.g. 2x2 or 4x4). The difference that results from the temporal prediction (e.g. the delta Δ above) may be stored in the buffer for use for a subsequent frame. Hence, in effect, the residual that results from the temporal prediction is a coefficient residual with respect to the buffer. Although Figures 3A and 3B show temporal prediction being performed after the transform operation, it may also be performed after the quantize operation. This may avoid the need to apply the level 2 inverse quantization component 372 and/or the level 1 inverse quantize component 364.
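A minimal Python sketch of this element-wise temporal subtraction is set out below. It assumes the temporal buffer holds the co-located coefficients of the previous frame and is updated with the reconstructed coefficients for use on the next frame; this is one possible arrangement rather than a definitive implementation.

```python
import numpy as np

def temporal_predict(current_coeffs, temporal_buffer):
    # Delta = C_current - C_buffer, applied element-wise over co-located coefficients.
    delta = current_coeffs - temporal_buffer
    # The buffer is updated with the reconstructed coefficients for the next frame.
    temporal_buffer += delta
    return delta

buffer_ = np.zeros((2, 2))                       # e.g. A, H, V, D coefficients for one coding unit
frame_1 = np.array([[4.0, 1.0], [0.0, 2.0]])
frame_2 = np.array([[4.0, 1.0], [1.0, 2.0]])

delta_1 = temporal_predict(frame_1, buffer_)     # first frame: delta equals the coefficients
delta_2 = temporal_predict(frame_2, buffer_)     # second frame: mostly zeros, cheap to encode
```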
Thus, as illustrated in Figures 3A and 3B and described above, the output of the second example encoder 300, 360 after performing an encoding process is an encoded base stream 316 and one or more enhancement streams which preferably comprise an encoded level 1 stream 326 for a first level of enhancement and an encoded level 2 stream 346 for a further or second level of enhancement.
Third Example Encoder and Second Example Decoder - Predicted Residuals

Figure 4 shows a third example encoder 400 that is a variation of the first example encoder 100 of Figure 1. Corresponding reference numerals are used to refer to corresponding features from Figure 1 (i.e. where feature 1xx relates to feature 4xx in Figure 4). The example of Figure 4 shows in more detail how predicted residuals, e.g. a predicted average, may be applied as part of an up-sampling operation. Also, in Figure 4, the deblocking filter 130 is replaced by a more general configurable filter 430.
In Figure 4, a predicted residuals component 460 receives an input at a level 1 spatial resolution in the form of an output of a first summation component 432. This input comprises at least a portion of the reconstructed video at level 1 that is output by the first summation component 432. The predicted residuals component 460 also receives an input at a level 2 spatial resolution from the up-sampling component 434. The inputs may comprise a lower resolution element that is used to generate a plurality of higher resolution elements (e.g. a pixel that is then up-sampled to generate 4 pixels in a 2x2 block). The predicted residuals component 460 is configured to compute a modifier for the output of the up-sampling component 434 that is added to said output via a second summation component 462. The modifier may be computed to apply the predicted average processing that is described in detail in later examples. In particular, where an average delta is determined (e.g. a difference between a computed average coefficient and an average that is predicted from a lower level), the components of Figure 4 may be used to restore the average component outside of the level 2 encoding process 442. The output of the second summation component 462 is then used as the up-sampled input to the second subtraction component 436.
Figure 5A shows how a predicted residuals operation may be applied at a second example decoder 500. Like Figure 4, the second example decoder 500 may be considered a variation of the first example decoder 200 of Figure 2. Corresponding reference numerals are used to refer to corresponding features from Figure 2 (i.e. where feature 2xx relates to feature 5xx in Figure 5). The example of Figure 5A shows in more detail how predicted residuals, e.g. a predicted average, may be applied at the decoder as part of an up-sampling operation. Also, in Figure 5A, the deblocking filter 232 is replaced by a more general configurable filter 532. It should be noted that the predicted residuals processing may be applied asymmetrically at the encoder and the decoder, e.g. the encoder need not be configured according to Figure 4 to allow decoding as set out in Figure 5A. For example, the encoder may apply a predicted average computation as described in US Patent 9,509,990, which is incorporated herein by reference.
The configuration of the second example decoder 500 is similar to the third example encoder 400 of Figure 4. A predicted residuals component 564 receives a first input from a first summation component 530, which represents a level 1 frame, and a second input from the up-sampling component 534, which represents an up-sampled version of the level 1 frame. The inputs may be received as a lower level element and a set of corresponding higher level elements. The predicted residuals component 564 uses the inputs to compute a modifier that is added to the output of the up-sampling component 534 by the second summation component 562. The modifier may correct for use of a predicted average, e.g. as described in US Patent 9,509,990 or computed by the third example encoder 400. The modified up-sampled output is then received by a third summation component 558 that performs the level 2 correction or enhancement as per previous examples.

The use of one or more of the predicted residuals components 460 and 564 may implement the "modified up-sampling" of other examples, where the modifier computed by the components and applied by respective summation components performs the "modification". These examples may provide for faster computation of predicted averages as the modifier is added in reconstructed video space as opposed to requiring conversion to coefficient space that represents transformed residuals (e.g. the modifier is applied to pixels of reconstructed video rather than applied in the A, H, V and D coefficient space of the transformed residuals).
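The following Python sketch illustrates one way such a modifier might be computed in reconstructed video space. It assumes the modifier restores the average of each up-sampled 2x2 block to the value of the level 1 element that the block was derived from; the precise modifier computation used by a given implementation (e.g. per the referenced patent) may differ.

```python
import numpy as np

def compute_modifier(l1_recon, upsampled):
    # For each 2x2 block of the up-sampled frame, the modifier is the difference
    # between the source level 1 element and the mean of that block, so that adding
    # the modifier restores the block average to the level 1 value.
    h, w = l1_recon.shape
    block_means = upsampled.reshape(h, 2, w, 2).mean(axis=(1, 3))
    return (l1_recon - block_means).repeat(2, axis=0).repeat(2, axis=1)

l1_recon = np.array([[10.0, 20.0],
                     [30.0, 40.0]])
upsampled = l1_recon.repeat(2, axis=0).repeat(2, axis=1) + 1.0    # stand-in up-sampler output
modified = upsampled + compute_modifier(l1_recon, upsampled)      # block averages now match l1_recon
```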
Third Example Decoder - Sub-Operations and Temporal Prediction

Figures 5B and 5C illustrate respective variations of a third example decoder 580, 590. The variations of the third example decoder 580, 590 may be respectively implemented to correspond to the variations of the second example encoder 300, 360 shown in Figures 3A and 3B. The third example decoder 580, 590 may be seen as an implementation of one or more of the first and second example decoders 200, 500 from Figures 2 and 5A. As before, similar reference numerals are used where possible to refer to features that correspond to features in earlier examples.
Figures 5B and 5C show implementation examples of the decoding process described briefly above and illustrated in Figure 2. As is clearly identifiable, the decoding steps and components are expanded in more detail to provide an example of how decoding may be performed at each level. As with Figures 3A and 3B, Figure 5B illustrates a variation where temporal prediction is used only for the second level (i.e. level 2) and Figure 5C illustrates a variation where temporal prediction is used in both levels (i.e. levels 1 and 2). As before, further variations are envisaged (e.g. level 1 but not level 2), where the form of the configuration may be controlled using signalling information.
As shown in the examples of Figures 5A and 5C, in the decoding process, the decoder may parse headers 556 and configure the decoder based on those headers. The headers may comprise one or more of global configuration data, picture (i.e. frame) configuration data, and assorted data blocks (e.g. relating to elements or groups of elements within a picture). In order to re-create the input video (e.g. the input video 102, 302 or 402 in previous examples), an example decoder such as the third example decoder may decode each of the encoded base stream 516, the first enhancement or encoded level 1 stream 526 and the second enhancement or encoded level 2 stream 546. The frames of the stream may be synchronised and then combined to derive the decoded video 560.
As shown in Figure 5B, the level 1 decoding component 528 may comprise a level 1 entropy decoding component 571, a level 1 inverse quantize component 572, and a level 1 inverse transform component 573. These may comprise decoding versions of the respective level 1 encoding components 325, 323 and 322 in Figures 3A and 3B. The level 2 decoding component 548 may comprise a level 2 entropy decoding component 581, a level 2 inverse quantize component 582, and a level 2 inverse transform component 583. These may comprise decoding versions of the respective level 2 encoding components 344, 343 and 341 in Figures 3A and 3B. In each decoding process, the enhancement streams may undergo the steps of entropy decoding, inverse quantization and inverse transform using the aforementioned components or operations to re-create a set of residuals.
In particular, in Figure 5B, an encoded base stream 516 is decoded by a base decoder 518 that is implemented as part of a base codec 584. It should be noted that the base and enhancement streams are typically encoded and decoded using different codecs, wherein the enhancement codec operates on residuals (i.e. may implement the level 1 and level 2 encoding and decoding components) and the base codec operates on video at a level 1 resolution. The video at the level 1 resolution may represent a lower resolution than the base codec normally operates at (e.g. a down-sampled signal in two dimensions may be a quarter of the size), which allows the base codec to operate at a high speed. This also marks a difference from SVC wherein each layer applies a common codec (AVC) and operates on video data rather than residual data. Even in SHVC, all spatial layers are configured to operate on a video in / video out manner where each video out represents a different playable video. In the present examples, the enhancement streams do not represent playable video in the conventional sense - the output of the level 1 and level 2 decoding components 528 and 548 (e.g. as received by the first summation component 530 and the second summation component 558) are "residual videos", i.e. consecutive frames of residuals for multiple colour planes rather than the colour planes themselves. This then allows a much greater bit rate saving as compared to SVC and SHVC, as the enhancement streams will often be 0 (as a quantized difference is often 0), where 0 values may be efficiently compressed using run-length coding. It is also to be noted that in the present examples, each coding unit of n by n elements (e.g. 2x2 or 4x4 blocks of pixels that may be flattened into one-dimensional arrays) does not depend on predictions that involve other coding units within the frame as per standard intra-processing in SVC and SHVC. As such, the encoding and decoding components in the enhancement streams may be applied in parallel to different coding units (e.g. different areas of a frame may be effectively processed in parallel), as unlike SVC and SHVC there is no need to wait for a decoded result of another coding unit to compute a subsequent coding unit. This means the enhancement codec may be implemented extremely efficiently on parallel processors such as common graphic processing units in computing devices (including mobile computing devices). This parallelism is not possible with the high complexity processing of SVC and SHVC.
Returning to Figure 5B, as in previous examples an optional filter such as deblocking filter 532 may be applied to the output of the level 1 decoding component 528 to remove blocking or other artefacts and the output of the filter is received by the first summation component 530 where it is added to the output of the base codec (i.e. the decoded base stream). Note that the output of the base codec may resemble a low resolution video as decoded by a conventional codec but the level 1 decoding output is a (filtered) first set of residuals. This is different from SVC and SHVC where this form of summation makes no sense, as each layer outputs a full video at a respective spatial resolution. As in Figure 2, a modified up-sampling component 587 receives a corrected reconstruction of the video at level 1 that is output by the first summation component 530 and up-samples this to generate an up-sampled reconstruction. The modified up-sampling component 587 may apply the modified up-sampling illustrated in Figure 4. In other examples, the up-sampling may not be modified, e.g. if a predicted average is not being used or is being applied in the manner described in US Patent 9,509,990.

In Figure 5B, temporal prediction is applied during the level 2 decoding. In the example of Figure 5B, the temporal prediction is controlled by temporal prediction component 585. In this variation, control information for the temporal prediction is extracted from the encoded level 2 stream 546, as indicated by the arrow from the stream to the temporal prediction component 585. In other implementations, such as those shown in Figures 5A and 5C, control information for the temporal prediction may be sent separately from the encoded level 2 stream 546, e.g. in the headers 556. The temporal prediction component 585 controls the use of level 2 temporal buffer 550, e.g. may determine a temporal mode and control temporal refresh as described with reference to later examples. The contents of the temporal buffer 550 may be updated based on data for a previous frame of residuals. When the temporal buffer 550 is applied, the contents of the buffer are added to the second set of residuals. In Figure 5B, the contents of the temporal buffer 550 are added to the output of the level 2 decoding component 548 at a third summation component 594. In other examples, the contents of the temporal buffer may represent any set of intermediate decoding data and as such the third summation component 594 may be moved appropriately to apply the contents of the buffer at an appropriate stage (e.g. if the temporal buffer is applied at the dequantized coefficient stage, the third summation component 594 may be located before the inverse transform component 583). The temporal-corrected second set of residuals are then combined with the output of the up-sampling component 587 by the second summation component 558 to generate decoded video 560. The decoded video is at a level 2 spatial resolution, which may be higher than a level 1 spatial resolution. The second set of residuals apply a correction to the (viewable) up-sampled reconstructed video, where the correction adds back in fine detail and improves the sharpness of lines and features.
Figure 5C shows a variation 590 of the third example decoder. In this case, temporal prediction control data is received by a temporal prediction component 585 from headers 556. The temporal prediction component 585 controls both the level 1 and level 2 temporal prediction, but in other examples separate control components may be provided for both levels if desired. Figure 5C shows how the reconstructed second set of residuals that are input to the second summation component 558 may be fed back to be stored in the level 2 temporal buffer for a next frame (the feedback is omitted from Figure 5B for clarity). A level 1 temporal buffer 591 is also shown that operates in a similar manner to the level 2 temporal buffer 550 described above and the feedback loop for the buffer is shown in this Figure. The contents of the level 1 temporal buffer 591 are added into the level 1 residual processing pipeline via a fourth summation component 595. Again, the position of this fourth summation component 595 may vary along the level 1 residual processing pipeline depending on where the temporal prediction is applied (e.g. if it is applied in transformed coefficient space, it may be located before the level 1 inverse transform component 573).
Figure 5C shows two ways in which temporal control information may be signalled to the decoder. A first way is via headers 556 as described above. A second way, which may be used as an alternative or additional signalling pathway, is via data encoded within the residuals themselves. Figure 5C shows a case whereby data 592 may be encoded into an HH transformed coefficient and so may be extracted following entropy decoding by the entropy decoding component 581. This data may be extracted from the level 2 residual processing pipeline and passed to the temporal prediction component 585.
In general, the enhancement encoding and/or decoding components described herein are low complexity (e.g. as compared to schemes such as SVC and SHVC) and may be implemented in a flexible modular manner. Additional filtering and other components may be inserted into the processing pipelines as determined by required implementations.
The level 1 and level 2 components may be implemented as copies or different versions of common operations, which further reduces complexity. The base codec may be operated as a separate modular black-box, and so different codecs may be used depending on the implementation.
The data processing pipelines described herein may be implemented as a series of nested loops over the dimensions of the data. Subtractions and additions may be performed at a plane level (e.g. for each of a set of colour planes for a frame) or using multidimensional arrays (e.g. X by Y by C arrays where C is a number of colour channels such as YUV or RGB). In certain cases, the components may be configured to operate on n by n coding units (e.g. 2x2 or 4x4), and as such may be applied in parallel on the coding units for a frame. For example, a colour plane of a frame of input video may be decomposed into a plurality of coding units that cover the area of the frame. This may create multiple small one- or two-dimensional arrays (e.g. 2x2 or 4x1 arrays or 4x4 or 16x1 arrays), where the components are applied to these arrays. As such, reference to a set of residuals may include a reference to a set of small one- or two-dimensional arrays where each array comprises integer element values of a configured bit depth.
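As an illustrative (non-normative) sketch of this decomposition, the Python fragment below splits a single colour plane into flattened n by n coding units of the kind referred to above, each of which could then be transformed independently and in parallel.

```python
import numpy as np

def to_coding_units(plane, n=2):
    # Split an H x W plane into (H/n)*(W/n) flattened n*n arrays, one per coding unit.
    h, w = plane.shape
    return plane.reshape(h // n, n, w // n, n).swapaxes(1, 2).reshape(-1, n * n)

plane = np.arange(16).reshape(4, 4)
units = to_coding_units(plane, n=2)   # four flattened 2x2 coding units (4x1 arrays)
```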
Each enhancement stream or both enhancement streams may be encapsulated into one or more enhancement bitstreams using a set of Network Abstraction Layer Units (NALUs). The NALUs are meant to encapsulate the enhancement bitstream in order to apply the enhancement to the correct base reconstructed frame. The NALU may for example contain a reference index to the NALU containing the base decoder reconstructed frame bitstream to which the enhancement has to be applied. In this way, the enhancement can be synchronised to the base stream and the frames of each bitstream combined to produce the decoded output video (i.e. the residuals of each frame of enhancement level are combined with the frame of the base decoded stream). A group of pictures may represent multiple NALUs.
Further Details of Processing Components

It was noted above how a set of processing components or tools may be applied to each of the enhancement streams (or the input video) throughout encoding and/or decoding. These processing components may be applied as modular components. They may be implemented in computer program code, i.e. as executed by one or more processors, and/or configured as dedicated hardware circuitry, e.g. as separate or combined Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs). The computer program code may comprise firmware for an embedded device or part of a codec that is used by an operating system to provide video rendering services.
The following provides a brief summary of each of the tools and their functionality within the overall process as illustrated in Figures 1 to 5C.
Down-sampling: The down-sampling process is applied by a down-sampling component in the examples (e.g. 104, 304 and 404). The down-sampling process is applied to the input video to produce a down-sampled video to be encoded by a base codec. The down-sampling can be done either in both vertical and horizontal directions, or alternatively only in the horizontal direction. A down-sampling component may further be described as a down-scaler.
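A minimal sketch of the two options (both directions, or horizontal only), assuming simple averaging rather than any particular down-sampling filter, is:

```python
import numpy as np

def downsample(frame, horizontal_only=False):
    # Average pairs of columns; optionally also average pairs of rows.
    out = (frame[:, 0::2] + frame[:, 1::2]) / 2.0
    if not horizontal_only:
        out = (out[0::2, :] + out[1::2, :]) / 2.0
    return out

frame = np.arange(16, dtype=float).reshape(4, 4)
both_directions = downsample(frame)                      # 2x2 output
horizontal = downsample(frame, horizontal_only=True)     # 4x2 output
```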
Level-1 (L-1) encoding: The input to this component, which is shown as 122 in Figure 1, comprises the first set of residuals obtained by taking the difference between the decoded output of the base codec and the down-sampled video. The first set of residuals are then transformed, quantized and encoded as further described below.
Transform: In certain examples, there are two types of transforms that could be used by the transform components (e.g. transform components 122, 322, and/or 341). The transform may be a directional decomposition. The transform may act to decorrelate the residual values in a coding unit (e.g. a small n by n block of elements). A transform may be applied as a matrix transformation, e.g. a matrix multiplication applied to a flattened array representing the coding unit.
In one case, the two types of transformation may correspond to two different sizes of transformation kernel. The size of the coding unit may thus be set based on the size of the transformation kernel. A first transform has a 2x2 kernel which is applied to a 2x2 block of residuals. The resulting coefficients are as follows:

( C00 )   ( 1  1  1  1 ) ( R00 )
( C01 ) = ( 1 -1  1 -1 ) ( R01 )
( C10 )   ( 1  1 -1 -1 ) ( R10 )
( C11 )   ( 1 -1 -1  1 ) ( R11 )

A second transform has a 4x4 kernel which is applied to a 4x4 block of residuals. In this case, the 4x4 block of residuals (R00 to R33) is flattened into a 16-element one-dimensional array and multiplied by a 16x16 transformation matrix, whose entries take values in {-1, 1}, to produce 16 coefficients (C00 to C33). These transformation matrices may comprise integer values in the range {-1, 1} or {-1, 0, 1}. This may simplify computations and allow for fast hardware implementations using addition and subtraction. The transformation matrices may comprise Hadamard matrices that have advantageous properties, such as orthogonal rows and being self-inverse (i.e. the inverse transformation is the same as the forward transformation). If a Hadamard matrix is used an inverse transformation may be referred to as a transformation as the same matrix may be used for both forward and inverse transformations.
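For illustration, the 2x2 kernel above may be applied to a flattened 2x2 block of residuals as a simple matrix multiplication. The Python sketch below does this and also demonstrates the self-inverse (up to scaling) property of the Hadamard-style matrix noted above; the A, H, V, D labels in the comments follow the naming used elsewhere in this document.

```python
import numpy as np

# Rows of the 2x2 kernel produce the C00 (A, average), C01 (H), C10 (V) and C11 (D)
# coefficients of a flattened [R00, R01, R10, R11] block.
T = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]])

residuals = np.array([3, 1, 2, 0])      # flattened 2x2 block of residuals
coefficients = T @ residuals            # [C00, C01, C10, C11] == [6, 4, 2, 0]

# The same matrix inverts the transform up to a scale factor of 4 (T @ T == 4 * I).
recovered = (T @ coefficients) // 4     # == residuals
```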
In certain cases, the transformation may include the application of a predicted residuals computation (i.e. the use of a predicted average as described in more detail with reference to later examples).
Quantization: A set of transformed residuals (referred to herein as "coefficients") are quantized using a quantize component such as components 323 or 343. An inverse quantize component, such as components 327, 364, 372, 572 and 582, may reconstruct a version of a value pre-quantization by multiplying the quantized value by a defined quantization factor. The coefficients may be quantized using a linear quantizer. The linear quantizer may use a dead zone of variable size. The linear quantizer may use a dead zone of a different size from the quantization step and a non-centred dequantization offset. These variations are described in more detail with reference to later examples.
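The Python sketch below shows one possible dead-zone linear quantiser and a matching de-quantiser with a non-centred offset; the exact dead zone size, rounding behaviour and offset used by a real encoder or decoder may differ, so this is indicative only.

```python
def quantize(coefficient, step_width, dead_zone):
    # Linear quantiser with a dead zone: magnitudes inside the dead zone map to zero.
    if abs(coefficient) < dead_zone:
        return 0
    sign = 1 if coefficient >= 0 else -1
    return sign * (1 + int((abs(coefficient) - dead_zone) // step_width))

def dequantize(level, step_width, offset=0):
    # Reconstruct by multiplying by the step width, with an optional non-centred offset.
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) * step_width + offset)

levels = [quantize(c, step_width=4, dead_zone=6) for c in (1, -5, 7, 20)]   # [0, 0, 1, 4]
values = [dequantize(l, step_width=4) for l in levels]                      # [0, 0, 4, 16]
```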
Entropy coding: A set of quantized coefficients may be encoded using an entropy coder such as components 325 or 344. There are two schemes of entropy coding. In a first scheme, the quantized coefficients are encoded using a run-length encoder (RLE). In a second scheme, the quantized coefficients are first encoded using RLE, then the encoded output is processed using a Huffman encoder.
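As a simple illustration of the first scheme, the Python sketch below run-length encodes runs of zeros, which dominate quantised residual data; the symbol format shown is illustrative only and the subsequent Huffman stage of the second scheme is not shown.

```python
def run_length_encode(quantized):
    # Encode runs of zeros as (0, run_length) pairs and pass other values through.
    encoded, run = [], 0
    for value in quantized:
        if value == 0:
            run += 1
            continue
        if run:
            encoded.append((0, run))
            run = 0
        encoded.append(value)
    if run:
        encoded.append((0, run))
    return encoded

symbols = run_length_encode([0, 0, 0, 3, 0, 0, -1, 0, 0, 0, 0])
# [(0, 3), 3, (0, 2), -1, (0, 4)] -- this symbol stream could then be Huffman coded
```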
Residual mode (RM) selection: If a residual (filtering) mode (RM) has been selected, the first set of residuals (i.e. level 1) may be further ranked and selected in order to determine which residuals should be transformed, quantized and encoded. Residual filtering may be performed by one or more of the components 321 or 340, e.g. under control of a control component such as 150 or 350. Filtering of residuals may be performed anywhere in the residual processing pipelines but preferably it is performed prior to entropy encoding.
Temporal selection mode: If a temporal selection mode is selected, e.g. by a component such as 152 or 352, the encoder may modify the coefficients (i.e. the transformed residuals or data derived from these) by subtracting the corresponding coefficients derived from a temporal buffer, such as 345 or 361. This may implement temporal prediction as described below. The decoder may then modify the coefficients by adding the corresponding coefficients derived from a temporal buffer, such as one of components 230, 250, 530, 550 or 591.
Level-1 (L-1) decoding: This is shown as components 228 and 528. The input to this tool comprises the encoded level 1 stream 226 or 526 (i.e. L-1 encoded residuals), which are passed through an entropy decoder (such as 571), a de-quantizer (such as 572) and an inverse transform module (such as 573). The operations performed by these modules are the inverse operations performed by the modules described above. If the temporal selection mode has been selected, the residuals may be in part predicted from co-located residuals from a temporal buffer.

Deblocking and residual filters: In certain cases, if a 4x4 transform is used, the decoded residuals may be fed to a filter module or deblocking filter such as 130, 232, 330 or 532. The deblocking operates on each block of inversely transformed residuals by applying a mask whose weights can be specified. The general structure of the mask is as follows:

α β β α
β 1 1 β
β 1 1 β
α β β α

where 0 < α < 1 and 0 < β < 1. The weights may be specified within control signalling associated with the bitstream or may be retrieved from a local memory.

Up-sampling: The combination of the decoded (and filtered or deblocked, if applicable) first set of (L-1) residuals and base decoded video is up-sampled in order to generate an up-sampled reconstructed video. The up-sampling may be performed as described with respect to up-sampling components 134, 234, 334, 434, 534 or 587. Examples of possible up-sampling operations are described in more detail below. The up-sampling method may be selectable and signalled in the bytestream. It should be noted that in examples herein, the term "bytestream" or an alternative term such as stream, bitstream or NALU stream may be used as appropriate.
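A minimal Python sketch of applying such a mask to a single 4x4 block of inversely transformed residuals is given below; the alpha and beta values are arbitrary examples rather than signalled weights.

```python
import numpy as np

def deblock(block, alpha, beta):
    # Weighted 4x4 mask: corner weights alpha, edge weights beta, inner weights 1.
    mask = np.array([[alpha, beta, beta, alpha],
                     [beta,  1.0,  1.0,  beta],
                     [beta,  1.0,  1.0,  beta],
                     [alpha, beta, beta, alpha]])
    return block * mask

block = np.full((4, 4), 8.0)                      # a 4x4 block of decoded residuals
filtered = deblock(block, alpha=0.5, beta=0.75)
```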
Level-2 (L-2) encoding: This is represented as components 142 and 442. The input to this encoding operation comprises the second set of (L-2) residuals obtained by taking the difference between the up-sampled reconstructed video and the input video. The second set of (L-2) residuals are then transformed, quantized and encoded as further described herein. The transform, quantization and encoding are performed in the same manner as described in relation to L-1 encoding. If a residual filtering mode has been selected, the second set of residuals are further ranked and selected in order to determine which residuals should be transformed and encoded.
Predicted coefficient (or predicted average) mode: If the predicted coefficient mode is selected, the encoder may modify the transformed coefficient C00, which is also referred to herein as A, i.e. an average value (which may be Ax for a 4x4 transform as described in more detail below). If the 2x2 transform is used, C00 may be modified by subtracting the value of the up-sampled residual which the transformed block of residuals is predicted from. If the 4x4 transform is used, C00 may be modified by subtracting the average value of the four up-sampled residuals which the transformed block of residuals is predicted from. The predicted coefficient mode may be implemented at the decoder using the modified up-sampling as described herein.
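The following Python sketch illustrates the idea for the 2x2 case, assuming the block of residuals is predicted from a single lower-resolution element; the exact scaling between C00 and the predictor depends on the transform normalisation, so this is indicative rather than definitive.

```python
import numpy as np

def modify_average_coefficient(coefficients, predictor):
    # Replace the average coefficient C00 ("A") with a delta relative to the element
    # (2x2 transform) or mean of elements (4x4 transform) the block is predicted from.
    modified = coefficients.copy()
    modified[0] = coefficients[0] - predictor
    return modified

coefficients = np.array([6.0, 4.0, 2.0, 0.0])                     # [C00 (A), C01, C10, C11]
encoded = modify_average_coefficient(coefficients, predictor=5.0)

decoded = encoded.copy()
decoded[0] += 5.0   # the decoder restores the average using its own prediction
```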
Level-2 (L-2) decoding: This is shown as components 248 and 548. The input to this decoding comprises the encoded second set of (L-2) residuals. The decoding process of the second set of residuals involves an entropy decoder (e.g. 581), a de-quantizer (e.g. 582) and an inverse transform module (e.g. 583). The operations performed by these components are the inverse operations performed by the encoding components as described above. If the temporal selection mode has been selected, the residuals may be in part predicted from co-located residuals from a temporal buffer.
Modified up-sampling: The modified up-sampling process comprises two steps, the second depending on a signalling received by the decoder. In a first step, the combination of the decoded (and deblocked, if applicable) first set of (L-1) residuals and base decoded video (L-1 reconstructed video) is up-sampled to generate an up-sampled reconstructed video. If the predicted coefficient mode has been selected, then a second step is implemented. In particular, the value of the element in the L-1 reconstructed value from which a 2x2 block in the up-sampled reconstructed video was derived is added to said 2x2 block in the up-sampled reconstructed video. In general, the modified up-sampling may be based on the up-sampled reconstructed video and on the pre-up-sampling reconstructed lower resolution video as described with reference to Figure 4.
Dithering: In certain examples, a last stage of dithering may be selectively applied to the decoded video 260 or 560 in Figures 2 and 5A to 5C. Dithering may comprise the application of small levels of noise to the decoded video. Dithering may be applied by adding random or pseudo-random numbers within a defined range to the decoded video. The defined range may be configured based on local and/or signalled parameters. The defined range may be based on a defined minimum and maximum value, and/or a defined scaling factor (e.g. for an output of a random number generator within a specific range). Dithering may reduce the visual appearance of quantization artefacts as is known in the art.
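An illustrative Python sketch of such a dithering stage is given below; the noise range and the 8-bit clipping bounds are example values rather than signalled parameters.

```python
import numpy as np

def apply_dither(decoded_frame, strength=2, seed=None):
    # Add small pseudo-random noise in [-strength, strength] to each sample, then clip
    # to a valid 8-bit sample range.
    rng = np.random.default_rng(seed)
    noise = rng.integers(-strength, strength + 1, size=decoded_frame.shape)
    return np.clip(decoded_frame + noise, 0, 255)

frame = np.full((4, 4), 128)
dithered = apply_dither(frame, strength=2, seed=0)
```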
Example of 4x4 Residual Coding Unit and Tiles

Figure 6A shows an example 600 of a set of residuals 610 arranged in a 4x4 coding unit 620.
Example Picture Formats

Figures 7A to 7C show a number of ways in which colour components may be organised to form a picture or frame within a video.
Example Bitstream Processing

Figure 8 shows an example method 800 that may be used to process a bitstream that has been encoded using the example encoders or encoding processes described herein. The method 800 may be implemented by an example decoder, such as 200 or 500 in Figures 2 and 5. The method 800 shows an example flow which facilitates separation of an enhancement bitstream.
At block 802, the method 800 comprises receiving an input bitstream 802. At block 804, a NALU start is identified within the received bitstream. This then allows identification of an entry point at block 806. The entry point may indicate which version of a decoding process should be used to decode the bitstream. Next, at block 808, a payload enhancement configuration is determined. The payload enhancement configuration may indicate certain parameters of the payload. The payload enhancement configuration may be signalled once per stream. Optionally, the payload enhancement configuration may be signalled multiple times per group of pictures or for each NALU. The payload enhancement configuration may be used to extract payload metadata at block 810.
At block 812, a start of a group of pictures (GOP) is identified. Although the term group of pictures is used it will be understood that this term is used to refer to a corresponding structure to that of the base stream but not to define a particular structure on the enhancement stream. That is, enhancement streams may not have a GOP structure in the strict sense and strict compliance with GOP structures of the art is not required. If payload metadata is included, it may be included after the payload enhancement configuration and before the set of groups of pictures. Payload metadata may for example include HDR information. Following block 812, a GOP may be retrieved. At block 814, if the NALU relates to a first bitstream frame, the method may further comprise retrieving a payload global configuration at block 816. The payload global configuration may indicate parameters of the decoding process, for example, the payload global configuration may indicate if a predicted residual mode or temporal prediction mode was enabled in the encoder (and should be enabled at the decoder), thus the payload global configuration may indicate if a mode should be used in the decoding method. The payload global configuration may be retrieved once for each GOP. At block 818, the method 800 may further comprise retrieving a set of payload decoder control parameters which indicate to the decoder parameters to be enabled during decoding, such as dithering or up-sampling parameters. The payload decoder control parameters may be retrieved for each GOP. At block 820, the method 800 comprises retrieving a payload picture configuration from the bitstream. The payload picture configuration may comprise parameters relating to each picture or frame, for example, quantization parameters such as a step width. The payload picture configuration may be retrieved once for each NALU (that is, once for each picture or frame). At block 822, the method 800 may then further comprise retrieving a payload of encoded data which may comprise encoded data of each frame. The payload of encoded data may be signalled once for each NALU (that is, once for each picture or frame). The payload of encoded data may comprise a surface, plane or layer of data which may be separated into chunks as described with reference to Figure 9A, as well as the examples of Figures 21A and 21B. After the payload of encoded data is retrieved, the NALU may end at block 824.
If the GOP also ends, the method may continue to retrieve a new NALU for a new GOP. If the NALU is not the first bitstream frame (as is the case here), then the method may then, optionally, retrieve an entry point (i.e. an indication of a software version to be used for decoding). The method may then retrieve a payload global configuration, payload decoder control parameters and payload picture configuration. The method may then retrieve a payload of encoded data. The NALU will then end.
If at block 814, the NALU does not relate to a first bitstream frame, then blocks 828 to 838 may be performed. Optional block 828 may be similar to block 806. Blocks 830 to 838 may be performed in a similar manner to blocks 816 to 824.
At blocks 840 and 842, after each NALU has ended, if the GOP has not ended, the method 800 may comprise retrieving a new NALU from the stream at block 844. For each second and subsequent NALU of each GOP, the method 800 may optionally retrieve an entry point indication at block 846, in a similar manner to blocks 806 and 828. The method 800 may then comprise retrieving payload picture configuration parameters at block 848 and a payload of encoded data for the NALU at block 850. Blocks 848 to 852 may thus be performed in a similar manner to blocks 820 to 824 and blocks 834 to 838. The payload encoded data may comprise tile data.
As above, if the NALU is not the last NALU for the GOP, the method may comprise retrieving a further NALU (e.g. looping around to block 844). If the NALU is the last NALU in the GOP, the method 800 may proceed to block 854. If there are further GOPs, the method may loop around to block 812 and comprise retrieving a further GOP and performing blocks 814 onwards as previously described. Once all GOPs have been retrieved the bitstream ends at block 856.
Example Form of Encoded Payload Data Figure 9A shows how encoded data 900 within an encoded bitstream may be separated into chunks. More particularly, Figure 9A shows an example data structure for a bitstream generated by an enhancement encoder (e.g. level 1 and level 2 encoded data). A plurality of planes 910 are shown (of number nPlanes). Each plane relates to a particular colour component. In Figure 9A, an example with YUV colour planes is shown (e.g. where a frame of input video has three colour channels, i.e. three values for every pixel). In the examples, the planes are encoded separately.
The data for each plane is further organised into a number of levels (nLevels). In Figure 9A there are two levels, relating to each of enhancement levels 1 and 2. The data for each level is then further organised as a number of layers (nLayers). These layers are separate from the base and enhancement layers; in this case, they refer to data for each of the coefficient groups that result from the transform. For example, a 2x2 transform results in four different coefficients that are then quantized and entropy encoded and a 4x4 transform results in sixteen different coefficients that are then likewise quantized and entropy encoded. In these cases, there are thus respectively 4 and 16 layers, where each layer represents the data associated with each different coefficient. In cases where the coefficients are referred to as A, H, V and D coefficients then the layers may be seen as A, H, V and D layers. In certain examples, these "layers" are also referred to as "surfaces", as they may be viewed as a "frame" of coefficients in a similar manner to a set of two-dimensional arrays for a set of colour components.
The data for the set of layers may be considered as "chunks". As such each payload may be seen as ordered hierarchically into chunks. That is, each payload is grouped into planes, then within each plane each level is grouped into layers and each layer comprises a set of chunks for that layer. A level represents each level of enhancement (first or further) and a layer represents a set of transform coefficients. In any decoding process, the method may comprise retrieving chunks for two levels of enhancement for each plane. The method may comprise retrieving 4 or 16 layers for each level, depending on the size of transform that is used. Thus, each payload is ordered into a set of chunks for all layers in each level and then the set of chunks for all layers in the next level of the plane. Then the payload comprises the set of chunks for the layers of the first level of the next plane and so on. As such, in the encoding and decoding methods described herein, the pictures of a video may be partitioned, e.g. into a hierarchical structure with a specified organisation.
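As a rough illustration of this ordering, the sketch below (in Python) enumerates chunk indices in the plane, then level, then layer order described above. The function name and the use of a generator are hypothetical conveniences; only the nesting order and the 4-or-16 layer count are taken from the description.

```python
def enumerate_chunks(n_planes=3, n_levels=2, transform_size=2):
    """Yield (plane, level, layer) indices in the hierarchical payload order.

    A 2x2 transform gives 4 coefficient layers per level; a 4x4 transform
    gives 16. Planes are outermost, then enhancement levels, then layers.
    """
    n_layers = 4 if transform_size == 2 else 16
    for plane in range(n_planes):          # e.g. Y, U, V
        for level in range(n_levels):      # enhancement level 1, then level 2
            for layer in range(n_layers):  # one chunk per transform coefficient
                yield (plane, level, layer)

# Example: YUV planes with a 2x2 transform give 3 * 2 * 4 = 24 chunks.
assert len(list(enumerate_chunks())) == 24
```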
Each picture may be composed of three different planes, organized in a hierarchical structure. A decoding process may seek to obtain a set of decoded base picture planes and a set of residuals planes. A decoded base picture corresponds to the decoded output of a base decoder. The base decoder may be a known or legacy decoder, and as such the bitstream syntax and decoding process for the base decoder may be determined based on the base decoder that is used. In contrast, the residuals planes are new to the enhancement layer and may be partitioned as described herein. A "residual plane" may comprise a set of residuals associated with a particular colour component. For example, although the planes 910 are shown as relating to YUV planes of an input video, it should be noted that the data 920 does not comprise YUV values, e.g. as for a comparative coding technology.
Rather, the data 920 comprises encoded residuals that were derived from data from each of the YUV planes.
In certain examples, a residuals plane may be divided into coding units whose size depends on the size of the transform used. For example, a coding unit may have a dimension of 2x2 if a 2x2 directional decomposition transform is used or a dimension of 4x4 if a 4x4 directional decomposition transform is used. The decoding process may comprise outputting one or more sets of residual surfaces, that is one or more collections of residuals. For example, these may be output by the level 1 decoding component 228 and the level 2 decoding component 248 in Figure 2. A first set of residual surfaces may provide a first level of enhancement. A second set of residual surfaces may be a further level of enhancement. Each set of residual surfaces may combine, individually or collectively, with a reconstructed picture derived from a base decoder, e.g. as illustrated in the example decoder 200 of Figure 2.
Example Up-sampling Approaches Figures 9B to 9J and the description below relate to possible up-sampling approaches that may be used when implementing the up-sampling components as described in examples herein, e.g. up-sampling components 134, 234, 334, 434, 534 or 587 in Figures 1 to 5C.
Figs. 9B and 9C show two examples of how a frame to be up-sampled may be divided. Reference to a frame may be taken as reference to one or more planes of data, e.g. in YUV format. Each frame to be up-sampled, called a source frame 910, is divided into two major parts, namely a centre area 910C, and a border area 910B. Fig. 9B shows an example arrangement for bilinear and bicubic up-sampling methods. In Fig. 9B, the border area 910B consists of four segments, namely top segment 910BT, left segment 910BL, right segment 910BR, and bottom segment 910BB. Fig. 9C shows an example arrangement for a nearest up-sampling method. In Fig. 9C, the border area 910B consists of two segments: right segment 910BR and bottom segment 910BB. In both examples, the segments may be defined by a border-size parameter (BS), e.g. which sets a width of the segment (i.e. a length that the segment extends into the source frame from an edge of the frame). The border-size may be set to be 2 pixels for bilinear and bicubic up-sampling methods or 1 pixel for the nearest method.
In use, determining whether a source frame pixel is located within a particular segment may be performed based on a set of defined pixel indices (e.g. in x and y directions). Performing differential up-sampling based on whether a source frame pixel is within a centre area 910C or a border area 910B may help avoid border effects that may be introduced due to the discontinuity at the source frame edges.
Nearest up-sampling Figure 9D provides an overview of how a frame is up-sampled using a nearest up-sampling method. In Figure 9D, a source frame 920 is up-sampled to become destination frame 922. The nearest up-sampling method up-samples by copying a current source pixel 928 onto a 2x2 destination grid 924 of destination pixels, e.g. as indicated by arrows 925. Centre and edge pixels are respectively shown as 926 and 927. The destination pixel positions are calculated by doubling the index of the source pixel 928 on both axes and progressively adding +1 to each axis to extend the range to cover 4 pixels as shown on the right-hand side of Figure 9D. For example, the value of source pixel 928 with index location (x=6, y=6) is copied to destination grid 924 comprising pixels with index locations (12, 12), (13, 12), (12, 13) and (13, 13). Each pixel in the destination grid 924 takes the value of the source pixel 928.
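A minimal sketch of the nearest method, assuming the source frame is a plain list of rows of pixel values; the index doubling and +1 offsets follow the description above, while the function itself is purely illustrative.

```python
def nearest_upsample(src):
    """Nearest up-sampling: copy each source pixel to a 2x2 destination grid."""
    h, w = len(src), len(src[0])
    dst = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            value = src[y][x]
            # Destination indices are the doubled source indices plus 0 or 1.
            for dy in (0, 1):
                for dx in (0, 1):
                    dst[2 * y + dy][2 * x + dx] = value
    return dst

# The source pixel at (x=6, y=6) maps to destinations (12, 12), (13, 12), (12, 13), (13, 13).
```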
The nearest method of up-sampling enables fast implementations that may be preferable for embedded devices with limited processor resources. However, the nearest method has a disadvantage that blocking, or "pixelation", artefacts may need to be corrected by the level 2 residuals (e.g. that result in more non-zero residual values that require more bits for transmission following entropy encoding). In certain examples described below, bilinear and bicubic up-sampling may result in a set of level 2 residuals that can be more efficiently encoded, e.g. that require fewer bits following quantization and entropy encoding. For example, bilinear and bicubic up-sampling may generate an up-sampled output that more accurately matches the input signal, leading to smaller level 2 residual values.
Bilinear up-sampling Figures 9E, 9F and 9G illustrate a bilinear up-sampling method. The bilinear up-sampling method can be divided into three main steps. The first step involves constructing a 2x2 source grid 930 of source pixels 932 in the source frame. The second step involves performing a bilinear interpolation. The third step involves writing the interpolation result to destination pixels 936 in the destination frame.
Bilinear up-sampling -Step 1: Source pixel grid Fig. 9E illustrates a construction example of the 2x2 source grid 930 (which may also be called a bilinear grid). The 2x2 source grid 930 is used instead of a single source pixel 932 because the bilinear up-sampling method performs up-sampling by considering the values of the nearest 3 pixels to a base pixel 932B, i.e. the nearest 3 pixels falling within the 2x2 source grid 930. In this example, the base pixel 932B is at the bottom right of the 2x2 source grid 930, but other positions are possible. During the bilinear up-sampling method the 2x2 source grid 930 may be determined for multiple source frame pixels, so as to iteratively determine destination frame pixel values for the whole destination frame. The base pixel 932B location is used to determine an address of a destination frame pixel.
Bilinear up-sampling -Step 2: Bilinear interpolation Fig. 9F illustrates a bilinear coefficient derivation. In this example, the bilinear interpolation is a weighted summation of the values of the four pixels in the 2x2 source grid 930. The weighted summation is used as the pixel value of a destination pixel 936 being calculated. The particular weights employed are dependent on the position of the particular destination pixel 936 in a 2x2 destination grid 935. In this example, the bilinear interpolation applies weights to each source pixel 932 in the 2x2 source grid 930, using the position of the destination pixel 936 in the 2x2 destination grid 935. For example, if calculating the value for the top left destination pixel (shown as 936/936B in Fig. 9F), then the top left source pixel value will receive the largest weighting coefficient 934 (e.g. weighting factor 9) while the bottom right pixel value (diagonally opposite) will receive the smallest weighting coefficient (e.g. weighting factor 1), and the remaining two pixel values will receive an intermediate weighting coefficient (e.g. weighting factor 3). This is visualized in Fig. 9F with the weightings shown in the 2x2 source grid 930.
For the pixel on the right of 936/936B within the 2x2 destination grid 935, the weightings applied to the weighted summation would change as follows: the top right source pixel value will receive the largest weighting coefficient (e.g. weighting factor 9) while the bottom left pixel value (diagonally opposite) will receive the smallest weighting coefficient (e.g. weighting factor 1), and the remaining two pixel values will receive an intermediate weighting coefficient (e.g. weighting factor 3).
In Figure 9F, four destination pixels are computed for the base pixel 932B based on the 2x2 source grid 930 but each destination pixel is determined using a different set of weights. These weights may be thought of as an up-sampling kernel. In this way, there may be four different sets of four weighted values that are applied to the original pixel values within the 2x2 source grid 930 to generate the 2x2 destination grid 935 for the base pixel 932B. After the four destination pixel values are determined, another base pixel is selected with a different source grid and the process begins again to determine the next four destination pixel values. This may be iteratively repeated until pixel values for the whole destination (e.g. up-sampled) frame are determined. The next section describes in more detail the mapping of these interpolated pixel values from the source frame to the destination frame.
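The weighted summation may be sketched as follows, using the 9/3/3/1 weighting factors from the example above; dividing by the weight sum (16) is an assumption for illustration, as a real implementation might instead use fixed-point arithmetic with a final shift.

```python
def bilinear_destination_grid(src_2x2):
    """Compute the four destination pixel values for one 2x2 source grid.

    src_2x2 is [[top_left, top_right], [bottom_left, bottom_right]].
    Returns a 2x2 destination grid using the 9/3/3/1 weighting scheme.
    """
    dst = [[0, 0], [0, 0]]
    for dy in (0, 1):
        for dx in (0, 1):
            total = 0
            for sy in (0, 1):
                for sx in (0, 1):
                    # The source pixel nearest the destination position gets
                    # weight 9, the diagonally opposite pixel 1, the other two 3.
                    weight = (9 if (sx == dx and sy == dy)
                              else 1 if (sx != dx and sy != dy)
                              else 3)
                    total += weight * src_2x2[sy][sx]
            dst[dy][dx] = total / 16  # weights sum to 9 + 3 + 3 + 1 = 16
    return dst

# For the top-left destination pixel the top-left source pixel is weighted by 9.
print(bilinear_destination_grid([[10, 20], [30, 40]]))
```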
Bilinear up-sampling -Step 3: Destination pixels Figure 9G shows an overview of the bilinear up-sampling method comprising a source frame 940, a destination frame 942, an interpolation module 944, a plurality of 2x2 source grids 930 (a,b,c,d,hj), and a plurality of 2x2 destination grids 935 (d,e,h,k). The source frame 940 and destination frame 942 have indexes starting from 0 on each column and row for pixel addressing (although other indexing schemes may be used).
In general, each of the weighted averages generated from each 2x2 source grid 930 is mapped to a corresponding destination pixel 936 in the corresponding 2x2 destination grid 935. The mapping uses the source base pixel 932B of each 2x2 source grid 930 to map to a corresponding destination base pixel 936B of the corresponding 2x2 destination grid 935, unlike the nearest sampling method. The destination base pixel 936B address is calculated from the equation (applied for both axes): Dst base addr = (Src base addr x 2) - 1. Also, the destination pixels have three corresponding destination sub-pixels 936S calculated from the equation: Dst sub addr = Dst base addr + 1 (for both axes). And so, each 2x2 destination grid 935 generally comprises a destination base pixel 936B together with three destination sub-pixels 936S, one each to the right, below, and diagonally down to the right of the destination base pixel, respectively. This is shown in Figure 9F. However, other configurations of destination grid and base pixel are possible.
The calculated destination base and sub addresses for destination pixels 936B and 936S respectively can be out of range on the destination frame 942. For example, pixel A (0, 0) on source frame 940 generates a destination base pixel address (-1, -1) for a 2x2 destination grid 935. Destination address (-1, -1) does not exist on the destination frame 942. When this occurs, writes to the destination frame 942 are ignored for these out of range values. This is expected to occur when up-sampling the border area of the source frame.
However, it should be noted that in this particular example one of the destination sub-pixel addresses (0, 0) is in range on the destination frame 942. The weighted average value of the 2x2 source grid 930 (i.e. based on the lower left pixel value taking the highest weighting) will be written to address (0, 0) on the destination frame 942. Similarly, pixel B (1, 0) on source frame 940 generates a destination base pixel address (1, -1) which is out of range because there is no -1 row. However, the destination sub-pixel addresses (1, 0) and (2, 0) are in range and the corresponding weighted sums are each entered into the corresponding addresses. Similar processing happens for pixel C, but only the two values on column 0 are entered (i.e. addresses (0, 1) and (0, 2)). Pixel D at address (1, 1) of the source frame contributes a full 2x2 destination grid 935d based on the weighted averages of source grid 930d, as do pixels E, H and K, with 2x2 destination grids 935e, 935h, and 935k and corresponding source grids 930e, 930h and 930k illustrated in Figure 9G.
As will be understood, these equations usefully deal with the border area 910B and its associated segments, and ensure that when the centre segment 910C is up-sampled it will remain in the centre of the destination frame 942. Any pixel values that are determined twice using this approach, e.g. due to the manner in which the destination sub-pixels are determined, may be ignored or overwritten. Furthermore, the ranges for border segments 910BR and 910BB are extended by +1 in order to fill all pixels in the destination frame. In other words, the source frame 940 is extrapolated to provide a new column of pixels in border segment 910BR (shown as index column number 8 in Figure 9G), and a new row of pixels in border segment 910BB (shown as index row number 8 in Figure 9G).
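The base and sub-pixel addressing, together with the handling of out-of-range writes, may be sketched as below; this is not a complete up-sampler, and the helper names are illustrative.

```python
def destination_addresses(src_base_x, src_base_y):
    """Return the four destination addresses for a source base pixel.

    Dst base addr = (Src base addr x 2) - 1 on each axis; the three
    sub-pixels are offset by +1 in x, in y, or in both.
    """
    base_x = src_base_x * 2 - 1
    base_y = src_base_y * 2 - 1
    return [(base_x + dx, base_y + dy) for dy in (0, 1) for dx in (0, 1)]

def write_grid(dst, addresses, values):
    """Write values to the destination frame, ignoring out-of-range addresses."""
    height, width = len(dst), len(dst[0])
    for (x, y), value in zip(addresses, values):
        if 0 <= x < width and 0 <= y < height:
            dst[y][x] = value

# Source pixel A at (0, 0) produces base address (-1, -1); only (0, 0) is in range.
print(destination_addresses(0, 0))  # [(-1, -1), (0, -1), (-1, 0), (0, 0)]
```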
Cubic up-sampling Figures 9H, 9I and 9J together illustrate a cubic up-sampling method, in particular, a bicubic method. The cubic up-sampling method of the present example may be divided into three main steps. The first step involves constructing a 4x4 source grid 962 of source pixels with a base pixel 964B positioned at the local index (2, 2) within the 4x4 source grid 962. The second step involves performing a bicubic interpolation. The third step involves writing the interpolation result to the destination pixels.
Cubic up-sampling -Step 1: Source pixel grid Figure 9H shows a 4x4 source grid 962 construction on source frame 960 for an in-bound grid 962i and separately an out-of-bound grid 962o. In this example, "in-bound" refers to the fact that the grid covers source pixels that are within the source frame, e.g. the centre region 910C and the border regions 910B; "out-of-bound" refers to the fact that the grid includes locations that are outside of the source frame. The cubic up-sampling method is performed by using the 4x4 source grid 962 which is subsequently multiplied by a 4x4 kernel. This kernel may be called an up-sampling kernel. During the generation of the 4x4 source grid 962, any pixels which fall outside the frame limits of the source frame 960 (e.g. those shown in the out-of-bound grid 962o) are replaced with the value of the source pixels 964 at the boundary of the source frame 960.
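The construction of a 4x4 source grid with the out-of-bound replacement rule may be sketched as follows, with the base pixel at local index (2, 2); the clamping approach mirrors the boundary replacement described above and the function is illustrative only.

```python
def build_4x4_source_grid(src, base_x, base_y):
    """Build the 4x4 source grid around a base pixel at local index (2, 2).

    Positions outside the source frame are clamped to the nearest boundary
    pixel, matching the replacement rule described for out-of-bound grids.
    """
    height, width = len(src), len(src[0])
    grid = []
    for local_y in range(4):
        row = []
        for local_x in range(4):
            # Local (2, 2) maps to the base pixel itself.
            x = base_x + local_x - 2
            y = base_y + local_y - 2
            x = min(max(x, 0), width - 1)   # clamp to the frame limits
            y = min(max(y, 0), height - 1)
            row.append(src[y][x])
        grid.append(row)
    return grid

# A base pixel at (0, 0) yields a grid whose out-of-bound entries repeat the
# boundary pixels of the source frame.
```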
Cubic up-sampling -Step 2: Bicubic interpolation The kernels used for the bicubic up-sampling process typically have a 4x4 coefficient grid. However, the relative position of the destination pixel with reference to the source pixel will yield a different coefficient set, and since the up-sampling is a factor of two in this example, there will be 4 sets of 4x4 kernels used in the up-sampling process.
These sets are represented by a 4-dimensional grid of coefficients (2 x 2 x 4 x 4). For example, there will be one 4x4 kernel for each destination pixel in a 2x2 destination grid, that represents a single up-sampled source pixel 964B.
In one case, the bicubic coefficients may be calculated from a fixed set of parameters. In one case, this comprises a core parameter (bicubic parameter) and a set of spline creation parameters. In an example, a core parameter of -0.6 and four spline creation parameters of [1.25, 0.25, -0.75 & -1.75] may be used. An implementation of the filter may use fixed point computations within hardware devices.
Cubic up-sampling -Step 3: Destination pixels Figure 9J shows an overview of the cubic up-sampling method comprising a source frame 972, a destination frame 980, an interpolation module 982, a 4x4 source grid 970, and a 2x2 destination grid 984. The source frame 972 and destination frame 980 have indexes starting from 0 on each column and row for pixel addressing (although other addressing schemes may be used).
Similarly to the bilinear method, the bicubic destination pixels have a base address calculated from the equation for both axes: Dst base addr = (Src base addr x 2) - 1. Also, the destination addresses are calculated from: Dst sub addr = Dst base addr + 1 (for both axes). And so, as for the bilinear method, each 2x2 destination grid 984 generally comprises a destination base pixel together with three destination sub-pixels, one each to the right, below, and diagonally down to the right of the destination base pixel, respectively. However, other configurations of destination grid and base pixel are possible.
Again, these equations ensure that when the centre segment is up-sampled it will remain in the centre of the destination frame. Furthermore, the ranges for border segments 910BR and 910BB are extended by +1 in order to fill all pixels in the destination frame 980 in the same way as described for the bilinear method. Any pixel values that are determined twice using this approach, e.g. due to the manner in which the destination sub-pixels are determined, may be ignored or overwritten. The calculated destination base and sub addresses can be out of range. When this occurs, writes to the destination frame are ignored for these out of range values. This is expected to occur when up-sampling the border area.
Example Entropy Encoding Figures 10A to 10I illustrate different aspects of entropy encoding. These aspects may relate to an entropy encoding performed, for example, by entropy encoding components 325, 344 in Figures 3A and 3B and/or an entropy decoding performed, for example, by entropy decoding components 571, 581 in Figures 5B and 5C.
Figure 10A illustrates one implementation 1000 of an example entropy decoding component 1003 (e.g. one or more of entropy decoding components 571, 581 in Figures 5B and 5C). The entropy decoding component 1003 takes as inputs a set 1001 of entropy encoded residuals (Ae, He, Ve, De) 1002 and outputs a set 1006 of quantized coefficients 1007 (e.g. quantized transformed residuals in this illustrated example). The entropy encoded residuals 1002 may comprise a received encoded level 1 or level 2 stream (e.g. 226 or 246 as shown in Figure 2). The entropy decoding component 1003 comprises a Huffman decoder 1004 followed by a run-length decoder 1005. The Huffman decoder 1004 receives the encoded enhancement stream that is encoded using Huffman encoding and decodes this to produce a run-length encoded stream. The run-length encoded stream is then received by the run-length decoder 1005, which applies run-length decoding to generate the quantized coefficients 1007. In Figure 10A, a 2x2 transform example is shown, hence, the coefficients are shown as A, H, V and D coefficients from a 2x2 directional decomposition.
An entropy encoding component may be arranged in an inverse manner to the implementation 1000. For example, an input of an entropy encoding component may comprise a surface (e.g. residual data derived from a quantized set of transformed residuals) and the component may be configured to output an entropy encoded version of the residual data, e.g. data in the form of the encoded stream data 1001 (with, for a 2x2 example, Ae, He, Ve, De encoded and quantized coefficients).
Example Entropy Encoding -Header Formats Figures 10B to 10E illustrate a specific implementation of the header formats and how the code lengths may be written to a stream header depending on the number of non-zero codes.
Figure 10B shows a prefix coding (i.e. Huffman) decoder stream header 1010 for a case where there are more than 31 non-zero codes. A first 5 bits indicate a minimum length for a prefix code. A second 5 bits indicate a maximum length for a prefix code. A third bit then provides a compression flag 1011 that indicates whether compression is being applied. There then follow 3 symbols in the example of Figure 10B: a first non-zero symbol 1014, a second zero symbol 1015 and a third non-zero symbol 1016. Non-zero length flags 1017 comprise a one bit flag indicating whether each symbol is non-zero; the flags for the first and third symbols 1014, 1016 are 1 whereas the flag for the second symbol 1015 is 0. Each non-zero symbol indicates a code length for prefix coding that is equal to a code length minus the minimum length (e.g. as sent with the first 5 bits). The code lengths may be used to initialise the prefix (i.e. Huffman) decoder, such as 1004 in Figure 10A. In this example, the number of code length bits may equal log2(max length - min length + 1). Hence, in the example of Figure 10B there are more than 31 non-zero values in the data and the header includes a minimum code length and a maximum code length. The code length for each symbol is then sent sequentially. A flag indicates that the length of the symbol is non-zero. The bits of the code length are then sent as a difference between the code length and the minimum signalled length. This reduces the overall size of the header.
Figure 10C illustrates a header 1020 similar to Figure 10B but used where there are fewer than 31 non-zero codes. This may comprise a normal case. The header 1020 again has a first 5 bits that indicate a minimum length, a subsequent 5 bits that indicate a maximum length, and a compression flag 1021 (e.g. that may be 0 or 1 to indicate a compression as is described elsewhere herein). The header 1020 then further includes the number of symbols in the data, followed by a set of consecutive symbols 1024, 1025. Each symbol may comprise 8 bits that indicate the symbol value followed by the length of the codeword for that symbol, again sent as a difference between the length and the minimum length as described with respect to Figure 10B.
In both cases, the header 1010 or 1020 is used to initialise the entropy decoding component (in particular the Huffman or prefix coding decoder) by reading the code lengths from the header.
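A sketch of reading code lengths from a header laid out as in Figure 10B is given below, assuming a most-significant-bit-first bit reader and 256 possible symbol values; both of those assumptions, the reader class and the handling of the equal-length corner case are illustrative rather than normative.

```python
import math

class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative only)."""
    def __init__(self, data):
        self.data, self.pos = data, 0
    def read(self, n_bits):
        value = 0
        for _ in range(n_bits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - (self.pos % 8))) & 1)
            self.pos += 1
        return value

def read_code_lengths(reader, n_symbols=256):
    """Read prefix code lengths following the Figure 10B layout (assumed).

    5 bits minimum length, 5 bits maximum length, a compression flag, then per
    symbol a non-zero flag and, if set, (length - min) coded in
    log2(max - min + 1) bits. The special headers of Figures 10D and 10E
    (all-zero frequencies, single code) are not handled in this sketch.
    """
    min_len = reader.read(5)
    max_len = reader.read(5)
    compression = reader.read(1)
    length_bits = math.ceil(math.log2(max_len - min_len + 1)) if max_len > min_len else 0
    lengths = []
    for _ in range(n_symbols):
        if reader.read(1):                                  # non-zero length flag
            lengths.append(min_len + reader.read(length_bits))
        else:
            lengths.append(0)                               # symbol does not occur
    return compression, lengths
```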
Figures 10D and 10E illustrate further headers 1030 and 1040 that may be sent in outlying cases. For example, where the frequencies are all zero, the stream header may comprise the header 1030 as illustrated in Figure 10D where the 5 bit minimum and maximum lengths (1031 and 1032) are both set to 31 (i.e. to a maximum value) to indicate the special situation. Figure 10E shows a header 1040 that may be used where there is only one code in the Huffman tree. In this case, a 0 (i.e. minimum) value in the minimum and maximum length fields (1041 and 1042) indicates the one-code special situation and then these field values are followed by the symbol value to be used 1043. In this latter example, where there is only one symbol value, this may indicate that there is only one data value in the set of quantized coefficients data.
Example Entropy Encoding RLE State Machine Figure 10F shows a state machine 1050 that may be used by a run length decoder, such as run length decoder 1005 in Figure 10A. The run length decoder is configured to read a set of run length encoded data byte by byte. The state machine 1050 has three states: a run-length coding (RLC) residual least-significant bit (LSB) case 1051; a run-length coding (RLC) residual most-significant bit (MSB) case 1052; and a run-length coding (RLC) zero run case 1053. Different run-length encoders and decoders may be used for different types of data. For example, different run-length encoding and decoding configurations may be used for each of: coefficient groups, temporal signal coefficient groups, and entropy encoded tiles of data.
In certain examples, the prefix or Huffman coding may be optional and signalled in the headers (e.g. using an rle only flag). The input of the RLE decoder may comprise a byte stream of Huffman decoded data if Huffman coding is used (e.g. the rle only flag is equal to zero) or may comprise a byte stream of raw data if Huffman coding is not used (e.g. if the flag rle only is equal to 1). The output of the RLE decoder may comprise a stream of quantized transform coefficients. In one case, these coefficients may belong to a chunk as indicated in Figure 9A (e.g. indexed by plane, level and layer - as pointed to by the variables planeIdx, levelIndex and layerIndex described in later examples) or comprise a stream of temporal signals (a temporal chunk that forms part of a temporal layer that is used to implement temporal prediction - this is described with reference to later examples). The state machine 1050 of Figure 10F may be used to implement a RLE decoder for coefficient groups. The run length state machine 1050 may be used by the Huffman encoding and decoding processes to know which Huffman code to use for the current symbol or code word. The RLE decoder uses the run length state machine 1050 to decode sequences of zeros. It also decodes the frequency tables used to build the Huffman trees for the Huffman decoding.
By configuration, the state of the first byte of data is guaranteed to be in the first state 1051 (i.e. a RLC residual LSB state). The RLE decoder uses the state machine 1050 to determine the state of the next byte of data based on the contents of the received stream. The current state tells the decoder how to interpret the current byte of data. Figures 10G, 10H and 10I show how the RLE decoder of the present example is configured to interpret the byte.
As shown in Figure 10F, the state machine 1050 has three states: the RLC residual LSB state 1051: this is where the state machine 1050 starts. For bytes in a received stream, this state 1051 expects the 6 lesser significant bits (bits 6 to 1) to encode a non-zero element value. An example of a byte 1070 divided as expected by this state is shown in Figure 10G. The run bit 1071 indicates that the next byte is encoding a count of a run of zeros. This is encoded in data portion 1072. The overflow bit 1073, which in this example is the least significant bit of the byte, is set if the element value does not fit within 6 bits of data (e.g. is set to 0 if there is no overflow and is set to 1 if there is overflow). If the run bit 1071 is 0 and the overflow bit 1073 is 0, then the state machine 1050 remains in the RLC residual LSB state 1051. When the overflow bit 1073 is set (e.g. is 1), as shown by the arrow 1074, the state of the next byte moves to the RLC residual MSB state 1052 as described below. The lower half of Figure 10G thus shows a byte in the RLC residual LSB state 1051 that causes a state transition. When the overflow bit is set, as shown at 1075, the next state cannot be a run of zeros and bit 7 can be used to encode data instead, e.g. as shown by the data portion 1076.
the RLC residual MSB state: this state (shown as 1052) encodes bits 7 to 13 of element values that do not fit within 6 bits of data. Run length encoding of a byte 1080 for the RLC residual MSB state is as shown in Figure 10H. A data portion 1082 fills the seven least significant bits. In this example, bit 7 indicated as run bit 1081 encodes whether the next byte is a run of zeros. If the run bit is set (e.g. to 1), then the state transitions to the RLC zero run state 1053.
the RLC zero run state: this state (shown as 1053) encodes 7 bits of a zero run count. Run length coding of a byte 1085 for the RLC zero run state 1053 is shown in Figure 10I. Again, a data portion 1087 is provided in the seven least significant bits. The most significant bit 1086 is a run bit. The run bit is high if more bits are needed to encode the count. If the run bit is high (e.g. 1) the state machine 1050 remains in the RLC zero run state 1053. If the run bit is low (e.g. 0), the state machine 1050 transitions to the RLC residual LSB state 1051. In the RLC residual LSB state 1051, if the run bit is high (e.g. 1) and the overflow bit is low (e.g. 0), then the state machine 1050 transitions from the RLC residual LSB state 1051 to the RLC zero run state 1053.
In examples, a frequency table is created for each state for use by the Huffman encoder. In order for the decoder to start on a known state, the first symbol in the encoded stream will always be a residual. Bits can of course be inverted (0/1, 1/0, etc.) without loss of functionality. Similarly, the locations within the symbols or bytes of the flags is merely illustrative.
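A sketch of a decoder built around these three states is given below. The bit positions (run bit at bit 7, overflow bit at bit 0, data in the intermediate bits) follow the descriptions of Figures 10G to 10I, while the ordering of zero-run chunks and the unsigned treatment of element values are assumptions for illustration.

```python
RLC_RESIDUAL_LSB, RLC_RESIDUAL_MSB, RLC_ZERO_RUN = range(3)

def rle_decode(byte_stream):
    """Decode run-length encoded bytes into a flat list of coefficient values.

    The first byte is always interpreted in the residual LSB state. Zero runs
    are accumulated 7 bits per byte (least significant chunk first assumed).
    """
    out = []
    state = RLC_RESIDUAL_LSB
    value, run, run_shift = 0, 0, 0
    for byte in byte_stream:
        run_bit = (byte >> 7) & 1
        if state == RLC_RESIDUAL_LSB:
            if byte & 1:                         # overflow bit set: value needs more bits
                value = (byte >> 1) & 0x7F       # bits 7..1 carry data when overflowing
                state = RLC_RESIDUAL_MSB
            else:
                out.append((byte >> 1) & 0x3F)   # bits 6..1 carry the element value
                state = RLC_ZERO_RUN if run_bit else RLC_RESIDUAL_LSB
        elif state == RLC_RESIDUAL_MSB:
            value |= (byte & 0x7F) << 7          # bits 7..13 of the element value
            out.append(value)
            value = 0
            state = RLC_ZERO_RUN if run_bit else RLC_RESIDUAL_LSB
        else:                                    # RLC_ZERO_RUN
            run |= (byte & 0x7F) << run_shift    # accumulate 7 bits of the zero count
            run_shift += 7
            if not run_bit:                      # run complete: emit the zeros
                out.extend([0] * run)
                run, run_shift = 0, 0
                state = RLC_RESIDUAL_LSB
    return out
```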
Temporal Prediction and Signalling Certain variations and implementation details of the temporal prediction will now be described, including certain aspects of temporal signalling.
In certain examples described herein, information from two or more frames of video that relate to different time samples may be used. This may be described as a temporal mode, e.g. as it relates to information from different times. Not all embodiments may make use of temporal aspects. Components for temporal prediction are shown in the examples of Figures 1 to 5C. As described herein, a step of encoding one or more sets of residuals may utilise a temporal buffer that is arranged to store information relating to a previous frame of video. In one case, a step of encoding a set of residuals may comprise deriving a set of temporal coefficients from the temporal buffer and using the retrieved set of temporal coefficients to modify a current set of coefficients. "Coefficients", in these examples, may comprise transformed residuals, e.g. as defined with reference to one or more coding units of a frame of a video stream - approaches may be applied to both residuals and coefficients. In certain cases, the modifying may comprise subtracting the set of temporal coefficients from the current set of coefficients. This approach may be applied to multiple sets of coefficients, e.g. those relating to a level 1 stream and those relating to a level 2 stream. The modification of a current set of coefficients may be performed selectively, e.g. with reference to a coding unit within a frame of video data.
Temporal aspects may be applied at both the encoding and decoding stages. Use of a temporal buffer is shown in the encoder 300 of Figures 3A and 3B and in the decoder 580, 590 of Figures 5B and 5C. As described herein, prior to modifying a current set of coefficients, the current set of coefficients may be one or more of ranked and transformed.
In one case, dequantized transformed coefficients dqCx,y,n-1 from a previous encoded frame (n-1) at a corresponding position (e.g. a same position or mapped position) are used to predict the coefficients Cx,y,n in a frame to be encoded (n). If a 4x4 transform is used, x, y may be in the range [0,3]; if a 2x2 transform is used, x, y may be in the range [0,1].
Dequantized coefficients may be generated by an inverse quantize block or operation. For example, in Figure 3B, dequantized coefficients are generated by inverse quantize component 372.
In certain examples, there may be at least two temporal modes.
* A first temporal mode that does not use the temporal buffer or that uses the temporal buffer with all zero values. The first temporal mode may be seen as an intra-frame mode as it only uses information from within a current frame. In the first temporal mode, following any applied ranking and transformation, coefficients may be quantized without modification based on information from one or more previous frames.
* A second temporal mode that makes use of the temporal buffer, e.g. that uses a temporal buffer with possible non-zero values. The second temporal mode may be seen as an inter-frame mode as it uses information from outside a current frame, e.g. from multiple frames. In the second temporal mode, following any applied ranking and transformation, previous frame dequantized coefficients may be subtracted from the coefficients to be quantized: Cx,y,n,inter = Cx,y,n - dqCx,y,n-1. In one case, a first temporal mode may be applied by performing a subtraction with a set of zeroed temporal coefficients. In another case, the subtraction may be performed selectively based on temporal signalling data. Figures 11A and 11B show example operations in the encoder for two respective temporal modes (a sketch of the coefficient arithmetic is given below). A first example 1100 in Figure 11A shows a set of coefficients Cx,y,n,intra being generated by an encoding component 1102 in a first temporal mode. These are then passed for quantization. In Figure 11B, a set of coefficients in a second temporal mode Cx,y,n,inter are produced by an encoding component 1112 by subtraction 1114 as described above and are then passed for quantization. The quantized coefficients in both cases are then encoded as per Figures 3A and 3B. It should be noted that in other examples, a temporal mode may be applied after quantization, or at another point in the encoding pipeline.
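In terms of the coefficient arithmetic, the two modes may be sketched as follows, with the temporal buffer represented as a simple list of dequantized coefficients from the co-located position in the previous frame; the function is illustrative only.

```python
def apply_temporal_mode(coeffs, temporal_buffer, use_inter):
    """Return coefficients to be quantized under the selected temporal mode.

    coeffs:          transformed coefficients Cx,y,n for the current coding unit
    temporal_buffer: dequantized coefficients dqCx,y,n-1 at the co-located position
    use_inter:       True for the second (inter) mode, False for the first (intra)
    """
    if not use_inter:
        # First temporal mode: coefficients pass through unmodified
        # (equivalently, subtraction against an all-zero buffer).
        return list(coeffs)
    # Second temporal mode: Cx,y,n,inter = Cx,y,n - dqCx,y,n-1
    return [c - t for c, t in zip(coeffs, temporal_buffer)]

# Example for a 2x2 transform (A, H, V, D coefficients):
print(apply_temporal_mode([10, -3, 4, 0], [8, -2, 4, 1], use_inter=True))  # [2, -1, 0, -1]
```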
Each of the two temporal modes may be signalled. Temporal signalling may be provided between an encoder and a decoder. The two temporal modes may be selectable within a video stream, e.g. different modes may be applied to different portions of the video stream (e.g. different encoded pictures and/or different areas with a picture such as tiles). The temporal mode may also or alternatively be signalled for the whole video stream. Temporal signalling may form part of metadata that is transmitted to the decoder, e.g from the encoder. Temporal signalling may be encoded.
In one case, a global configuration variable may be defined for a video stream, e.g. for a plurality of frames within the video stream. For example, this may comprise a temporal enabled flag, where a value of 0 indicates the first temporal mode and a value of 1 indicates a second temporal mode. In other cases, as well as, or instead of, the global configuration value, each frame or "picture" within a video stream may be assigned a flag indicating the temporal mode. If a temporal enabled flag is used as a global configuration variable this may be set by the encoder and communicated to the decoder.
In certain cases, one or more portions of a frame of a video stream may be assigned a variable that indicates a temporal mode for the portions. For example, the portions may comprise coding units or blocks, e.g. 2x2 or 4x4 areas that are transformed by a 2x2 or 4x4 transform matrix. In certain cases, each coding unit may be assigned a variable that indicates a temporal mode. For example, a value of 1 may indicate a first temporal mode (e.g. that the unit is an "intra" unit) and a value of 0 may indicate a second temporal mode (e.g. that the unit is an "inter" unit). The variable associated with each portion may be signalled between the encoder and the decoder. In one case, this may be performed by setting one of the transformed coefficients to the variable value, e.g. this may be signalled by setting an H coefficient for a 2x2 coding unit or an HH coefficient for a 4x4 coding unit to the variable value (e.g. 0 or 1). In another case, each coding unit may comprise metadata and/or side-band signalling that indicates the temporal mode. Figure 11C shows an example 1120 of the former case. In this example 1120, there are four coefficients 1122 that result from a 2x2 transformation. These four coefficients 1122 may be generated by transforming a 2x2 coding unit of residuals (e.g. for a given plane). When a Hadamard transform is used, the four coefficients may be referred to as A, H, V and D components 1124 respectively representing Average, Horizontal, Vertical and Diagonal aspects within the coding unit. In the example 1120 of Figure 11C, the H component is used to signal a temporal mode, as shown by 1126.
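One way of realising this in-band signalling is simply to overwrite the relevant coefficient with the temporal mode value before quantization. The sketch below assumes coefficients are held in A, H, V, D order for the 2x2 case; the index used for the HH coefficient in the 4x4 case is a placeholder assumption.

```python
def embed_temporal_signal(coeffs, temporal_bit, transform_size=2):
    """Overwrite the H (2x2) or HH (4x4) coefficient with the temporal mode bit.

    coeffs is assumed to list coefficients in layer order (A, H, V, D for 2x2);
    the index of HH for the 4x4 case is an assumption for illustration.
    """
    signalled = list(coeffs)
    index = 1 if transform_size == 2 else 3  # H for 2x2; assumed HH position for 4x4
    signalled[index] = temporal_bit          # e.g. 1 = "intra" unit, 0 = "inter" unit
    return signalled

print(embed_temporal_signal([12, -5, 7, 3], temporal_bit=1))  # [12, 1, 7, 3]
```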
Temporal processing may be selectively applied at the encoder and/or the decoder based on an indicated temporal mode. Temporal signalling within metadata and/or a side-band channel for portions of a frame of an enhancement stream may be encoded, e.g. with run-length encoding or the like to reduce the size of the data that is to be transmitted to the decoder. Run-length encoding may be advantageous for small portions, e.g. coding units and/or tiles, where there are a few temporal modes (e.g. as this metadata may comprise streams of '0's and '1's with sequences of repeated values).
A temporal mode may be signalled for one or more of the two enhancement streams (e.g. at level 2 and/or at level 1). For example, in one case, a temporal mode may be applied at LoQ2 (i.e. level 2) but not at LoQ1 (i.e. level 1). In another case, a temporal mode may be applied at both LoQ2 and LoQl. The temporal mode may be signalled (e.g. as discussed above) independently for each level of enhancement. Each level of enhancement may use a different temporal buffer. For LoQ1 a default mode may be not to use a temporal mode (e.g. a value of 0 indicates no temporal features are used and a value of 1 indicates a temporal mode is used). Whether a temporal mode is used at a particular level of enhancement may depend on capabilities of a decoder. The temporal modes of operation described herein may be applied similarly at each level of enhancement.
Temporal Processing at the Encoder In certain cases, a cost of each temporal mode for at least a portion of video may be estimated. This may be performed at the encoder or in a different device. In certain cases, a temporal mode with a smaller cost is selected and signalled. In the encoder, this may be performed by the temporal mode selection block shown in Figures 3A and 3B. A decoder may then decode the signalling and apply the selected temporal mode, e.g. as instructed by the encoder.
Costing may be performed on a per frame basis and/or on a per portion basis, e.g. per tile and/or per coding unit. In the latter case, a result of a costing evaluation may be used to set the temporal mode variable for the coding unit prior to quantization and encoding.
In certain cases, a map may be provided that indicates an initial temporal mode for a frame, or a set of portions of a frame, of video. This map may be used by the encoder. In one case, a temporal type variable may be obtained by the encoder for use in cost estimation as described in more detail below.
In one case, a cost that is used to select a temporal mode may be controllable, e.g. by setting a parameter in a configuration file. In one case, a cost that is used to select a temporal mode may be based on a difference between an input frame and one or more sets of residuals (e.g. as reconstructed). In another case, a cost function may be based on a difference between an input frame and a reconstructed frame. The cost for each temporal mode may be evaluated and the mode having the smallest cost may be selected. The cost may be based on a sum of absolute differences (SAD) computation. The cost may be evaluated in this manner per frame and/or per coding unit.
For example, a first cost function may be based on Jo = Sum(abs(Ix,y,n - Rx,y,n,o)), where Ix,y,n is an input value, Rx,y,n,o is a reconstructed residual and o is intra or inter (i.e. indicates a first or second temporal mode). The cost function may be evaluated using reconstructed residuals from each temporal mode and then the results of the cost function may be compared for each temporal mode. A second cost function may be based on additional terms that apply a penalty for non-zero quantized coefficients and/or based on values of one or more directional components if these are used for signalling (e.g. following transformation). In the second case, the second cost function may be based on Jo = Sum(abs(Ix,y,n - Rx,y,n,o)) + step_widthAA * Sum((qCx,y,n,o != 0) + ((o==intra)&(qC0,3,n,intra == 0))), where the step width is a configurable weight or multiplier that may be tuned empirically, qCx,y,n,o is a quantized coefficient and qC0,3,n,intra is a coefficient that relates to an H (for a 2x2 transform) or HH (for a 4x4 transform) element. In other cases, where side-band signalling is used, a cost of setting these bits to 1 may be incorporated into the second cost function. For the first temporal mode (e.g. an intra mode), residuals may be reconstructed according to Rx,y,n,intra = Transform(dqCx,y,n,intra), where "dq" indicates dequantized. For a second temporal mode (e.g. an inter mode), residuals may be reconstructed according to Rx,y,n,inter = Transform(dqCx,y,n,inter + dqCx,y,n-1). "Transform" in both cases may indicate an inverse transform of the coefficients. If a transform matrix is a self-inverse matrix then a common or shared matrix may be used for both forward and inverse transformations. As before, the temporal mode that is used may be indicated in signalling information, e.g. metadata and/or a set parameter value.
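A sketch of the first, SAD-based cost comparison is given below, assuming reconstructed residuals are already available for both modes; the second cost function would add the non-zero coefficient penalty terms described above.

```python
def sad_cost(input_values, reconstructed_residuals):
    """First cost function: Jo = Sum(abs(Ix,y,n - Rx,y,n,o)) over the unit."""
    return sum(abs(i - r) for i, r in zip(input_values, reconstructed_residuals))

def select_temporal_mode(input_values, recon_intra, recon_inter):
    """Evaluate the cost for each temporal mode and pick the smaller one."""
    cost_intra = sad_cost(input_values, recon_intra)
    cost_inter = sad_cost(input_values, recon_inter)
    return ("intra", cost_intra) if cost_intra <= cost_inter else ("inter", cost_inter)

# Example on a 2x2 coding unit:
print(select_temporal_mode([5, 3, -2, 0], [4, 3, -1, 0], [5, 2, -2, 1]))
```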
In one case, the cost may be evaluated at the encoder. For example, the temporal selection block may evaluate the cost. In other cases, the cost may be evaluated by a separate entity (e.g. a remote server during pre-processing of a video stream) and the temporal mode signalled to the encoder and/or decoder.
If the second temporal mode is selected (e.g. inter frame processing), then modified quantized coefficients (e.g. output by the subtraction block 342 between transform component 341 and quantize component 343 in Figure 3B) are then sent for entropy encoding. The dequantized values of these coefficients may then be kept for temporal prediction of the next frame, e.g. frame n+1. Although Figure 3B shows two separate inverse quantize operations for a level 1 stream, it should be noted that these may comprise a single common inverse quantize operation in certain cases.
Temporal mode selection and temporal prediction may be applied to one or more of the level 2 and level 1 streams shown in Figure 3B (e.g. to one or both sets of residuals).
In certain cases, a temporal mode may be separately configured and/or signalled for each stream.
Temporal Refresh As described in later sections, in certain examples, a second temporal mode may utilise a temporal refresh parameter. This parameter may signal when a temporal buffer is to be refreshed, e.g. where a first set of values stored in the temporal buffer are to be replaced with a second set of values. Temporal refresh may be applied at one or more of the encoder and the decoder. The temporal buffer may be any one of the temporal buffers 124, 144, 230, 250, 345, 361, 424, 444, 530, 550, and 591. For example, in the encoder, a temporal buffer may store dequantized coefficients for a previous frame that are loaded when a temporal refresh flag is set (e.g. is equal to 1 indicating "refresh"). In this case, the dequantized coefficients are stored in the temporal buffer and used for temporal prediction for future frames (e.g. for subtraction) while the temporal refresh flag for a frame is unset (e.g. is equal to 0 indicating "no refresh"). In this case, when a frame is received that has an associated temporal refresh flag set to 1, the contents of the temporal buffer are replaced.
This may be performed on a per frame basis and/or applied for portions of a frame such as tiles or coding units.
A temporal refresh parameter may be useful for a set of frames representing a slow-changing or relatively static scene, e.g. a first shot for the set of frames may be used for subsequent frames in the scene. When the scene changes again, a first frame in a set of frames for the next scene may indicate that temporal refresh is again required. This may help speed up temporal prediction operations.
A temporal refresh operation for a temporal buffer may be effected by zeroing all values within the temporal buffer.
A temporal refresh parameter may be signalled to the decoder by the encoder, e.g. as a binary temporal refresh bit where 1 indicates that the decoder is to refresh the temporal buffer for a particular encoded stream (e.g. level 1 or level 2).
Temporal Estimates and Refreshing for Tiles As described herein, in certain examples, data may be grouped into tiles, e.g. 32x32 blocks of an image. In this case, a temporal refresh operation, e.g. as described above, may be performed on a tile-by-tile basis for a frame, e.g. where coefficients are stored in the temporal buffer and may be addressed by tile. A mechanism for tiled temporal refresh may be applied asymmetrically at the encoder and the decoder.
In one case, a temporal processing operation may be performed at the encoder to determine temporal refresh logic on a per frame or per block/coding unit basis. In certain cases, the signalling for a temporal refresh at the decoder may be adapted to conserve a number of bits that are transmitted to the decoder from the encoder.
Figure 12A shows an example 1200 of temporal processing that may be performed at the encoder. Figure 12A shows a temporal processing subunit 1210 of an example encoder. This encoder may be based on the encoder 300, 360 of Figures 3A or 3B. The temporal processing subunit receives a set of residuals indicated as R. These may be level 2 or level 1 residuals as described herein. They may comprise a set of ranked and filtered residuals or a set of unranked and unfiltered residuals. The temporal processing subunit 1210 outputs a set of quantized coefficients - indicated as qC - that may then be entropy encoded. In the present example, the temporal processing subunit 1210 also outputs temporal signalling data - indicated as TS - for communication to the decoder. The temporal signalling data TS may be encoded together with, or separately from, the quantized coefficients. The temporal signalling data TS may be provided as header data and/or as part of a side-band signalling channel. In one case, temporal data may be encoded as a separate surface that is communicated to the decoder.
In the example 1200 of Figure 12A, the residuals (R) are received by a transform component 1212. This may correspond to the transform component of other examples, e.g. one of transform components 322, 341 in Figures 3A and 3B. The transform component 1212 outputs transform coefficients as described herein (i.e. transformed residuals). The temporal processing subunit 1210 also comprises a central temporal processor 1214. This also receives metadata in the form of a tile-based temporal refresh parameter temporal refresh per tile and an estimate of a temporal mode initial temporal mode. The estimate of temporal mode may be provided per coding unit of a frame and the tile-based temporal refresh parameter may be provided per tile. For example, if a 2x2 transform is used, then a coding unit relates to a 2x2 area, and in a 32x32 tile there are 16x16 such areas, and so 256 coding units. The metadata may be generated by another subunit of the encoder, e.g. in a pre-processing operation and/or may be supplied to the encoder, e.g. via a network Application Programming Interface (API).
In the example 1200 of Figure 12A, the temporal processor 1214 receives the metadata and is configured to determine a temporal mode for each coding unit and a value for a temporal refresh bit for the whole frame or picture. The temporal processor 1214 controls the application of a temporal buffer 1222. The temporal buffer 1222 may correspond to the temporal buffer of previous examples as referenced above. The temporal buffer 1222 receives de- or inverse quantized coefficients from an inverse quantize component 1220, which may correspond to one of the inverse quantize components 372 or 364 in Figures 3A and 3B. The inverse quantize component 1220 is communicatively coupled in turn to an output of a quantize component 1216, which may correspond to one of quantize components 323 or 343 in Figures 3A and 3B. The temporal processor 1214 may implement certain functions of the temporal mode selection components 363 or 370 as shown in Figures 3A and 3B. Although Figure 12A shows a certain coupling between the quantize component 1216, the inverse quantize component 1220 and the temporal buffer 1222, in other examples, the temporal buffer 1222 may receive an output of the temporal processor 1214 before quantization and so the inverse quantize component 1220 may be omitted. In Figure 12A, a temporal signalling component 1218 is also shown that generates the temporal signalling TS based on operation of the temporal processor 1214.
Figure 12B shows a corresponding example 1230, e.g. as implemented at a decoder, where the decoder receives a temporal refresh bit per frame and a temporal mode bit per coding unit. As discussed above, in certain cases the temporal mode for each coding unit may be set within the encoded coefficients, e.g. by replacing an H or HH value within the coefficients. In other examples, the temporal mode for each coding unit may be sent via additional signalling information, e.g. via a side-band and/or as part of frame metadata.
In the example 1230 of Figure 12B, a temporal processing subunit 1235 is provided at the decoder. This may implement at least a portion of a level 1 or level 2 decoding component. The temporal processing subunit 1235 comprises an inverse quantize component 1240, an inverse transform component 1242, a temporal processor 1244 and a temporal buffer 1248. The inverse quantize component 1240 and the inverse transform component 1242 may comprise implementations of the inverse quantize components 572, 582 and the inverse transform components 573, 583 shown in Figures 5B and 5C. The temporal processor 1244 may correspond to functionality applied by the temporal prediction component 585 and the third summation component 594, or by the temporal prediction component 585 and the fourth summation component 595. The temporal buffer 1248 may correspond to one of the temporal buffers 550 or 591. In Figure 12B, there is also a temporal signalling component 1246 that receives data 1232 that is, in this example, indicated in a set of headers H for the bitstream. These headers H may correspond to the headers 556 of Figure 5C. It should be noted that the temporal subunits 1210 and 1235 may, in certain cases, be implemented with respective encoders and decoders that differ from the other examples herein.
In certain cases, when a temporal mode is enabled, e.g. as set by a global temporal_enabled bit, the temporal processor 1214 of Figure 12A is configured to use the tile-based temporal refresh parameter temporal_refresh_per_tile and the estimate of the temporal mode initial_temporal_mode, and to determine values for the temporal mode for each coding unit and the temporal refresh bit for the whole frame that improve communication efficiency between the encoder and the decoder.
In one case, the temporal processor may determine costs based on the estimated temporal modes initial_temporal_mode and use these costs to set the values that are communicated to the decoder.
In one case, the temporal processor may initially determine whether a per frame refresh should be performed and signalled based on percentages of different estimated temporal modes across the set of coding units for the frame, e.g. where the coding units have an initial estimate of the temporal mode. For example, first, all coding units of both estimated temporal modes (e.g. elements associated with a 2x2 or 4x4 transform) may be ignored if they have a zero sum of absolute differences (e.g. cases where there is no residual). A refresh bit for the frame may then be estimated based on proportions (e.g. percentages) of non-zero coding units. In certain examples, a refresh operation for the contents of a temporal buffer may be set based on a percentage of coding units that are initially estimated to relate to the first temporal mode. For example, if more than 60% of coding units are estimated to relate to the first temporal mode in the case that temporal_refresh_per_tile is not set, or if more than 75% of coding units are deemed to relate to the first temporal mode in the case that temporal_refresh_per_tile is set, then the temporal buffer 1222 may be refreshed (e.g. by zeroing values within the buffer) for the whole frame and appropriate signalling set for the decoder. In these cases, even if temporal processing is enabled (e.g. via the temporal_enabled signalling), any subtraction is performed with respect to zeroed values within the temporal buffer 1222 and so temporal prediction at the decoder is inhibited similar to the first temporal mode. This may be used to revert back to the first temporal mode based on changes within the video stream (e.g. if it is a live stream) even though a second temporal mode with temporal prediction is signalled. This may improve viewing quality.
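The frame-level refresh decision described above may be summarised with a short, non-normative sketch. The function and variable names below are illustrative only; the 60% and 75% thresholds follow the percentages given in the example.

```python
# Non-normative sketch of the per-frame refresh estimate described above.
# initial_modes: estimated temporal mode per coding unit (1 = first/intra
# mode, 0 = second/inter mode); sads: corresponding sum-of-absolute-
# differences values. Both names are illustrative.

def estimate_frame_refresh(initial_modes, sads, temporal_refresh_per_tile):
    # Ignore coding units with no residual (zero SAD).
    active = [m for m, s in zip(initial_modes, sads) if s != 0]
    if not active:
        return False
    intra_fraction = sum(active) / len(active)
    # 75% threshold when temporal_refresh_per_tile is set, 60% otherwise.
    threshold = 0.75 if temporal_refresh_per_tile else 0.60
    return intra_fraction > threshold
```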
Similarly, in certain cases, even if the second temporal mode is selected for coding units and signalled to the decoder, if a frame encoded by the base encoder is set as an I or intra frame (e.g. by setting the temporal refresh bit for the frame), then the temporal buffer 1222 is refreshed as above (e.g. effecting processing similar to the first temporal mode). This may help to ensure that Group of Pictures (GoP) boundaries of the base stream, e.g. as encoded, are respected when temporal processing is enabled.
Whether a temporal refresh is performed, e.g. for a tile, may depend on whether noise sequences are present with isolated static edges. The exact form of the cost function may depend on the implementation.
Returning to processing performed by the temporal processing subunit 1210 of Figure 12A, following a decision on whole frame refresh, a second stage may involve tile-based processing based on the temporal_refresh_per_tile bit value. This may be performed per tile for a given set of tiles for a frame. If temporal_refresh_per_tile is used, and if the flag temporal_refresh_per_tile is set in the metadata received by the temporal processor, then the following processing may be performed.
At a first substage, it may be checked whether a temporal buffer for a given tile is already empty. If it is, all temporal signals in the tile are zero and coding units in this tile are encoded in the second temporal mode (e.g. inter encoded), e.g. the temporal mode for the unit is set as the second mode, further temporal processing is performed in relation to this mode at the encoder, and the temporal mode is signalled to the decoder (e.g. either by setting a coefficient value or via sideband signalling). This may effectively code the tile as per the first temporal mode (e.g. intra coding) as the temporal buffer is empty. If the second temporal mode (e.g. inter mode) is set via a 0 value in the temporal mode bit, this approach may reduce the number of bits that need to be communicated to the decoder in cases where the temporal buffer will be empty.
If the flag temporal_refresh_per_tile is not set for a given tile, a first coding unit in the tile may be encoded as per the second temporal mode (e.g. as an inter unit) and temporal signalling for this tile is not set. In this case, a costing operation as described previously is performed for the other coding units within the tile (e.g. the first or second temporal mode may be determined based on a sum of absolute differences (SAD) metric). In this case, for the other coding units, the initial estimated temporal mode information is recomputed based on current (e.g. live) encoding conditions. All other coding units in the tile may be subjected to the procedure and costing steps above. The encoding of the first coding unit in the tile as the second temporal mode may be used to instruct initial temporal processing at the decoder (e.g. to instruct an initial refresh for the tile), where the temporal processing for the other coding units is performed at the decoder based on the confirmed values of the temporal mode bit set for the coding units.
If the flag temporal_refresh_per_tile for a given tile is set and a temporal buffer for the tile is not empty, then the temporal processor may arrange for a temporal refresh of the tile, where temporal signalling is then set to instruct this at the decoder. This may be performed by setting the temporal mode value for a first coding unit to 1 and the temporal mode value for all other coding units to 0. This pattern of 1 in the first coding unit and 0 in the other coding units indicates to the decoder that a refresh operation is to be performed with respect to the tile, yet reduces the information that needs to be transmitted. In this case, the temporal processor effectively ignores the temporal mode values and encodes all the coding units as per the first temporal mode (e.g. as intra coding units without temporal prediction).
Hence, in these examples, when temporal_refresh_per_tile is set as part of the encoder metadata, a first coding unit may be used to instruct the decoder to clean (i.e. empty) its corresponding temporal buffer at the position of that tile, and the encoder logic may apply temporal processing using an appropriate temporal mode.
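The per-tile decisions set out above may be sketched as follows. This is a simplified illustration under the assumption that the temporal signals for a tile are represented as a list of bits; the cost function is left abstract and all names are hypothetical.

```python
# Illustrative sketch of the per-tile temporal signalling logic described
# above; names are hypothetical and the costing step is abstracted away.

def tile_temporal_signals(refresh_per_tile, buffer_is_empty, num_units,
                          cost_based_mode):
    if refresh_per_tile and buffer_is_empty:
        # Buffer already empty: units are encoded in the second (inter) mode
        # and all temporal signals for the tile are zero.
        return [0] * num_units
    if refresh_per_tile and not buffer_is_empty:
        # Instruct a tile refresh at the decoder: first unit set to 1, the
        # rest to 0; the encoder treats the tile as intra coded.
        return [1] + [0] * (num_units - 1)
    # Flag not set: first unit is inter coded with signal 0; the remaining
    # units are classified by a cost function (0 = inter, 1 = intra).
    return [0] + [cost_based_mode(i) for i in range(1, num_units)]
```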
The approaches above may allow temporal prediction to be performed on a per tile basis based on coding units within the tile. Configurations for a given tile may be set for one coding unit within the tile. These approaches may be applied for one or more of the level 2 stream and the level 1 stream, e.g. for one or more of the sets of residuals.
In certain cases, a temporal_tile_intra_signalling global parameter may be set for a video stream to indicate that the tile refresh logic described above is to be used at the decoder.
Initial Temporal Mode Flag

In certain examples, the initial temporal mode data may be provided for a plurality of frames, e.g. for a current frame and a next frame. In these examples, the initial temporal mode estimate for a next frame, e.g. frame n+1, may also be used to remove quantized values that are not considered important, so as to reduce the bit rate. For example, the estimated temporal mode information may be used to control comparisons with one or more thresholds to instruct removal of quantized values (e.g. at one of the quantize components 323, 343, at one of the temporal mode selection components 363, 370 or at the RM L-1 control components 324, 365 in Figures 3A or 3B).
In certain cases, if an initial temporal mode for a coding unit at the same position in a next frame is estimated to be related to the first temporal mode (e.g. an intra mode), it may be assumed that residuals to be coded in the present coding unit will disappear in the next frame, and hence residuals that are smaller than or equal to a given threshold may be removed. As an example, in a test case, this threshold may be set to 2, meaning all quantized values smaller than +/-3 will be removed from the coding unit.
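A minimal sketch of this pruning step is given below, assuming the quantized values for a coding unit are available as a list; the threshold of 2 follows the test case mentioned above, and all names are illustrative.

```python
# Sketch of the threshold-based removal described above: if the co-located
# coding unit in frame n+1 is estimated to use the first (intra) temporal
# mode, quantized values with magnitude <= threshold are set to zero.

def prune_quantized_values(values, next_frame_unit_is_intra, threshold=2):
    if not next_frame_unit_is_intra:
        return values
    return [0 if abs(v) <= threshold else v for v in values]
```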
Figure 12C shows an example 1250 of how temporal signalling information may be provided for a frame of residuals 1251. References to a "frame" in these examples may refer to a frame for a particular plane, e.g. where separate frames of residuals are generated for each of the YUV planes. As such, the terms "plane" and "frame" may be used interchangeably.
The left-hand-side of Figure 12C shows how a frame of residuals may be divided into a number of tiles 1252. The right-hand-side of Figure 12C shows how temporal signalling information may be assigned to each tile. For example, circle 1253 indicates a first tile 1254. In the frame 1251, the tiles form a raster-like pattern of rows across the frame 1251. The right-hand-side shows the first tile 1254 in more detail.
The circle 1253 on the right-hand-side of Figure 12C shows how each tile 1254 comprises a number of coding units. A coding unit may comprise one or more residuals. In one case, a coding unit may relate to a block of residuals associated with a transform operation, e.g. a 2x2 block as described herein, which may relate to a Directional Decomposition transformation (DD -described in more detail below), or a 4x4 block as described herein, which may relate to a Directional Decomposition Squared (DDS). In Figure 12C, each coding unit within the tile has a temporal type flag 1255 (shown as "TT") and the tile 1254 has a temporal refresh per tile flag 1256 (shown as "TR"). This information may be obtained and used by the encoder to apply temporal encoding as described above.
Other Temporal Signalling Examples

As described above, in one case, temporal signalling may be provided "in-stream", e.g. as part of an enhancement stream. This may be performed by replacing a particular coefficient following transformation, e.g. the temporal signalling is embedded within the transform coefficients. In one case, a horizontal coefficient (e.g. H in a 2x2 Directional Decomposition transform or HH in a 4x4 Directional Decomposition Squared transform) may be used to signal a temporal mode for a particular coding unit. A horizontal coefficient may be used as this may minimise an effect on a reconstructed signal. In certain cases, the effect of the horizontal coefficient may be reconstructed by the inverse transform at the decoder, e.g. based on the data carried by the other coefficients in the coding block. In another case, temporal signalling may be performed using metadata. Metadata, as used here, may be a form of side-band signalling, e.g. that does not form part of the base or enhancement streams. In one case, metadata is transmitted in a separate stream (e.g. by the encoder or a remote server) that is received by the decoder.
Although "in-stream" temporal signalling can provide certain advantages for compression, sending temporal data for a frame as a separate chunk of information, e.g. metadata, allows different and possibly more efficient entropy coding to be used for this information. In also allows temporal control and processing, e.g. as described above, to be performed without the need for received enhancement stream data This allows the temporal buffer to be prepared and makes in-loop temporal decoding a simple additive process.
If the second temporal mode is enabled (e.g. if temporal processing is enabled), there may be three levels of temporal signalling: * At a first level, there may be per frame temporal signals. These may comprise a per frame temporal refresh signal. This may be a per frame refresh bit. If this is set the whole frame may be encoded without temporal prediction. A signal at this level may be used to encode the frame and may be signalled to the decoder.
* At a second level, there may be per tile temporal signals. For example, these may be set per m by n tile, where m and n may be 32. Per tile temporal signals may comprise a per tile temporal refresh signal. This may be a per tile refresh bit. If the temporal refresh signal is set for a tile then that whole tile is encoded without temporal information. This level of temporal signalling may be used to encode the frame. In one case, it may not be explicitly signalled to the decoder; in this case, a tile refresh signal may be indicated by a first temporal signal at a third level as described below. In another case, a per tile temporal refresh signal may be explicitly signalled to the decoder.
* At a third level, there may be per block or coding unit temporal signals. These may comprise a temporal mode signal for the block. This may be signalled to the decoder. If the per tile temporal refresh signal is set to 1, and the whole tile is encoded without temporal information (e.g. according to the first temporal mode), then this may be signalled to the decoder with a one-bit per block temporal signal for the first block, which may be set to 1. If the per tile temporal refresh signal is set to 0, then the first transform block in the tile (e.g. 2x2 or 4x4 block) may be encoded with temporal prediction (e.g. using the temporal buffer). In this case, the temporal signal per block may be set to 0, indicating temporal prediction is used (e.g. encoded according to the second temporal mode). If the per tile temporal refresh signal is set to 0, all other transform blocks in the tile may have a one-bit temporal signal that is set to 1 if the block is encoded without temporal information and that is set to 0 if the transform coefficients from the previous frame at the same spatial position are first subtracted from the transform coefficients and the difference is then quantized and passed to the entropy encoder (i.e. if the second temporal mode and the temporal buffer are to be used). A sketch of this per-block encoder behaviour is given after this list.
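The per-block behaviour at the encoder for the second temporal mode, referenced in the list above, may be sketched as follows. The quantize callable and the coefficient lists are illustrative; the sketch only shows the subtraction of co-located coefficients held in the temporal buffer before quantization.

```python
# Encoder-side sketch of the per-block temporal signal semantics described
# above (second temporal mode). Names are illustrative.

def encode_transform_block(coeffs, buffer_coeffs, temporal_signal, quantize):
    if temporal_signal == 0:
        # Subtract the co-located coefficients from the previous frame held
        # in the temporal buffer, then quantize the difference.
        delta = [c - b for c, b in zip(coeffs, buffer_coeffs)]
        return quantize(delta)
    # Temporal signal of 1: encode without temporal prediction.
    return quantize(coeffs)
```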
Figure 12D shows a representation 1260 of temporal signals for 4x4 transform size (e.g. a DDS transform). A 2x2 transform size may be signalled in a corresponding manner.
Figure 12D shows a frame (or plane) 1261 of elements 1262 (e.g. derived from residuals) with a plurality of tiles 1265, 1266 (e.g. similar to Figure 12C). Temporal signals are organized using the tiles 1265, 1266. For a 4x4 transform and a 32x32 tile, there are 8x8 temporal signals per tile (i.e. 32/4). For a 2x2 transform and a 32x32 tile, there are 16x16 temporal signals per tile (i.e. 32/2). The set of temporal signals for a frame of residuals, e.g. as shown in Figure 12D, may be referred to as a "temporal map". The temporal map may be communicated from the encoder to the decoder.
Figure 12D shows how a temporal signal for a first transform block 1268, 1269 within the tile 1265, 1266 may indicate whether the tile is to be processed within the first or second temporal mode. The temporal signal may be a bit indicating the temporal mode.
If the bit is set to 1 for the first transform block, e.g. as shown for block 1268, this indicates that the tile 1265 is to be decoded according to the first temporal mode, e.g. without use of the temporal buffer. In this case, bits for the other transform blocks may not be set. This can reduce the amount of temporal data that is transmitted to the decoder. If the temporal signalling bit of the first transform block is set to 0, e.g. as is the case for block 1269, this indicates in Figure 12D that the tile 1266 is to be decoded according to the second temporal mode, e.g. with temporal prediction and use of the temporal buffer. In this case, the temporal signalling bits of the remaining transform blocks are set to either 0 or 1, providing a level of temporal control at the (third) per block level.
Encoding Temporal Signals

In certain cases, the temporal signalling at the third level, as described above, may be efficiently encoded if it is sent as metadata (e.g. sideband data).
In the case described above, and e.g. as shown in Figure 12D, the temporal map for a frame may be sent to a run-length encoder (e.g. where a frame is a "picture" of encoded residuals). The temporal map may be efficiently encoded using run length encoding. The run-length encoding may be performed using the same run-length encoder used in the "Entropy Coding" component of one or more of the first and second enhancement streams (or a copy of this encoder process). In other cases, a different run-length encoder may be used.
If run-length encoding is to be used, then when the temporal map is received by the run-length encoder several operations may occur. In one case, if the first temporal signal in the tile is 1, the temporal signalling for the rest of the tile is skipped. This is shown by the arrow from the first transform block with a value of 1. If the first temporal signal in the tile is 0, e.g. as shown for the subsequent tiles 1266 in Figure 12D, the temporal signalling bits for the tile may be scanned line by line (e.g. along a first row of transform blocks before moving to the next row of transform blocks, at each step moving to a next column of transform blocks). In Figure 12D, each tile has 8 rows and 8 columns, so for a 0 bit, an iteration is performed over the first 8 columns of the first row, and then the iteration is repeated for the same 8 columns for the second row, and so on until all the temporal signals for the transform blocks for that particular tile are encoded.
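A compact sketch of this tile scan is shown below; it assumes each tile's temporal signals are available as a two-dimensional list of 0/1 values and simply produces the symbol sequence passed to the run-length encoder. All names are illustrative.

```python
# Sketch of the scan order described above. If the first temporal signal of
# a tile is 1, the remaining signals in that tile are skipped; otherwise the
# tile is scanned row by row, column by column.

def scan_tile_signals(tile):
    if tile[0][0] == 1:
        return [1]                 # rest of the tile is skipped
    symbols = []
    for row in tile:
        symbols.extend(row)
    return symbols

def scan_temporal_map(tiles):
    symbols = []
    for tile in tiles:             # tiles in raster order across the frame
        symbols.extend(scan_tile_signals(tile))
    return symbols
```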
In one case, a run-length encoder for the temporal signals may have two states, representing bit values of 0 and 1 (i.e. second temporal mode and first temporal mode). These may be used to encode runs of 1s and runs of 0s. In one case, the run-length encoder may encode runs byte by byte, using 7 bits per byte to encode the run and bit 7 to encode either that more bits are needed to encode the run (set to 1) or that the context is changed. By convention, the first symbol in the stream is always coded as 0 or 1, so the decoder can initialize the state machine. A state machine 1280 that may be used is shown in Figure 12E.
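The byte layout described above may be illustrated with the following non-normative sketch, which packs a single run count into one or more bytes using seven data bits per byte and bit 7 as the "more bytes follow" flag. The ordering of the bytes (least significant seven bits of the count first) is an assumption made for illustration.

```python
# Illustrative encoding of one run count using the 7-bits-per-byte layout
# described above; bit 7 is set when further bytes are needed for the count.

def encode_run_count(count):
    out = bytearray()
    while True:
        low7 = count & 0x7F          # seven bits of the run count
        count >>= 7
        out.append((0x80 if count else 0x00) | low7)
        if count == 0:
            return bytes(out)

# Example: a run of 300 identical symbols -> two bytes.
# encode_run_count(300) == bytes([0x80 | 0x2C, 0x02])
```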
The data shown in Figure 12D may be referred to as a "temporal surface", e.g. a surface of temporal signalling data.
The state machine 1280 of Figure 12E has a start state 1281 and then two subsequent states 1282 and 1283. A run length decoder for the temporal signalling may read the run length encoded data byte by byte (e.g. the data shown in Figure 12D that is encoded by a run length encoder). By construction, the state 1281 of the first byte of data may be guaranteed to be the true value of the first symbol in the stream. The decoder uses the state machine 1280 to determine the state of the next byte of data. A byte of data may be encoded in a similar manner to the bytes 1080 and 1085 in Figures 10H and 10I. In these cases, a first subsequent state is a one-run state 1282. This may have the most significant bit (bit 7) as a run flag bit (e.g. similar to 1081 in Figure 10H) and the remaining bits (bits 6 to 0, seven in total, similar to 1082 in Figure 10H) as a data portion. The one-run state 1282 encodes 7 bits of a one-run count. The run bit is high if more bits are needed to encode the count. From the first symbol state 1281, the state machine 1280 may move to the one-run state 1282 if the run and symbol bits are both 0 or both 1, and may move to the zero-run state 1283 if the run and symbol bits are different (e.g. 0 and 1 or 1 and 0). A run bit value of 0 may toggle between the one-run and zero-run states 1282 and 1283. The zero-run state 1283 may also have a byte structure similar to that shown in Figures 10H or 10I. The zero-run state encodes 7 bits of a zero-run count. The run bit is high if more bits are needed to encode the count.
In one example, a run-length decoder may write 0 and 1 values into a temporal signal surface array TempSigSurface of size (PictureWidth / nTbs, PictureHeight / nTbs), where nTbs is the transform size (e.g. 2 or 4 in examples herein). If the value to write at the writing position (x, y) in the TempSigSurface is 1 and x%(32/nTbs) == 0 and y%(32/nTbs) == 0, the next writing position is moved to (x, y+32/nTbs) when y+32/nTbs < PictureWidth / nTbs, otherwise it is moved to (x+32/nTbs, 0). Run length encoding and decoding for the temporal signalling may be implemented in a similar manner to the run length encoding described for the residual data (e.g. with reference to Figures 10A to 10I). In one case, the information generated by the run-length encoder may be sent to an entropy encoder. This may comprise a Huffman encoder. A Huffman encoder may write into a metadata stream two Huffman codes for each state and Huffman encoded values. The run-length encoding and entropy encoding may thus use existing entropy coding components and/or suitably adapted duplicates of these components (e.g. as suitably initialised threads). This may simplify the encoding and decoding, as components may be re-used with different configuration information. In certain cases, Huffman or prefix coding may be implemented in a similar manner for both residual data and temporal signalling data (e.g. as described with reference to Figures 10A to 10I).
Temporal Processing Flowchart Example

Figures 13A and 13B are two halves 1300, 1340 of a flow chart showing a method of temporal processing according to an example. The method of temporal processing may be performed at the encoder. The method of temporal processing may implement certain processes described above. The method of processing may be applied to the frame of residuals shown in Figure 12C.
At a first block 1302, a check is made as to whether a current frame of residuals is an I-frame (i.e. an intra-coded frame). If the current frame of residuals is an I-frame then the temporal buffer is refreshed at block 1304, and the current frame of residuals is encoded as an Inter-frame at block 1306 with per picture signalling set to 1 at block 1308. If the current frame of residuals is determined not to be an I-frame at block 1302, then a first tile is selected and a check is made at block 1310 to determine whether the temporal_refresh_per_tile flag is set (e.g. has a value of 1). This may be the TR variable 1256 as shown on the right-hand-side of Figure 12C. If the temporal_refresh_per_tile flag is set, then at a next block 1320 the temporal type flags of the units within the current tile are analysed. For example, for a first tile, these may be the temporal type flags 1255 of the units shown on the right-hand-side of Figure 12C. At the next block 1324, a percentage of I or first temporal mode flag values may be counted (e.g. values of '1'). If these are greater than 75%, then the temporal buffer is refreshed at block 1328 and the tile is inter coded at block 1330, with the temporal signals in each tile set to 0 at block 1332. If these are less than 75%, the method proceeds to Figure 13B (e.g. via node A). A similar process takes place if temporal_refresh_per_tile is not set (e.g. has a value of 0), where a check at block 1322 is made to determine whether more than 60% of the temporal type flags of the units within the current tile are set to an I or first temporal mode (e.g. have values of '1'). If this is the case, a similar process as per the previous 75% check takes place (e.g. blocks 1328 to 1332 are performed). If less than 60% of the temporal type flags of the units within the current tile are set to an I or first temporal mode, then the method again proceeds to Figure 13B (e.g. via node B).
Turning to the second half 1340 shown in Figure 13B, and starting with node A at the left-hand-side of Figure 13B, if less than 75% of units have an I or first temporal mode then a check at block 1342 is made as to whether the temporal buffer is empty. If the temporal buffer is empty, the units within the tile are inter coded at block 1344 and the temporal signals are set to 0 for the units in the tile at block 1346. If the temporal buffer is not empty, then the units within the tile are intra coded at block 1348. In this case, at block 1350, the temporal signal for the first unit is set to 1 and the temporal signals for all other units in the tile are set to 0. Now turning to the right-hand-side of Figure 13B and starting at node B, if less than 60% of units have an I or first temporal mode, then the first unit in the current tile is inter coded at block 1352 and the temporal signal for the first unit is set to 0 at block 1354.
Then a check is made at block 1356 as to whether a temporal type for a co-located n+1 unit (i.e. the co-located unit in a next frame) is set to 1. If so, and the residual value is determined to be less than 2 at block 1358, then the residual is removed at block 1360, e.g. by setting the residual value to 0. If the residual value is not less than 2 at block 1358, or if the co-located unit is not set to 1, then a determination is made at block 1362 as to whether the next unit in the tile is to be intra or inter coded based on a cost function. The temporal signal for the next unit may be set according to the cost function classification at block 1364. This may be repeated for the remaining units in the tile. The method, e.g. from the check on temporal_refresh_per_tile, may be repeated for each tile in the frame.
Cloud Configuration

In certain examples, an encoder (or encoding process) may communicate with one or more remote devices. The encoder may be an encoder as shown in any one of Figures 1, 3A and 3B or described in any other of the examples.
Figure 14A shows an example 1400 of an encoder 1402 communicating across a network 1404. Figure 14B shows that the encoder 1402 may send and/or receive configuration data 1406, 1408 to and/or from a remote control server 1412. Figure 14C shows how an encoder 1432 (which may implement any of the described encoders including encoder 1402 in Figures 14A and 14B) may comprise a configuration interface 1434 that is configured to communicate over the network, e.g. with the remote control server 1412.
Residual Mode Selection

Figure 15 shows an example of a residual mode.
Predicted Averages

Figure 16A shows a process 1600 involving a DD transform at the encoder. Figure 16B sets out a corresponding process 1655 at the decoder.
Rate Control & Quantization

Figure 17A shows the use of the buffer 1740 with respect to the encoded base stream and the encoded L-1 stream; Figure 17B shows another example, where the buffer 1740 receives the encoded base stream and both the encoded level 1 and level 2 enhancement streams.
Figures 18 and 19 show two possible implementations of the rate controller (e.g. rate controller 1710).
Quantization Features

Figure 20A provides an example 2000 of how quantization of residuals and/or coefficients (transformed residuals) may be performed based on bins having a defined step width.
Deadzone

Figure 20B shows an example 2010 of how a so-called "deadzone" (DZ) may be implemented.

Bin Folding

Figure 20C shows an example 2020 of how an approach called bin folding may be applied.
Quantization Offsets

Figure 20D shows an example 2030 of how a quantization offset may be used in certain cases.
Quantization Matrix

In one case, a step-width for quantization may be varied for different coefficients within a 2x2 or 4x4 block of coefficients. For example, a smaller step-width may be assigned to coefficients that are experimentally determined to more heavily influence perception of a decoded signal, e.g. in a 4x4 Directional Decomposition (DD-Squared or "DDS") as described above, AA, AH, AV and AD coefficients may be assigned smaller step-widths, with later coefficients being assigned larger step-widths. In this case, a base stepwidth parameter may be defined that sets a default step-width and then a modifier may be applied to this to compute a modified stepwidth to use in quantization (and de-quantization), e.g. modified_stepwidth = base_stepwidth * modifier, where the modifier may be set based on a particular coefficient within a block or unit.
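A worked sketch of this modifier-based step-width is shown below; the modifier values chosen for the example are purely illustrative, not normative.

```python
# Sketch of the per-coefficient step-width computation described above:
# modified_stepwidth = base_stepwidth * modifier. Example modifiers (one per
# coefficient of a 2x2 DD transform: A, H, V, D) are illustrative only.

EXAMPLE_MODIFIERS = [0.8, 1.0, 1.0, 1.2]   # assumed values, smaller = finer

def modified_step_width(base_step_width, coefficient_index,
                        modifiers=EXAMPLE_MODIFIERS):
    return base_step_width * modifiers[coefficient_index]
```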
In certain cases, the modifier may also, or alternatively, be dependent on a level of enhancement. For example, a step-width may be smaller for the level 1 enhancement stream as it may influence multiple reconstructed pixels at a higher level of quality.
In certain cases, modifiers may be defined based on both a coefficient within a block and a level of enhancement. In one case, a quantization matrix may be defined with a set of modifiers for different coefficients and different levels of enhancement. This quantization matrix may be pre-set (e.g. at the encoder and/or decoder), signalled between the encoder and decoder, and/or constructed dynamically at the encoder and/or decoder.
For example, in the latter case, the quantization matrix may be constructed at the encoder and/or decoder as a function of other stored and/or signalled parameters, e.g. those received via a configuration interface as previously described.
In one case, different quantization modes may be defined. In one mode a common quantization matrix may be used for both levels of enhancement; in another mode, separate matrices may be used for different levels; in yet another mode, a quantization matrix may be used for only one level of enhancement, e.g. just for level 2. The quantization matrix may be indexed by a position of the coefficient within the block (e.g. 0 or 1 in the x direction and 0 or 1 in the y direction for a 2x2 block, or 0 to 3 for a 4x4 block).
In one case, a base quantization matrix may be defined with a set of values. This base quantization matrix may be modified by a scaling factor that is a function of a step-width for one or more of the enhancement levels. In one case, a scaling factor may be a clamped function of a step-width variable. At the decoder, the step-width variable may be received from the encoder for one or more of the level 1 stream and the level 2 stream. In one case, each entry in the quantization matrix may be scaled using an exponential function of the scaling factor, e.g. each entry may be raised to the power of the scaling factor. In one case, different quantization matrices may be used for each of the level 1 stream and the level 2 stream (e.g. different quantization matrices are used when encoding and decoding coefficients, i.e. transformed residuals, relating to these levels). In one case, a particular quantization configuration may be set as a predefined default, and any variations from this default may be signalled between the encoder and the decoder. For example, if different quantization matrices are to be used by default, this may require no signalling to this effect between the encoder and the decoder. However, if a common quantization matrix is to be used, this may be signalled to override the default configuration. Having a default configuration may reduce a level of signalling that is needed (as the default configuration may not need to be signalled).
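The matrix-scaling behaviour described in this paragraph may be sketched as below. The normalisation of the step-width and the clamp bounds are assumptions made for illustration; the text only requires that the scaling factor be a clamped function of the step-width and that each matrix entry be raised to the power of that factor.

```python
# Hedged sketch of quantization-matrix scaling: each entry of a base matrix
# is raised to the power of a scaling factor derived from a clamped function
# of the signalled step-width. The normalisation constant is an assumption.

def scale_quantization_matrix(base_matrix, step_width,
                              norm=32767.0, lo=0.0, hi=1.0):
    scaling = min(max(step_width / norm, lo), hi)      # clamped scaling factor
    return [[entry ** scaling for entry in row] for row in base_matrix]
```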
Tiling

As described above, for example with reference to Figure 12C, in certain configurations a frame of video data may be divided into two-dimensional portions referred to as "tiles". For example, a 640 by 480 frame of video data may contain 1200 tiles of 16 pixels by 16 pixels (e.g. 40 tiles by 30 tiles). Tiles may thus comprise non-overlapping successive areas within a frame, where each area is of a set size in each of the two dimensions.
A common convention is for tiles to run successively in rows across the frame, e.g. a row of tiles may run across a horizontal extent of the frame before starting a row of tiles below (a so-called "raster" format, although other conventions, such as interlaced formats may also be used). A tile may be defined as a particular set of coding units, e.g. a 16 by 16 pixel tile may comprise an 8 by 8 set of 2x2 coding units or a 4 by 4 set of 4x4 coding units.
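The tile arithmetic above can be checked with a short helper; the names are illustrative.

```python
# Worked example of the tile arithmetic described above.

def tile_grid(frame_width, frame_height, tile_size=16):
    tiles_x = frame_width // tile_size
    tiles_y = frame_height // tile_size
    return tiles_x, tiles_y, tiles_x * tiles_y

# tile_grid(640, 480) -> (40, 30, 1200): 1200 tiles of 16x16 pixels, each
# containing an 8x8 grid of 2x2 coding units or a 4x4 grid of 4x4 coding units.
```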
In certain cases, a decoder may selectively decode portions of one or more of a base stream, a level 1 enhancement stream and a level 2 enhancement stream. For example, it may be desired to only decode data relating to a region of interest in a reconstructed video frame. In this case, the decoder may receive a complete set of data for one or more of the base stream, the level 1 enhancement stream and the level 2 enhancement stream but may only decode data within the streams that is useable to render the region of interest in the reconstructed video frame. This may be seen as a form of partial decoding.
Partial decoding in this manner may provide advantages in a number of different areas. When implementing a virtual or augmented reality application, only a portion of a wide field of view may be being viewed at any one time. In this case, only a small region of interest relating to the viewed area may be reconstructed at a high level of quality, with the remaining areas of the field of view being rendered at a low (i.e. lower) level of quality. Further details regarding this approach may be found in patent publication WO2018/015764 A1, which is incorporated by reference herein. Similar approaches may be useful when communicating video data relating to a computer game. Partial decoding may also provide an advantage for mobile and/or embedded devices where resources are constrained. For example, a base stream may be decoded rapidly and presented to a user. The user may then select a portion of this base stream to render in more detail. Following selection of a region of interest, data within one or both of the level 1 and level 2 enhancement streams relating to the region of interest may be decoded and used to render a particular limited area in high detail. A similar approach may also be advantageous for object recognition, whereby an object may be located in a base stream, and this location may form a region of interest. Data within one or both of the level 1 and level 2 enhancement streams relating to the region of interest may then be decoded to further process video data relating to the object.
In the present examples, partial decoding may be based on tiles. For example, a region of interest may be defined as a set of one or more tiles within frames of the reconstructed video stream, e.g. the reconstructed video stream at a high level of quality or full resolution. Tiles in the reconstructed video stream may correspond to equivalent tiles in frames of the input video stream. Hence, a set of tiles that covers an area that is smaller than a complete frame of video may be decoded.
In certain configurations described herein, the encoded data that forms part of at least the level 1 enhancement stream and the level 2 enhancement stream may result from a Run-Length encoding then Huffman encoding. In this encoded data stream, it may not be possible to discern data relating to specific portions of the reconstructed frame of video without first decoding the data (e.g. until obtaining at least quantized transformed coefficients that are organised into coding units).
In the above configurations, certain variations of the examples described herein may include a set of signalling within the encoded data of one or more of the level 1 enhancement stream and the level 2 enhancement stream such that encoded data relating to particular tiles may be identified prior to decoding. This can then allow for the partial decoding discussed above.
For example, in certain examples, the encoding scheme illustrated in one or more of Figures 10A to 10I may be adapted to include header data that identifies a particular tile within a frame. The identifier may comprise a 16-bit integer that identifies a particular tile number within a regular grid of tiles (such as shown in Figure 12C). For example, at the start of transmission of encoded data relating to a particular tile of the input video frame, an identifier for the tile may be added to a header field of the encoded data. At the decoder, all data following the identifier may be deemed to relate to the identified tile, up to a time where a new header field is detected within the encoded stream or a frame transition header field is detected. In this case, the encoder signals tile identification information within one or more of the level 1 enhancement stream and the level 2 enhancement stream and this information may be received within the streams and extracted without decoding the streams. Hence, in a case where a decoder is to decode one or more tiles relating to a defined region of interest, the decoder may only decode portions of one or more of the enhancement streams that relate to those tiles.
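Selective decoding based on such tile identifiers may be sketched as follows. The chunk structure (a 16-bit tile identifier followed by that tile's encoded data) follows the description above; how chunk boundaries are located within the bitstream is abstracted away and the names are hypothetical.

```python
# Illustrative partial decoding driven by in-stream tile identifiers: only
# tiles belonging to the region of interest are passed to the tile decoder.

def decode_region_of_interest(tile_chunks, roi_tile_ids, decode_tile):
    # tile_chunks: iterable of (tile_id, encoded_payload) pairs recovered
    # from header fields without decoding the payloads themselves.
    decoded = {}
    for tile_id, payload in tile_chunks:
        if tile_id in roi_tile_ids:
            decoded[tile_id] = decode_tile(payload)
    return decoded
```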
Use of a tile identifier within the encoded enhancement streams allows variable length data, such as that output by the combination of Huffman and Run-length encoding, while still enabling data that relates to particular areas of a reconstructed video frame to be determined prior to decoding. The tile identifier may thus be used to identify different portions of a received bitstream.
In the present examples, enhancement data (e.g. in the form of transformed coefficients and/or decoded residual data) relating to a tile may be independent of enhancement data relating to other tiles within the enhancement streams. For example, residual data may be obtained for a given tile without requiring data relating to other tiles. In this manner, the present examples may differ from comparative Scalable Video Coding schemes, such as those associated with the HEVC and AVC standards (e.g. SVC and SHVC), that require other intra or inter picture data to decode data relating to a particular area or macroblock of a reconstructed picture. This enables the present examples to be efficiently implemented using parallel processing, as different tiles and/or coding units of the reconstructed frame may be reconstructed in parallel. This can greatly speed up decoding and reconstruction on modern computing hardware where multiple CPU or GPU cores are available.
Tiles Within the Bytestream

Figure 21A shows another example 2100 of a bit or bytestream structure for an enhancement stream. Figure 21A may be seen as another example similar to Figure 9A.
Neural Network Up-sampling

In certain examples, up-sampling may be enhanced by using an artificial neural network.
In Figure 22B, the input to the neural network up-sampler 2210 (e.g. the up-sampler from Figure 22A) is first processed by a first conversion component 2222.
Figure 22C shows an example architecture 2230 for a simple neural network up-sampler 2210.
Figure 22D shows an example 2240 with one implementation of the optional post-processing operation 2238 from Figure 22C.
Example Encoder and Decoder Variations

Graphical Example with Optional Level 0 Upscaling

Figure 23 shows a graphical representation 2300 of the decoding process described in certain examples herein.
Fourth Example Decoder

Figure 24 shows a fourth example decoder 2400. The fourth example decoder 2400 may be seen as a variation of the other example decoders described herein. Figure 24 represents in a block diagram some of the processes described in more detail above and below.
Fifth Example Encoder and Decoder

Figures 25 and 26 respectively show variations of the encoder architecture of Figures 1, 3A and 3B and the decoder architecture of Figures 2, 5A and 5B.
Figure 26 shows a variation of a decoder 2600 according to an example. In general summary, with reference to Figure 26, there is shown a non-limiting exemplary embodiment according to the present invention. In Figure 26, an exemplary decoding module 2600 is depicted. The decoding module 2600 receives a plurality of input bitstreams, comprising encoded base 2616, level 1 coefficient groups 2626, level 2 coefficient groups 2646, a temporal coefficient group 2656 and headers 2666.
In general, the decoding module 2600 processes two layers of data. A first layer, namely the base layer, comprises a received data stream 2616 which includes the encoded base. The encoded base 2616 is then sent to a base decoding module 2618, which decodes the encoded base 2616 to produce a decoded base picture. The base decoding module may be a decoder implementing any existing base codec algorithm, such as AVC, HEVC, AV1, VVC, EVC, VC-6, VP9, etc., depending on the encoded format of the encoded base.
A second layer, namely the enhancement layer, is further composed of two enhancement sublayers. The decoding module receives a first group of coefficients, namely level 1 coefficient groups 2626, which are then passed to an entropy decoding module 2671 to generate decoded coefficient groups. These are then passed to an inverse quantization module 2672, which uses one or more dequantization parameters to generate dequantized coefficient groups. These are then passed to an inverse transform module 2673 which performs an inverse transform on the dequantized coefficient groups to generate residuals at enhancement sublayer 1 (level 1 residuals). The residuals may then be filtered by a smoothing filter 2632. The level 1 residuals (i.e., the decoded first enhancement sublayer) are applied to a processed output of the base picture.
The decoding module receives a second group of coefficients, namely level 2 coefficient groups 2646, which are then passed to an entropy decoding module 2681 to generate decoded coefficient groups. These are then passed to an inverse quantization module 2682, which uses one or more dequantization parameters to generate dequantized coefficient groups. The dequantization parameters used for the enhancement sublayer 2 may be different from the dequantization parameters used for the enhancement sublayer 1. The dequantized coefficient groups are then passed to an inverse transform module 2683 which performs an inverse transform on the dequantized coefficient groups to generate residuals at enhancement sublayer 2 (level 2 residuals).
User Data Signalling

In certain variations of the examples described herein, a bit in the bitstream may be used to signal the presence of user data in place of one of the coefficients associated with a transform block (e.g., the HH coefficient), specifically in the case of a 4x4 transform. For example, this may comprise signalling user data in place of the temporal signalling described with respect to other examples (and shown, for example, in Figure 11C).
In certain examples, an encoding of user data in place of one of the coefficients may be configured as follows. If the bit is set to "0", then the decoder shall interpret that data as the relevant transform coefficient. If the bit is set to "1", then the data contained in the relevant coefficient is deemed to be user data, and the decoder is configured to ignore that data, i.e. decode the relevant coefficient as zero.
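This interpretation rule may be expressed as a short decision, sketched below with illustrative names.

```python
# Sketch of the user-data rule described above for the relevant coefficient
# of a 4x4 transform block.

def interpret_relevant_coefficient(user_data_bit, raw_value):
    if user_data_bit == 0:
        return raw_value, None       # ordinary transform coefficient
    return 0, raw_value              # coefficient decoded as zero; raw value is user data
```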
User data transmitted in this manner may be useful to enable the decoder to obtain supplementary information including, for example, various feature extractions and derivations, as described in co-filed patent application number GB1914413.8, which is incorporated herein by reference.
Modular Signalling of Parameters

In an aspect of the present disclosure, there is provided a method for signalling certain decoding parameters in a modular manner. In particular, one or more bits may be used in a signalling portion of a bitstream (for example, in a header indicating parameters associated with a sequence, such as Sequence Parameter Sets (SPS), or with a picture, such as Picture Parameter Sets (PPS)) to indicate that certain parameters are indicated in the bitstream.
In particular, the bitstream may contain one or more bits which, when set to one or more certain values, indicate to the decoder the presence of additional information to be decoded. The decoder, once it has received the bitstream, decodes the one or more bits and, upon determining that the one or more bits correspond to said one or more certain values, interprets one or more subsequent sets of bits in the bitstream as one or more specific parameters to be used when decoding the bitstream (e.g., a payload included in the bitstream).
In a non-limiting example, said one or more specific parameters may be associated with the decoding of a portion of encoded data. For example, the one or more specific parameters may be associated with one or more quantization parameters to decode a portion of the encoded data. For example, if the encoded data comprises two or more portions of encoded data (for example, each portion may be a sublayer of an enhancement layer as described previously), the one or more specific parameters may be one or more quantization parameters associated with decoding some of the two or more portions of encoded data. In another example, the one or more specific parameters may be one or more parameters associated with some post-processing operations to be performed at the decoder, for example applying a dithering function.
In a specific example, the one or more bits may be a bit (e.g., a step_width_level1_enabled bit) which enables explicit signalling of a quantization parameter (e.g., step_width_level1) only when required. For example, this may occur only when there are data encoded in sublayer 1 as described above. In particular, if the bit step_width_level1_enabled is set to "0", then the value of the step width for sublayer 1 would be set by default to a maximum value. On the other hand, when the bit step_width_level1_enabled is set to "1", then step_width_level1 is explicitly signalled and the value of the step width for sublayer 1 is derived from it. A decoding module / decoder would decode the bit step_width_level1_enabled and, if it determines that it is set to "0", it is able to set the value of the step width for sublayer 1 to a maximum value. On the other hand, if it determines that it is set to "1", it is able to set the value of the step width for sublayer 1 to a value corresponding to the parameter step_width_level1 (for example, a value between 0 and 2^N - 1, where N is the number of bits associated with step_width_level1).
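The conditional signalling described in this paragraph may be sketched as follows; the maximum default value and the bit-reading helpers are assumptions made for illustration.

```python
# Sketch of conditional step-width signalling for enhancement sub-layer 1.
# MAX_STEP_WIDTH is an assumed default; read_bit/read_bits are assumed
# bitstream-reading helpers supplied by the caller.

MAX_STEP_WIDTH = 32767   # assumed maximum/default step-width

def parse_step_width_level1(read_bit, read_bits, n_bits):
    step_width_level1_enabled = read_bit()
    if step_width_level1_enabled == 0:
        return MAX_STEP_WIDTH                # sub-layer 1 effectively disabled
    return read_bits(n_bits)                 # explicitly signalled step_width_level1
```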
In a different example, the one or more bits may be a bit (e.g., a decoder_control bit) to enable two parameters (e.g., dithering control variables dithering_type and dithering_strength) to be signalled on a per picture basis if decoder_control is set to "1". A decoding module / decoder would decode the bit decoder_control and, if it determines that it is set to "1", it would decode the dithering control variables dithering_type and dithering_strength and apply the dithering as described in the present application.
The above mechanism provides some important technical advantages, here described by reference to the specific examples but which can be easily generalised to other cases. First, there are some efficiency gains coming from the use of the bit step_width_level1_enabled, which brings an N bits per picture saving in the event no enhancement is used for sub-layer 1. This could result, for example, in a saving of 800 bps for a 50 fps sequence. Second, the use of the bit step_width_level1_enabled may lead to a decoding module / decoder being able to "by-pass" completely any processing for enhancement sub-layer 1, thus further decreasing the decoding complexity.
Further examples of different signalling approaches are described with respect to the syntax and semantic sections below.
Temporal Signalling and Temporal Modifier

In an aspect of the present application, there is a mechanism for managing temporal information separately from the encoded data (encoded coefficients). In particular, the temporal signalling information is sent via a separate layer of encoded data. In the event that no coefficients are sent (for example, by setting the step-width for the level 2 enhancement sub-layer to the maximum value), the temporal buffer can be used to continue applying the residuals computed in previous frames and stored in the buffer to the current frame.
In particular, if no enhancement is sent (e.g., by setting the no_enhancement flag to zero), the temporal buffer could be reset for the whole frame based on a signalling (e.g., by setting the temporal refresh bit to one), in which case no residuals are applied to the current frame. In the event however that the buffer is not reset for the whole frame based on a signalling (e.g., by setting the temporal refresh bit to zero), a second flag may be used to determine whether temporal signalling should be read by a decoder. In particular, an encoder would set a flag to one (e.g., temporal_signalling_present_flag set to one) in order to inform the decoder that a temporal signalling layer is present. In that case, the decoder should read the temporal signalling and apply the temporal logic indicated by the encoder to the decoded bitstream. In particular, it should refresh the tiles and/or the blocks that are indicated in the signalling. On the other hand, if the encoder sets the flag to zero (e.g., temporal_signalling_present_flag set to zero), no temporal signalling is sent and the decoder would apply the residuals contained in the temporal buffer to the current frame.
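A simplified decoder-side view of these flags is sketched below; the buffer object and helper names are assumptions, and the sketch only reflects the behaviour described in this paragraph.

```python
# Hedged sketch of temporal buffer handling when no enhancement data is sent
# for a frame. Flag names follow the text; the buffer interface is assumed.

def residuals_for_frame(temporal_refresh, temporal_signalling_present_flag,
                        temporal_buffer, read_temporal_signalling):
    if temporal_refresh:
        temporal_buffer.reset()                     # buffer zeroed: nothing applied
    elif temporal_signalling_present_flag:
        refresh_map = read_temporal_signalling()    # tiles/blocks to refresh
        temporal_buffer.apply_refresh(refresh_map)
    # Remaining buffer contents are applied to the current frame.
    return temporal_buffer.contents()
```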
By the above mechanism, temporal information and residuals belonging to static areas can be preserved even in the event that no further data are sent, thus allowing high quality and detail to be maintained.
In a second aspect, the step-width to be applied to an enhancement sub-layer is reduced for static areas of a picture. In particular, based on a signalling that identifies tiles (i.e., groups of blocks) which are to be decoded using information from the buffer and additional delta residuals from the current frame, i.e., static tiles, the step-width can be reduced by a factor proportional to a signalled parameter (e.g., stepwidth_modifier) in order to enable a greater quantization granularity for those parts of the video which are static, and which therefore are more likely to be visually relevant. Also, because the step-width is applied to the delta residuals (i.e., the difference between the residuals for a current frame and the co-located residuals already stored in the temporal buffer), a lower step-width (i.e., a finer quantization) would enable more accuracy in the quantization of the delta residuals, which are likely to be much smaller than the residuals. Thus, improved quality would be achieved.
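One possible form of the step-width reduction for static tiles is sketched below; the exact relation between the signalled stepwidth_modifier and the reduction is an assumption, as the text only requires a reduction proportional to the signalled parameter.

```python
# Sketch: reduce the step-width for static tiles in proportion to a signalled
# stepwidth_modifier (assumed here to be a fraction in [0, 1)).

def step_width_for_tile(step_width, stepwidth_modifier, tile_is_static):
    if not tile_is_static:
        return step_width
    return max(1, int(step_width * (1.0 - stepwidth_modifier)))
```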
Bitstream

An example bitstream as generated by the video coding frameworks described herein may contain a base layer, which may be at a lower resolution, and an enhancement layer consisting of up to two sub-layers. The following subsection briefly explains the structure of this bitstream and how the information can be extracted. The base layer can be created using any video encoder and may be flexibly implemented using a wide variety of existing and future video encoding technologies. The bitstream from the base layer may resemble a bitstream as output by an existing codec. The enhancement layer has an additional different structure. Within this structure, syntax elements are encapsulated in a set of network abstraction layer (NAL) units. These also enable synchronisation of the enhancement layer information with the base layer decoded information (e.g. at a decoder so as to reconstruct a video). Depending on the position of a frame of video within a group of pictures (GOP), additional data specifying the global configuration and for controlling the decoder may be present.
As described in the examples herein, and as shown in Figures 9A, 21A and 21B, the data of one enhancement picture may be encoded as several chunks. These data chunks may be hierarchically organised as shown in the aforementioned Figures. In these examples, for each plane (e.g. corresponding to a colour component), up to two enhancement sub-layers are extracted. Each of them again unfolds into numerous coefficient groups of transform coefficients. The number of coefficients depends on the chosen type of transform (e.g. a 4x4 transform applied to 2x2 coding units may generate 4 coefficients and a 16x16 transform applied to 4x4 coding units may generate 16 coefficients). Additionally, if a temporal processing mode is used, an additional chunk with temporal data for one or more enhancement sub-layers may be present (e.g. one or more of the level 1 and level 2 sub-layers). Entropy-encoded transform coefficients within the enhancement bitstream may be processed at a decoder by the coding tools described herein. As described herein, the terms bitstream, bytestream and stream of NALUs may be used interchangeably. Implementations of examples may only comprise an implementation of the enhancement levels; base layer implementations, such as base encoders and decoders, may be implemented by third-party components, wherein an output of a base layer implementation may be received and combined with decoded planes of the enhancement levels, with the enhancement decoding as described herein.
In certain examples, the bitstream can be in one of two formats: a NAL unit stream format or a byte stream format. A NAL unit stream format may be considered conceptually to be the more "basic" type. It consists of a sequence of syntax structures called NAL units.
This sequence is ordered in decoding order. There may be constraints imposed on the decoding order (and contents) of the NAL units in the NAL unit stream. The byte stream format can be constructed from the NAL unit stream format by ordering the NAL units in decoding order and prefixing each NAL unit with a start code prefix and zero or more zero-valued bytes to form a stream of bytes. The NAL unit stream format can be extracted from the byte stream format by searching for the location of the unique start code prefix pattern within this stream of bytes.
For bit-oriented delivery, the bit order for the byte stream format may be specified to start with the most significant bit of the first byte, proceed to the least significant bit of the first byte, followed by the most significant bit of the second byte, etc. The byte stream format may consist of a sequence of byte stream NAL unit syntax structures. Each byte stream NAL unit syntax structure may contain one 4-byte length indication followed by one nal_unit( NumBytesInNalUnit ) syntax structure. This syntax structure may be as follows:

Syntax                                                      Descriptor
byte_stream_nal_unit( ) {
    nal_unit_length                                         u(32)
    nal_unit( nal_unit_length )
}

The order of byte stream NAL units in the byte stream may follow a decoding order of the NAL units contained in the byte stream NAL units. The content of each byte stream NAL unit may be associated with the same access unit as the NAL unit contained in the byte stream NAL unit. In the above, nal_unit_length is a 4-byte length field indicating the length of the NAL unit within the nal_unit( ) syntax structure.
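A minimal parser for this byte stream structure is sketched below; big-endian interpretation of the 4-byte length follows from the most-significant-bit-first bit order stated above, and the surrounding framing is otherwise assumed.

```python
# Illustrative parser for the byte stream NAL unit structure shown above:
# a 4-byte nal_unit_length followed by nal_unit_length bytes of nal_unit().

def parse_byte_stream_nal_units(data: bytes):
    units, pos = [], 0
    while pos + 4 <= len(data):
        nal_unit_length = int.from_bytes(data[pos:pos + 4], "big")
        pos += 4
        units.append(data[pos:pos + nal_unit_length])
        pos += nal_unit_length
    return units
```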
Payload Processing

A payload data block unit process may be applied to the input bitstream. The payload data block unit process may comprise separating the input bitstream into data blocks, where each data block is encapsulated into a NALU. The NALU may be used as described above to synchronise the enhancement levels with the base level. Each data block may comprise a header and a payload. The payload data block unit process may comprise parsing each data block to derive a header and a payload, where the header comprises configuration metadata to facilitate decoding and the payload comprises encoded data. A process for decoding the payload of encoded data may comprise retrieving a set of encoded data; this may be performed following the decoding process for a set of headers. Payloads may be processed based on the structure shown in one or more of Figures 9A, 21A and 21B, e.g. a set of entropy encoded coefficients grouped by plane, level of enhancement or layer. As mentioned, each picture of each NALU may be preceded by picture configuration payload parameters.
It is noted for example that each layer is a syntactical structure containing encoded data related to a specific set of transform coefficients. Thus, each layer may comprise, e.g. where a 2x2 transform is used, a set of 'average' values for each block (or coding unit), a set of 'horizontal' values for each block, a set of 'vertical' values for each block and a set of 'diagonal' values for each block. Of course, it will be understood that the specific set of transform coefficients that are comprised in each layer will relate to the specific transform used for that particular level of enhancement (e.g. first or further, level 1 or 2, defined above).
Bitstream Syntax

In certain examples, the bitstreams described herein (e.g. in particular, the enhancement bitstream) may be configured according to a defined syntax. This section presents an example syntax that may be used. The example syntax may be used for interpreting data and may indicate possible processing implementations to aid understanding of the examples described herein. It should be noted that the syntax described below is not limiting, and that different syntax to that presented below may be used in examples to provide the described functionality.
In general, a syntax may provide example methods by which it can be identified what is contained within a header and what is contained within data accompanying the header. The headers may comprise headers as illustrated in previous examples, such as headers 256, 556, 2402, 2566 or 2666. The syntax may indicate what is represented but not necessarily how to encode or decode that data. For example, with relation to a specific example of an up-sample operation, the syntax may describe that a header comprises an indicator of an up-sample operation selected for use in the broader encoding operation, i.e. the encoder side of the process. It may also be indicated where that indicator is comprised in the header or how that indicator can be determined. As well as the syntax examples described below, a decoder may also implement components for identifying entry points into the bitstream, components for identifying and handling non-conforming bitstreams, and components for identifying and handling errors. The table below provides a general guide to how the example syntax is presented. When a syntax element appears, it is indicated via a variable such as syntax element; this specifies that a syntax element is parsed from the bitstream and the bitstream pointer is advanced to the next position beyond the syntax element in the bitstream parsing process. The letter "D" indicates a descriptor, which is explained below. Examples of syntax are presented in a most significant bit to least significant bit order.
General Guide -Syntax Specification D
/* A statement can be a syntax element with an associated descriptor or can be an expression used to specify conditions for the existence, type and quantity of syntax elements, as in the following two examples */ syntax element u(n)
conditioning statement
/* A group of statements enclosed in curly brackets is a compound statement and is treated functionally as a single statement. */ {
Statement
Statement
}
/* A "while" structure specifies a test of whether a condition is true, and if true, specifies evaluation of a statement (or compound statement) repeatedly until the condition is no longer true */ while (condition)
Statement
/* A "do.. while" structure specifies evaluation of a statement once, followed by a test of whether a condition is true, and if true, specifies repeated evaluation of the statement until the condition is no longer true */ do
Statement
while (condition)
General Guide -Syntax Specification D
/* An "if... else" structure specifies a test of whether a condition is true and, if the condition is true, specifies evaluation of a primary statement, otherwise, specifies evaluation of an alternative statement. The "else" part of the structure and the associated alternative statement is omitted if no alternative statement evaluation is needed */ if (condition)
primary statement
else
alternative statement
/* A "for" structure specifies evaluation of an initial statement, followed by a test of a condition, and if the condition is true, specifies repeated evaluation of a primary statement followed by a subsequent statement until the condition is no longer true. */ for (initial statement; condition; subsequent statement)
primary statement
In the examples of syntax, functions are defined as set out in the table below. Functions are expressed in terms of the value of a bitstream pointer that indicates the position of the next bit to be read by the decoding process from the bitstream.
Syntax function Use byte stream has data( ) If the byte-stream has more data, then returns TRUE; otherwise returns FALSE.
process_payload function(payload type, payload byte size) Behaves like a function lookup table, by selecting and invoking the process payload function relating to the payload type.
read_bits(n) Reads the next n bits from the bitstream. Following the read operation, the bitstream pointer is advanced by n bit positions. When n is equal to 0, read_bits(n) returns a value equal to 0 and the bitstream pointer is not advanced.
read_byte(bi tstream) Reads a byte in the bitstream returning its value. Following the return of the value, the bitstream pointer is advanced by a byte.
read multibyte(bitstream) Executes a read byte(bitstream) until the MSB of the read byte is equal to zero.
bytestream current(bitstream) Returns the current bitstream pointer.
bytestream seek(bitstream, n) Returns the current bitstream pointer at the position in the bitstream corresponding to n bytes.
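The syntax functions above can be illustrated with a minimal bitstream reader. The sketch below is one possible, non-normative implementation (the class and method names are not taken from the specification); bits are read most significant bit first, consistent with the bit ordering described earlier, and the accumulation of 7 value bits per byte in read_multibyte is an assumption about the multi-byte format rather than a definition from this document.

class BitstreamReader:
    """Illustrative bitstream reader for the syntax functions above."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # position of the next bit to be read

    def byte_stream_has_data(self) -> bool:
        return self.bit_pos < 8 * len(self.data)

    def read_bits(self, n: int) -> int:
        # Reads the next n bits (MSB first) and advances the pointer;
        # read_bits(0) returns 0 without advancing, as in the table above.
        value = 0
        for _ in range(n):
            byte = self.data[self.bit_pos // 8]
            bit = (byte >> (7 - (self.bit_pos % 8))) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value

    def read_byte(self) -> int:
        return self.read_bits(8)

    def read_multibyte(self) -> int:
        # Reads bytes until the MSB of the read byte is zero; the use of the
        # 7 low-order bits of each byte as value bits is an assumption here.
        value = 0
        while True:
            byte = self.read_byte()
            value = (value << 7) | (byte & 0x7F)
            if (byte & 0x80) == 0:
                return value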
The following descriptors, which may be used in the "D" column of the example tables, specify the parsing process of each syntax element: b(8): byte having any pattern of bit string (8 bits). The parsing process for this descriptor is specified by the return value of the function read bits( 8).
f(n): fixed-pattern bit string using n bits written (from left to right) with the left bit first. The parsing process for this descriptor is specified by the return value of the function read bits(n) u(n): unsigned integer using n bits. When n is "v" in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for this descriptor is specified by the return value of the function read_bits(n) interpreted as a binary representation of an unsigned integer with most significant bit written first.
ue(v): unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in later examples (an illustrative sketch is also given after this list of descriptors).
mb: read multiple bytes. The parsing process for this descriptor is specified by the return value of the function read_multibyte(bitstream) interpreted as a binary representation of multiple unsigned char with most significant bit written first, and most significant byte of the sequence of unsigned char written first.
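For the ue(v) descriptor, a 0-th order Exp-Golomb decode can be sketched in terms of the read_bits( ) function described above. This assumes the conventional Exp-Golomb interpretation; the normative parsing is set out in the later examples referred to above.

def decode_ue(read_bits):
    """Illustrative 0-th order Exp-Golomb decode, left bit first.

    `read_bits(n)` is any callable matching the syntax function above
    (e.g. BitstreamReader.read_bits from the earlier sketch).
    """
    leading_zero_bits = 0
    while read_bits(1) == 0:
        leading_zero_bits += 1
    # Value is 2^leadingZeroBits - 1 plus the next leadingZeroBits bits.
    return (1 << leading_zero_bits) - 1 + read_bits(leading_zero_bits)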
Process Payload - Global Configuration
A process payload global configuration syntax may be as set out in the table below:

Syntax                                                          D
process_payload_global_config(payload_size) {
    processed_planes_type_flag                                  u(1)
    resolution_type                                             u(6)
    transform_type                                              u(1)
    chroma_sampling_type                                        u(2)
    base_depth_type                                             u(2)
    enhancement_depth_type                                      u(2)
    temporal_step_width_modifier_signalled_flag                 u(1)
    predicted_residual_mode_flag                                u(1)
    temporal_tile_intra_signalling_enabled_flag                 u(1)
    temporal_enabled_flag                                       u(1)
    upsample_type                                               u(3)
    level_1_filtering_signalled_flag                            u(1)
    scaling_mode_level1                                         u(2)
    scaling_mode_level2                                         u(2)
    tile_dimensions_type                                        u(2)
    user_data_enabled                                           u(2)
    level1_depth_flag                                           u(1)
    reserved_zeros_1bit                                         u(1)
    if (temporal_step_width_modifier_signalled_flag == 1) {
        temporal_step_width_modifier                            u(8)
    } else {
        temporal_step_width_modifier = 48
    }
    if (level_1_filtering_signalled_flag) {
        level_1_filtering_first_coefficient                     u(4)
        level_1_filtering_second_coefficient                    u(4)
    }
    if (tile_dimensions_type > 0) {
        if (tile_dimensions_type == 3) {
            custom_tile_width                                   u(16)
            custom_tile_height                                  u(16)
        }
        reserved_zeros_5bit                                     u(5)
        compression_type_entropy_enabled_per_tile_flag          u(1)
        compression_type_size_per_tile                          u(2)
    }
    if (resolution_type == 63) {
        custom_resolution_width                                 u(16)
        custom_resolution_height                                u(16)
    }
}

Process Payload - Picture Configuration
A process payload picture configuration syntax, e.g. for a frame of video, may be as set out in the table below:

Syntax                                                          D
process_payload_picture_config(payload_size) {
    no_enhancement_bit_flag                                     u(1)
    if (no_enhancement_bit_flag == 0) {
        quant_matrix_mode                                       u(3)
        dequant_offset_signalled_flag                           u(1)
        picture_type_bit_flag                                   u(1)
        temporal_refresh_bit_flag                               u(1)
        step_width_level1_enabled_flag                          u(1)
        step_width_level2                                       u(15)
        dithering_control_flag                                  u(1)
    } else {
        reserved_zeros_4bit                                     u(4)
        picture_type_bit_flag                                   u(1)
        temporal_refresh_bit_flag                               u(1)
        temporal_signalling_present_flag                        u(1)
    }
    if (picture_type_bit_flag == 1) {
        field_type_bit_flag                                     u(1)
        reserved_zeros_7bit                                     u(7)
    }
    if (step_width_level1_enabled_flag == 1) {
        step_width_level1                                       u(15)
        level_1_filtering_enabled_flag                          u(1)
    }
    if (quant_matrix_mode == 2 || quant_matrix_mode == 3 || quant_matrix_mode == 5) {
        for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
            qm_coefficient_0[layerIdx]                          u(8)
        }
    }
    if (quant_matrix_mode == 4 || quant_matrix_mode == 5) {
        for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
            qm_coefficient_1[layerIdx]                          u(8)
        }
    }
    if (dequant_offset_signalled_flag) {
        dequant_offset_mode_flag                                u(1)
        dequant_offset                                          u(7)
    }
    if (dithering_control_flag == 1) {
        dithering_type                                          u(2)
        reserved_zero                                           u(1)
        if (dithering_type != 0) {
            dithering_strength                                  u(5)
        } else {
            reserved_zeros_5bit                                 u(5)
        }
    }
}

Process Payload - Encoded Data
A process payload encoded data syntax may be as set out in the table below:

Syntax                                                          D
process_payload_encoded_data(payload_size) {
    if (tile_dimensions_type == 0) {
        for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
            if (no_enhancement_bit_flag == 0) {
                for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
                    for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
                        surfaces[planeIdx][levelIdx][layerIdx].entropy_enabled_flag    u(1)
                        surfaces[planeIdx][levelIdx][layerIdx].rle_only_flag           u(1)
                    }
                }
            }
            if (temporal_signalling_present_flag == 1) {
                temporal_surfaces[planeIdx].entropy_enabled_flag                       u(1)
                temporal_surfaces[planeIdx].rle_only_flag                              u(1)
            }
        }
        byte_alignment( )
        for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
            for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
                for (layerIdx = 0; layerIdx < nLayers; layerIdx++)
                    process_surface(surfaces[planeIdx][levelIdx][layerIdx])
            }
            if (temporal_signalling_present_flag == 1)
                process_surface(temporal_surfaces[planeIdx])
        }
    } else {
        process_payload_encoded_data_tiled(payload_size)
    }
}
Process Payload - Encoded Tiled Data
A process payload encoded tiled data syntax may be as set out in the table below:

Syntax                                                          D
process_payload_encoded_data_tiled(payload_size) {
    for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
        for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
            if (no_enhancement_bit_flag == 0) {
                for (layerIdx = 0; layerIdx < nLayers; layerIdx++)
                    surfaces[planeIdx][levelIdx][layerIdx].rle_only_flag               u(1)
            }
        }
        if (temporal_signalling_present_flag == 1)
            temporal_surfaces[planeIdx].rle_only_flag                                  u(1)
    }
    byte_alignment( )
    if (compression_type_entropy_enabled_per_tile_flag == 0) {
        for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
            if (no_enhancement_bit_flag == 0) {
                for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
                    if (levelIdx == 1)
                        nTiles = nTilesL1
                    else
                        nTiles = nTilesL2
                    for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
                        for (tileIdx = 0; tileIdx < nTiles; tileIdx++)
                            surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].entropy_enabled_flag    u(1)
                    }
                }
            }
            if (temporal_signalling_present_flag == 1) {
                for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++)
                    temporal_surfaces[planeIdx].tiles[tileIdx].entropy_enabled_flag    u(1)
            }
        }
    } else {
        entropy_enabled_per_tile_compressed_data_rle                                   mb
    }
    byte_alignment( )
    if (compression_type_size_per_tile == 0) {
        for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
            for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
                if (levelIdx == 1)
                    nTiles = nTilesL1
                else
                    nTiles = nTilesL2
                for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
                    for (tileIdx = 0; tileIdx < nTiles; tileIdx++)
                        process_surface(surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx])
                }
            }
            if (temporal_signalling_present_flag == 1) {
                for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++)
                    process_surface(temporal_surfaces[planeIdx].tiles[tileIdx])
            }
        }
    } else {
        for (planeIdx = 0; planeIdx < nPlanes; planeIdx++) {
            for (levelIdx = 1; levelIdx <= 2; levelIdx++) {
                if (levelIdx == 1)
                    nTiles = nTilesL1
                else
                    nTiles = nTilesL2
                for (layerIdx = 0; layerIdx < nLayers; layerIdx++) {
                    if (surfaces[planeIdx][levelIdx][layerIdx].rle_only_flag) {
                        compressed_size_per_tile_prefix                                mb
                    } else {
                        compressed_prefix_last_symbol_bit_offset_per_tile_prefix       mb
                        compressed_size_per_tile_prefix                                mb
                    }
                    for (tileIdx = 0; tileIdx < nTiles; tileIdx++)
                        process_surface(surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx])
                }
            }
            if (temporal_signalling_present_flag == 1) {
                if (temporal_surfaces[planeIdx].rle_only_flag) {
                    compressed_size_per_tile_prefix                                    mb
                } else {
                    compressed_prefix_last_symbol_bit_offset_per_tile_prefix           mb
                    compressed_size_per_tile_prefix                                    mb
                }
                for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++)
                    process_surface(temporal_surfaces[planeIdx].tiles[tileIdx])
            }
        }
    }
}
Bitstream Semantics The section below provides further detail on the meaning of certain variables set out in the tables above. This detail may be referred to as the "semantics" of the bitstream. Example semantics associated with the syntax structures and with the syntax elements within these structures are described in this section. In certain cases, syntax elements may have a closed set of possible values and examples of these cases are presented in certain tables below.
Data Block Semantics The following describes the semantics for each of the data block units, e.g. the data that is carried by the NAL units. Certain variables discussed below relate to profiles, levels and toolsets. Profiles, levels and toolsets may be used to specify restrictions on the bitstreams and hence apply limits to the capabilities needed to decode the bitstreams. Profiles, levels and toolsets may also be used to indicate interoperability points between individual decoder implementations. It may be desired to avoid individually selectable "options" at the decoder, as this may increase interoperability difficulties.
A "profile" may specify a subset of algorithmic features and limits that are supported by all decoders conforming to that profile. In certain case, encoders may not be required to make use of any particular subset of features supported in a profile.
A "level" may specify a set of limits on the values that may be taken by the syntax elements (e.g. the elements described above). The same set of level definitions may be used with all profiles, but individual implementations may support a different level for each supported profile. For any given profile, a level may generally correspond to a particular decoder processing load and memory capability. Implementations of video decoders conforming to the examples described herein may be specified in terms of the ability to decode video streams conforming to the constraints of profiles and levels, e.g. the profiles and/or levels may indicate a certain specification for a video decoder, such as a certain set of features that are supported and/or used. As such, the capabilities of a particular implementation of a decoder may be specified using a profile, and a given level for that profile. The variable profile idc may be used to indicate a profile for the bitstream and the variable level idc may be used to indicate a level. The values for these variables may be restricted to a set of defined specifications. A reserved value ofprofi le idc between a set of specified values may not indicate intermediate capabilities between the specified profiles; however, a reserved value of level idc between a set of specified values may be used to indicated intermediate capabilities between the specified levels. The variable sublevel /dc may also be used to indicate a sublevel for a set of capabilities. These levels and sublevels are not to be confused with the levels and sublevels of the enhancement encoders and decoders, which are a different concept.
As an example, there may be a "main" profile. Conformance of a bitstream to this example "main" profile may be indicated by profile_idc equal to 0. Bitstreams conforming to this example "main" profile may have the constraint that active global configuration data blocks have chroma_sampling_type equal to 0 or 1 only. All constraints for global configuration parameter sets that are specified may be constraints for global configuration parameter sets that are activated when the bitstream is decoded. Decoders conforming to the present example "main" profile at a specific level (e.g. as identified by a specific value of level_idc) may be capable of decoding all bitstreams and sublayer representations for which all of the following conditions apply: the bitstream is indicated to conform to the "main" profile and the bitstream or sublayer representation is indicated to conform to a level that is lower than or equal to the specified level. Variations of this example "main" profile may also be defined and given differing values of profile_idc. For example, there may be a "main 4:4:4" profile. Conformance of a bitstream to the example "main 4:4:4" profile may be indicated by profile_idc equal to 1. Bitstreams conforming to the example "main 4:4:4" profile may have the constraint that active global configuration data blocks shall have chroma_sampling_type in the range of 0 to 3, inclusive. Again, decoders conforming to the example "main 4:4:4" profile at a specific level (e.g. as identified by a specific value of level_idc) may be capable of decoding all bitstreams and sublayer representations for which all of the following conditions apply: the bitstream is indicated to conform to the "main 4:4:4" profile and the bitstream or sublayer representation is indicated to conform to a level that is lower than or equal to the specified level. The variables extended_profile_idc and extended_level_idc may be respectively used to indicate that an extended profile and an extended level are used.
In certain implementations, the "levels" associated with a profile may be defined based on two parameters: a count of luma samples of the output picture in time (i.e. the Output Sample Rate) and a maximum input bit rate for the Coded Picture Buffer for the enhancement coding (CPBL). Both sample rate and bit rate may be considered over observation periods of one second (e.g. the maximum CPBL bit rate may be measured in terms of bits per second per thousand Output Samples). The table below indicates some example levels and sublevels.
Level  Sublevel  Maximum Output Sample Rate  Maximum CPBL bit rate  Example Resolution and Frame Rate
1      0         29,410,000                  4                      1280x720 (30fps)
1      1         29,410,000                  40                     1280x720 (30fps)
2      0         124,560,000                 4                      1920x1080 (60fps)
2      1         124,560,000                 40                     1920x1080 (60fps)
3      0         527,650,000                 4                      3840x2160 (60fps)
3      1         527,650,000                 40                     3840x2160 (60fps)
4      0         2,235,160,000               4                      7640x4320 (60fps)
4      1         2,235,160,000               40                     7640x4320 (60fps)
Returning to further variables of the NAL unit data block, if the variable conformance_window_flag is equal to 1 this may be used to indicate that conformance cropping window offset parameters are present in the sequence configuration data block.
If the variable conformance_window_flag is equal to 0 this may indicate that the conformance cropping window offset parameters are not present. The variables conf_win_left_offset, conf_win_right_offset, conf_win_top_offset and conf_win_bottom_offset specify the samples of the pictures in the coded video sequence that are output from the decoding process (i.e. the resulting output video), in terms of a rectangular region specified in picture coordinates for output. When conformance_window_flag is equal to 0, the values of conf_win_left_offset, conf_win_right_offset, conf_win_top_offset and conf_win_bottom_offset may be inferred to be equal to 0. The conformance cropping window may be defined to contain the luma samples with horizontal picture coordinates from (SubWidthC * conf_win_left_offset) to (width - (SubWidthC * conf_win_right_offset + 1)) and vertical picture coordinates from (SubHeightC * conf_win_top_offset) to (height - (SubHeightC * conf_win_bottom_offset + 1)), inclusive. The value of SubWidthC * (conf_win_left_offset + conf_win_right_offset) may be constrained to be less than width, and the value of SubHeightC * (conf_win_top_offset + conf_win_bottom_offset) may be constrained to be less than height.
The corresponding specified samples of the two chroma arrays (e.g. in a YUV example) may be similarly defined as the samples having picture coordinates (x / SubWidthC, y / SubHeightC), where (x, y) are the picture coordinates of the specified luma samples. Example values of SubWidthC and SubHeightC are indicated in the "Example Picture Formats" section above. Note that the conformance cropping window offset parameters may only be applied at the output; all internal decoding processes may be applied to the uncropped picture size.
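A minimal sketch of how the conformance cropping window described above might be applied to a decoded luma plane is given below. The function name and the row-of-samples representation are assumptions for illustration; the syntax element names and the SubWidthC / SubHeightC scaling follow the description above.

def crop_output_luma(plane, width, height,
                     conf_win_left_offset, conf_win_right_offset,
                     conf_win_top_offset, conf_win_bottom_offset,
                     sub_width_c, sub_height_c):
    """Illustrative: return the luma samples inside the conformance cropping window.

    `plane` is assumed to be a list of rows of the uncropped picture; cropping is
    only applied at output, internal decoding uses the uncropped size.
    """
    x0 = sub_width_c * conf_win_left_offset
    x1 = width - sub_width_c * conf_win_right_offset    # exclusive right edge
    y0 = sub_height_c * conf_win_top_offset
    y1 = height - sub_height_c * conf_win_bottom_offset  # exclusive bottom edge
    return [row[x0:x1] for row in plane[y0:y1]]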
Data Block Unit Global Configuration Semantics Certain global configuration variables indicated in the above syntax will now be briefly described. A number of examples of variables or parameters that may be used to carry information regarding the global configuration are given below. These should not be seen as limiting.
The variable processed_planes_type_flag may be used to specify the planes to be processed by the decoder. It may be equal to 0 or 1. For a YUV example, if it is equal to 0, only the Luma (Y) plane may be processed; if it is equal to 1, all planes (e.g. one luma and two chroma) may be processed. In this case, if the processed_planes_type_flag is equal to 0, nPlanes shall be equal to 1 and if processed_planes_type_flag is equal to 1, nPlanes shall be equal to 3. An illustration of the variable nPlanes is shown in Figure 9A.
The variable resolution_type may be used to specify the resolution of a Luma (Y) plane of the enhanced decoded picture. It may be defined as a value between 0 and 63, as specified in the table below. The value of the type is expressed as NxM, where N is the width of the Luma (Y) plane of the enhanced decoded picture and M is the height of the Luma (Y) plane of the enhanced decoded picture. For example, the following values (amongst others) may be available:
resolution_type  Value of type
0   unused /* Escape code prevention */
1   360x200
2   400x240
3   480x320
4   640x360
5   640x480
6   768x480
7   800x600
8   852x480
9   854x480
10  856x480
11  960x540
12  960x640
13  1024x576
14  1024x600
15  1024x768
16  1152x864
17  1280x720
18  1280x800
19  1280x1024
20  1360x768
21  1366x768
22  1400x1050
23  1440x900
24  1600x1200
25  1680x1050
26  1920x1080
27  1920x1200
28  2048x1080
29  2048x1152
30  2048x1536
31  2160x1440
32  2560x1440
33  2560x1600
34  2560x2048
35  3200x1800
36  3200x2048
37  3200x2400
38  3440x1440
39  3840x1600
40  3840x2160
41  3840x2400
42  4096x2160
43  4096x3072
44  5120x2880
45  5120x3200
46  5120x4096
47  6400x4096
48  6400x4800
49  7680x4320
50  7680x4800
51-62  Reserved
63  Custom
The variable chroma_sampling_type defines the colour format for the enhanced decoded picture as set out in the table in the "Example Picture Formats" section.
The variable transform type may be used to define the type of transform to be used.
For example, the following values (amongst others) may be available:
transform_type  Value of type
0  2x2 directional decomposition transform
1  4x4 directional decomposition transform
In the example above, if transform_type is equal to 0, nLayers (e.g. as shown in Figure 9A) may be equal to 4 and if transform_type is equal to 1, nLayers may be equal to 16.
The variable base depth type may be used to define the bit depth of the decoded base picture. For example, the following values (amongst others) may be available: base_depth_type Value of type 0 8 1 10 2 12 3 14 Similarly, the variable enhancement depth type may be used to define the bit depth of the enhanced decoded picture. For example, the following values (amongst others) may be available: enhancement_depth_type Value of type 0 8 1 10 2 12 3 14 The variable temporal step width modifier signalledjlag may be used to specify if the value of the temporal step width modifier parameter is signalled. It may be equal to 0 or 1. If equal to 0, the temporal step width modifier parameter may not be signalled.
The variable predicted_residual_mode_flag may be used to specify whether the decoder should activate the predicted residual process during the decoding process. If the value is 0, the predicted residual process shall be disabled.
The variable temporal_tile_intra_signalling_enabled_flag may be used to specify whether temporal tile prediction should be used when decoding a tile (e.g. a 32x32 tile). If the value is 1, the temporal tile prediction process shall be enabled.
The variable upsample_type may be used to specify the type of up-sampler to be used in the decoding process. For example, the following values may be available:
upsample_type  Value of type
0  Nearest
1  Linear
2  Cubic
3  Modified Cubic
4-6  Reserved
7  Custom
The variable level_1_filtering_signalled_flag may be used to specify whether a deblocking filter should use a set of signalled parameters, e.g. instead of default parameters. If the value is equal to 1, the values of the deblocking coefficients may be signalled.
The variable temporal step width modifier may be used to specify a value to be used to calculate a variable step width modifier for transforms that use temporal prediction. If temporal step width modifier signalled.flag is equal to 0, this variable may be set to a predefined value (e.g. 48).
The variable level_1_filtering_first_coefficient may be used to specify the value of the first coefficient in the deblocking mask (e.g. α or the 4x4 block corner residual weight in the example from the earlier sections above). The value of the first coefficient may be between 0 and 15.
The variable level_1_filtering_second_coefficient may be used to specify the value of the second coefficient in the deblocking mask (e.g. β or the 4x4 block side residual weight in the example from the earlier sections above). The value of the second coefficient may be between 0 and 15.
The variable scaling_mode_level1 may be provided to specify whether and how the up-sampling process should be performed between the decoded base picture and the preliminary intermediate picture (e.g. up-scaler 2608 in Figure 26). The scaling mode parameter for level 1 (e.g. to convert from level 0 to level 1) may have a number of possible values including:
scaling_mode_level1  Value of type
0  no scaling
1  one-dimensional 2:1 scaling only across the horizontal dimension
2  two-dimensional 2:1 scaling across both dimensions
3  Reserved
A similar variable scaling_mode_level2 may be used to specify whether and how the up-sampling process is to be performed between the combined intermediate picture and the preliminary output picture (e.g. as per up-scaler 2687 in Figure 26). The combined intermediate picture corresponds to the output of process 8.9.1. The scaling mode parameter for level 2 (e.g. to convert from level 1 to level 2) may have a number of possible values including:
scaling_mode_level2  Value of type
0  no scaling
1  one-dimensional 2:1 scaling only across the horizontal dimension
2  two-dimensional 2:1 scaling across both dimensions
3  Reserved
As described in the section titled "User Data Signalling" above, the variable user_data_enabled may be used to specify whether user data are included in the bitstream and the size of the user data. For example, this variable may have the following values:
user_data_enabled  Value of type
0  disabled
1  enabled 2-bits
2  enabled 6-bits
3  reserved
Variables may also be defined to indicate the bit depth of one or more of the base layer and the two enhancement sub-layers. For example, the variable level1_depth_flag may be used to specify whether the encoding and/or decoding components at level 1 process data using the base_depth_type or the enhancement_depth_type (i.e. according to a base bit depth or a bit depth defined for one or more enhancement levels). In certain cases, the base and enhancement layers may use different bit depths. It may also be possible for level 1 and level 2 processing to be performed at different bit depths (e.g. level 1 may use a lower bit depth than level 2 as level 1 may accommodate a lower level of bit quantization, or level 2 may use a lower bit depth to reduce a number of bytes used to encode the level 2 residuals). In a case where a variable such as level1_depth_flag is provided, then a value of 0 may indicate that the level 1 sub-layer is to be processed using the base_depth_type. If a value of 1 is used, this may indicate that the level 1 sub-layer shall be processed using the enhancement_depth_type.
A variable tile_dimensions_type may be specified to indicate the resolution of the picture tiles. Example values for this variable are shown in the table below. The value of the type may be mapped to an NxM resolution, where N is the width of the picture tile and M is the height of the picture tile.
tile_dimensions_type Value of type 0 no tiling 1 512x256 2 1024x512 3 Custom As indicated by type "3" above, in certain cases a custom tile size may be defined. If a custom tile size is indicated (e.g. via a value of 3 in the table above), the variables custom tile width and custom tile height may be used to specify a custom width and height for the tile.
One or more variables may be defined to indicate a compression method for data associated with a picture tile. The compression method may be applied to signalling for the tile. For example, the compression_type_entropy_enabled_per_tile_flag may be used to specify the compression method used to encode the entropy_enabled_flag field of each picture tile. It may take values as shown in the table below.
compression_type_entropy_enabled_per_tile_flag  Value of type
0  No compression used
1  Run length encoding
Similarly, a variable compression_type_size_per_tile may be defined to indicate a compression method used to encode the size field of each picture tile. In this case, the compression_type_size_per_tile may take the values indicated in the table below (where the terms Huffman Coding and Prefix Coding are used interchangeably).
compression_type_size_per_tile Value of type 0 No compression used I Prefix Coding encoding 2 Prefix Coding encoding on differences 3 Reserved Lastly, the variables custom resolution width and custom resolution height may be used to respectively specify the width and height of a custom resolution.
Data Block Unit Picture Configuration Semantics A number of examples of variables or parameters that may be used to carry information regarding a picture configuration will now be described. These should not be seen as limiting.
In certain examples, a variable may be defined to indicate that certain layers are not to feature enhancement. This may indicate that the enhancement layer is effectively turned off or disabled for certain pictures. For example, if there is network congestion it may be desirable to turn off the enhancement layer for a number of frames and so not receive and add any enhancement data (e.g. not add one or more of the first set and the second set of the decoded residuals). In certain examples, a no_enhancement_bit_flag variable may be specified to indicate that there are no enhancement data for all layerIdx < nLayers in the picture (e.g. as shown with respect to Figure 9A). A no_enhancement_bit_flag value of 0 may indicate that enhancement is being used and that there is enhancement data.
As described in other examples herein, a quantization matrix may be used to instruct quantization and/or dequantization. For dequantization at the decoder, signalling may be provided that indicates a quantization matrix mode, e.g. a particular mode of operation for generating and using one or more quantization matrices. For example, a variable such as quant_matrix_mode may be used to specify how a quantization matrix is to be used in the decoding process in accordance with the table below. In certain cases, when quant_matrix_mode is not present, i.e. when a mode is not explicitly signalled, the mode may be assumed to take a default value, e.g. be inferred to be equal to 0 as indicated below. By allowing the quantization matrix mode value to be absent, signalling bandwidth for each picture may be saved (e.g. the quantization components of the decoder may use a default setting). Use of modes such as indicated in the examples below may allow for efficient implementation of quantization control, whereby quantization parameters may be varied dynamically in certain cases (e.g. when encoding has to adapt to changing conditions) and retrieved based on default values in other cases. The examples in the table below are not intended to be limiting, and other modes may be provided for as indicated with respect to other examples described herein.
quant_matrix_mode  Value of type
0  each enhancement sub-layer uses the matrices used for the previous frame, unless the current picture is an instantaneous decoding refresh (IDR) picture, in which case both enhancement sub-layers use default matrices
1  both enhancement sub-layers use default matrices
2  one matrix of modifiers is signalled and should be used on both residual planes
3  one matrix of modifiers is signalled and should be used on the enhancement sub-layer 2 residual plane
4  one matrix of modifiers is signalled and should be used on the enhancement sub-layer 1 residual plane
5  two matrices of modifiers are signalled - the first one for the enhancement sub-layer 2 residual plane, the second for the enhancement sub-layer 1 residual plane
6-7  Reserved
As described above, in certain examples a quantization offset may be used. For dequantization at the decoder, a quantization offset (also referred to as a dequantization offset for symmetrical quantization and dequantization) may be signalled by the encoder or another control device, or may be retrieved from local decoder memory. For example, a variable dequant_offset_signalled_flag may be used to specify if the offset method and the value of the offset parameter to be applied when dequantizing are signalled. In this case, if the value is equal to 1, the method for the dequantization offset and/or the value of the dequantization offset parameter may be signalled. When dequant_offset_signalled_flag is not present, it may be inferred to be equal to 0. Again, having an inferred value for its absence may help reduce a number of bits that need to be sent to encode a particular picture or frame.
Following from the above, the variable dequant_offset_mode_flag may be used to specify the method for applying the dequantization offset. For example, different modes may be used to indicate different methods of applying the offset. One mode, which may be a default mode, may involve using a signalled dequant_offset variable that specifies the value of the dequantization offset parameter to be applied. This may vary dynamically. In one case, if the dequant_offset_mode_flag is equal to 0, the aforementioned default mode is applied; if the value of dequant_offset_mode_flag is equal to 1, a constant-offset method applies, which may also use the signalled dequant_offset parameter. The value of the dequantization offset parameter dequant_offset may be, in certain implementations, between 0 and 127, inclusive.
Further quantization variables may also be used. In one case, a set of variables may be used to signal one or more quantization step-widths to use for a picture or frame within the enhancement layer. The step-width values may be used to apply quantization and/or dequantization as explained with respect to the quantization and/or dequantization components of the above examples. For example, step_width_level1 may be used to specify the value of the step-width to be used when decoding the encoded residuals in enhancement sub-layer 1 (i.e. level 1) and step_width_level2 may be used to specify the value of the step-width to be used when decoding the encoded residuals in enhancement sub-layer 2 (i.e. level 2).
In certain examples, a step-width may be defined for one or more of the enhancement sub-layers (i.e. levels 1 and 2). In certain cases, a step-width may be signalled for certain sub-layers but not others. For example, a step_width_level1_enabled_flag variable may be used to specify whether the value of the step-width to be used when decoding the encoded residuals in the enhancement sub-layer 1 (i.e. level 1 as described herein) is a default value or is signalled (e.g. from the encoder). It may be either 0 (default value) or 1 (to indicate that the value is signalled by step_width_level1). An example default value may be 32,767. When step_width_level1_enabled_flag is not present, it is inferred to be equal to 0.
In certain examples, a set of arrays may be defined to specify a set of quantization scaling parameters. The quantization scaling parameters may indicate how to scale each coefficient within a coding unit or block (e.g. for a 2x2 transform, how to scale each of the four layers representing A, H, V and D components). In one example, an array qm_coefficient_0[layerIdx] may be defined to specify the values of the quantization matrix scaling parameter when quant_matrix_mode is equal to 2, 3 or 5 in the table above and an array qm_coefficient_1[layerIdx] may be used to specify the values of the quantization matrix scaling parameter when quant_matrix_mode is equal to 4 or 5 in the table above. The index layerIdx represents a particular layer (e.g. as shown in Figure 9A), which in turn relates to a particular set of coefficients (e.g. one layer may comprise A coefficients etc.).
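A sketch of how a decoder might select the signalled scaling parameters according to quant_matrix_mode is given below. This illustrates the mode table above rather than the normative dequantization process; the function name is an assumption, and the fall-back to default matrices for a sub-layer whose matrix is not signalled (modes 3 and 4) is also an assumption, as the excerpt above does not state it.

def select_qm_coefficients(quant_matrix_mode, qm_coefficient_0, qm_coefficient_1,
                           previous_matrices, default_matrices, is_idr):
    """Illustrative selection of per-layer scaling parameters for the two
    enhancement sub-layers, following the quant_matrix_mode table above.

    `previous_matrices` and `default_matrices` are assumed to be
    (sub_layer_2, sub_layer_1) pairs of per-layer parameter lists, and the
    return value uses the same ordering.
    """
    if quant_matrix_mode == 0:
        # Reuse the previous frame's matrices, unless this is an IDR picture.
        return default_matrices if is_idr else previous_matrices
    if quant_matrix_mode == 1:
        return default_matrices
    if quant_matrix_mode == 2:
        # One signalled matrix used on both residual planes.
        return (qm_coefficient_0, qm_coefficient_0)
    if quant_matrix_mode == 3:
        # One signalled matrix for the sub-layer 2 plane; the other sub-layer
        # falls back to defaults here (an assumption, not from the text).
        return (qm_coefficient_0, default_matrices[1])
    if quant_matrix_mode == 4:
        # One signalled matrix (qm_coefficient_1) for the sub-layer 1 plane.
        return (default_matrices[0], qm_coefficient_1)
    if quant_matrix_mode == 5:
        # Two signalled matrices: first for sub-layer 2, second for sub-layer 1.
        return (qm_coefficient_0, qm_coefficient_1)
    raise ValueError("reserved quant_matrix_mode value")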
In examples, a picture Ope bit_flag variable may be used to specify whether the encoded data are sent on a frame basis (e.g., progressive mode or interlaced mode) or on a field basis (e.g., interlaced mode). An example of possible values is shown in the table below.
pieture_type_bit_flag Value of type 0 Frame
1 Field
If a field picture type is specified (e.g. via a value of 1 from the table above), a further variable may be provided to indicate a particular field. For example, a variable field ope bit flag may be used to specify, if the picture type bit_flag is equal to 1, whether the data sent are for top or bottom field. Example values for the field type bit_flag are shown below.
field_type_bit_flag Value of type 0 Top
1 Bottom As discussed in the 'Temporal Prediction and Signalling" section set out above, a number of variables may be defined to signal temporal prediction configurations and settings to the decoder. Certain variables may be defined at a picture or frame level (e g to apply to a particular picture or frame). Some examples are further discussed in this section.
In one case, a temporal refresh bit_flag variable may be signalled to specify whether the temporal buffer should be refreshed for the picture. If equal to 1, this may instruct the refreshing of the temporal buffer (e.g. the setting of values within the buffer to zero as described above).
In one case, a temporal_signalling_present_flag variable may be signalled to specify whether the temporal signalling coefficient group is present in the bitstream. If the temporal_signalling_present_flag is not present, it may be inferred to be equal to 1 if temporal_enabled_flag is equal to 1 and the temporal_refresh_bit_flag is equal to 0; otherwise it may be inferred to be equal to 0. Lastly, a set of variables may be used to indicate and control filtering within the enhancement layer, e.g. as described with respect to the examples of the Figures. In one case, the filtering that is applied at level 1 (e.g. by filtering component 232, 532, 2426 or 2632 in Figures 2, 5A to 5C, 24 or 26) may be selectively controlled using signalling from the encoder. In one case, signalling may be provided to turn the filtering on and off. For example, a level1_filtering_enabled_flag may be used to specify whether the level 1 deblocking filter should be used. A value of 0 may indicate that filtering is disabled and a value of 1 may indicate that filtering is enabled. When level1_filtering_enabled_flag is not present, it may be inferred to be equal to 0 (i.e. that filtering is disabled as a default if the flag is not present). Although an example is presented with respect to the filtering of residuals that are decoded in the level 1 enhancement sub-layer, in other examples (e.g. in addition or instead), filtering may also be selectively applied to residuals that are decoded in the level 2 enhancement sub-layer. Filtering may be turned off and on, and/or configured according to defined variables, in one or more of the levels using signalling similar to the examples described here.
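The inference rules described in this section can be summarised in a short sketch. The flag names are the syntax elements above; the function and the dict-based representation of parsed elements are illustrative only.

def infer_picture_flags(parsed):
    """Illustrative defaults for flags that may be absent from the bitstream,
    following the rules described above. `parsed` is a dict of the syntax
    elements that were actually read."""
    flags = dict(parsed)
    # temporal_signalling_present_flag: inferred to be 1 only when temporal
    # processing is enabled and the temporal buffer is not being refreshed.
    if "temporal_signalling_present_flag" not in flags:
        if flags.get("temporal_enabled_flag") == 1 and flags.get("temporal_refresh_bit_flag") == 0:
            flags["temporal_signalling_present_flag"] = 1
        else:
            flags["temporal_signalling_present_flag"] = 0
    # level1_filtering_enabled_flag: disabled by default when not present.
    flags.setdefault("level1_filtering_enabled_flag", 0)
    return flags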
As described in examples above, in certain examples, dithering may be applied to the output decoded picture. This may involve the application of random values generated by a random number generator to reduce visual artefacts that result from quantization. Dithering may be controlled using signalling information.
In one example, a dithering_control_flag may be used to specify whether dithering should be applied. It may be applied in a similar way to the residual filtering control flags. For example, a value of 0 may indicate that dithering is disabled and a value of 1 may indicate that dithering is enabled. When dithering_control_flag is not present, it may be inferred to be equal to 0 (e.g. disabled as per the level filtering above). One or more variables may also be defined to specify a range of values the additional random numbers are to have. For example, a variable dithering_strength may be defined to specify a scaling factor for the random numbers. It may be used to set a range between [-dithering_strength, +dithering_strength]. In certain examples, it may have a value between 0 and 31.
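As a simple illustration of the dithering control described above, the sketch below adds uniform random values in the signalled range to the reconstructed samples. The function name, the choice of random number generator and the clipping to the sample range are assumptions, not taken from the specification.

import random

def apply_dither(samples, dithering_strength, max_value=255):
    """Illustrative uniform dithering: add a random value in
    [-dithering_strength, +dithering_strength] to each reconstructed sample."""
    out = []
    for s in samples:
        d = random.randint(-dithering_strength, dithering_strength)
        out.append(min(max(s + d, 0), max_value))  # clipping is an assumption
    return out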
In certain examples, different types of dithering may be defined and applied. In this case, the dithering type and/or parameters for each dithering type may be signalled from the encoder. For example, a variable dithering type may be used to specify what type of dithering is applied to the final reconstructed picture. Example values of the variable dithering type are set out in the table below.
dithering_type  Value of type
0  None
1  Uniform
2-3  Reserved
Data Block Unit Encoded Data Semantics The following section sets out some examples of how the encoded data may be configured. In certain examples, a portion of encoded data, e.g. that relates to a given coefficient, is referred to as a chunk (e.g. with respect to Figure 9A). The data structures 920 in Figure 9A or 2130 indicated in Figure 21 may be referred to as "surfaces". Surfaces may be stored as a multi-dimensional array. A first dimension in the multi-dimensional array may indicate different planes and use a plane index - planeIdx; a second dimension in the multi-dimensional array may indicate different levels, i.e. relating to the enhancement sub-layers, and use a level index - levelIdx; and a third dimension in the multi-dimensional array may indicate different layers, i.e. relating to different coefficients (e.g. different locations within a block of coefficients, which may be referred to as A, H, V and D coefficients for a 2x2 transform), and use a layer index - layerIdx. This is illustrated in Figure 9A, where there are nPlanes, nLevels, and nLayers (where these indicate how many elements are in each dimension).
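The multi-dimensional "surfaces" arrangement described above can be sketched as a nested structure indexed by planeIdx, levelIdx and layerIdx. The helper below is illustrative only; the per-surface fields shown are taken from the flags described in this section, and the mapping from transform type to nLayers follows the transform_type semantics earlier.

def build_surfaces(n_planes, n_levels=2, transform_type=0):
    """Illustrative container for the surfaces array: one entry per
    plane / enhancement sub-layer / coefficient layer, each holding the
    entropy-encoded chunk and its control flags."""
    n_layers = 4 if transform_type == 0 else 16  # 2x2 vs 4x4 transform
    return [[[{"entropy_enabled_flag": 0, "rle_only_flag": 0, "data": b""}
              for _ in range(n_layers)]
             for _ in range(n_levels)]
            for _ in range(n_planes)]

# surfaces[planeIdx][levelIdx][layerIdx] then addresses one coefficient chunk.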
As described with respect to the examples of Figures 25 and 26, in certain cases additional custom layers may be added. For example, temporal signalling may be encoded as a non-coefficient layer in addition to the coefficient layers. Other user signalling may also be added as custom layers. In other cases, separate "surface" arrays may be provided for these uses, e.g. in addition to a main "surfaces" array structured as indicated in Figure 9A.
In certain cases, the "surfaces" array may have a further dimension that indicates a grouping such as the tiles shown in Figure 21A. The arrangement of the "surfaces" array is also flexible, e.g. tiles may be arranged below layers as shown in Figure 21B.
Returning to the examples of the above syntax section, a number of control flags that relate to the surfaces may be defined. One control flag may be used to indicate whether there is encoded data within the surfaces array. For example, a surfaces[planeIdx][levelIdx][layerIdx].entropy_enabled_flag may be used to indicate whether there are encoded data in surfaces[planeIdx][levelIdx][layerIdx]. Similarly, a control flag may be used to indicate how a particular surface is encoded. For example, a surfaces[planeIdx][levelIdx][layerIdx].rle_only_flag may indicate whether the data in surfaces[planeIdx][levelIdx][layerIdx] are encoded using only run length encoding or using run length encoding and Prefix (i.e. Huffman) Coding.
If temporal data is configured as an additional set of surfaces, a temporal surfaces array may be provided with a dimensionality that reflects whether temporal processing is performed on one or two enhancement levels. With regard to the example shown in Figure 3A and Figures 24 to 26, a one-dimensional temporal surfaces [platteldx1 array may be provided, where each plane has a different temporal surface (e.g. providing signalling for level 2 temporal processing, where all coefficients use the same signalling). In other examples, with more selective temporal processing the temporal surfaces array may be extended into further dimensions to reflect one or more of different levels, different layers (i.e. coefficient groups) and different tiles.
With regard to the temporal surface signalling of the above syntax examples, similar flags to the other surfaces may be provided. For example, a temporal_surfaces[planeIdx].entropy_enabled_flag may be used to indicate whether there are encoded data in temporal_surfaces[planeIdx] and a temporal_surfaces[planeIdx].rle_only_flag may be used to indicate whether the data in temporal_surfaces[planeIdx] are encoded using only run length encoding or using run length encoding and Prefix (i.e. Huffman) Coding.
Data Block Unit Encoded Tiled Data Semantics Similar variables to those set out above for the surfaces may be used for encoded data that uses tiles. In one case, the encoded tiled data block unit, e.g. tiled data, may have a similar surfaces[planeIdx][levelIdx][layerIdx].rle_only_flag. However, it may have an additional dimension (or set of variables) reflecting the partition into tiles. This may be indicated using the data structure surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx]. As set out in the examples above, the tiled data may also have a surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].entropy_enabled_flag that indicates, for each tile, whether there are encoded data in the respective tiles (e.g. in surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx]).
The tiled data structures may also have associated temporal processing signalling that is similar to that described for the surfaces above. For example, temporal_surfaces[planeIdx].rle_only_flag may again be used to indicate whether the data in temporal_surfaces[planeIdx] are encoded using only run length encoding or using run length encoding and Prefix (i.e. Huffman) Coding. Each tile may have a temporal_surfaces[planeIdx].tiles[tileIdx].entropy_enabled_flag that indicates whether there are encoded data in temporal_surfaces[planeIdx].tiles[tileIdx]. Tiled data may have some additional data that relates to the use of tiles. For example, the variable entropy_enabled_per_tile_compressed_data_rle may contain the RLE-encoded signalling for each picture tile. A variable compressed_size_per_tile_prefix may also be used to specify the compressed size of the encoded data for each picture tile. The variable compressed_prefix_last_symbol_bit_offset_per_tile_prefix may be used to specify the last symbol bit offset of Prefix (i.e. Huffman) Coding encoded data. Decoding examples that use this signalling are set out later below.
Data Block Unit Surface Semantics The higher level "surfaces" array described above may additionally have some associated data structures. For example, the variable surface.size may specify the size of the entropy encoded data and surface.data may contain the entropy encoded data itself. The variable surface.prefix_last_symbol_bit_offset may be used to specify the last symbol bit offset of the Prefix (i.e. Huffman) Coding encoded data.
Data Block Unit Additional Info Semantics The additional information data structures may be used to communicate additional information, e.g. that may be used alongside the encoded video. Additional information may be defined according to one or more additional information types. These may be indicated via an additional_info_type variable. As an example, additional information may be provided in the form of Supplementary Enhancement Information (SEI) messages or Video Usability Information (VUI) messages. Further examples of these forms of additional information are provided with respect to later examples. When SEI messages are used, a payload_type variable may specify the payload type of an SEI message.
Data Block Unit Filler Semantics In certain cases, it may be required to fill NAL units with filler. For example, this may be required to maintain a defined constant bit rate when the enhancement layer contains a large number of 0 values (i.e. when the size of the enhancement layer is small, which may be possible depending on the pictures being encoded). A filler unit may be constructed using a constant filler byte value for the payload. The filler byte may be a byte equal to OxAA.
It should be noted that the example syntax and semantics that are set out above are provided for example only. They may allow a suitable implementation to be constructed. However, it should be noted that variable names and data formats may be varied from those described while maintaining similar functionality. Further, not all features are required and certain features may be omitted or varied depending on the implementation requirements.
Detailed Example Implementation of the Decoding Process A detailed example of one implementation of the decoding process is set out below.
The detailed example is described with reference to the method 2700 of Figure 27. The description below makes reference to some of the variables defined in the syntax and semantics section above. The detailed example may be taken as one possible implementation of the schematic decoder arrangements shown in Figures 2, SA to SC, 24, and 26. In particular, the example below concentrates on the decoding aspects for the enhancement layer, which may be seen as an implementation of an enhancement codec.
The enhancement codec encodes and decodes streams of residual data. This differs from comparative SVC and SHVC implementations where encoders receive video data as input at each spatial resolution level and decoders output video data at each spatial resolution level. As such, the comparative SVC and SHVC may be seen as the parallel implementation of a set of codecs, where each codec has a video-in / video-out coding structure. The enhancement codecs described herein on the other hand receive residual data and also output residual data at each spatial resolution level. For example, in SVC and SHVC the outputs of each spatial resolution level are not summed to generate an output video -this would not make sense.
As set out in the "Syntax" section above, a syntax maybe defined to process a received bitstream. The "Syntax" section sets out example methods such as retrieving an indicator from a header accompanying data, where the indicator may be retrieved from a predetermined location of the header and may indicate one or more actions according to the syntax of the following sections. As an example, the indicator may indicate whether to perform the step of adding residuals and/or predicting residuals. The indicator may indicate whether the decoder should perform certain operations, or be configured to perform certain operations, in order to decode the bitstream. The indicator may indicate if such steps have been performed at the encoder stage.
General Overview Turning to the method 2700 of Figure 27A, the input to the presently described decoding process is an enhancement bitstream 2702 (also called a low complexity enhancement video coding bitstream) that contains an enhancement layer consisting of up to two sub-layers. The outputs of the decoding process are: 1) an enhancement residuals planes (sub-layer 1 residual planes) to be added to a set of preliminary pictures that are obtained from the base decoder reconstructed pictures; and 2) an enhancement residuals planes (sub-layer 2 residual planes) to be added to the preliminary output pictures resulting from upscaling, and modifying via predicted residuals, the combination of the preliminary pictures 1 and the sub-layer 1 residual planes.
As described above, and with reference to Figures 9A, 21A and 21B, data may be arranged in chunks or surfaces. Each chunk or surface may be decoded according to an example process substantially similar to described below and shown in the Figures. As such the decoding process operates on data blocks as described in the sections above.
An overview of the blocks of method 2700 will now be set out. Each block is described in more detail in the subsequent sub-sections.
In block 2704 of the method 2700, a set of payload data block units are decoded. This allows portions of the bitstream following the NAL unit headers to be identified and extracted (i.e. the payload data block units).
In block 2706 of the method 2700, a decoding process for the picture receives the payload data block units and starts decoding of a picture using the syntax elements set out above. Pictures may be decoded sequentially to output a video sequence following decoding. Block 2706 extracts a set of (data) surfaces and a set of temporal surfaces as described above. In certain cases, entropy decoding may be applied at this block.
In block 2710 of the method 2700, a decoding process for base encoding data extraction is applied to obtain a set of reconstructed decoded base samples (recDecodedBaseSamples). This may comprise applying the base decoder of previous examples. If the base codec or decoder is implemented separately, then the enhancement codec may instruct the base decoding of a particular frame (including sub-portions of a frame and/or particular planes for a frame). The set of reconstructed decoded base samples (e.g. 2302 in Figure 23) are then passed to block 2712 where an optional first set of upscaling may be applied to generate a preliminary intermediate picture (e.g. 2304 in Figure 23). For example, block 2712 may correspond to up-scaler 2608 of Figure 26. The output of block 2712 is a set of reconstructed level 1 base samples (where level 0 may comprise to the base level resolution).
At block 2714, a decoding process for the enhancement sub-layer 1 (i.e. level 1) encoded data is performed. This may receive variables that indicate a transform size (nTbs), a user data enabled flag (userDataEnabled) and a step-width (i.e. for dequantization), as well as blocks of level 1 entropy-decoded quantized transform coefficients (TransformCoeffQ) and the reconstructed level 1 base samples (recL1BaseSamples). A plane index (IdxPlanes) may also be passed to indicate which plane is being decoded (in monochrome decoding there may be no index). The variables and data may be extracted from the payload data units of the bitstream using the above syntax.
Block 2714 is shown as comprising a number of sub-blocks that correspond to the inverse quantization, inverse transform and level 1 filtering (e.g. deblocking) components of previous examples. At a first sub-block 2716, a decoding process for the dequantization is performed. This may receive a number of control variables from the above syntax that are described in more detail below. A set of dequantized coefficient coding units or blocks may be output. At a second sub-block 2718, a decoding process for the transform is performed. A set of reconstructed residuals (e.g. a first set of level 1 residuals) may be output. At a third sub-block 2720, a decoding process for a level 1 filter may be applied.
The output of this process may be a first set of reconstructed and filtered (i.e. decoded) residuals (e.g. 2308 in Figure 23). In certain cases, the residual data may be arranged in NxM blocks so as to apply an NxM filter at sub-block 2720.
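The three sub-blocks of block 2714 can be summarised as a per-block pipeline. The sketch below is purely structural: the dequantize, inverse-transform and deblocking-filter functions stand in for the processes described in this and earlier sections and are passed in as callables rather than being defined here.

def decode_sub_layer_1_block(transform_coeff_q_block, step_width,
                             dequantize, inverse_transform, deblock_filter):
    """Illustrative level 1 (enhancement sub-layer 1) block decode:
    dequantize the entropy-decoded coefficients, inverse transform them into
    a block of residuals, then apply the level 1 (deblocking) filter."""
    dequantized = dequantize(transform_coeff_q_block, step_width)   # sub-block 2716
    residuals = inverse_transform(dequantized)                      # sub-block 2718
    return deblock_filter(residuals)                                # sub-block 2720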
At block 2730, the reconstructed level 1 base samples and the filtered residuals that are output from block 2714 are combined. This is referred to in the Figure as residual reconstruction for a level 1 block. At output of this block is a set of reconstructed level 1 samples (e.g. 2310 in Figure 23). These may be viewed as a video stream (if multiple planes arc combined for colour signals).
At block 2732, a second up-scaling process is applied. This up-scaling process takes a combined intermediate picture (e.g. 2310 in Figure 23) that is output from block 2730 and generates a preliminary output picture (e.g. 2312 in Figure 23). It may comprise an application of the up-scaler 2687 in Figure 26 or any of the previously described up-sampling components.
In Figure 27, block 2732 comprises a number of sub-blocks. At block 2734, switching is implemented depending on a signalled up-sampler type. Sub-blocks 2736, 2738, 2740 and 2742 represent respective implementations of a nearest sample up-sampling process, a bilinear up-sampling process, a cubic up-sampling process and a modified cubic up-sampling process. Sub-blocks may be extended to accommodate new up-sampling approaches as required (e.g. such as the neural network up-sampling described herein). The output from sub-blocks 2736, 2738, 2740 and 2742 is provided in a common format, e.g. a set of reconstructed up-sampled samples (e.g. 2312 in Figure 23), and is passed, together with a set of lower resolution reconstructed samples (e.g. as output from block 2730) to a predicted residuals process 2744. This may implement the modified up-sampling described herein to apply predicted average portions. The output of block 2744 and of block 2732 is a set of reconstructed level 2 modified up-sampled samples (recL2ModifiedUpsampledSamples).
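The switching at sub-block 2734 on the signalled up-sampler type can be illustrated as follows. The individual up-samplers and the predicted residuals process are passed in as callables; the mapping of values to kernels follows the upsample_type semantics above, and the function names are assumptions.

def upscale_level_2(rec_l1_samples, upsample_type, upsamplers, apply_predicted_residuals):
    """Illustrative switch over the signalled up-sampler type (block 2732).

    `upsamplers` maps upsample_type values to implementations, e.g.
    {0: nearest, 1: bilinear, 2: cubic, 3: modified_cubic}; the result is then
    passed through the predicted residuals process (block 2744)."""
    upsampled = upsamplers[upsample_type](rec_l1_samples)
    # Predicted (average) residuals modify the up-sampled output using the
    # lower resolution reconstruction.
    return apply_predicted_residuals(upsampled, rec_l1_samples)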
Block 2746 shows a decoding process for the enhancement sub-layer 2 (i.e. level 2) encoded data. In a similar manner to block 2714, it receives variables that indicate a step-width (i.e. for dequantization), as well as blocks of level 2 entropy-decoded quantized transform coefficients (TransfonnCoeffQ) and the set of reconstructed level 2 modified up-sampled samples (recL2ModifiedUpsampledSamples). A plane index (IdxPlanes) is also passed to indicate which plane is being decoded fin monochrome decoding there may be no index). The variables and data may again be extracted from the payload data units of the bitstream using the above syntax.
Block 2746 comprises a number of temporal prediction sub-blocks. In the present example, temporal prediction is applied for enhancement sub-layer 2 (i.e. level 2). Block 2746 may thus receive further variables as indicated above that relate to temporal processing including the variables temporal enabled, temporal refresh hit, temporal signalling_present, and temporal step width modifier as well as the data structures TransfbrmTempSig and Tile TempSig that provide the temporal signalling data.
Two temporal processing sub-blocks are shown: a first sub-block 2748 where a decoding process for temporal prediction is applied using the IranstbrinTempSig and Tile TempSig data structures and a second sub-block 2750 that applies a tiled temporal refresh (e.g. as explained with reference to the examples of Figures 11A to 13B). Sub-block 2750 is configured to set the contents of a temporal buffer to zero depending on the refresh signalling.
At sub-blocks 2752 and 2756, decoding processes for the dequantization and transform are applied to the level 2 data in a similar manner to sub-blocks 2718 and 2720 (the latter being applied to the level 1 data). A second set of reconstructed residuals that are output from the inverse transform processing at sub-block 2756 are then added at sub-block 2756 to a set of temporally predicted level 2 residuals that are output from sub-block 2748; this implements part of the temporal prediction. The output of block 2746 is a set of reconstructed level 2 residuals (resL2Residuals).
At block 2758, the reconstructed level 2 residuals (resL2Residuals) and the reconstructed level 2 modified up-sampled samples (recL2ModifiedUpsampledSamples) are combined in a residual reconstruction process for the enhancement sub-layer 2. The output of this block is a set of reconstructed picture samples at level 2 (recL2PictureSamples). At block 2760, these reconstructed picture samples at level 2 may be subject to a dithering process that applies a dither filter. The output to this process is a set of reconstructed dithered picture samples at level 2 (recL2DitheredPictureSamples). These may be viewed at block 2762 as an output video sequence (e.g. for multiple consecutive pictures making up the frames of a video, where planes may be combined into a multi-dimensional array for viewing on display devices).
Payload Data Block Unit Process The operations performed at block 2704 will now be described in more detail. The input to this process is the enhancement layer bitstream. The enhancement layer bitstream is encapsulated in NAL units, e.g. as indicated above. A NAL unit may be used to synchronize the enhancement layer information with the base layer decoded information.
The bitstream is organized in NAL units, with each NAL unit including one or more data blocks. For each data block, the process block() syntax structure (as shown in the "Syntax" section above) is used to parse a block header (in certain cases, only the block header). It may invoke a relevant process block ( ) syntax element based upon the information in the block header. A NAL unit which includes encoded data may comprise at least two data blocks: a picture configuration data block and an encoded (tiled) data block. A set of possible different data blocks arc indicated in the table above that shows possible payload types.
A sequence configuration data block may occur at least once at the beginning of the bitstream. A global configuration data block may occur at least for every instantaneous decoding refresh picture. An encoded (tiled) data block may be preceded by a picture configuration data block. When present in a NAL unit, a global configuration data block may be the first data block in the NAL unit.
Picture Enhancement Decoding Process The present section describes in more detail the picture enhancement decoding process performed at block 2706.
The input of this process may be the portion of the bitstream following the headers decoding process described in the "Process Block Syntax" section set out above. Outputs are the entropy encoded transform coefficients belonging to the picture enhancement being decoded. An encoded picture maybe preceded by the picture configuration payload described in the "Process Payload -Picture Configuration" and "Data Block Unit Picture Configuration Semantics" sections above.
The picture enhancement encoded data may be received as payload encoded data with the syntax for the processing of this data being described in the "Process Payload -Encoded Data" section. Inputs for the processing of the picture enhancement encoded data may comprise: a variable itPlanes containing the number of plane (which may depend on the value of the variable processed planes type flag), a variable nLayers (which may depend on the value of trans/arm type), and a variable nLerels (which indicates the number of levels to be processed). These are shown in Figure 9A. The variable nLevels may be a constant, e.g. equal to 2, if two enhancement sub-layers are used and processed.
The output of block 2706 process may comprise a set of (nPlanes)x(nLevels)x(nLayers) surfaces (e.g. arranged as an array -preferably multidimensional) with elements surfaces[nPlaneslinLevelslinLayers 1. If the temporal signalling_presem flag is equal to 1, an additional temporal surface of a size nPlanes with elements temporal surface[nPlanes] may also be retrieved. The variable nPlanes may be derived using the following processing: if (processed planes type flag == 0) nPlanes = 1 else nPlanes = 3 and the variable nLayers may be derived using the following processing: if (transform type == 0) nLayers = 4 else nLayers = 16 The encoded data may be organized in chunks as shown in Figure 9A (amongst others). The total number of chunks total chunk count may be computed as: nPlanes * nLevels * nfayers * (no enhancement bit _flag == 0) + nPlanes * (temporal signalling_present _flag == 1). For each plane, a number (e.g. up to 2) enhancement sub-layers are extracted. For each enhancement sub-layer, a number (e.g. up to 16 for a 4x4 transform) coefficient groups of transform coefficients can be extracted. Additionally, if temporal signalling_presentjlag is equal to 1, an additional chunk with temporal data for enhancement sub-layer 2 may be extracted. Within this processing, a value of the variable JEMMY equal to 1 may be used to refer to enhancement sub-layer 1 and a value of the variable levelIdir equal to 2 may be used to refer to enhancement sub-layer 2. During the decoding process chunks may be read 2 bits at a time Values for surfacesiplaneldxfilevellarxlilayerldxf entropy enabled.flag, surfacesiplaneldxfflevelldrliktyerIckirle onlyjlag, temporal surf aces1PlaneIdrientropy enabledjlag and temporal surfacesIplaneldxfrle only.flag may be derived as follows: shift size = -1 for (planeldx = 0, planeldx < nPlanes ++planeldx) { if (no enhancement_ bit_ flag == 0) { for (levelldx = 1. levelldx <= nLevels, ++levelldx) { for (layeadx = 0; layer < nLayers, ++layerIdx) { if (shift _size <0) I" data = read byte(bitstream) shift_ size = 8 -1 surfaces [planeldx] [levelldx] entropy enabled flag = ((data >> shift size) & Oxl) surfaces [planeIdglevelldx][layerldx] rle_only_flag = ((data >> (shift_size -1)) & Oxl) shift _size -= 2 t
I
1 else ( for (layeddx = 0; layer < nLayers; ++layerIdx) surfaces [planeIdx][levelIdx][layerIdgentropy enabled flag = 0
I
if (temporal_signalling_present_flag == 1) t if (shift size < 0) { data = read byte(bitstream) shift _size = 8 -1
I
temporal_surfaces[planeldx].entropy_enabl ed_fl ag = ((data >> shift_si ze) & Ox]) temporal surfaces[planeIdithle only flag = ((data >> (shift size -1)) & Oxl) shift_ size -= 2 i
I
L
Data associated with the entropy-encoded transform coefficients and the entropy-encoded temporal signal coefficient group may be derived according to respective values of the entropy enabled.flag and rle only.flag fields. Here entropy encoding may comprise run-length encoding only or Prefix/Huffman Coding and run-length encoding.
The content for the stufaces[olaneklyfflevelldxfiktyeadyldata provides a starting address for the entropy encoded transform coefficients related to the specific chunk of data and temporal surfacesiplaneldvidata provides the starting address for the entropy-encoded temporal signal coefficient group related to the specific chunk of data. These portions of data may be derived as set out below: for (planeIdx = 0; planeIdx < nPlanes; ++planeIdx)( for (levelIdx = 1; levelIdx <= nLevels; ++levelIdx) 1 for (layerldx = 0; layer < nLayers; ++layerldx) { if (surfaces [planeIdx][levelIdx][layeddx] entropy enabled flag) 1 if (surfaces [planeIdx][levelIdx][layeddx].rle_only_flag) ( multibyte = read multibyte(bitstream) surfaces[pl aneidx][levelIdx][1 ayeddx]. size = multibyte surfaces[planeldx][levelldx][layerldx] data bytestream current(bitstream) } else 1 data = read_byte(bitstream) surfaces[planeIdx][levelIdx][layerIdx] prefix last symbol bit offset = (data&Ox 1F) multibyte = read multibyte(bitstream) surfaces[pl an el dx] [1 evel I dx] [1 ayerl dx]. size = multibyte surfaces[planeIdx][levelIdx][layerIdx] data = bytestream current(bitstream) bytestream seek(bitstream, surfaces[planeIdx][levelIdx][layerIdx].size) i r i i if (temporal signalling present flag == 1) f if (temporal surfaces[planeIdgentropy enabled flag) 1 if' (temporal_surfaces[planeldx].rle_only_flag) { multibyte = read multibyte(bitstream) temporal_surfaces[planeIthasize = multibyte temporal surfaces[planeIdx].data = bytestream current(bitstream) f else f data = read_byte(bitstream) temporal surfaces[planddx] . prefix last symbol bit offset = (data&Ox 1F) multibyte = read_multibyte(bitstream) temporal surfaces[planeIdx].size = multibyte temporal_surfaces[planeIdx].data = bytestream_current(bitstream) bytestream seek(bitstream, temporal surfaces[planeldx].size) i The transform coefficients contained in the block of bytes of length surfacesIplaneldx levella5c1 layerldx [size and starting from surfacesfplaneldvfflevelldvfflayeadxfdata address may then be extracted and passed to an entropy decoding process, which may apply the methods described above with respect to Figures 10A to 101, and/or the methods described in more detail in the description below.
If temporal.signalling_present_flag is set to 1, the temporal signal coefficient group contained in the block of bytes of length temporal surfacesIplctneldvlsize and starting from temporal surfaces1PlaneIdxidata address may also be passed to similar entropy decoding process, . Picture Enhancement Decoding Process-Tiled Data The decoding process for picture enhancement encoded tiled data (payload encoded tiled data) may be seen as a variation of the process described above. Syntax for this process is described in the above section entitled "Process Payload -Encoded Tiled Data".
Inputs to this process may be: variables nPlanes,nLayers and nTevels as above; a variable nTilesT2, which equals to Ceil(Picture Width I Tile Withh)xCeil(Picture Height I Tile Height) and refers to the number of tiles in the level 2 sub-layer; a variable ithlesk which refers to the number of tiles in level 1 sub-layer and equals: (a) nTilesT,2 if the variable scaling mode level2 is equal to 0, (b) Ceil(Ceil(Picture Width I 2) / Width)xCeil(Ceil(Picture Height)! Tile Height) if the variable scaling mode level2 is equal to 1, and (c) Ceil(Ceil(Picture Width I 2) / Tile Width)xCeil(Ceil(Picture Height / 2)! Tile Height) if the variable scaling mode level2 is equal to 2; Picture Width and Picture Height, which refer to the picture width and height as derived from the value of the variable resolution type; and Tile Width and Tile Height, which refer to the tile width and height as derived from the value of the variable tile dimensions type. Further details of the variables referred to here is set out in the Data Block Semantics sections above. An output of this process is the (nPlanes)x(nTevels)x(tiLayer) array "surfaces", with elements surfaces[nP/aneslibLeveisffntayerf If temporal signalling_present_flag is set to 1, the output may also comprise an additional temporal surface of a size nPlanes with elements temporal surf acelnPlanesl. Values for the variables //P/c/ne.5 and nLayers may be derived as set out in the above section.
As above, the encoded data is organized in chunks. In this case, each chunk may correspond to a tile, e.g. each of the portions 2140 shown in Figure 21A. The total number of chunks, total chunk count, is calculated as: trHanes * nLevety * nLayers* (nrilesLI + nfilesL2) * (no enhancement bit jag == 0) + invianes * nTilesL2 * (temporal signalling present.flag == 1). The enhancement picture data chunks may be hierarchically organized as shown in one of Figures 21A and 21B. In accordance with the examples described herein, for each plane, up to 2 layers of enhancement sub-layers may be extracted and for each sub-layer of enhancement, up to 16 coefficient groups of transform coefficients may be extracted. Other implementations with different numbers of sub-layers or different transforms may have different numbers of extracted levels and layers. As before, as the present example applies temporal prediction in level 2, if temporal signalling present.flag is set to 1, an additional chunk with temporal data for enhancement sub-layer 2 is extracted. The variable kvelldx may be used as set out above. In this tiled case, each chunk may be read 1 bit at a time. The surfacesIplaneldx levella5c layerldx [He only_flag and, if temporal signalling_present _flag is set to 1, temporal surfaces/planeldx onlyilag may be derived as follows: shift size = -1 for (planeldx = 0; planeldx < nPlanes; ++ planeldx) I if (no enhancement bit flag == 0)-1 for (levelIdx = 1; levelIdx <= nLevels, ++levelIdx) f for (layerldx = 0. layer < nLayers; ++layerldx) 1 if (shift_size <0) 1 data = read byte(bitstream) shift size = 8 -1 surfaces [planeIdx][levelIdx][layerIdx].rle only flag = ((data >> (shift size -1)) & Oxl) shift size -= 1 if (temporal signalling present flag == 1) 1 if (shift size <0) { data = read byte(bitstream) shift size = 8 -1 t temporal surfaces[planeIdx] rle only flag = ((data >> (shift size -1)) & Oxl) shift size -= 1 The surfaces[planeldxffievelIdifflayerIdxfnles[tilekbelentropy enabledfiag and, if temporal signalling_presentilag is set to 1, temporal mufacesfplaneIdxpilespileldxfentropy enabled flag may be derived as 5 follows: if (compression type entropy enabled_per tile flag == 0) { shift_size = -1 for (planeIdx = 0, planeIdx < nPlanes ++planeIdx); if (no enhancement bit flag == 0) f for (levelIdx = 1, levelIdx <= nLevels, ++levelIdx)1 if (levelIdx == 1) nTiles = nT lesL1 else nTiles = nTilesL2 for (layerIdx = 0; layer < nLayers, ++layerIdx) { for (tileldx=0; tileIdx < nTiles; tileldx ++) { if (shift size < 0) f data = read_byte(bitstream) shift size = 8 -1 surfaces [pl an e1dx] [1 evel I dx] [layerIdx] Tiles[tileIdgentropy_enabled_flag -((data >> (shift size -1)) & Oxl) shift size -= 1 _ i 1 else { for (levelIdx = 1; levelIdx <= nLevels; ++levelIdx) ( if (levelIdx == 1) nTiles = nTilesL1 else nTiles = nTilesL2 for (layerIdx = 0; layer < nLayers; ++layerIdx) { for (tileIdx = 0; tileIdx < nTiles, tileIdx++) surfaces [pl an eldx] [1 evel I dx] [layerIdx] TileskileIdgentropy enabled flag = 0 if (temporal_signalling_present_flag == 1) ( for (tileldx = 0; tileIdx < nTilesL2; tileldx++) { if (shift size <0) { data = read_byte(bitstream) shift size = 8 -1 i temporal surfaces[planeIdx] .tiles[tileIdx] entropy enabled flag = ((data >> (shift size -1)) & Ox]) shift size = 1 l 1 else { RLE decoding process as defined herein.
According to the value of the entropy enctfilecl_flag and fie_only_flag fields, the content for the sztrfaces[planeIdyfflevelicbcfilayerldritiles[tilelthidata (i.e. indicating the beginning of the RLE only or Prefix Coding and RLE encoded coefficients related to the specific chunk of data) and, if temporal signalling_present _flag is set to 1, according to the value of the entropy enabledflag and rle only_flag fields, the content for the temporal smfacesIplaneldx[tilesItileldx[data indicating the beginning of the RLE only or Prefix Coding and RLE encoded temporal signal coefficient group related to the specific chunk of data may be derived as follows: if (compression type size per tile == 0) 1 for (planeldx = 0; planeldx < nPlanes; ++planeIdx) ( for (level Idx = 1; level Idx <= nLevels; ++Ievel Idx) ( if (levelIdx == 1) nTiles = nTilesL1 else nTiles = nTilesL2 for (layerIdx = 0, layer < nLayers; ++layerIdx) { for (tileldx = 0; tileldx < nTiles ileIdx++) { if (surfaces [planeIdx][levelIdx][layerIdx] tiles[tileIdx] entropy enabled flag) 1 if (surfaces [planeIdx][levelIdx][layerIdx].rle_only_flag) ( multibyte = read_multibyte(bitstream) surfaces[planeIdx][levelIdx][layerIdx] tiles[tileIdx].size = multibyte surfaces[planeIdx][levelIdx][layerIdx] tiles[tileIdx]data = bytestream current(bitstream) } else ( data = read_byte(b tstream) surfaces[planeldx][levelldx][layerldx].
tiles[tileIdx].prefix last symbol bit offset (data&Ox 1F) multibyte = read_multibyte(bitstream) surfaces[planeIdx][levelIdx][layerIdx] tiles[tileIdx].size = multibyte surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].data = bytestream current(bitstream) bytestream_seek(bitstream, surfaces[planeIdx][levelIdx][layerIdx].fileskileIdx] size)
I i
if (temporal_signalling_present_flag == 1) { for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++) { if (temporal surfaces [planeIdx] tiles[tileIdx] entropy enabled flag) { if (tem poral_surfaces[pl an el dic] . rl e_only_fl ag) { multibyte = read multibyte(bitstream) tem poral _surfaces[pl aneldx] . tiles [tile Idx] . si ze = multibyte temporal surfaces[planeIdx].tiles[tileIdx].data bytestream current(bitstream) 1 else { data = read byte(bitstream) temporal surfaces[planeIdx].tiles[tileIdx].
prefix last symbol bit offset = (data&Ox1F) multibyte = read multibyte(bitstream) temporal_surfaces[planeIdx].tiles[tileIdx] size = multibyte temporal surfaces[planeIdx].tiles[tileIdx].data bytestream current(bitstream) = bytestream seek(bitstream, temporal_surfaces[pl anel dx] . tiles [ti I eldx]. size)
I l i
I
I else { for (planeIdx = 0; planeIdx < nPlanes; ++planeIdx) { for (levelIdx = 1; levelIdx <= nLevels, ++levelIdx) { if (level ldx == 1) nTiles = nTilesL1 else nTiles = nTilesL2 for (layerIdx = 0; layer < nLayers; ++layerIdx) 1 for (tileIdx = 0; tileIdx < nTiles, tileIdx ++) { if (surfaces [planeIdx][levelldx][layerIdx].tiles[tileldgentropy enabled flag) 1 if (surfaces[plane1dx][levelldx][layerldx] tle only flag) { Prefix Coding decoding process as defined in other sections to fill surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx] size surfaces[pl an et dx] [level I dx] [layer1dx].tiles [tileI dx]. data = bytestream current(bitstream) ) else 1 Prefix Coding decoding process as defined in other sections to fill surfaces[planeldx][levelldx][layerldx].tiles[tileIdx]. prefix last symbol bit offset Prefix Coding decoding process as defined in other sections to fill surfaces[planeIdx][levelIdx][layerIdx].tiles[fileIdx].size surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].data = bytestream_current(bitstream) bytestream_seek( bitstream,surfaces[planeIdx][levelIdx][layerIdx] tiles[tileIdx].size) i if (temporal signalling_present flag == 1) { for (tileldx = 0; tileIdx < nTilesL2; tileldx++) / if (temporal surfaces [planeIdx] tiles[tileIdx] entropy enabled flag); if (temporal surfaces [p1ane1dx].rle only flag) { Prefix Coding decoding process as defined in other sections to fill temporal surfaces[planeldx].tiles[tileldx].size temporal surfaces[planeIdx].tiles[tileIdx].data bytestream_current(bitstream) 1 else { Prefix Coding decoding process as defined in other sections to fill temporal surfaces[planeIdx] ,tileskileIdgprefix last symbol bit offset Prefix Coding decoding process as defined in other sections to fill temporal surfaces[planeldx].files[tileldx].size temporal surfaces[planeIdx].tiles[tileIdx].data = bytestream current(bitstream) bytestream seek(bitstream, temporal surfaces[planeIdx].tiles[tileIdx].size)
I
I
I
The coefficients contained in the block of bytes of length surfaces1Planeldifflevelldrfflayerldxftileshileldxfsize and starting from surfacesIplaneldxfIlevelldxfilayerldxitilesItilelcbc[data address may then be passed to for entropy decoding process as described elsewhere. If temporal signalling enabledilag is set to 1, the temporal signal coefficient group contained in the block of bytes of length temporal szulaces1PlaneIdx1.tilesitileIdxfsize and starting from temporal surfacesIplaneldxpilesitileldxfdata address are also passed for entropy decoding.
Decoding Process for Enhancement Sub-Layer 1 (L-1) Encoded Data This section describes certain processes that may be performed as part of block 2714 in Figure 27. The result of this process may be an enhancement residual surface (i.e. a first set of level I residuals) to be added to the preliminary intermediate picture.
As a first operation, the dimensions of a level 1 picture may be derived. The level 1 dimensions of the residuals surface are the same as the preliminary intermediate picture, e.g. as output by block 2712. If scaling mode level2 (as described above) is equal to 0, the level 1 dimensions may be taken as the same as the level 2 dimensions derived from resolution type (e.g. as also referenced above). If scaling mode level2 is equal to 1, the level 1 length may be set as the same as the level 2 length as derived from resolution type, whereas the level 1 width may be computed by halving the level 2 width as derived from resolution type. If scaling mode level2 is equal to 2, the level 1 dimensions shall be computed by halving the level 2 dimensions as derived from resolution type.
The general decoding process for a level 1 encoded data block, e.g. block 2714 in Figure 27, may take as input: a sample location (xTb0, yTb0) specifying the top-left sample of the current transform block relative to the top-left sample of the current picture; a variable nrbS specifying the size of the current transform block, e.g. as derived from the value of variable transform type as described above -//MS = 2 if transform type is equal to 0 and nlitS = 4 if transform type is equal to 1; an array TransformCoeff0 of a size (nTb,S)c(nTbS) specifying level 1 entropy decoded quantized transform coefficients; an array recLIBaseSamples of a size (niM)x(nTbS) specifying the preliminary intermediate picture reconstructed samples of the current block resulting from a base decoder or earlier up-scaling; a step Width value derived as set out above from the value of variable step width levell e.g. this may be obtained by shifting step width levell to the left by one bit (i.e., step width level] << 1); a variable IdxPlanes specifying to which plane the transform coefficients belong; and a variable userDataEnabled derived from the value of variable user data enabled.
Output of the process 2714 may be a (nTbS)x(nThS) array of residuals rest] FilteredResiduals with elements resLIFilteredResiductls[x][y]. Arrays of residuals relating to different block locations with respect to a picture may be computed.
The sample location (xTbP, yTbP) specifying the top-left sample of the current transform block relative to the top-left sample of the current picture may be derived as follows: (xTbP, yrbP)= (IdxPlanes== 0) ? AO, yrit0): (xTb0>> ShiftWidthe, y17,0>> Shifilleightn, e.g. where P can be related to either luma or chroma planes depending on which plane the transform coefficients relate to. Shift kVidthC and ShtfilleightC are may be derived as set out above.
If no enhancement hit flag is equal to 0, then enhancement data may be present the following ordered steps apply: 1) If nibs is equal to 4, li-anst7oeftt(1)(1) may be shifted to the right either by two bits (>>2) if the variable userDataEttabled is set to 1, or by six bits (>>6) if userDataEnabled is set to 2. And in addition, if the last bit of TransCoeff0(1)(1) is set to 0, TransCoeff0(1)(1) may be shifted to the right by one bit (>>1); otherwise (e.g. if the last bit of TransCoeffQ(1)(1) is set to 1), TransCoeffQ(1)(1) is shifted to the right by one bit (>>1) and TransCod:f0(1)(1) is set to have a negative value.
2) If nTbs is equal to 2, TransCo00(0)(1) is shifted to the right either by two bits (>>2) if the variable userDatctEnabled is set to 1, or by six bits (>>6) if usetitataEnabled is set to 2. And in addition, if the last bit of TraiisCoeff0(0)(1) is set to 0, Thanseoeff0(0)(1) is shifted to the right by one bit (>>1); otherwise (e.g. if the last bit of Trans.(' oeff0(0)(1) is set to 1), Tr/gist-000(0)M is shifted to the right by one bit (>>1) and Trans(' oeff0(0)(1) is set to have a negative value.
3) The dequantization process of sub-block 2716 as described below is invoiced with the transform size set equal to WM', the array TransibrmCoeffQ of a size (nTbS)x(nTbS), and the variable step Width as inputs, and the output is an (nTbS)x(nTbS) array or dequantized coefficients dequanICoeff 4) The transformation process of sub-block 2718 is invoked with the luma location (xTbX, yThX), the transform size set equal to nTbS, the array dequantC oeff of size (tiTbS)x(iiThS) as inputs, and the output is an (nTbS)x(nTbS) array of reconstructed level 1 residuals -res 1.1 Residuals'.
5) The level 1 filter process of sub-block 2720 is invoked with the luma location (xTbX, yThX), the array resL!Residuals of a size (nTh,S),c(nTb,S) as inputs, and the output is an (nTbS)x(nTlikS) array of reconstructed filtered level 1 residuals -resEIFilteredResiduals% The above steps may be repeated for all coding units that make up a plane or a frame. If no enhancement bit _flag is equal to 1, then enhancement is not applied. In this case, the array resLIFilteredResiduals of size (tiThS)x(tiTbS) may be set to contain only zeros.
Following the operations discussed above, the picture reconstruction process for each plane, i.e block 2730 of Figure 27 and as described in more detail below, is invoked with the transform block location (rib°, ylb0), the transform block size nil bS, the variable Idx1)lanes, the (nlb,c)x(nlb,S) array resL1FilteredResiduals, and the (iiiti,c)x(nlb,c) array recL 1BaseSa pl es as inputs.
Decoding Process for Enhancement Sub-Layer 2 (L-2) Encoded Data The decoding process for enhancement sub-layer 2 (level 2) encoded data at block 2746 may be similar to the decoding process for enhancement sub-layer 1 (level I) encoded data described above. The result of this process is a level 2 enhancement residuals plane to be added to the upscaled level 1 enhanced reconstructed picture.
As a first operation, the dimensions of level 2 picture may be derived. These may be derived from the variable resolution type described above. The dimensions of the level 2 residuals plane may be the same as the dimensions of the level 2 picture.
The general decoding process for a level 2 encoded data block may take as input: a sample location (xTb0, yilb0) specifying the top-left sample of the current transform block relative to the top-left sample of the current picture; a variable firliS specifying the size of the current transform block derived from the value of variable transform type (e.g. as described above -in other examples each level may have different transform sizes); a variable temporal enabled.flag; a variable temporal refresh bit.flag; a variable temporal signalling_present_flag; a variable temporal step width modifier; an array recL2ModdiedUpsampledSamples of a size (nTb,S)x(nTbS) specifying the up-sampled reconstructed samples resulting from the up-scaling process 2732 in Figure 27; an array Tran.sfortnCoeff0 of a size (7Thic)x(tiThS) specifying level 2 entropy decoded quantized transform coefficients; if the variable temporal signalling present.flag is equal to 1 and temporal tile infra signalling enabled _flag is equal to 1, a variable TransformTempSig corresponding to the value in TempSigSullace (e.g. see the temporal processing sections) at the position (xTb0 >> nTbs, ylhO >> nTbs); if in addition temporal tile intra signalling enabled _flag is set to 1, a variable TileTempSig corresponding to the value in TempSigStuface at the position ((xTb0%32) * 32, (yrb0%32) * 32); a step Width value derived, as set out in the "Semantics" section above, from the value of variable step width level2; and a variable Mx-Planes specifying to which plane the transform coefficients belong to. Further details of these variables are set out in the "Syntax" and "Semantics" section above.
The block 2746 processes the inputs as described below and outputs an (nTbS)x(nib,S) array of level 2 residuals -resL2Residzials -with elements resE2Residuals[x][y].
The derivation of a sample location (xTbP, yTbP) may follow a process similar to that set out for the level 1 residuals in the section above.
If no enhancement bil_flag is set to 0, i.e. enhancement is to be applied, then the following ordered steps may be undertaken: 1) If variable temporal enabled _flag is equal to 1 and temporal refresh bit_fhtg is equal to 0, the temporal prediction process (e.g. as specified in the examples abov and below) is invoked with the luma location (x1bY, yibY), the transform size set equal to nTbS, a variable Transform TempSig and a variable Tile TempSig as inputs. The output is an array of temporally predicted level 2 residuals -templiredL2Residzials -of a size eilbSpx(nithS).
2) If variable temporal enabled_flag is equal to 1 and temporal refresh bit _flag is equal to 1, the of temporally predicted level 2 residuals as specified above is set to contain only zeros.
3) If variable temporal enabled flag is equal to 1, temporal refresh bit flag is equal to 0, temporal tile infra signalling enabled.flag is equal to 1 and Transform TempSig is equal to 0, the variable step Width may be modified. It may be modified using the temporal step width modifier, e.g. using the computation: FloorcsiepWidth * (1 -(Clip3(0, 0.5, temporal step width modifier)! 255))).
4) The dequantization process as specified in other sections and shown as sub-block 2752 is invoked with the transform size set equal to nTbS, the array TrattybrmCoeff0 of a size (ttlbS)x(ttlb,S), and the variable sic pWidth as inputs.
The output is an (nTbS)x(nTbS) array dequanteoeff of dequantized coefficients.
5) The transformation process as specified in other sections and shown as sub-block 2756 is invoked with the luma location (xTbY, yThY), the transform size set equal to nTbS, the array dequantized coefficients dequaniCoeff of a size (nTbS)x(nTbS) as inputs. The output is an,5)x(n TM') array of reconstructed level 2 residuals -resL2Residnals.
6) If variable temporal enabled_flag is equal to 1, the array of temporally predicted level 2 residuals tempPredL2Residnais. of a size (nithic)x(nTb.5) is added to the (nTbS)x(nTbS) array of reconstructed level 2 residuals resT2Residuals and the array of reconstructed level 2 residuals resT2Residnals is stored to the temporalanffer at the luma location (x1bY, As per level 1 residual processing, the above operations may be performed on multiple coding units that make up the picture. As the coding units are not dependent on other coding units, as per level 1 residual processing, the above operations may be performed in parallel for the coding units of size (nTbS)x(nTbS).
If no enhancement bit _flag is set to 1, i.e. enhancement is disabled at least for level 2, the following ordered steps apply: 1) If variable temporal enabled.flag is equal to 1, temporal refresh bit.flag is equal to 0 and variable temporal signalling present flag is equal to 1, the temporal prediction process as specified in other sections and below is invoked with the luma location (xTbY, yrbY), the transform size set equal to nTbS, a variable TransformTempSig and a variable T i leTempSig as inputs. The output is an array tempPredL2Residitals (e.g. as per operation 1 above) of a size (nlb,c)x(nTbS).
2) If variable temporal enabled.flag is equal to 1, temporal refresh bit.flag is equal to 0 and variable temporal signalling _present_flag is equal to, the temporal prediction process is invoiced with the luma location (xTbY, ylbY), the transform size set equal to OM', a variable ThatqfbrmTempSig set equal to 0 and a variable Tile TempSig set equal to 0 as inputs and the output is an array tempPredL2Residnals of a size (nTbS)x(nTh,S) 3) If variable temporal enabled.flag is equal to 1 and temporal refresh bit.flag is equal to 1, the array tempPredL2Residuals of a size (nTbS)x(nTbS) is set to contain only zeros.
4) If variable temporal enabled_flag is equal to 1, the array of lempPredL2Residnals of a size (nTbS)x(nTbS) is stored in the (tiThS)x(n AS) array rest2Residnals and resL2Residuals array is stored to the temporalBuffer at the luma location (xTbY, yIbY). Else, the array resL2Residuals of a size (nTbS)x(nTbS) is set to contain only zeros.
The picture reconstruction process for each plane as shown in block 2758 is invoked with the transform block location (xTb0, yib0), the transform block size nTbS, the variable IdxP lanes, the (nlbS)x(nTbS) array resL2Residnals, and the (xTbY)x(yibY) recT2ModifiedUpsampledSampks as inputs. The output is a reconstructed picture.
Decoding Process for The Temporal Prediction A decoding process for temporal prediction such as sub-block 2752 may take as inputs: a location (xThP, yThP) specifying the top-left sample of the current luma or chroma transform block relative to the top-left luma or chroma sample of the current picture (where P can be related to either luma or chroma plane depending to which plane the transform coefficients belong); a variable ifTbS specifying the size of the current transform block (e.g. as derived in the examples above); a variable TransformTempSig; and a variable TileTempSig. In this example, the output to this process is a (itTb,S)x(nTbS) array of temporally predicted level 2 residuals tempPredL2Residnaly with elements lempPredL2Residualy[x][y].
The process 2752 may apply the following ordered steps: 1) If variable temporal tile infra signalling is equal to 1 and xTbP >>5 is equal to 0 andyibP >>5 is equal to 0 and life TempSig is equal to 1, the tiled temporal refresh process as described below and shown as sub-block 2750 is invoked with the location (xTbP, yThP).
2) If variable Transform TempSig is equal to 0, then tempPredL2Residuals[x][y] = temporalBter[xTbP + x][yTbP + y] where x and y are in the range [0, (nTb.S"-1)]. Otherwise, iempPredL2Residuals[x][y] are all set to O. This then conditionally loads values from a temporal buffer to apply to level 2 residuals (e.g., similar to the temporal prediction processes of other examples).
Tiled Temporal Refresh The input to the tiled temporal refresh at sub-block 2750 may comprise a location (xibP, yilhP) specifying the top-left sample of the current luma or chroma transform block relative to the top-left luma or chroma sample of the current picture (where P can be related to either luma or chroma plane depending to which plane the transform coefficients belong). The output of this process is that the samples of the area of the size 32x32 of lemporalBuffer at the location (xTbP, yTbP) (i.e. relating to a defined tile) are set to zero. This process may thus be seen to reset or refresh the temporal buffer as described with reference to other examples.
Decoding Process for lhe Dequantizat ion The following process may be applied to both level 1 and level 2 data blocks. It may also be applied in the encoder as part of the level 1 decoding pipeline. It may implement the dequantize components of the examples. With reference to Figure 27, it may be used to implement sub-blocks 2716 and 2752. Decoding process for the dequantization overview Every group of transform coefficients passed to this process belongs to a specific plane and enhancement sub-layer. They may have been scaled using a uniform quantizer with deadzone. The quantizer may use a non-centered dequantization offset (e.g. as described with reference to Figure 20A to 20D).
The dequantization may be seen as a scaling process for transform coefficients. In one example a dequantization process may be configured as follows. The dequantization process may take as inputs: a variable nTbS specifying the size of the current transform block (e.g. as per other processes above), an array TransformeoeffQ of size (nTbS)x(nThS) containing entropy-decoded quantized transform coefficients; a variable step Width specifying the step width value parameter; a variable leve Aix specifying the index of the enhancement sub-layer (with!exalt& = I for enhancement sub-layer 1 and levelldx= 2 for enhancement sub-layer 2); a variable dOnaniQffsve specifying a dequantization offset (derived from variable dequant offset as described above); if quant matrix mode is different from 0, an array OmCoeff0 of size 1 x tifb.S' (derived from variable qm coefficient 0) and further, if quail matrix mode is equal to 4, an array OmCoeffl of size 1 x nTINS' (derived from variable qin coefficient 1); if' nTbS == 2, an array OnantScalerDDBuffer of size (3 * tilbS)x(tITM)) containing the scaling parameters array used in the previous picture; and, if nlbS == 4, an array OuntniScaferDDSBnffer of size (3 * nThS)x(nTbS) containing the scaling parameters array used in the previous picture.
An Output of the present dequantization process is a (nTbS)x(nTb;S) array d of dequantized transform coefficients with elements d[x][y] and the updated array 25 Onanafatr, &Wier.
For the derivation of the scaled transform coefficients d[x][y] with x = 0...nTbS -1, y = 0...nTbS -1, and a given quantization matrix qm[x] [y], the following computation may be used: d[x][y] = (TransfortnCoeff0[x][y]* ((qm[x+ (levelIdxSivap * nTh,S)][y]+ stepWidthAlodifier[x][y])+ appliedOffret [x][y] The dequantization process above uses a dequantization offset and step-width modifier. A process for deriving these variables, e.g. in the form of 611)/d/caw/Tx] [y] and stepWidthAlodifier[x][y] is shown below: if (dequant offset signalled flag == 0) f stepWidthModifier [x][y] = ((((Floor(-Cconst * Ln (qm[x + (levelIdxSwap * nTbS)][y]))) + Dconst) * (qm[x + (level IdxSwap * nTbS)][y]2))) / 32768) >> 16 if (TransformCoeffQ[x][y] <0) appliedOffset = (-1 * (-deadZoneWidthOffset [x][y])) else if (TransformCoeffQ [x][y] >0) appliedOffset [x][y] = -deadZoneWidthOffset [x][y] else appliedOffset [x][y] = 0 1 else if (dequant offset signalled flag == 1) && (dequant offset mode flag ==1) r i stepWidthModifier [x][y] = 0 if (TransformCoeffQ[x][y] <0) appliedOffset = (-1 * (dQuantOffsetActual [x][y] -deadZoneWidthOffset [)d[Y])) else if (TransformCoeffQ [x][y] > 0) appliedOffset [x][y] = dQuantOffsetActual [x][y] -deaci7oneWidthOffset [x][y] else appliedOffset [x][y] = 0} 1 else if (dequant offset signalled flag == 1) && (dequant offset mode flag == 0) 1 stepWidthModifier [x][y] = (Floor((dQuantOffsetActual [x][y]) * (qm[x + (level IdxSwap * nTbS)][y])) /32768) if (TransformCoeffQ[x][y] <0) appliedOffset = (-1 * (-deadZoneWidthOffset [x][y])) else if (TransformCoeffQ [x][y] > 0) appliedOffset [x][y] = -deadZoneWidthOffset [x] [A else appliedOffset [x][y] = 0 ^ r As described in other examples above, dequantization may use a deadzone with a variable width. A variable deadZoneWidthOffsvi may be derived according to the following process: if stepWidth > 16: deadZoneWidthOffset [x][y] = << 16) -((Aconst * (qm[x + (levelIdxSwap * nTbs)][y] + step WidthModifier [x][y])) + Bconst) >> 1) * (qm[x + (level IdxSwap * nTbs)][y] + stepWidthModifier [x][y]))) >> 16 if stepWidth <= 16: deadZoneWidthOffset [x][y] = stepWidth >> 1 In the above computations, the following constants may be used Aconst = 39; Bconst = 126484; Cconst = 9175; and Dconst = 79953.
The variable dOitatitOffNetActual[x][y] may be computed as follows: if (dequant_offset == 0) dQuantOffsetActual [x][y] = dQuantOffset else I if (dequant_offset_mode_flag == 1) dQuantOffsetActual [x][y] = ((Floor(-Cconst * Ln(qm[x + (levelIdxSwap * nTbs)][y]) + (dQuantOffset << 9) + Floor(Cconst * Ln(StepWidth)))) * (qm[x + (level Idx Swap * nTbs)][y])) >> 16 else if (dequant offset mode flag == 0) dQuantOffsetActual [x][y] = ((Floor(Cconst * Ln(qm[x + (levelIdxSwap * nTbs)][y]) + (dQuantOffset << 11) + Floor(Cconst * Ln(StepWidth)))) * (qm[x + (level Idx Swap * nTbs)][y])) >>16 The variable levelldthwap may be derived as follows: if (level Idx == 2) levelIdxSwap = 0 else levelIdxSwap = 1 Derivation of Quantization Matrix Various quantization and dequantization processes may use a quantization matrix. In the examples above this is referred to as qin[x][y]. The quantization matrix qtn[x][y] contains the actual quantization step widths to be used to decode each coefficient group.
In one example, the quantization matrix qm[x][y] may be derived as set out below, which builds the quantization matrix from a preliminary quantization matrix am _p[x][y] depending on the scaling mode and the level of enhancement: if (levelIdx == 2) 1 if (scaling mode level2 == 1) { for (x = 0; x < nTbS; x++) { for (y = 0; y e nTbs; y++) qm [x][y] = qm p [x][y] } else 1 for (x = 0; x < nTbS; x++) f for (y = 0-y < nTbS y++) qm [x][y] = qm_p [x + nTbS][y] } else 1 for (x = 0. x < nTbS* x++) 1 for (y = 0; y < nTbs; y++) qm [x][y] = qm p [x + (2 * nTbS)][y] The preliminary quantization matrix qm_p[x][y] may be computed as follows: if (nTbs == 2) { for (x = 0; x < 6; x++) { for (y = 0; y < nTbs; y++) qm p[x][y] = (Clip3 (0, (3 << 16),[(QuantScalerDDBuffer [x][y] * stepWidth) + (1 << 16)])* stepWidth) >> 16 ).
1 else { for (y = 0; y < 12; y++) { for (x = 0; x < nTbs; x++) qm_p[x][y] = (Clip3 (0, (3 << 16),[(QuantScalerDDSBuffer [x][y] * stepWidth) + (1 << 16)]) * stepWidth) >> 16 r In this case, the preliminary quantization matrix qm_p[x][y] is built from the contents of a quantization matrix scaling buffer and a step Width variable, depending on the size of the transform used. A different quantization matrix scaling buffer may be used for each transform, e.g. a first quantization matrix scaling buffer OttaniSca/erD08(4ffer may be used for a 2x2 directional decomposition transform and a second quantization matrix scaling buffer OuantScalerDDSBuffer may be used for a 4x4 directional decomposition transform. These buffers may be constructed as set out below.
Derivation of Quantization Matrix Scaling Buffers The quantization matrix scaling buffer may be derived based on one or more of a set of default matrix parameters (which may be stored locally at the decoder), the contents of the buffer for a previous picture and a set of signalled quantization matrix coefficients (e.g. as received from an encoder). The derivation of the buffers for each of the transform sizes described in examples (e.g. 2x2 and 4x4) may be similar. The initialization of the quantization matrix scaling buffer may be dependent on (i.e. controlled by) a signalled or default (if a signal is omitted) quantization matrix mode (e.g., as referenced above). The scaling parameters for a 2x2 transform, in the form of quantization matrix scaling buffer QuandcalerDDBuf,r[x][y] may be derived as follows (i.e. when the variable nTbS is equal to 2). First, the default scaling parameters default scaling dd[x][y] may be set as follows: default scaling dd[x][y] = t 0, 2 { t 0, 0 { t 32, 3) { 0, 32 1 0, 3} f 0, 32} It should be noted that these values may change depending on implementation.
Then, the array OuttntScalerDDBuffer may be initialized based on whether the current picture is an IDR picture. If the current picture is an IDR picture, QuantScalerDDBuffer may be initialized to be equal to default scaling dd as initialised above. If the current picture is not an IDR picture, the OuantScalernDBuffer matrix may be left unchanged, e.g. from a previous picture.
Following initialization, the quantization matrix scaling buffer QuantScalerDDBuffer may be modified based on a quantization matrix mode, e.g. as indicated by the value of quant matrix mode.
If the quant matrix mode is equal to 0 and the current picture is not an IDR picture, 15 the QuantScalerDDBuffer may be left unchanged.
If quant matrix mode is equal to 1, the OuantScalerDDButfrr may be equal to the default scaling dd.
If quailt matrix mode is equal to 2, the OttandcalerDIThtffer may be modified based on a signalled set of quantization matrix coefficients Omeoeff0 as follows: for (MIdx = 0, MIdx <3; MIdx++) for (x = 0; x <2; x++) for (y = 0 y <2 y++) QuantScalerDDBuffer [x + (MIdx * 2)][y] = QmCoeff0[(x * 2) + y] If quant matrix mode is equal to 3, the QuantScalerDDBuffer may be modified based on a signalled set of quantization matrix coefficients OmCoeff0 as follows: for (MIdx = 0; Mldx <2; Mldx++) for (x = 0 x <2; x++) for (y = 0, y < 2, y++) QuantScalerDDBuffer [x + (MIdx * 2)][y] = QmCoeff0 kx * 2) + y] If quant matrix mode is equal to 4, the QuantScalerDDBuffer may be modified based on a signalled set of quantization matrix coefficients QmCoeffl as follows: for (x = 0; x <2; x++) for (y = 0; y <2; y++) QuantScalerDDBuffer [x + 4][y] = QmCoeffl [(x * 2) + y] If (plant matrix mode is equal to 5, the ChtetntScalerDDBztffer may be modified based on two signalled sets of quantization matrix coefficients QmCoeff0 and (2mCoeffl as follows: for (MIdx = 0; MIdx <2; Mldx ++) for (x = 0, x <2; x++) for (y = 0, y <2; y++) QuantScalerDDBuffer [x + (MIdx * 2)][y] = QmCoeff0[(x. 2) + y] for (x = 4, x< 6, x++) for (y = 0; y <2; y++) QuantScalerDDBuffer [y][x] = QmCoeffl[(x * 2) + y] The derivation of scaling parameters for 4x4 transform may be similar to the process described above. The scaling parameters for the 4x4 transform, in the form of quantization matrix scaling buffer OttantScalerDTASBuffer[x][y] may be derived as follows (i.e. when the variable nTbS is equal to 4) First, the default scaling parameters default scaling dds[x][y] may be set: clefault scaling dds[x][y]= 13, 26, 19, 32 1 52, 1,78, 9) 13, 26, 19, 32 1 150, 91, 91, 19 13, 26, 19, 32) 52, 1,78, 9} 26, 72, 0, 3 f 150, 91, 91, 19 1 1 0, 0, 0, 2 52, 1,78, 9} 26, 72, 0, 3 150, 91, 91, 19 Again, the values shown are an example only and may vary for different implementations.
Then, the array QuantScalerDDSBuffer may be initialized based on whether the current picture is an DR picture. If the current picture is an LDR picture, QuantScalerDDSBuffer may be initialized to be equal to default sealing dds as initialised above. If the current picture is not an DR picture, the QuantScalerDDSBuffer matrix may be left unchanged, e.g. from a previous picture.
Following initialization, the quantization matrix scaling buffer QuantScalerDDSBuffer may again be modified based on a quantization matrix mode, e.g. as indicated by the value of plant matrix mode.
If the quain, matrix mode is equal to 0 and the current picture is not an DR picture, the OutintScalerDIXSThtf -Pr may be left unchanged.
If (plant matrix mode is equal to 1, the OnantScaierDDSBitffer may be equal to the default scaling dds If quail( matrix mode is equal to 2, the QuantScalerDDSBuffer may be modified based on a signalled set of quantization matrix coefficients QmCoeff0 as follows: for (MIdx = 0; MIdx < 3; MIdx++) for (x = 0, x <4; x++) for (y = 0, y <4; y++) QuantScalerDDSBuffer [x + (MIdx * 4)][y] = QmCoeff0[(x * 4) + y] If quant matrix mode is equal to 3, the QuantScalerDDSBuffer may be modified based on a signalled set of quantization matrix coefficients OinCoeff0 as follows: for (MIdx = 0, MIdx <2; MIdx++) for (x = 0; x <4; x++) for (y = 0; y <4; y++) QuantScalerDDSBuffer [x + (MIdx * 4)][y] = QmCoeff0[(x * 4) + y] If (pant matrix mode is equal to 4, the OuantScalerDIXSBuf ir may be modified based on a signalled set of quantization matrix coefficients OmCoeff I as follows: for (x = 0; x <4; x++) for (y = 0; y <4; y++) QuantScalerDDSBuffer [x + 811[y] = QmCoeffl [(x * 4) + y] If quant matrix mode is equal to 5, the QuantScalerDDSBuffer may be modified based on two signalled sets of quantization matrix coefficients Om(' oeff0 and OinCoeff as follows: for (MIdx = 0; MIdx <2; MIdx++) for (x = 0; x <4; x++) for (y = 0; y <4; y++) QuantScalerDDSBuffer]x + (MIdx * 4)][y] = QmCoeff0[(x 4) + y] for (x = 8, x < 12, x++) for (y = O. y <4; y++) QuantScalerDDSBuffer [x][3/] = qm coefficient 1[(x * 4) + y]
General Upscaling Process Description
Upscaling processes may be applied, at the decoder, to the decoded base picture at block 2712 in Figure 27 and to the combined intermediate picture at block 2732. In the present examples, upscaling may be configured based on a signalled scaling mode. In the processes described below the upscaling is configured based on the indications of scaling mode level] for the up-scaling to level 1 and based on scaling mode level2 for the up-scaling to level 2.
Upscaling from Decoded Base Picture to Preliminary Intermediate Picture The up-scaling from a decoded base picture to a preliminary intermediate picture, e.g. as performed in block 2712, may take the following inputs: a location (xeurr, yeurr) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; a variable Kix!' lanes specifying the colour component of the current block; a variable neurrS specifying the size of the residual blocks used in the general decoding process; an (n('urr,8)x(netttr,S) array recDecodedBaseSamples specifying decoded base samples for the current block; variables src Width and srcHeight specifying the width and the height of the decoded base picture; variables dst Width and dsilleight specifying the width and the height of the resulting upscaled picture; and a variable is8Bit used to select the kernel coefficients for the scaling to be applied, e.g. if the samples are 8-bit, then variable is8Bit shall be equal to 0, if the samples are 16-bit, then variable is8Bit shall be equal to 1. An output of block 2712 may comprise a (nCurrX)x(nCurrY) array recL1ModilieclUpsampledBaseSamples of picture elements.
In the array of elements rect. IModifieclUpsampledBase,S'amples[x][y] the variables 5 neth-TX and nCurrY may be derived based on the scaling mode. For example, if scaling mode level] is equal to 0, no upscaling is performed, and recLIModifiedUpsampled&tseSamples[x][y] are set to be equal to recDecodedBaseSarnples[x][y]. If scaling mode level] is equal to 1, then neutTX = nCutTS << 1, and nCutrY = tCuiTS. If scaling mode level is equal to 2, then riCurrX = 10 nCurrS << 1, and nCurrY = tCurrS << 1.
The up-scaling applied at block 2712 may involve the use of a switchable up-scaling filter. The decoded base samples may be processed by an upscaling filter of a type signalled in the bitstream. The type of up-scaler maybe derived from the process described in the section "Data Block Unit Global Configuration Semantics". Depending on the value of the variable upsample type, a number of different kernel types may be applied. For example, each kernel types may be configured to receive a set of picture samples recDecodedBaseSamples as input and to produce a set of up-sampled picture samples recL1UpsampledBaseSamples as output. There may be four possible up-scaler kernels (although these may vary in number and type depending on implementation). These are also described in the section titled "Example Up-sampling Approaches". In the present example, if upsample type is equal to 0, the Nearest sample up-scaler described in the "Nearest up-sampling" section above may be selected. If upsample type is equal to 1, the Bilinear up-scaler described in the "Bilinear up-sampling" section above may be selected. If upsample type is equal to 2, a Bicubic up-scaler described in the "Cubic Up-sampling" section above may be selected. If upsample type is equal to 3, a Modified Cubic up-scaler described in the "Cubic Up-sampling" section above may be selected.
A predicted residuals (e.g. predicted average) decoding computation may also be applied in certain cases as described below with respect to the level 1 to level 2 up-scaling.
A general up-scaler may divide the picture to upscale in 2 areas: center area and border areas as shown in Figures 9B and 9C. For the Bilinear and Bicubic kernel, the border area consists of four segments: Top, Left, Right and Bottom segments as shown in Figure 9B, while for the Nearest kernel consists of 2 segments: Right and Bottom as shown in Figure 9C. These segments are defined by the border-size parameter which may be set to 2 samples (1 sample for nearest method).
Level 1 Bit Depth Conversion In certain examples, an up-scaling process may also involve a bit depth conversion, e.g. different levels (including levels 0, 1 and 2 described herein) may process data having different bit depths. The bit depths of each level may be configurable, e.g. based on configuration data that may be signalled from the encoder to the decoder. For example, the bit depths for each level, and any required conversion, may depending on the values of the bitstream fields in the global configuration as processed in the examples above. In one case, bit depth conversion is performed as part of the up-scaling process. Bit depth conversion may be applied using bit shifts and the difference between the bit depths of the lower and upper levels in the up-scaling.
When applying block 2712, the sample bit depth for level I may be based on level] depth.flag. If levell depth.flag is equal to 0, the preliminary intermediate picture samples are processed at the same bit depth as they are represented for the decoded base picture. If level] depth flag is equal to 1, the preliminary intermediate picture samples may be converted depending on the value of a variable base depth and a variable enhancement depth. The variable base depth indicates a bit depth for the base layer. In certain examples, if base depth is assigned a value between 8 and 14, e.g. depending on the value of field base depth type as specified in the "Data Block Unit Global Configuration Semantics" section above, then enhancement depth is assigned a value between 8 and 14, depending on the value of field enhancement depth type as specified in the aforementioned semantics section.
If base depth is equal to enhancement depth, no further processing is required.
If enhancement depth is greater than base depth, the array of level 1 up-sampled base samples recLIModifiecItIpsampledBaseSamples may be modified as follows: recL1ModiliedUpsampledBaseSample.s[x][y]= recL1ModifiedUpsampledBaseSamples[x][y] << (enhancement depth 30 base depth) If base depth is greater than enhancement depth, the array recLIModifiedUpsampledBaseSamples may be modified as follows: recL1ModifiedUpsampledBaseSamples[x][y]= recL1ModifiedUpsampledBaseSamples[x][y] >> (base depth enhancement depth) Upscaling from Combined Intermediate Picture to Preliminary Output Picture A similar set of processes may be performed at block 2732. Inputs to this process may comprise: a location (iceurr, yflitz) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; a variable IdvPlanes specifying the colour component of the current block; a variable neurrS specifying the size of the residual block; an (itCurrS)x(tiCurrS) array recLIPictureSamples specifying the combined intermediate picture samples of the current block; variables srcWidth and srcHeight specifying the width and the height of the reconstructed base picture; variables dst Width and dstHeight specifying the with and the height of the resulting upscaled picture; and a variable is8Bit used to select the kernel coefficients for the scaling to be applied. If the samples are 8-bit, then variable 1s8131t may be equal to 0, if the samples are 16-bit, then variable is8Bit may be equal to 1. An output of process 2732 is the (nCurrX)x(nCurrY) array rect2A/lad4fiedrIpsampledSamples of preliminary output picture samples with elements recL2ModifiedUpsampledS'amples[x][y].
The variables iCurrX and nCurrY may be derived based on a scaling mode in a similar manner to the process described above. If scaling mode level2 is equal to 0, no upscaling is performed, and recL2ModifiedUpscunpledSamples[x][y] are set to be equal to recL IP ictureSample.s[x][y]. If scaling mode level2 is equal to 1, then riCurrX = nCurrS << 1, and neurrY = nCurrS. If scaling mode kvel2 is equal to 2, then fiCtinX= nCurrS << 1, and neurrY = nCurrS << I. As described in the section above, the up-scaling performed at block 2732 may also involve the selective application of an upscaling filter, where an up-scaling type is signalled in the bitstream. Depending on the value of the variable upsample type, each kernel type may be configured to recLIPictureSamples as input and producing recL2UpsampledSamples as output. There may be four possible up-scaler kernels (although these may vary in number and type depending on implementation). These are also described in the section titled "Example Up-sampling Approaches". In the present example, if upsample type is equal to 0, the Nearest sample up-scaler described in the "Nearest up-sampling" section above may be selected. If upsample type is equal to 1, the Bilinear up-scaler described in the "Bilinear up-sampling" section above may be selected. If upsample type is equal to 2, a Bicubic up-scaler described in the "Cubic Up-sampling" section above may be selected. If upset/Nile type is equal to 3, a Modified Cubic up-scaler described in the "Cubic Up-sampling" section above may be selected.
The division of the picture into multiple areas may be performed as described in the section above with reference to Figures 9B and 9C.
Following the upscaling, if predicted residual mode flag as described above is equal to 1, a predicted residual (i.e. modified up-sampling) mode as described above and below (see sub-block 2744 in Figure 27) may be invoked with inputs as the (nCurrX)x(nCurrY) array recL2UpsampledSamples and the (nCurrS)x(nCurrS) array recL1PictureSamples specifying the combined intermediate picture samples of the current block. The output of this process may comprise a (nCurrX)x(nCurrY) array recL2ModifiedUpsampledSamples of picture elements. Otherwise, if predicted residual mode flag is equal to 0, a predicted residual mode is not applied, and recL2ModifiedUpsampledSamples[x][y] are set to be equal to recL2UpsampledSamples[x][y] (i.e. set as the up-sampled values without modification).
Level 2 Bit Depth Conversion

Bit depth conversion as described above for level 1 may also (or alternatively) be applied when up-scaling from level 1 to level 2. Again, bit depth conversion may be performed depending on the values of the bitstream fields in the global configuration.
With respect to level 2, the sample bit depth may be derived from the level1 depth flag. If level1 depth flag is equal to 1, the preliminary output picture samples are processed at the same bit depth as they are represented for the preliminary intermediate picture. If level1 depth flag is equal to 0, the output intermediate picture samples are converted depending on the value of the variables base depth and enhancement depth. These may be derived as discussed in the level 1 bit depth conversion section above. Again, if base depth is equal to enhancement depth, no further processing is required. If enhancement depth is greater than base depth, the array recL2ModifiedUpsampledSamples is modified as follows:

recL2ModifiedUpsampledSamples[x][y] = recL2ModifiedUpsampledSamples[x][y] << (enhancement depth - base depth)

If base depth is greater than enhancement depth, the array recL2ModifiedUpsampledSamples is modified as follows:

recL2ModifiedUpsampledSamples[x][y] = recL2ModifiedUpsampledSamples[x][y] >> (base depth - enhancement depth)
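By way of illustration only, the bit depth alignment used at both levels may be sketched in C as follows; the flat sample buffer and the function signature are assumptions made for this sketch.

#include <stdint.h>

/* Shifts samples left when the enhancement bit depth exceeds the base bit
 * depth, and right in the opposite case; no change when the depths match. */
static void align_bit_depth(int32_t *samples, int count,
                            int base_depth, int enhancement_depth)
{
    if (enhancement_depth == base_depth)
        return;                                   /* no conversion required */
    if (enhancement_depth > base_depth) {
        int shift = enhancement_depth - base_depth;
        for (int i = 0; i < count; i++)
            samples[i] <<= shift;                 /* promote to the higher depth */
    } else {
        int shift = base_depth - enhancement_depth;
        for (int i = 0; i < count; i++)
            samples[i] >>= shift;                 /* reduce to the lower depth */
    }
}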
Nearest Sample Upsampler Kernel Description
The sections below set out additional details on the example up-samplers described above.
A first up-sampler is a nearest sample up-sampler as shown in sub-block 2736 and discussed in the "Nearest Up-sampling" section above. The example of sub-block 2736 takes as inputs: variables srcX and srcY specifying the width and the height of the input array; variables dstX and dstY specifying the width and the height of the output array; and a (srcX)x(srcY) array recInputSamples[x][y] of input samples. Outputs to this process are a (dstX)x(dstY) array recUpsampledSamples[x][y] of output samples.
The Nearest kernel performs upscaling by copying the current source sample onto the destination 2x2 grid. This is shown in Figure 9D and described in the accompanying description above. The destination sample positions are calculated by doubling the index of the source sample on both axes and adding +1 to extend the range to cover 4 samples as shown in Figure 9D.
The nearest sample kernel up-scaler may be applied as specified by the following ordered steps whenever the (xCurr, yCurr) block belongs to the picture or to the border area as specified in Figure 9D.
If scaling mode levelX is equal to 1, the computation may be as follows:

for (ySrc = 0; ySrc < nCurrS; ++ySrc)
    yDst = ySrc
    for (xSrc = 0; xSrc < nCurrS; ++xSrc)
        xDst = xSrc << 1
        recUpsampledSamples[xDst][yDst] = recInputSamples[xSrc][ySrc]
        recUpsampledSamples[xDst + 1][yDst] = recInputSamples[xSrc][ySrc]

If scaling mode levelX is equal to 2, the computation may be as follows:

for (ySrc = 0; ySrc < nCurrS; ++ySrc)
    yDst = ySrc << 1
    for (xSrc = 0; xSrc < nCurrS; ++xSrc)
        xDst = xSrc << 1
        recUpsampledSamples[xDst][yDst] = recInputSamples[xSrc][ySrc]
        recUpsampledSamples[xDst][yDst + 1] = recInputSamples[xSrc][ySrc]
        recUpsampledSamples[xDst + 1][yDst] = recInputSamples[xSrc][ySrc]
        recUpsampledSamples[xDst + 1][yDst + 1] = recInputSamples[xSrc][ySrc]
Bilinear Upsampler Kernel Description
A bilinear upsampler kernel process is described in the section titled "Bilinear up-sampling" above. Further examples are now described with reference to sub-block 2738 in Figure 27. The inputs and outputs to sub-block 2738 may be the same as for sub-block 2736. The Bilinear up-sampling kernel consists of three main steps. The first step involves constructing a 2x2 grid of source samples with the base sample positioned at the bottom right corner. The second step involves performing the bilinear interpolation. The third step involves writing the interpolation result to the destination samples. The bilinear method performs the up-sampling by considering the values of the nearest 3 samples to the base sample. The base sample is the source sample from which the address of the destination sample is derived. Figure 9E shows an example source grid used in the kernel.
The bilinear interpolation is a weighted summation of all the samples in the source grid. The weights employed are dependent on the destination sample being derived. The algorithm applies weights which are relative to the position of the source samples with respect to the position of the destination samples. If calculating the value for the top left destination sample, then the top left source sample will receive the largest weighting coefficient while the bottom right sample (diagonally opposite) will receive the smallest weighting coefficient, and the remaining two samples will receive an intermediate weighting coefficient. This is visualized in Figure 9F and described in detail above.
An example bilinear kernel up-scaler is illustrated in Figure 9G. It may be applied as specified by the following ordered steps below when (xCurr,yCurr) block does not belong to the border area as specified in Figures 9B and 9C.
If scaling mode levelX is equal to 1, the following up-scaling computation may be performed:

for (ySrc = 0; ySrc < nCurrS + 1; ++ySrc)
    for (xSrc = 0; xSrc < nCurrS + 1; ++xSrc)
        xDst = (xSrc << 1) - 1
        bilinear1D(recInputSamples[xSrc - 1][ySrc], recInputSamples[xSrc][ySrc],
            recUpsampledSamples[xDst][ySrc], recUpsampledSamples[xDst + 1][ySrc])

If scaling mode levelX is equal to 2, the following up-scaling computation may be performed:

for (ySrc = 0; ySrc < nCurrS + 1; ++ySrc)
    yDst = (ySrc << 1) - 1
    for (xSrc = 0; xSrc < nCurrS + 1; ++xSrc)
        xDst = (xSrc << 1) - 1
        bilinear2D(recInputSamples[xSrc - 1][ySrc - 1], recInputSamples[xSrc][ySrc - 1],
            recInputSamples[xSrc - 1][ySrc], recInputSamples[xSrc][ySrc],
            recUpsampledSamples[xDst][yDst], recUpsampledSamples[xDst + 1][yDst],
            recUpsampledSamples[xDst][yDst + 1], recUpsampledSamples[xDst + 1][yDst + 1])

The bilinear kernel up-scaler is applied as specified by the following ordered steps below when the (xCurr, yCurr) block belongs to the border area as specified in Figures 9B and 9C.
If scaling mode levelX is equal to 1:

for (ySrc = 0; ySrc < nCurrS + 1; ++ySrc)
    for (xSrc = 0; xSrc < nCurrS + 1; ++xSrc)
        xDst = (xSrc << 1) - 1
        xSrc0 = Max(xSrc - 1, 0); xSrc1 = Min(xSrc, srcWidth - 1)
        bilinear1D(recInputSamples[xSrc0][ySrc], recInputSamples[xSrc1][ySrc], dst00, dst10)
        if (xDst >= 0)
            recUpsampledSamples[xDst][ySrc] = dst00
        if (xDst < (dstWidth - 1))
            recUpsampledSamples[xDst + 1][ySrc] = dst10

If scaling mode levelX is equal to 2:

for (ySrc = 0; ySrc < nCurrS + 1; ++ySrc)
    yDst = (ySrc << 1) - 1
    ySrc0 = Max(ySrc - 1, 0); ySrc1 = Min(ySrc, srcHeight - 1)
    for (xSrc = 0; xSrc < nCurrS + 1; ++xSrc)
        xDst = (xSrc << 1) - 1
        xSrc0 = Max(xSrc - 1, 0); xSrc1 = Min(xSrc, srcWidth - 1)
        bilinear2D(recInputSamples[xSrc0][ySrc0], recInputSamples[xSrc1][ySrc0],
            recInputSamples[xSrc0][ySrc1], recInputSamples[xSrc1][ySrc1],
            dst00, dst10, dst01, dst11)

The function bilinear1D(in00, in10, out00, out10) as set out above may be applied as set out below:

in00x3 = in00 * 3
in10x3 = in10 * 3
out00 = ((in00x3 + in10 + 2) >> 2)
out10 = ((in00 + in10x3 + 2) >> 2)

The function bilinear2D(in00, in10, in01, in11, out00, out10, out01, out11) as set out above may be applied as set out below:

in00x3 = in00 * 3
in10x3 = in10 * 3
in01x3 = in01 * 3
in11x3 = in11 * 3
in00x9 = in00x3 * 3
in10x9 = in10x3 * 3
in01x9 = in01x3 * 3
in11x9 = in11x3 * 3
out00 = ((in00x9 + in10x3 + in01x3 + in11 + 8) >> 4)
out10 = ((in00x3 + in10x9 + in01 + in11x3 + 8) >> 4)
out01 = ((in00x3 + in10 + in01x9 + in11x3 + 8) >> 4)
out11 = ((in00 + in10x3 + in01x3 + in11x9 + 8) >> 4)
Cubic Upsampler Kernel Description
The cubic up-sampler kernel process that is shown in sub-block 2740 may be applied as set out in this section. The inputs and outputs are the same as those described in the sections above. Further reference is made to Figures 9H, 9I and 9J and the section titled "Cubic Up-sampling".
The cubic up-sampling kernel of sub-block 2740 may be divided into three main steps. The first step involves constructing a 4x4 grid of source samples with the base sample positioned at the local index (2, 2). The second step involves performing a bicubic interpolation. The third step involves writing the interpolation result to the destination samples.
The cubic up-sampling kernel may be performed by using a 4x4 source grid which is subsequently multiplied by a 4x4 kernel. During the generation of the source grid, any samples which fall outside the frame limits of the source frame are replaced with the value of the source samples at the boundary of the frame. This is visualized in Figures 9H and 9I.
The kernels used for the cubic up-sampling process typically have a 4x4 coefficient grid. However, the relative position of the destination sample with regards to the source sample will yield a different coefficient set, and since the up-sampling is a factor of two, there will be 4 sets of 4x4 kernels used in the up-sampling process. These sets are represented by a 4-dimensional grid of coefficients (2 x 2 x 4 x 4). The bicubic coefficients are calculated from a fixed set of parameters: a core parameter (or bicubic parameter) and four spline creation parameters. These may have values of, for example, -0.6 and [1.25, 0.25, -0.75, -1.75] respectively. The implementation of the filter uses fixed point computations.
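It may be noted that each row of the example coefficient tables given below sums to 16384 (i.e. 1 << 14), matching the right shift of 14 applied after convolution. A hypothetical helper that converts normalised floating-point 4-tap weights into such 14-bit fixed-point integers is sketched below; the exact rounding used to derive the listed coefficients is an assumption made for this sketch.

#include <math.h>
#include <stdint.h>

/* Converts a normalised 4-tap kernel (weights summing to 1.0) into 14-bit
 * fixed-point integers, which the convolution later accumulates and shifts
 * right by 14. Rounding behaviour here is assumed, not normative. */
static void to_fixed_point_14(const double weights[4], int32_t coeffs[4])
{
    for (int i = 0; i < 4; i++)
        coeffs[i] = (int32_t)lround(weights[i] * (1 << 14));
}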
The cubic kernel up-scaler is shown in Figure 9J and described in more detail in the "Cubic up-sampling" section above. The up-scaler is applied in one direction (vertical or horizontal) at a time and follows different steps if the (xCurr, yCurr) block belongs to the border as specified in Figures 9B and 9C. Given a set of example coefficients as follows:

kernel[y][x] = { { -1382, 14285, 3942, -461 },
                 { -461, 3942, 14285, -1382 },
                 { -1280, 14208, 3840, -384 },
                 { -384, 3840, 14208, -1280 } }

where y = 0...1 are coefficients to be used with 10-bit samples and y = 2...3 are to be used with 8-bit samples, the up-scaler may thus be applied according to the following pseudo-code, where:
kernelOffset is equal to 4.
kernelSize is equal to 4.
if (Horizontal) {
    for (y = 0; y < nCurrS; y++)
        for (xSrc = 0; xSrc < nCurrS + 1; xSrc++)
            ConvolveHorizontal(recInputSamples, recUpsampledSamples, xSrc, y, kernel[is8Bit * 2])
} else if (Vertical) {
    dstHeightM1 = dstHeight - 1
    for (ySrc = 0; ySrc < nCurrS + 1; ySrc++)
        yDst = (ySrc << 1) - 1
        if (border) {
            yDst0 = ((yDst > 0) && (yDst < dstHeight)) ? yDst : -1
            yDst1 = ((yDst + 1) < dstHeightM1) ? yDst + 1 : -1
        } else {
            yDst0 = yDst
            yDst1 = yDst + 1
        }
        for (x = 0; x < nCurrS; x++)
            ConvolveVertical(recInputSamples, recUpsampledSamples, yDst0, yDst1, x, ySrc, kernel[is8Bit * 2])
}

The function ConvolveHorizontal(input, output, x, y, kernel, border) as referenced above may be applied as set out below:

xDst = (x << 1) - 1
if (border) {
    dstWidthM1 = dstWidth - 1
    if (xDst >= 0 && xDst < dstWidth)
        output[xDst][y] = ConvolveHorizontal(kernel[0], input[x + kernelOffset][y], 14)
    if (xDst < dstWidthM1)
        output[xDst + 1][y] = ConvolveHorizontal(kernel[1], input[x + kernelOffset][y], 14)
} else {
    output[xDst][y] = ConvolveHorizontal(kernel[0], input[x + kernelOffset][y], 14)
    output[xDst + 1][y] = ConvolveHorizontal(kernel[1], input[x + kernelOffset][y], 14)
}

The function ConvolveVertical(input, output, yDst0, yDst1, x, ySrc, kernel) as referenced above may be applied as set out below:

if (border) {
    dstWidthM1 = dstWidth - 1
    if (yDst0 >= 0)
        output[x][yDst0] = ConvolveHorizontal(kernel[0], input[x][ySrc + kernelOffset], 14)
    if (yDst1 >= 0)
        output[x][yDst1] = ConvolveHorizontal(kernel[1], input[x][ySrc + kernelOffset], 14)
} else {
    output[x][yDst0] = ConvolveHorizontal(kernel[0], input[x][ySrc + kernelOffset], 14)
    output[x][yDst1] = ConvolveHorizontal(kernel[1], input[x][ySrc + kernelOffset], 14)
}

The function ConvolveHorizontal(kernel, input, shift) as referenced above may be applied as set out below:

accumulator = 0
for (int32_t x = 0; x < kernelSize; x++)
    accumulator += input[x] * kernel[x]
offset = 1 << (shift - 1)
output = ((accumulator + offset) >> shift)
Modified Cubic Upsampler Kernel Description
Lastly in this section, a short description of an example implementation of sub-block 2742 is presented. The inputs and outputs may be defined as for the other up-sampling processes above. The implementation of the modified cubic filter again uses fixed point computations. It may be seen as a variation of the cubic up-sampler kernel described above, but with the following kernel coefficients:

kernel[y][x] = { { -2360, 15855, 4165, -1276 },
                 { -1276, 4165, 15855, -2360 } }

where y = 0...1 are coefficients to be used with 10-bit samples and y = 2...3 are to be used with 8-bit samples, the kernelOffset is equal to 4, and the kernelSize is equal to 4.
It should be noted the kernels provided herein are for example only and other implementations may use different kernels.
Predicted Residual Process Description
The following section will briefly provide an example implementation for the predicted residual process shown in sub-block 2744 of Figure 27. It may also be applied as part of the up-scaling of block 2712 in other examples. Inputs to this process are shown as: variables srcX and srcY specifying the width and the height of the lower resolution array; variables dstX and dstY specifying the width and the height of the upsampled arrays; a (srcX)x(srcY) array recLowerResSamples[x][y] of samples that were provided as input to the upscaling process; and a (dstX)x(dstY) array recUpsampledSamples[x][y] of samples that were the output of the up-scaling process. The outputs to this process are a (dstX)x(dstY) array recModifiedUpsampledSamples[x][y] of output samples.
In the present example, the predicted residual process modifies recUpsampledSamples using a 2x2 grid if scaling mode levelX is equal to 2 (i.e. is two-dimensional) and using a 2x1 grid if scaling mode levelX is equal to 1 (i.e. is one-dimensional). The predicted residual process is not applied if scaling mode levelX is equal to 0 (e.g. as no up-scaling is performed).
The predicted residual process may be applied as specified by the following ordered steps whenever the (xCurr, yCurr) block belongs to the picture or to the border area as specified in Figures 9B and 9C. If scaling mode levelX is equal to 1 (i.e. scaling is one-dimensional), the following computation may be performed:

for (ySrc = 0; ySrc < srcY; ySrc++)
    yDst = ySrc
    for (xSrc = 0; xSrc < srcX; xSrc++)
        xDst = xSrc << 1
        modifier = recLowerResSamples[xSrc][ySrc] - ((recUpsampledSamples[xDst][yDst] +
            recUpsampledSamples[xDst + 1][yDst]) >> 1)
        recModifiedUpsampledSamples[xDst][yDst] = recUpsampledSamples[xDst][yDst] + modifier
        recModifiedUpsampledSamples[xDst + 1][yDst] = recUpsampledSamples[xDst + 1][yDst] + modifier

If scaling mode levelX is equal to 2 (i.e. scaling is two-dimensional), the following computation may be performed:

for (ySrc = 0; ySrc < srcY; ySrc++)
    yDst = ySrc << 1
    for (xSrc = 0; xSrc < srcX; xSrc++)
        xDst = xSrc << 1
        modifier = recLowerResSamples[xSrc][ySrc] - ((recUpsampledSamples[xDst][yDst] +
            recUpsampledSamples[xDst + 1][yDst] + recUpsampledSamples[xDst][yDst + 1] +
            recUpsampledSamples[xDst + 1][yDst + 1]) >> 2)
        recModifiedUpsampledSamples[xDst][yDst] = recUpsampledSamples[xDst][yDst] + modifier
        recModifiedUpsampledSamples[xDst][yDst + 1] = recUpsampledSamples[xDst][yDst + 1] + modifier
        recModifiedUpsampledSamples[xDst + 1][yDst] = recUpsampledSamples[xDst + 1][yDst] + modifier
        recModifiedUpsampledSamples[xDst + 1][yDst + 1] = recUpsampledSamples[xDst + 1][yDst + 1] + modifier

Transform Inputs and Outputs, Transform Types, and Residual Samples Derivation

Decoding processes for the transform are shown as sub-blocks 2718 and 2756 in Figure 27. These processes may perform an inverse transform at the decoder. Inputs to these processes may be: a location (xTbP, yTbP) specifying the top-left sample of the current luma or chroma transform block relative to the top-left luma or chroma sample of the current picture (as before, P can be related to either luma or chroma plane depending to which plane the transform coefficients belong); a variable nTbS specifying the size of the current transform block (e.g. derived as above); and a (nTbS)x(nTbS) array d of dequantized transform coefficients with elements d[x][y]. An output of this process is the (nTbS)x(nTbS) array R of residuals with elements R[x][y].
In the examples described herein, there are two types of transforms that can be used in the encoding process. These need not be limiting, and other transforms may be used. The two transforms described herein both leverage small kernels which are applied directly to the residuals that remain after the stage of applying Predicted Residuals (e.g. as per the predicted average computations described above). Residuals may be similar to those shown in Figure 6A.
The (nTbS)x(nTbS) array R of residual samples may be derived in one of two ways. For the first transform (referred to herein as 2x2 or directional decomposition - DD), each (vertical) column of dequantized transform coefficients d[x][y] with x = 0...nTbS - 1, y = 0...nTbS - 1 may be transformed to R[x][y] with x = 0...nTbS - 1, y = 0...nTbS - 1 by invoking the two-dimensional transformation process for the first transform described herein if nTbS is equal to 2. For the second transform (referred to herein as 4x4 or directional decomposition squared - DDS), each (vertical) column of dequantized transform coefficients d[x][y] with x = 0...nTbS - 1, y = 0...nTbS - 1 is transformed to R[x][y] with x = 0...nTbS - 1, y = 0...nTbS - 1 by invoking the two-dimensional transformation process for the second transform if nTbS is equal to 4.
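By way of illustration only, the application of the 2x2 inverse directional decomposition to a single block of dequantized coefficients may be sketched in C as follows; the kernel values are those given for scaling modes 0 and 2 in the following section, while the ordering of the coefficient vector and the omission of any normalisation or shift applied elsewhere in the pipeline are assumptions made for this sketch.

#include <stdint.h>

/* Multiplies the vector of dequantized coefficients {C00, C01, C10, C11}
 * by the 2x2 DD inverse kernel to obtain residuals {R00, R01, R10, R11}. */
static void inverse_dd_2x2(const int32_t d[2][2], int32_t R[2][2])
{
    static const int32_t M[4][4] = {
        { 1,  1,  1,  1 },
        { 1, -1,  1, -1 },
        { 1,  1, -1, -1 },
        { 1, -1, -1,  1 },
    };
    const int32_t c[4] = { d[0][0], d[0][1], d[1][0], d[1][1] };  /* assumed ordering */
    int32_t r[4];
    for (int i = 0; i < 4; i++) {
        r[i] = 0;
        for (int j = 0; j < 4; j++)
            r[i] += M[i][j] * c[j];
    }
    R[0][0] = r[0]; R[0][1] = r[1]; R[1][0] = r[2]; R[1][1] = r[3];
}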
2x2 Directional Decomposition Transform

The first transform (2x2 or DD) will now be briefly described.
If nTbS is equal to 2, the transform has a 2x2 kernel which is applied to each 2x2 block of transform coefficients. The resulting residuals are derived as set out below.
If scaling mode levelX for the corresponding enhancement sub-layer is equal to 0 or 2, the inverse transformation is performed according to the following matrix multiplication:

{R00}   { 1,  1,  1,  1 }   {C00}
{R01} = { 1, -1,  1, -1 } * {C01}
{R10}   { 1,  1, -1, -1 }   {C10}
{R11}   { 1, -1, -1,  1 }   {C11}

If scaling mode levelX for the corresponding enhancement sub-layer is equal to 1 (i.e. scaling is in one direction), the inverse transformation is performed according to the following matrix multiplication:

{R00}   { 1,  1,  1,  0 }   {C00}
{R01} = { 1, -1, -1,  0 } * {C01}
{R10}   { 0,  1, -1,  1 }   {C10}
{R11}   { 0, -1, -1,  1 }   {C11}

4x4 Directional Decomposition Transform

The second transform (4x4 or DDS) will now be briefly described.
If nTbS is equal to 4, the transform has a 4x4 kernel which is applied to a 4x4 block of transform coefficients. The resulting residuals are derived as set out below.
If scaling mode levelX for the corresponding enhancement sub-layer is equal to 0 or 2, the inverse transformation is performed according to the following matrix multiplication:

{R00}   { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }   {C00}
{R01}   { 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1 }   {C01}
{R02}   { 1,-1, 1,-1, 1,-1, 1,-1, 1,-1, 1,-1, 1,-1, 1,-1 }   {C02}
{R03}   { 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1, 1, 1,-1,-1, 1 }   {C03}
{R10}   { 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1,-1,-1,-1 }   {C10}
{R11}   { 1, 1,-1,-1, 1, 1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1 }   {C11}
{R12}   { 1,-1, 1,-1, 1,-1, 1,-1,-1, 1,-1, 1,-1, 1,-1, 1 }   {C12}
{R13} = { 1,-1,-1, 1, 1,-1,-1, 1,-1, 1, 1,-1,-1, 1, 1,-1 } * {C13}
{R20}   { 1, 1, 1, 1,-1,-1,-1,-1, 1, 1, 1, 1,-1,-1,-1,-1 }   {C20}
{R21}   { 1, 1,-1,-1,-1,-1, 1, 1, 1, 1,-1,-1,-1,-1, 1, 1 }   {C21}
{R22}   { 1,-1, 1,-1,-1, 1,-1, 1, 1,-1, 1,-1,-1, 1,-1, 1 }   {C22}
{R23}   { 1,-1,-1, 1,-1, 1, 1,-1, 1,-1,-1, 1,-1, 1, 1,-1 }   {C23}
{R30}   { 1, 1, 1, 1,-1,-1,-1,-1,-1,-1,-1,-1, 1, 1, 1, 1 }   {C30}
{R31}   { 1, 1,-1,-1,-1,-1, 1, 1,-1,-1, 1, 1, 1, 1,-1,-1 }   {C31}
{R32}   { 1,-1, 1,-1,-1, 1,-1, 1,-1, 1,-1, 1, 1,-1, 1,-1 }   {C32}
{R33}   { 1,-1,-1, 1,-1, 1, 1,-1,-1, 1, 1,-1, 1,-1,-1, 1 }   {C33}

If scaling mode levelX for the corresponding enhancement sub-layer is equal to 1, the inverse transformation is performed according to the following matrix multiplication:

{R00}   { 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1 }   {C00}
{R01}   { 1, 1,-1,-1, 1, 1,-1,-1, 0, 0,-1,-1, 0, 0,-1,-1 }   {C01}
{R02}   { 1,-1, 1,-1, 1,-1, 1,-1, 0, 0, 1,-1, 0, 0, 1,-1 }   {C02}
{R03}   { 1,-1,-1, 1, 1,-1,-1, 1, 0, 0,-1, 1, 0, 0,-1, 1 }   {C03}
{R10}   { 0, 0, 1, 1, 0, 0, 1, 1, 1, 1,-1,-1, 1, 1,-1,-1 }   {C10}
{R11}   { 0, 0,-1,-1, 0, 0,-1,-1, 1, 1, 1, 1, 1, 1, 1, 1 }   {C11}
{R12}   { 0, 0, 1,-1, 0, 0, 1,-1, 1,-1,-1, 1, 1,-1,-1, 1 }   {C12}
{R13} = { 0, 0,-1, 1, 0, 0,-1, 1, 1,-1, 1,-1, 1,-1, 1,-1 } * {C13}
{R20}   { 1, 1, 1, 1,-1,-1,-1,-1, 0, 0, 1, 1, 0, 0,-1,-1 }   {C20}
{R21}   { 1, 1,-1,-1,-1,-1, 1, 1, 0, 0,-1,-1, 0, 0, 1, 1 }   {C21}
{R22}   { 1,-1, 1,-1,-1, 1,-1, 1, 0, 0, 1,-1, 0, 0,-1, 1 }   {C22}
{R23}   { 1,-1,-1, 1,-1, 1, 1,-1, 0, 0,-1, 1, 0, 0, 1,-1 }   {C23}
{R30}   { 0, 0, 1, 1, 0, 0,-1,-1, 1, 1,-1,-1,-1,-1, 1, 1 }   {C30}
{R31}   { 0, 0,-1,-1, 0, 0, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1 }   {C31}
{R32}   { 0, 0, 1,-1, 0, 0,-1, 1, 1,-1,-1, 1,-1, 1, 1,-1 }   {C32}
{R33}   { 0, 0,-1, 1, 0, 0, 1,-1, 1,-1, 1,-1,-1, 1,-1, 1 }   {C33}

Decoding process for the residual reconstruction

Blocks 2730 and 2758 in Figure 27 show a process for reconstructing residuals.
This process involves applying residual data derived from the enhancement decoding to various pictures to enhance those pictures. This may be performed at multiple levels, which may reflect multiple levels of scaling in multiple dimensions (e.g. as configured via transmitted scaling factors). Blocks 2730 and 2758, as with other operations shown in Figure 27, are shown as applied to blocks or coding units of data (e.g. 2x2 or 4x4 blocks of residuals and pixel elements). The operations may then be applied across all the blocks or coding units that make up a complete picture or frame. As the blocks or coding units do not depend on other blocks or coding units, these operations may be parallelised for the blocks or coding units, e.g. on Graphical Processing Units (GPUs) and/or multi-core processors, including those present in mobile devices such as tablets and smartphones.
Turning now to a process to implement one or more of blocks 2730 and 2758 in Figure 27, the reconstructed residual of each block or coding unit may be derived as set out here. First, the variable nCbSL may be set equal to 2 if transform type is equal to 0, or 4 if transform type is equal to 1 (e.g. reflecting the type of transform - 2x2 or 4x4 in the described examples). The variable nCbSC is then set equal to nCbSL >> 1.
If IdxPlanes is equal to 0, the residual reconstruction process for a colour component as specified below is invoked with the luma coding block location (xCb, yCb), the variable nCurrS set equal to nCbSL, and the variable IdxPlanes set equal to 0 as input. This corresponds to processing for a luma plane.
If IdxPlanes is equal to 1, the residual reconstruction process for a colour component as specified below is invoked with the chroma coding block location (xCb >> ShiftWidthC, yCb >> ShiftHeightC), the variable nCurrS set equal to nCbSC, and the variable IdxPlanes set equal to 1 as inputs. This corresponds to processing for chroma planes, where the chroma samples may be arranged with respect to the luma samples as shown in Figures 7A to 7C (and described with reference to the parameters ShiftWidthC and ShiftHeightC above).
A residual reconstruction for a level 1 block, e.g. as shown as block 2730 in Figure 27 (and relating to, say, the operation applied by summation component 220 in Figure 2), will now be described. Inputs to this process may comprise: a location (xCurr, yCurr) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; a variable IdxPlanes specifying the colour component of the current block; a variable nCurrS specifying the size of the residual block; an (nCurrS)x(nCurrS) array recL1BaseSamples specifying the preliminary intermediate picture reconstructed samples of the current block (e.g., relating to 2304 in Figure 23); and an (nCurrS)x(nCurrS) array resL1FilteredResiduals specifying the level 1 filtered residuals of the current block (e.g., relating to 2308 in Figure 23). The output of this process is the combined intermediate picture (nCurrS)x(nCurrS) array recL1Samples with elements recL1Samples[x][y] (e.g., relating to 2310 in Figure 23).
The (nCurrS)x(nCurrS) block of the reconstructed sample array recL1Samples at location (xCurr, yCurr) may be derived as follows:

recL1Samples[xCurr + i][yCurr + j] = recL1BaseSamples[i][j] + resL1FilteredResiduals[i][j]

with i = 0...nCurrS - 1, j = 0...nCurrS - 1. As can be seen this may be performed block by block or for the complete plane (as the residuals and reconstructed base decoded picture are added elementwise). In the above, the location (xCurr, yCurr) simply provides the two-dimensional offset for a current block or coding unit with respect to an enhanced level 1 output picture.
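By way of illustration only, the elementwise addition above may be sketched in C as follows; the flat buffer layout, stride handling and function signature are assumptions made for this sketch.

#include <stdint.h>

/* Adds a block of filtered level 1 residuals to the corresponding block of
 * preliminary intermediate base samples, writing into the combined
 * intermediate picture at the block offset (xCurr, yCurr). */
static void reconstruct_l1_block(int32_t *recL1Samples, int stride,
                                 int xCurr, int yCurr, int nCurrS,
                                 const int32_t *baseBlock, const int32_t *residualBlock)
{
    for (int j = 0; j < nCurrS; j++)
        for (int i = 0; i < nCurrS; i++)
            recL1Samples[(yCurr + j) * stride + (xCurr + i)] =
                baseBlock[j * nCurrS + i] + residualBlock[j * nCurrS + i];
}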
Following reconstruction at block 2730, the upscaling process for a colour component as specified above (and shown as block 2732) may be invoked with inputs: the location (xCurr, yCurr); the transform block size nTbS; the (nCurrS)x(nCurrS) array recL1Samples; the variables srcWidth and srcHeight specifying the size of the reconstructed base picture; the variables dstWidth and dstHeight specifying the width and the height of the upscaled resulting picture; and the variable is8Bit (e.g. the latter equal to 1 if enhancement depth type is equal to 0).
A residual reconstruction for a level 2 block, e.g. as shown as block 2758 in Figure 27 (and relating to, say, the operation applied by summation component 258 in Figure 2) will now be described. Inputs to this process are similar to those set out above with relation to level 1 reconstruction. For example, inputs may comprise: a location (xCurr, yCurr) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; a variable IdxPlanes specifying the colour component of the current block; an (nCurrS)x(nCurrS) array recL2ModifiedUpscaledSamples specifying the preliminary output picture samples of the current block (e.g. relating to 2312 in Figure 23); and an (nCurrS)x(nCurrS) array resL2Residuals specifying the level 2 residuals of the current block (e.g. relating to 2316 in Figure 23). An output of this process is the (nCurrS)x(nCurrS) array recL2PictureSamples of combined output picture samples with elements recL2PictureSamples[x][y] (e.g. relating to 2322 in Figure 23).
The (nCurrS)x(nCurrS) block of the reconstructed sample array recL2PictureSamples at location (xCurr, yCurr) may be computed as follows:

recL2PictureSamples[xCurr + i][yCurr + j] = recL2ModifiedUpscaledSamples[i][j] + resL2Residuals[i][j]

with i = 0...nCurrS - 1, j = 0...nCurrS - 1. If dithering type as described in at least the "Semantics" section above is not equal to 0, a dithering process as shown by block 2760 is invoked with the location (xCurr, yCurr) and the (nCurrS)x(nCurrS) array recL2PictureSamples. This may then output a final array recL2DitheredPictureSamples that is used, together with the other coding units making up the picture, to output a reconstructed picture of the video at block 2762.
Decoding process for the L-1 filter

As set out in other examples, a filter may be applied to the decoded level 1 residuals. A similar filter may also be deployed at the encoder (e.g. the simulated level 1 decoding path). This filter may be described as an "in-loop" filter, as it is applied as the processing loops around different coding units. In Figure 27, a level 1 residual filter is applied on the level 1 residual surface block at method block 2720 before the residuals are added to the base reconstructed picture (e.g. at method block 2730). In certain cases, the level 1 filter may be selectively applied based on the transform type. In certain cases, it may only be applied when a 4x4 transform is used, i.e. the filter may be configured to operate only if the variable transform type is equal to 1. In other examples, block 2720 may also be applied when a 2x2 transform is used with a different kernel, or after reformatting of the samples into a different sized block.
The level 1 filter of block 2720 may operate on each 4x4 block of transformed residuals by applying a mask whose weights are structured as follows (and is also set out with reference to the description of processing components above):

{ α, β, β, α }
{ β, 1, 1, β }
{ β, 1, 1, β }
{ α, β, β, α }

Turning to block 2720 of Figure 27, the inputs to this process are: a sample location (xTb0, yTb0) specifying the top-left sample of the current transform block relative to the top-left sample of the current picture; and an array resL1Residuals of a size 4x4 specifying residuals for enhancement sub-layer 1. The output to this process is a 4x4 array of filtered residuals resL1FilteredResiduals with elements resL1FilteredResiduals[x][y]. An in-loop level 1 residual filter may be applied by first obtaining the variables deblockEnabled, α and β as follows:

deblockEnabled = level1 filtering enabled flag
if (level1 filtering signalled flag)
    α = 16 - level1 filtering first coefficient
    β = 16 - level1 filtering second coefficient
else
    α = 16
    β = 16

If deblockEnabled is true, the following steps are applied given the residual representation in Figure 6A:

resL1FilteredResiduals[0][0] = (resL1Residuals[0][0] * α) >> 4
resL1FilteredResiduals[0][3] = (resL1Residuals[0][3] * α) >> 4
resL1FilteredResiduals[3][0] = (resL1Residuals[3][0] * α) >> 4
resL1FilteredResiduals[3][3] = (resL1Residuals[3][3] * α) >> 4
resL1FilteredResiduals[0][1] = (resL1Residuals[0][1] * β) >> 4
resL1FilteredResiduals[0][2] = (resL1Residuals[0][2] * β) >> 4
resL1FilteredResiduals[1][0] = (resL1Residuals[1][0] * β) >> 4
resL1FilteredResiduals[2][0] = (resL1Residuals[2][0] * β) >> 4
resL1FilteredResiduals[1][3] = (resL1Residuals[1][3] * β) >> 4
resL1FilteredResiduals[2][3] = (resL1Residuals[2][3] * β) >> 4
resL1FilteredResiduals[3][1] = (resL1Residuals[3][1] * β) >> 4
resL1FilteredResiduals[3][2] = (resL1Residuals[3][2] * β) >> 4
resL1FilteredResiduals[1][1] = resL1Residuals[1][1]
resL1FilteredResiduals[1][2] = resL1Residuals[1][2]
resL1FilteredResiduals[2][1] = resL1Residuals[2][1]
resL1FilteredResiduals[2][2] = resL1Residuals[2][2]

If deblockEnabled is false, the resL1FilteredResiduals are simply set to equal the resL1Residuals (e.g. the filter is applied as a pass-through filter with no modification).
Decoding process for base decoder data extraction

A brief overview of block 2710 will now be described. However, it should be noted that as per other examples, the base decoding may be considered a separate stand-alone process that may be performed by third-party components. In certain cases, one or more of hardware and software base decoders may be instructed to decode a received base stream under the control of the enhancement decoder (i.e. that implements the residual decoding processing shown in Figure 27). In other cases, the enhancement decoder may be configured to receive base decoded pictures at a particular rate, that are then synchronised with the residual decoding of the enhancement sub-layers. The process described below may be considered as a wrapper for a base decoding process that does not actually implement the base decoding process, but that instead reads data output by a separate process, e.g. from a decoded base picture buffer.
In the example of Figure 27, the decoding process for base decoder data extraction may receive the following inputs: a location (xCurr, yCurr) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; a variable IdxBaseFrame specifying the base decoder picture buffer frame from which to read the samples; and a variable IdxPlanes specifying the colour component of the current block. An output of this process is shown as being a (nCurrX)x(nCurrY) array recDecodedBaseSamples of picture samples with elements recDecodedBaseSamples[x][y]. The process 2710 may read a block of samples of size (nCurrS)x(nCurrS) from the location (xCurr, yCurr) and the frame pointed by the variable IdxBaseFrame. The blocks may be read in raster order.
When luma and chroma planes are present, e.g. arranged as shown in Figures 7A to 7C, the sample block size variables nCurrX and nCurrY may be derived as follows:

nCurrX = (IdxPlanes == 0) ? nCurrX : nCurrX >> ShiftWidthC
nCurrY = (IdxPlanes == 0) ? nCurrY : nCurrY >> ShiftHeightC

Decoding process for dither filter

As mentioned above in the section on residual reconstruction, a dither filter may be applied to the output of the level 2 reconstruction. The dither filter may be applied to improve an image quality of the output picture, e.g. by hiding artefacts that are generated by the application of quantization. In Figure 27, the dither process is shown as block 2760.
In the example of Figure 27, the inputs to the dither process 2760 comprise: a location (xCurr, yCurr) specifying the top-left sample of the current block relative to the top-left sample of the current picture component; and an (nCurrS)x(nCurrS) array recL2PictureSamples specifying the reconstructed combined output picture samples (e.g. relating to 2322 in Figure 23). The output of process 2760 is shown as a (nCurrS)x(nCurrS) array recL2DitheredPictureSamples of residuals with elements recL2DitheredPictureSamples[x][y].
Different forms and variations of known dithering approaches may be applied. The type of dithering may be signalled using the variable dithering type (as described above).
For example, if dithering type is equal to 1 (e.g. a uniform dither), the (nCurrS)x(nCurrS) block of the reconstructed sample array recL2DitheredPictureSamples at location (xCurr, yCurr) may be derived as follows:

recL2DitheredPictureSamples[xCurr + i][yCurr + j] = recL2PictureSamples[i][j] + rand(i, j)

with i = 0...nCurrS - 1, j = 0...nCurrS - 1. The function rand(i, j) is a pseudorandom number, e.g. as generated with a known pseudo or true random number generator. The function rand(i, j) may be configured to output a value within a predefined range. This predefined range may be signalled. In the present example, the predefined range is set using the variable dithering strength as described in the "Syntax" and "Semantics" sections above, where the defined range may be set as [-dithering strength, +dithering strength].
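By way of illustration only, a uniform dither of this kind may be sketched in C as follows; the use of rand() from the C standard library stands in for whichever pseudo-random generator an implementation selects, and no clipping to the valid sample range is shown.

#include <stdint.h>
#include <stdlib.h>

/* Adds a uniform pseudo-random offset in [-strength, +strength] to each
 * reconstructed sample of a block, as for dithering type equal to 1. */
static void apply_uniform_dither(int32_t *samples, int count, int strength)
{
    for (int i = 0; i < count; i++) {
        int offset = (rand() % (2 * strength + 1)) - strength;  /* in [-strength, strength] */
        samples[i] += offset;
    }
}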
Parsing Process for Entropy Encoded Transform Coefficients

This section describes an entropy decoding process that may be applied to entropy-encoded transform coefficients. Inputs to this process may comprise the bits belonging to chunks of data containing the entropy encoded transform coefficients derived from the picture enhancement decoding process shown as block 2706. The process described herein may also be used to implement the entropy decoding components of the previous decoder examples.
As set out above, it should be noted that references to Huffman encoding and decoding as described herein should be treated as also referring to prefix coding. Prefix codes are also known as prefix-free codes, comma-free codes, prefix condition codes and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. Hence, "Huffman code" is used herein as a synonym for the more general "prefix code".
In more detail, and with reference to block 2706 of Figure 27, different processing operations may take place depending on whether the data is tiled. If the data is not tiled, e.g. as indicated by a tile dimensions type equal to 0, for each chunk the following information is provided: a variable surfaces[planeIdx][levelIdx][layerIdx].rle only flag specifying if a prefix coding decoder is needed (or if the data is to be decoded without prefix coding as only run-length encoded data); a variable surfaces[planeIdx][levelIdx][layerIdx].size specifying the size of the chunk of data (e.g. wherein chunks correspond to the data portions shown in Figure 9A); and a variable surfaces[planeIdx][levelIdx][layerIdx].data specifying the beginning of the chunk. The variables planeIdx, levelIdx and layerIdx may be used as described above to indicate the plane, enhancement sub-layer and coefficient group to which the chunk belongs. Outputs of this process are entropy-decoded quantized transform coefficients to be used as input for the decoding processes shown as blocks 2714 and 2746 in Figure 27, which are described above.
If tiled data is enabled (e.g. as shown in Figure 21A), then tile dimensions type may be set to a non-zero value. In this case, the following information is provided for each chunk: a variable surfaces[planeIdx][levelIdx][layerIdx].tiles pointing to the tiles of the decoded picture; and a variable surfaces[planeIdx][levelIdx][layerIdx].rle only flag specifying if the prefix coding decoder is needed for all tiles. In this case, a chunk of data is further split to smaller chunks of data, which are termed as tiles. For each tile the following information is provided: a variable surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].size specifying the size of the chunk of tile data; and a variable surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].data specifying the beginning of the chunk. The indexes planeIdx, levelIdx, layerIdx and tileIdx indicate the plane, enhancement sub-layer, coefficient group and tile to which the chunk belongs. Outputs of this process are again entropy-decoded quantized transform coefficients to be used as input for the processes shown as blocks 2714 and 2746 in Figure 27 (e.g. in the order shown in Figure 21A).
An example entropy decoder may consist of two components: a prefix coding decoder and a run length decoder. This is described in the section "Example Entropy Encoding" above with reference to Figure 10A.

Parsing Process for Entropy Encoded Temporal Signal Coefficient Group

Temporal signalling data may also be entropy encoded, e.g. as shown in the examples of Figures 24 to 26, and described with reference to Figures 12A to 12E. The temporal signalling data in the example of Figure 27 is organised in entropy-encoded temporal signal coefficient groups. These may also be parsed as part of the picture enhancement decoding process 2706 or as part of a separate process. They may be entropy decoded in a similar manner to the processes described above.
As set out above, the processing of a temporal signalling surface may depend on whether the data is tiled. This may be indicated using the variable tile dimensions type.
Inputs to this parsing process for the temporal signalling data may comprise the bits belonging to chunks of data containing the entropy encoded temporal signal coefficient group derived from block 2706.
If tile dimensions type is equal to 0, for each chunk the following information is provided: a variable temporal surfaces[planeIdx].rle only flag specifying if the prefix coding decoder is needed; a variable temporal surfaces[planeIdx].size specifying the size of the chunk of data; and a variable temporal surfaces[planeIdx].data specifying the beginning of the chunk. In this case, planeIdx is an index indicating the plane to which the chunk belongs. The output of this process is an entropy decoded temporal signal coefficient group to be stored in TempSigSurface, as described in more detail above and below.
If tiled data is enabled, e.g. tile dimensions type is not equal to 0, the following information may be provided for each chunk: a variable temporal surfaces[planeIdx].tiles pointing to the tiles of the decoded picture; and a variable temporal surfaces[planeIdx].rle only flag specifying if the prefix coding decoder is needed for all tiles. In this case, a chunk of data is further split to smaller chunks of data, which are termed as tiles (e.g. as shown in Figure 21A but for the temporal signalling - the temporal signalling may be seen as an extra layer). For each tile the following information may be provided: a variable temporal surfaces[planeIdx].tiles[tileIdx].size specifying the size of the chunk of tile data; and a variable temporal surfaces[planeIdx].tiles[tileIdx].data specifying the beginning of the chunk. In this case, indexes planeIdx and tileIdx indicate the plane and tile to which the chunk belongs. The output of this process is an entropy decoded temporal signal coefficient group to be stored in TempSigSurface as described in more detail above and below.
Again, an example entropy decoder may consist of two components: a prefix coding decoder and a run length decoder. This may be applied as described with respect to the transform coefficients in the section "Example Entropy Encoding" above with reference to Figure 10A.
Prefix Coding Decoder Description
Certain aspects of an example prefix coding decoder relate to the above section titled "Example Entropy Encoding" and Figures 10A to 10I.
In certain examples, if the variable rle only flag is equal to 1, the prefix coding decoder process is skipped, and the run length decoding process described herein is invoked. If the variable rle only flag is equal to 0, the prefix coding decoder is applied.
The prefix coding decoder may be initialised by reading code lengths from the stream header. If there are more than 31 non-zero values the stream header is as shown in Figure 10B. Otherwise the stream header may be arranged as shown in Figure 10C. In the special case for which the frequencies are all zero, the stream header may be arranged as shown in Figure 10D. In the special case where there is only one code in the prefix coding tree the stream header may be arranged as shown in Figure 10E. Further details of these data structures may be found in the "Example Entropy Encoding" section above. After being initialised, the prefix coding decoder may undertake the following steps: 1) Set code lengths for each symbol; 2) Assign codes to symbols from the code lengths; and 3) Generate a table for searching the subsets of codes with identical lengths. In this step, each element of the table may be used to record the first index of a given length and the corresponding code (e.g. in the form firstIdx, firstCode).
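By way of illustration only, steps 2) and 3) may be realised with a canonical code assignment, in which symbols sharing a code length receive consecutive codes; the sketch below is one common way of doing this and is not necessarily the exact normative procedure.

#include <stdint.h>

#define MAX_CODE_LENGTH 32

/* Assigns canonical codes from per-symbol code lengths and records, per
 * length, the first code and the first symbol index having that length
 * (firstCode, firstIdx), so that subsets of codes with identical lengths
 * can be searched. Symbols with length 0 do not occur in the stream. */
static void assign_canonical_codes(const uint8_t *lengths, int numSymbols,
                                   uint32_t *codes,
                                   uint32_t firstCode[MAX_CODE_LENGTH + 1],
                                   int firstIdx[MAX_CODE_LENGTH + 1])
{
    int count[MAX_CODE_LENGTH + 1] = { 0 };
    for (int s = 0; s < numSymbols; s++)
        count[lengths[s]]++;
    count[0] = 0;                                  /* unused symbols carry no code */

    uint32_t code = 0;
    firstCode[0] = 0;
    for (int len = 1; len <= MAX_CODE_LENGTH; len++) {
        code = (code + count[len - 1]) << 1;
        firstCode[len] = code;
    }

    uint32_t next[MAX_CODE_LENGTH + 1];
    for (int len = 0; len <= MAX_CODE_LENGTH; len++) {
        next[len] = firstCode[len];
        firstIdx[len] = -1;
    }
    for (int s = 0; s < numSymbols; s++) {
        int len = lengths[s];
        if (len == 0)
            continue;
        if (firstIdx[len] < 0)
            firstIdx[len] = s;                     /* first symbol of this length */
        codes[s] = next[len]++;
    }
}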
Prefix Coding Decoder Table Generation

A short example of step 3) as set out above, e.g. prefix coding decoder table generation, will now be described with reference to the symbol tables below and Figures 28A to 28E.
To find a prefix coding code for a given set of symbols a prefix coding tree may be created. The table below shows a hypothetical example with 6 symbols (A, B, C, D, E and F) that each occur at different frequencies. First the symbols are sorted by frequency. This is shown in the table below:

Symbol    Frequency
A         3
B         8
C         10
D         15
E         20
F         43

The two lowest elements are then removed from the list and made into leaves of a tree, with a parent node that has a frequency equal to the sum of the two lower elements' frequencies. A first partial tree is shown in Figure 28A. A new sorted frequency list is then generated with the combined elements as shown in the table below:

Symbol    Frequency
C         10
D         15
E         20
F         43

Then the loop is repeated, combining the two lowest elements, as shown in Figure 28B. The new list with updated sorted frequencies is as shown below:

Symbol    Frequency
D         15
E         20
F         43

This process is repeated until only one element remains in the list. The iterations are shown in Figures 28C and 28D and in the three consecutive tables below:

Symbol    Frequency
F         43

Symbol    Frequency
F         43

Symbol    Frequency

Once the tree is built, to generate the prefix coding code for a symbol the tree is traversed from the root to this symbol, appending a 0 each time a left branch is taken and a 1 each time a right branch is taken. This is shown in Figure 28E. In the hypothetical example presented here, this gives the following code, as specified in the table below:

Symbol    Code    Code length
A         1010    4
B         1011    4
C         100     3
D         110     3
E         111     3
F         0       1

The code length of a symbol is the length of its corresponding code. To decode a prefix coding code, the tree is traversed beginning at the root, taking a left path if a 0 is read and a right path if a 1 is read. The symbol is found when reaching a leaf.

Prefix Coding Decoder for Tile Data Sizes

This section describes a prefix coding decoder that may be used for tiled data. In this case, the decoder reads the prefix coding encoded data size of each tile byte by byte. A state machine for this decoding has two states: a LSB Prefix Code state and a MSB Prefix Code state. By construction the state of the first byte of data is guaranteed to be the LSB Prefix Code state. If an overflow flag is 0, the state machine remains in the LSB Prefix Code state. If the overflow flag is 1, the state machine transitions to the MSB Prefix Code state. The decoder uses this state machine to determine the state of the next byte of data. The state tells the decoder how to interpret the current byte of data. The two states may be those illustrated in Figure 10G.
The LSB Prefix Coding state may encode the 7 least significant bits of a non-zero value. In this state a byte is divided as shown in Figure 10G. The overflow bit is set if the value does not fit within 7 bits of data. When the overflow bit is set, the state of the next byte will be the MSB Prefix Coding state. The MSB Prefix Coding state encodes bits 8 to 15 of values that do not fit within 7 bits of data. All 8 bits of the byte may be data bits (e.g. similar to Figures 10H and 10I without the run bit). A frequency table is created for each state for use by the Prefix Coding encoder.
If this process is invoked with surfaces referring to entropy encoded transform coefficients, the decoded values are stored into a temporary buffer tmp size per tile of size nTilesL1 or nTilesL2 (respectively, the number of tiles for enhancement sub-layer 1 and sub-layer 2). These may get mapped to surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].size as follows, using the indexes planeIdx, levelIdx, layerIdx and tileIdx:

if (levelIdx == 1)
    nTiles = nTilesL1
else
    nTiles = nTilesL2
if (compression type size per tile == 2) {
    for (tileIdx = 1; tileIdx < nTiles; tileIdx++) {
        tmp size per tile[tileIdx] += tmp size per tile[tileIdx - 1]
    }
}
for (tileIdx = 0; tileIdx < nTiles; tileIdx++) {
    surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].size = tmp size per tile[tileIdx]
}

If this process is invoked with temporal surfaces referring to an entropy encoded temporal signal coefficient group, the decoded values may be stored into a temporary buffer tmp size per tile of size nTilesL2 and get mapped to temporal surfaces[planeIdx].tiles[tileIdx].size as follows:

if (compression type size per tile == 2) {
    for (tileIdx = 1; tileIdx < nTilesL2; tileIdx++) {
        tmp size per tile[tileIdx] += tmp size per tile[tileIdx - 1]
    }
}
for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++) {
    temporal surfaces[planeIdx].tiles[tileIdx].size = tmp size per tile[tileIdx]
}

The last bit symbol offset per tile may use the same prefix coding decoding process as described above. If this process is invoked with surfaces referring to entropy encoded transform coefficients, the decoded values are stored into a temporary buffer tmp decoded tile prefix last symbol bit offset of size nTilesL1 or nTilesL2 (respectively, the number of tiles for enhancement sub-layer 1 and sub-layer 2) and get mapped to surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].prefix last symbol bit offset. The variable prefix last symbol bit offset is then derived as follows:

if (levelIdx == 1)
    nTiles = nTilesL1
else
    nTiles = nTilesL2
for (tileIdx = 0; tileIdx < nTiles; tileIdx++) {
    surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].prefix last symbol bit offset =
        tmp decoded tile prefix last symbol bit offset[tileIdx]
}

If this process is invoked with temporal surfaces referring to an entropy encoded temporal signal coefficient group, the decoded values are stored into a temporary buffer tmp decoded tile prefix last symbol bit offset of size nTilesL2. They may then be mapped to temporal surfaces[planeIdx].tiles[tileIdx].prefix last symbol bit offset as follows:

for (tileIdx = 0; tileIdx < nTilesL2; tileIdx++) {
    temporal surfaces[planeIdx].tiles[tileIdx].prefix last symbol bit offset =
        tmp decoded tile prefix last symbol bit offset[tileIdx]
}

RLE Decoder

An example run length encoding (RLE) decoder will now be described. Further details of run length encoders and decoders are also found in the section "Example Entropy Encoding" set out above and Figures 10A to 10I.
The input of the RLE decoder may be a byte stream of prefix coding decoded data if rle only flag is equal to zero or just a byte stream of raw data if rle only flag is equal to 1. The output of this process is a stream of quantized transform coefficients belonging to the chunk pointed by the variables planeIdx, levelIdx and layerIdx or a stream of temporal signals belonging to a temporal chunk.
When decoding coefficient groups, the RLE decoder may use the state machine 1050 shown in Figure 10F as described above. The run length state machine 1050 may be used by the prefix coding encoding and decoding processes to know which prefix coding code to use for the current symbol or code word. The RLE decoder decodes sequences of zeros. It may also decode the frequency tables used to build the prefix coding trees. The RLE decoder reads the run length encoded data byte by byte. By construction the state of the first byte of data is guaranteed to be the RLC Residual LSB state 1051. The RLE decoder uses the state machine 1050 to determine the state of the next byte of data. The state tells the RLE decoder how to interpret the current byte of data. Further details of the three states 1051, 1052 and 1053 are provided in the description of Figure 10F above.
Reference is also made to the byte encoding shown in Figures 10G to 10I. A frequency table may be created for each state for use by the prefix coding encoder. In order for the decoder to start on a known state, the first symbol in the encoded stream will always be a residual.
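By way of illustration only, the accumulation of a run count spread over several bytes, each carrying 7 count bits and a continuation ("run") bit, may be sketched in C as follows; the placement of the run bit in the most significant position and the least-significant-bits-first accumulation are assumptions made for this sketch.

#include <stdint.h>

/* Reads a run count that may span successive bytes: each byte carries 7
 * count bits, and its run bit indicates that another byte of the same
 * count follows. Returns the number of bytes consumed. */
static int read_run_count(const uint8_t *data, int size, uint32_t *count)
{
    uint32_t value = 0;
    int shift = 0, pos = 0;
    while (pos < size) {
        uint8_t byte = data[pos++];
        value |= (uint32_t)(byte & 0x7F) << shift;   /* 7 data bits per byte */
        shift += 7;
        if ((byte & 0x80) == 0)                      /* run bit low: count complete */
            break;
    }
    *count = value;
    return pos;
}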
The RLE decoder for a temporal signal coefficient group may operate in a similar manner. An example RLE decoder for a temporal signal coefficient group was described in the section titled "Temporal Prediction and Signalling" above. The RLE decoder for a temporal signal coefficient group may use a state machine similar to the state machine 1280 shown in Figure 12E. The state machine 1280 may be used by the prefix coding encoding and decoding processes to know which prefix coding code to use for the current symbol or code word. The RLE decoder for the temporal signal coefficient group decodes sequences of zeros and sequences of ones. It may also decode the frequency tables used to build prefix coding trees. The RLE decoder for a temporal signal coefficient group may read received run length encoded data byte by byte. By construction the state of the first byte of data may be guaranteed to be the true value of the first symbol in the stream (e.g. the state 1281 in Figure 12E). The RLE decoder may use the state machine 1280 to determine the state of the next byte of data. The state tells the RLE decoder how to interpret the current byte of data. As described above with reference to Figure 12E, a RLE decoder for a temporal signal coefficient group may have two states in addition to a first symbol state: an RLC zero run state as shown as state 1283 in Figure 12E and a RLC one run state as shown as state 1282 in Figure 12E. The zero run state 1283 encodes 7 bits of a zero run count. The run bit is high if more bits are needed to encode the count. Run length encoding of a byte for the zero run state may be similar to that shown in Figure 10H or 10I. The one run state 1282 encodes 7 bits of a one run count. The run bit is high if more bits are needed to encode the count. Run length encoding of a byte for the one run state may also be similar to that shown in Figure 10H or 10I. A frequency table may be created for each state for use by the prefix coding encoder. In order for the RLE decoder to start on a known state (e.g. state 1281), the first symbol may contain the real value 0 or 1.
When decoding a temporal signal coefficient group, the RLE decoder writes the 0 and 1 values into the temporal signal surface TempSigSurface. This may have a size (PictureWidth / nTbS, PictureHeight / nTbS) where nTbS is the transform size.
The encoding described with respect to Figures 12C and 12D, and the flow-chart of Figures 13A and 13B, may be used. In this case, decoding of the temporal signal may involve the following steps. If temporal tile intra signalling enabled flag is equal to 1, and if the value to write at the writing position (x, y) in the TempSigSurface is equal to 1, and x%(32/nTbS) == 0 and y%(32/nTbS) == 0, a next writing position is moved to (x + 32/nTbS, y) when (x + 32/nTbS) < (PictureWidth / nTbS), otherwise it is moved to (0, y + 32/nTbS). This may decode the temporal signalling as shown in Figure 12D, where this Figure may represent a RLE decoder for a temporal signal coefficient group writing values to the temporal signal surface for a 4x4 transform with nTbS = 4.
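By way of illustration only, the writing-position advance described above may be sketched in C as follows; only the position arithmetic is modelled, not the surrounding run length decoding, and the function signature is an assumption made for this sketch.

/* Advances the TempSigSurface writing position after a value of 1 has been
 * written at a position aligned to a 32x32 tile (expressed in units of the
 * transform size), skipping ahead as described above. surfaceWidth is
 * PictureWidth / nTbS. */
static void advance_after_intra_signal(int *x, int *y, int nTbS, int surfaceWidth)
{
    int step = 32 / nTbS;                 /* tile width in transform units */
    if ((*x % step == 0) && (*y % step == 0)) {
        if (*x + step < surfaceWidth) {
            *x += step;                   /* move to the next tile column */
        } else {
            *x = 0;                       /* wrap to the start of the next tile row */
            *y += step;
        }
    }
}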
In certain examples, other signalling may be encoded and/or decoded as set out for the temporal signalling above and in other examples.
For example, in one case, an entropy enabled flag for a tile may be encoded and decoded in this manner. In this case, a run length state machine to be used to code the entropy enabled flag field of each of the tiles may be configured in a similar manner to the state machine 1280 shown in Figure 12E and used for the temporal signalling above. The RLE decoder may again decode sequences of zeros and sequences of ones as discussed for the temporal signalling above (with the same state arrangement).
When using an encoded tile entropy_enabled_flag, the RLE data may be organized in blocks. Each block may have an output capacity of 4096 bytes. In this case, the RLE decoder may switch to a new block in the following cases: 1) the current block is full; 2) the current RLE data is a run and there are fewer than 5 bytes left in the current block; and 3) the current RLE data leads to an LSB/MSB pair and there are fewer than 2 bytes left in the current block (an illustrative sketch of this block switching is given after the mapping pseudocode below). In this example, the RLE decoder may write the 0 and 1 values into a temporary signal surface tmp_decoded_tile_entropy_enabled of size:

(nPlanes) x (nLevels) x (nLayers) x (nTilesL1 + nTilesL2) x (no_enhancement_bit_flag == 0) + (temporal_signalling_present_flag == 1) x (nPlanes) x (nTilesL2)

In this case, the resulting temporary signal surface tmp_decoded_tile_entropy_enabled may get mapped to surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].entropy_enabled_flag and temporal_surfaces[planeIdx].tiles[tileIdx].entropy_enabled_flag as follows:

for (planeIdx = 0; planeIdx < nPlanes; ++planeIdx) {
    if (no_enhancement_bit_flag == 0) {
        for (levelIdx = 1; levelIdx <= nLevels; ++levelIdx) {
            if (levelIdx == 1)
                nTiles = nTilesL1
            else
                nTiles = nTilesL2
            for (layerIdx = 0; layerIdx < nLayers; ++layerIdx) {
                for (tileIdx = 0; tileIdx < nTiles; ++tileIdx) {
                    surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].entropy_enabled_flag =
                        tmp_decoded_tile_entropy_enabled[tileIdx]
                }
            }
        }
    } else {
        for (levelIdx = 1; levelIdx <= nLevels; ++levelIdx) {
            if (levelIdx == 1)
                nTiles = nTilesL1
            else
                nTiles = nTilesL2
            for (layerIdx = 0; layerIdx < nLayers; ++layerIdx) {
                for (tileIdx = 0; tileIdx < nTiles; ++tileIdx) {
                    surfaces[planeIdx][levelIdx][layerIdx].tiles[tileIdx].entropy_enabled_flag = 0
                }
            }
        }
    }
    if (temporal_signalling_present_flag == 1) {
        for (tileIdx = 0; tileIdx < nTilesL2; ++tileIdx) {
            temporal_surfaces[planeIdx].tiles[tileIdx].entropy_enabled_flag =
                tmp_decoded_tile_entropy_enabled[tileIdx]
        }
    }
}
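The block-switching rules listed above may be expressed, purely as an illustrative sketch with assumed names, as:

#include <stdbool.h>
#include <stddef.h>

#define RLE_BLOCK_CAPACITY 4096   /* output capacity of one RLE block, per the text */

/* Sketch: decide whether the RLE coder must switch to a new block before
 * emitting the next piece of data.  bytes_used is how much of the current
 * block is already occupied; the two flags describe the next item to be
 * written.  Names and parameterisation are illustrative assumptions. */
bool need_new_block(size_t bytes_used, bool next_is_run, bool next_is_lsb_msb_pair)
{
    size_t bytes_left = RLE_BLOCK_CAPACITY - bytes_used;

    if (bytes_left == 0)                         /* case 1: current block is full */
        return true;
    if (next_is_run && bytes_left < 5)           /* case 2: a run with fewer than 5 bytes left */
        return true;
    if (next_is_lsb_msb_pair && bytes_left < 2)  /* case 3: an LSB/MSB pair with fewer than 2 bytes left */
        return true;
    return false;
}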
Parsing process for Exp-Golomb Codes

In certain examples described herein data may be encoded with Exp-Golomb codes. These may be 0-th order. This section sets out a parsing process that may be invoked when the descriptor of a syntax element in the syntax tables is equal to ue(v) (e.g. as set out in the "Bitstream Syntax" section above).
In the example, inputs to the Exp-Golomb code parsing process may comprise bits from the raw byte sequence payload (RBSP). Outputs of this process may comprise syntax element values.
Syntax elements coded as ue(v) may be Exp-Golomb-coded with order 0. The parsing process for these syntax elements begins with reading the bits starting at the current location in the bitstream up to and including the first non-zero bit and counting the number of leading bits that are equal to 0. This process may be specified as follows:

leadingZeroBits = -1
for( b = 0; !b; leadingZeroBits++ )
    b = read_bits( 1 )

The variable codeNum may then be assigned as follows:

codeNum = (2^leadingZeroBits - 1) + read_bits( leadingZeroBits )

where the value returned from read_bits( leadingZeroBits ) is interpreted as a binary representation of an unsigned integer with most significant bit written first.
The table below illustrates an example structure of a 0-th order Exp-Golomb code by separating the bit string into "prefix" and "suffix" bits. The "prefix" bits are those bits that are parsed as specified above for the computation of leadingZeroBits, and are shown as either 0 or 1 in the bit string column of the table. The "suffix" bits are those bits that are parsed in the computation of codeNum and are shown as xi in the table, with i in the range of 0 to leadingZeroBits - 1, inclusive. Each xi is equal to either 0 or 1.

Bit string form                     Range of codeNum
1                                   0
0 1 x0                              1..2
0 0 1 x1 x0                         3..6
0 0 0 1 x2 x1 x0                    7..14
0 0 0 0 1 x3 x2 x1 x0               15..30
0 0 0 0 0 1 x4 x3 x2 x1 x0          31..62

The table below illustrates explicitly an assignment of bit strings to codeNum values:

Bit string          codeNum
1                   0
0 1 0               1
0 1 1               2
0 0 1 0 0           3
0 0 1 0 1           4
0 0 1 1 0           5
0 0 1 1 1           6
0 0 0 1 0 0 0       7
0 0 0 1 0 0 1       8
0 0 0 1 0 1 0       9

The value of the syntax element is then equal to codeNum.
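The parsing process above translates directly into code. The following C sketch assumes a simple most-significant-bit-first bit reader over a byte buffer; the BitReader type and read_bits helper are illustrative, not part of the syntax description. As a check against the second table, the bit string 0 0 1 0 1 yields leadingZeroBits = 2 and a suffix of 01, giving codeNum = (2^2 - 1) + 1 = 4.

#include <stdint.h>
#include <stddef.h>

/* Minimal MSB-first bit reader over a byte buffer (illustrative only). */
typedef struct { const uint8_t *data; size_t bit_pos; } BitReader;

static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;
    while (n--) {
        uint32_t bit = (br->data[br->bit_pos >> 3] >> (7 - (br->bit_pos & 7))) & 1;
        v = (v << 1) | bit;                 /* most significant bit first */
        br->bit_pos++;
    }
    return v;
}

/* ue(v): 0-th order Exp-Golomb, following the parsing process above. */
uint32_t parse_ue(BitReader *br)
{
    int leadingZeroBits = -1;
    for (uint32_t b = 0; !b; leadingZeroBits++)
        b = read_bits(br, 1);               /* count leading zero bits */
    return ((1u << leadingZeroBits) - 1) + read_bits(br, (unsigned)leadingZeroBits);
}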
Summary of the Detailed Example Implementation of the Decoder

The example described above processes a bitstream where data is logically organized into chunks. First, each chunk is entropy decoded. That is, the method comprises retrieving each chunk and applying an entropy decoding operation to each chunk. An example of entropy decoding operations is described above and may comprise for example run length decoding, prefix coding decoding or both. The method may then output an array of entropy decoded quantized coefficients. A run-length decoding operation may identify the next symbol in a set of symbols and extract either a data value or a run of zeros. The decoding operation may then combine these values and zeros to decode the data. The order may be the order extracted or alternatively some predetermined order.
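As a purely illustrative sketch of the combining step described above, the following C fragment expands a sequence of decoded symbols, each being either a literal data value or a run of zeros, into an array of quantized coefficients; the Symbol structure and field names are assumptions made for the example, not part of the described method.

#include <stdint.h>
#include <stddef.h>

/* One decoded run-length symbol: either a literal value or a zero run. */
typedef struct {
    int is_zero_run;   /* 1: "run of zeros" symbol, 0: data value symbol */
    int value;         /* zero-run length, or the data value itself */
} Symbol;

/* Combine the decoded symbols into a coefficient array in extraction order. */
size_t expand_symbols(const Symbol *syms, size_t n_syms,
                      int16_t *coeffs, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n_syms && out < cap; i++) {
        if (syms[i].is_zero_run) {
            for (int k = 0; k < syms[i].value && out < cap; k++)
                coeffs[out++] = 0;                       /* insert the signalled run of zeros */
        } else {
            coeffs[out++] = (int16_t)syms[i].value;      /* literal coefficient value */
        }
    }
    return out;
}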
An example implementation of the decoding process for a first level of enhancement chunk (e.g. level 1 following entropy decoding) is described. An example implementation of the decoding process for a further level of enhancement chunk (e.g. level 2 following entropy decoding) is also described.
The method may comprise retrieving an array of entropy decoded quantized coefficients representing a first level of enhancement and outputting an array of residuals.
The method may further comprise retrieving an array of samples of output of a base decoder. The method may further comprise applying a de-quantization process to the array of entropy decoded quantized coefficients to derive a set of de-quantized coefficients, applying a transformation process to the set of de-quantized coefficients and applying a filter process to output the array of residuals representing a first level of enhancement. The method may then further comprise recreating a picture from arrays of residuals. The method may comprise applying a transform process from a set of predetermined transform processes according to a signalled parameter. For example, the transform process may be applied on a 2x2 coding unit or a 4x4 coding unit.
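A minimal sketch of this level 1 path for a single 2x2 coding unit is given below. The uniform multiplication by a step width and the Hadamard-style combination of the four de-quantized coefficients (with an arbitrary right shift for scaling) are illustrative stand-ins for whichever de-quantization and transform processes are actually signalled; they are not the normative operations, and the filter step is omitted.

#include <stdint.h>

/* Illustrative level 1 processing of one 2x2 coding unit:
 * de-quantize, then apply a Hadamard-style 2x2 inverse transform. */
void level1_block_2x2(const int16_t q[4], int32_t step_width, int16_t res[4])
{
    int32_t c[4];
    for (int i = 0; i < 4; i++)
        c[i] = (int32_t)q[i] * step_width;        /* uniform de-quantization (illustrative) */

    /* Combine the four de-quantized coefficients (average, horizontal,
     * vertical, diagonal) back into residuals; the >> 1 scaling is an
     * arbitrary choice for this sketch. */
    int32_t a = c[0], h = c[1], v = c[2], d = c[3];
    res[0] = (int16_t)((a + h + v + d) >> 1);
    res[1] = (int16_t)((a - h + v - d) >> 1);
    res[2] = (int16_t)((a + h - v - d) >> 1);
    res[3] = (int16_t)((a - h - v + d) >> 1);
}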
The method may also comprise retrieving an array of entropy decoded quantized coefficients representing a further level of enhancement and outputting an array of residuals. The method may further comprise retrieving the array of residuals of the first level of enhancement corresponding to the array of entropy decoded quantized coefficients representing a further level of enhancement. The method may further comprise applying an up-sampling process to the array of residuals of the first level of enhancement. The method may comprise applying a temporal prediction process to the array of entropy decoded quantized coefficients representing a further level of enhancement to derive an array of temporally predicted samples. The method may further comprise applying a de-quantization process to the array of entropy decoded quantized coefficients to derive a set of de-quantized coefficients, applying a transformation process to the set of de-quantized coefficients to derive a set of transformed coefficients. The array of temporally predicted samples may then be combined with the set of transformed coefficients to derive an array of residuals for the further level of enhancement. The method may then further comprise recreating a picture from the array of residuals. The method may comprise applying a transform process from a set of predetermined transform processes according to a signalled parameter. For example, the transform process may be applied on a 2x2 coding unit or a 4x4 coding unit.
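The combination of temporal prediction with the decoded residuals for the further level of enhancement may be sketched as follows. The buffer handling (take predicted values from a temporal buffer or zeros on a refresh, add them to the residuals, and write the result back for the next frame) follows the description above, while the function signature and flag handling are illustrative assumptions.

#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: combine decoded residuals with temporally
 * predicted samples held in a temporal buffer from the previous frame,
 * or with zeros when a temporal refresh is signalled, then update the
 * buffer with the result for use on the next frame. */
void apply_temporal(int16_t *residuals, int16_t *temporal_buffer,
                    size_t n, int temporal_refresh)
{
    for (size_t i = 0; i < n; i++) {
        int16_t pred = temporal_refresh ? 0 : temporal_buffer[i];
        residuals[i] = (int16_t)(residuals[i] + pred);   /* combine with temporal prediction */
        temporal_buffer[i] = residuals[i];               /* store back to the temporal buffer */
    }
}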
The method may comprise predicting residuals. The step of predicting residuals may be performed as part of the transform process. The predicting residuals step may comprise modifying a residual. The modification may be performed based on a location of the residual in a frame. The modification may be a predetermined value. A filtering step may also be applied in the further level of enhancement. Similarly, temporal prediction may also be applied in the first level of enhancement process.
Although the method described above specifies examples of a de-quantization process, a transform process, an up-sampling process and a filter process (and other processes), it will be understood that the processes described are not essential and other equivalent processes may be applied to perform the steps described.
Methods may be applied for operating on temporal signalling (e.g. signalling a temporal mode using metadata). For example, encoded data may be modified such that if temporal enabled bit is 1 and temporal refresh bit is 0 and if layerIdx is 0, an additional temporal surface is processed. The decoding process for picture enhancement encoded data may in certain cases be modified such that if temporal enabled bit is 1 and temporal refresh bit is 0 and if layerIdx is 0, an additional temporal surface is processed.
A decoding process for temporal prediction may be modified such that variables TransTempSig and TileTempSig are read from a temporal surface (e.g. the temporal map). If a temporal embedded bit is 1, TransTempSig and TileTempSig may be supplied as inputs to the temporal prediction processes and these processes may be configured to determine from the TileTempSig array if a tile refresh process should be invoked. Additionally, in this case, decoding may be configured to invoke temporal processing for the transform if TransTempSig is set to 0.
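Purely as an illustrative sketch, the use of these two signals might look as follows; the callback names (tile_refresh, temporal_process_transform) are assumptions standing in for the tile refresh and temporal transform processes referred to above.

/* Illustrative use of the per-tile and per-transform temporal signals:
 * TileTempSig may trigger a tile refresh, and TransTempSig equal to 0
 * selects temporal processing for the transform.  Signals only apply
 * when the temporal mode is enabled and no global refresh is signalled. */
void temporal_for_block(int temporal_enabled_bit, int temporal_refresh_bit,
                        int tile_temp_sig, int trans_temp_sig,
                        void (*tile_refresh)(void),
                        void (*temporal_process_transform)(void))
{
    if (!(temporal_enabled_bit == 1 && temporal_refresh_bit == 0))
        return;

    if (tile_temp_sig)
        tile_refresh();                     /* refresh the tile's temporal buffer */

    if (trans_temp_sig == 0)
        temporal_process_transform();       /* invoke temporal processing for the transform */
}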
In examples, processes are used to describe the decoding of syntax elements. A process may have a separately described specification and invoking. Syntax elements and upper-case variables that pertain to a current syntax structure and depending syntax structures may be available in the process specification and invoking. A process specification may also have a lower-case variable explicitly specified as input. Each process specification may have an explicitly specified output. The output is a variable that may either be an upper-case variable or a lower-case variable. When invoking a process, the assignment of variables is specified as follows: if the variables at the invoking and the process specification do not have the same name, the variables are explicitly assigned to lower case input or output variables of the process specification; otherwise (the variables at the invoking and the process specification have the same name), assignment is implied.
In the specification of a process, a specific coding block may be referred to by the variable name having a value equal to the address of the specific coding block.
At both the encoder and decoder, for example implemented in a streaming server or client device or client device decoding from a data store, methods, "components" and processes described herein can be embodied as code (e.g., software code) and/or data. The encoder and decoder may be implemented in hardware or software as is well-known in the art of data compression. For example, hardware acceleration using a specifically programmed Graphical Processing Unit (GPU) or a specifically designed Field Programmable Gate Array (FPGA) may provide certain efficiencies. For completeness, such code and data can be stored on one or more computer-readable media, which may include any device or medium that can store code and/or data for use by a computer system. When a computer system reads and executes the code and/or data stored on a computer-readable medium, the computer system performs the methods and processes embodied as data structures and code stored within the computer-readable storage medium. In certain embodiments, one or more of the steps of the methods and processes described herein can be performed by a processor (e.g., a processor of a computer system or data storage system).
Generally, any of the functionality described in this text or illustrated in the figures can be implemented using software, firmware (e.g., fixed logic circuitry), programmable or nonprogrammable hardware, or a combination of these implementations. The terms "component" or "function" as used herein generally represents software, firmware, hardware or a combination of these. For instance, in the case of a software implementation, the terms "component" or "function" may refer to program code that performs specified tasks when executed on a processing device or devices. The illustrated separation of components and functions into distinct units may reflect any actual or conceptual physical grouping and allocation of such software and/or hardware and tasks.

Claims (25)

  1. CLAIMS 1. A bitstream for transmitting one or more enhancement residuals planes suitable to be added to a set of preliminary pictures obtained from a decoder reconstructed video, comprising: a decoder configuration for controlling a decoding process of the bitstream, wherein the decoder configuration comprises a no enhancement bit flag variable, wherein a first value of the no enhancement bit flag variable indicates that a decoder should: perform dequantization and transformation processes on entropy decoded quantized coefficients to obtain an array of residuals; and wherein a second value of the no enhancement bit flag variable indicates that at a decoder, the decoder should: set values of an array to a predetermined value, wherein the decoder should invoke a picture reconstruction process based on the array.
  2. 2. A bitstream according to claim 1, wherein the first value of the no enhancement_ bit_ flag variable further indicates that at a decoder, for a first enhancement layer, the decoder should: after performing the dequantization and transformation processes, perform a filter process to obtain a first array of residuals.
  3. 3. A bitstream according to claim 1 or 2, wherein the decoder configuration further comprises a temporal enabled flag variable, wherein for the second value of the no enhancement bit flag variable, for a second enhancement layer: a first value of the temporal enabled flag variable indicates that the predetermined value is a value stored in a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame; and a second value of the temporal_enabled_flag variable indicates that the predetermined value is 0.
  4. 4. A bitstream according to claim 1 or 2, the first value of the no enhancement bit flag variable indicates that the decoder should, for a second enhancement layer, invoke a temporal process to obtain an array of temporally predicted residuals; and combine the array of temporally predicted residuals with the array of residuals.
  5. 5. A bitstream according to claim 4, wherein the decoder configuration further comprises a temporal refresh bit flag variable and a temporal enabled flag variable, wherein the first value of the no enhancement bit flag variable further indicates that at a decoder, the decoder should: based on a value of the temporal refresh bit flag variable, either invoke the temporal prediction process to obtain the array of temporally predicted residuals, wherein the temporal prediction process comprises combining an array of decoded temporally predicted residuals with a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame, or set values of the array of temporally predicted residuals to contain only zeros; and based on a value of the temporal_enabled_flag variable: add the array of temporally predicted residuals to the array of residuals, and store the array of residuals to the temporal buffer.
  6. 6. A bitstream according to claim 5, wherein the first value of the no enhancement bit flag variable further indicates that at a decoder, the decoder should: invoke the temporal prediction process to output the array of temporally predicted residuals if the value of the temporal refresh bit flag variable is a first value; or set the array of temporally predicted residuals to contain only zeros if the value of the temporal_refresh_bit_flag variable is a second value; and add the array of temporally predicted residuals to the array of residuals and store the array of residuals to the temporal buffer if the value of the temporal_enabled_flag variable is the second value.
  7. 7. A bitstream according to claim 6, wherein the first value of the no enhancement bit flag variable further indicates that at a decoder, the decoder should: modify a stepWidth variable to be used in the dequantization process if a value of the temporal enabled flag is the second value.
  8. 8. A bitstream according to claim 2, wherein the decoder configuration further comprises a user data enabled variable, and the first value of the no enhancement bit flag variable further indicates that at a decoder, for the first enhancement layer, before performing the dequantization and transformation processes, the decoder should perform a plurality of bit shift operations comprising: a first bit shift operation on an element of an array specifying level 1 entropy decoded quantized transform coefficients, wherein the first bit shift operation comprises a 2-bit or 6-bit right shift based on a value of a variable specifying a size of a current transform block, and the value of the user data enabled variable; a second bit shift operation on the element, wherein the second bit shift operation comprises a 1-bit right shift; and if the last bit of the element is set to 1, the element is set to be negative.
  9. 9. A bitstream according to claim 8, wherein the first value of the no enhancement bit flag variable further indicates that at a decoder, for the first enhancement layer: if a variable representing a size of a current transform block, nTbS, is equal to 4, to perform the first bit shift operation, the decoder should: shift TransformCoeffQ(1)(1) two bits to the right when the value of user data enabled is 1, wherein TransformCoeffQ is the array of level 1 entropy decoded quantized transform coefficients; or shift TransformCoeffQ(1)(1) six bits to the right when the value of user data enabled is 2; and to perform the second bit shift operation, the decoder should: if a last bit of TransformCoeffQ(1)(1) is set to 0, shift TransformCoeffQ(1)(1) one bit to the right; or if a last bit of TransformCoeffQ(1)(1) is set to 1, shift TransformCoeffQ(1)(1) one bit to the right and set TransformCoeffQ(1)(1) to be negative; or if nTbS is equal to 2, to perform the first bit shift operation, the decoder should: shift TransformCoeffQ(0)(1) two bits to the right when the value of user data enabled is 1; or shift TransformCoeffQ(0)(1) six bits to the right when the value of user data enabled is 2; and to perform the second bit shift operation, the decoder should: if a last bit of TransformCoeffQ(0)(1) is set to 0, shift TransformCoeffQ(1)(1) one bit to the right; or if a last bit of TransformCoeffQ(0)(1) is set to 1, shift TransformCoeffQ(0)(1) one bit to the right and setting TransformCoeffQ(0)(1) to be negative.
  10. 10. A bitstream according to claim 3 wherein the decoder configuration further comprises a temporal refresh bit flag variable, wherein the second value of the no enhancement bit flag indicates that at a decoder, the decoder should: based on a value of the temporal_refresh_bit_flag variable: invoke a temporal process to obtain an array of temporally predicted residuals, wherein the temporal prediction process comprises combining the array of decoded temporally predicted residuals with a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame; or set values of the array of temporally predicted residuals to contain only zeros; and based on the value of the temporal_enabled_flag variable: store the array of temporally predicted residuals in the array of residuals, and store the array of residuals to the temporal buffer, or set values of the array of residuals to contain only zeros.
  11. 11. A bitstream according to claim 10, wherein the second value of the no enhancement bit flag variable indicates that at a decoder, the decoder should: invoke the temporal prediction process if the value of the temporal_refresh_bit_flag variable is a first value; or set the array of temporally predicted residuals to contain only zeros if the value of the temporal refresh bit flag variable is a second value; and store the array of temporally predicted residuals in the array of residuals, and store the array of residuals to the temporal buffer if the value of the temporal enabled flag is the second value; or set the array of residuals to contain only zeros if the value of the temporal enabled flag is the first value.
  12. 12. A bitstream according to any preceding claim, wherein the decoder configuration further comprises: an indication of a scaling factor to be applied to the decoded version of the video frame, and/or a type of transform to be applied to coding units of the encoded residual data.
  13. 13. A method of decoding an encoded bitstream into one or more enhancement residuals planes suitable to be added to a set of preliminary pictures obtained from a decoder reconstructed video, the method comprising: retrieving a plurality of decoding parameters from a decoder configuration associated with the encoded bitstream, wherein the decoding parameters are used to configure the decoding operations; retrieving a no enhancement bit flag variable from the decoder configuration, wherein for a first value of the no enhancement bit flag variable, the method further comprises: retrieving encoded enhancement data from the encoded bitstream; decoding the enhancement data to generate a set of residuals representing differences between a reference video frame and a decoded version of the video frame, wherein decoding the enhancement data comprises: performing dequantization and transformation processes to obtain an array of residuals; wherein for a second value of the no enhancement bit flag variable, the method further comprises: setting values of an array to a predetermined value, wherein the method further comprises invoking a picture reconstruction process based on the array.
  14. 14. A method according to claim 13 wherein for the first value of the no enhancement bit flag variable decoding the enhancement data further comprises, for a first enhancement layer: after performing the dequantization and transformation processes, performing a filter process to obtain a first array of residuals.
  15. 15. A method according to claim 13 wherein for the second value of the no enhancement_bit_flag variable the method further comprises, for a second enhancement layer: retrieving a temporal enabled flag variable from the decoder configuration, wherein a first value of the temporal enabled flag variable indicates that the predetermined value is a value stored in a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame, and wherein a second value of the temporal enabled flag variable indicates that the predetermined value is 0.
  16. 16. A method according to claim 13, wherein for the first value of the no enhancement bit flag variable, the method further comprises, for a second enhancement layer: invoking a temporal prediction process to obtain an array of temporally predicted residuals; and combining the array of temporally predicted residuals with the array of residuals.
  17. 17. A method according to claim 16 wherein decoding the enhancement data further comprises: retrieving a temporal_refresh_bit_flag variable and a temporal_enabled_flag variable from the decoder configuration; based on a value of the temporal refresh bit flag variable, the method further comprises either invoking a temporal prediction process to obtain an array of temporally predicted residuals, wherein the temporal prediction process comprises combining an array of decoded temporally predicted residuals with a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame; or setting values of the array of temporally predicted residuals to contain only zeros; and based on the value of temporal enabled flag, adding the array of temporally predicted residuals to the array of residuals, and storing the array of residuals to a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame.
  18. 18. A method according to claim 17, wherein decoding the enhancement data comprises: invoking the temporal prediction to output the array of temporally predicted residuals if the value of the temporal_refresh_bit_flag variable is a first value; setting the array of temporally predicted residuals to contain only zeros if the value of the temporal refresh bit flag variable is a second value; adding the array of temporally predicted residuals to the array of residuals and storing the array of residuals to the temporal buffer if the value of the temporal enabled flag variable is the second value.
  19. 19. A method according to claim 18, wherein the first value of the no enhancement bit flag variable further indicates that the method further comprises: modifying a stepWidth variable to be used in the dequantization process if a value of the temporal enabled flag is the second value.
  20. 20. A method according to claim 14, wherein the method further comprises retrieving a user data enabled variable from the decoder configuration, and before performing the dequantization and transformation processes, the method further comprises performing a plurality of bit shift operations comprising: performing a first bit shift operation on an element of an array specifying level 1 entropy decoded quantized transform coefficients, wherein the first bit shift operation comprises a 2-bit or 6-bit right shift based on a value of a variable specifying a size of a current transform block, and the value of the user data enabled variable; performing a second bit shift operation on the element, wherein the second bit shift operation comprises a 1-bit right shift; and if the last bit of the element is set to 1, the second bit shift operation further comprises setting the element to be negative.
  21. 21. A method according to claim 20, wherein, if a variable representing a size of a current transform block, nTbS, is equal to 4, the first bit shift operation comprises shifting TransformCoeffQ(1)(1) two bits to the right when the value of user data enabled is 1, wherein TransformCoeffQ is the array of level 1 entropy decoded quantized transform coefficients; or shifting TransformCoeffQ(1)(1) six bits to the right when the value of user data enabled is 2; and the second bit shift operation comprises if a last bit of TransformCoeffQ(1)(1) is set to 0, shifting TransformCoeffQ(1)(1) one bit to the right; or if a last bit of TransformCoeffQ(1)(1) is set to 1, shifting TransformCoeffQ(1)(1) one bit to the right and setting TransformCoeffQ(1)(1) to be negative; or if nTbS is equal to 2, the first bit shift operation comprises: shifting TransformCoeffQ(0)(1) two bits to the right when the value of user data enabled is 1; or shifting TransformCoeffQ(0)(1) six bits to the right when the value of user data enabled is 2; and the second bit shift operation comprises: if a last bit of TransformCoeffQ(0)(1) is set to 0, shifting TransformCoeffQ(1)(1) one bit to the right; or if a last bit of TransformCoeffQ(0)(1) is set to 1, shifting TransformCoeffQ(0)(1) one bit to the right and setting TransformCoeffQ(0)(1) to be negative.
  22. 22. A method according to claim 15, wherein the method further comprises retrieving a temporal refresh bit flag variable and a temporal enabled flag variable from the decoder configuration, and for the second value of the no enhancement bit flag variable: based on a value of the temporal_refresh_bit_flag variable, the method further comprises: invoking a temporal prediction process to obtain an array of temporally predicted residuals, wherein the temporal prediction process comprises combining the array of decoded temporally predicted residuals with a temporal buffer, wherein the temporal buffer comprises values associated with a previous frame; or setting values of an array of temporally predicted residuals to contain only zeros; and based on the value of the temporal_enabled_flag variable, the method further comprises: storing the array of temporally predicted residuals in the array of residuals, and storing the array of residuals to the temporal buffer, or setting values of the array of residuals to contain only zeros
  23. 23. A method according to claim 22, wherein decoding the enhancement data further comprises: invoking the temporal prediction process if the value of the temporal refresh bit flag variable is a first value; or setting the array of temporally predicted residuals to contain only zeros if the value of the temporal refresh bit flag variable is a second value; and storing the array of temporally predicted residuals in the array of residuals, and storing the array of residuals to the temporal buffer if the value of the temporal enabled flag is the second value; or setting the array of residuals to contain only zeros if the value of the temporal enabled flag is the first value.
  24. 24. A processing apparatus configured to decode a bitstream according to any of claims 1 to 12 and/or perform a method according to any of claims 13 to 23.
  25. 25. A computer-readable storage medium storing instructions which, when executed by a processing apparatus, cause the processing apparatus to perform a method according to any of claims 13 to 23.
GB2312668.3A 2019-03-20 2020-03-18 Low complexity enhancement video coding Active GB2619186B (en)

Applications Claiming Priority (22)

Application Number Priority Date Filing Date Title
GBGB1903844.7A GB201903844D0 (en) 2019-03-20 2019-03-20 A method of encoding and decoding a video
GBGB1904014.6A GB201904014D0 (en) 2019-03-23 2019-03-23 Video coding technology
GBGB1904492.4A GB201904492D0 (en) 2019-03-29 2019-03-29 Video coding technology
GBGB1905325.5A GB201905325D0 (en) 2019-04-15 2019-04-15 Video coding technology
GBGB1909701.3A GB201909701D0 (en) 2019-07-05 2019-07-05 Video coding technology
GBGB1909724.5A GB201909724D0 (en) 2019-07-06 2019-07-06 Video coding technology
GBGB1909997.7A GB201909997D0 (en) 2019-07-11 2019-07-11 Encapsulation structure
GBGB1910674.9A GB201910674D0 (en) 2019-07-25 2019-07-25 Video Coding Technology
GBGB1911467.7A GB201911467D0 (en) 2019-08-09 2019-08-09 Video coding technology
GBGB1911546.8A GB201911546D0 (en) 2019-08-13 2019-08-13 Video coding technology
GB201914215A GB201914215D0 (en) 2019-10-02 2019-10-02 Video coding technology
GB201914414A GB201914414D0 (en) 2019-10-06 2019-10-06 Video coding technology
GB201914634A GB201914634D0 (en) 2019-10-10 2019-10-10 Video coding technology
GB201915553A GB201915553D0 (en) 2019-10-25 2019-10-25 Video Coding technology
GBGB1916090.2A GB201916090D0 (en) 2019-11-05 2019-11-05 Video coding technology
GBGB1918099.1A GB201918099D0 (en) 2019-12-10 2019-12-10 Video coding technology
GBGB2000430.5A GB202000430D0 (en) 2020-01-12 2020-01-12 Video Coding technologies
GBGB2000483.4A GB202000483D0 (en) 2020-01-13 2020-01-13 Video coding technology
GBGB2000600.3A GB202000600D0 (en) 2020-01-15 2020-01-15 Video coding technology
GBGB2001408.0A GB202001408D0 (en) 2020-01-31 2020-01-31 Video coding technology
US202062984261P 2020-03-02 2020-03-02
GB2303563.7A GB2614983B (en) 2019-03-20 2020-03-18 Low complexity enhancement video coding

Publications (3)

Publication Number Publication Date
GB202312668D0 GB202312668D0 (en) 2023-10-04
GB2619186A true GB2619186A (en) 2023-11-29
GB2619186B GB2619186B (en) 2024-03-20

Family

ID=88189928

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2312668.3A Active GB2619186B (en) 2019-03-20 2020-03-18 Low complexity enhancement video coding

Country Status (1)

Country Link
GB (1) GB2619186B (en)


