GB2509966A - Encoding a video data stream using hardware acceleration - Google Patents

Encoding a video data stream using hardware acceleration

Info

Publication number
GB2509966A
GB2509966A (application GB1301003.8A)
Authority
GB
United Kingdom
Prior art keywords
encoder
decoder
base layer
tile
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1301003.8A
Other versions
GB201301003D0 (en)
GB2509966B (en)
Inventor
Steven B M Delputte
Kristof A J Denolf
Ronny R Dewaele
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Barco NV
Original Assignee
Barco NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Barco NV filed Critical Barco NV
Publication of GB201301003D0 publication Critical patent/GB201301003D0/en
Publication of GB2509966A publication Critical patent/GB2509966A/en
Application granted granted Critical
Publication of GB2509966B publication Critical patent/GB2509966B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An encoder for encoding an embedded bit stream from a video data source in which a base layer is generated to act as a transport container and in which additional information is streamed alongside each frame as an enhancement layer (such as in a scalable video encoder, SVC), with the encoder being adapted to use hardware acceleration. The encoder may utilise commercial off-the-shelf (COTS) hardware units to generate the base layer. Preferably the encoder can make use of existing streaming protocols over existing network infrastructure, which are often already optimised for streaming video applications using existing standards such as H.264/AVC or HEVC (High Efficiency Video Coding). Also disclosed is a decoder adapted to decode an output from the encoder and a network including the disclosed encoder and decoder.

Description

ENHANCED VIDEO CODEC
The present invention relates to networks, encoders and/or decoders for networks for the transmission of video images, as well as methods of operating such networks and devices and software and hardware for carrying out such methods.
Technical Background
In the move from dedicated video cabling towards network based content transmission, video compression is often needed. The current and/or upcoming H.264/AVC based and hardware accelerated solutions provide good enough quality to cover most video-only use cases.
Several use cases however require more accurate client-side representations of the original content (typically synthetic screen content: high quality remote desktop, control rooms (SCADA), medical synthetic content (computer reconstructed imagery), ...). There is a need for enhanced video coding technology.
SEI (Supplemental Enhancement Information) messages and VUI (Video Usability Information) messages are known from the H.264 standard and are used e.g. for SVC. SEI messages are an integral part of the H.264 standard; see Annex D for some predefined payloads. Some examples are buffering period, scene information, recovery point, picture timing, progressive refinement segment, full-frame snapshot (spare picture), post-filter hint, ... The user_data_unregistered payload type can hold any user specified content. User_data_registered contains user data registered as specified by ITU-T Rec. T.35. Enhancement layers as defined by Annex G (H.264/SVC) are also embedded in these SEI messages.
Summary of the present invention
Embodiments of the present invention provide methods or systems or components for such systems which are any of, or a combination of, hardware (HW) independent, low complexity, parallelizable and custom made for a suitable graphics processor (such as COTS CPUs, GPUs, FPGAs, SoCs or any combination thereof), as well as methods of operating such networks and devices and software and hardware for carrying out such methods.
The present invention relates in one aspect to an embedded bitstream and its applications. In particular, embodiments of the present invention relate to using a base layer as "transport container" and to streaming any possible information alongside each frame. Preferably reuse is made of existing (standardized) streaming protocols over existing network infrastructure, which often are already optimized for streaming video applications (e.g. H.264/AVC or HEVC, High Efficiency Video Coding).
Embodiments of the present invention not only involve encapsulation of information in a base layer, but also a dedicated codec for rather static synthetic screen content which is computed on the programmable compute cores of a HW platform (e.g. CPU, GPU, APU, FPGA, SoC, or any combination thereof). These platforms have support for HW accelerated encoding of video streams, using non-programmable dedicated HW cores, which generate standard compliant bitstreams.
These latter streams can have constant 30 or 60 fps, be standard compliant and low bandwidth, and be suitable for LAN breakout, ... They can be reused as well to facilitate computation of an enhancement layer, e.g. to improve quality of synthetic screen content, e.g. using inter-layer prediction techniques.
Embodiments of the present invention facilitate and/or improve encoding of synthetic screen content and streaming of metadata alongside the base layer, e.g. with composition information, for tile border artifact correction, data-dependent color transforms, ...
Embodiments of the present invention provide a so-called LAN breakout involving synthetic screen content using only a non-complex gateway. Thus, it can handle synthetic screen content within the LAN and provide a medium quality, high-frame-rate stream outside the LAN without complex decoding and re-encoding at the gateway.
Particular aspects of the present invention are:
- The re-use of dedicated hardware acceleration, e.g. available on COTS CPUs, APUs, GPUs, FPGAs, SoCs or any combination thereof, for the base layer in combination with a low complexity enhancement layer (EL) to be computed on the general purpose compute cores of those COTS platforms.
- A specific implementation of the enhancement layer codec (see the options A/B/C/D below).
- Scaling between colour domains (option D below), that is, combination in the same window of video and image content, thus corresponding to lower quality (for video) and higher quality (for image), which can be transmitted in one stream using the layered approach of embodiments of the present invention, still with COTS.
Brief Description of the drawings
Fig. 1 shows a screen scraped composition of a computer graphics application with video overlay or a web browser with a video insert.
Fig. 2 illustrates a Block-based per macro block (MB) computation of a prediction value, wherein the difference of each pixel with that prediction is calculated.
Fig. 3 shows a schematic diagram of an embodiment of the present invention.
Fig. 4 shows a schematic diagram of a further embodiment of the present invention.
Fig. 5 shows a schematic diagram of yet another embodiment of the present invention.
Fig. 6 shows a schematic diagram of still another embodiment of the present invention.
Fig. 7 shows a schematic diagram of a further embodiment of the present invention.
Fig. 8 shows an independent embodiment of the present invention having enhancement of the base layer by repurposing scalability tools, more specifically the spatial resolution.
Fig. 9 shows a schematic diagram of a further embodiment of the present invention.
Fig. 10 shows a schematic diagram of an embodiment of a network according to the present invention having at least one encoder and at least one decoder.
Fig. 11 shows a schematic diagram of a further embodiment of a network according to the present invention having at least one encoder and at least one decoder.
Detailed description
Hardware acceleration is the use of hardware to perform some function, e.g. video processing, without putting load on the CPU. Hardware acceleration allows concurrency, which obtains faster processing compared to software. In the context of this text, the components we refer to should be COTS, e.g. GPUs, CPUs, FPGAs, SoCs, APUs or any combination thereof. For video processing, the accelerating hardware can be fixed function blocks, i.e. dedicated processing cores on the chip that can only do video encoding.
Once embedded in the base layer, that information can travel a network alongside the frame it corresponds with. Benefit can be obtained from all the technologies that typically facilitate video streaming, such as time stamping and synchronization, buffering, NAT and firewall traversal, error correction, ... In H.264, this embedding is attained through the SEI (Supplemental Enhancement Information) messages (per frame) or the VUI (Video Usability Information) messages (per set of frames).
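As an illustration of how such embedding could look at the byte level, the following minimal sketch builds an H.264 user_data_unregistered SEI NAL unit (Annex B byte stream) around an arbitrary enhancement payload. The 16-byte UUID is a hypothetical application identifier chosen for this example; it is not defined by this document or by the standard.

```python
# Sketch: wrapping arbitrary enhancement-layer bytes in an H.264
# user_data_unregistered SEI message. EL_UUID is a placeholder.
EL_UUID = bytes(range(16))  # hypothetical 16-byte uuid_iso_iec_11578

def _sei_coded_number(value: int) -> bytes:
    """ff-byte coding used for SEI payloadType and payloadSize."""
    out = b""
    while value >= 255:
        out += b"\xff"
        value -= 255
    return out + bytes([value])

def _escape_emulation(rbsp: bytes) -> bytes:
    """Insert emulation_prevention_three_byte after any 0x00 0x00 run."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros == 2 and b <= 3:
            out.append(3)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def build_el_sei_nal(el_payload: bytes) -> bytes:
    payload = EL_UUID + el_payload             # user_data_unregistered body
    rbsp = (_sei_coded_number(5)               # payloadType 5 = user_data_unregistered
            + _sei_coded_number(len(payload))
            + payload
            + b"\x80")                         # rbsp_trailing_bits
    # 4-byte start code + SEI NAL header byte (nal_unit_type = 6)
    return b"\x00\x00\x00\x01\x06" + _escape_emulation(rbsp)
```

A standard compliant decoder that does not recognise the UUID simply skips this SEI message, which is what makes the base layer usable as a transport container.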
In the present application HEVC (High Efficiency Video Coding) can replace H.264/AVC coding as a possible base layer in each or any of the embodiments.
The information that can stream along includes, either alone or in any combination, but is not limited to:
* Enhancement layer (EL): high quality (standard or non-standard) encoded video frames (or regions of interest thereof),
* Metadata with respect to pre- and post-coding operations, such as
  o data-dependent color transforms
  o a decoder-side chroma upsampling filter matched to the encoder-side downsampling filter,
* Composition information,
* Metadata for correcting tile border artefacts.
Enhancement layer codec
Embodiments of the present invention provide a codec for use in network based systems and methods. The codec can facilitate cost efficient reuse of off-the-shelf (e.g. HW accelerated) encoders that are (or will be) provided on virtually all HW platforms, e.g. APU, GPU, DSP, FPGA or any combination thereof.
An advantage of the codec can be to increase the quality of the base layer (BL) as provided by the complementary HW through an enhancement layer (EL) in order to reach high video quality requirements. The EL codec can be easy to implement and can run on general purpose computing units of the HW mentioned above.
The base layer can be used in typical WAN video streaming (e.g. lossy, constant bit rate, for instance 1-10 Mbps). The enhancement layer can be used in a wide range of applications and networks, for instance with LAN video distribution, ranging from lossy compression with constant bit rate for optimal network transmission behavior, over visually lossless, up to mathematically lossless compression.
For this purpose, the base layer is not simply a transport container but also a means to improve the compression ratio of the (optionally near) lossless enhancement layer. Inter-layer prediction can be used in embodiments to decrease the overall data rate of the embedded bitstream w.r.t. simulcasting both streams.
The present invention also includes within its scope that in BL-EL combinations, DPCM coding of the original data (e.g. pixels) with the base layer can introduce new artificial high frequencies because of the quantized DCT coefficients of the base layer.
The encoded enhancement layer is preferably encapsulated in the base layer, e.g. in such a way that the embedded bit stream remains processable by any standard compliant decoder.
For example, in such a case, any such decoder can extract and decode the base layer by simply discarding the enhancement layer information. This embedded approach also allows for straightforward gateway functionality (cheap packet processing).
Embodiments of the present invention may find advantageous use for rather static high resolution synthetic screen content, in particular enhancing rather static high saliency regions.
Depending on the available data rate however, more dynamic regions of interest (e.g. natural light video, video games,...) can benefit from the quality enhancement layer as well.
With respect to latency: if reconstructed frames are not accessible through the vendor's API, an extra (HW accelerated) decoder step can be used. This added latency can be reduced by working with independent slices and/or by removing the inter-layer prediction (see option B), or by adding a buffer containing the previous reconstructed frame to be used for the inter-layer prediction of the current frame in the EL, such that the BL encoder and EL encoder can keep working in parallel instead of sequentially (cf. option D below).
For the encoding of the enhancement layer, several options are included within the scope of the present invention each of which is an embodiment of the present invention.
Remarks concerning all options:
* The first step is preferably a color transform to decrease inter-channel redundancy. A reversible transform should preferably be used when mathematical losslessness is required.
* Most options below are block based with as little data dependencies between blocks as possible. This allows for any or all of:
  o Parallelization for HW acceleration,
  o easy compositioning of 1 source to be spread over multiple monitors/display controllers/video streams/encoders,
  o region of interest (ROI) coding: each block can have its own quality priority for the encoder as well as the transmission. Blocks can be classified based on their
    * number of edges
    * number of colors
    * static versus dynamic nature
    * saliency / entropy
  o Periodic intra refresh (blocks are refreshed in a (e.g.) horizontally scrolling column, the 'refresh wave', cf. x264). Benefits:
    * New decoders can join mid-stream
    * Avoid too large residual drift in case of lossy compression of inter-frame prediction without an encoder side reconstruction loop.
    * Low latency, constant bit rate streaming: as opposed to the classic key frame method, more constant frame sizes become possible with intra refresh.
When hybrid content is transmitted (e.g. a screen scraped composition of a computer graphics application with video overlay or a web browser with a video insert, see for example Fig. 1), the boundaries of the static synthetic content typically won't coincide with block (or tile) borders. Quality differences can occur between the video pixels that were and were not enhanced. Moreover, the frame rate of the enhancement layer might be lower than that of the base layer. Especially for e.g. option B with its rather large blocks/tiles of 64 x 64 pixels these border artifacts could be disturbing. To hide them, the present invention includes within its scope to define and transmit in the SEI messages a binary mask (run length encoded (RLE) and only transmitted if changed w.r.t. the previous frame) that segments the frame in static versus dynamic regions up to the pixel level. This problem will occur however in only a few use cases with visibly lossy inserts embedded in lossless content.
This way, the decoder-side renderer can opt not to show lossless enhancements of those dynamic pixels for which there is only lossy base layer information available in the neighbouring blocks (tiles).
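A minimal sketch of such a run-length encoded static/dynamic mask follows; the (value, run) pair representation and the change test against the previous frame are illustrative choices, not a format prescribed by this document.

```python
import numpy as np

def rle_encode_mask(mask):
    """Run-length encode a binary static(0)/dynamic(1) mask, row-major.
    Returns a list of (value, run_length) pairs."""
    flat = mask.ravel()
    runs, start = [], 0
    for i in range(1, flat.size + 1):
        if i == flat.size or flat[i] != flat[start]:
            runs.append((int(flat[start]), i - start))
            start = i
    return runs

def mask_update_for_frame(mask, prev_mask):
    """Only emit the RLE mask when it changed w.r.t. the previous frame;
    otherwise the decoder keeps using its previous mask."""
    if prev_mask is not None and np.array_equal(mask, prev_mask):
        return None
    return rle_encode_mask(mask)
```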
Lossless compression conflicts with low latency, constant bit rate streaming. For synthetic content the frame rate could be lowered, only updating the display frame buffer when all (dirty) blocks have been decoded. This simple buffering mechanism of course increases the latency.
Inter-frame prediction in combination with lossy compression requires an encoder side reconstruction loop to avoid residual drift between encoder and decoder. Alternatively, if an intra-refresh mechanism is used with high enough periodicity and/or the quantization step size is small, this loop might be skipped with acceptable drift, thus reducing the encoder complexity.
Disturbing differences in quality in regions where users expect similar quality can e.g. occur when there is only limited bandwidth available: just enough to bring the whole frame to a decent enough video quality at constant high frame rate and enhance the static regions to lossless at lower frame rate.
With COTS encoders the quality of the static regions will increase through p-frames as well, at least if qp_min is set low enough, but typically never up to mathematically lossless, as this is not supported in COTS implementations.
Option A: lossless bit depth reduction (Data Packing) with possible extension to "near lossless".
This relates to a specific embodiment for enhancement layer (EL) coding. It relates to a bit packing technique to encode the enhancement layer (e.g. the residual after inter-layer prediction).
It is block-based: per macro block (MB), one prediction value is computed, then the difference of each pixel with that prediction is calculated. If the MB is small enough, this error will be small for each pixel. Therefore most significant bits (MSBs) will be zero. Using an appropriate transport protocol, these MSBs can be discarded from the bit stream, resulting in a losslessly compressed bitstream.
In Figure 2, one can see that most MBs (macroblocks) in the chroma channels (bottom) require only the transmission of 2-3 bits. The Y channel is less uniform and therefore requires more bits per macroblock. For this frame, one would need to transmit on average only 7.44 bits per pixel (instead of 24 bpp). Of course the prediction values also need to be transmitted, in this case increasing the overall average data rate to 8.96 bpp.
This principle can be used in a 'stand-alone' codec (the prediction value per MB is e.g. the MB median value (intra-frame prediction), resulting in a down scaled 'thumbnail' version of the whole frame that is transmitted raw and thus allows for fast previewing, see the figure above) as well as an EL codec. In the latter case the prediction value is chosen from the best prediction candidate:
* intra-frame (spatial prediction)
* inter-frame (temporal prediction, MV0 only)
* or inter-layer (error between input and base layer). (If the base layer is 4:2:0, then one should also take the diff with the inter-layer prediction of the previous frame, to compensate for the 4:2:0 versus 4:4:4 differences that remain non-zero even when the content became static.)
The best prediction value is the one with the lowest k (= number of bits required to represent the diff), where k is ceil(log2(max(diff(:)) - min(diff(:)))) (if the input is 8 bit, then k <= 9).
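The following sketch illustrates the per-MB packing just described. The helper names and the use of ceil(log2(range + 1)), which guarantees every offset residual fits in k bits, are assumptions made for the purpose of the example.

```python
import numpy as np

def residual_bits(mb, prediction):
    """k = bits needed per residual after offsetting by the minimum
    (k <= 9 for 8-bit input)."""
    diff = mb.astype(np.int16) - prediction
    rng = int(diff.max()) - int(diff.min())
    return int(np.ceil(np.log2(rng + 1))) if rng else 0

def pack_macroblock(mb, candidates):
    """Pick the prediction candidate with the lowest k, then emit
    (mode, prediction, minimum, k, offsets); a bit packer would write
    k bits per offset instead of the full 8."""
    mode, pred = min(candidates.items(),
                     key=lambda kv: residual_bits(mb, kv[1]))
    diff = mb.astype(np.int16) - pred
    lo = int(diff.min())
    return mode, pred, lo, residual_bits(mb, pred), (diff - lo).astype(np.uint16)

# Usage sketch (candidate values per MB, cf. the list above):
# candidates = {"intra": mb_median, "inter": prev_frame_value,
#               "inter-layer": base_layer_value}
```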
An embodiment of the present invention is shown in the flow diagram of Fig. 3 and results in Fig. 4.
This principle can be extended to "near lossless" compression by discarding one or more non-zero LSB bit planes. Dithering (and optimized color palettes) can be employed to reduce the potential visual impact (gradient banding). In combination with temporal prediction, an encoder side reconstruction loop will be needed to avoid residual drift in this lossy compression approach. This kind of 'quantization' imposes an upper bound on the absolute pixel error, which can be of interest for legal certification.
The transmission mode can be made "quality progressive" by transmitting bit plane per bit plane, starting with the MSB bit plane and subsequently transmitting less significant bit planes, progressively updating from visually lossless to mathematically lossless.
By combining the above in a packetized transport protocol, cheap transcoding can be achieved. Simply discarding LSB packets will transcode a high data rate lossless stream into a lower data rate lossy stream.
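A sketch of this quality-progressive packetization: one packet per bit plane, MSB plane first, so that a gateway can 'transcode' by simply dropping trailing packets. The packet framing and the number of retained planes in the usage note are illustrative.

```python
import numpy as np

def bitplane_packets(offsets, k):
    """One packet per bit plane of the non-negative k-bit offset
    residuals, most significant plane first."""
    packets = []
    for plane in range(k - 1, -1, -1):
        bits = (offsets >> plane) & 1
        packets.append(np.packbits(bits.ravel().astype(np.uint8)).tobytes())
    return packets

# Cheap gateway 'transcode': keep e.g. only the 5 most significant planes.
# lossy_packets = bitplane_packets(offsets, k)[:5]
```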
Option B: flexible, DWT-based codec with progressive transmission of dirty tiles
This embodiment relates to a flexible solution that allows full adaptation (in theory even on the fly) to specific technical market requirements through the following encoder & transmission control options.
Content adaptive: classification & prioritization of channels, tiles and subbands. Possible classification features are the same as above:
* Dirtyness: clean tiles don't need to be sent to the receiver and are skipped. The more pixels are different w.r.t. the previous frame, the dirtier a tile is.
* Age1: how long ago did a tile become clean
* Age2: how long ago has the tile last been updated (for intra-refresh purposes)
* Number of colors
* Number of edges
* Static versus dynamic nature (computed from statistics of the dirtyness of that tile gathered over time)
* Saliency / entropy
Network adaptive:
o Adjustable quantization step sizes per subband & channel
o Resolution progressive updating: select subbands for transmission (3 steps)
o Quality progressive updating: encode bit plane per bit plane (8 steps)
o Spatially progressive updating: classify, select and prioritize tiles for transmission in up to m x n steps, with m = width / 64 and n = height / 64.
o Temporal progressive updating: frame skipping or, finer grained, tile/block skipping
Ad hoc or heuristic rules eliminate the need for (computationally expensive) rate-distortion optimization. Packetization of the above progressions allows for cheap transcoding by simply dropping packets.
This second approach does not need to exploit inter-layer dependencies. That makes it able to function as a simulcast solution, where the base layer provides a high frame rate, standard compliant video stream and where the enhancement layer (e.g. embedded in the base layer using SEI messages) updates this to better quality (but possibly lower frame rate depending on the available frame rate and the content dirtiness). This approach has a much more flexible EL codec, which makes it easier to adapt behavior to a wide range of use case requirements.
Just as most currently available remote desktop protocols, option B uses dirty tiling and intra-frame prediction only. No inter-layer prediction means this is a simulcast solution, where the AVC base layer is only used as 'transport container' for network streaming of the EL.
A flow diagram of this embodiment is shown in Fig. 5.
- Dirty tile (aka block) detection, with tile/block size 64 x 64.
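A sketch of this dirty tile detection step, assuming H x W x C uint8 frames; the per-tile changed-pixel count doubles as the dirtyness classification feature mentioned above.

```python
import numpy as np

TILE = 64  # tile/block size used by option B

def dirty_tiles(frame, prev):
    """Yield (tile_y, tile_x, dirtyness) for 64 x 64 tiles that differ
    from the previous frame; clean tiles are skipped entirely."""
    h, w = frame.shape[:2]
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            a = frame[ty:ty + TILE, tx:tx + TILE]
            b = prev[ty:ty + TILE, tx:tx + TILE]
            changed = int(np.count_nonzero(np.any(a != b, axis=-1)))
            if changed:
                yield ty // TILE, tx // TILE, changed
```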
- Reversible DWT, e.g. 3 levels.
Most DWTs are not shift invariant. This can make it sub-optimal to do motion estimation in the wavelet domain. Consequently MV0 prediction of moving (e.g. scrolling) content is hampered as well.
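One concrete reversible DWT satisfying the requirement above is the LeGall 5/3 integer lifting transform (the lossless filter of JPEG 2000). A one-level 1-D sketch with simple symmetric boundary extension, assuming even-length input:

```python
import numpy as np

def legall53_forward(x):
    """One level of the reversible LeGall 5/3 integer lifting DWT.
    Integer arithmetic only, so exactly invertible."""
    x = x.astype(np.int32)
    even, odd = x[0::2], x[1::2]
    # predict: detail = odd - floor((left_even + right_even) / 2)
    right = np.append(even[1:], even[-1])        # symmetric extension
    d = odd - ((even + right) >> 1)
    # update: approx = even + floor((left_d + d + 2) / 4)
    left = np.insert(d[:-1], 0, d[0])            # symmetric extension
    s = even + ((left + d + 2) >> 2)
    return s, d

def legall53_inverse(s, d):
    left = np.insert(d[:-1], 0, d[0])
    even = s - ((left + d + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = d + ((even + right) >> 1)
    out = np.empty(s.size + d.size, dtype=np.int32)
    out[0::2], out[1::2] = even, odd
    return out
```

Applying the transform per row, then per column, and recursing on the approximation band gives the multi-level 2-D decomposition.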
- Quantization is optional: the encoder can operate in mathematically lossless mode as well as in a high compression ratio lossy mode.
- Entropy encoder:
* CAVLC = H.264 run length encoder; for more efficient compression in lossless mode, CAVLC can be adjusted for non-quantized coefficients
* CABAC = H.264 arithmetic encoder (HW acceleration accessible through e.g. the AMD AML API)
* RLGR = Run Length Golomb Rice encoder, cf. RemoteFX
* BLC3-alike transform + run length encoder
MUX here means merging the EL coded information with the base layer by adding that info to the SEI messages, thus creating a standard compliant stream decodable by any standard compliant decoder, but with quality enhancements if the decoder can also understand and decode the EL info in the SEI messages.
Some more advantages:
* No extra latency: EL and BL can be computed simultaneously.
* This EL codec can operate stand-alone:
  o Sub-frame latency is possible
  o The color transform can be done after the dirty tile detection to reduce computation load.
This codec can use similar coding tools as the MS RemoteFX codec from Microsoft, which is hardware accelerated on many GPUs, ASICs, ... RemoteFX is a codec plus, importantly, a transmission protocol intended for remote desktop connections.
An alternative, another option with a similar data flowchart (see Fig. 6), is dirty tiling + JPEG-LS. JPEG-LS uses line-based median prediction with adaptive prediction correction, followed by an entropy coder operating either in run length mode in flat regions or in context modeling + Golomb coding mode elsewhere. JPEG-LS can also operate in near-lossless mode (more complex). The near-lossless mode of JPEG-LS defines an upper boundary for the absolute pixel error. This could be useful for legal certification. Tile sizes can be smaller than in the DWT based approach.
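For reference, the JPEG-LS median predictor (the MED / LOCO-I edge detector) works as sketched below; the zero border handling in the residual loop is a simplification of the standard's actual boundary rules.

```python
import numpy as np

def med_predict(a, b, c):
    """JPEG-LS median edge detector: a = left, b = above,
    c = above-left neighbour of the current sample."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c

def jpegls_residuals(img):
    """Prediction residuals for one channel (simplified zero borders);
    run-length / Golomb entropy coding would follow in a full encoder."""
    img = img.astype(np.int32)
    res = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            a = int(img[y, x - 1]) if x else 0
            b = int(img[y - 1, x]) if y else 0
            c = int(img[y - 1, x - 1]) if x and y else 0
            res[y, x] = img[y, x] - med_predict(a, b, c)
    return res
```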
Option C: DPCM encoder: BLC4 = BLC3 + inter-layer prediction
A flow chart for this embodiment is shown in Fig. 7. The encoder side reconstruction loop can be omitted if periodic intra refresh is used and/or the quantization step sizes are small (see the remarks higher in this document concerning all EL codec options).
Standard compliancy
Option C can be implemented using H.264/AVC and SVC compliant tools.
Advantages:
- Possible hardware acceleration through e.g. the AMD AML API
- The bit stream can be standard compliant, allowing for playback on any standard compliant device that supports the required profile and level. Predictive lossless AVC compression however requires the not widely supported Hi444PP (High 4:4:4 Predictive) profile at high data throughputs.
An independent embodiment of the present invention is enhancing the 4:2:0 base layer to 4:4:4 by repurposing H.264 / SVC scalability tools, more specifically the spatial resolution upsampling filters. This independent embodiment is shown in the flow chart of Fig. 8.
In 4:2:0 mode, the base layer encoded Y-channel will have the full width W and height H, while the chroma channels will have only half that width and height. Using the SVC spatial upsampling filters, it is possible to upsample the chroma channels to the full width and height, then use these upscaled channel estimates for residual prediction, followed by entropy coding of these residuals to create the EL bitstream.
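A sketch of this residual prediction for one chroma channel; a nearest-neighbour upsampler stands in here for the actual SVC spatial upsampling filters, which are defined by the standard.

```python
import numpy as np

def upsample_chroma_nearest(c):
    """Upsample a half-resolution chroma plane to full W x H. A simple
    2x nearest-neighbour stand-in for the SVC upsampling filters."""
    return np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)

def chroma_el_residual(orig_444_c, bl_420_c):
    """EL residual for one chroma channel: original 4:4:4 plane minus the
    upsampled base-layer 4:2:0 reconstruction; entropy coding follows."""
    pred = upsample_chroma_nearest(bl_420_c).astype(np.int16)
    return orig_444_c.astype(np.int16) - pred
```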
Option D
Methods and systems have been described above that reuse a (e.g. non-programmable) HW accelerated video encoder (e.g. embedded in a HW platform such as CPU, GPU, FPGA, SoC (System on Chip) or any combination thereof) to generate a standard compliant base layer. An enhancement layer is computed using the programmable compute cores / capabilities of that hardware platform.
Fig. 9 shows a flowchart of a further embodiment of the present invention. The flowchart borrows from option B and shows how the reconstructed base layer frames can be exploited efficiently for the enhancement layer. In this embodiment there is efficient use of the non-programmable HW accelerated encoding of the base layer in conjunction with an enhancement layer encoder running on the programmable compute cores of that HW.
It is adapted for encoding rather static, computer generated synthetic screen content and to prioritize tiles for encoding and transmission. This makes the enhancement layer encoder adaptive.
Notes for Fig. 9:
- EL = enhancement layer
- BL = base layer
- MUX means merging the EL coded information with the base layer by adding that info to the SEI messages, thus creating a standard compliant stream decodable by any standard compliant decoder, but with quality enhancements if the decoder can also understand and decode the EL info in the SEI messages.
- The color transform is lossless if the enhancement layer should result in mathematically lossless quality.
- The "reconstructed frame buffer" is a "frame delayer with 1 frame". It is optionally used if the reconstructed reference frames are not directly accessible from the HW base layer encoder, but instead need to be computed by a HW decoder in a separate step following the base layer encoder. In the latter case, we can avoid 1 frame latency by using the previous reconstructed frame (stored in Buffer1) for the inter-layer prediction of the EL.
- Buffer1 is used to catch 4:2:0 versus 4:4:4 differences in case of static content. If the content is static for a certain time, the base layer will converge to a rather high quality but will always be 4:2:0. Thus there will always remain a difference with the 4:4:4 original frames. That residue, although static in time, will have a certain entropy and thus require a constant certain amount of bits after compression. By taking the difference between two of such residual frames in time, that difference will be all zeros and thus compress much better. (Note: this behaviour can also be obtained by the "tile classification and selection" block.)
- The EL chooses from inter-layer prediction, intra-frame prediction or inter-frame prediction based on the output of the residual encoding of each option. It chooses the option which results in the highest compression ratio.
-The "Tile classification and selection" block does encoder and network transmission control. For each tile a weight is computed. The tiles with highest weight get priority to be encoded by the EL codec and to be muxed in the final bitstream for transmission.
- The weight of each tile can depend on the following tile / block features:
  - Dirtyness: clean tiles don't need to be sent to the receiver and are skipped. The more pixels are different w.r.t. the previous frame, the dirtier a tile is.
  - Age1: how long ago did a tile become clean
  - Age2: how long ago has the tile last been updated (for intra-refresh purposes)
  - Number of colors
  - Number of edges
  - Static versus dynamic nature (computed from statistics of the dirtyness of that tile gathered over time)
  - Saliency / entropy.
- Tiles are pushed through the encoder and into the muxer for transmission for as long as the CPU load is not above a threshold and network bandwidth is still available. Other tiles of that frame are skipped (but their Age1 is increased if they were clean).
- The residual encoder can either operate in lossless mode or in lossy mode (with transform and quantization). In the latter case inter-frame prediction will cause residual drift (between encoder and decoder), which can be solved by in-loop encoder side reconstruction (inverse transform and quantization) or, if that extra encoder complexity is unwanted, can be kept under control by using an intra-refresh mechanism with high enough periodicity and/or small enough quantization step sizes.
- With reference to Fig. 1, also in this case with hybrid content (e.g. a screen scraped composition of a computer graphics application with video overlay or a web browser with a video insert), the boundaries of the static synthetic content typically won't coincide with block (or tile) borders. Quality differences can occur between the video pixels that were and were not enhanced. Moreover, the frame rate of the enhancement layer might be lower than that of the base layer. Especially when using larger block / tile sizes (e.g. 64 x 64) these border artifacts could be disturbing. To hide them, we define and transmit in the SEI messages a binary mask (RLE encoded and only transmitted if changed w.r.t. the previous frame) that segments the frame in static versus dynamic regions up to the pixel level.
This way, the decoder-side renderer can opt not to show lossless enhancements of those dynamic pixels for which there is only lossy base layer information available in the neighbouring blocks (tiles).
- Periodic intra-refresh is used (blocks / tiles are refreshed in a (e.g.) horizontally scrolling column). Benefits:
  - New receivers can join and start decoding mid-stream
  - Avoid too large residual drift in case of lossy compression of inter-frame prediction without an encoder side reconstruction loop.
  - Low latency, constant bit rate streaming: more constant frame sizes are possible when using periodic intra-refresh as opposed to the classic key frame method, where peaks occur in the bandwidth when an intra-only encoded frame is sent.
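A sketch of the tile classification and selection control referenced in the notes above. The particular weight formula, the per-tile field names and the byte-cost estimate are illustrative assumptions, not values given in this document.

```python
def schedule_tiles(tiles, cpu_load_ok, byte_budget):
    """Weight each tile, then encode/mux the highest-weight tiles while
    CPU and network budgets allow; remaining tiles are skipped."""
    def weight(t):
        return (4 * t["dirtyness"] + 2 * t["saliency"]
                + t["age2"]          # favour tiles overdue for intra refresh
                - 2 * t["dynamic"])  # rather static content first
    selected, spent = [], 0
    for t in sorted(tiles, key=weight, reverse=True):
        if not cpu_load_ok() or spent + t["est_bytes"] > byte_budget:
            if t["dirtyness"] == 0:
                t["age1"] += 1       # skipped clean tiles keep aging
            continue
        selected.append(t)
        spent += t["est_bytes"]
    return selected
```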
Metadata for pre- and post-coding operations
The following aspects can be used with any of the options of the present invention described above.
Content adaptive color transform
Decorrelation of the three color channels is preferably the first step in compression. It reduces statistical redundancy a lot. If the use case is well defined, e.g. surgical endoscopic video, and can be expected to primarily contain specific colors, the color transforms can be adapted to decompose these specific colors primarily into one channel.
Also, typical color transforms are not reversible. To assure mathematically lossless processing for the whole chain, we'll want to define our own specific reversible color transforms. These could be communicated between encoder and decoder through SEI/VUI messages.
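One suitable reversible transform is the integer RCT of JPEG 2000, sketched below; any transform communicated via SEI/VUI could replace it.

```python
import numpy as np

def rct_forward(r, g, b):
    """JPEG 2000 reversible color transform (RCT): integer-only RGB
    decorrelation, exactly invertible (arithmetic right shift = floor)."""
    r, g, b = (x.astype(np.int32) for x in (r, g, b))
    y = (r + 2 * g + b) >> 2
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b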
Chroma subsampling
The human eye has less resolving power for color than for luminance information. The first step in compression is therefore preferably reducing the resolution of the chroma channels, through a filtering operation. A two-tap filter kernel would mean a chroma sample location between two luma samples, whereas a three-tap filter kernel would mean that the chroma sample location is equal to one of the luma sample locations.
To avoid washed out colors and spatially shifted chroma channels, this pre- and post-coding operation needs to be matched.
A solution to this exists in H.264: this information can optionally be passed from encoder to decoder through the VUI messages. Figure E-1 in H.264 Annex E for instance describes how to "transform" chroma sample locations into a value of 0 to 5, to be passed in the VUI message (not SEI).
An improved solution for synthetic content, and especially colored shading patterns, is that more advanced / dedicated chroma up- and downsamplers can be used that preserve these patterns.
Algorithm and filter coefficients can be communicated dynamically (frame per frame) between encoder and decoder as "enhancement information" embedded in lossy standard encoded bitstreams.
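A sketch of such a matched pre-/post-coding filter pair in one dimension. The 3-tap [1 2 1]/4 downsampler (chroma co-sited with the even luma samples) and the linear-interpolation upsampler are example coefficients of the kind that could be signalled, not values mandated by any standard.

```python
import numpy as np

DOWN_TAPS = np.array([1.0, 2.0, 1.0]) / 4.0  # 3-tap: chroma co-sited with luma

def downsample_1d(x):
    """Filter, then keep every second sample (co-sited with even lumas)."""
    padded = np.pad(x.astype(np.float64), 1, mode="edge")
    filtered = np.convolve(padded, DOWN_TAPS, mode="valid")
    return filtered[0::2]

def upsample_1d(c, n):
    """Linear interpolation back to n samples at the matching (co-sited)
    positions; the decoder must use this to match the encoder's filter."""
    pos = np.arange(n) / 2.0  # full-resolution grid in chroma coordinates
    return np.interp(pos, np.arange(c.size), c)
```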
Metadata for dynamic composition
The following aspects can be used with any of the options of the present invention described above.
In applications where the video stream is to be shown as part of a composition on a larger display wall, the position, size, possible geometrical transform, etc. can be transmitted along with each frame.
This approach works when the composition is known at the encoder side.
Metadata for tile border artifact correction
The following aspects can be used with any of the options of the present invention described above.
In a tiled approach, with each tile being encoded/decoded by an independent encoder/decoder instance, the codec of this embodiment will allow for larger than typical video sizes (as commonly achievable with off-the-shelf encoders). In case of (strong) lossy compression, such an approach typically suffers from tile border artefacts:
- artificially introduced high frequencies, by simply cutting the broader picture in smaller parts
- in case of CBR coding, tiles with lower entropy can have significantly higher quality than neighbouring high entropy tiles.
We propose adding metadata to the embedded bitstream to enable each decoder instance to decode its designated tile and correct possible tile border artefacts, independently from other decoders that are designated to handle neighbouring tiles.
That metadata could either be a couple of compressed lines from the neighbouring tile or true metadata (coefficients, ...) to steer the border correction algorithm in the current tile.
This tile border artifact correction technique can be linked to sending along a binary mask to mark regions with significant difference in quality at artificial tile/block borders as opposed to less noticeable locations in the video frames.
Decoder
The present invention also includes a decoder, either alone or in combination with any embodiment of one or more encoders as described above, e.g. in a network. The decoder is preferably adapted to decode any of the outputs of the encoder of any embodiment of one or more encoders as described above, e.g. input received from a network. A decoder in accordance with an embodiment of the present invention is adapted for hardware accelerated decoding of the base layer. Preferably an embodiment of a decoder according to the present invention is adapted for decoding the enhancement layer using programmable cores of any of the above mentioned COTS platforms, e.g. COTS CPUs, GPUs, FPGAs, SoCs, ... or any combination thereof. At least one of the decoders of a network can be adapted to decode the base layer by discarding the enhancement layer information. At least one of the decoders of a network can be adapted to decode the base layer and the enhancement layer information.
Implementation
As described above, the embodiments of the present invention can be implemented at least in part using hardware but also software. The present invention also includes a decoder that inverts the methods described above with respect to the embodiments described. The present invention also includes a network having an encoder and/or a decoder according to any of the embodiments of the present invention.
Figure 10 gives a high level overview of an end-to-end system, comprising an encoder, a network and a decoder. Both encoder and decoder can use the HW accelerated fixed function blocks on the COTS HW platform (CPU, GPU, APU, FPGA, SoC, ... or any combination thereof) for encoding, resp. decoding the base layer. The enhancement layer can be processed on the general compute cores of those platforms. The final step, after decoding the base layer and the enhancement layer, reconstructs/recompositions the final image to be shown at the receiver end by taking parts from the base layer and parts from the enhancement layer. The latter optionally takes into account metadata embedded in the bitstream for this composition task.
Figure 10 also demonstrates how an economical gateway can be implemented for the LAN breakout. The gateway takes the embedded bitstream as input, does simple packet processing to discard those parts of the bitstream concerning the enhancement information and outputs a lower data rate bitstream containing only the base layer for transmission over lower bandwidth networks.
It is also shown how standard compliant decoders can take the embedded bitstream as an input and will be able to decode the base layer and simply skip over the enhancement layer information.
With reference to Figures 10 and 11, a network is shown schematically that can be a Local Area Network (LAN) or Wide Area Network (WAN), data network, etc. In the network there is at least one encoder 10 as described above with respect to any of the encoder embodiments of the invention. An encoder according to any of the embodiments of the present invention can be adapted to perform the encoder algorithm described above, preferably on compute cores of the available hardware, e.g. of the COTS CPUs, APUs, GPUs, FPGAs, SoCs or any combination thereof, as shown schematically in Fig. 11. The encoder 10 can include a standard compliant hardware encoder unit 11 for encoding according to a standard such as H.264/AVC or HEVC and an encoder unit 13 for providing the enhancement layer. The base layer and the enhancement layer are added in a multiplexer 15. Accordingly the present invention also includes a computer program product, e.g. an embedded computer program product, that is adapted to implement the codec of an encoder according to any or all of the embodiments when processed on one of the COTS CPUs, APUs, GPUs, FPGAs, SoCs or any combination thereof.
Output of the encoder will be a base layer 12 and an enhancement layer 14 that will be transmitted in frames 16 in a bitstream. A decoder 20, 30 receives the bitstream and displays the video. The decoder 20, 30 can have a variety of functionalities. For example the bitstream may be fed to a standard compliant decoder 24 where only the base layer is decoded. The video can be broken out to different devices such as a computer 26, a laptop 27, a tablet 28, e.g. on a cable or wireless network or a combination of both. Optionally a gateway 22 can be provided to also break out various video streams to be displayed on displays 26-28.
At least one decoder can also include an enhancement layer decoder unit 30. The decoder 30 includes a demultiplexer 32 which splits the bitstream into the base layer and the enhancement layer. The base layer is decoded with a standard compliant decoder 36, e.g. a hardware H.264/AVC or HEVC decoder. The output of this decoder 36 will be added to the output of an enhancement layer decoder unit 34 so as to display a better quality video image. The decoder unit 34 receives the enhancement layer bitstream. It is adapted to perform an inverse of the encoder algorithm described above, preferably on compute cores of the available hardware, e.g. of the COTS CPUs, APUs, GPUs, FPGAs, SoCs or any combination thereof, as shown schematically in Fig. 11. Accordingly the present invention also includes a computer program product, e.g. an embedded computer program product, that is adapted to implement the decodec when processed on one of the COTS CPUs, GPUs, APUs, FPGAs, SoCs or any combination thereof, the decodec being the inverse of a codec according to any or all of the embodiments of the encoder of the present invention.
In an alternative embodiment the decoder 20, 30 can receive its input bitstream from a storage device such as an optical disk (CD-ROM, DVD-ROM, etc.), a Random Access Memory (RAM), a magnetic tape, a magnetic disk (hard drive, diskette, etc.), or a solid state memory (USB memory, flash memory, etc.). The video may be compressed and/or encrypted on such a storage medium and the decoder 20, 30 may include a unit adapted to decompress and/or decrypt the video. Hence, in an alternative embodiment the decoder 20, 30 is adapted to receive its input bitstream from a storage device, e.g. has the appropriate reader. Hence the present invention also includes a storage medium for storing output from an encoder according to any of the embodiments of the present invention and storing the enhancement layer and the base layer. The storage may be compressed and/or encrypted. The storage device may be an optical disk (CD-ROM, DVD-ROM, etc.), a Random Access Memory (RAM), a magnetic tape, a magnetic disk (hard drive, diskette, etc.), or a solid state memory (USB memory, flash memory, etc.).

Claims (36)

  1. A decoder adapted to decode any of the outputs of an encoder of claims 7 to 32.
  2. A decoder of claim 1 that is adapted for hardware accelerated decoding of the base layer.
  3. A decoder of claim 2 that is adapted for decoding the enhancement layer using the programmable cores of a COTS platform.
  4. The decoder of claim 3 that is adapted for decoding on any of COTS CPUs, GPUs, FPGAs, APUs, SoCs, ... or any combination thereof.
  5. The decoder of any of the claims 1 to 4 adapted to decode the base layer by discarding the enhancement layer information.
  6. The decoder of any of the claims 1 to 5 adapted to receive the base layer and the enhancement layer from a storage medium.
  7. An encoder for generating an embedded bitstream from a video data source, wherein the bitstream comprises frames, the encoder having means for generating a base layer as transport container and to stream additional information alongside each frame as an enhancement layer, and being adapted to use hardware acceleration.
  8. The encoder of claim 7 wherein the hardware comprises any of COTS CPUs, GPUs, FPGAs, APUs, SoCs, or any combination thereof, adapted to generate the base layer.
  9. The encoder of claim 8 further comprising general purpose compute cores adapted to generate the enhancement layer to be computed on one of those COTS platforms.
  10. The encoder of claim 9 adapted for computing static synthetic screen content on programmable compute cores of a HW platform.
  11. The encoder of any of the claims 7 to 10 adapted so that embedding is attained through the SEI (Supplemental Enhancement Information) messages (per frame) or the VUI (Video Usability Information) messages (per set of frames).
  12. The encoder of any of the claims 7 to 11 wherein the information that can stream along includes, either alone or in any combination, but is not limited to: the enhancement layer, such as high quality (standard or non-standard) encoded video frames (or regions of interest thereof); metadata with respect to pre- and post-coding operations, such as data-dependent color transforms and a decoder-side chroma upsampling filter matched to the encoder-side downsampling filter; composition information; or metadata for correcting tile border artefacts.
  13. The encoder of any of the claims 7 to 12 wherein the base layer is not simply a transport container but also has means to improve the compression ratio of the (optionally near) lossless enhancement layer.
  14. The encoder of any of the claims 7 to 13 adapted to use inter-layer prediction to decrease the overall data rate of the embedded bitstream.
  15. The encoder of any of the claims 7 to 14 adapted so that the enhancement layer is encapsulated in the base layer.
  16. The encoder of any of the claims 7 to 15 further comprising means for color transformation to decrease inter-channel redundancy.
  17. The encoder of claim 16 wherein the transformation is a reversible transformation.
  18. The encoder according to any of the claims 7 to 17 adapted for block based processing of the data source.
  19. The encoder of any of the claims 7 to 18 adapted for any of: parallelization for HW acceleration; easy compositioning of 1 source to be spread over multiple monitors/display controllers/video streams/encoders; region of interest (ROI) coding, wherein each block can have its own quality priority for the encoder as well as the transmission, and blocks can be classified based on their number of edges, number of colors, static versus dynamic nature, or saliency / entropy; periodic intra refresh (blocks are refreshed in a (e.g.) horizontally scrolling column).
  20. Encoder according to any of the claims 7 to 19 adapted to encode the enhancement layer.
  21. Encoder according to any of the claims 7 to 20 wherein the base layer is a standard compliant video stream and the enhancement layer updates this to better quality.
  22. Encoder according to any of the claims 7 to 21 further comprising dirty tile detection.
  23. Encoder according to any of the claims 7 to 22 adapted for classification & prioritization of any of colour channels, tiles (spatial regions of interest) and wavelet subbands.
  24. Encoder according to claim 23 wherein classification features are any of: dirtyness, whereby clean tiles don't need to be sent to the receiver and are skipped, and whereby the more pixels are different w.r.t. the previous frame, the dirtier a tile is; Age1, how long ago a tile became clean; Age2, how long ago the tile was last updated, for intra-refresh purposes; number of colors; number of edges; static versus dynamic nature, optionally computed from statistics of the dirtyness of that tile gathered over time; saliency / entropy.
  25. Encoder of claim 24 further adapted for any of the following: adjustable quantization step sizes per subband & channel; resolution progressive updating, selecting subbands for transmission (n steps, where n is the number of wavelet decomposition levels); quality progressive updating, encoding bit plane per bit plane (n steps, where n is the bit depth of the content); spatially progressive updating, classifying, selecting and prioritizing tiles for compression and transmission in up to m x n steps, with m = width / tile_width and n = height / tile_height; temporal progressive updating, frame skipping or, finer grained, tile/block skipping.
  26. The encoder according to any of the claims 23 to 25 adapted for DWT-based coding with progressive transmission of dirty tiles.
  27. The encoder of claim 26 wherein the DWT is reversible.
  28. The encoder according to any of the claims 23 to 27 adapted for quantization.
  29. The encoder of claim 26 adapted to operate in mathematically lossless mode or a compressed lossy mode.
  30. The encoder of any of the claims 12 to 29 further comprising an entropy encoder.
  31. The encoder of any of the claims 7 to 30 adapted to transmit a binary mask that segments a frame in static versus dynamic regions up to the pixel level.
  32. The encoder of any of the claims 7 to 31 adapted for scaling between colour domains, that is, combination in the same window of video and image content.
  33. A network including an encoder according to any of the claims 7 to 32 or the decoder of any of claims 1 to 6.
  34. A storage medium storing an output from an encoder according to any of the claims 7 to 32.
  35. A computer program product that is adapted to implement the codec of an encoder according to any of the claims 7 to 32.
  36. A computer program product that is adapted to implement the decodec of a decoder according to any of the claims 1 to 6.
GB1301003.8A 2013-01-10 2013-01-21 Enhanced video codec Active GB2509966B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB1300410.6A GB201300410D0 (en) 2013-01-10 2013-01-10 Enhanced video codec

Publications (3)

Publication Number Publication Date
GB201301003D0 GB201301003D0 (en) 2013-03-06
GB2509966A true GB2509966A (en) 2014-07-23
GB2509966B GB2509966B (en) 2020-07-29

Family

ID=47757779

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB1300410.6A Ceased GB201300410D0 (en) 2013-01-10 2013-01-10 Enhanced video codec
GB1301003.8A Active GB2509966B (en) 2013-01-10 2013-01-21 Enhanced video codec

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB1300410.6A Ceased GB201300410D0 (en) 2013-01-10 2013-01-10 Enhanced video codec

Country Status (2)

Country Link
GB (2) GB201300410D0 (en)
HK (1) HK1200253A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412598A (en) * 2016-09-13 2017-02-15 中山大学 Wireless video transmission system based on CUDA display card coding and transmission method thereof
US10200707B2 (en) 2015-10-29 2019-02-05 Microsoft Technology Licensing, Llc Video bit stream decoding

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979772B (en) * 2021-02-24 2023-05-12 腾讯科技(深圳)有限公司 Decoder configuration method, decoder configuration device, medium and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110051804A1 (en) * 2009-08-31 2011-03-03 Cisco Technology, Inc. Multiple Description Coding With Spatial Shifting

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774676A (en) * 1995-10-03 1998-06-30 S3, Incorporated Method and apparatus for decompression of MPEG compressed data in a computer system
US20070230564A1 (en) * 2006-03-29 2007-10-04 Qualcomm Incorporated Video processing with scalability
US20120257675A1 (en) * 2011-04-11 2012-10-11 Vixs Systems, Inc. Scalable video codec encoder device and methods thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110051804A1 (en) * 2009-08-31 2011-03-03 Cisco Technology, Inc. Multiple Description Coding With Spatial Shifting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Husemann et al, "IEEE 54th International Midwest Symposium on Circuits and Systems", published 07/08/2011, IEEE, pages 1 - 4, "Reuse of Flexible Hardware Modules for Practical Implementation of Intra H.264/SVC Video Encoder" *

Also Published As

Publication number Publication date
HK1200253A1 (en) 2015-07-31
GB201301003D0 (en) 2013-03-06
GB201300410D0 (en) 2013-02-27
GB2509966B (en) 2020-07-29


Legal Events

Date Code Title Description
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1200253

Country of ref document: HK