OA21018A - Video encoder, video decoder, methods for encoding and decoding and video data stream for realizing advanced video coding concepts - Google Patents


Info

Publication number
OA21018A
Authority
OA
OAPI
Prior art keywords
video
data stream
video data
picture
picture buffer
Prior art date
Application number
OA1202200478
Inventor
De La Fuente Yago Sanchez
Karsten Sühring
Cornelius Hellge
Thomas Schierl
Robert Skupin
Thomas Wiegand
Original Assignee
Ge Video Compression, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ge Video Compression, Llc
Publication of OA21018A


Abstract

An apparatus (200) for receiving an input video data stream according to an embodiment is provided. The input video data stream has a video encoded thereinto. The apparatus (200) is configured to generate an output video data stream from the input video data stream.

Description

The present invention relates to video encoding and video decoding and, in particular, to a video encoder, to a video decoder, to methods for encoding and decoding and to a video data stream for realizing advanced video coding concepts.
H.265/HEVC (HEVC = High Efficiency Video Coding) is a video codec which already provides tools for elevating or even enabling parallel processing at an encoder and/or at a decoder. For example, HEVC supports a sub-division of pictures into an array of tiles which are encoded independently from each other. Another concept supported by HEVC pertains to WPP, according to which CTU-rows or CTU-lines of the pictures may be processed in parallel from left to right, e.g. in stripes, provided that some minimum CTU offset is obeyed in the processing of consecutive CTU lines (CTU = coding tree unit). It would be favorable, however, to have a video codec at hand which supports parallel processing capabilities of video encoders and/or video decoders even more efficiently.
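The minimum CTU offset between consecutive CTU lines can be illustrated with a small scheduling sketch. This is purely illustrative and not part of any standard text: `wavefront_schedule` and its assumption that every CTU costs one time unit are hypothetical.

```python
# Sketch: earliest start times under a WPP-style wavefront schedule.
# Assumes each CTU costs 1 time unit and that row r may only start CTU
# column c once row r-1 has finished the CTU `offset` positions ahead,
# i.e. a minimum offset of 2 CTUs between consecutive CTU lines.

def wavefront_schedule(rows, cols, offset=2):
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # must wait for the left neighbour in the same row
            after_left = start[r][c - 1] + 1 if c > 0 else 0
            # must wait for the CTU (offset - 1) columns ahead in the row above
            dep_col = min(c + offset - 1, cols - 1)
            after_above = start[r - 1][dep_col] + 1 if r > 0 else 0
            start[r][c] = max(after_left, after_above)
    return start

s = wavefront_schedule(3, 5)
print(s[0])  # first row proceeds without waiting: [0, 1, 2, 3, 4]
print(s[1])  # second row trails by the minimum CTU offset: [2, 3, 4, 5, 6]
```

Each row starts two time units after the row above it, which is exactly the wavefront pattern that lets all rows be decoded in parallel.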
In the following, an introduction to VCL partitioning according to the state-of-the-art is described (VCL = video coding layer).
Typically, in video coding, a coding process of picture samples requires smaller partitions, where samples are divided into some rectangular areas for joint processing such as prediction or transform coding. Therefore, a picture is partitioned into blocks of a particular size that is constant during encoding of the video sequence. In the H.264/AVC standard, fixed-size blocks of 16x16 samples, so-called macroblocks, are used (AVC = Advanced Video Coding).
In the state-of-the-art HEVC standard (see [1]), there are Coded Tree Blocks (CTB) or Coding Tree Units (CTU) of a maximum size of 64 x 64 samples. In the further description of HEVC, for such blocks, the more common term CTU is used.
CTUs are processed in raster scan order, starting with the top-left CTU, processing CTUs in the picture line-wise, down to the bottom-right CTU.
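The raster-scan traversal described above can be sketched as follows; `ctu_raster_scan` is a hypothetical helper, assuming the maximum CTU size of 64 samples mentioned earlier.

```python
# Sketch: CTU raster-scan traversal of a picture, top-left to bottom-right,
# line by line. Picture dimensions that are not multiples of the CTU size
# yield partial CTUs at the right and bottom borders, hence the ceiling.
def ctu_raster_scan(pic_width, pic_height, ctu_size=64):
    ctus_x = -(-pic_width // ctu_size)   # ceiling division
    ctus_y = -(-pic_height // ctu_size)
    return [(y, x) for y in range(ctus_y) for x in range(ctus_x)]

order = ctu_raster_scan(1920, 1080)
print(order[0])    # top-left CTU: (0, 0)
print(order[-1])   # bottom-right CTU: (16, 29)
```

For a 1920x1080 picture this yields a 30x17 CTU grid, with the last CTU row only partially covered by picture samples.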
The coded CTU data is organized into a kind of container called slice. Originally, in former video coding standards, slice means a segment comprising one or more consecutive CTUs of a picture. Slices are employed for a segmentation of coded data. From another point of view, the complete picture can also be defined as one big segment and hence, historically, the term slice is still applied. Besides the coded picture samples, slices also comprise additional information related to the coding process of the slice itself which is placed into a so-called slice header.
According to the state-of-the-art, a VCL (video coding layer) also comprises techniques for fragmentation and spatial partitioning. Such partitioning may, e.g., be applied in video coding for various reasons, among which are processing load-balancing in parallelization, MTU size matching in network transmission, error mitigation, etc.
Other examples relate to RoI (RoI = Region of Interest) encodings, where there is for example a region in the middle of the picture that viewers can select e.g. with a zoom-in operation (decoding only the RoI), or gradual decoder refresh (GDR), in which intra data (that is typically put into one frame of a video sequence) is temporally distributed over several successive frames, e.g. as a column of intra blocks that sweeps over the picture plane and resets the temporal prediction chain locally in the same fashion as an intra picture does it for the whole picture plane. For the latter, two regions exist in each picture, one that has recently been reset and one that is potentially affected by errors and error propagation.
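The GDR sweep described above can be sketched with a small illustration; `refreshed_columns` and the uniform column-per-frame sweep are hypothetical simplifications, not taken from any standard.

```python
# Sketch: a gradual decoder refresh (GDR) sweep. Over `period` frames, a
# column of intra blocks sweeps left to right across the picture; after a
# full period, every block column has been reset at least once.
def refreshed_columns(frame_idx, num_cols, period):
    cols_per_frame = num_cols / period
    pos = frame_idx % period
    end = int((pos + 1) * cols_per_frame)
    return list(range(end))  # block columns already reset at this frame

print(refreshed_columns(0, 8, 4))  # start of the sweep: [0, 1]
print(refreshed_columns(3, 8, 4))  # end of the period: all 8 columns reset
```

The returned list is the "recently reset" region; the remaining columns form the region still potentially affected by errors and error propagation.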
Reference Picture Resampling (RPR) is a technique used in video coding to adapt the quality/rate of the video not only by using a coarser quantization parameter but by adapting the resolution of potentially each transmitted picture. Thus, references used for inter prediction might have a different size than the picture that is currently being predicted for encoding. Basically, RPR requires a resampling process in the prediction loop, e.g., upsampling and downsampling filters to be defined.
Depending on flavor, RPR can result in a change of coded picture size at any picture, or be limited to happen only at some particular pictures, e.g. only at particular positions bound, for instance, to segment boundaries in adaptive HTTP streaming.
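The in-loop resampling that RPR requires can be sketched as below. Real codecs define specific upsampling and downsampling filter banks; this hypothetical `resample` uses nearest-neighbour sampling purely to illustrate how a reference of one size is mapped onto the grid of a differently sized current picture.

```python
# Sketch: nearest-neighbour resampling of a reference picture so that it
# matches the size of the picture currently being predicted (RPR-style).
def resample(ref, out_w, out_h):
    in_h, in_w = len(ref), len(ref[0])
    return [[ref[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]

ref = [[0, 1], [2, 3]]       # tiny 2x2 "reference picture"
up = resample(ref, 4, 4)     # upsampled to 4x4 for prediction
print(up[0])                 # [0, 0, 1, 1]
print(up[2])                 # [2, 2, 3, 3]
```

Downsampling works with the same mapping, only with `out_w`/`out_h` smaller than the reference dimensions.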
The object of the present invention is to provide improved concepts for video encoding and video decoding.
The object of the present invention is solved by the subject-matter of the independent claims.
In accordance with a first aspect of the invention, an apparatus for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus is configured to generate an output video data stream from the input video data stream. Moreover, the apparatus is to determine whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, a video data stream is provided. The video data stream has a video encoded thereinto. The video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
Furthermore, a video encoder is provided. The video encoder is configured to encode a video into a video data stream. Moreover, the video encoder is configured to generate the video data stream such that the video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, a video decoder for receiving a video data stream having a video stored therein is provided. The video decoder is configured to decode the video from the video data stream. The video decoder is configured to decode the video depending on an indication indicating whether a picture of the video preceding a dependent random access picture shall be output or not.
Furthermore, a method for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The method comprises generating an output video data stream from the input video data stream. Moreover, the method comprises determining whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, a method for encoding a video into a video data stream is provided. The method comprises generating the video data stream such that the video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
Furthermore, a method for receiving a video data stream having a video stored therein is provided. The method comprises decoding the video from the video data stream. Decoding the video is conducted depending on an indication indicating whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, computer programs for implementing one of the above-described methods when being executed on a computer or signal processor are provided.
In accordance with a second aspect of the invention, an apparatus for receiving one or more input video data streams is provided. Each of the one or more input video data streams has an input video encoded thereinto. The apparatus is configured to generate an output video data stream from the one or more input video data streams, the output video data stream encoding an output video, wherein the apparatus is configured to generate the output video data stream such that the output video is the input video being encoded within one of the one or more input video data streams, or such that the output video depends on the input video of at least one of the one or more input video data streams. Moreover, the apparatus is configured to determine an access unit removal time of a current picture of a plurality of pictures of the output video from a coded picture buffer. The apparatus is configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
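A removal-time computation of this shape can be sketched as below. The function and its parameter names are illustrative, loosely modelled on hypothetical-reference-decoder-style timing: an access unit's removal time is an anchor time plus a removal delay in clock ticks, optionally reduced by a delay offset when the apparatus decides to use one.

```python
# Sketch: access unit removal time from the coded picture buffer (CPB),
# with an optional CPB delay offset. All names are illustrative.
def au_removal_time(anchor_time, cpb_removal_delay, clock_tick,
                    delay_offset=0, use_offset=False):
    # the offset, when enabled, reduces the effective removal delay
    delay = cpb_removal_delay - (delay_offset if use_offset else 0)
    return anchor_time + clock_tick * delay

tick = 1 / 90000                       # 90 kHz clock, a common choice
t_plain = au_removal_time(0.0, 10, tick)
t_offset = au_removal_time(0.0, 10, tick, delay_offset=4, use_offset=True)
print(t_plain, t_offset)
```

The decision "whether or not to use coded picture buffer delay offset information" then amounts to choosing the value of `use_offset` per picture or per buffering period.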
Furthermore, a video data stream is provided. The video data stream has a video encoded thereinto. The video data stream comprises coded picture buffer delay offset information.
Furthermore, a video decoder for receiving a video data stream having a video stored therein is provided. The video decoder is configured to decode the video from the video data stream. Moreover, the video decoder is configured to decode the video depending on an access unit removal time of a current picture of a plurality of pictures of the video from a coded picture buffer. The video decoder is configured to decode the video depending on an indication indicating whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
Moreover, a method for receiving one or more input video data streams is provided. Each of the one or more input video data streams has an input video encoded thereinto. The method comprises generating an output video data stream from the one or more input video data streams, the output video data stream encoding an output video, wherein generating the output video data stream is conducted such that the output video is the input video being encoded within one of the one or more input video data streams, or such that the output video depends on the input video of at least one of the one or more input video data streams. Moreover, the method comprises determining an access unit removal time of a current picture of a plurality of pictures of the output video from a coded picture buffer. Furthermore, the method comprises determining whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
Furthermore, a method for encoding a video into a video data stream according to an embodiment is provided. The method comprises generating the video data stream such that the video data stream comprises coded picture buffer delay offset information.
Moreover, a method for receiving a video data stream having a video stored therein is provided. The method comprises decoding the video from the video data stream. Decoding the video is conducted depending on an access unit removal time of a current picture of a plurality of pictures of the video from a coded picture buffer. Moreover, decoding the video is conducted depending on an indication indicating whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
Furthermore, computer programs for implementing one of the above-described methods when being executed on a computer or signal processor are provided.
In accordance with a third aspect of the invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
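The property that this indication promises can be sketched with a small check; `sums_constant` is a hypothetical helper and the numeric values are arbitrary example ticks, not taken from any stream.

```python
# Sketch: checking that initial_cpb_removal_delay + initial_cpb_removal_offset
# stays constant across buffering periods. A splicer may rebalance delay and
# offset per period, but their sum staying fixed keeps overall timing stable.
def sums_constant(buffering_periods):
    sums = {delay + offset for delay, offset in buffering_periods}
    return len(sums) == 1

periods = [(90000, 45000), (72000, 63000), (60000, 75000)]
print(sums_constant(periods))             # every sum is 135000 -> True
print(sums_constant(periods + [(1, 1)]))  # a deviating period -> False
```

When the indication asserts constancy, a receiver can rely on this invariant when concatenating streams without re-deriving the timing of every period.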
Furthermore, a video encoder is provided. The video encoder is configured to encode a video into a video data stream. Moreover, the video encoder is configured to generate the video data stream such that the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video encoder is configured to generate the video data stream such that the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video encoder is configured to generate the video data stream such that the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
Moreover, an apparatus for receiving two input video data streams, being a first input video data stream and a second input video data stream, is provided. Each of the two input video data streams has an input video encoded thereinto. The apparatus is configured to generate an output video data stream from the two input video data streams, the output video data stream encoding an output video, wherein the apparatus is configured to generate an output video data stream by concatenating the first input video data stream and the second input video data stream. Moreover, the apparatus is configured to generate the output video data stream such that the output video data stream comprises an initial coded picture buffer removal delay. Furthermore, the apparatus is configured to generate the output video data stream such that the output video data stream comprises an initial coded picture buffer removal offset. Moreover, the apparatus is configured to generate the output video data stream such that the output video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
Furthermore, a video decoder for receiving a video data stream having a video stored therein is provided. The video decoder is configured to decode the video from the video data stream. Moreover, the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods. Furthermore, the video decoder is configured to decode the video depending on the information that indicates whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods.
Moreover, a method for encoding a video into a video data stream is provided. The method comprises generating the video data stream such that the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the method comprises generating the video data stream such that the video data stream comprises an initial coded picture buffer removal offset. Moreover, the method comprises generating the video data stream such that the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
Furthermore, a method for receiving two input video data streams, being a first input video data stream and a second input video data stream, is provided. Each of the two input video data streams has an input video encoded thereinto. The method comprises generating an output video data stream from the two input video data streams, the output video data stream encoding an output video, wherein the output video data stream is generated by concatenating the first input video data stream and the second input video data stream. Moreover, the method comprises generating the output video data stream such that the output video data stream comprises an initial coded picture buffer removal delay. Furthermore, the method comprises generating the output video data stream such that the output video data stream comprises an initial coded picture buffer removal offset. Moreover, the method comprises generating the output video data stream such that the output video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
Moreover, a method for receiving a video data stream having a video stored therein is provided. The method comprises decoding the video from the video data stream. The video data stream comprises an initial coded picture buffer removal delay. Moreover, the video data stream comprises an initial coded picture buffer removal offset. Furthermore, the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods. The method comprises decoding the video depending on the information that indicates whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods.
Furthermore, computer programs for implementing one of the above-described methods when being executed on a computer or signal processor are provided.
In accordance with a fourth aspect of the invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Moreover, a video encoder is provided. The video encoder is configured to encode a video into a video data stream. Moreover, the video encoder is configured to generate the video data stream such that the video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Furthermore, an apparatus for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus is configured to generate a processed video data stream from the input video data stream. Moreover, the apparatus is configured to generate the processed video data stream such that the processed video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the processed video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Moreover, a video decoder for receiving a video data stream having a video stored therein is provided. The video decoder is configured to decode the video from the video data stream. The video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Furthermore, a method for encoding a video into a video data stream is provided. The method comprises generating the video data stream such that the video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Moreover, a method for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The method comprises generating a processed video data stream from the input video data stream. Moreover, the method comprises generating the processed video data stream such that the processed video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the processed video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Furthermore, a method for receiving a video data stream having a video stored therein is provided. The method comprises decoding the video from the video data stream. The video data stream comprises an indication indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
Moreover, computer programs for implementing one of the above-described methods when being executed on a computer or signal processor are provided.
!n accordance with a fifth aspect of the invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises one or more scalable nested supplémentai enhancement information messages. The one or more scalable nested supplémentai enhancement information messages comprise a plurality of syntax éléments. Each syntax element of one or more syntax éléments of the plurality of syntax éléments is defined to hâve a same size in every one of the scalable nested supplémentai enhancement information messages of the video data stream or of a portion of the video data stream.
Moreover, a video encoder is provided. The video encoder is configured to encode a video into a video data stream. Moreover, the video encoder is configured to generate the video data stream such that the video data stream comprises one or more scalable nested supplémentai enhancement information messages. Furthermore, the video encoder is configured to generate the video data stream such that the one or more scalable nested supplémentai enhancement information messages comprise a plurality of syntax éléments. Moreover, the video encoder is configured to generate the video data stream such that each syntax element of one or more syntax éléments of the plurality of syntax éléments is defined to hâve a same size in every one of the scalable nested supplémentai enhancement information messages of the video data stream or of a portion of the video data stream.
Furthermore, an apparatus for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus is configured to generate an output video data stream from the input video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. The apparatus is configured to process the one or more scalable nested supplemental enhancement information messages.

Moreover, a video decoder for receiving a video data stream having a video stored therein is provided. The video decoder is configured to decode the video from the video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. The video decoder is configured to decode the video depending on the one or more syntax elements of the plurality of syntax elements.

Furthermore, a method for encoding a video into a video data stream is provided. The method comprises generating the video data stream such that the video data stream comprises one or more scalable nested supplemental enhancement information messages. Moreover, the method comprises generating the video data stream such that the one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Furthermore, the method comprises generating the video data stream such that each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream.

Moreover, a method for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The method comprises generating an output video data stream from the input video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. The method comprises processing the one or more scalable nested supplemental enhancement information messages.

Furthermore, a method for receiving a video data stream having a video stored therein is provided. The method comprises decoding the video from the video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. Decoding the video is conducted depending on the one or more syntax elements of the plurality of syntax elements.
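Merely as an illustrative sketch (not forming part of the claimed subject matter), the effect of the fixed element size can be shown in a few lines of Python. The two-byte big-endian element layout and the function name are assumptions made for this example only; the point is that a stream processor can locate the n-th syntax element of every scalable nested supplemental enhancement information message by a simple offset computation, without first parsing variable-length fields:

```python
import struct

# Assumption for this sketch: every syntax element occupies exactly 2 bytes.
ELEMENT_SIZE = 2

def parse_nested_sei(payload: bytes) -> list[int]:
    """Split a nested SEI payload into equally sized syntax elements.

    Because each element has the same size in every message, the n-th
    element sits at offset n * ELEMENT_SIZE and can be read or rewritten
    without decoding anything that precedes it.
    """
    if len(payload) % ELEMENT_SIZE != 0:
        raise ValueError("payload is not a whole number of syntax elements")
    return [
        struct.unpack_from(">H", payload, off)[0]
        for off in range(0, len(payload), ELEMENT_SIZE)
    ]

payload = bytes([0x00, 0x01, 0x00, 0x02, 0x12, 0x34])
print(parse_nested_sei(payload))  # [1, 2, 4660]
```

An intermediate device can thus patch a single element in place, which is precisely what a variable element size would prevent.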
Moreover, computer programs for implementing one of the above-described methods when being executed on a computer or signal processor are provided.
Preferred embodiments are provided in the dependent claims.
In the following, embodiments of the present invention are described in detail with reference to the figures, in which:
Fig. 1 illustrates a video encoder for encoding a video into a video data stream according to an embodiment.

Fig. 2 illustrates an apparatus for receiving an input video data stream according to an embodiment.

Fig. 3 illustrates a video decoder for receiving a video data stream having a video stored therein according to an embodiment.

Fig. 4 illustrates an original bitstream (depicted at the top of Fig. 4), and a bitstream after dropping pictures (depicted at the bottom of Fig. 4) according to an embodiment.

Fig. 5 illustrates a splicing of two bitstreams after pictures have been dropped from the one of the two bitstreams according to an embodiment.

Fig. 6 illustrates a splicing of two bitstreams according to another embodiment.

Fig. 7 illustrates two sets of HRD SEIs, scalable nested SEIs and non-scalable nested SEIs, in a two-layer bitstream according to an embodiment.

Fig. 8 illustrates a video encoder.

Fig. 9 illustrates a video decoder.

Fig. 10 illustrates the relationship between the reconstructed signal, e.g., the reconstructed picture, on the one hand, and the combination of the prediction residual signal as signaled in the data stream, and the prediction signal, on the other hand.
The following description of the figures starts with a presentation of a description of an encoder and a decoder of a block-based predictive codec for coding pictures of a video in order to form an example for a coding framework into which embodiments of the present invention may be built in. The respective encoder and decoder are described with respect to Fig. 8 to Fig. 10. Thereinafter the description of embodiments of the concept of the present invention is presented along with a description as to how such concepts could be built into the encoder and decoder of Fig. 8 and Fig. 9, respectively, although the embodiments described with Fig. 1 to Fig. 3 and following, may also be used to form encoders and decoders not operating according to the coding framework underlying the encoder and decoder of Fig. 8 and Fig. 9.
Fig. 8 shows a video encoder, an apparatus for predictively coding a picture 12 into a data stream 14 exemplarily using transform-based residual coding. The apparatus, or encoder, is indicated using reference sign 10. Fig. 9 shows a corresponding video decoder 20, e.g., an apparatus 20 configured to predictively decode the picture 12’ from the data stream 14 also using transform-based residual decoding, wherein the apostrophe has been used to indicate that the picture 12’ as reconstructed by the decoder 20 deviates from picture 12 originally encoded by apparatus 10 in terms of coding loss introduced by a quantization of the prediction residual signal. Fig. 8 and Fig. 9 exemplarily use transform-based prediction residual coding, although embodiments of the present application are not restricted to this kind of prediction residual coding. This is true for other details described with respect to Fig. 8 and Fig. 9, too, as will be outlined hereinafter.
The encoder 10 is configured to subject the prediction residual signal to a spatial-to-spectral transformation and to encode the prediction residual signal, thus obtained, into the data stream 14. Likewise, the decoder 20 is configured to decode the prediction residual signal from the data stream 14 and subject the prediction residual signal thus obtained to a spectral-to-spatial transformation.
Internally, the encoder 10 may comprise a prediction residual signal former 22 which generates a prediction residual 24 so as to measure a deviation of a prediction signal 26 from the original signal, e.g., from the picture 12. The prediction residual signal former 22 may, for instance, be a subtractor which subtracts the prediction signal from the original signal, e.g., from the picture 12. The encoder 10 then further comprises a transformer 28 which subjects the prediction residual signal 24 to a spatial-to-spectral transformation to obtain a spectral-domain prediction residual signal 24’ which is then subject to quantization by a quantizer 32, also comprised by the encoder 10. The thus quantized prediction residual signal 24” is coded into bitstream 14. To this end, encoder 10 may optionally comprise an entropy coder 34 which entropy codes the prediction residual signal as transformed and quantized into data stream 14. The prediction signal 26 is generated by a prediction stage 36 of encoder 10 on the basis of the prediction residual signal 24” encoded into, and decodable from, data stream 14. To this end, the prediction stage 36 may internally, as is shown in Fig. 8, comprise a dequantizer 38 which dequantizes prediction residual signal 24” so as to gain spectral-domain prediction residual signal 24”’, which corresponds to signal 24’ except for quantization loss, followed by an inverse transformer 40 which subjects the latter prediction residual signal 24”’ to an inverse transformation, e.g., a spectral-to-spatial transformation, to obtain prediction residual signal 24””, which corresponds to the original prediction residual signal 24 except for quantization loss. A combiner 42 of the prediction stage 36 then recombines, such as by addition, the prediction signal 26 and the prediction residual signal 24”” so as to obtain a reconstructed signal 46, e.g., a reconstruction of the original signal 12. Reconstructed signal 46 may correspond to signal 12’.
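Merely as an illustrative sketch (not the claimed apparatus), the chain of residual former 22, transformer 28 and quantizer 32, mirrored by dequantizer 38, inverse transformer 40 and combiner 42, can be modeled in a few lines. The identity transform used as a placeholder and the quantization step size are assumptions of this example:

```python
def encode_block(original, prediction, qstep, fwd=lambda r: r):
    """Residual former (22) -> transformer (28) -> quantizer (32)."""
    residual = [o - p for o, p in zip(original, prediction)]  # signal 24
    coeffs = fwd(residual)                                    # signal 24'
    return [round(c / qstep) for c in coeffs]                 # signal 24''

def reconstruct_block(levels, prediction, qstep, inv=lambda c: c):
    """Dequantizer (38) -> inverse transformer (40) -> combiner (42)."""
    coeffs = [lv * qstep for lv in levels]                    # signal 24'''
    residual = inv(coeffs)                                    # signal 24''''
    return [p + r for p, r in zip(prediction, residual)]      # signal 46

original = [10, 12, 11, 9]
prediction = [10, 10, 10, 10]
levels = encode_block(original, prediction, qstep=1)
print(reconstruct_block(levels, prediction, qstep=1))  # [10, 12, 11, 9]
```

With a coarser quantization step the reconstruction deviates from the original, which is exactly the coding loss that the apostrophes in Fig. 8 and Fig. 9 denote.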
A prediction module 44 of prediction stage 36 then generates the prediction signal 26 on the basis of signal 46 by using, for instance, spatial prediction, e.g., intra-picture prediction, and/or temporal prediction, e.g., inter-picture prediction.
Likewise, decoder 20, as shown in Fig. 9, may be internally composed of components corresponding to, and interconnected in a manner corresponding to, prediction stage 36. In particular, entropy decoder 50 of decoder 20 may entropy decode the quantized spectral-domain prediction residual signal 24” from the data stream, whereupon dequantizer 52, inverse transformer 54, combiner 56 and prediction module 58, interconnected and cooperating in the manner described above with respect to the modules of prediction stage 36, recover the reconstructed signal on the basis of prediction residual signal 24” so that, as shown in Fig. 9, the output of combiner 56 results in the reconstructed signal, namely picture 12’.
Although not specifically described above, it is readily clear that the encoder 10 may set some coding parameters including, for instance, prediction modes, motion parameters and the like, according to some optimization scheme such as, for instance, in a manner optimizing some rate and distortion related criterion, e.g., coding cost. For example, encoder 10 and decoder 20 and the corresponding modules 44, 58, respectively, may support different prediction modes such as intra-coding modes and inter-coding modes. The granularity at which encoder and decoder switch between these prediction mode types may correspond to a subdivision of picture 12 and 12’, respectively, into coding segments or coding blocks. In units of these coding segments, for instance, the picture may be subdivided into blocks being intra-coded and blocks being inter-coded. Intra-coded blocks are predicted on the basis of a spatial, already coded/decoded neighborhood of the respective block as is outlined in more detail below. Several intra-coding modes may exist and be selected for a respective intra-coded segment including directional or angular intra-coding modes according to which the respective segment is filled by extrapolating the sample values of the neighborhood along a certain direction which is specific for the respective directional intra-coding mode, into the respective intra-coded segment.
The intra-coding modes may, for instance, also comprise one or more further modes such as a DC coding mode, according to which the prediction for the respective intra-coded block assigns a DC value to all samples within the respective intra-coded segment, and/or a planar intra-coding mode according to which the prediction of the respective block is approximated or determined to be a spatial distribution of sample values described by a two-dimensional linear function over the sample positions of the respective intra-coded block with deriving tilt and offset of the plane defined by the two-dimensional linear function on the basis of the neighboring samples. Compared thereto, inter-coded blocks may be predicted, for instance, temporally. For inter-coded blocks, motion vectors may be signaled within the data stream, the motion vectors indicating the spatial displacement of the portion of a previously coded picture of the video to which picture 12 belongs, at which the previously coded/decoded picture is sampled in order to obtain the prediction signal for the respective inter-coded block. This means, in addition to the residual signal coding comprised by data stream 14, such as the entropy-coded transform coefficient levels representing the quantized spectral-domain prediction residual signal 24”, data stream 14 may have encoded thereinto coding mode parameters for assigning the coding modes to the various blocks, prediction parameters for some of the blocks, such as motion parameters for inter-coded segments, and optional further parameters such as parameters for controlling and signaling the subdivision of picture 12 and 12’, respectively, into the segments. The decoder 20 uses these parameters to subdivide the picture in the same manner as the encoder did, to assign the same prediction modes to the segments, and to perform the same prediction to result in the same prediction signal.
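The DC intra-coding mode mentioned above can be sketched as follows; the function name, the rounding rule and the neighbor handling are illustrative assumptions of this example, since actual codecs define the averaging and the border cases normatively:

```python
def dc_intra_prediction(left, above, block_size):
    """DC intra mode: every sample of the block is predicted with the
    mean of the already reconstructed neighboring samples."""
    neighbors = list(left) + list(above)
    dc = round(sum(neighbors) / len(neighbors))
    return [[dc] * block_size for _ in range(block_size)]

# A 2x2 block with reconstructed neighbors 10, 10 (left) and 20, 20 (above):
print(dc_intra_prediction([10, 10], [20, 20], 2))  # [[15, 15], [15, 15]]
```

The directional and planar modes differ only in how the neighboring samples are propagated into the block, not in the overall prediction-then-residual structure.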
Fig. 10 illustrates the relationship between the reconstructed signal, e.g., the reconstructed picture 12’, on the one hand, and the combination of the prediction residual signal 24” as signaled in the data stream 14, and the prediction signal 26, on the other hand. As already denoted above, the combination may be an addition. The prediction signal 26 is illustrated in Fig. 10 as a subdivision of the picture area into intra-coded blocks which are illustratively indicated using hatching, and inter-coded blocks which are illustratively indicated not-hatched. The subdivision may be any subdivision, such as a regular subdivision of the picture area into rows and columns of square blocks or non-square blocks, or a multi-tree subdivision of picture 12 from a tree root block into a plurality of leaf blocks of varying size, such as a quadtree subdivision or the like, wherein a mixture thereof is illustrated in Fig. 10 in which the picture area is first subdivided into rows and columns of tree root blocks which are then further subdivided in accordance with a recursive multi-tree subdivisioning into one or more leaf blocks.
Again, data stream 14 may have an intra-coding mode coded thereinto for intra-coded blocks 80, which assigns one of several supported intra-coding modes to the respective intra-coded block 80. For inter-coded blocks 82, the data stream 14 may have one or more motion parameters coded thereinto. Generally speaking, inter-coded blocks 82 are not restricted to being temporally coded. Alternatively, inter-coded blocks 82 may be any block predicted from previously coded portions beyond the current picture 12 itself, such as previously coded pictures of a video to which picture 12 belongs, or a picture of another view or a hierarchically lower layer in the case of encoder and decoder being scalable encoders and decoders, respectively.
The prediction residual signal 24”” in Fig. 10 is also illustrated as a subdivision of the picture area into blocks 84. These blocks might be called transform blocks in order to distinguish same from the coding blocks 80 and 82. In effect, Fig. 10 illustrates that encoder 10 and decoder 20 may use two different subdivisions of picture 12 and picture 12’, respectively, into blocks, namely one subdivisioning into coding blocks 80 and 82, respectively, and another subdivision into transform blocks 84. Both subdivisions might be the same, e.g., each coding block 80 and 82 may concurrently form a transform block 84, but Fig. 10 illustrates the case where, for instance, a subdivision into transform blocks 84 forms an extension of the subdivision into coding blocks 80, 82 so that any border between two blocks of blocks 80 and 82 overlays a border between two blocks 84, or alternatively speaking each block 80, 82 either coincides with one of the transform blocks or coincides with a cluster of transform blocks 84. However, the subdivisions may also be determined or selected independent from each other so that transform blocks 84 could alternatively cross block borders between blocks 80, 82. As far as the subdivision into transform blocks 84 is concerned, similar statements are thus true as those brought forward with respect to the subdivision into blocks 80, 82, e.g., the blocks 84 may be the result of a regular subdivision of the picture area into blocks (with or without arrangement into rows and columns), the result of a recursive multi-tree subdivisioning of the picture area, or a combination thereof or any other sort of blockation. Just as an aside, it is noted that blocks 80, 82 and 84 are not restricted to being of quadratic, rectangular or any other shape.
Fig. 10 further illustrates that the combination of the prediction signal 26 and the prediction residual signal 24”” directly results in the reconstructed signal 12’. However, it should be noted that more than one prediction signal 26 may be combined with the prediction residual signal 24”” to result into picture 12’ in accordance with alternative embodiments.
In Fig. 10, the transform blocks 84 shall have the following significance. Transformer 28 and inverse transformer 54 perform their transformations in units of these transform blocks 84. For instance, many codecs use some sort of DST or DCT for all transform blocks 84. Some codecs allow for skipping the transformation so that, for some of the transform blocks 84, the prediction residual signal is coded in the spatial domain directly. However, in accordance with embodiments described below, encoder 10 and decoder 20 are configured in such a manner that they support several transforms. For example, the transforms supported by encoder 10 and decoder 20 could comprise:
o DCT-II (or DCT-III), where DCT stands for Discrete Cosine Transform
o DST-IV, where DST stands for Discrete Sine Transform
o DCT-IV
o DST-VII
o Identity Transformation (IT)
Naturally, while transformer 28 would support all of the forward transform versions of these transforms, the decoder 20 or inverse transformer 54 would support the corresponding backward or inverse versions thereof:
o Inverse DCT-II (or inverse DCT-III)
o Inverse DST-IV
o Inverse DCT-IV
o Inverse DST-VII
o Identity Transformation (IT)
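For illustration only, an orthonormal DCT-II and its inverse (often called DCT-III) can be written directly from their defining sums. The scaling convention below is one common choice and not the normative integer-arithmetic transform of any particular codec:

```python
import math

def dct_ii(x):
    """Forward spatial-to-spectral transform (orthonormal DCT-II)."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct_ii(c):
    """Inverse spectral-to-spatial transform (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]

residual = [0.0, 2.0, 1.0, -1.0]
roundtrip = idct_ii(dct_ii(residual))
print(max(abs(a - b) for a, b in zip(roundtrip, residual)))  # numerical error only
```

Because the pair is orthonormal, the inverse exactly undoes the forward transform; in a real codec loss enters only through the quantizer between the two.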
The subsequent description provides more details on which transforms could be supported by encoder 10 and decoder 20. In any case, it should be noted that the set of supported transforms may comprise merely one transform such as one spectral-to-spatial or spatial-to-spectral transform.
As already outlined above, Fig. 8 to Fig. 10 have been presented as an example where the inventive concept described further below may be implemented in order to form specific examples for encoders and decoders according to the present application. Insofar, the encoder and decoder of Fig. 8 and Fig. 9, respectively, may represent possible implementations of the encoders and decoders described herein below. Fig. 8 and Fig. 9 are, however, only examples. An encoder according to embodiments of the present application may, however, perform block-based encoding of a picture 12 using the concept outlined in more detail below and being different from the encoder of Fig. 8 such as, for instance, in that same is no video encoder, but a still picture encoder, in that same does not support inter-prediction, or in that the sub-division into blocks 80 is performed in a manner different than exemplified in Fig. 10. Likewise, decoders according to embodiments of the present application may perform block-based decoding of picture 12’ from data stream 14 using the coding concept further outlined below, but may differ, for instance, from the decoder 20 of Fig. 9 in that same is no video decoder, but a still picture decoder, in that same does not support intra-prediction, or in that same subdivides picture 12’ into blocks in a manner different than described with respect to Fig. 10 and/or in that same does not derive the prediction residual from the data stream 14 in transform domain, but in spatial domain, for instance.
Fig. 1 illustrates a video encoder 100 for encoding a video into a video data stream according to an embodiment. The video encoder 100 is configured to generate the video data stream such that the video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
Fig. 2 illustrates an apparatus 200 for receiving an input video data stream according to an embodiment. The input video data stream has a video encoded thereinto. The apparatus 200 is configured to generate an output video data stream from the input video data stream.
Fig. 3 illustrates a video decoder 300 for receiving a video data stream having a video stored therein according to an embodiment. The video decoder 300 is configured to decode the video from the video data stream. The video decoder 300 is configured to decode the video depending on an indication indicating whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, a system according to an embodiment is provided. The system comprises the apparatus of Fig. 2 and the video decoder of Fig. 3. The video decoder 300 of Fig. 3 is configured to receive the output video data stream of the apparatus of Fig. 2. The video decoder 300 of Fig. 3 is configured to decode the video from the output video data stream of the apparatus 200 of Fig. 2.
In an embodiment, the system may, e.g., further comprise a video encoder 100 of Fig. 1. The apparatus 200 of Fig. 2 may, e.g., be configured to receive the video data stream from the video encoder 100 of Fig. 1 as the input video data stream.
The (optional) intermediate device 210 of the apparatus 200 may, e.g., be configured to receive the video data stream from the video encoder 100 as an input video data stream and to generate an output video data stream from the input video data stream. For example, the intermediate device may, e.g., be configured to modify (header/meta data) information of the input video data stream and/or may, e.g., be configured to delete pictures from the input video data stream and/or may, e.g., be configured to mix/splice the input video data stream with an additional second bitstream having a second video encoded thereinto.
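One of the manipulations named above, deleting pictures, can be hinted at with a toy sketch. The access-unit record layout and the rule of keeping only pictures up to a maximum temporal sub-layer are assumptions made for this illustration only:

```python
# Hypothetical access-unit records: (picture_order_count, temporal_id, payload).
def drop_pictures(access_units, max_temporal_id):
    """Keep only access units whose temporal sub-layer does not exceed
    max_temporal_id. A real intermediate device producing the output
    stream would also rewrite timing/HRD metadata (not modeled here)."""
    return [au for au in access_units if au[1] <= max_temporal_id]

stream = [(0, 0, b"I"), (1, 2, b"B"), (2, 1, b"B"), (3, 2, b"B"), (4, 0, b"P")]
print(drop_pictures(stream, 1))  # [(0, 0, b'I'), (2, 1, b'B'), (4, 0, b'P')]
```

Such dropping is only valid when the retained pictures do not reference the removed ones, which is exactly the constraint temporal sub-layers are designed to guarantee.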
The (optional) video decoder 221 may, e.g., be configured to decode the video from the output video data stream.
The (optional) Hypothetical Reference Decoder 222 may, e.g., be configured to determine timing information for the video depending on the output video data stream, or may, e.g., be configured to determine buffer information for a buffer into which the video or a portion of the video is to be stored.
The system comprises the video encoder 101 of Fig. 1 and the video decoder 151 of Fig. 2.
The video encoder 101 is configured to generate the encoded video signal. The video decoder 151 is configured to decode the encoded video signal to reconstruct the picture of the video.
A first aspect of the invention is claimed in claims 1 to 38.

A second aspect of the invention is claimed in claims 39 to 78.

A third aspect of the invention is claimed in claims 79 to 108.

A fourth aspect of the invention is claimed in claims 109 to 134.

A fifth aspect of the invention is claimed in claims 135 to 188.

In the following, the first aspect of the invention is now described in detail.
In accordance with the first aspect of the invention, an apparatus 200 for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus 200 is configured to generate an output video data stream from the input video data stream. Moreover, the apparatus 200 is configured to determine whether a picture of the video preceding a dependent random access picture shall be output or not.
According to an embodiment, the apparatus 200 may, e.g., be configured to determine a first variable (e.g., a NoOutputBeforeDrapFlag) indicating whether the picture of the video that precedes the dependent random access picture shall be output or not.
In an embodiment, the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise an indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not.
According to an embodiment, the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise supplemental enhancement information comprising the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not.
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise a flag (e.g., a ph_pic_output_flag) having a predefined value (e.g., 0) in a picture header of the independent random access picture, such that the predefined value (e.g., 0) of the flag (e.g., a ph_pic_output_flag) may, e.g., indicate that the independent random access picture directly precedes said dependent random access picture within the video data stream, and that said independent random access picture shall not be output.
According to an embodiment, the flag may, e.g., be a first flag, wherein the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise a further flag in a picture parameter set of the video data stream, wherein the further flag may, e.g., indicate whether or not the first flag (e.g., a ph_pic_output_flag) exists in the picture header of the independent random access picture.
In an embodiment, the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise, as the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not: a supplemental enhancement information flag within a supplemental enhancement information of the output video data stream, or a picture parameter set flag within a picture parameter set of the output video data stream, or a sequence parameter set flag within a sequence parameter set of the output video data stream, or an external means flag, wherein a value of the external means flag may, e.g., be set by an external unit being external to the apparatus 200.
According to an embodiment, the apparatus 200 may, e.g., be configured to determine a value of a second variable (e.g., a PictureOutputFlag) for the picture of the video that precedes the dependent random access picture depending on the first variable (e.g., a NoOutputBeforeDrapFlag), wherein the second variable (e.g., a PictureOutputFlag) may, e.g., indicate for said picture whether said picture shall be output or not, and wherein the apparatus 200 may, e.g., be configured to output or to not output said picture depending on the second variable (e.g., a PictureOutputFlag).
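A possible derivation of the second variable from the first one can be sketched as follows; the function, its arguments and the default value of the picture-header flag are illustrative assumptions of this example, not the normative derivation of any standard:

```python
def picture_output_flag(precedes_drap, no_output_before_drap_flag,
                        ph_pic_output_flag=1):
    """Derive a PictureOutputFlag-like value for a picture that may
    precede a dependent random access picture (DRAP).

    If the picture precedes a DRAP and NoOutputBeforeDrapFlag is set,
    the picture is decoded (so the DRAP can reference it) but not
    output; otherwise the picture-header flag decides.
    """
    if precedes_drap and no_output_before_drap_flag:
        return 0
    return ph_pic_output_flag

# An IRAP used only as a reference for a DRAP is suppressed from output:
print(picture_output_flag(precedes_drap=True, no_output_before_drap_flag=1))  # 0
print(picture_output_flag(precedes_drap=True, no_output_before_drap_flag=0))  # 1
```

This separation of "decode" from "output" is what allows random access at the DRAP without displaying the preceding reference picture.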
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The first variable (e.g., a NoOutputBeforeDrapFlag) may, e.g., indicate that the independent random access picture shall not be output.
According to an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The apparatus 200 may, e.g., be configured to set the first variable (e.g., a NoOutputBeforeDrapFlag) such that the first variable (e.g., a NoOutputBeforeDrapFlag) may, e.g., indicate that the independent random access picture shall be output.
In an embodiment, the apparatus 200 may, e.g., be configured to signal to a video decoder 300, whether a picture of the video preceding a dependent random access picture shall be output or not.
Moreover, a video data stream is provided. The video data stream has a video encoded thereinto. The video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
According to an embodiment, the video data stream may, e.g., comprise supplemental enhancement information comprising the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not.
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The video data stream may, e.g., comprise a flag (e.g., a ph_pic_output_flag) having a predefined value (e.g., 0) in a picture header of the independent random access picture, such that the predefined value (e.g., 0) of the flag (e.g., a ph_pic_output_flag) may, e.g., indicate that the independent random access picture directly precedes said dependent random access picture within the video data stream, and that said independent random access picture shall not be output.
According to an embodiment, the flag may, e.g., be a first flag, wherein the video data stream may, e.g., comprise a further flag in a picture parameter set of the video data stream, wherein the further flag may, e.g., indicate whether or not the first flag (e.g., a ph_pic_output_flag) exists in the picture header of the independent random access picture.
In an embodiment, the video data stream may, e.g., comprise, as the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not: a supplemental enhancement information flag within a supplemental enhancement information of the output video data stream, or a picture parameter set flag within a picture parameter set of the output video data stream, or a sequence parameter set flag within a sequence parameter set of the output video data stream, or an external means flag, wherein a value of the external means flag may, e.g., be set by an external unit being external to an apparatus 200.
Furthermore, a video encoder 100 is provided. The video encoder 100 may, e.g., be configured to encode a video into a video data stream. Moreover, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream comprises an indication that indicates whether a picture of the video preceding a dependent random access picture shall be output or not.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise supplemental enhancement information comprising the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not.
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a flag (e.g., a ph_pic_output_flag) having a predefined value (e.g., 0) in a picture header of the independent random access picture, such that the predefined value (e.g., 0) of the flag (e.g., a ph_pic_output_flag) may, e.g., indicate that the independent random access picture directly precedes said dependent random access picture within the video data stream, and that said independent random access picture shall not be output.
According to an embodiment, the flag may, e.g., be a first flag, wherein the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a further flag in a picture parameter set of the video data stream, wherein the further flag may, e.g., indicate whether or not the first flag (e.g., a ph_pic_output_flag) exists in the picture header of the independent random access picture.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise, as the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not: a supplemental enhancement information flag within a supplemental enhancement information of the output video data stream, or a picture parameter set flag within a picture parameter set of the output video data stream, or a sequence parameter set flag within a sequence parameter set of the output video data stream, or an external means flag, wherein a value of the external means flag may, e.g., be set by an external unit being external to an apparatus 200.
Moreover, a video decoder 300 for receiving a video data stream having a video stored therein is provided. The video decoder 300 is configured to decode the video from the video data stream. The video decoder 300 is configured to decode the video depending on an indication indicating whether a picture of the video preceding a dependent random access picture shall be output or not.
According to an embodiment, the video decoder 300 may, e.g., be configured to decode the video depending on a first variable (e.g., a NoOutputBeforeDrapFlag) indicating whether the picture of the video that precedes the dependent random access picture shall be output or not.
In an embodiment, the video data stream may, e.g., comprise the indication that may, e.g., indicate whether the picture of the video preceding the dépendent random access picture shall be output or not. The video decoder 300 may, e.g., be configured to décodé the video depending on the indication within the video data stream.
According to an embodiment, the video data stream may, e.g., comprise supplémentai enhancement information comprising the indication that may, e.g., indicate whether the picture of the video preceding the dépendent random access picture shall be output or not. The video decoder 300 may, e.g., be configured to décodé the video depending on the supplémentai enhancement information.
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The video data stream may, e.g., comprise a flag (e.g., a ph_pic_output_flag) having a predefined value (e.g., 0) in a picture header of the independent random access picture, such that the predefined value (e.g., 0) of the flag (e.g., a ph_pic_output_flag) may, e.g., indicate that the independent random access picture directly precedes said dependent random access picture within the video data stream, and that said independent random access picture shall not be output. The video decoder 300 may, e.g., be configured to decode the video depending on the flag.
According to an embodiment, the flag may, e.g., be a first flag, wherein the video data stream may, e.g., comprise a further flag in a picture parameter set of the video data stream, wherein the further flag may, e.g., indicate whether or not the first flag (e.g., a ph_pic_output_flag) exists in the picture header of the independent random access picture. The video decoder 300 may, e.g., be configured to decode the video depending on the further flag.
In an embodiment, the video data stream may, e.g., comprise, as the indication that may, e.g., indicate whether the picture of the video preceding the dependent random access picture shall be output or not, a supplemental enhancement information flag within a supplemental enhancement information of the output video data stream, or a picture parameter set flag within a picture parameter set of the output video data stream, or a sequence parameter set flag within a sequence parameter set of the output video data stream, or an external means flag, wherein a value of the external means flag may, e.g., be set by an external unit being external to an apparatus 200. The video decoder 300 may, e.g., be configured to decode the video depending on the indication within the video data stream.
According to an embodiment, the video decoder 300 may, e.g., be configured to reconstruct the video from the video data stream. The video decoder 300 may, e.g., be configured to output or to not output the picture of the video that precedes the dependent random access picture depending on the first variable (e.g., a NoOutputBeforeDrapFlag).
In an embodiment, the video decoder 300 may, e.g., be configured to determine a value of a second variable (e.g., a PictureOutputFlag) for the picture of the video that precedes the dependent random access picture depending on the first variable (e.g., a NoOutputBeforeDrapFlag), wherein the second variable (e.g., a PictureOutputFlag) may, e.g., indicate for said picture whether said picture shall be output or not, and wherein the apparatus 200 may, e.g., be configured to output or to not output said picture depending on the second variable (e.g., a PictureOutputFlag).
According to an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The video decoder 300 may, e.g., be configured to decode the video depending on the first variable (e.g., a NoOutputBeforeDrapFlag) which may, e.g., indicate that the independent random access picture shall not be output.
In an embodiment, the picture of the video that precedes the dependent random access picture may, e.g., be an independent random access picture. The video decoder 300 may, e.g., be configured to decode the video depending on the first variable (e.g., a NoOutputBeforeDrapFlag) which may, e.g., indicate that the independent random access picture shall be output.
Furthermore, a system is provided. The system comprises an apparatus 200 as described above and a video decoder 300 as described above. The video decoder 300 is configured to receive the output video data stream of the apparatus 200. Moreover, the video decoder 300 is configured to decode the video from the output video data stream of the apparatus 200.
According to an embodiment, the system may, e.g., further comprise a video encoder 100. The apparatus 200 may, e.g., be configured to receive the video data stream from the video encoder 100 as the input video data stream.
In particular, the first aspect of the invention relates to starting a CVS at a DRAP and to omitting the IDR output in decoding and conformance testing.
When a bitstream comprises pictures marked as DRAP (i.e. pictures using only the previous IRAP as reference, for the DRAP and from there on in the bitstream), it is possible to utilize these DRAP pictures for random access functionality at a lower rate overhead. However, when using some target DRAP for random accessing a stream, it is undesirable to display any initial picture before the target DRAP (i.e. the associated IRAP of the target DRAP) at the decoder output, as the temporal distance between these pictures would lead to a shaky/jittery video playback when played back at the rate of the original video, until the video is played back in a smooth way from the target DRAP on.
Therefore, it is desirable to omit the output of the pictures preceding the DRAP picture. This aspect of the invention presents means to control the decoder accordingly.
In one embodiment, an external means to set the PicOutputFlag variable of an IRAP picture is made available for implementations to use as follows:
- If some external means not specified in this Specification is available to set the variable NoOutputBeforeDrapFlag for the picture to a value, NoOutputBeforeDrapFlag for the picture is set equal to the value provided by the external means.
[...]
- The variable PictureOutputFlag of the current picture is derived as follows:
- If sps_video_parameter_set_id is greater than 0 and the current layer is not an output layer (i.e., nuh_layer_id is not equal to OutputLayerIdInOls[ TargetOlsIdx ][ i ] for any value of i in the range of 0 to NumOutputLayersInOls[ TargetOlsIdx ] - 1, inclusive), or one of the following conditions is true, PictureOutputFlag is set equal to 0:
- The current picture is a RASL picture and NoOutputBeforeRecoveryFlag of the associated IRAP picture is equal to 1.
- The current picture is a GDR picture with NoOutputBeforeRecoveryFlag equal to 1 or is a recovering picture of a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.
- The current picture is an IRAP picture with NoOutputBeforeDrapFlag equal to 1.
- Otherwise, PictureOutputFlag is set equal to ph_pic_output_flag.
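The PictureOutputFlag derivation above can be sketched in Python. This is a hypothetical illustration, not normative spec text: the layer/OLS condition is reduced to a single boolean, picture properties are passed as a plain dict, and the key names are invented for the sketch.

```python
def picture_output_flag(pic, in_output_layer=True):
    """Derive PictureOutputFlag for the current picture.

    pic is a dict with the picture type and the relevant HRD/output flags;
    missing flags default to 0 (i.e. not set).
    """
    if not in_output_layer:
        return 0
    t = pic.get("type")
    # RASL picture whose associated IRAP has NoOutputBeforeRecoveryFlag == 1
    if t == "RASL" and pic.get("assoc_irap_no_output_before_recovery", 0):
        return 0
    # GDR picture, or a recovering picture of one, with
    # NoOutputBeforeRecoveryFlag == 1
    if t in ("GDR", "GDR_RECOVERING") and pic.get("no_output_before_recovery", 0):
        return 0
    # Proposed addition: IRAP picture with NoOutputBeforeDrapFlag == 1
    if t == "IRAP" and pic.get("no_output_before_drap", 0):
        return 0
    # Otherwise, PictureOutputFlag is set equal to ph_pic_output_flag
    return pic.get("ph_pic_output_flag", 1)
```

For example, an IRAP picture with NoOutputBeforeDrapFlag set is suppressed even when its ph_pic_output_flag equals 1.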
In another embodiment, the NoOutputBeforeDrapFlag is set by external means only for the first IRAP picture in a CVS, and set to 0 otherwise.
- If some external means not specified in this Specification is available to set the variable NoOutputBeforeDrapFlag for the picture to a value, NoOutputBeforeDrapFlag for the first picture in the CVS is set equal to the value provided by the external means. Otherwise, NoOutputBeforeDrapFlag is set to 0.
The above-mentioned flag NoOutputBeforeDrapFlag could also be associated with the usage of alternative HRD timings conveyed in the bitstream for the case of removal of pictures between the IRAP picture and the DRAP picture, e.g. the flag UseAltCpbParamsFlag in the VVC specification.
In an alternative embodiment, it is a constraint that IRAP pictures that directly precede DRAP pictures without non-DRAP pictures in between shall have a value of 0 in the output flag ph_pic_output_flag in the picture header. In this case, whenever an extractor or player uses a DRAP for random accessing, i.e. it removes intermediate pictures between the IRAP and the DRAP from the bitstream, it is also required to verify or adjust that the respective output flag is set to 0 and that the output of the IRAP is omitted.
For this operation to be simple, the original bitstream needs to be prepared correspondingly. More concretely, pps_output_flag_present_flag, which determines the presence of the flag ph_pic_output_flag in the picture header, shall be equal to 1 so that the picture header can be easily changed and it is not required to also change parameter sets. That is:
It is a requirement of bitstream conformance that the value of pps_output_flag_present_flag shall be equal to 1 if the PPS is referred to by a picture within a CVSS AU that has associated DRAP AUs.
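An extractor or player operating under this constraint could proceed as in the following Python sketch. The helper is hypothetical: AUs are modeled as dicts in decoding order, and only the picture-header output flag of the retained IRAP is rewritten.

```python
def random_access_at_drap(aus, drap_index):
    """Extract a random-access sub-bitstream starting at a target DRAP AU.

    Drops all AUs between the associated IRAP and the DRAP and forces
    ph_pic_output_flag of the IRAP to 0, so output of the IRAP is omitted.
    Assumes pps_output_flag_present_flag == 1, i.e. the flag is present in
    the picture header and can be rewritten without touching parameter sets.
    """
    # The associated IRAP is the closest IRAP AU preceding the target DRAP.
    irap_index = max(i for i in range(drap_index) if aus[i]["type"] == "IRAP")
    irap = dict(aus[irap_index])      # copy, keep the input bitstream untouched
    irap["ph_pic_output_flag"] = 0    # omit output of the IRAP
    return [irap] + aus[drap_index:]
```

The design choice of copying the IRAP AU keeps the original bitstream intact, which matters when the same stream serves several random-access requests.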
In addition to the options listed above, in another embodiment, it is indicated in a parameter set, a PPS or an SPS, whether the first AU in the bitstream, i.e. a CRA or IDR that constitutes a CLVS start, is to be output or not after decoding. Thus, the system integration is simpler, as only a parameter set needs to be adjusted instead of requiring comparatively low-level syntax such as PHs also to be changed, e.g. when parsing a file in the ISOBMFF file format.
An example is shown in the following:
seq_parameter_set_rbsp( ) { Descriptor
[...]
if( sps_conformance_window_flag ) {
sps_conf_win_left_offset ue(v)
sps_conf_win_right_offset ue(v)
sps_conf_win_top_offset ue(v)
sps_conf_win_bottom_offset ue(v)
}
sps_pic_in_cvss_au_no_output_flag u(1)
sps_log2_ctu_size_minus5 u(2)
[...]
}
sps_pic_in_cvss_au_no_output_flag equal to 1 specifies that a picture in a CVSS AU referring to the SPS is not output. sps_pic_in_cvss_au_no_output_flag equal to 0 specifies that a picture in a CVSS AU referring to the SPS may or may not be output.
It is a requirement of bitstream conformance that the value of sps_pic_in_cvss_au_no_output_flag shall be the same for any SPS referred to by any output layer in an OLS.
In 8.1.2
- The variable PictureOutputFlag of the current picture is derived as follows:
- If sps_video_parameter_set_id is greater than 0 and the current layer is not an output layer (i.e., nuh_layer_id is not equal to OutputLayerIdInOls[ TargetOlsIdx ][ i ] for any value of i in the range of 0 to NumOutputLayersInOls[ TargetOlsIdx ] - 1, inclusive), or one of the following conditions is true, PictureOutputFlag is set equal to 0:
- The current picture is a RASL picture and NoOutputBeforeRecoveryFlag of the associated IRAP picture is equal to 1.
- The current picture is a GDR picture with NoOutputBeforeRecoveryFlag equal to 1 or is a recovering picture of a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.
- Otherwise, if the current AU is a CVSS AU and sps_pic_in_cvss_au_no_output_flag is equal to 1, PictureOutputFlag is set equal to 0.
- Otherwise, PictureOutputFlag is set equal to ph_pic_output_flag.
NOTE - In an implementation, the decoder could output a picture not belonging to an output layer. For example, when there is only one output layer while in an AU the picture of the output layer is not available, e.g., due to a loss or layer down-switching, the decoder could set PictureOutputFlag equal to 1 for the picture that has the highest value of nuh_layer_id among all pictures of the AU available to the decoder and having ph_pic_output_flag equal to 1, and set PictureOutputFlag equal to 0 for all other pictures of the AU available to the decoder.
In another embodiment, for example, a requirement may, e.g., be defined as follows:
It is a requirement of bitstream conformance that the value of ph_pic_output_flag shall be equal to 0 if a picture belongs to an IRAP AU and the IRAP AU directly precedes a DRAP AU.
In the following, the second aspect of the invention is now described in detail.
In accordance with the second aspect of the invention, an apparatus 200 for receiving one or more input video data streams is provided. Each of the one or more input video data streams has an input video encoded thereinto. The apparatus 200 is configured to generate an output video data stream from the one or more input video data streams, the output video data stream encoding an output video, wherein the apparatus is configured to generate the output video data stream such that the output video is the input video being encoded within one of the one or more input video data streams, or such that the output video depends on the input video of at least one of the one or more input video data streams. Moreover, the apparatus 200 is configured to determine an access unit removal time of a current picture of a plurality of pictures of the output video from a coded picture buffer. The apparatus 200 is configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
According to an embodiment, the apparatus 200 may, e.g., be configured to drop a group of one or more pictures of the input video of a first video data stream of the one or more input video data streams to generate the output video data stream. The apparatus 200 may, e.g., be configured to determine an access unit removal time for at least one of the plurality of pictures of the output video from the coded picture buffer depending on the coded picture buffer delay offset information.
In an embodiment, the first video received by the apparatus 200 may, e.g., be a preprocessed video which results from an original video from which a group of one or more pictures has been dropped to generate the preprocessed video. The apparatus 200 may, e.g., be configured to determine an access unit removal time for at least one of the plurality of pictures of the output video from the coded picture buffer depending on the coded picture buffer delay offset information.
According to an embodiment, the buffer delay offset information depends on a number of pictures of the input video that have been dropped.
In an embodiment, the one or more input video data streams are two or more input video data streams. The apparatus 200 may, e.g., be configured to splice the processed video and the input video of a second video data stream of the two or more input video data streams to obtain the output video, and may, e.g., be configured to encode the output video into the output video data stream.
According to an embodiment, the apparatus 200 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a location of the current picture within the output video. Or, the apparatus 200 may, e.g., be configured to determine whether or not to set a coded picture buffer delay offset value of the coded picture buffer delay offset information to 0 for determining the access unit removal time of the current picture depending on the location of the current picture within the output video.
In an embodiment, the apparatus 200 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a position of a previous non-discardable picture which precedes the current picture within the output video.
According to an embodiment, the apparatus 200 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on whether or not the previous non-discardable picture which precedes the current picture within the output video may, e.g., be a first picture in a previous buffering period.
In an embodiment, the apparatus 200 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a concatenation flag, the current picture being a first picture of the input video of the second video data stream.
According to an embodiment, the apparatus 200 may, e.g., be configured to determine the access unit removal time of the current picture depending on a removal time of a preceding picture.
In an embodiment, the apparatus 200 may, e.g., be configured to determine the access unit removal time of the current picture depending on initial coded picture buffer removal delay information.
According to an embodiment, the apparatus 200 may, e.g., be configured to update the initial coded picture buffer removal delay information depending on a clock tick to obtain temporary coded picture buffer removal delay information to determine the access unit removal time of the current picture.
According to an embodiment, if the concatenation flag is set to a first value, then the apparatus 200 is configured to use the coded picture buffer delay offset information to determine one or more removal times. If the concatenation flag is set to a second value being different from the first value, then the apparatus 200 is configured to not use the coded picture buffer delay offset information to determine the one or more removal times.
In an embodiment, the apparatus 200 may, e.g., be configured to signal to a video decoder 300 whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
According to an embodiment, the current picture may, e.g., be located at a splicing point of the output video, where two input videos have been spliced.
Furthermore, a video data stream is provided. The video data stream has a video encoded thereinto. The video data stream comprises coded picture buffer delay offset information.
According to an embodiment, the video data stream may, e.g., comprise a concatenation flag.
In an embodiment, the video data stream may, e.g., comprise initial coded picture buffer removal delay information.
According to an embodiment, if the concatenation flag is set to a first value (e.g., 0), then the concatenation flag indicates that the coded picture buffer delay offset information needs to be used to determine one or more (picture or access unit) removal times, e.g., when it is known that some pictures (e.g., RASL pictures) have been dropped. If the concatenation flag is set to a second value being different from the first value (e.g., 1), then the concatenation flag indicates that the indicated offset is not used to determine the one or more (picture or access unit) removal times, e.g., irrespective of an offset signaling and, e.g., irrespective of whether RASL pictures have been dropped. If pictures are not dropped, then, e.g., the offset is not to be used.
Moreover, a video encoder 100 is provided. The video encoder 100 is configured to encode a video into a video data stream. The video encoder 100 is configured to generate the video data stream such that the video data stream comprises coded picture buffer delay offset information.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a concatenation flag.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise coded picture buffer delay offset information.
Furthermore, a video decoder 300 for receiving a video data stream having a video stored therein is provided. The video decoder 300 is configured to decode the video from the video data stream. Moreover, the video decoder 300 is configured to decode the video depending on an access unit removal time of a current picture of a plurality of pictures of the video from a coded picture buffer. The video decoder 300 is configured to decode the video depending on an indication indicating whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
According to an embodiment, the access unit removal time for at least one of the plurality of pictures of the video from the coded picture buffer depends on the coded picture buffer delay offset information.
In an embodiment, the video decoder 300 is configured to decode the video depending on whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a location of the current picture within the video.
According to an embodiment, the video decoder 300 may, e.g., be configured to decode the video depending on whether or not a coded picture buffer delay offset value of the coded picture buffer delay offset information may, e.g., be set to 0.
In an embodiment, the video decoder 300 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a position of a previous non-discardable picture which precedes the current picture within the video.
According to an embodiment, the video decoder 300 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on whether or not the previous non-discardable picture which precedes the current picture within the video may, e.g., be a first picture in a previous buffering period.
In an embodiment, the video decoder 300 may, e.g., be configured to determine whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a concatenation flag, the current picture being a first picture of the input video of the second video data stream.
According to an embodiment, the video decoder 300 may, e.g., be configured to determine the access unit removal time of the current picture depending on a removal time of a preceding picture.
In an embodiment, the video decoder 300 may, e.g., be configured to determine the access unit removal time of the current picture depending on initial coded picture buffer removal delay information.
According to an embodiment, the video decoder 300 may, e.g., be configured to update the initial coded picture buffer removal delay information depending on a clock tick to obtain temporary coded picture buffer removal delay information to determine the access unit removal time of the current picture.
According to an embodiment, if the concatenation flag is set to a first value, then the video decoder 300 is configured to use the coded picture buffer delay offset information to determine one or more removal times. If the concatenation flag is set to a second value being different from the first value, then the video decoder 300 is configured to not use the coded picture buffer delay offset information to determine the one or more removal times.
Furthermore, a system is provided. The system comprises an apparatus 200 as described above and a video decoder 300 as described above. The video decoder 300 is configured to receive the output video data stream of the apparatus 200. Moreover, the video decoder 300 is configured to decode the video from the output video data stream of the apparatus 200.
According to an embodiment, the system may, e.g., further comprise a video encoder 100. The apparatus 200 may, e.g., be configured to receive the video data stream from the video encoder 100 as the input video data stream.
In particular, the second aspect of the invention relates to the fact that prevNonDiscardablePic in the case of alternative timing may (when it is not a BP start) already include the alternative offset (CpbDelayOffset), so for the AU with concatenation_flag == 1, CpbDelayOffset should be temporarily set to zero.
When splicing of two bitstreams happens, the derivation of the removal time of an AU from the CPB is done differently than for non-spliced bitstreams. At the splicing point, a Buffering Period SEI message (BP SEI message; SEI = supplemental enhancement information) comprises a concatenationFlag being equal to 1. Then the decoder needs to check among 2 values and take the larger of the two:
• the previous non-discardable picture (prevNonDiscardablePic) removal time plus a delta signalled in the BP SEI message (auCpbRemovalDelayDeltaMinus1 + 1), or
• the preceding picture removal time plus InitCpbRemovalDelay.
However, when the previous picture with a BP SEI message was an AU for which alternative timings have been used (i.e. a second timing information used when the RASL picture or pictures up to a DRAP have been dropped) for derivation of the removal times, an offset (CpbDelayOffset) is used to compute each removal time that is computed as a delta to the previous picture with a buffering period, i.e. AuNominalRemovalTime[ firstPicInPrevBuffPeriod ] plus AuCpbRemovalDelayVal - CpbDelayOffset, as illustrated in Fig. 4.
Fig. 4 illustrates an original bitstream (top of Fig. 4), and a bitstream after dropping pictures (bottom of Fig. 4): An offset is incorporated into the calculation of the removal delay after dropping AUs (lines 1, 2 and 3 in the original bitstream).
The offset is added since the removal time is computed using a delta to the removal time of the picture referred to as firstPicInPrevBuffPeriod, after which some AUs have been dropped, and therefore a CpbDelayOffset is necessary to account (compensate) for the AU dropping.
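As a toy numeric illustration of this compensation (the numbers are assumed for the sketch, not taken from any figure): with ClockTick = 1.0, a buffering-period start removed at t = 10.0, a delta of 8 clock ticks signalled against the original stream, and 3 AUs dropped in between, the offset moves the removal time forward by the 3 dropped ticks:

```python
clock_tick = 1.0
removal_time_first_pic_prev_bp = 10.0  # AuNominalRemovalTime[ firstPicInPrevBuffPeriod ]
au_cpb_removal_delay_val = 8           # delta signalled against the original stream
cpb_delay_offset = 3                   # compensates the 3 dropped AUs

removal_time = (removal_time_first_pic_prev_bp
                + clock_tick * (au_cpb_removal_delay_val - cpb_delay_offset))
print(removal_time)  # 15.0 instead of 18.0 without the offset
```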
Fig. 5 illustrates the splicing of two bitstreams (at a different position), a first bitstream (in Fig. 5, middle, left) and a second bitstream (in Fig. 5, middle, right), after pictures were dropped from the original first bitstream (in Fig. 5, middle, left).
The example for using the preceding picture removal time as anchor instead of the previous non-discardable picture is similar and would not require the "-3" correction factor (CpbDelayOffset) either.
However, in the splicing case as illustrated in Fig. 5, note that it is not necessarily the case that the two derivations use the removal time of the AU associated with a BP SEI message (firstPicInPrevBuffPeriod). As discussed, for the splicing case a delta is added to either the prevNonDiscardablePic or just the preceding picture. This means that when the prevNonDiscardablePic is not the firstPicInPrevBuffPeriod, the CpbDelayOffset cannot be used to derive the removal time from the CPB of the current AU, as the removal time of prevNonDiscardablePic already accounts for the AU dropping and no AUs are dropped between prevNonDiscardablePic and the AU for which the removal time is computed. Now imagine that the preceding picture removal time is used instead, as for the case that the current AU (i.e. the splicing point with a new BP SEI message) has an InitialCpbRemovalDelay that forces the removal time of the current AU to come after its desired removal time, which would have achieved an equidistant removal time (when the prevNonDiscardablePic is used instead). In such a case, the removal time of the current AU cannot be smaller than the time computed by using the preceding picture removal time plus InitCpbRemovalDelay, as this could lead to buffer underruns (AUs not in the buffer before they need to be removed). Therefore, as part of the invention, for this case the CpbDelayOffset is not used for the computation or is considered to be equal to 0.
Summarizing, the embodiment herein is to use a CpbDelayOffset for the computation of AU removal times, when RASL AUs are dropped from a bitstream or AUs in between an IRAP and DRAP AUs are dropped, depending on a check. The check to determine whether CpbDelayOffset is not used or considered to be equal to 0 is one of the following:
• prevNonDiscardablePic is not the firstPicInPrevBuffPeriod
• the preceding picture removal time plus InitCpbRemovalDelay is used for the computation of the removal of the current AU
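The two checks can be condensed into a small helper. This is a hypothetical sketch, not spec text; the boolean parameters stand in for the spec-side conditions named above.

```python
def effective_cpb_delay_offset(cpb_delay_offset,
                               prev_non_discardable_is_bp_first,
                               preceding_pic_anchor_used):
    """Return the CpbDelayOffset to apply when deriving the removal time
    of the current AU; 0 when either of the two checks above fires."""
    if preceding_pic_anchor_used:
        # preceding picture removal time plus InitCpbRemovalDelay is the anchor
        return 0
    if not prev_non_discardable_is_bp_first:
        # prevNonDiscardablePic already accounts for the dropped AUs
        return 0
    return cpb_delay_offset
```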
The implementation in the specification could be as follows:
- When AU n is the first AU of a BP that does not initialize the HRD, the following applies:
The nominal removal time of the AU n from the CPB is specified by:
if( !concatenationFlag ) {
  baseTime = AuNominalRemovalTime[ firstPicInPrevBuffPeriod ]
  tmpCpbRemovalDelay = AuCpbRemovalDelayVal
  tmpCpbDelayOffset = CpbDelayOffset
} else {
  baseTime1 = AuNominalRemovalTime[ prevNonDiscardablePic ]
  tmpCpbRemovalDelay1 = ( auCpbRemovalDelayDeltaMinus1 + 1 )
  baseTime2 = AuNominalRemovalTime[ n - 1 ]
  tmpCpbRemovalDelay2 = Ceil( ( InitCpbRemovalDelay[ Htid ][ ScIdx ] ÷ 90000 + (C.10)
      AuFinalArrivalTime[ n - 1 ] - AuNominalRemovalTime[ n - 1 ] ) ÷ ClockTick )
  if( baseTime1 + ClockTick * tmpCpbRemovalDelay1 <
      baseTime2 + ClockTick * tmpCpbRemovalDelay2 ) {
    baseTime = baseTime2
    tmpCpbRemovalDelay = tmpCpbRemovalDelay2
    tmpCpbDelayOffset = 0
  } else {
    baseTime = baseTime1
    tmpCpbRemovalDelay = tmpCpbRemovalDelay1
    tmpCpbDelayOffset = ( ( prevNonDiscardablePic = = firstPicInPrevBuffPeriod ) ? CpbDelayOffset : 0 )
  }
}
AuNominalRemovalTime[ n ] = baseTime + ClockTick * ( tmpCpbRemovalDelay - tmpCpbDelayOffset )
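For illustration, the derivation above can be transcribed into runnable Python. This is a sketch with scalar inputs in seconds; the variable names follow the spec pseudocode, the 90 kHz units of InitCpbRemovalDelay are kept, and the BP-first condition is passed as a precomputed boolean.

```python
import math

def au_nominal_removal_time(concatenation_flag, clock_tick,
                            base_prev_bp, au_cpb_removal_delay_val,
                            cpb_delay_offset,
                            base_prev_non_discardable,
                            au_cpb_removal_delay_delta_minus1,
                            base_prev_au, final_arrival_prev_au,
                            init_cpb_removal_delay,
                            prev_non_discardable_is_bp_first):
    """Nominal CPB removal time of the first AU of a BP that does not
    initialize the HRD (first-embodiment derivation)."""
    if not concatenation_flag:
        base, delay, offset = base_prev_bp, au_cpb_removal_delay_val, cpb_delay_offset
    else:
        base1 = base_prev_non_discardable
        delay1 = au_cpb_removal_delay_delta_minus1 + 1
        base2 = base_prev_au
        delay2 = math.ceil((init_cpb_removal_delay / 90000
                            + final_arrival_prev_au - base_prev_au) / clock_tick)
        if base1 + clock_tick * delay1 < base2 + clock_tick * delay2:
            base, delay, offset = base2, delay2, 0
        else:
            base, delay = base1, delay1
            # apply the offset only when prevNonDiscardablePic is the first
            # picture of the previous buffering period
            offset = cpb_delay_offset if prev_non_discardable_is_bp_first else 0
    return base + clock_tick * (delay - offset)
```

With concatenation_flag equal to 0 this reduces to baseTime plus ClockTick times ( AuCpbRemovalDelayVal - CpbDelayOffset ), as in the non-spliced case.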
Alternatively, in another embodiment illustrated in Fig. 6, the CpbDelayOffset for the computation of AU removal times, when RASL AUs are dropped from a bitstream or AUs in between an IRAP and DRAP AUs are dropped, is used depending on a different check that comprises checking the concatenationFlag.
In that case, the delta in the bitstream when concatenationFlag is set to 1 needs to match the proper value as if the CpbDelayOffset was accounted for (as evident when comparing Fig. 5 and Fig. 6), as for that figure the CpbDelayOffset is not applied or considered to be 0.
The implementation in the specification could be as follows:
- When AU n is the first AU of a BP that does not initialize the HRD, the following applies:
The nominal removal time of the AU n from the CPB is specified by:
if( !concatenationFlag ) {
  baseTime = AuNominalRemovalTime[ firstPicInPrevBuffPeriod ]
  tmpCpbRemovalDelay = AuCpbRemovalDelayVal
  tmpCpbDelayOffset = CpbDelayOffset
} else {
  baseTime1 = AuNominalRemovalTime[ prevNonDiscardablePic ]
  tmpCpbRemovalDelay1 = ( auCpbRemovalDelayDeltaMinus1 + 1 )
  baseTime2 = AuNominalRemovalTime[ n - 1 ]
  tmpCpbRemovalDelay2 = Ceil( ( InitCpbRemovalDelay[ Htid ][ ScIdx ] ÷ 90000 + (C.10)
      AuFinalArrivalTime[ n - 1 ] - AuNominalRemovalTime[ n - 1 ] ) ÷ ClockTick )
  if( baseTime1 + ClockTick * tmpCpbRemovalDelay1 <
      baseTime2 + ClockTick * tmpCpbRemovalDelay2 ) {
    baseTime = baseTime2
    tmpCpbRemovalDelay = tmpCpbRemovalDelay2
  } else {
    baseTime = baseTime1
    tmpCpbRemovalDelay = tmpCpbRemovalDelay1
  }
  tmpCpbDelayOffset = 0
}
AuNominalRemovalTime[ n ] = baseTime + ClockTick * ( tmpCpbRemovalDelay - tmpCpbDelayOffset )
In the following, the third aspect of the invention is now described in detail.
In accordance with the third aspect ofthe invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
According to an embodiment, the initial coded picture buffer removal delay may, e.g., indicate a time that needs to pass for a first access unit of a picture of the video data stream that initializes a video décoder 300 before sending the first access unit to the video décoder 300.
In an embodiment, the video data stream may, e.g., comprise a single indication that may, e.g., indicate whetheror notthe sum ofthe initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods.
According to an embodiment, the video data stream may, e.g., comprise a concatenation flag as the single indication, that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods. If the concatenation flag is equal to a first value, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods. If the concatenation flag is different from the first value, the concatenation flag does not define whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods.
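A minimal sketch of how such a constant-sum constraint could be checked, assuming a hypothetical list of (delay, offset) pairs, one per buffering period (the helper name and model are illustrative, not part of any specification):

```python
def check_constant_sum(buffering_periods, concatenation_flag):
    """Verify that delay + offset is identical across buffering periods
    unless the concatenation flag releases the constraint.

    buffering_periods: list of (init_cpb_removal_delay, init_cpb_removal_offset).
    """
    if not concatenation_flag:
        sums = {delay + offset for delay, offset in buffering_periods}
        return len(sums) <= 1  # the sum must be the same in every period
    return True  # flag set: the sum is allowed to change at this point
```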
In an embodiment, if the single indication does not indicate that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the video data stream may, e.g., comprise continuously updated information on the initial coded picture buffer removal delay information and continuously updated information on the initial coded picture buffer removal offset information.
According to an embodiment, if the video data stream comprises the information that indicates that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant starting from a current position within the video data stream.
Furthermore, a video encoder 100 is provided. The video encoder 100 is configured to encode a video into a video data stream. Moreover, the video encoder 100 is configured to generate the video data stream such that the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video encoder 100 is configured to generate the video data stream such that the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video encoder 100 is configured to generate the video data stream such that the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
According to an embodiment, the initial coded picture buffer removal delay may, e.g., indicate a time that needs to pass for a first access unit of a picture of the video data stream that initializes a video decoder 300 before sending the first access unit to the video decoder 300.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a single indication that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a concatenation flag as the single indication, that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods. If the concatenation flag is equal to a first value, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods. If the concatenation flag is different from the first value, the concatenation flag does not define whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods.
In an embodiment, if the single indication does not indicate that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise continuously updated information on the initial coded picture buffer removal delay information and continuously updated information on the initial coded picture buffer removal offset information.
According to an embodiment, if the video data stream comprises the information that may, e.g., indicate that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant starting from a current position within the video data stream.
Moreover, an apparatus 200 for receiving two input video data streams, being a first input video data stream and a second input video data stream, is provided. Each of the two input video data streams has an input video encoded thereinto. The apparatus 200 is configured to generate an output video data stream from the two input video data streams, the output video data stream encoding an output video, wherein the apparatus is configured to generate the output video data stream by concatenating the first input video data stream and the second input video data stream. Moreover, the apparatus 200 is configured to generate the output video data stream such that the output video data stream comprises an initial coded picture buffer removal delay. Furthermore, the apparatus 200 is configured to generate the output video data stream such that the output video data stream comprises an initial coded picture buffer removal offset. Moreover, the apparatus 200 is configured to generate the output video data stream such that the output video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods.
According to an embodiment, the initial coded picture buffer removal delay may, e.g., indicate a time that needs to pass for a first access unit of a picture of the output video data stream that initializes a video decoder 300 before sending the first access unit to the video decoder 300.
In an embodiment, the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise a single indication that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods.
According to an embodiment, the apparatus 200 may, e.g., be configured to generate the output video data stream such that the output video data stream may, e.g., comprise a concatenation flag as the single indication, that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods. If the concatenation flag is equal to a first value, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods. If the concatenation flag is different from the first value, the concatenation flag does not define whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods.
In an embodiment, if the single indication does not indicate that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the apparatus 200 is configured to generate the output video data stream such that the output video data stream comprises continuously updated information on the initial coded picture buffer removal delay information and continuously updated information on the initial coded picture buffer removal offset information.
According to an embodiment, if the video data stream comprises the information that indicates that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant starting from a current position within the video data stream.
Furthermore, a video decoder 300 for receiving a video data stream having a video stored therein is provided. The video decoder 300 is configured to decode the video from the video data stream. Moreover, the video data stream comprises an initial coded picture buffer removal delay. Furthermore, the video data stream comprises an initial coded picture buffer removal offset. Moreover, the video data stream comprises information that indicates whether or not a sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across two or more buffering periods. Furthermore, the video decoder 300 is configured to decode the video depending on the information that indicates whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods.
According to an embodiment, the initial coded picture buffer removal delay may, e.g., indicate a time that needs to pass for a first access unit of a picture of the video data stream that initializes the video decoder 300 before sending the first access unit to the video decoder 300.
In an embodiment, the video data stream may, e.g., comprise a single indication that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods. The video decoder 300 may, e.g., be configured to decode the video depending on the single indication.
According to an embodiment, the video data stream may, e.g., comprise a concatenation flag as the single indication, that may, e.g., indicate whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset may, e.g., be defined to be constant across the two or more buffering periods. If the concatenation flag is equal to a first value, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods. If the concatenation flag is different from the first value, the concatenation flag does not define whether or not the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is constant across the two or more buffering periods. The video decoder 300 is configured to decode the video depending on the concatenation flag.
In an embodiment, if the single indication does not indicate that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the video data stream comprises continuously updated information on the initial coded picture buffer removal delay information and continuously updated information on the initial coded picture buffer removal offset information. The video decoder 300 is configured to decode the video depending on the continuously updated information on the initial coded picture buffer removal delay information and on the continuously updated information on the initial coded picture buffer removal offset information.
According to an embodiment, if the video data stream comprises the information that indicates that the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant across the two or more buffering periods, the sum of the initial coded picture buffer removal delay and the initial coded picture buffer removal offset is defined to be constant starting from a current position within the video data stream.
Moreover, a system is provided. The system comprises an apparatus 200 as described above and a video decoder 300 as described above. The video decoder 300 is configured to receive the output video data stream of the apparatus 200. Moreover, the video decoder 300 is configured to decode the video from the output video data stream of the apparatus 200.
According to an embodiment, the system may, e.g., further comprise a video encoder 100. The apparatus 200 according to one of claims 221 to 226 may, e.g., be configured to receive the video data stream from the video encoder 100 according to one of claims 211 to 216 as the input video data stream.
In particular, the third aspect of the invention relates to splicing, to an Initial Cpb Removal Delay and to an Initial Cpb Removal Offset.
Currently the specification indicates that the sum of Initial Cpb Removal Delay and Initial Cpb Removal Offset is constant within a CVS. The same constraint is expressed for the alternative timings. The Initial Cpb Removal Delay indicates the time that needs to pass for the first AU in the bitstream that initializes the decoder before sending the first AU for decoding. The Initial Cpb Removal Offset is a property of the bitstream which means that the earliest arrival times of the AUs in the decoder are not necessarily equidistant with respect to the time 0 at which the first AU arrives at the decoder. It helps to determine when the first bit of an AU can earliest reach the decoder.
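To illustrate the interplay of the two values, a hedged sketch of an earliest arrival time computation, assuming delay and offset are given in 90 kHz ticks and the nominal removal time in seconds (the function name and signature are illustrative, not taken from the specification):

```python
def earliest_arrival_time(nominal_removal_time, init_delay, init_offset):
    """Earliest time the first bit of an AU may enter the decoder buffer.

    nominal_removal_time: seconds; init_delay, init_offset: 90 kHz ticks.
    Keeping init_delay + init_offset constant keeps this schedule consistent
    across buffering periods, which is the point of the constraint.
    """
    return nominal_removal_time - (init_delay + init_offset) / 90000.0
```

Note how two different (delay, offset) splits with the same sum yield the same earliest arrival time, which is why only the sum is constrained.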
The current constraint in the VVC draft specification indicates that the sum of these two values needs to be constant within a CVS:
Over the entire CVS, for each value pair of i and j, the sum of nal_initial_cpb_removal_delay[ i ][ j ] and nal_initial_cpb_removal_offset[ i ][ j ] shall be constant, and the sum of nal_initial_alt_cpb_removal_delay[ i ][ j ] and nal_initial_alt_cpb_removal_offset[ i ][ j ] shall be constant.
The problem appears when editing or splicing bitstreams to form a new joint bitstream. It is desirable also to be able to indicate whether this property is fulfilled across the CVS boundary for the bitstream, as having a different value of the sum could lead to buffer underruns or overflows.
Therefore, in an embodiment an indication is carried in the bitstream that from a certain point in the bitstream on (e.g. a splicing point), the value constraint regarding the constant sum of InitCpbRemovalDelay and InitCpbRemovalDelayOffset (and the alternative counterparts) is reset and the sums before and after the point in the bitstream may be different. Unless this indication is present in the bitstream, the sum stays constant.
For instance:
When concatenationFlag is equal to 0, it is a constraint of bitstream conformance that the sum of InitCpbRemovalDelay and InitCpbRemovalDelayOffset is constant across buffering periods.
Otherwise, the sum of InitCpbRemovalDelay and InitCpbRemovalDelayOffset does not have to be constant across buffering periods. The values of InitCpbRemovalDelay and InitCpbRemovalDelayOffset are updated to account for the arrival times.
In an embodiment, if several bitstreams are spliced, at each splicing point a concatenation flag may, e.g., define whether the sum stays constant or not.
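A hedged sketch of such a splicer, using a hypothetical access-unit model (plain dicts) in which the concatenation flag is set on the first access unit of every appended bitstream:

```python
def splice(streams):
    """Concatenate lists of access units and mark each splicing point.

    streams: list of bitstreams, each a list of AU dicts (illustrative
    model, not an actual NAL unit parser). At every splicing point the
    concatenation flag signals that the sum of InitCpbRemovalDelay and
    InitCpbRemovalDelayOffset may change from there on.
    """
    out = []
    for i, stream in enumerate(streams):
        for j, au in enumerate(stream):
            au = dict(au)  # copy so the input streams stay untouched
            if i > 0 and j == 0:
                au["concatenation_flag"] = 1  # sum may differ from here on
            out.append(au)
    return out
```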
In the following, the fourth aspect of the invention is now described in detail.
In accordance with the fourth aspect of the invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises an indication (e.g., a general_same_pic_timing_in_all_ols_flag) indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then said network abstraction layer unit does not comprise any other supplemental enhancement information message.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of a coded video sequence of the one or more coded video sequences, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
In an embodiment, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of each of the one or more coded video sequences of the video data stream, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
Moreover, a video encoder 100 may, e.g., be provided. The video encoder 100 is configured to encode a video into a video data stream. Moreover, the video encoder 100 is configured to generate the video data stream such that the video data stream comprises an indication (e.g., a general_same_pic_timing_in_all_ols_flag) indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the video encoder 100 is configured to generate the video data stream such that said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the video encoder 100 is configured to generate the video data stream such that said network abstraction layer unit does not comprise any other supplemental enhancement information message.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the video encoder 100 may, e.g., be configured to generate the video data stream such that for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of a coded video sequence of the one or more coded video sequences, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the video encoder 100 may, e.g., be configured to generate the video data stream such that for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of each of the one or more coded video sequences of the video data stream, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
Furthermore, an apparatus 200 for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus 200 is configured to generate a processed video data stream from the input video data stream. Moreover, the apparatus 200 is configured to generate the processed video data stream such that the processed video data stream comprises an indication (e.g., a general_same_pic_timing_in_all_ols_flag) indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the processed video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the apparatus 200 is configured to generate the processed video data stream such that said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the apparatus 200 is configured to generate the processed video data stream such that said network abstraction layer unit does not comprise any other supplemental enhancement information message.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the apparatus 200 may, e.g., be configured to generate the processed video data stream such that for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of a coded video sequence of the one or more coded video sequences, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then the apparatus 200 may, e.g., be configured to generate the processed video data stream such that for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of each of the one or more coded video sequences of the processed video data stream, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
Moreover, a video decoder 300 for receiving a video data stream having a video stored therein is provided. The video decoder 300 is configured to decode the video from the video data stream. The video data stream comprises an indication (e.g., a general_same_pic_timing_in_all_ols_flag) indicating whether or not a non-scalable nested picture timing supplemental enhancement information message of a network abstraction layer unit of an access unit of the plurality of access units of a coded video sequence of one or more coded video sequences of the video data stream is defined to apply to all output layer sets of a plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a first value, then the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit is defined to apply to all output layer sets of the plurality of output layer sets of said access unit. If the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has a value being different from the first value, then the indication does not define whether or not the non-scalable nested picture timing supplemental enhancement information message of said network abstraction layer unit of said access unit applies to all output layer sets of the plurality of output layer sets of said access unit. The video decoder 300 is configured to decode the video depending on said indication.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then said network abstraction layer unit does not comprise any other supplemental enhancement information message. The video decoder 300 is configured to decode the video depending on said indication.
According to an embodiment, for example, if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of a coded video sequence of the one or more coded video sequences, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
In an embodiment, e.g., if the indication (e.g., a general_same_pic_timing_in_all_ols_flag) has the first value, then for each network abstraction layer unit, which comprises a non-scalable nested picture timing supplemental enhancement information message, of each access unit of the plurality of access units of each of the one or more coded video sequences of the video data stream, said network abstraction layer unit does not comprise any other supplemental enhancement information message that is different from a picture timing supplemental enhancement information message, or does not comprise any other supplemental enhancement information message.
Furthermore, a system is provided. The system comprises an apparatus 200 as described above and a video decoder 300 as described above. The video decoder 300 is configured to receive the processed video data stream of the apparatus 200. Moreover, the video decoder 300 is configured to decode the video from the processed video data stream of the apparatus 200.
According to an embodiment, the system may, e.g., further comprise a video encoder 100. The apparatus 200 may, e.g., be configured to receive the video data stream from the video encoder 100 as the input video data stream.
In particular, the fourth aspect of the invention relates to constraining the PT SEI message not to be paired with other HRD SEI messages when general_same_pic_timing_in_all_ols_flag is equal to 1.
The VVC draft specification comprises a flag called general_same_pic_timing_in_all_ols_flag in the general HRD parameters structure with the following semantics:
general_same_pic_timing_in_all_ols_flag equal to 1 specifies that the non-scalable-nested PT SEI message in each AU applies to the AU for any OLS in the bitstream and no scalable-nested PT SEI messages are present. general_same_pic_timing_in_all_ols_flag equal to 0 specifies that the non-scalable-nested PT SEI message in each AU may or may not apply to the AU for any OLS in the bitstream and scalable-nested PT SEI messages may be present.
In general, when an OLS sub-bitstream is extracted from an original bitstream (comprising OLS data plus non-OLS data), corresponding HRD-related timing/buffer information for the target OLS in the form of Buffering Period, Picture Timing and Decoding Unit Information SEI messages, which are encapsulated in so-called scalable-nesting SEI messages, is decapsulated. These decapsulated SEI messages are subsequently used to replace the non-scalable nested HRD SEI information in the original bitstream. However, in many scenarios, the content of some messages, e.g. the Picture Timing SEI message, may remain the same when a layer is dropped, i.e. from one OLS to a sub-set thereof. Therefore, general_same_pic_timing_in_all_ols_flag provides a shortcut so that only BP and DUI SEI messages are to be replaced, but the PT SEI in the original bitstream may stay in effect, i.e. it is simply not removed during extraction when general_same_pic_timing_in_all_ols_flag is equal to 1. Therefore, no replacement PT SEI message needs to be encapsulated in the scalable-nesting SEI message carrying the replacement BP and DUI SEI messages and no bitrate overhead is introduced for this information.
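The extraction shortcut can be sketched as follows. The (type, payload_type) NAL unit model and the payload type values used here (0 = BP, 1 = PT, 130 = DUI) reflect common SEI numbering, but the helper itself is hypothetical:

```python
def extract_ols(nal_units, same_pic_timing_in_all_ols):
    """Sketch of OLS extraction of HRD SEI NAL units.

    nal_units: list of (kind, payload_type) tuples, e.g. ("sei", 1) for a
    PT SEI NAL unit or ("vcl", None) for coded slice data. When the flag
    is set, PT SEI stays in effect for all OLSs; BP/DUI are dropped and
    later replaced from the scalable-nesting SEI (not modeled here).
    """
    kept = []
    for unit in nal_units:
        kind, payload_type = unit
        if kind == "sei" and payload_type == 1 and same_pic_timing_in_all_ols:
            kept.append(unit)          # PT SEI kept as-is: no re-writing needed
        elif kind == "sei" and payload_type in (0, 1, 130):  # BP, PT, DUI
            continue                   # dropped; replacements come from nesting
        else:
            kept.append(unit)          # everything else passes through untouched
    return kept
```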
However, in the state of the art, the PT SEI message is allowed to be carried within one SEI NAL unit (NAL unit = network abstraction layer unit) jointly with other HRD SEI messages, i.e. BP, PT and DUI SEI messages may all be encapsulated within the same Prefix SEI NAL unit. Hence, an extractor would have to do a deeper inspection of such an SEI NAL unit to understand the comprised messages, and when only one of the comprised messages (PT) is to be kept during the extraction procedure, it would be required to practically re-write the whole SEI NAL unit (i.e. remove non-PT SEI messages). In order to avoid this cumbersome low-level processing and allow an extractor to operate on the non-parameter-set portion of a bitstream entirely on the NAL unit level, it is part of the invention that a bitstream constraint disallows such bitstream construction. In one embodiment, the constraint is phrased as follows:
general_same_pic_timing_in_all_ols_flag equal to 1 specifies that the non-scalable-nested PT SEI message in each AU applies to the AU for any OLS in the bitstream and no scalable-nested PT SEI messages are present. general_same_pic_timing_in_all_ols_flag equal to 0 specifies that the non-scalable-nested PT SEI message in each AU may or may not apply to the AU for any OLS in the bitstream and scalable-nested PT SEI messages may be present. When general_same_pic_timing_in_all_ols_flag is equal to 1, it is a constraint of bitstream conformance that all SEI NAL units in the bitstream containing an SEI message with payload_type equal to 1 (Picture Timing) shall not contain SEI messages with payload_type unequal to 1.
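The practical effect of such a constraint on an extractor can be sketched as follows. This is a simplified, hypothetical model (the `SeiNalUnit` class and the extraction loop are illustrative assumptions; the payload type numbers BP=0, PT=1, DUI=130 follow the draft specification):

```python
# Simplified sketch of NAL-unit-level OLS extraction when
# general_same_pic_timing_in_all_ols_flag == 1.
BP, PT, DUI = 0, 1, 130  # SEI payload types as in the draft specification

class SeiNalUnit:
    def __init__(self, payload_types):
        # Under the proposed constraint, a NAL unit whose payload types
        # include PT contains only PT messages, so no rewriting is needed.
        self.payload_types = payload_types

def extract(sei_nal_units, replacement_units):
    """Keep PT SEI NAL units as-is, drop the other non-scalable-nested
    HRD SEI NAL units, and splice in the decapsulated replacements."""
    kept = []
    for nal in sei_nal_units:
        if PT in nal.payload_types:
            # The constraint guarantees this NAL unit is PT-only, so it
            # can be kept without deeper inspection.
            kept.append(nal)
        elif any(t in (BP, DUI) for t in nal.payload_types):
            continue  # replaced by the decapsulated scalable-nested SEIs
        else:
            kept.append(nal)  # unrelated SEI NAL units pass through
    return kept + replacement_units
```

Without the constraint, the first branch would instead have to parse into the NAL unit and remove individual non-PT messages, i.e. rewrite the NAL unit payload.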
In the following, the fifth aspect of the invention is now described in detail.
In accordance with the fifth aspect of the invention, a video data stream is provided. The video data stream has a video encoded thereinto. Moreover, the video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements is defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream.
In an embodiment, the video data stream may, e.g., comprise a plurality of access units, wherein each access unit of the plurality of access units may, e.g., be assigned to one of a plurality of pictures of the video. The portion of the video data stream may, e.g., be an access unit of the plurality of access units of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit and in every one of the non-scalable nested supplemental enhancement information messages of the access unit.
In an embodiment, the portion of the video data stream may, e.g., be a coded video sequence of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence and in every one of the non-scalable nested supplemental enhancement information messages of the coded video sequence.
In an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream.
According to an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream.
In an embodiment, the video data stream or the portion of the video data stream may, e.g., comprise at least one buffering period supplemental enhancement information message, wherein said buffering period supplemental enhancement information message defines the size for each syntax element of the one or more syntax elements of the plurality of syntax elements.
According to an embodiment, said buffering period supplemental enhancement information message comprises, for defining the size for each syntax element of the one or more syntax elements of the plurality of syntax elements, at least one of a bp_cpb_initial_removal_delay_length_minus1 element, a bp_cpb_removal_delay_length_minus1 element, a bp_dpb_output_delay_length_minus1 element, a bp_du_cpb_removal_delay_increment_length_minus1 element, a bp_dpb_output_delay_du_length_minus1 element.
In an embodiment, for each access unit of a plurality of access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
According to an embodiment, for each single-layer access unit of a plurality of single-layer access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said single-layer access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
Moreover, a video encoder 100 is provided. The video encoder 100 is configured to encode a video into a video data stream. Moreover, the video encoder 100 is configured to generate the video data stream such that the video data stream comprises one or more scalable nested supplemental enhancement information messages. Furthermore, the video encoder 100 is configured to generate the video data stream such that the one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Moreover, the video encoder 100 is configured to generate the video data stream such that each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The video encoder 100 may, e.g., be configured to generate the video data stream such that the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. The video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise a plurality of access units, wherein each access unit of the plurality of access units may, e.g., be assigned to one of a plurality of pictures of the video. The portion of the video data stream may, e.g., be an access unit of the plurality of access units of the video data stream. The video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The video encoder 100 may, e.g., be configured to generate the video data stream such that the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. The video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit and in every one of the non-scalable nested supplemental enhancement information messages of the access unit.
In an embodiment, the portion of the video data stream may, e.g., be a coded video sequence of the video data stream. The video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The video encoder 100 may, e.g., be configured to generate the video data stream such that the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. The video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence and in every one of the non-scalable nested supplemental enhancement information messages of the coded video sequence.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that the video data stream or the portion of the video data stream may, e.g., comprise at least one buffering period supplemental enhancement information message, wherein said buffering period supplemental enhancement information message defines the size for each syntax element of the one or more syntax elements of the plurality of syntax elements.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that said buffering period supplemental enhancement information message comprises, for defining the size for each syntax element of the one or more syntax elements of the plurality of syntax elements, at least one of a bp_cpb_initial_removal_delay_length_minus1 element, a bp_cpb_removal_delay_length_minus1 element, a bp_dpb_output_delay_length_minus1 element, a bp_du_cpb_removal_delay_increment_length_minus1 element, a bp_dpb_output_delay_du_length_minus1 element.
In an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that for each access unit of a plurality of access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
According to an embodiment, the video encoder 100 may, e.g., be configured to generate the video data stream such that for each single-layer access unit of a plurality of single-layer access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said single-layer access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
Furthermore, an apparatus 200 for receiving an input video data stream is provided. The input video data stream has a video encoded thereinto. The apparatus 200 is configured to generate an output video data stream from the input video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. The apparatus 200 is configured to process the one or more scalable nested supplemental enhancement information messages.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements is defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream. The apparatus 200 is configured to process the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages.
In an embodiment, the video data stream may, e.g., comprise a plurality of access units, wherein each access unit of the plurality of access units may, e.g., be assigned to one of a plurality of pictures of the video. The portion of the video data stream may, e.g., be an access unit of the plurality of access units of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit and in every one of the non-scalable nested supplemental enhancement information messages of the access unit. The apparatus 200 may, e.g., be configured to process the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages.
In an embodiment, the portion of the video data stream may, e.g., be a coded video sequence of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence and in every one of the non-scalable nested supplemental enhancement information messages of the coded video sequence. The apparatus 200 may, e.g., be configured to process the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages.
In an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream.
According to an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream. The apparatus 200 may, e.g., be configured to process the one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages.
In an embodiment, the video data stream or the portion of the video data stream may, e.g., comprise at least one buffering period supplemental enhancement information message, wherein said buffering period supplemental enhancement information message defines the size of the one or more of the plurality of syntax elements. The apparatus 200 may, e.g., be configured to process the at least one buffering period supplemental enhancement information message.
According to an embodiment, said buffering period supplemental enhancement information message comprises, for defining the size of the one or more of the plurality of syntax elements, at least one of a bp_cpb_initial_removal_delay_length_minus1 element, a bp_cpb_removal_delay_length_minus1 element, a bp_dpb_output_delay_length_minus1 element, a bp_du_cpb_removal_delay_increment_length_minus1 element, a bp_dpb_output_delay_du_length_minus1 element.
In an embodiment, for each access unit of a plurality of access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message. The apparatus 200 may, e.g., be configured to process the scalable nested supplemental enhancement information messages and the non-scalable nested supplemental enhancement information messages.
According to an embodiment, for each single-layer access unit of a plurality of single-layer access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said single-layer access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message. The apparatus 200 may, e.g., be configured to process the scalable nested supplemental enhancement information messages and the non-scalable nested supplemental enhancement information messages.
Moreover, a video decoder 300 for receiving a video data stream having a video stored therein is provided. The video decoder 300 is configured to decode the video from the video data stream. The video data stream comprises one or more scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages comprise a plurality of syntax elements. Each syntax element of one or more syntax elements of the plurality of syntax elements is defined to have a same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of a portion of the video data stream. The video decoder 300 is configured to decode the video depending on the one or more syntax elements of the plurality of syntax elements.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream or of the portion of the video data stream.
In an embodiment, the video data stream may, e.g., comprise a plurality of access units, wherein each access unit of the plurality of access units may, e.g., be assigned to one of a plurality of pictures of the video. The portion of the video data stream may, e.g., be an access unit of the plurality of access units of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the access unit and in every one of the non-scalable nested supplemental enhancement information messages of the access unit.
In an embodiment, the portion of the video data stream may, e.g., be a coded video sequence of the video data stream. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence.
According to an embodiment, the video data stream may, e.g., comprise one or more non-scalable nested supplemental enhancement information messages. The one or more scalable nested supplemental enhancement information messages and the one or more non-scalable nested supplemental enhancement information messages comprise the plurality of syntax elements. Each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the coded video sequence and in every one of the non-scalable nested supplemental enhancement information messages of the coded video sequence.
In an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream.
According to an embodiment, each syntax element of the one or more syntax elements of the plurality of syntax elements may, e.g., be defined to have the same size in every one of the scalable nested supplemental enhancement information messages of the video data stream and in every one of the non-scalable nested supplemental enhancement information messages of the video data stream.
In an embodiment, the video data stream or the portion of the video data stream may, e.g., comprise at least one buffering period supplemental enhancement information message, wherein said buffering period supplemental enhancement information message defines the size for each syntax element of the one or more syntax elements of the plurality of syntax elements.
According to an embodiment, said buffering period supplemental enhancement information message comprises, for defining the size for each syntax element of the one or more syntax elements of the plurality of syntax elements, at least one of a bp_cpb_initial_removal_delay_length_minus1 element, a bp_cpb_removal_delay_length_minus1 element, a bp_dpb_output_delay_length_minus1 element, a bp_du_cpb_removal_delay_increment_length_minus1 element, a bp_dpb_output_delay_du_length_minus1 element.
In an embodiment, for each access unit of a plurality of access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
According to an embodiment, for each single-layer access unit of a plurality of single-layer access units of the video data stream, which comprises a scalable-nested buffering period supplemental enhancement information message, said single-layer access unit may, e.g., also comprise a non-scalable-nested buffering period supplemental enhancement information message.
Furthermore, a system is provided. The system comprises an apparatus 200 as described above and a video decoder 300 as described above. The video decoder 300 is configured to receive the output video data stream of the apparatus 200. Moreover, the video decoder 300 is configured to decode the video from the output video data stream of the apparatus 200.
According to an embodiment, the system may, e.g., further comprise a video encoder 100. The apparatus 200 may, e.g., be configured to receive the video data stream from the video encoder 100 as the input video data stream.
In particular, the fifth aspect of the invention relates to constraining all BP SEI messages in a bitstream to indicate the same length of certain variable-length coded syntax elements and to not be scalable-nested without a non-scalable-nested variant in the same AU.
The buffering period SEI message, the picture timing SEI message and the decoding unit information SEI message are used to provide precise timing information for the NAL units within a bitstream to control their transition through the buffers of a decoder in conformance tests. Some syntax elements in the PT and DUI SEI messages are coded with variable length and the length of these syntax elements is conveyed in the BP SEI message. This parsing dependency is a design trade-off. For the cost of not allowing PT and DUI SEI message parsing without parsing the associated BP SEI message first, the benefit of not having to send those length syntax elements in each PT or DUI SEI message is achieved. As the BP SEI message (once per multiple frames) is sent much less often than PT (once per frame) or DUI SEI messages (multiple times per frame), a bit saving is achieved through this common design trade-off, similar to how picture header structures can reduce the bit cost of slice headers when many slices are used.
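The magnitude of this saving can be estimated with a back-of-envelope calculation; all numbers below are illustrative assumptions, not values from the specification:

```python
# Illustrative estimate of bits saved by signaling field lengths once
# in the BP SEI message instead of in every PT SEI message.
frames_per_buffering_period = 64   # assumed: one BP SEI per 64 frames
length_fields = 5                  # the five bp_*_length_minus1 elements
bits_per_length_field = 5          # assumed fixed-length coding of each

# Cost if every PT SEI message carried the length fields itself:
per_pt_cost = frames_per_buffering_period * length_fields * bits_per_length_field
# Cost with the BP-based design (lengths sent once per buffering period):
bp_cost = length_fields * bits_per_length_field

saved = per_pt_cost - bp_cost
print(saved)  # 1575 bits saved per buffering period under these assumptions
```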
More specifically, the BP SEI message in the current VVC draft specification includes the syntax elements that are the root of the parsing dependencies:
• bp_cpb_initial_removal_delay_length_minus1 that specifies the coded length of alternative timing initial CPB removal delays of AUs in the PT SEI message and,
• bp_cpb_removal_delay_length_minus1 that specifies the coded length of CPB removal delays and removal delay offsets of AUs in the PT SEI message and,
• bp_dpb_output_delay_length_minus1 that specifies the coded length of DPB output delays of AUs in the PT SEI message and,
• bp_du_cpb_removal_delay_increment_length_minus1 that specifies the coded length of the individual and common CPB removal delays of DUs in the PT SEI message and the CPB removal delays of DUs in the DUI SEI message and,
• bp_dpb_output_delay_du_length_minus1 that specifies the coded length of DPB output delays of AUs in the PT SEI message and in the DUI SEI message.
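The parsing dependency rooted in these elements can be illustrated with a minimal bit-reader sketch. This is a hypothetical simplification: real SEI parsing involves payload sizes, many more fields, and extension handling, and only two of the length elements are modeled here:

```python
class BitReader:
    """Minimal MSB-first bit reader over a byte string."""
    def __init__(self, data: bytes):
        self.bits = ''.join(f'{b:08b}' for b in data)
        self.pos = 0
    def u(self, n):
        # Read n bits as an unsigned integer.
        v = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return v

def parse_bp_lengths(r):
    # The BP SEI message conveys the coded lengths (as value minus 1) of
    # variable-length fields used later in the PT/DUI SEI messages.
    return {
        'cpb_removal_delay_len': r.u(5) + 1,
        'dpb_output_delay_len': r.u(5) + 1,
    }

def parse_pt(r, lengths):
    # The PT SEI fields cannot be parsed without the lengths taken from
    # the associated BP SEI message first.
    return {
        'cpb_removal_delay': r.u(lengths['cpb_removal_delay_len']),
        'dpb_output_delay': r.u(lengths['dpb_output_delay_len']),
    }
```

A parser must therefore locate the correct BP SEI message (scalable-nested or not) before any PT or DUI SEI message in the same buffering period can be decoded, which is exactly the burden the constraint below removes.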
However, a problem arises when a bitstream comprises multiple OLSs. While the BP/PT/DUI SEI messages that apply to the OLS that represents the bitstream are carried in verbatim fashion in the bitstream, and keeping track of the parsing dependency is trivial, other pairs of BP/PT/DUI SEI messages that correspond to the OLSs representing (sub-)bitstreams are to be carried in an encapsulated form in so-called scalable nesting SEI messages. Still, the parsing dependencies apply, and given that the number of OLSs might be very high, it is a considerable burden for a decoder or parser to keep track of the correct encapsulated BP SEI message for the sake of the parsing dependency when processing the encapsulated PT and DUI SEI messages, especially since those messages can also be encapsulated in different scalable nesting SEI messages.
Therefore, as part of the invention, in one embodiment, a bitstream constraint is established that the coded value of the respective syntax elements describing the lengths must be the same in all scalable-nested and non-scalable-nested BP SEI messages in an AU. A decoder or parser then only needs to store the respective length values when parsing the first non-scalable-nested BP SEI message in the AU and can resolve the parsing dependencies of all PT and DUI SEI messages in the buffering periods that start at the respective AU, whether encapsulated in scalable nesting SEI messages or not. The following is an example of the respective specification text:
It is a requirement of bitstream conformance that all scalable-nested and non-scalable-nested buffering period SEI messages in an AU have the same respective value of the syntax elements bp_cpb_initial_removal_delay_length_minus1, bp_cpb_removal_delay_length_minus1, bp_dpb_output_delay_length_minus1, bp_du_cpb_removal_delay_increment_length_minus1 and bp_dpb_output_delay_du_length_minus1.
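A conformance checker for such a constraint could be sketched as follows. This is an illustrative Python sketch: the field names follow the specification text above, but modeling each BP SEI message as a plain dictionary is an assumption for the example.

```python
# The five BP SEI length syntax elements that root the parsing dependencies.
LENGTH_FIELDS = (
    "bp_cpb_initial_removal_delay_length_minus1",
    "bp_cpb_removal_delay_length_minus1",
    "bp_dpb_output_delay_length_minus1",
    "bp_du_cpb_removal_delay_increment_length_minus1",
    "bp_dpb_output_delay_du_length_minus1",
)


def check_bp_lengths_consistent(bp_sei_messages: list) -> bool:
    """Return True iff all BP SEI messages in the AU (scalable-nested
    or not) carry identical values for the five length syntax elements."""
    if not bp_sei_messages:
        return True
    reference = bp_sei_messages[0]
    return all(
        msg[field] == reference[field]
        for msg in bp_sei_messages[1:]
        for field in LENGTH_FIELDS
    )


# Hypothetical AU: one non-scalable-nested and one scalable-nested BP SEI.
au_bp_seis = [
    {field: 16 for field in LENGTH_FIELDS},  # non-scalable-nested
    {field: 16 for field in LENGTH_FIELDS},  # scalable-nested (for an OLS)
]
print(check_bp_lengths_consistent(au_bp_seis))  # → True

# Diverging length values violate the constraint.
au_bp_seis[1]["bp_dpb_output_delay_length_minus1"] = 8
print(check_bp_lengths_consistent(au_bp_seis))  # → False
```

With the constraint satisfied, a parser can cache the length values once per AU instead of matching each encapsulated PT or DUI SEI message to its encapsulated BP SEI message.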
In another embodiment, the constraint is expressed only for scalable-nested BP SEI messages that are in the buffering period that the current non-scalable-nested BP SEI message determines, as follows:
It is a requirement of bitstream conformance that all scalable-nested buffering period SEI messages in a buffering period have the same respective value of the syntax elements bp_cpb_initial_removal_delay_length_minus1, bp_cpb_removal_delay_length_minus1, bp_dpb_output_delay_length_minus1, bp_du_cpb_removal_delay_increment_length_minus1 and bp_dpb_output_delay_du_length_minus1 as the non-scalable-nested buffering period SEI message of the buffering period.
Here, the BPs of the bitstream define the scope of the constraints for the scalable-nested BPs, from one non-scalable-nested BP SEI message to the next.
In another embodiment, the constraint is expressed for all AUs of the bitstream, e.g. as follows:
It is a requirement of bitstream conformance that all scalable-nested and non-scalable-nested buffering period SEI messages in the bitstream have the same respective value of the syntax elements bp_cpb_initial_removal_delay_length_minus1, bp_cpb_removal_delay_length_minus1, bp_dpb_output_delay_length_minus1, bp_du_cpb_removal_delay_increment_length_minus1 and bp_dpb_output_delay_du_length_minus1.
In another embodiment, the constraint is expressed only for the AUs in a CVS, so a smart encoder may still be able to exploit differences in the duration of BPs in the bitstream for the coding of the relevant delay and offset syntax elements. The specification text would be as follows:
It is a requirement of bitstream conformance that all scalable-nested and non-scalable-nested buffering period SEI messages in a CVS have the same respective value of the syntax elements bp_cpb_initial_removal_delay_length_minus1, bp_cpb_removal_delay_length_minus1, bp_dpb_output_delay_length_minus1, bp_du_cpb_removal_delay_increment_length_minus1 and bp_dpb_output_delay_du_length_minus1.
Here, the constraint scope is the CVS.
More specifically, the buffering period or BP SEI message defines a so-called buffering period in which timings of individual pictures use the picture at the start of a buffering period as an anchor. The beginning of a buffering period is instrumental, for instance, to test conformance of random-access functionality in a bitstream.
Fig. 7 illustrates two sets of HRD SEIs, scalable-nested SEIs and non-scalable-nested SEIs, in a two-layer bitstream according to an embodiment.
In a multi-layer scenario as shown in Fig. 7, for instance, the scalable-nested HRD SEIs provide a different buffering period setup (through the BPs at POC 0 and POC 3) than the non-scalable-nested SEIs (only POC 0), to be used when only layer L0 is extracted and played from POC 3 onwards.
However, this also comes at the increased complexity cost of tracking the parsing dependencies between the PT and the individual BP messages as explained above, which is undesirable. Therefore, as part of the invention, in one embodiment, it is prohibited to have a scalable-nested BP SEI message in an AU without a non-scalable-nested BP SEI message, as follows:
It is a requirement of bitstream conformance that no scalable-nested BP SEI message shall be in an AU that does not contain a non-scalable-nested BP SEI message.
As the above usage scenario is limited to multi-layer bitstreams, in another embodiment, the related constraint is limited to single-layer bitstreams as follows:
It is a requirement of bitstream conformance that no scalable-nested BP SEI message shall be in a single-layer AU that does not contain a non-scalable-nested BP SEI message.
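This prohibition can likewise be checked mechanically. The following is an illustrative Python sketch under the assumption that each AU is represented as a record of two booleans indicating the presence of scalable-nested and non-scalable-nested BP SEI messages; the field names are chosen for the example and are not taken from the specification.

```python
def check_bp_nesting_constraint(access_units: list) -> bool:
    """Return True iff no AU carries a scalable-nested BP SEI message
    without also carrying a non-scalable-nested BP SEI message."""
    for au in access_units:
        if au["scalable_nested_bp"] and not au["non_scalable_nested_bp"]:
            return False
    return True


# Hypothetical bitstream: AU 0 starts a buffering period with both
# variants present; AU 1 carries no BP SEI message at all.
bitstream = [
    {"non_scalable_nested_bp": True, "scalable_nested_bp": True},   # AU 0
    {"non_scalable_nested_bp": False, "scalable_nested_bp": False},  # AU 1
]
print(check_bp_nesting_constraint(bitstream))  # → True

# An AU with only a scalable-nested BP SEI message violates the constraint.
bitstream.append({"non_scalable_nested_bp": False, "scalable_nested_bp": True})
print(check_bp_nesting_constraint(bitstream))  # → False
```

Under the constraint, a parser encountering any BP SEI message in an AU can rely on a non-scalable-nested BP SEI message being present there as the anchor for resolving the length syntax elements.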
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software, or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The above-described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims (18)

1. A video decoder (300) for receiving a video data stream having a video stored therein, wherein the video decoder (300) is configured to decode the video from the video data stream, wherein the video decoder (300) is configured to decode the video depending on an access unit removal time of a current picture of a plurality of pictures of the video from a coded picture buffer, wherein the video decoder (300) is configured to decode the video depending on an indication indicating whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
2. The video decoder (300) according to claim 1, wherein the video decoder (300) is configured to determine whether or not to use the coded picture buffer delay offset information for determining the access unit removal time of the current picture depending on a concatenation flag.
3. The video decoder (300) according to claim 2, wherein if the concatenation flag is set to a first value then the video decoder (300) is configured to use the coded picture buffer delay offset information to determine one or more access unit removal times, and wherein if the concatenation flag is set to a second value being different from the first value then the video decoder (300) is configured to not use the coded picture buffer delay offset information to determine the one or more access unit removal times.
4. The video decoder (300) according to any one of claims 1 to 3, wherein the access unit removal time for at least one of the plurality of pictures of the video from the coded picture buffer depends on the coded picture buffer delay offset information.
5. The video decoder (300) according to any one of claims 1 to 4, wherein the video decoder (300) is configured to decode the video depending on whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture, wherein the access unit removal time of the current picture depends on a location of the current picture within the video.
6. A video encoder (100), wherein the video encoder (100) is configured to encode a video into a video data stream, wherein the video encoder (100) is configured to generate the video data stream such that the video data stream comprises coded picture buffer delay offset information, wherein the video encoder (100) is configured to generate the video data stream such that the video data stream comprises an indication indicating whether or not to use the coded picture buffer delay offset information for determining an access unit removal time of a current picture from a coded picture buffer used for decoding the video on a decoder side.
7. The video encoder (100) according to claim 6, wherein the video encoder (100) is configured to generate the video data stream such that the video data stream comprises a concatenation flag.
8. The video encoder (100) according to claim 7, wherein if the concatenation flag is set to a first value then the concatenation flag indicates that the coded picture buffer delay offset information needs to be used to determine one or more access unit removal times, and wherein if the concatenation flag is set to a second value being different from the first value then the concatenation flag indicates that the indicated offset is not used to determine the one or more access unit removal times.
9. The video encoder (100) according to any one of claims 6 to 8, wherein the video encoder (100) is configured to generate the video data stream such that the video data stream comprises information on the coded picture buffer delay offset usable to determine an access unit removal time for at least one of the plurality of pictures of the video from the coded picture buffer.
10. The video encoder (100) according to any one of claims 6 to 9, wherein the video encoder (100) is configured to generate the video data stream such that the video data stream comprises information on a location of the current picture within the video usable for determining an access unit removal time of the current picture.
11. A video data stream, wherein the video data stream has a video encoded thereinto, wherein the video data stream comprises coded picture buffer delay offset information, wherein the video data stream comprises an indication indicating whether or not to use the coded picture buffer delay offset information for determining an access unit removal time of a current picture from a coded picture buffer used for decoding the video on a decoder side.
12. The video data stream according to claim 11, wherein the video data stream comprises a concatenation flag.
13. The video data stream according to claim 12, wherein if the concatenation flag is set to a first value then the concatenation flag indicates that the coded picture buffer delay offset information needs to be used to determine one or more access unit removal times, and wherein if the concatenation flag is set to a second value being different from the first value then the concatenation flag indicates that the indicated offset is not used to determine the one or more access unit removal times.
14. The video data stream according to any one of claims 11 to 13, wherein the video data stream comprises information on the coded picture buffer delay offset usable to determine an access unit removal time for at least one of the plurality of pictures of the video from the coded picture buffer.
15. The video data stream according to any one of claims 11 to 14, wherein the video data stream comprises information on a location of the current picture within the video usable for determining an access unit removal time of the current picture.
16. A method for receiving a video data stream having a video stored therein, wherein the method comprises decoding the video from the video data stream, wherein decoding the video is conducted depending on an access unit removal time of a current picture of a plurality of pictures of the video from a coded picture buffer, wherein decoding the video is conducted depending on an indication indicating whether or not to use coded picture buffer delay offset information for determining the access unit removal time of the current picture from the coded picture buffer.
17. A method for encoding a video into a video data stream, wherein the method comprises generating the video data stream such that the video data stream comprises coded picture buffer delay offset information, wherein generating the video data stream is conducted such that the video data stream comprises an indication indicating whether or not to use the coded picture buffer delay offset information for determining an access unit removal time of a current picture from a coded picture buffer used for decoding the video on a decoder side.
18. A computer program for implementing the method of claim 16 or 17 when being executed on a computer or signal processor.
OA1202200478 2020-05-22 2021-05-21 Video encoder, video decoder, methods for encoding and decoding and video data stream for realizing advanced video coding concepts OA21018A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20176178.0 2020-05-22
EP20176206.9 2020-05-22

Publications (1)

Publication Number Publication Date
OA21018A true OA21018A (en) 2023-08-24
