WO2020249530A1 - Layered video data stream
- Publication number: WO2020249530A1
- Application: PCT/EP2020/065887
- Authority: WIPO (PCT)
Classifications

- H04N21/2365—Multiplexing of several video streams
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N21/234327—Processing of video elementary streams involving reformatting operations of video signals by decomposing into layers, e.g. base layer and one or more enhancement layers
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit being a scalable video layer
Description
The present application is concerned with a layered video data stream and its coding and decoding.

Layered media coding technologies such as Scalable Video Coding (SVC) or Multiview Video Coding (MVC) generate video bitstreams with various media layers, each of them representing another level of quality. Due to inter-layer prediction there exist hierarchies between these media layers, where media layers depend on other media layers for successful decoding.

SVC is a video compression standard that standardizes the encoding of a high-quality video bitstream which also contains one or more subset bitstreams (a form of layered coding). A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), a lower temporal resolution (lower frame rate), or a lower-quality video signal.

MVC is a stereoscopic video coding standard that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. MVC is based on the idea that video recordings of the same scene from multiple angles share many common elements. It is possible to encode all simultaneous frame captures in the same elementary stream and to share as much information as possible across the different layers. This can reduce the size of the encoded video.

The object of the subject-matter of the present application is to provide a layered video data stream which comprises information on how to output the video content from a layered video data stream that includes more than one video content. This object is achieved by the subject-matter of the claims of the present application.
In accordance with one embodiment of the present application, a layered video data stream comprises several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: decoded picture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for jointly decoding the predetermined layer from the layered video data stream along with at least one further layer associated with the predetermined layer. That is, in case the framerates of the layers are not aligned, i.e., different framerates are mixed, the DPB management parameters indicate the relationship between decoded pictures and an output picture. Therefore, it is possible to output the decoded pictures even if the video data stream includes both multi-layered coded video content and single-layered coded video content.

In accordance with one embodiment of the present application, a layered video data stream comprises several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream, and a signaling indicating whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers, i.e., whether they group together pictures of the several layers which coincide in picture output time, or individually carry the pictures of the several layers which coincide in picture output time. Therefore, it is possible to output the video content at the appropriate timing.

In accordance with one embodiment of the present application, a layered video data stream comprises several layers into which a plurality of sub-videos is encoded independently from each other, wherein the layered video data stream comprises: mapping information indicating a mapping of the sub-videos onto different sub-regions of an output scene area. Therefore, it is possible to efficiently output, for example, 360-degree video by using the mapping information indicating the spatial location of the output picture.

In accordance with one embodiment of the present application, a layered video data stream comprises several layers into which different video content is encoded, wherein the layered video data stream comprises: supplemental enhancement information, SEI, messages, which relate to different groups of layers out of the several layers, and layer association information which informs on an association of each SEI message with the group of layers to which it relates. That is, for example, in case of 3D content, the fact that a picture belongs to a region of an output content is signaled by using several layers, i.e., a group of layers containing the pictures which must be output together is indicated by the SEI message, and, therefore, the multi-layered content is simply and efficiently output by using the SEI message.

In accordance with one embodiment of the present application, an encoder is configured to encode different video content into layers of a layered video data stream independently from each other, and to provide the layered video data stream comprising, e.g., an indicator indicating a video data signaling condition, i.e., indicating whether a data set belongs to a picture and/or to a region constituting a picture. Therefore, at a decoder, which receives the encoded video data stream from the encoder, it is possible to output the layered video content without increasing the complexity of the decoding process.
Fig. 1 shows a block diagram of an apparatus for predictively encoding a video as an example for a video encoder in which a layered video data stream according to embodiments of the present application could be encoded;

Fig. 2 shows a block diagram of an apparatus for predictively decoding a video, which fits to the apparatus of Fig. 1, as an example for a video decoder in which a layered video data stream according to embodiments of the present application could be decoded;

Fig. 3 shows a schematic diagram illustrating an example for a relationship between a prediction residual signal, a prediction signal and a reconstructed signal, so as to illustrate possibilities of setting subdivisions for defining the prediction signal, handling the prediction residual signal and the like, respectively;

Fig. 4 shows a schematic illustration of a relationship between access units, AUs, and layers and their corresponding pictures according to layered video coding;

Fig. 5 shows a schematic illustration of the case in which layers of different framerates are included in the layered video data stream;

Fig. 6 shows a schematic diagram indicating an example for an extension of the video parameter set, VPS, according to embodiments of the present application;

Fig. 7 shows a diagram indicating a further example for an extension of the video parameter set, VPS, according to embodiments of the present application;

Fig. 8 shows a schematic illustration of an example indicating a decoded picture buffer, DPB, according to embodiments of the present application;

Fig. 9 (a), (b) shows schematic illustrations of an example indicating a required maximum number of pictures to be stored in the DPB according to embodiments of the present application;

Fig. 10 shows a schematic illustration of an example indicating a relationship between the layers in the layered video data stream according to embodiments of the present application;

Fig. 11 shows a diagram indicating an example for mapping information in the form of a video parameter set according to embodiments of the present application;

Fig. 12 shows a diagram indicating a further example for mapping information in the form of a video parameter set according to embodiments of the present application;

Fig. 13 shows a diagram indicating an example for mapping information in case an output scene is a cube map according to embodiments of the present application;

Fig. 14 shows a diagram indicating an example for a supplemental enhancement information, SEI, message including layer association information in the form of the video parameter set according to embodiments of the present application;

Fig. 15 shows a diagram indicating a further example for an SEI message including the layer association information according to embodiments of the present application;

Fig. 16 shows a schematic illustration indicating an example for an SEI message including the layer association information indicating rendering parameters according to embodiments of the present application;

Fig. 17 shows a schematic illustration indicating an example for an SEI message including the layer association information indicating an application of offsets according to embodiments of the present application; and

Fig. 18 shows a diagram indicating an example for an SEI message including the layer association information indicating an application of offsets in the form of a video parameter set according to embodiments of the present application.
The following description of the figures starts with a presentation of a video encoder and a video decoder of a block-based predictive codec for coding pictures of a video, in order to form an example for a coding framework into which embodiments for a layered video data stream codec may be built. The video encoder and video decoder are described with respect to Figs. 1 to 3. Thereafter, the description of embodiments of the concept of the layered video data stream codec of the present application is presented along with a description as to how such concepts could be built into the video encoder and decoder of Figs. 1 and 2, respectively, although the embodiments subsequently described may also be used to form video encoders and video decoders not operating according to the coding framework underlying the video encoder and video decoder of Figs. 1 and 2.
Fig. 1 shows a block diagram of an apparatus for predictively coding a video as an example for a video encoder in which a motion compensated prediction for inter-predicted blocks according to embodiments of the present application could be implemented. That is, Fig. 1 shows an apparatus for predictively coding a video 11 composed of a sequence of pictures 12 into a data stream 14. Block-wise predictive coding is used to this end. Further, transform-based residual coding is exemplarily used. The apparatus, or encoder, is indicated using reference sign 10.
Fig. 2 shows a block diagram of an apparatus for predictively decoding a video as an example for a video decoder in which a motion compensated prediction for inter-predicted blocks according to embodiments of the present application could be implemented. That is, Fig. 2 shows a corresponding decoder 20, i.e. an apparatus 20 configured to predictively decode the video 11' composed of pictures 12' in picture blocks from the data stream 14, also here exemplarily using transform-based residual decoding, wherein the apostrophe has been used to indicate that the pictures 12' and video 11', respectively, as reconstructed by decoder 20 deviate from the pictures 12 originally encoded by apparatus 10 in terms of coding loss introduced by the quantization of the prediction residual signal. Fig. 1 and Fig. 2 exemplarily use transform-based prediction residual coding, although embodiments of the present application are not restricted to this kind of prediction residual coding. This is true for other details described with respect to Figs. 1 and 2, too, as will be outlined hereinafter.
The encoder 10 is configured to subject the prediction residual signal to a spatial-to-spectral transformation and to encode the prediction residual signal thus obtained into the data stream 14. Likewise, the decoder 20 is configured to decode the prediction residual signal from the data stream 14 and to subject the prediction residual signal thus obtained to a spectral-to-spatial transformation.
Internally, the encoder 10 may comprise a prediction residual signal former 22 which generates a prediction residual 24 so as to measure a deviation of a prediction signal 26 from the original signal, i.e. the video 11 or a current picture 12. The prediction residual signal former 22 may, for instance, be a subtractor which subtracts the prediction signal from the original signal, i.e. the current picture 12. The encoder 10 then further comprises a transformer 28 which subjects the prediction residual signal 24 to a spatial-to-spectral transformation to obtain a spectral-domain prediction residual signal 24' which is then subject to quantization by a quantizer 32, also comprised by encoder 10. The thus quantized prediction residual signal 24'' is coded into data stream 14. To this end, encoder 10 may optionally comprise an entropy coder 34 which entropy codes the prediction residual signal, as transformed and quantized, into data stream 14. The prediction signal 26 is generated by a prediction stage 36 of encoder 10 on the basis of the prediction residual signal 24'' encoded into, and decodable from, data stream 14. To this end, the prediction stage 36 may internally, as is shown in Fig. 1, comprise a dequantizer 38 which dequantizes prediction residual signal 24'' so as to gain spectral-domain prediction residual signal 24''', which corresponds to signal 24' except for quantization loss, followed by an inverse transformer 40 which subjects the latter prediction residual signal 24''' to an inverse transformation, i.e. a spectral-to-spatial transformation, to obtain prediction residual signal 24'''', which corresponds to the original prediction residual signal 24 except for quantization loss. A combiner 42 of the prediction stage 36 then recombines, such as by addition, the prediction signal 26 and the prediction residual signal 24'''' so as to obtain a reconstructed signal 46, i.e. a reconstruction of the original signal 12. Reconstructed signal 46 may correspond to signal 12'.
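To make this loop concrete, the following minimal sketch (not part of the patent disclosure) uses NumPy's FFT as a stand-in for the spatial-to-spectral transformation of transformer 28 and a uniform scalar quantizer with step qstep as a stand-in for quantizer 32; an actual codec would use a block-wise, DCT-like integer transform:

```python
import numpy as np

def encode_block(orig, pred, qstep=8.0):
    """Transform-based residual coding of one block (sketch)."""
    residual = orig - pred                    # prediction residual (24)
    coeffs = np.fft.fft2(residual)            # spatial-to-spectral transform (24')
    return np.round(coeffs / qstep)           # quantized coefficients (24'')

def reconstruct_block(quantized, pred, qstep=8.0):
    """What decoder 20 and the encoder's prediction stage 36 both compute."""
    coeffs = quantized * qstep                # dequantization (24''')
    residual = np.real(np.fft.ifft2(coeffs))  # spectral-to-spatial transform (24'''')
    return pred + residual                    # reconstructed signal (46)

pred = np.zeros((8, 8))
orig = np.arange(64.0).reshape(8, 8)
recon = reconstruct_block(encode_block(orig, pred), pred)  # orig up to quantization loss
```

Because the prediction loop reconstructs from the quantized residual rather than from the original, the encoder's reference pictures match the decoder's despite the quantization loss.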
A prediction module 44 of prediction stage 36 then generates the prediction signal 26 on the basis of signal 46 by using, for instance, spatial prediction, i.e. intra prediction, and/or temporal prediction, i.e. inter prediction. Details in this regard are described in the following.

Likewise, decoder 20 may be internally composed of components corresponding to, and interconnected in a manner corresponding to, prediction stage 36. In particular, entropy decoder 50 of decoder 20 may entropy decode the quantized spectral-domain prediction residual signal 24'' from the data stream, whereupon dequantizer 52, inverse transformer 54, combiner 56 and prediction module 58, interconnected and cooperating in the manner described above with respect to the modules of prediction stage 36, recover the reconstructed signal on the basis of prediction residual signal 24'', so that, as shown in Fig. 2, the output of combiner 56 results in the reconstructed signal, namely the video 11' or a current picture 12' thereof.
The encoder 10 may set some coding parameters, including, for instance, prediction modes, motion parameters and the like, according to some optimization scheme, for instance in a manner optimizing some rate- and distortion-related criterion, i.e. coding cost, and/or using some rate control.
Encoder 10 and decoder 20 and the corresponding modules 44 and 58, respectively, support different prediction modes, such as intra-coding modes and inter-coding modes, which form a kind of set or pool of primitive prediction modes based on which the predictions of picture blocks are composed in a manner described in more detail below. The granularity at which encoder and decoder switch between these prediction compositions may correspond to a subdivision of the pictures 12 and 12', respectively, into blocks. Note that some of these blocks may be solely intra-coded, some blocks may be solely inter-coded and, optionally, even further blocks may be obtained using both intra-coding and inter-coding; details are set out hereinafter.
According to an intra-coding mode, a prediction signal for a block is obtained on the basis of a spatial, already coded/decoded neighborhood of the respective block. Several intra-coding sub-modes may exist, the selection among which, quasi, represents a kind of intra-prediction parameter. The intra-coding sub-modes may, for instance, also comprise one or more further sub-modes such as a DC coding mode, according to which the prediction signal for the respective block assigns one DC value to all samples within the respective block, and/or a planar intra-coding mode, according to which the prediction signal of the respective block is approximated or determined to be a spatial distribution of sample values described by a two-dimensional linear function over the sample positions of the respective block, with the tilt and offset of the plane defined by the two-dimensional linear function being derived on the basis of the neighboring samples.
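For illustration, a hedged sketch of the two named sub-modes follows (block geometry and the exact derivation of tilt and offset are simplifying assumptions, not the normative derivation of any standard):

```python
import numpy as np

def dc_prediction(top, left):
    """DC mode: assign one value, the mean of the neighboring samples,
    to all samples of the block."""
    dc = np.mean(np.concatenate([top, left]))
    return np.full((len(left), len(top)), dc)

def planar_prediction(top, left):
    """Planar-style mode: a two-dimensional linear function a + b*x + c*y,
    with offset and tilt derived from the neighboring samples."""
    h, w = len(left), len(top)
    offset = (top[0] + left[0]) / 2.0             # plane offset near the corner
    tilt_x = (top[-1] - top[0]) / max(w - 1, 1)   # horizontal tilt from the top row
    tilt_y = (left[-1] - left[0]) / max(h - 1, 1) # vertical tilt from the left column
    y, x = np.mgrid[0:h, 0:w]
    return offset + tilt_x * x + tilt_y * y
```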
According to an inter-coding mode, a prediction signal for a block may be obtained, for instance, by temporally predicting the inner of the block. For parametrization of an inter-coding mode, motion vectors may be signaled within the data stream, the motion vectors indicating the spatial displacement of the portion of a previously coded picture of the video 11 at which the previously coded/decoded picture is sampled in order to obtain the prediction signal for the respective block.
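A minimal, hypothetical rendering of this motion-compensated sampling with full-sample motion vectors (real codecs additionally interpolate sub-sample positions and use edge extension rather than clipping):

```python
import numpy as np

def motion_compensate(reference, block_y, block_x, block_h, block_w, mv_y, mv_x):
    """Sample a previously decoded picture at a displaced position to
    form the prediction signal for the current block."""
    ref_h, ref_w = reference.shape
    # Clip the displaced window to the reference picture bounds (a
    # simplifying policy; standards define edge extension instead).
    y0 = int(np.clip(block_y + mv_y, 0, ref_h - block_h))
    x0 = int(np.clip(block_x + mv_x, 0, ref_w - block_w))
    return reference[y0:y0 + block_h, x0:x0 + block_w]
```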
Data stream 14 may have encoded thereinto prediction-related parameters: the prediction modes assigned to the blocks, prediction parameters for the assigned prediction modes, such as motion parameters for inter-prediction modes, and, optionally, further parameters which control a composition of the final prediction signal for the blocks using the assigned prediction modes and prediction parameters, as will be outlined in more detail below. Additionally, the data stream may comprise parameters controlling and signaling the subdivision of pictures 12 and 12', respectively, into the blocks. The decoder 20 uses these parameters to subdivide the pictures in the same manner as the encoder did, to assign the same prediction modes and parameters to the blocks, and to perform the same prediction to result in the same prediction signal.
Fig. 3 shows a schematic diagram illustrating an example for the relationship between a prediction residual signal, a prediction signal and a reconstructed signal, so as to illustrate possibilities of setting subdivisions for defining the prediction signal, handling the prediction residual signal and the like, respectively. That is, Fig. 3 illustrates the relationship between the reconstructed signal, i.e. the reconstructed picture 12', on the one hand, and the combination of the prediction residual signal 24'''' as signaled in the data stream and the prediction signal 26, on the other hand. As already denoted above, the combination may be an addition. In Fig. 3, the prediction signal 26 is illustrated as a subdivision of the picture area into blocks 80 of varying size, although this is merely an example. The subdivision may be any subdivision, such as a regular subdivision of the picture area into rows and columns of blocks, or a multi-tree subdivision of picture 12 into leaf blocks of varying size, such as a quadtree subdivision or the like, or a mixture thereof in which the picture area is first subdivided into rows and columns of tree-root blocks which are then further subdivided in accordance with a recursive multi-tree subdivisioning to result in blocks 80.
Layered video coding has in the past been used for SNR, spatial and temporal scalability, or combinations thereof. Temporal scalability with unequal framerates is known in the art, e.g. from AVC or HEVC. There, every N-th access unit, where N is the framerate ratio between two layers, contains one more picture, belonging to a higher layer, compared to the other AUs, as shown, for instance, in Fig. 5.

Having pictures of more than one layer in a single AU, i.e. grouping pictures of more than one layer together as an AU, allows treating them as a single construct, i.e. providing the same timing information (e.g. the cpb, coded picture buffer, removal time) when decoding both layers, adding a single Access Unit Delimiter (AUD) NAL, Network Abstraction Layer, unit for the detection of when an AU begins or ends, etc.
The invention is, in certain cases, to not bundle pictures from two layers into a joint access unit, regardless of the output timing, and to handle the bumping process layer-wise. This applies even when two pictures of the bitstream have the same output time, e.g. the same POC value or the same "dpb_removal" time in an SEI (Supplemental Enhancement Information) message.
The presence of AUD NAL units may in some cases mean that the boundary of a cross-layer AU is indicated (1), and in some use-cases that only a specific layer is considered to be within an AU, so that the presence of the AUD NAL unit indicates the boundary of a picture of one layer (2). Such a meaning of an AUD NAL unit could be signaled within a parameter set that indicates whether AUs contain pictures of a group of layers (1) or whether an AU contains only a picture of a single layer (2). Such a signaling would be added, for instance, in the SPS (Sequence Parameter Set) or the VPS (Video Parameter Set).
An example of such an extension of the VPS is shown in Fig. 6. In this example, a syntax element "au_multi_layer_flag", indicating that the AUs contain pictures of a group of layers, is included in the VPS. The flag, when set in the parameter set, would imply the meaning of an AUD NAL unit: if the flag is set to 1, it would mean that the following AU has a different output time, e.g. a different POC value or a different "dpb_removal" time in an SEI. If set to 0, it would mean that, in addition to an AU having a different output time, the following AU could carry the picture of another layer but with the same output time.
Alternatively, the AUD NAL unit itself could be extended to include a flag that indicates whether it applies to a cross-layer AU (1) or a single-layer AU (2), as shown in Fig. 7. In that case, each AUD NAL unit would indicate whether it covers multiple layers or only one layer, with each layer having its own AUD NAL unit in the latter case. The bitstream may be constrained to have the same value in all AUD NAL units, disallowing that the flag is sometimes set to 1, indicating that an AU is considered as having multiple layers, and sometimes set to 0, indicating a per-layer AU.
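To make the two interpretations concrete, the following sketch (with a deliberately simplified NAL-unit model; "au_multi_layer_flag" is the flag discussed above) shows how the same AUD-based grouping yields one cross-layer AU in mode (1) and two single-layer AUs in mode (2):

```python
from dataclasses import dataclass

@dataclass
class NalUnit:
    nal_type: str       # e.g. "AUD" or "VCL"
    layer_id: int = 0

def split_at_auds(nal_units):
    """Group a NAL-unit sequence into access units at AUD boundaries."""
    aus, current = [], []
    for nal in nal_units:
        if nal.nal_type == "AUD" and current:
            aus.append(current)     # an AUD opens a new AU
            current = []
        current.append(nal)
    if current:
        aus.append(current)
    return aus

# au_multi_layer_flag = 1: one AUD delimits a cross-layer AU carrying
# co-temporal pictures of layers 0 and 1 -> a single AU with two pictures.
cross_layer = [NalUnit("AUD"), NalUnit("VCL", 0), NalUnit("VCL", 1)]
assert len(split_at_auds(cross_layer)) == 1

# au_multi_layer_flag = 0: each layer's picture gets its own AUD,
# even at the same output time -> two single-layer AUs.
per_layer = [NalUnit("AUD"), NalUnit("VCL", 0), NalUnit("AUD"), NalUnit("VCL", 1)]
assert len(split_at_auds(per_layer)) == 2
```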
A further related aspect applies to the syntax element "max_dec_pic_buffering_delay", i.e. decoded picture buffer, DPB, information, which is related to the selected GOP (group of pictures) size. Fig. 8 shows the relationship between decoded pictures, the DPB (decoded picture buffer) and an output picture. When the framerates are completely different and, therefore, the different layers are used only for multiplexing, i.e. no joint decoding is considered or AUs only refer to the picture of a single layer, completely different and non-aligned GOP sizes could be chosen.
In such a case, the maximum number of pictures within the DPB can easily be computed per layer. However, it may apply that a couple of layers should be output simultaneously when consumed together. Obviously, if synchronized output is desired, delaying the output of a picture of one layer to be able to synchronize it with a corresponding picture in another layer may be required, thus increasing the number of pictures in the DPB of that layer. On the other side, if layers are to be consumed independently, e.g. only one of them, a smaller value of the syntax element "max_dec_pic_buffering_delay" would be considered, i.e. that of the single layer should be taken into consideration. Note that the syntax element "max_dec_pic_buffering" actually refers to the maximum number of pictures that may need to be stored in the DPB. Therefore, in one embodiment, more than one value is signaled for a given layer, e.g. one for the case that the layer is consumed individually and another value for a given combination of layers. Alternatively, an inheritance value would be indicated for the case that more than one layer is consumed together and needs to be synchronized, e.g. taking the value of the other layer and multiplying it by the ratio of their corresponding framerates or GOP sizes.
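A hedged sketch of this inheritance rule (function and parameter names are illustrative, not normative syntax):

```python
import math

def joint_dpb_size(own_value, other_layer_value, own_fps, other_fps):
    """Inheritance rule sketched above: when two layers are output together,
    scale the other layer's DPB value by the framerate ratio and keep at
    least the layer's own single-layer requirement."""
    inherited = math.ceil(other_layer_value * own_fps / other_fps)
    return max(own_value, inherited)

# Example: a 10 fps layer consumed jointly with an 8 fps layer that needs
# 4 pictures on its own (cf. Fig. 9): ceil(4 * 10/8) = 5.
print(joint_dpb_size(own_value=2, other_layer_value=4, own_fps=10, other_fps=8))
```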
Fig. 9 (a) shows an example of the maximum number of pictures to be stored in the DPB for the case of 8 frames per second, while Fig. 9 (b) shows an example for the case that framerates of 8 fps and 10 fps are mixed.
The syntax element "sps_max_num_reorder_pics" is a value between 0 and "max_dec_pic_buffering" that steers the bumping process, i.e. the output of the pictures in the DPB. In the bumping process, the picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as "needed for output". That picture is cropped, using the conformance cropping window specified in the active SPS for the picture, the cropped picture is output, and the picture is marked as "not needed for output". The syntax element "sps_max_num_reorder_pics" controls the number of pictures in the DPB that are required before the bumping process starts; i.e., from HEVC: when one or more of the following conditions are true, the "bumping" process specified in clause G.5.2.4 is invoked repeatedly until none of the following conditions are true: the number of pictures in the DPB that are marked as "needed for output" is greater than the value defined by "sps_max_num_reorder_pics"[HighestTid].
The values of "sps_max_num_reorder_pics" for different layers might be different if, e.g., the GOP structure is chosen differently. But if both layers are output simultaneously, it would be desirable to achieve some kind of synchronization (where possible through the different framerates). Therefore, in another embodiment, similarly as discussed above for the maximum number of pictures in the DPB, several values of "sps_max_num_reorder_pics", corresponding to when a layer is individually output and when a combination of layers is output, are indicated in the parameter sets, e.g. SPS or VPS. Alternatively, it is indicated that a derivation is possible where, for instance, the values indicated are multiplied by the ratio of GOP sizes or framerates.
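The quoted bumping rule can be sketched as follows (a simplified model in which pictures are POC/output-flag pairs and cropping is omitted):

```python
def bump(dpb, max_num_reorder_pics):
    """Output pictures while more pictures are marked 'needed for output'
    than sps_max_num_reorder_pics allows; the picture with the smallest
    PicOrderCntVal among those marked 'needed for output' goes first."""
    outputs = []
    needed = [p for p in dpb if p["needed_for_output"]]
    while len(needed) > max_num_reorder_pics:
        first = min(needed, key=lambda p: p["poc"])
        outputs.append(first["poc"])         # the cropped picture would be output here
        first["needed_for_output"] = False   # mark as 'not needed for output'
        needed.remove(first)
    return outputs

dpb = [{"poc": poc, "needed_for_output": True} for poc in (8, 2, 4, 6)]
print(bump(dpb, max_num_reorder_pics=2))     # -> [2, 4]
```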
Spatial location signaling for layered coding
In some use cases, different areas of a picture are coded in separate layers (e.g. the 360-degree video, i.e. immersive or spatial video, use case described above). An example is shown in Fig. 10, where 12 layers (L0 to L11) are multiplexed within a single bitstream and three different outputs are illustrated: a decoder decodes all layers; a decoder decodes the four layers L5, L6, L9 and L10; and a decoder decodes the four layers L2, L3, L5 and L6.
For such a scenario, a mapping table, i.e. mapping information, associates the layers, e.g. indicated by a value of the syntax element "layer_id", with spatial locations in an output scene. In one embodiment, a two-dimensional spatial coordinate system is created, in which each of the x- and y-coordinates is in units of luma samples, and the x- and y-positions of each layer in this coordinate system are transmitted. In another embodiment, the x- and y-coordinates are subsampled, and only the subsampled coordinates and the subsampling ratio are transmitted.
The location information could be sent as part of the Video Parameter Set (VPS) or a Supplemental Enhancement Information (SEI) message. Fig. 11 depicts a corresponding syntax diagram. In this example, the VPS indicates a syntax element "layer_spatial_location_enabling_flag" and the parameters necessary to orderly output the pictures. Additionally, a maximum size of the grid, i.e. of the output scene area, could be specified, as indicated in Fig. 12 with reference number 100.
Each separate layer might have timing information that indicates when it is to be output, and such information could be used by a post-processor to synchronize the pictures of all the decoded regions. However, typical devices are not able to synchronize those, as separate decoders might have some drift in timing. Therefore, another embodiment is a decoder that decodes all independent layers separately but generates a single output picture for several independently decoded layers, following the positions indicated in a parameter set, e.g. as shown in Figs. 11 and 12; a sketch of such a composition is given below.
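A minimal sketch of this single-output-picture composition, assuming the mapping information has already been parsed into per-layer (x, y) positions in luma samples and no subsampling is used:

```python
import numpy as np

def compose_output_picture(decoded, mapping, scene_h, scene_w):
    """Place each layer's decoded picture at its signaled (x, y) position.

    decoded: dict layer_id -> 2D array of luma samples
    mapping: dict layer_id -> (x, y) top-left position in the output scene
    """
    scene = np.zeros((scene_h, scene_w), dtype=np.uint8)
    for layer_id, pic in decoded.items():
        x, y = mapping[layer_id]
        h, w = pic.shape
        scene[y:y + h, x:x + w] = pic   # sub-videos map onto disjoint sub-regions
    return scene
```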
Special handling for specific 360 formats

In the case of a cube-map projection, each cube face can be coded in a different layer. This avoids filling up the picture area to a rectangular format and thus the coding of "empty" regions.
Fig. 13 shows an example signaling of cube-map faces to layer IDs. The map of cube-map faces to layer IDs can be transmitted to the decoder/display process as an SEI message.
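For illustration, such a face-to-layer map could be represented and validated as follows (face names and the message layout are hypothetical, not the signaled syntax):

```python
CUBE_FACES = ("front", "back", "left", "right", "top", "bottom")

def parse_cube_face_mapping(face_to_layer_id):
    """Validate an SEI-style map of cube-map faces to layer IDs."""
    missing = [f for f in CUBE_FACES if f not in face_to_layer_id]
    if missing:
        raise ValueError(f"cube map incomplete, missing faces: {missing}")
    return {face: face_to_layer_id[face] for face in CUBE_FACES}

# Six faces in six layers; no 'empty' filler regions need to be coded.
mapping = parse_cube_face_mapping(
    {"front": 0, "back": 1, "left": 2, "right": 3, "top": 4, "bottom": 5})
```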
The invention in the previous section, or in a previous invention about virtual layers, relates to multiple regions being decoded either in a single-layer mode or a multi-layer mode, respectively, which have in common that a single output picture per access unit is generated for more than one "layer_id" value. In both cases, a picture is composed of several regions that have different "layer_id" values in their respective NAL units.

Sometimes the pictures output from the decoder require some post-processing for display, e.g. frame-packed 3D content or 360-degree projected content. SEI messages are used to indicate the transformation/post-processing steps required to be performed on such output pictures in order for them to be displayed. Typically, the SEI message that is applicable to an output picture is identified by using the same "layer_id" (target layer) as the output picture.
In SHVC, the scalability extension of HEVC, there is a single output picture, so only the SEI messages corresponding to the target "layer_id" are taken into consideration. In a multiview scenario, more than one output picture might be output (e.g., one per view), and therefore the SEI message applicable to each layer is identified separately, associating each output picture of a layer with its corresponding SEI message having the same "layer_id". That is, each SEI message comprises layer association information which indicates the group of layers the respective SEI message relates to.
The invention herein described allows the association of an SEI message with an output picture containing content in different regions belonging to different "layer_id" values by means other than using the "layer_id" of the respective SEI message. In one embodiment, the information sent in an SEI message (e.g., the description of samples and the transformations to be performed on those) is layer-agnostic, and a mapping identifies that it applies to the output pictures that result from decoding and jointly outputting a certain set of layers with identified "layer_id" values (see, for example, Fig. 16).
The association could be done by indicating that a given "layer_id" for SEI messages, which is different from those "layer_id" values used for VCL NAL units, corresponds to a given set of layers when they are output together. For instance, this could be done in a parameter set such as the VPS (or in an SEI message), as shown in Fig. 14. As shown in Fig. 14, the layer association information is comprised by a parameter set, i.e. VPS 198, which indicates the association, i.e. syntax 200 in Fig. 14, for each of a plurality of SEI messages. The VPS 198 relates to a certain layer and includes a group participation flag, i.e. a syntax element "layer_group_for_single_output_enabled_flag" 204, indicating whether the respective layer is comprised by any of the groups of layers to which any SEI message relates. The association 200 includes an identifier, i.e. a syntax element "layers_id" 202, which indicates the group of layers that are output together.
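A hedged sketch of how a decoder could resolve this association (the structures loosely mirror the VPS fields described above; all names are illustrative):

```python
def applicable_sei_messages(sei_messages, associations, output_layers):
    """Find the SEI messages whose associated layer group matches the set
    of layers that are decoded and jointly output.

    sei_messages: dict sei_layer_id -> SEI payload
    associations: dict sei_layer_id -> set of VCL layer_ids output together
                  (sei_layer_id values are distinct from VCL layer_ids)
    """
    target = set(output_layers)
    return [payload
            for sei_layer_id, payload in sei_messages.items()
            if associations.get(sei_layer_id) == target]

# An SEI carried with layer_id 100 applies when layers {2, 3, 5, 6}
# (cf. the third output in Fig. 10) are output jointly.
seis = {100: "projection SEI for joint output"}
assoc = {100: {2, 3, 5, 6}}
print(applicable_sei_messages(seis, assoc, output_layers=[2, 3, 5, 6]))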
An alternative is to use a bitstream partition nesting SEI message that contains further SEI messages inside (something like an encapsulated SEI) and that indicates the output layer set idx (index) to which it applies, e.g. as shown in Fig. 16. However, the existing bitstream partition nesting SEI message does not include means to indicate that the contained SEI messages apply to the joint output into a single picture. So, in a further embodiment, such a container SEI would indicate the group of layers that are considered, either as a list of "layer_id" values (see Fig. 15) or as an output layer idx, together with a flag indicating that the SEI messages included apply to a joint output picture.
Alternatively, the following embodiment consists of taking all SEI messages with a "layer_id" equal to the layer IDs that are used for decoding and applying those. But in addition, the spatial location of each of the "layer_id" values is provided to the renderer by means of an additional "combining" SEI message, for instance, or by VUI (Video Usability Information) in a parameter set. The locations of samples signaled (or that apply) for each of the SEI messages of a given "layer_id" are then treated as deltas to the locations of the "layer_id" values in the composed picture. The layer association information may indicate whether offsets are applied or not, as shown in Fig. 17. The offsets could be indicated, for example, in a parameter set, such as the VPS, or in an SEI message, as shown in Fig. 18.
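A short sketch of this delta interpretation (illustrative names): the sample locations carried by an SEI message of a given layer are shifted by that layer's position in the composed picture:

```python
def apply_layer_offsets(sei_sample_locations, layer_positions):
    """Translate layer-local SEI sample locations into composed-picture
    coordinates by adding each layer's signaled (x, y) offset.

    sei_sample_locations: dict layer_id -> list of (x, y) within the layer
    layer_positions:      dict layer_id -> (x, y) of the layer in the
                          composed output picture
    """
    composed = {}
    for layer_id, points in sei_sample_locations.items():
        off_x, off_y = layer_positions[layer_id]
        composed[layer_id] = [(x + off_x, y + off_y) for x, y in points]
    return composed

# A location (10, 10) inside layer 3, placed at (640, 0), maps to (650, 10).
print(apply_layer_offsets({3: [(10, 10)]}, {3: (640, 0)}))
```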
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of the application can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

Generally, embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer. The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.
Abstract
Layered video data stream comprising several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: decoded picture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for jointly decoding the predetermined layer from the layered video data stream along with at least one further layer associated with the predetermined layer.
Description
Layered video data stream
The present application is concerned with layered video data stream and its coding/decod- ing.
Layered media coding technologies such as Scalable Video Coding (SVC) or Multiview Video Coding (MVC) generate video bit-streams with various media layers, each of them representing another level of quality. Due to inter-layer prediction there exist hierarchies between these media layers, where media layers depend on other media layers for suc- cessful decoding.
SVC is one of the video compression standards and standardizes the encoding of a high- quality video bitstream that also contains one or more subset bitstreams (a form of layered coding). A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal.
MVC is a stereoscopic video coding standard for video compression that allows for the efficient encoding of video sequences captured simultaneously from multiple camera angles in a single video stream. MVC is based on the idea that video recordings of the same scene from multiple angles share many common elements. It is possible to encode all simultane- ous frames captures in the same elementary stream and to share as much information as possible across the different layers. This can reduce size of the encoded video.
The object of the subject-matter of the present application is to provide a layered video data stream which comprises information how to output the video content from the layered video data stream including more than one video content.
This object is achieved by the subject-matter of the claims of the present application.
In accordance with one of embodiments of the present application, a layered video data stream comprising several layers into which different video content is encoded inde- pendently from each other, wherein the layered video data stream comprises: decoded pic- ture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for joint decoding the predetermined
layer from the layered video data stream along with at least one further layer associated with the predetermined layer. That is, in case framerate of each layer is not aligned, i.e., different framerate is mixed, the DPB management parameters indicate a relationship be- tween decoded pictures and an output picture. Therefore, it is possible to output the de- coded pictures even if the video data stream includes multi-layered coded video content and single layered coded video content.
In accordance with one of embodiments of the present application, a layered video data stream comprising several layers into which different video content is encoded inde- pendently from each other, wherein the layered video data stream comprises: access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream, and a signaling indicating whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers, e.g., group together pictures of the several layers which coincide in picture output time, or individually carry the picture of the several layers which coincide in picture output time. Therefore, it is possible to output the video content at appropriate timing.
In accordance with one of embodiments of the present application, a layered video data stream comprising several layers into which a plurality of sub-videos is encoded inde- pendently from each other, wherein the layered video data stream comprises: mapping in- formation indicating a mapping of the sub-videos onto different sub-regions of an output scene area. Therefore, it is possible to efficiently output, for example, 360 degree video by using the mapping information indicating the spatial location of the output picture.
In accordance with one of embodiments of the present application, a layered video data stream comprising several layers into which different video content is encoded, wherein the layered video data stream comprises: supplemental enhancement information, SEI, mes- sages, which relate to different groups of layers out of the several layers, and layer associ- ation information which informs on an association of each SEI message to a group of layers which same relates to. That is, for example, in case of the 3D content, a picture belongs to a region of an output content is signaled by using several layers, i.e., a group of layers containing the picture which must output together, is indicated by the SEI message, and, therefore, the multi-layered content is simply and efficiently output by using SEI message.
In accordance with one of embodiments of the present application, an encoder configured to encode different video content into layers of a layered video data stream independently
from each other, and provide the layered video data stream comprising, e.g., an indicator indicating a video data signaling condition, i.e., indicating data set belongs to a picture, and/or, belongs to a region consisting a picture. Therefore, at a decoder, which receives the encoded video data stream from the encoder, it is possible to output the layered video con- tent without increasing complexity of decoding process.
Fig. 1 shows a block diagram of an apparatus for predictively encoding a video as an example for a video encoder where a layered video data stream according to embodiments of the present application could be encoded;
Fig. 2 shows a block diagram of an apparatus for predictively decoding a video, which fits to the apparatus of Fig. 1 , as an example for a video decoder where a layered video data stream according to embodiments of the present appli- cation could be decoded;
Fig. 3 shows a schematic diagram illustrating an example for a relationship be- tween a prediction residual signal, a prediction signal and a reconstructed signal so as to illustrate possibilities of setting subdivisions for defining the prediction signal, handling the prediction residual signal and the like, respec- tively;
Fig. 4 shows a schematic illustration for a relationship between access units, AUs, and layers and its corresponding pictures according to layered video;
Fig. 5 shows a schematic illustration in case different framerate layers are included in the layered video data stream;
Fig. 6 shows a schematic diagram indicating an example for an extension of the video parameter set, VPS, according to embodiments of the present applica- tion;
Fig. 7 shows a diagram indicating a further example for an extension of the video parameter set, VPS, according to embodiments of the present application;
Fig. 8 shows a schematic illustration an example indicating a decoded picture buffer, DPB, according to embodiments of the present application;
Fig. 9 (a), (b) shows a schematic illustration an example indicating a required maximum number of the pictures to be stored in the DPB according to embodiments of the present application;
Fig. 10 shows a schematic illustration an example indicating a relationship between the layers in the layered video data stream according to embodiments of the present application;
Fig. 1 1 shows a diagram indicating an example for mapping information in form of video parameter set according to embodiments of the present application;
Fig. 12 shows a diagram indicating a further example for mapping information in form of video parameter set according to embodiments of the present application;
Fig. 13 shows a diagram indicating an example for mapping information in case an output scene is a cube map according to embodiments of the present appli- cation;
Fig. 14 shows a diagram indicating an example for supplemental enhancement in- formation, SEI, message including layer association information in form of the video parameter set according to embodiments of the present applica- tion;
Fig. 15 shows a diagram indicating a further example for SEI message including the layer association information according to embodiments of the present appli- cation;
Fig. 16 shows a schematic illustration indicating an example for SEI message includ- ing the layer association information indicating rendering parameters accord- ing to embodiments of the present application;
Fig. 17 shows a schematic illustration indicating an example for SEI message includ- ing the layer association information indicating an application of offsets ac- cording to embodiments of the present application; and
Fig. 18 shows diagram indicating an example for SEI message including the layer association information indicating an application of offsets in form of video parameter set according to embodiments of the present application.
The following description of the figures starts with a presentation of a description of video encoder and video decoder of a block-based predictive codec for coding pictures of a video in order to form an example for a coding framework into which embodiments for a layered video data stream codec may be built in. The video encoder and video decoder are de- scribed with respect to Figs 1 to 3. Thereinafter the description of embodiments of the con- cept of the layered video data stream codec of the present application are presented along with a description as to how such concepts could be built into the video encoder and de- coder of Figs. 1 and 2, respectively, although the embodiments subsequently described, may also be used to form video encoder and video decoders not operating according to the coding framework underlying the video encoder and video decoder of Figs. 1 and 2.
Fig. 1 shows a block diagram of an apparatus for predictively coding a video as an example for a video encoder where a motion compensated prediction for inter-predicted blocks according to embodiments of the present application could be implemented. That is, Fig. 1 shows an apparatus for predictively coding a video 11 composed of a sequence of pictures 12 into a data stream 14. Block-wise predictive coding is used to this end. Further, transform-based residual coding is exemplarily used. The apparatus, or encoder, is indicated using reference sign 10.
Fig. 2 shows a block diagram of an apparatus for predictively decoding a video as an example for a video decoder where a motion compensated prediction for inter-predicted blocks according to embodiments of the present application could be implemented. That is, Fig. 2 shows a corresponding decoder 20, i.e. an apparatus 20 configured to predictively decode the video 11’ composed of pictures 12’ in picture blocks from the data stream 14, also here exemplarily using transform-based residual decoding, wherein the apostrophe has been used to indicate that the pictures 12’ and video 11’, respectively, as reconstructed by decoder 20 deviate from pictures 12 originally encoded by apparatus 10 in terms of coding loss introduced by a quantization of the prediction residual signal. Fig. 1 and Fig. 2 exemplarily use transform-based prediction residual coding, although embodiments of the present application are not restricted to this kind of prediction residual coding. This is true for other details described with respect to Figs. 1 and 2, too, as will be outlined hereinafter.
The encoder 10 is configured to subject the prediction residual signal to spatial-to-spectral transformation and to encode the prediction residual signal, thus obtained, into the data stream 14. Likewise, the decoder 20 is configured to decode the prediction residual signal from the data stream 14 and subject the prediction residual signal thus obtained to spectral-to-spatial transformation.
Internally, the encoder 10 may comprise a prediction residual signal former 22 which generates a prediction residual signal 24 so as to measure a deviation of a prediction signal 26 from the original signal, i.e. video 11 or a current picture 12. The prediction residual signal former 22 may, for instance, be a subtractor which subtracts the prediction signal from the original signal, i.e. current picture 12. The encoder 10 then further comprises a transformer 28 which subjects the prediction residual signal 24 to a spatial-to-spectral transformation to obtain a spectral-domain prediction residual signal 24’ which is then subject to quantization by a quantizer 32, also comprised by encoder 10. The thus quantized prediction residual signal 24’’ is coded into data stream 14. To this end, encoder 10 may optionally comprise an entropy coder 34 which entropy codes the prediction residual signal as transformed and quantized into data stream 14. The prediction signal 26 is generated by a prediction stage 36 of encoder 10 on the basis of the prediction residual signal 24’’ encoded into, and decodable from, data stream 14. To this end, the prediction stage 36 may internally, as is shown in Fig. 1, comprise a dequantizer 38 which dequantizes prediction residual signal 24’’ so as to gain spectral-domain prediction residual signal 24’’’, which corresponds to signal 24’ except for quantization loss, followed by an inverse transformer 40 which subjects the latter prediction residual signal 24’’’ to an inverse transformation, i.e. a spectral-to-spatial transformation, to obtain prediction residual signal 24’’’’, which corresponds to the original prediction residual signal 24 except for quantization loss. A combiner 42 of the prediction stage 36 then recombines, such as by addition, the prediction signal 26 and the prediction residual signal 24’’’’ so as to obtain a reconstructed signal 46, i.e. a reconstruction of the original signal 12. Reconstructed signal 46 may correspond to signal 12’.
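For illustration only, the following non-normative Python sketch models this transform-based residual coding loop; the 8x8 block size, the uniform quantization step and the orthonormal DCT-II are illustrative assumptions, not requirements of the codec described above.

```python
# Hypothetical model of the loop of Fig. 1: residual formation (22),
# transform (28), quantization (32), and the encoder-internal decoding
# path of dequantization (38), inverse transform (40) and combination (42).
import numpy as np
from scipy.fft import dctn, idctn

def encode_and_reconstruct_block(original, prediction, qstep=8.0):
    residual = original - prediction                 # signal 24
    coeffs = dctn(residual, norm="ortho")            # signal 24'
    levels = np.round(coeffs / qstep)                # signal 24'' (entropy coded)
    dequantized = levels * qstep                     # signal 24'''
    rec_residual = idctn(dequantized, norm="ortho")  # signal 24''''
    return prediction + rec_residual                 # reconstructed signal 46

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)   # current picture block
pred = np.full((8, 8), block.mean())                 # some prediction signal 26
rec = encode_and_reconstruct_block(block, pred)
print(np.abs(rec - block).max())  # deviation bounded by the quantization loss
```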
A prediction module 44 of prediction stage 36 then generates the prediction signal 26 on the basis of signal 46 by using, for instance, spatial prediction, i.e. intra prediction, and/or temporal prediction, i.e. inter prediction. Details in this regard are described in the following.
Likewise, decoder 20 may be internally composed of components corresponding to, and interconnected in a manner corresponding to, prediction stage 36. In particular, an entropy decoder 50 of decoder 20 may entropy decode the quantized spectral-domain prediction residual signal 24’’ from the data stream, whereupon dequantizer 52, inverse transformer 54, combiner 56 and prediction module 58, interconnected and cooperating in the manner described above with respect to the modules of prediction stage 36, recover the reconstructed signal on the basis of prediction residual signal 24’’ so that, as shown in Fig. 2, the output of combiner 56 results in the reconstructed signal, namely the video 11’ or a current picture 12’ thereof.
Although not specifically described above, it is readily clear that the encoder 10 may set some coding parameters including, for instance, prediction modes, motion parameters and the like, according to some optimization scheme such as, for instance, in a manner optimizing some rate and distortion related criterion, i.e. coding cost, and/or using some rate control. As described in more detail below, encoder 10 and decoder 20 and the corresponding modules 44, 58, respectively, support different prediction modes such as intra-coding modes and inter-coding modes which form a kind of set or pool of primitive prediction modes based on which the predictions of picture blocks are composed in a manner described in more detail below. The granularity at which encoder and decoder switch between these prediction compositions may correspond to a subdivision of the pictures 12 and 12’, respectively, into blocks. Note that some of these blocks may be blocks being solely intra-coded and some blocks may be blocks solely being inter-coded and, optionally, even further blocks may be blocks obtained using both intra-coding and inter-coding; details are set out hereinafter. According to the intra-coding mode, a prediction signal for a block is obtained on the basis of a spatial, already coded/decoded neighborhood of the respective block. Several intra-coding sub-modes may exist, the selection among which, quasi, represents a kind of intra prediction parameter. There may be directional or angular intra-coding sub-modes according to which the prediction signal for the respective block is filled by extrapolating the sample values of the neighborhood along a certain direction which is specific for the respective directional intra-coding sub-mode, into the respective block. The intra-coding sub-modes may, for instance, also comprise one or more further sub-modes such as a DC coding mode, according to which the prediction signal for the respective block assigns a DC value to all samples within the respective block, and/or a planar intra-coding mode according to which the prediction signal of the respective block is approximated or determined to be a spatial distribution of sample values described by a two-dimensional linear function over the sample positions of the respective block, with tilt and offset of the plane defined by the two-dimensional linear function being derived on the basis of the neighboring samples. Compared thereto, according to the inter-prediction mode, a prediction signal for a block may be obtained, for instance, by temporally predicting the block's inner samples. For parametrization of an inter-prediction mode, motion vectors may be signaled within the data stream, the motion vectors indicating the spatial displacement of the portion of a previously coded picture of the video 11 at which the previously coded/decoded picture is sampled in order to obtain the prediction signal for the respective block. This means that, in addition to the residual signal coding comprised by data stream 14, such as the entropy-coded transform coefficient levels representing the quantized spectral-domain prediction residual signal 24’’, data stream 14 may have encoded thereinto prediction related parameters for assigning prediction modes to the blocks, prediction parameters for the assigned prediction modes, such as motion parameters for inter-prediction modes, and, optionally, further parameters which control a composition of the final prediction signal for the blocks using the assigned prediction modes and prediction parameters as will be outlined in more detail below. Additionally, the data stream may comprise parameters controlling and signaling the subdivision of pictures 12 and 12’, respectively, into the blocks. The decoder 20 uses these parameters to subdivide the picture in the same manner as the encoder did, to assign the same prediction modes and parameters to the blocks, and to perform the same prediction to result in the same prediction signal.
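As a non-normative illustration of the two primitive prediction modes, the following sketch implements a DC intra predictor and an integer-pel motion compensated predictor; real codecs add angular modes, sub-pel interpolation filters and further refinements.

```python
# Hypothetical sketches of the prediction pool described above; names and
# the restriction to integer sample positions are illustrative assumptions.
import numpy as np

def dc_intra_prediction(top_neighbors, left_neighbors, block_shape):
    # DC mode: assign the mean of the already reconstructed neighborhood
    # to all samples of the block.
    dc = np.concatenate([top_neighbors, left_neighbors]).mean()
    return np.full(block_shape, dc)

def motion_compensated_prediction(reference, x, y, mv_x, mv_y, height, width):
    # Inter mode: sample the previously decoded picture at the position
    # displaced by the signaled motion vector (mv_x, mv_y).
    return reference[y + mv_y : y + mv_y + height,
                     x + mv_x : x + mv_x + width]
```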
Fig. 3 shows a schematic diagram illustrating an example for a relationship between a prediction residual signal, a prediction signal and a reconstructed signal so as to illustrate possibilities of setting subdivisions for defining the prediction signal, handling the prediction residual signal and the like, respectively. That is, Fig. 3 illustrates the relationship between the reconstructed signal, i.e. the reconstructed picture 12’, on the one hand, and the combination of the prediction residual signal 24’’’’ as signaled in the data stream, and the prediction signal 26, on the other hand. As already denoted above, the combination may be an addition. In Fig. 3, the prediction signal 26 is illustrated as a subdivision of the picture area into blocks 80 of varying size, although this is merely an example. The subdivision may be any subdivision, such as a regular subdivision of the picture area into rows and columns of blocks, a multi-tree subdivision of picture 12 into leaf blocks of varying size, such as a quadtree subdivision or the like, or a mixture thereof where the picture area is first subdivided into rows and columns of tree-root blocks which are then further subdivided in accordance with a recursive multi-tree subdivisioning to result in blocks 80.
Output and post-process hints for layered media
Introduction TBD
1. Non-aligned framerate mixtures
Layered video coding has in the past been used for SNR, spatial and temporal scalability or combinations thereof. Temporal scalability with unequal framerates (frame rates) is known in the art, e.g. from AVC or HEVC. Typically, for such scenarios, where SNR or spatial scalability is combined with temporal scalability, every N-th access unit, where N is the framerate ratio between two layers, contains one more picture belonging to a higher layer compared to other AUs. For instance, as shown in Fig. 4, which shows an example of a layered video data stream according to the prior art, the relationship between AUs (access units) and layers, e.g., layer 0 and layer 1, and their corresponding pictures is shown for a case where the enhancement layer (layer 1) framerate is double the framerate of the base layer (layer 0).
Having pictures of more than one layer in a single AU (i.e. grouping pictures of more than one layer together as an AU) allows treating them as a single construct, i.e. providing the same timing information (e.g. cpb, coded picture buffer, removal time) when decoding both layers, adding a single Access Unit Delimiter (AUD) NAL (Network Abstraction Layer) unit for detection of when an AU begins or ends, etc. Such an approach is very helpful for the typical scenarios considered in the past.
However, there exist new scenarios where the same does not necessarily apply. When different video contents (e.g. geometry and texture of a volumetric video object) are to be encoded separately but multiplexed into a single bitstream, it could be that the unequal framerates of the contents are also unaligned, i.e. the higher framerate is not a multiple of the lower framerate, as is the case for the framerate pair of 10 fps for geometry and 29 fps for texture. An example is shown in Fig. 5 with framerates for which the pictures have the same timing only at every 8th or 10th picture, respectively.
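The following non-normative sketch, using exact rational arithmetic, shows how rarely output times coincide for such unaligned framerate pairs (the function name is illustrative):

```python
# Hypothetical check of when two layers with unaligned framerates share an
# output time slot (requires Python 3.9+ for math.lcm).
import math
from fractions import Fraction

def coincidence(fps_a, fps_b):
    ta, tb = Fraction(1, fps_a), Fraction(1, fps_b)      # frame intervals
    # lcm of two fractions: lcm(numerators) / gcd(denominators)
    period = Fraction(math.lcm(ta.numerator, tb.numerator),
                      math.gcd(ta.denominator, tb.denominator))
    return period, period * fps_a, period * fps_b        # seconds, pictures, pictures

print(coincidence(8, 10))   # period 1/2 s: every 4th and 5th picture align
print(coincidence(10, 29))  # period 1 s: only every 10th and 29th picture align
```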
For such cases in former codecs, pictures from the independently coded layers that randomly fall into the same output time slot were bundled together into an access unit. This logical bundling comes with additional constraints, e.g. that pictures need to have a common POC (Picture Order Count) value or even the same GOP (Group Of Pictures) size or a common value of the syntax “max_dec_pic_buffering” (respectively, “max_dec_pic_buffering” values that are multiples of each other so as to reflect different framerates but aligned GOPs), which altogether unnecessarily constrains the encoding of these separate video contents. Additionally, for such use cases, one layer might be consumed together with another layer, or independently, leading to a high number of permutations if the number of layers increases. Therefore, such a bundle construct (considering AUs cross-layer) might become more complex when abandoning simple scalable use cases such as the typical one depicted in Fig. 4.
Therefore, the invention is to, in certain cases, not bundle pictures from two layers into a joint access unit, regardless of the output timing, and to handle the bumping process layer-wise. In other words, irrespective of whether two pictures of the bitstream have the same output time, e.g. the same POC value or the same “dpb_removal” time in an SEI (Supplemental Enhancement Information) message, they might not necessarily be grouped as a single AU; rather, not only the boundaries of pictures with different output times are signaled, but also the boundaries of pictures with the same output time but belonging to different layers.
The first simple implication and embodiment is that AUD NAL units may for some use cases mean that the boundary of a cross-layer AU is indicated (1), and for some use cases that only a specific layer is considered to be within an AU and therefore the presence of the AUD NAL unit indicates the boundary of a picture of one layer (2).
Such a meaning of an AUD NAL unit could be signaled within a parameter set that indicates that AUs contain pictures of a group of layers per AU (1) or that an AU contains only a picture of a single layer (2). Such a signaling would be added for instance in the SPS (Sequence Parameter Set) or VPS (Video Parameter Set). In the following example, an extension of the VPS is shown in Fig. 6. As indicated in Fig. 6, a syntax “au_multi_layer_flag” indicating that the AUs contain pictures of a group of layers is included in the VPS.
The flag, when set in the parameter set, would imply the meaning of an AUD NAL unit. If the flag is set to 1, it would mean that the AU following the AUD has a different output time, e.g. a different POC value or a different “dpb_removal” time in an SEI. If set to 0, it would mean that, in addition to possibly having a different output time, the following AU could contain a picture of another layer with the same output time.
Alternatively, to avoid a parsing dependency on a parameter set, the AUD NAL unit could be extended to include a flag that indicates whether it applies to a cross-layer AU (1) or a single-layer AU (2), as shown in Fig. 7.
Thus, each AUD NAL unit would indicate whether it covers multiple layers or whether it covers only one layer and each layer has its own AUD NAL unit. In this case, the bitstream may be constrained to have the same value in all AUD NAL units, disallowing the flag being sometimes set to 1, indicating that an AU is considered as having multiple layers, and sometimes set to 0, indicating a per-layer AU.
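A non-normative sketch of the two AUD semantics discussed above (the flag and the picture representation are illustrative assumptions):

```python
# Hypothetical grouping of coded pictures into AUs. With the flag set,
# an AU bundles all co-temporal pictures across layers (cross-layer AU);
# with the flag unset, every layer forms its own AUs, so consecutive AUs
# may share an output time while belonging to different layers.
def group_into_aus(pictures, au_multi_layer_flag):
    # pictures: list of (output_time, layer_id) tuples in decoding order
    aus = {}
    for time, layer_id in pictures:
        key = time if au_multi_layer_flag else (time, layer_id)
        aus.setdefault(key, []).append((time, layer_id))
    return list(aus.values())

pics = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
print(len(group_into_aus(pics, True)))   # 3 cross-layer AUs
print(len(group_into_aus(pics, False)))  # 5 single-layer AUs
```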
A further related aspect applies to the syntax “max_dec_pic_buffering_delay”, i.e., decoded picture buffer, DPB, information which is related to the selected GOP (group of pictures) size. Fig. 8 shows a relationship between decoded pictures, a DPB (decoded picture buffer) and an output picture. In the case that the framerates are completely different and, therefore, the different layers are used only for multiplexing, i.e. there is no joint decoding considered or AUs only refer to pictures of a single layer, completely different and non-aligned GOP sizes could be chosen.
The maximum number of pictures within the DPB for a layer can be easily computed. However, it may apply that a couple of layers should be output simultaneously when consumed together. Obviously, if synchronized output is desired, delaying the output of a picture of one layer to be able to synchronize it with a corresponding picture in another layer may be required, thus increasing the number of pictures in the DPB of that layer. On the other hand, if layers are to be consumed independently, e.g. only one of them, a smaller value of the syntax “max_dec_pic_buffering_delay” would suffice, i.e. that of the respective layer alone should be taken into consideration.
The syntax “max_dec_pic_buffering” actually refers to the maximum number of pictures that may need to be stored in the DPB. In another embodiment, more than one value is signaled for a given layer, e.g. one for the case that the layer is consumed individually and another value for a given combination of layers. Alternatively, an inheritance value would be indicated for the case that more than one layer is consumed together and they need to be synchronized, e.g. taking the value of the other layer and multiplying it by the ratio of their corresponding framerates or GOP sizes. An example is explained as follows, where framerates of 8 fps and 10 fps are considered. Fig. 9 (a) shows an example of a maximum number of pictures to be stored in the DPB for a case of 8 frames per second, and Fig. 9 (b) shows an example of a maximum number of pictures to be stored in the DPB for a case in which framerates of 8 fps and 10 fps are mixed. Considering the situation shown in Fig. 9, with 4 frames in the DPB (for 8 fps, as shown in Fig. 9 (a)), when consumed together with another layer of 10 fps the layer would require 5 frames in the DPB (as shown in Fig. 9 (b)).
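A non-normative sketch of this inheritance derivation (the ceiling of the scaled value is an illustrative assumption):

```python
# Hypothetical derivation of the joint-output DPB size of a layer from its
# own max_dec_pic_buffering value via the framerate ratio; a GOP-size
# ratio could be used the same way.
import math

def inherited_max_dec_pic_buffering(own_value, own_fps, other_fps):
    return math.ceil(own_value * other_fps / own_fps)

# 4 pictures at 8 fps -> 5 pictures when output jointly with a 10 fps layer,
# matching the example of Fig. 9 (a) and (b).
print(inherited_max_dec_pic_buffering(4, 8, 10))  # 5
```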
The maximum number of pictures in the DPB and the bumping process (output of pictures) are closely related. The syntax “sps_max_num_reorder_pics” is a value between 0 and the syntax “max_dec_pic_buffering” that steers the bumping process, i.e. the output of the pictures in the DPB.
According to clause C.5.2.4 (Recommendation ITU-T H.265), it is defined that the “bumping process” consists of the following ordered steps:
1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as “needed for output”.
2. The picture is cropped, using the conformance cropping window specified in the active SPS for the picture, the cropped picture is output, and the picture is marked as “not needed for output”.
3. When the picture storage buffer that included the picture that was cropped and output contains a picture marked as “unused for reference”, the picture storage buffer is emptied.
The syntax “sps_max_num_reorder_pics” controls the number of pictures in the DPB that are required before the bumping process starts, i.e., from HEVC: When one or more of the following conditions are true, the “bumping” process specified in clause C.5.2.4 is invoked repeatedly until none of the following conditions are true: - The number of pictures in the DPB that are marked as “needed for output” is greater than the value of “sps_max_num_reorder_pics[HighestTid]”.
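A non-normative sketch of this layer-wise invocation of the bumping process along the lines of the HEVC clauses quoted above (the data representation is an illustrative assumption):

```python
# Hypothetical bumping loop: while more pictures are marked "needed for
# output" than sps_max_num_reorder_pics allows, output the picture with
# the smallest POC and empty its storage buffer if it is unused for
# reference (cropping is omitted for brevity).
def bump(dpb, sps_max_num_reorder_pics):
    # dpb: list of dicts with keys "poc", "needed_for_output", "used_for_ref"
    output_order = []
    while sum(p["needed_for_output"] for p in dpb) > sps_max_num_reorder_pics:
        pic = min((p for p in dpb if p["needed_for_output"]),
                  key=lambda p: p["poc"])
        output_order.append(pic["poc"])
        pic["needed_for_output"] = False
        if not pic["used_for_ref"]:
            dpb.remove(pic)
    return output_order
```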
Therefore, the values of the syntax “sps_max_num_reorder_pics” for different layers might be different if e.g. the GOP structure is chosen differently. But if both are output simultaneously, it would be desirable to achieve some kind of synchronization (where possible given the different framerates). Therefore, in another embodiment, similarly as discussed above for the maximum number of pictures in the DPB, several syntaxes “sps_max_num_reorder_pics”, corresponding to when a layer is individually output and when a combination of layers is output, are indicated in the parameter sets, e.g. SPS or VPS. Alternatively, it is indicated that a derivation is possible where, for instance, the values indicated are multiplied by the ratio of GOP sizes or framerates.
2. Spatial location signaling for layered coding
When different areas of a picture are coded in separate layers (e.g. the 360 degree video use case described above, i.e. an immersive or spatial video), it is necessary to indicate to a decoder or post-decoding process whether and how these areas relate to each other spatially.
An example is shown in Fig. 10 where 12 layers (L0 to L11) are multiplexed within a single bitstream and 3 different outputs are illustrated: the decoder decodes all layers; the decoder decodes the four layers L5, L6, L9 and L10; and the decoder decodes the four layers L2, L3, L5 and L6.
For this purpose, a mapping table (i.e., mapping information) for layers (e.g. indicated by the value of a syntax “layer_id”) to a spatial location is transmitted in the bitstream. A two-dimensional spatial coordinate system is created, in which the x- and y-coordinates are in units of luma samples. For each of the coded layers, x- and y-positions in this coordinate system are transmitted.
Optionally, the x- and y-coordinates are subsampled, and only the subsampled coordinates and the subsampling ratio are transmitted. In the VVC (Versatile Video Coding) context, the location information could be sent as part of the Video Parameter Set (VPS) or a Supplemental Enhancement Information (SEI) message.
An example implementation is shown in Fig. 11, which depicts a syntax diagram. As indicated in Fig. 11, the VPS indicates a syntax “layer_spatial_location_enabling_flag” and the necessary parameters to orderly output the pictures.
Additionally, a maximum size of the grid, i.e., an output scene area, could be specified, as indicated with reference number 100 in Fig. 12.
Although the separate layers might have timing information that indicates when they are to be output, and such information could be used by a post-processor to synchronize all the pictures of all the decoded regions, typical devices are not able to synchronize those, as separate decoders might have some drift in timing. Therefore, another embodiment is a decoder that decodes all independent layers separately but generates a single output picture for several independently decoded layers following the positions indicated in a parameter set, e.g. as shown in Figs. 11 and 12.
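A non-normative sketch of such a composing decoder output stage, assuming per-layer positions in luma samples as conveyed by the VPS of Figs. 11 and 12 (the names and the single-component picture are illustrative):

```python
# Hypothetical composition of one output picture from independently
# decoded layers placed at their signaled spatial locations.
import numpy as np

def compose_output_picture(decoded, positions, out_width, out_height):
    # decoded:   {layer_id: 2-D sample array of the decoded layer picture}
    # positions: {layer_id: (x, y)} top-left positions in the output scene
    out = np.zeros((out_height, out_width), dtype=np.uint8)
    for layer_id, picture in decoded.items():
        x, y = positions[layer_id]
        h, w = picture.shape
        out[y:y + h, x:x + w] = picture
    return out
```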
Special handling for specific 360 formats
Cube map projection indication
In 360 degree video using cube map projection, each cube face can be coded in a different layer. This avoids filling up the picture area to a rectangular format and thus coding of “empty” regions.
The display process needs to know the mapping of faces to the different layers. Fig. 13 shows an example signaling of cube map faces to layer IDs. The map of cube map faces to layer IDs can be transmitted to the decoder/display process as an SEI message.
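A non-normative sketch of such a face-to-layer mapping as it might be conveyed in the SEI message of Fig. 13 (the face names and the layer assignment are illustrative):

```python
# Hypothetical cube map face to layer ID mapping and a helper telling a
# renderer which layers to request for a desired set of faces.
CUBE_FACE_TO_LAYER_ID = {
    "front": 0, "back": 1, "left": 2, "right": 3, "top": 4, "bottom": 5,
}

def layers_for_faces(faces):
    return sorted(CUBE_FACE_TO_LAYER_ID[face] for face in faces)

print(layers_for_faces(["front", "right", "top"]))  # [0, 3, 4]
```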
3. Applicability of SEI to the composed output picture
The invention in the previous section or in a previous invention about virtual layers relates to multiple regions being decoded either in a single-layer mode or a multi-layer mode, respectively, which have in common that a single output picture per access unit is generated for more than one “layer_id” value. In both cases, a picture is composed of several regions that have different “layer_id” values in their respective NAL units.
In some applications, the pictures output from the decoder require some post-processing for display, e.g. frame-packed 3D content or 360° projected content. In such cases, typically, SEI messages are used to indicate the transformation/post-processing steps required to be performed on such output pictures in order for them to be displayed.
In the multi-layer case, the SEI message that is applicable to the output picture (target layer) is identified by using the same “layer_id” (target layer) as the output picture. In SHVC (the scalability extension of HEVC), there is a single output picture, so only the SEI messages corresponding to the target “layer_id” are taken into consideration. In the case of the 3D extensions, more than one picture might be output (e.g., one per view), and, therefore, the SEI message applicable to each layer is identified separately, associating each output picture of a layer with its corresponding SEI message with the same “layer_id”. That is, each SEI message comprises layer association information which indicates a group of layers the respective SEI message relates to.
Since in the discussed use case not only a single “layer_id” is used to identify the decoded picture but a set of “layer_ids”, without any implicit or explicit hierarchy, determining the post-processing information that needs to be used cannot be done simply by using a “layer_id” value.
The invention herein described allows association of an SEI message with an output picture containing content in different regions belonging to different “layer_id” values by other means than using the “layer_id” of the respective SEI message.
In a first embodiment, the information sent in an SEI message (e.g., a description of samples and transformations to be performed on those) is layer agnostic. However, such an SEI requires a mapping to identify that it applies to the output pictures that result from decoding and jointly outputting a certain set of layers with identified “layer_id” values (see, for example, Fig. 16).
The association could be done by indicating that a given “layer_id” for SEI messages, which is different from those “layer_id” values used for VCL NAL units, corresponds to a given set of layers when they are output together. For instance, this could be done in a parameter set such as the VPS (or in an SEI) as shown in Fig. 14. As shown in Fig. 14, the layer association information is comprised by a parameter set, i.e., VPS 198, which indicates the association, i.e., syntax 200 indicated in Fig. 14, for each of a plurality of SEI messages. The VPS 198 relates to a certain layer and includes a group participation flag, i.e., a syntax “layer_group_for_single_output_enabled_flag” 204, indicating whether the respective layer is comprised by any of the groups of layers which any SEI message relates to. The association 200 includes an identifier, i.e., a syntax “layers_id” 202, which indicates the group of layers which are output together.
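A non-normative sketch of resolving such an association (the reserved SEI layer IDs and the group table are illustrative assumptions):

```python
# Hypothetical lookup: a reserved layers_id, distinct from the layer_id
# values of the VCL NAL units, stands for a set of layers output together;
# an SEI carrying that layers_id applies to the jointly composed picture.
def sei_applies_to_output(sei_layers_id, layer_groups, decoded_layer_ids):
    # layer_groups: {layers_id: set of VCL layer_id values output together}
    group = layer_groups.get(sei_layers_id)
    return group is not None and group == set(decoded_layer_ids)

layer_groups = {100: {5, 6, 9, 10}, 101: {2, 3, 5, 6}}  # e.g. from the VPS
print(sei_applies_to_output(100, layer_groups, [5, 6, 9, 10]))  # True
print(sei_applies_to_output(101, layer_groups, [5, 6, 9, 10]))  # False
```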
Another option would be to use a construct similar to a bitstream partition nesting SEI message that contains further SEI messages inside (something like an encapsulated SEI) and that indicates the output layer set idx (index) to which it applies, e.g. as shown in Fig. 16. Currently, the existing bitstream partition nesting SEI message does not include means to indicate that the contained SEI messages apply to the joint output into a single picture, so in a further embodiment such a container SEI would indicate the group of layers that are considered, either as a list of “layer_ids” (see Fig. 15) or an output layer idx, and a flag indicating that the SEI messages included apply to a joint output picture.
Alternatively, the following embodiment consists of taking all SEI messages with a “layer_id” equal to the layer ids that are used for decoding and applying those. But in addition, the spatial location of each of the “layer_ids” is provided to the renderer by means of an additional “combining” SEI message, for instance, or VUI (Video Usability Information) in a parameter set. And the locations of samples signaled (or that apply) for each of the SEI messages of a given “layer_id” are treated as deltas relative to the locations of the “layer_ids” in the composed picture.
In addition, the layer association information may indicate whether offsets are applied or not, as shown in Fig. 17. The offsets could be indicated, for example, in a parameter set, such as in the VPS, or in an SEI as shown in Fig. 18.
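A non-normative sketch of applying such offsets (the coordinate representation is an illustrative assumption):

```python
# Hypothetical translation of sample locations signaled in a per-layer SEI
# message into coordinates of the composed picture: the locations are
# treated as deltas relative to the layer's signaled position.
def to_composed_coordinates(sei_sample_locations, layer_offset, offsets_enabled):
    ox, oy = layer_offset if offsets_enabled else (0, 0)
    return [(x + ox, y + oy) for (x, y) in sei_sample_locations]

# e.g. a layer placed at (1280, 0) in the composed picture:
print(to_composed_coordinates([(0, 0), (64, 32)], (1280, 0), True))
```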
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the application can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present application can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer. The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.
Claims
1. Layered video data stream comprising several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: decoded picture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for jointly decoding the predetermined layer from the layered video data stream along with at least one further layer associated with the predetermined layer.
2. Layered video data stream according to claim 1, wherein the DPB information includes the first DPB management parameters for each layer of the several layers.
3. Layered video data stream according to claim 1 or 2, wherein the first DPB management parameters indicate a maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or a minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer.
4. Layered video data stream according to any one of claims 1 to 3, wherein the DPB information includes one instantiation of the second DPB management parameters for each of a plurality of combinations of the predetermined layer with one or more further layers of the several layers.
5. Layered video data stream according to any one of the preceding claims, wherein the DPB information includes the second DPB management parameters for each layer of the several layers and, for each layer, the DPB information includes one instantiation of the second DPB management parameters for the respective layer for each of a plurality of combinations of the respective layer with one or more further layers of the several layers.
6. Layered video data stream according to any one of the preceding claims, wherein the second DPB management parameters indicate an inheritance value using which a maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or a minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer is derivable from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream.
7. Layered video data stream according to claim 6, wherein the inheritance value indicates a frame rate or a group of pictures, GOP, size of the predetermined layer relating to a reduced data stream derived from the layered video data stream by removing all layers except the predetermined layer and the at least one further layer associated with the predetermined layer for a joint decoding thereof, and the layered video data stream further indicates the frame rate or GOP size for each of the at least one further layer associated with the predetermined layer, so that the maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or the minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer is derivable from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream using a ratio between the frame rates or GOP sizes of the predetermined layer and the at least one layer associated with the predetermined layer.
8. Layered video data stream according to any one of the preceding claims, wherein the layered video data stream comprises access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream.
9. Layered video data stream according to claim 8, wherein each AU delimiter indicates whether an AU following the AU delimiter contains a picture of one layer exclusively, or co-temporal pictures of the several layers.
10. Layered video data stream according to claim 8, wherein each AU delimiter indicates, depending on information indicated by a parameter set, whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers.
11. Layered video data stream comprising several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream, and a signaling indicating whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers.
12. Layered video data stream according to claim 11, wherein each AU delimiter indicates whether an AU following the AU delimiter contains a picture of one layer exclusively, or co-temporal pictures of the several layers.
13. Layered video data stream according to claim 11 , wherein the signaling is comprised by a parameter set of the layered video data stream which is valid for several AUs of the layered video data stream.
14. Layered video data stream comprising several layers into which a plurality of sub-videos is encoded independently from each other, wherein the layered video data stream comprises: mapping information indicating a mapping of the sub-videos onto different sub-regions of an output scene area.
15. Layered video data stream according to claim 14, wherein the mapping information indicates, for each of different subsets of sub-regions, a mapping of the sub-regions of the respective subset onto different sub-regions of the output scene area.
16. Layered video data stream according to claim 14 or 15, wherein the output scene area is a cube map of a cube map projection, and the sub-regions are faces of the cube map.
17. Layered video data stream according to claim 14 or 15, wherein the output scene area is an output picture area subdivided into the sub-regions.
18. Layered video data stream according to claim 17, wherein the mapping information indicates a width and height (100) of the output scene area.
19. Layered video data stream comprising several layers into which different video content is encoded, wherein the layered video data stream comprises: supplemental enhancement information, SEI, messages, which relate to different groups of layers out of the several layers, and layer association information which informs on an association of each SEI message to a group of layers which same relates to.
20. Layered video data stream according to claim 19, wherein each SEI message comprises a layer identification table which identifies each layer of the group of layers the respective SEI message relates to.
21. Layered video data stream according to claim 19, wherein the layer association information is comprised by a parameter set (198) of the layered video stream which indicates the association (200) for each of a plurality of SEI messages.
22. Layered video data stream according to claim 21, wherein the parameter set (198) of the layered video data stream indicates the association (200) for each of a plurality of SEI messages by, for each layer of the several layers, a group participation flag (204) indicating whether the respective layer is comprised by any of the groups of layers which any of the SEI messages relates to, and, if so, at least one layer association table which indicates a group ID of a group of layers by which the respective layer is comprised, and an identifier (202) for each layer of said group of layers which indicates the respective layer, wherein each SEI message of the plurality of SEI messages indicates the group ID of the group of layers which the respective SEI message relates to.
23. Layered video data stream according to claim 19, further comprising single layer SEI messages each relating to a single layer out of the several layers.
24. Layered video data stream according to claim 23, wherein each single layer SEI message comprises an indication whether the layer which the respective single layer SEI message relates to is comprised by any of the groups of layers which any of the SEI messages relates to.
25. Layered video data stream according to any one of claims 19 to 24, wherein the layer association information further indicates rendering parameters of the group of layers the respective SEI message is related to.
26. Layered video data stream according to any one of claims 19 to 25, wherein the different video content is encoded independently from each other into the several layers.
27. Layered video data stream according to any one of claims 19 to 26, further comprising mapping information indicating, for each group of layers, a mapping of the sub-videos independently encoded into the layers of the respective group of layers onto different sub-regions of an output scene area.
28. Video encoder configured to encode different video content into layers of a layered video data stream independently from each other, and provide the layered video data stream comprising decoded picture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for jointly decoding the predetermined layer from the layered video data stream along with at least one further layer associated with the predetermined layer.
29. Video encoder according to claim 28, wherein the DPB information includes the first DPB management parameters for each layer of the several layers.
30. Video encoder according to claim 28 or 29, wherein the first DPB management parameters indicate a maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or a minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer.
31. Video encoder according to any one of claims 28 to 30, wherein the DPB information includes one instantiation of the second DPB management parameters for each of
a plurality of combinations of the predetermined layer with one or more further layers of the several layers.
32. Video encoder according to any one of claims 28 to 31, wherein the DPB information includes the second DPB management parameters for each layer of the several layers and, for each layer, the DPB information includes one instantiation of the second DPB management parameters for the respective layer for each of a plurality of combinations of the respective layer with one or more further layers of the several layers.
33. Video encoder according to any one of claims 28 to 32, wherein the second DPB management parameters indicate an inheritance value using which a maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or a minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer is derivable from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream.
34. Video encoder according to claim 33, wherein the inheritance value indicates a frame rate or a group of pictures, GOP, size of the predetermined layer relating to a reduced data stream derived from the layered video data stream by removing all layers except the predetermined layer and the at least one further layer associated with the predetermined layer for a joint decoding thereof, and the layered video data stream further indicates the frame rate or GOP size for each of the at least one further layer associated with the predetermined layer, so that the maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or the minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer is derivable from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream using a ratio between the frame rates or GOP sizes of the predetermined layer and the at least one layer associated with the predetermined layer.
35. Video encoder according to any one of claims 28 to 34, wherein the layered video data stream comprises access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream.
36. Video encoder according to claim 35, wherein each AU delimiter indicates whether an AU following the AU delimiter contains a picture of one layer exclusively, or co-temporal pictures of the several layers.
37. Video encoder according to claim 35, wherein each AU delimiter indicates, depending on information indicated by a parameter set, whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers.
38. Video encoder configured to encode different video content into layers of a layered video data stream independently from each other, and provide the layered video data stream comprising access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream, and a signaling indicating whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers.
39. Video encoder according to claim 38, wherein each AU delimiter indicates whether an AU following the AU delimiter contains a picture of one layer exclusively, or co-temporal pictures of the several layers.
40. Video encoder according to claim 38, wherein the signaling is comprised by a parameter set of the layered video stream which is valid for several AUs of the layered video data stream.
41. Video encoder configured to encode a plurality of sub-videos into layers of a layered video data stream independently from each other, and
provide the layered video data stream comprising mapping information indicating a mapping of the sub-videos onto different sub-regions of an output scene area.
42. Video encoder according to claim 41, wherein the mapping information indicates, for each of different subsets of sub-regions, a mapping of the sub-regions of the respective subset onto different sub-regions of the output scene area.
43. Video encoder according to claim 41 or 42, wherein the output scene area is a cube map of a cube map projection, and the sub-regions are faces of the cube map.
44. Video encoder according to claim 41 or 42, wherein the output scene area is an output picture area subdivided into the sub-regions.
45. Video encoder according to claim 44, wherein the mapping information indicates a width and height (100) of the output scene area.
46. Video encoder configured to encode different video content into layers of a layered video data stream, and provide the layered video data stream comprising supplemental enhancement information, SEI, messages, which relate to different groups of layers out of the several layers, and layer association information which informs on an association of each SEI message to a group of layers which same relates to.
47. Video encoder according to claim 46, wherein each SEI message comprises a layer identification table which identifies each layer of the group of layers the respective SEI message relates to.
48. Video encoder according to claim 46, wherein the layer association information is comprised by a parameter set (198) of the layered video stream which indicates the association (200) for each of a plurality of SEI messages.
49. Video encoder according to claim 48, wherein the parameter set (198) of the layered video data stream indicates the association (200) for each of a plurality of SEI messages by, for each layer of the several layers, a group participation flag (204) indicating whether the respective layer is comprised by any of the groups of layers which any of the SEI messages relates to, and, if so, at least one layer association table which indicates a group ID of a group of layers by which the respective layer is comprised, and an identifier (202) for each layer of said group of layers which indicates the respective layer, wherein each SEI message of the plurality of SEI messages indicates the group ID of the group of layers which the respective SEI message relates to.
50. Video encoder according to claim 46, wherein the layered video data stream further comprises single layer SEI messages each relating to a single layer out of the several layers.
51. Video encoder according to claim 50, wherein each single layer SEI message comprises an indication whether the layer which the respective single layer SEI message relates to is comprised by any of the groups of layers which any of the SEI messages relates to.
52. Video encoder according to any one of claims 46 to 51, wherein the layer association information further indicates rendering parameters of the group of layers the respective SEI message is related to.
53. Video encoder according to any one of claims 46 to 52, wherein the different video content is encoded independently from each other into the several layers.
54. Video encoder according to any one of claims 46 to 53, wherein the layered video data stream further comprises mapping information indicating, for each group of layers, a mapping of the sub-videos independently encoded into the layers of the respective group of layers onto different sub-regions of an output scene area.
55. Video decoder configured to decode the layered video data stream according to any one of claims 1 to 10, wherein the decoder is configured to decode the different video content based on the DPB information.
56. Video decoder according to claim 55, wherein the video decoder is configured to store the layers of the layered video data stream independently from each other in separate cells of a DPB.
57. Video decoder according to claim 55, wherein the second DPB management parameters indicate an inheritance value, wherein the video decoder is configured to derive, using the inheritance value, a maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or a minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream.
58. Video decoder according to claim 57, wherein the inheritance value indicates a frame rate or a group of pictures, GOP, size of the predetermined layer relating to a reduced data stream derived from the layered video data stream by removing all layers except the predetermined layer and the at least one further layer associated with the predetermined layer for a joint decoding thereof, and the layered video data stream further indicates the frame rate or GOP size for each of the at least one further layer associated with the predetermined layer, wherein the video decoder is configured to derive, using a ratio between the frame rates or GOP sizes of the predetermined layer and the at least one layer associated with the predetermined layer, the maximum number of pictures to be stored in a decoded picture buffer, DPB, and/or the minimum number of required pictures before commencing outputting the video content decoded from the predetermined layer and the at least one further layer associated with the predetermined layer from the first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream.
59. Video decoder configured to decode the layered video data stream according to any one of claims 11 to 13, wherein the decoder is configured to decode the different video content based on the AU delimiters and the signaling.
60. Video decoder configured to decode the layered video data stream according to any one of claims 14 to 18, wherein the decoder is configured to decode the different video content based on the mapping information.
61. Video decoder according to claim 60, wherein the layered video data stream further comprises timing information indicating, for each layer, an output timing of the sub-video which is encoded into the respective layer,
wherein the decoder is configured to output the decoded video content based on the timing information.
62. Video decoder configured to decode the layered video data stream according to any one of claims 19 to 27, wherein the decoder is configured to decode the different video content based on the SEI message and the layer association information.
63. Video decoder according to claim 62, wherein the layered video data stream further comprises mapping information indicating, for each group of layers, a mapping of the sub-video independently encoded into the layers of the respective group of layers onto different sub-regions of an output scene area, wherein the video decoder is further configured to output the output scene area based on the mapping information.
64. Video decoder according to claim 63, further configured to identify the SEI message to be applied based on the mapping information.
65. A method for providing a layered video data stream comprising several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: decoded picture buffer, DPB, information indicating, for a predetermined layer, first DPB management parameters for isolatedly decoding the predetermined layer from the layered video data stream, and second DPB management parameters for jointly decoding the predetermined layer from the layered video data stream along with at least one further layer associated with the predetermined layer.
66. A method for providing a layered video data stream comprising several layers into which different video content is encoded independently from each other, wherein the layered video data stream comprises: access unit, AU, delimiters indicative of, and positioned at borders between, consecutive AUs of the layered video data stream, and a signaling indicating whether the AUs between the AU delimiters contain a picture of one layer exclusively, or co-temporal pictures of the several layers.
67. A method for providing a layered video data stream comprising several layers into which a plurality of sub-videos is encoded independently from each other, wherein the layered video data stream comprises: mapping information indicating a mapping of the sub-videos onto different sub-regions of an output scene area.
68. A method for providing a layered video data stream comprising several layers into which different video content is encoded, wherein the layered video data stream comprises: supplemental enhancement information, SEI, messages, which relate to different groups of layers out of the several layers, and layer association information which informs on an association of each SEI message to a group of layers which same relates to.
69. A computer program having a program code for performing, when running on a computer, a method according to any one of claims 65 to 68.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19180392.3 | 2019-06-14 | ||
EP19180392 | 2019-06-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020249530A1 true WO2020249530A1 (en) | 2020-12-17 |
Family
ID=66951790
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2020/065887 WO2020249530A1 (en) | 2019-06-14 | 2020-06-08 | Layered video data stream |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020249530A1 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150016545A1 (en) * | 2013-07-15 | 2015-01-15 | Qualcomm Incorporated | Decoded picture buffer operations for video coding |
US20150103884A1 (en) * | 2013-10-10 | 2015-04-16 | Qualcomm Incorporated | Signaling for sub-decoded picture buffer (sub-dpb) based dpb operations in video coding |
Non-Patent Citations (3)
Title |
---|
CHEN YING ET AL: "Overview of the MVC+D 3D video coding standard", JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, ACADEMIC PRESS, INC, US, vol. 25, no. 4, 4 April 2013 (2013-04-04), pages 679 - 688, XP028668087, ISSN: 1047-3203, DOI: 10.1016/J.JVCIR.2013.03.013 * |
CHOI (SAMSUNG) B ET AL: "MV-HEVC/SHVC HLS: Signalling decoded picture buffer size", no. JCTVC-O0136, 21 October 2013 (2013-10-21), XP030238465, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jct/doc_end_user/documents/15_Geneva/wg11/JCTVC-O0136-v2.zip JCTVC-O0136_JCT3V-F0049.ppt> [retrieved on 20131021] * |
CHOI B ET AL: "MV-HEVC/SHVC HLS: Signalling decoded picture buffer size", 15. JCT-VC MEETING; 23-10-2013 - 1-11-2013; GENEVA; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16); URL: HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/, no. JCTVC-O0136-v2, 21 October 2013 (2013-10-21), XP030115147 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230037902A1 (en) | Video data stream, video encoder, apparatus and methods for hrd timing fixes, and further additions for scalable and mergeable bitstreams | |
KR101930817B1 (en) | Low delay concept in multi-layered video coding | |
KR101881239B1 (en) | Level definitions for multi-layer video codecs | |
US20170006309A1 (en) | Constrained depth intra mode coding for 3d video coding | |
WO2013016610A1 (en) | Multiview video coding | |
EP2737700A1 (en) | Multiview video coding | |
KR20210095958A (en) | Improved flexible tiling in video coding | |
AU2020295272A1 (en) | Image decoding method for deriving prediction sample on basis of default merge mode, and device therefor | |
CN115866259A (en) | Video coding stream extraction using identifier indication | |
US12063381B2 (en) | Video data stream, video encoder, apparatus and methods for a hypothetical reference decoder and for output layer sets | |
Zare et al. | HEVC-compliant viewport-adaptive streaming of stereoscopic panoramic video | |
Santamaria et al. | Coding of volumetric content with MIV using VVC subpictures | |
US11909957B2 (en) | Encoder and decoder, encoding method and decoding method for reference picture resampling extensions | |
WO2020249530A1 (en) | Layered video data stream | |
RU2827654C1 (en) | Signalling id of sub-images when encoding video based on sub-images | |
CN116210223B (en) | Media file processing method and device | |
US20240357148A1 (en) | Video data stream, video encoder, apparatus and methods for a hypothetical reference decoder and for output layer sets | |
Zare | Analysis and Comparison of Modern Video Compression Standards for Random-access Light-field Compression | |
EP4409884A1 (en) | Histogram of gradient generation | |
CN116210225A (en) | Method and equipment for generating media file | |
CN117203968A (en) | Image encoding/decoding method and apparatus based on SEI message including layer identifier information and method of transmitting bitstream | |
CN116210223A (en) | Media file processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20730292; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 20730292; Country of ref document: EP; Kind code of ref document: A1 |