GB2593897A - Method, device, and computer program for improving random picture access in video streaming - Google Patents


Publication number
GB2593897A
GB2593897A (application GB2005075.3A / GB202005075A)
Authority
GB
United Kingdom
Prior art keywords
nal unit
picture
sample
nal
random access
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2005075.3A
Other versions
GB2593897B (en)
GB202005075D0 (en)
Inventor
Denoual Franck
Ouedraogo Naël
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Application filed by Canon Inc
Priority to GB2005075.3A (GB2593897B)
Publication of GB202005075D0
Priority to GB2009169.0A (GB2595740A)
Priority to US17/917,168 (US20230164371A1)
Priority to EP21717402.8A (EP4133743A1)
Priority to PCT/EP2021/058977 (WO2021204827A1)
Publication of GB2593897A
Application granted
Publication of GB2593897B
Legal status: Active

Classifications

    • H — ELECTRICITY; H04 — ELECTRIC COMMUNICATION TECHNIQUE; H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/85406 — Content authoring involving a specific file format, e.g. MP4 format
    • H04N 21/845 — Structuring of content, e.g. decomposing content into time segments
    • H04N 21/2343 — Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 19/188 — Adaptive coding characterised by the coding unit, the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • H04N 21/234327 — Reformatting operations by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N 21/23439 — Reformatting operations for generating different versions
    • H04N 21/235 — Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/435 — Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/440227 — Client-side reformatting operations by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H04N 21/44029 — Client-side reformatting operations for generating different versions
    • H04N 21/8456 — Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/854 — Content authoring

Abstract

Encapsulating a video bit-stream (especially a Versatile Video Coding, VVC, bit-stream) in a file container, the method comprising: obtaining at least one network abstraction layer unit (NAL unit) of a first picture, buffering the at least one obtained NAL unit, obtaining at least one NAL unit of a second picture, the second picture being a random access picture following the first picture, and encapsulating the at least one obtained NAL unit of the second picture and the at least one buffered NAL unit in an encapsulation structure making it possible to generate a video bit-stream without processing the encapsulated at least one buffered NAL unit when processing the encapsulated video bit-stream. Preferably, the first NAL unit is a non-VCL parameter set, such as an adaptation parameter set (APS), that is referred to by the random access IRAP NAL unit. Preferably, the parameters may be stored in media data, or in metadata, where a SyncSampleBox may be modified or a new box defined to carry the parameter set data. A second inventive concept describes bit-stream retrieval from a file container by filtering out additional NAL units associated with an IRAP picture if the picture is not being used for random access.

Description

METHOD, DEVICE, AND COMPUTER PROGRAM FOR IMPROVING RANDOM PICTURE ACCESS IN VIDEO STREAMING
FIELD OF THE INVENTION
The present invention relates to a method, a device, and a computer program for improving random picture access in video streams, for example for improving random access in a media file carrying versatile video codec (VVC) bit-streams.
BACKGROUND OF THE INVENTION
The joint video exploration team (JVET) group is standardizing a video codec called versatile video codec (VVC) offering better compression performance than previous codecs, in particular the codecs known as HEVC (high efficiency video coding) and AVC (advanced video coding). This compression gain is obtained thanks to new compression or filtering tools. One of these tools is the adaptive loop filter, minimizing the mean square error between original pixels and decoded pixels using adaptive filter coefficients.
The VVC has a block-based hybrid coding architecture, combining inter-picture and intra-picture prediction and transform coding with entropy coding. In broad outline, the data to be transmitted are coded according to VCL (video coding layer) NAL units and according to non-VCL NAL units, wherein a network abstraction layer unit (NAL unit or NALU) is a syntax structure containing data and an indication of the type of these data. For example, the non-VCL NALUs may consist of parameter set or adaptation parameter set NAL units while VCL NALUs may be coded slice NAL units. Some non-VCL NALUs may also consist of SEI messages (Supplemental Enhancement Information) that may assist processes related to decoding, display or other purposes.
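For illustration, the VCL / non-VCL split (and the APS and IRAP cases discussed below) can be recovered from the two-byte VVC NAL unit header. This is a minimal sketch; the type codes come from the VVC specification (ISO/IEC 23090-3) and the helper names are our own, not from this document:

```python
# Illustrative NAL unit classification for VVC (ISO/IEC 23090-3).
# Type codes follow the specification; helper names are our own.

VCL_TYPES = range(0, 12)   # TRAIL_NUT .. RSV_IRAP_11 (coded slice NAL units)
IRAP_TYPES = {7, 8, 9}     # IDR_W_RADL, IDR_N_LP, CRA_NUT
APS_TYPES = {17, 18}       # PREFIX_APS_NUT, SUFFIX_APS_NUT

def nal_unit_type(header: bytes) -> int:
    """nal_unit_type occupies the 5 high bits of the second header byte."""
    return (header[1] >> 3) & 0x1F

def is_vcl(t: int) -> bool:
    return t in VCL_TYPES

def is_irap(t: int) -> bool:
    return t in IRAP_TYPES

def is_aps(t: int) -> bool:
    return t in APS_TYPES
```

An encapsulation module can use such a classification to decide which NAL units to buffer for random access, as described below.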
Figure 1a illustrates an example of a sequence of picture units (PU) complying with VVC. For the sake of illustration, the sequence comprises four picture units denoted 100-1 to 100-4, the first and the third picture units being selectable by a user as an entry picture (or random access point) within the sequence. Such entry picture units are referred to as intra random access pictures (IRAP). This means that, for example, the decoding process may start from picture 100-3, as illustrated with bold arrow 105.
For the sake of illustration, picture unit 100-1 contains both non-VCL NAL units, for example non-VCL NAL unit 110, and VCL NAL units, for example VCL NAL units 115-1 to 115-3.
As illustrated, a picture unit, for example picture unit 100-2, may contain only VCL NAL units and may reference non-VCL NAL units from previous picture units, as illustrated by the dashed arrows. For example, a VCL NAL unit may reference adaptation parameter set NAL unit(s) (APS NAL unit(s)), through one or more syntax elements declared in the slice header.
However, as illustrated with the dashed bold arrows, corresponding to a reference to non-VCL NAL unit 110 from random access picture 100-3 and from a subsequent picture 100-4, a problem may arise when non-VCL NAL units of a previous picture unit are referenced by a random access picture to be used as a starting picture of a sequence of pictures and/or by a following picture unit. According to the illustrated example, VCL NAL unit 120 of picture unit 100-3 and VCL NAL unit 125 of picture unit 100-4 depend on non-VCL NAL unit 110 of picture unit 100-1. Therefore, non-VCL NAL unit 110 must be available to decode VCL NAL units 120 and 125.
Accordingly, when at least one VCL NAL unit in a random access picture or in a picture between two random access pictures depends on one or more non-VCL NAL units from a picture unit preceding the random access picture, the storage in a file format must handle these dependencies so as to encapsulate random access samples as real random access samples (i.e. not requiring any data or NAL units from previous samples). It is noted that dependencies between VCL NAL units and non-VCL NAL units from the preceding random access picture unit, or from picture units between the preceding random access picture and the picture units comprising the VCL NAL units, are not an issue.
As illustrated, the dependency from VCL NAL unit 130 to APS NAL unit 135 is not an issue because the decoding cannot be started from picture 100-4 but only from IRAP picture 100-1 (first picture of the bit-stream) or from IRAP picture 100-3.
While ISOBMFF and its extension for NALU-based video codecs (ISO/IEC 14496-15) historically provide support for random access in compressed video bit-streams, the problem stated above from the VVC specification introduces new constraints for random access samples.
As a matter of fact, when a picture was indicated as a random access point in a bit-stream conforming to a previous codec like HEVC or AVC, this meant that no NAL unit from previous pictures was required for the decoding. This is no longer the case within VVC bit-streams because some picture units, signalled as intra random access point pictures (IRAP pictures), may have dependencies on non-VCL NAL units from previous picture units in the bit-stream, for example dependencies on APS NAL units (as illustrated in Figure 1a).
Figure 1b illustrates an example of description of random or synchronization samples in the ISO Base media file format, for the video track referenced 150. Typically, these samples are described using the 'sync' (or a 'rap') sample group mechanism, for example 'sync' sample group 155, or a dedicated box in the sample description, for example the SyncSampleBox (not illustrated). For the sake of illustration, samples 160-1 and 160-2 are described in 'sync' sample group 155 as synchronization or random access samples. They may also be described in a SyncSampleBox in the SampleTableBox providing sample description.
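For illustration, the SyncSampleBox mentioned above has a simple wire format defined in ISO/IEC 14496-12: a FullBox header (version and flags), a 32-bit entry count, then 1-based sync sample numbers, all big-endian. A minimal sketch of a parser for its body (assuming the enclosing box size/type header has already been consumed; the function name is our own):

```python
import struct

def parse_stss(payload: bytes) -> list:
    """Parse the body of a SyncSampleBox ('stss', ISO/IEC 14496-12):
    FullBox header (version + flags), a 32-bit entry count, then
    1-based sync sample numbers, all big-endian."""
    _version_flags, entry_count = struct.unpack(">II", payload[:8])
    return list(struct.unpack(f">{entry_count}I",
                              payload[8:8 + 4 * entry_count]))
```

Such a list of sample numbers tells a reader where decoding may start, but, as noted below, not which non-VCL NAL units are needed to start it.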
It is noted that while such mechanisms indicate samples onto which a decoder can start decoding, they do not provide the necessary decoding context (non-VCL NALUs) required for the decoder to correctly reconstruct the pictures from the compressed bit-stream.
The non-VCL NALUs required for random access may be transmitted out-of-band (i.e. they can be handled by the transport layer). However, it may be preferable to encapsulate a self-contained track, i.e. having the NALUs required for random access transmitted with the description of the track (e.g. 'trak' or 'traf' box) or within the data of the track (e.g. 'mdat' or 'imda' box), especially for streaming applications where one wants to limit the number of requests on the network to start playout or seek within a media file.
Consequently, there is a need to improve the mechanisms making it possible to access a random picture in an encapsulated video stream.
The present invention has been devised to address one or more of the foregoing concerns.
SUMMARY OF THE INVENTION
According to a first aspect of the invention, there is provided a method for encapsulating a video bit-stream in a server, the method comprising: obtaining at least one network abstraction layer unit (NAL unit) of a first picture, buffering the at least one obtained NAL unit, obtaining at least one NAL unit of a second picture, the second picture being a random access picture following the first picture, and encapsulating the at least one obtained NAL unit of the second picture and the at least one buffered NAL unit in an encapsulation structure making it possible to generate a video bit-stream without processing the encapsulated at least one buffered NAL unit when processing the encapsulated video bit-stream.
Accordingly, the method of the invention makes it possible to improve video streaming by enabling access to random pictures in video streams while improving the use of the resources of a decoder and the transmission bandwidth.
According to an embodiment, the at least one obtained NAL unit of the first picture is buffered upon determining that the at least one obtained NAL unit of the first picture is of a specific type.
According to an embodiment, the at least one buffered NAL unit is encapsulated with the at least one obtained NAL unit of the second picture upon determining that the at least one obtained NAL unit of the second picture references the at least one buffered NAL unit. Accordingly, the buffered NAL units are selectively encapsulated as a function of NAL units referencing buffered NAL units.
According to an embodiment, the at least one obtained NAL unit of the second picture and the at least one buffered NAL unit of the first picture are encapsulated into a same self-contained track.
According to an embodiment, the at least one obtained NAL unit of the second picture and at least one buffered NAL unit to be encapsulated within the encapsulation structure are encapsulated within a media data part of the track. According to an embodiment, at least one buffered NAL unit to be encapsulated within the encapsulation structure is encapsulated within a metadata part of the track.
According to an embodiment, the method further comprises determining an identifier of a type of data of an obtained NAL unit of the specific type, an obtained NAL unit being buffered as a function of a type and of an identifier.
According to an embodiment, the specific type is the type of a parameter set NAL unit.
According to an embodiment, the method further comprises identifying at least one picture as being the at least one random access picture.
According to a second aspect of the invention, there is provided a method for generating a video bit-stream from an encapsulated video bit-stream, in a client device, the method comprising, receiving an instruction for generating a video bit-stream starting from a selected random access picture, obtaining at least one network abstraction layer unit (NAL unit) of the selected random access picture, the obtained NAL unit being associated with a specific structure comprising at least one additional NAL unit; accessing the specific structure and obtaining the at least one additional NAL unit, and generating a video bit-stream comprising the at least one additional NAL unit and the at least one obtained NAL unit of the selected random access picture, wherein the specific structure is accessed only when the selected random access point is used as an access picture.
Accordingly, the method of the invention makes it possible to improve video streaming by enabling access to random pictures in video streams while improving the use of the resources of a decoder and the transmission bandwidth.
According to an embodiment, the at least one additional NAL unit is of a specific type, wherein the at least one NAL unit of the selected random access picture and the at least one additional NAL unit are obtained from a same self-contained track, and wherein at least one additional NAL unit of the specific structure is obtained from a metadata part of the track.
According to an embodiment, at least one additional NAL unit of the specific structure is obtained from a media data part of the track.
According to a third aspect of the invention, there is provided a method for generating a video bit-stream from an encapsulated video bit-stream, in a client device, the method comprising: obtaining a set of network abstraction layer units (NAL units) of an encapsulated picture, if the encapsulated picture is a random access picture that is not to be used as an access picture, filtering the obtained set of NAL units to skip at least one additional NAL unit, and generating a video bit-stream as a function of the filtered NAL units, wherein an additional NAL unit is a NAL unit of a picture that has been added to a set of NAL units of a following random access picture.
Accordingly, the method of the invention makes it possible to improve video streaming by enabling access to random pictures in video streams while improving the use of the resources of a decoder and the transmission bandwidth.
According to an embodiment, the skipped at least one additional NAL unit is of a specific type, wherein at least one NAL unit of the filtered NAL units and at least one skipped additional NAL unit belong to a same self-contained track, and wherein the at least one skipped additional NAL unit belongs to a media data part of the track.
According to an embodiment, the video bit-stream is of the versatile video codec type, the NAL units being of the video coding layer type or of the non video coding layer type.
According to an embodiment, the specific type is a type of network abstraction layer units corresponding to an adaptation parameter set.
According to a fourth aspect of the invention, there is provided a device comprising a processing unit configured for carrying out each of the steps of the method described above. The fourth aspect of the present invention has optional features and advantages similar to the first, second, and third above-mentioned aspects.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g., a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figures 1a and 1b illustrate an example of a sequence of picture units complying with VVC and an example of description of random (or synchronization) samples in the ISO Base media file format, respectively;
Figure 2 illustrates an example of streaming media data from a server to a client;
Figure 3 illustrates an example of steps of an encapsulation process according to a particular embodiment of the invention;
Figures 4a and 4b illustrate details of steps of an encapsulation process according to a particular embodiment of the invention and an example of steps for appending additional data required for a random access, respectively;
Figures 5a and 5b illustrate an example of steps carried out during a step of buffering non-VCL NAL units, as illustrated in Figure 4a, and an optional step for adapting additional non-VCL NAL units that are required for random access, respectively;
Figures 6a and 6b illustrate an example of steps of a parsing process according to a first and a second embodiment of the invention, respectively;
Figure 7 schematically illustrates a processing device configured to implement at least one embodiment of the present invention;
Figure 8 illustrates an example of a specific NALU-like structure in a data part of a VVC track aggregating additional non-VCL NAL units required only for random access;
Figure 9 illustrates an example of encapsulation of a video bit-stream into media segments containing several media fragments; and
Figure 10 illustrates an example of a mapping of additional non-VCL NAL units that may be used to indicate additional NAL units that are duplicated only for enabling random access.
DETAILED DESCRIPTION OF THE INVENTION
According to embodiments, NAL units are read from a compressed video bit-stream so as to organize data of the bit-stream into an encapsulated file. The type of some NAL units and/or the identifier of data encoded within these NAL units are decoded to determine the role of the NAL units, in particular whether they are required in case of random access, making it possible to identify and buffer additional NAL units that are required for random access so as to make sure that the encapsulated file enables random access. These additional NAL units are signalled within the encapsulated file so that they can be used when needed, for example when a user is seeking a particular portion in a video sequence. The additional NAL units may be stored in a metadata part of the encapsulated file or in a data part of the encapsulated file.
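A rough sketch of this server-side behaviour, using a hypothetical in-memory data model (the is_irap, nalus and aps_id fields, and the function name, are our own illustration, not structures from this document):

```python
def encapsulate(pictures):
    """Buffer parameter-set NAL units seen in earlier pictures and attach
    them, as 'additional' NAL units, to each later random access sample,
    so that the sample becomes self-contained. A parser can later skip
    the 'additional' units when the sample is not used as an entry point."""
    buffered = {}   # aps_id -> most recently seen APS NAL unit
    samples = []
    for pic in pictures:
        # Only random access samples need the buffered decoding context.
        additional = list(buffered.values()) if pic["is_irap"] else []
        samples.append({"additional": additional, "nalus": pic["nalus"]})
        # Track parameter sets for the benefit of later random access points.
        for nalu in pic["nalus"]:
            if nalu.get("aps_id") is not None:
                buffered[nalu["aps_id"]] = nalu
    return samples
```

The sketch keeps only the most recent NAL unit per identifier, reflecting the idea that an obtained NAL unit is buffered as a function of its type and its identifier.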
Still according to embodiments, a video bit-stream encapsulated according to the invention is parsed to extract NAL units. The additional NAL units are identified, from a data part or from a metadata part of the encapsulated file so as to avoid needlessly extracting NAL units.
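The parsing behaviour can be sketched the same way, again with a hypothetical in-memory model (each sample carries its regular NAL units plus any duplicated 'additional' ones; names are our own):

```python
def rebuild_bitstream(samples, start_index):
    """Emit the additional NAL units only for the sample actually used as
    the entry point; for samples that are merely traversed, the duplicated
    additional NAL units are filtered out."""
    out = []
    for i, sample in enumerate(samples[start_index:]):
        if i == 0:
            # Entry point: prepend the decoding context (e.g. APS NAL units).
            out.extend(sample["additional"])
        # In all cases, emit the regular NAL units of the sample.
        out.extend(sample["nalus"])
    return out
```

Skipping the additional units for non-entry samples avoids needlessly extracting, transmitting, and decoding duplicated NAL units.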
Figure 2 illustrates an example of streaming media data from a server to a client. As illustrated, a server 200 comprises an encapsulation module 205, connected, via a network interface (not represented), to a communication network 210 to which is also connected, via a network interface (not represented), a de-encapsulation module 215 of a client 220.
Server 200 processes data, e.g. video and/or audio data, for streaming or for storage. To that end, server 200 obtains or receives data comprising, for example, the recording of a scene by one or more cameras, referred to as a source video. The source video is received by the server as an original sequence of pictures 225. The server encodes the sequence of pictures into media data (i.e. bit-stream) using a media encoder (e.g. video encoder), not represented, and encapsulates the media data in one or more media files or media segments 230 using encapsulation module 205. Encapsulation module 205 comprises at least one of a writer or a packager to encapsulate the media data. The media encoder may be implemented within encapsulation module 205 to encode received data or may be separate from encapsulation module 205.
Client 220 is used for processing data received from communication network 210, for example for processing media file 230. After the received data have been de-encapsulated in de-encapsulation module 215 (also known as a parser), the de-encapsulated data (or parsed data), corresponding to a media data bit-stream, are decoded, forming, for example, audio and/or video data that may be stored, displayed or output. The media decoder may be implemented within de-encapsulation module 215 or it may be separate from de-encapsulation module 215. The media decoder may be configured to decode one or more video bit-streams in parallel.
It is noted that media file 230 may be communicated to de-encapsulation module 215 in different ways. In particular, encapsulation module 205 may generate media file 230 with a media description (e.g. a DASH MPD) and communicate (or stream) it directly to de-encapsulation module 215 upon receiving a request from client 220. The media file 230 may also be downloaded by and stored on the client 220.
For the sake of illustration, media file 230 may encapsulate media data (e.g. encoded audio or video) into boxes according to the ISO Base Media File Format (ISOBMFF, ISO/IEC 14496-12 and ISO/IEC 14496-15 standards). In such a case, media file 230 may correspond to one or more media files (indicated by a FileTypeBox 'ftyp') or one or more segment files (indicated by a SegmentTypeBox 'styp'). According to ISOBMFF, media file 230 may include two kinds of boxes: "media data boxes", identified as 'mdat' or 'imda', containing the media data, and "metadata boxes" (e.g. 'moov' or 'moof') containing metadata defining placement and timing of the media data. In a particular embodiment, the sequence of pictures 225 is encoded, or compressed, according to the Versatile Video Coding specification ISO/IEC 23090-3.
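The size-and-type box structure described above can be illustrated with a minimal sketch of a top-level box walker. This is a simplified assumption-laden example: it handles only 32-bit box sizes and ignores the 64-bit 'largesize' and size==0 (box extends to end of file) conventions defined by ISOBMFF.

```python
import struct

def iter_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level ISOBMFF box.

    Minimal sketch: assumes 32-bit box sizes only (no 'largesize',
    no size==0 run-to-end case).
    """
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from('>I', data, offset)          # big-endian box size
        box_type = data[offset + 4:offset + 8].decode('ascii')  # 4-character code
        yield box_type, data[offset + 8:offset + size]
        offset += size

# A toy buffer: an empty 'moov' box followed by an 'mdat' box with 4 payload bytes.
toy = struct.pack('>I4s', 8, b'moov') + struct.pack('>I4s', 12, b'mdat') + b'\x00\x01\x02\x03'
print([t for t, _ in iter_boxes(toy)])  # ['moov', 'mdat']
```

A real parser would dispatch on the box type, descending into 'moov' to read the sample tables while leaving 'mdat' untouched until samples are requested.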
Encapsulation

Figure 3 illustrates an example of steps of an encapsulation process according to a particular embodiment of the invention. As illustrated, a first step is directed to obtaining a compressed video bit-stream (step 300). Such a compressed video bit-stream may be a sequence of NAL units. It may be available in its entirety (for example a pre-encoded compressed video bit-stream read from a storage device or from a memory). Alternatively, the compressed video bit-stream may be generated on the fly (live or real-time encoding).
Next, one or more access units are read (step 305), each access unit being a set of picture units that belong to different layers and contain coded pictures associated with the same time for output from the decoded picture buffer (DPB). The number of access units that are read may depend on the buffering capabilities of the module carrying out the encapsulation process, on its configuration (e.g. to generate one file for storage, to generate a file with movie fragments, or to generate media segments for streaming, each media segment possibly with one or more fragments), or on the encoding mode (e.g. live or pre-encoded).
For example, when the bit-stream is pre-encoded or available from storage, the module carrying out the encapsulation process may inspect several access units and their corresponding NALUs to have a better knowledge of the bit-stream and to improve the encapsulation.
When the bit-stream comes from live encoding, the module carrying out the encapsulation process may generate a file or media segments containing movie fragments. Typically, the module carrying out the encapsulation process buffers and processes a group of images corresponding to a fragment (e.g. from 0.5 to several seconds of compressed video) before being able to output encapsulated access units. The module carrying out the encapsulation process inspects the NAL units, especially the non-VCL NALUs, from the read access unit(s). It looks for non-VCL NALUs that may be referenced along the video sequence by pictures or access units (for example the non-VCL NAL unit 110 in Figure 1a). This may be for example an APS NALU, a picture parameter set (PPS) NALU, or a sequence parameter set (SPS) NALU. It may also be an SEI message. It may also be an update of an APS, a PPS, or an SPS. An update is a new version of a parameter set which may include new parameters associated with a parameter set identifier already used in the video sequence. The non-VCL NAL units that may be useful for random access and referenced by picture units are called parameter sets for sync samples. They correspond to a set of parameters required for decoding a sample starting from a sync sample. The parameters within this set may be associated with an identifier. For example, a video parameter set NAL unit (VPS NAL unit) has an identifier represented by the syntax element vps_video_parameter_set_id. An SPS NAL unit has an identifier represented by the sps_seq_parameter_set_id syntax element. A PPS NAL unit has an identifier represented by the pps_pic_parameter_set_id syntax element. An APS NAL unit also has a syntax element adaptation_parameter_set_id providing an identifier. There may also be several types of APS NAL units, different types of APS NAL unit defining their own identifier space. In this case, the identifier of the non-VCL NAL unit may be a pair of identifier and type.
Next, as illustrated, the non-VCL NAL units are buffered in a memory of the module carrying out the encapsulation process (step 310). This step is described in more detail by reference to Figure 5a.
Next, it is determined whether, from the read access unit(s), one or more samples should be declared as synchronization samples (step 315), also called sync or random access samples, and which of these samples should be considered as synchronization samples. The sync samples may correspond to the IRAPs comprised within the read access units or only to some of these IRAPs, depending, for example, on a fragmentation or segmentation configuration. For example, it may be decided to declare one sync sample every second even if the input bit-stream provides two or more IRAP pictures per second.
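The decision above (e.g. keeping one sync sample per second even when the bit-stream provides more IRAP pictures) can be sketched with a hypothetical helper; the names and the time representation are illustrative, not part of the specification:

```python
def select_sync_samples(irap_sample_numbers, sample_times, min_interval=1.0):
    """From the IRAP samples found in the read access units, keep only
    those spaced at least `min_interval` seconds apart.

    Hypothetical helper: `sample_times` maps sample number -> decode time (s).
    """
    selected = []
    last_time = None
    for n in irap_sample_numbers:
        t = sample_times[n]
        if last_time is None or t - last_time >= min_interval:
            selected.append(n)  # declare this IRAP as a sync sample
            last_time = t
    return selected

# IRAP pictures every 0.5 s; declare one sync sample per second.
times = {1: 0.0, 16: 0.5, 31: 1.0, 46: 1.5, 61: 2.0}
print(select_sync_samples([1, 16, 31, 46, 61], times))  # [1, 31, 61]
```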
When declaring a sync sample in the sample description, the encapsulation process appends all or a part of the buffered non-VCL NALUs in the sample description or within the sample data (step 320). This step is described in detail with reference to Figure 4a. Next, the read access units are encapsulated in the media file or media segment (step 325).
It is to be noted that the encapsulation described with reference to Figure 3 may also apply to a media file that is already encapsulated. Accordingly, instead of reading access units from an input video bit-stream (steps 300 and 305), samples are read from a previously received encapsulated file. The encapsulation still decides if and where to place sync samples in the new encapsulated file and appends the non-VCL NALUs that are required for random access. This makes it possible to transform an initial encapsulation that was not suitable for random access, or not optimized for it, into a single track with random access capabilities. This may be used, in particular, when segmenting a media file encapsulated as one huge media file for streaming. This may also be used for fragmenting a media file encapsulated as a single media file. This may also be used to transform a multi-track encapsulation, e.g. a video track plus one or more non-VCL tracks, into a single track with random access capabilities.
Video tracks generated with the encapsulation process illustrated in Figure 3 may be further indicated by a specific sample entry type, for example a VisualSampleEntry 'vvcr' or 'vvi2' indicating that this video track is a single track allowing random access and containing indications about additional non-VCL NALUs required for random access that are not useful for normal playout. The 'vvi1' sample entry type in the Working Draft for the VVC file format (w19049) may be reused instead of 'vvcr' (or 'vvi2'), but a specific sample entry would explicitly indicate that specific signalling is in use in the track for the additional non-VCL NALUs. Considering these sample entry types, the handling of sync samples is specified as follows for encapsulated VVC bit-streams.
When the sample entry name is 'vvc1' or the reused 'vvi1' (or the new 'vvi2' or 'vvcr') and the track does not have a track reference of type 'vvcN' (e.g. a reference to a non-VCL track as in the multi-track approach), the following applies:
* if the sample is a sync sample,
a. when the sample entry type is 'vvc1', all the parameter sets needed for decoding that sample must be included in the sample entry.
b. when the sample entry type is 'vvi1' (or the new 'vvi2' or 'vvcr'), all the parameter sets needed for decoding that sample must be included either in the sample entry or in the sample itself.
* otherwise (i.e. the sample is not a sync sample),
a. when the sample entry type is 'vvc1', all the parameter sets needed for decoding that sample must be included in the sample entry.
b. when the sample entry type is 'vvi1' (or the new 'vvi2' or 'vvcr'), all the parameter sets needed for decoding the sample must be included either in the sample entry or in any of the samples from the previous sync sample up to the sample itself, inclusive.
For interoperability purposes, when using the 'vvi1' sample entry type, the sync sample definition may be updated as follows: a sync sample in 'vvi1' (or a specific VisualSampleEntry type like 'vvcr' or 'vvi2') may contain an indication describing, or may contain a specific box providing, the additional non-VCL NALUs required for random access. Examples are provided herein below.
Figure 4a illustrates detail of steps of an encapsulation process according to a particular embodiment of the invention. Such steps may be carried out in an encapsulation module such as encapsulation module 205 in Figure 2, for example an ISOBMFF writer (referred to as a "writer" hereafter).
As illustrated, a first step is directed to initializing a description of a video track (step 400). For the sake of illustration, a new 'trak' box may be created within a 'moov' box. The handler of the track may be set to 'vide' to indicate that it is a video track.
According to a particular embodiment, the NAL units (NALUs) of the input coded stream (e.g. VVC stream) are successively processed until the end of the stream is reached. To that end, it is determined whether or not there is a next NAL unit to be read (step 405) and, if there is a NAL unit to be read, it is read. If no NAL unit can be found, this terminates the encapsulation process and the track is finalized (step 410). For example, indexing information (e.g. 'sidx' or additional user data or a movie fragment random access box) may be computed and inserted in the metadata part of the file.
If a NAL unit to be processed is read in the input VVC bit-stream (step 405), it is checked whether or not the read NAL unit indicates the beginning of a new picture unit (PU) (step 415). This is determined according to the video codec specification used, which defines the types of NAL units that are allowed before a new picture unit. If the read NAL unit indicates a new picture unit, a new sample description is started within the sample table box (step 420). For the sake of illustration, this may comprise creating a new NAL unit sample in the media data part of the file (e.g. the 'mdat' box) and appending a new entry in the boxes describing the samples (e.g. the decoding time to sample box, optionally the composition time to sample box, the sample size box, the sample to chunk box, or the track run box when the file is stored as a fragmented file).
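A NAL unit sample as used here stores each NAL unit with a length prefix, as specified in ISO/IEC 14496-15 (the length field size is configured via lengthSizeMinusOne in the decoder configuration record). A minimal sketch, assuming a 4-byte length field:

```python
import struct

def nal_units_to_sample(nal_units, length_size=4):
    """Build NALU sample data as specified in ISO/IEC 14496-15:
    each NAL unit is prefixed by its length in bytes.

    Sketch assuming the length field is 1, 2 or 4 bytes wide.
    """
    fmt = {1: '>B', 2: '>H', 4: '>I'}[length_size]
    sample = b''
    for nalu in nal_units:
        sample += struct.pack(fmt, len(nalu)) + nalu
    return sample

# Two toy NAL units of 3 and 2 bytes, each with a 4-byte length prefix.
sample = nal_units_to_sample([b'\x01\x02\x03', b'\xaa\xbb'])
print(len(sample))  # 4 + 3 + 4 + 2 = 13
```

The resulting bytes are what would be written into the 'mdat' box for one sample, with the total length recorded in the sample size box.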
Next, it is determined whether or not the NAL unit is an update of a parameter set (step 425), for example a new APS or a PPS. If the NAL unit is an update of a parameter set, the NAL unit, that is a non-VCL NAL unit, is buffered (step 430) and the algorithm loops to step 405 to process the next NAL unit in the input VVC bit-stream.
The non-VCL NAL unit detected as an update of a parameter set is also appended to the data container so that the current picture unit remains complete when stored as a sample in the data part of the file.
The buffering step 310 or 430 may include buffering the non-VCL NAL units for the sample or picture unit being read in step 305 or 415 but, when appending the buffered NAL units in the encapsulated bit-stream (step 320 or 440), the ones for the current sample must be included in the sample data (step 325 or 450), must not be duplicated, and must not be marked as additional non-VCL NAL units required for random access. By doing so, the buffer is kept up to date for the next sync samples and the current sample is kept consistent with its representation in the input bit-stream. The step of determining whether or not the NAL unit is an update of a parameter set makes it possible to reduce the number of additional NAL units that are buffered and that may be copied within the description of the random access or synchronization samples: only those that are relevant or the most recent updates are buffered.
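The buffering rule above, keeping only the most recent update for a given parameter set, can be sketched as follows. The dict-based buffer and the string NAL unit types are an illustrative assumption; the key combines the NAL unit type and the parameter set identifier as described in the text:

```python
def buffer_parameter_set(buffer, nal_type, ps_id, payload):
    """Record a parameter set update in the writer's internal memory
    (steps 310/430), keeping only the latest version for a given
    (NAL unit type, parameter set identifier) pair."""
    buffer[(nal_type, ps_id)] = payload

buf = {}
buffer_parameter_set(buf, 'PPS', 0, b'v1')
buffer_parameter_set(buf, 'APS', 3, b'aps')
buffer_parameter_set(buf, 'PPS', 0, b'v2')  # the update replaces v1
print(sorted(buf))  # [('APS', 3), ('PPS', 0)]
```

At a sync sample, the values of this buffer (or a referenced subset of them) are what gets appended in step 320 or 440.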
Next, if the NAL unit is not an update of a parameter set (step 425), it is determined whether or not the read NAL unit is part of an IRAP picture and whether it contains a reference to a parameter set for sync sample (e.g. an APS NAL unit for a VVC bit-stream) (step 435). For example, in a VVC bit-stream, it is a VCL NALU or a picture header (PH, non-VCL) NAL unit with specific signalling that indicates that the picture unit is an IRAP (based on a VCL NAL unit type corresponding to IRAP types (nal_unit_type in the range of IDR_W_RADL to CRA_NUT) and on the gdr_or_irap_pic_flag of the PH). If the read NAL unit is a picture header, or if the read NAL unit is the first VCL NAL unit with no picture header present in the picture unit, it is determined that the input VVC bit-stream contains a new IRAP picture. Accordingly, it may be decided to encapsulate this IRAP picture as a sync sample.
If the read NAL unit is part of an IRAP picture and if it contains one or more references to a parameter set for sync sample (e.g. an APS NAL unit for VVC bit-stream) buffered in step 430, additional non-VCL NAL units are required to decode the stream from this random access point. As a result, the NAL units buffered at step 430, or a portion of these buffered NAL units, are appended to the encapsulated file (step 440).
According to a first embodiment, these NAL units are stored in the metadata part of the file, for example in the sample description. According to a second embodiment, they are stored in the data part of the file (e.g. in the 'mdat' or 'imda' box within the sample data). When these NAL units are stored in the data part, the sample size is incremented by the length of all the appended NAL units (step 445). When these NAL units are stored in the metadata part, the corresponding box is updated or created (step 445).
Next, or if the read NAL unit is not part of a RAP picture and does not contain any reference to a parameter set for sync sample (e.g. an APS NAL unit) present in the buffer, the current NAL unit is appended to the data part of the file (step 450), prefixed by its length in bytes, so as to form a NALU sample according to ISO/IEC 14496-15, and the sample size is updated accordingly with the NAL unit length in the sample size or compact sample size box (step 455). In a variant, the test 435 applies to both RAP and non-RAP pictures and checks whether a picture (through its picture header or a slice header) contains one or more references to a parameter set for a sync sample (e.g. an APS NAL unit) present in the buffer; if so, these referenced parameter sets are appended to the encapsulated file with the sample description (in steps 440 and 445).
Figure 4b illustrates an example of steps for appending additional data required for random access, as performed by the encapsulation process in steps 320 or 440. As illustrated, the process begins when the encapsulation process takes the decision to append additional data in connection with a given sample (step 460). Most of the time, the sample is a sync sample, but additional data may also be appended within non-sync or non-random access samples. In the case of a VVC bit-stream, the additional data may consist of parameter set updates or APS NAL units. Additional data means that these data may not be required during normal playout.
Next, a structure describing the current sample is selected or created (step 462). It may be one box in the sample description, one box in a track fragment, or even a specific NAL-unit-like structure in the data part of the file. Such a selected structure is used to set information, using some box parameters or specific NAL unit types, indicating that the sample has some dependencies, other than image dependencies, on some additional data in samples or picture units that precede the last sync sample in the decoding order (step 464).
Next, it is decided, depending on the configuration of the encapsulation module, as defined by a user, an application, or a script, whether to provide the additional data in the selected structure (step 466).
If additional data are to be provided, the indication that additional data are provided for the current sample is set in the selected structure and the size of the additional data is set in a parameter of the selected structure (step 468). For the sake of illustration, it may be a dedicated parameter in a box, or the NAL unit length if stored as a NAL-unit-specific structure. Next, the additional data are written in the selected structure, for example as an array of bytes in a box or as a NALU payload in a NAL-unit-like specific structure (step 470).
On the contrary, if additional data are not to be provided (step 466), the indication that no additional data is provided for the current sample is set in the selected structure. Next, it is decided whether the additional data for the current sample should be referenced or not (step 472), depending on the configuration of the encapsulation module. The choice is recorded in the selected structure.
If the additional data for the current sample should be referenced, it means that the additional data are included within the sample as one or more regular NAL units. Accordingly, the position of the first byte of these one or more NAL units is set in the selected structure and the length, in bytes, is also set in the selected structure (step 474). The position may be specified from the beginning of the file, from the beginning of a movie fragment, or relative to the sample offset information for the current sample (available from the sample description, e.g. through the SampleSizeBox, SampleToChunkBox or ChunkOffsetBox). Next, the additional data are copied as regular NAL units within the current sample (step 476), preferably as the first NAL units for the current sample or before its first VCL NAL unit. In any case, they should be copied within the sample in a position that complies with the NAL unit order defined by the video codec specification. In a variant, the position and length indicating the additional NAL units within a sample are described as a list of NAL unit indexes, or as first byte plus last byte positions.
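Steps 474 and 476 can be sketched as follows, with a hypothetical in-memory representation of the sample as a list of (is_vcl, payload) pairs and a 4-byte NAL unit length prefix, both illustrative assumptions:

```python
def insert_additional_nalus(sample_nalus, additional, length_size=4):
    """Copy the additional NAL units before the first VCL NAL unit of
    the sample (step 476) and return (new_nalus, first_byte, byte_length)
    locating them within the rebuilt sample (step 474).

    `sample_nalus` is a list of (is_vcl, payload) pairs; each NAL unit
    is stored with a `length_size`-byte length prefix.
    """
    first_vcl = next(i for i, (is_vcl, _) in enumerate(sample_nalus) if is_vcl)
    new_nalus = (sample_nalus[:first_vcl]
                 + [(False, n) for n in additional]
                 + sample_nalus[first_vcl:])
    # Byte offset, within the sample data, of the first inserted NAL unit.
    first_byte = sum(length_size + len(n) for _, n in sample_nalus[:first_vcl])
    byte_length = sum(length_size + len(n) for n in additional)
    return new_nalus, first_byte, byte_length

nalus = [(False, b'PH'), (True, b'SLICE')]  # picture header, then VCL NAL unit
new, pos, length = insert_additional_nalus(nalus, [b'APS1'])
print(pos, length)  # 6 8
```

The (pos, length) pair is what would be recorded in the selected structure so a parser can skip these NAL units during normal playout.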
Examples of structures that may be selected and used for the storage of additional data are described herein below.
Next, turning back to Figure 4a, the algorithm loops to step 405 so as to process all the NAL units.
Back to step 440, wherein buffered NAL units are appended to the sample description or to the sample data, it is noted that, by default, all the buffered non-VCL NAL units are appended to the sample description or sample data. However, in a particular embodiment, the buffered NAL units are further analysed so as to append as few NAL units as possible in the sample description or sample data. This analysis may consist, for example when considering a group of pictures, in looking for the non-VCL NAL units referenced by the picture units of this group of pictures. By doing so, it may be detected, within the encapsulation process, that some buffered non-VCL NAL units are not referenced by any picture unit in the group of pictures. Accordingly, there is no need to append these non-VCL NAL units in step 440. However, they are preferably retained within the buffer because they may be used in a future group of pictures.
Figure 5a illustrates an example of steps carried out during a step of buffering non-VCL NAL units, as illustrated in Figure 4a (i.e. step 430). For the sake of illustration, it is assumed that the internal buffer used for storing non-VCL NAL units corresponding to updates has a pre-defined fixed size. It may contain different tables, for example one table per type of non-VCL NAL unit as defined in the used video codec specification. Alternatively, it may contain different tables depending on the NAL unit type but also on an additional type further refining the information on the purpose of the non-VCL NAL unit. For example, APS NAL units may have an additional type indicating the kind of APS: ALF APS, LMCS APS, or SCALING APS. This information requires inspection of the first bits or bytes of non-VCL NAL units. For example, for APS NAL units, this consists in getting the aps_params_type. The size of a table may be set to the maximum allowable number of NAL units for a given type (either a NAL unit type, or a combination of the NAL unit type and the additional type indicating a kind for a given NAL unit type), according to the video codec specification. The purpose of using a fixed size is for the writer to control the number of buffered non-VCL NAL units and to keep only the latest version of a NAL unit of a given type with a given identifier. This avoids storing too many (and potentially unnecessary) additional non-VCL NAL units. According to a particular embodiment, the size may be determined as a function of the maximum allowable number of parameter sets. This is specified by the video codec specification and thus known by the encapsulation module.
It is noted that obtaining the type of a NAL unit is a typical step carried out by a writer, which can be done, for example, when reading the NAL unit (e.g. step 405 in Figure 4a).
As illustrated, first steps are directed to obtaining a non-VCL NAL unit for parameter set update (step 500) and its type (step 505). The type of the NAL unit may be used to identify a corresponding table where to store the obtained NAL unit. Next, an identifier of the parameter set to which the obtained NAL unit is directed is obtained (step 515). It may be obtained by parsing the beginning of the NAL unit payload, for example to get an adaptation_parameter_set_id from an APS NALU or a picture_parameter_set_id for a PPS NALU. The parsing of such an identifier depends on the type of NAL unit. It may consist in reading the first bits of the NAL unit payload or in decoding a variable-length encoded value, for example an Exp-Golomb coded value. The fixed or variable length encoding of the identifier is given by the video codec specification. Optionally, an additional type may also be obtained by NAL unit inspection (e.g. the type of an APS NAL unit).
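When the identifier is variable-length coded, an unsigned Exp-Golomb decoder is needed. A minimal ue(v) sketch follows; the bit position of the identifier within the payload depends on the NAL unit type and is assumed to be known here:

```python
def read_ue(data: bytes, bit_pos: int = 0):
    """Decode an unsigned Exp-Golomb coded value ue(v) starting at
    `bit_pos`; returns (value, next_bit_pos)."""
    def bit(p):
        return (data[p // 8] >> (7 - (p % 8))) & 1

    leading_zeros = 0
    while bit(bit_pos) == 0:       # count the zero prefix
        leading_zeros += 1
        bit_pos += 1
    bit_pos += 1                   # consume the '1' terminating the prefix
    suffix = 0
    for _ in range(leading_zeros): # read as many suffix bits as zeros
        suffix = (suffix << 1) | bit(bit_pos)
        bit_pos += 1
    return (1 << leading_zeros) - 1 + suffix, bit_pos

# Bit string '00111...' encodes ue(v) = 6 (prefix '00', stop '1', suffix '11').
print(read_ue(bytes([0b00111000]))[0])  # 6
```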
After having obtained the identifier of the parameter set, the obtained NAL unit for parameter set update may be adapted (step 520). This optional step is described in more detail with reference to Figure 5b.
Next, the NALU is stored in the identified table at an index corresponding to the obtained identifier (step 525). This makes sure that the bit-stream resulting from parsing will only contain one instance of a pair of NALU type and parameter set identifier for a given picture, reducing the processing load for the video decoder and avoiding breaking some compliance rules. For example, the VVC specification states that all SPS, respectively PPS, NAL units with a particular value of sps_seq_parameter_set_id, respectively pps_pic_parameter_set_id, in a CVS, respectively within a picture unit, shall have the same content, and that "All APS NAL units with a particular value of adaptation_parameter_set_id and a particular value of aps_params_type within a PU, regardless of whether they are prefix or suffix APS NAL units, shall have the same content".
Figure 5b illustrates an optional step for adapting additional non-VCL NAL units that are required for random access. Some non-VCL NAL units may appear at specific positions in the bit-stream with respect to other NAL units. For example, the VVC specification defines prefix and suffix APS NAL units, or prefix and suffix SEI NAL units, and specifies their authorized positions. When appending such "NAL-order sensitive" NAL units, an adaptation of the NAL unit may have to be done so that the extracted bit-stream does not violate the expected NAL unit order imposed by the video codec specification when it is parsed.
For example, considering the configuration illustrated in Figure 5b, input bit-stream 550 contains both prefix APS 555-1 and suffix APS 555-2 NAL units. Input bit-stream 550 contains three picture units referenced 560-1 to 560-3, the first one and the third one providing random access into the bit-stream (RAP PU).
Encapsulated bit-stream 570-1 is an example of direct encapsulation into a NALU sample of PU 560-3 in the 'mdat' box. This encapsulation is invalid because it contains a suffix APS NAL unit that is placed before the first VCL NAL unit of the sample (suffix APS 575 is placed before VCL NAL unit 580). When using VVC, the bit-stream extracted by a parser from this encapsulation would be non-compliant regarding the order of the APS NAL units.
On the contrary, encapsulated bit-stream 570-2 is an example of adaptation of a NAL unit 585 so as to produce a valid NALU sample in the 'mdat' box. From this encapsulation, the bit-stream extracted by a parser would be compliant regarding the order of the APS NAL units. Adapting the NAL unit type when copying a non-VCL NAL unit required for random access in a sample may be handled directly by the encapsulation module when encapsulating the sample or may be handled by the parser through a rewriting instruction indicated by the encapsulation module. An example of rewriting instruction is described with reference to Figure 9.
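One way to obtain a valid NAL unit order when appending buffered NAL units is simply to place them on the correct side of the VCL NAL units. The sketch below assumes a hypothetical tagging of each NAL unit as 'prefix', 'suffix' or 'vcl' (this tagging is illustrative, not part of the format); rewriting a suffix type into a prefix type, as in adaptation 585, is the alternative it avoids:

```python
def place_for_valid_order(sample_nalus, additional):
    """Place buffered additional NAL units so that the extracted
    bit-stream keeps a valid NAL unit order: prefix NAL units go
    before the VCL NAL units of the sample, suffix NAL units after.

    Each NAL unit is a (kind, payload) pair, kind in {'prefix','suffix','vcl'}.
    """
    prefixes = [n for n in additional if n[0] == 'prefix']
    suffixes = [n for n in additional if n[0] == 'suffix']
    return prefixes + sample_nalus + suffixes

sample = [('vcl', b'slice')]
extra = [('suffix', b'suffix-aps'), ('prefix', b'prefix-aps')]
ordered = place_for_valid_order(sample, extra)
print([k for k, _ in ordered])  # ['prefix', 'vcl', 'suffix']
```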
Parsing

Figure 6a illustrates an example of steps of a parsing process according to a first embodiment of the invention. According to this embodiment, the additional non-VCL NAL units that are required for random access are stored within the metadata part of the file. As illustrated, a first step is directed to initializing a media player to start reading a media file encapsulated according to the invention (step 600). Next, the media player plays the file, processing the metadata as for classical NALU-based encapsulation, getting samples from the data part of the file (step 605).
When a user, seeking a particular portion of the played video sequence, selects an entry from which the video is to be restarted (step 610), the parser accesses a specific structure (step 615) to obtain the additional NAL units that are required for playing the video from the selected entry (i.e. random access) and puts these NAL units as the first NAL units in the extracted bit-stream that is provided to the video decoder. Examples of such a specific structure are given herein below.
Next, the other NALUs are extracted from the 'mdat' box according to the sample description in the metadata part. As illustrated, the algorithm loops to step 605 so that the player keeps on playing the file (until the end of the file, a new seeking instruction, or a stop instruction), without getting the additional NAL units stored in the specific metadata structure. By doing so, only a seek or a start of the playout at a given random access point requires additional processing steps for the parser and for the video decoder. In addition, no redundant NALUs are required within the video bit-stream.
Figure 6b illustrates an example of steps of a parsing process according to a second embodiment of the invention. According to this embodiment, the additional NAL units required for random access are stored within the data part of the file (either as regular NAL units or as specific NAL units).
As illustrated, a first step is directed to initializing a media player to start reading a media file encapsulated according to the invention (step 650). Next, the media player plays the file (step 655). To that end, each time the media player encounters a random access or a synchronization sample, it further checks whether the corresponding sample contains additional NAL units that are only required for random access. During normal playout, the parser skips (or filters) the additional NAL units, using information directly from the NAL units or from certain metadata structures (e.g. specific box or sample group or NALU mapping information).
When a user, seeking a particular portion of the played video sequence, selects an entry from which the video is to be restarted (step 660), the parser does not filter the additional NAL units and provides both the additional NAL units required for random access and the NAL units for the random access sample to the video decoder (step 665).
Next, as illustrated, the algorithm loops to step 655 so that the player keeps on playing the file (until the end of the file, a new seeking instruction, or a stop instruction), while filtering the NAL units not required (those marked as additional NALUs required for random access).
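The filtering behaviour of steps 655 to 665 can be sketched as follows, assuming the metadata provides, for each sample, the indices of the NAL units marked as additional (a hypothetical representation of that signalling):

```python
def extract_for_playout(sample_nalus, additional_indices, random_access=False):
    """During normal playout, skip the NAL units marked as additional
    (only required for random access); when seeking, keep them all.

    `additional_indices` is the set of NAL unit indices within the
    sample that the metadata marks as additional.
    """
    if random_access:
        return list(sample_nalus)
    return [n for i, n in enumerate(sample_nalus) if i not in additional_indices]

nalus = [b'APS', b'PH', b'SLICE']  # APS is additional, PH + slice are regular
print(extract_for_playout(nalus, {0}))                      # [b'PH', b'SLICE']
print(extract_for_playout(nalus, {0}, random_access=True))  # [b'APS', b'PH', b'SLICE']
```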
Examples of structures enabling NALU filtering are given herein below.
Device for encapsulation or parsing

Figure 7 is a schematic block diagram of a computing device 700 for implementation of one or more embodiments of the invention. The computing device 700 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 700 comprises a communication bus 702 connected to:
- a central processing unit (CPU) 704, such as a microprocessor;
- a random access memory (RAM) 708 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for transmitting media data, of which the memory capacity can be expanded by an optional RAM connected to an expansion port for example;
- a read only memory (ROM) 706 for storing computer programs for implementing embodiments of the invention;
- a network interface 712 that is, in turn, typically connected to a communication network 714 over which digital data to be processed are transmitted or received. The network interface 712 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 704;
- a user interface (UI) 716 for receiving inputs from a user or displaying information to a user;
- a hard disk (HD) 710; and
- an I/O module 718 for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 706, on the hard disk 710 or on a removable digital medium for example such as a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 712, in order to be stored in one of the storage means of the communication device 700, such as the hard disk 710, before being executed.
The central processing unit 704 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 704 is capable of executing instructions from main RAM memory 708 relating to a software application after those instructions have been loaded from the program ROM 706 or the hard-disc (HD) 710 for example. Such a software application, when executed by the CPU 704, causes the steps of the flowcharts shown in the previous figures to be performed.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Encapsulating additional non-VCL NALUs for random access within a metadata part of a file

Several alternatives are described for this embodiment, according to which the additional non-VCL NAL units that are required for random access are stored within the metadata part of the encapsulated file.
It is recalled that ISO/IEC 14496-12 provides tools for the description of random access or synchronization samples in a track. For example, the sync sample box provides a compact marking of sync samples within the stream. The table is arranged in strictly increasing order of sample number and is defined as follows:

aligned(8) class SyncSampleBox extends FullBox('stss', version = 0, 0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
    }
}

where sample_number gives, for each sync sample in the stream, its sample number.
This box may be used by media players or parsers to locate the first sync sample prior to a specified seek or start time.
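As an illustration (not part of the specification), the lookup a player performs on this table can be sketched in Python as follows; the function name and the list-based representation of the 'stss' entries are hypothetical:

```python
from bisect import bisect_right

def nearest_sync_sample(sync_sample_numbers, target_sample):
    """Return the last sync sample at or before target_sample.

    sync_sample_numbers is the strictly increasing list of 1-based
    sample numbers read from the SyncSampleBox; target_sample is the
    sample a player wants to seek to. Returns None when the target
    precedes the first sync sample.
    """
    i = bisect_right(sync_sample_numbers, target_sample)
    return sync_sample_numbers[i - 1] if i > 0 else None
```

A player would then start decoding at the returned sync sample and roll forward to the target sample.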
Extension of SyncSampleBox, containing only past APS for a sync sample

In a particular embodiment, the SyncSampleBox is extended to provide, for each declared sync sample, the APS NAL units required for random access to this sync sample (the part in bold below). The encapsulation process may use this version of the box in the step of copying the required APS NAL units for random access (e.g. step 440 in Figure 4a). A de-encapsulation (or parser) module, such as de-encapsulation module 215 in Figure 2, receiving this new version of the SyncSampleBox may retrieve the APS NAL units required for random access when needed (e.g. step 615 in Figure 6a).
This embodiment keeps the APS NAL units for the sync sample in the data part of the file.
aligned(8) class SyncSampleBox extends FullBox('stss', version, 0) {
    unsigned int(32) entry_count;
    for (int i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        if (version >= 1) {
            unsigned int(8) numNalus;
            for (int i=0; i < numNalus; i++) {
                unsigned int(16) nalUnitLength;
                bit(8*nalUnitLength) nalUnit;
            }
        }
    }
}

where
- numNalus indicates the number of APS NAL units required for random access to a given sync sample (i.e. to start decoding from the first NAL unit of this sync sample). This corresponds to the APS NAL units that occurred in the bit-stream before the first NAL unit of the given sync sample,
- nalUnitLength indicates the length in bytes of the NAL unit, and
- nalUnit contains an APS NAL unit, as specified in the video codec specification, for example ISO/IEC 23090-3.
Extension of SyncSampleBox, containing only APS, possibly past and current

In a particular embodiment, the SyncSampleBox is extended to provide, for each declared sync sample, the required APS NAL units for this sync sample (the part in bold below). A parameter or flag, called for example all_APS, indicates whether this box contains the full list of APS (past + current) or only the past ones required for random access. The encapsulation module may use this version of the box in the step of copying the required APS NALUs for random access (e.g. step 440 in Figure 4a). The parameter or flag is set to true when the encapsulation module buffered and stored the APS required for random access plus the ones for the current sync sample. When the encapsulation module only buffered and stored APS NALUs that occurred before the first NALU of the sync sample, it sets the parameter or flag to false. Using this embodiment, a de-encapsulation (or parser) module, such as de-encapsulation module 215 in Figure 2, analyzing this new version of the SyncSampleBox may retrieve the required APS NALUs for random access when needed (e.g. step 615 in Figure 6a) and possibly those for the current sync sample. This variant may be an alternative providing a single VVC track instead of a VVC track plus another non-VCL track dedicated to the storage of APS NAL units. Such a track may be identified by a specific sample entry type.
aligned(8) class SyncSampleBox extends FullBox('stss', version, 0) {
    unsigned int(32) entry_count;
    for (int i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        if (version >= 1) {
            unsigned int(1) all_APS;
            unsigned int(7) numNalus;
            for (int i=0; i < numNalus; i++) {
                unsigned int(16) nalUnitLength;
                bit(8*nalUnitLength) nalUnit;
            }
        }
    }
}

where
- the all_APS flag, when set to true, indicates that all the APS NAL units required to decode the given sync sample (past + current) are present in the box (the APS NAL unit corresponding to the given sync sample is not in the data part of the file, to avoid duplication). When set to false, it indicates that only the APS NAL units from previous picture units are present in the box (the APS NAL units for the sync sample are in the data part of the file),
- numNalus indicates the number of APS NALUs present in the box for a given sync sample,
- nalUnitLength indicates the length in bytes of the NAL unit, and
- nalUnit contains an APS NAL unit, as specified in the video codec specification, for example ISO/IEC 23090-3.
In a variant, the all_APS flag may be declared at box level, using a specific value of the flags parameter. The indication would then apply to all the sync samples declared in the box. The previous variant offers more flexibility by allowing a description at sync sample level rather than at track level. On the other hand, this variant avoids the need for a parser to check on a sync sample basis, and the same processing can be applied to get the APS for every sync sample.
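To illustrate the two modes of the all_APS flag described above, the following minimal Python sketch shows how a parser might assemble the APS NAL units needed to start decoding at a sync sample; the function name and the dict-based representation of a box entry are hypothetical:

```python
def aps_for_random_access(box_entry, sample_aps_nalus):
    """Assemble the APS NAL units needed to start decoding at a sync
    sample, following the all_APS semantics.

    box_entry: hypothetical dict for one extended-SyncSampleBox entry,
    e.g. {'all_aps': bool, 'nalus': [bytes, ...]}.
    sample_aps_nalus: APS NAL units stored in the sample's own data
    (used only when the box carries past APS alone).
    """
    if box_entry['all_aps']:
        # Box already holds past + current APS; the sample data holds none.
        return list(box_entry['nalus'])
    # Box holds only past APS; current ones come from the sample data.
    return list(box_entry['nalus']) + list(sample_aps_nalus)
```

During normal playout the box entry is simply ignored and the sample's own NAL units are decoded as stored.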
Extension of SyncSampleBox, containing non-VCL NALUs, possibly past and current

In a particular embodiment, a new version of the SyncSampleBox is used to provide the required APS NAL units for random access but also, possibly, other non-VCL NAL units that are required for random access. This extension may be described as follows:

aligned(8) class SyncSampleBox extends FullBox('stss', version, 0) {
    unsigned int(32) entry_count;
    for (int i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        if (version >= 1) {
            unsigned int(8) numArrays;
            for (j=0; j < numArrays; j++) {
                unsigned int(1) array_completeness;
                bit(1) reserved = 0;
                unsigned int(6) NAL_unit_type;
                unsigned int(16) numNalus;
                for (i=0; i < numNalus; i++) {
                    unsigned int(16) nalUnitLength;
                    bit(8*nalUnitLength) nalUnit;
                }
            }
        }
    }
}

where
- numArrays indicates the number of arrays of NAL units of the indicated type(s),
- array_completeness, when equal to 1, indicates that all NAL units of the given type are in the following array and none are in the stream; when equal to 0, it indicates that additional NAL units of the indicated type may be in the stream; the default and permitted values are constrained by the sample entry name,
- NAL_unit_type indicates the type of the NAL units in the following array (which must all be of that type); it takes a value as defined in the video codec specification for non-VCL NAL units,
- numNalus indicates the number of NAL units of the indicated type for the sync sample to which this entry applies,
- nalUnitLength indicates the length in bytes of the NAL unit, and
- nalUnit contains a non-VCL NAL unit, as specified in the video codec specification, for example ISO/IEC 23090-3.
In a variant, there is no array_completeness parameter (assumed to be set to false). The arrays of non-VCL NAL units within the box are not complete, in the sense that they do not contain the non-VCL NAL units for a given sync sample, but only those coming from previous samples (or picture units in the bit-stream). This variant corresponds to an encapsulation where the encapsulation module, such as encapsulation module 205 in Figure 2, buffers all non-VCL NAL units detected up to the first NALU of a new IRAP (e.g. in step 425 in Figure 4a).
Dedicated structure for additional random access NALUs ('aran')

In a particular embodiment, instead of declaring the additional NAL units required for random access in the SyncSampleBox, a distinct box is used. In tracks with a sample entry type indicating a self-contained track suitable for random access, instead of using the SyncSampleBox, the following box may be used to describe the sync samples. The box, named for example "Additional Random Access NALUs" (or SyncSampleConfigurationBox), may be defined as follows:

Box Type: 'aran' (or any other reserved four-character code not already in use)
Container: SampleTableBox
Mandatory: No
Quantity: Zero or one (per file or per track fragment)

The syntax is as follows:

aligned(8) class AdditionalRandomAccessNALUBox extends FullBox('aran', version, flags) {
    unsigned int(32) entry_count;
    for (int i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        unsigned int(8) numArrays;
        for (j=0; j < numArrays; j++) {
            unsigned int(8) NAL_unit_type;
            unsigned int(8) numNalus;
            for (i=0; i < numNalus; i++) {
                unsigned int(16) nalUnitLength;
                bit(8*nalUnitLength) nalUnit;
            }
        }
    }
}

The 'aran' box may not be present when every sample in the video bit-stream is a sync sample.
Using this variant, a sync sample for the VVC File Format may be described as follows: for each sync sample in a VVC track, all APSs needed for decoding of the corresponding video elementary stream from that decoding time forward are in that VVC track sample or succeeding VVC track samples. These APS needed for decoding may have explicit signalling or may be provided in a dedicated structure like the 'aran' box.
Moreover, for signalling of various types of random access points in a VVC track, the following is recommended: the 'aran' table (and the equivalent flag in movie fragments) must be used in a VVC track unless all samples are sync samples. Note that the track fragment random access box refers to the presence of signalled sync samples in a movie fragment.
Generic extension for SyncSampleBox

In a particular embodiment, the SyncSampleBox is extended with additional fields directly providing, for all or for a subset of the sync samples declared in the SyncSampleBox, the additional data (e.g. non-VCL NAL units) required for random access, as a payload with its length. The SyncSampleBox is then updated as follows, using for example a new version (e.g. with the value 1) of the box. Alternatively, this extension may be signalled by using a specific value of the flags parameter (not shown below):

aligned(8) class SyncSampleBox extends FullBox('stss', version, flags=0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        if (version == 1) {
            unsigned int(16) length;
            unsigned int(8) inline_data[length];
        }
    }
}

where length is the number of bytes of the inline data following this field. The value of length must be greater than zero (the value 0 is reserved); and inline_data corresponds to the additional data (e.g. the additional non-VCL NAL units in the case of a VVC bit-stream) required for random access. A player seeking or starting playout from a given sync sample in the SyncSampleBox has to append the provided inline_data to the bit-stream, for example at the start of the NAL units of this given sync sample.
It is noted that, at encapsulation time, when setting a sync sample (e.g. step 315 in Figure 3), the encapsulation module may append, for a NAL unit-based video compression scheme, as inline_data in the new version of the SyncSampleBox, the identified non-VCL NAL units required for random access, each prefixed by its length, so that inline_data can be directly pasted within a NAL unit sample. According to this embodiment, for signalling of various types of random access points, the new version of the sync sample table should be used in a VVC track unless all samples are sync samples.
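The length-prefixing done by the encapsulation module and the prepend done by a seeking player can be sketched as follows; the function names are hypothetical, and the 4-byte NAL unit length field is an assumption that must match the length field size configured for the track:

```python
def build_inline_data(nal_units, length_size=4):
    """Length-prefix each additional NAL unit, as the encapsulation
    module would when filling inline_data, so the resulting bytes can
    be pasted directly at the start of a length-prefixed NALU sample.
    length_size is an assumption (must match the track configuration)."""
    out = b''
    for nalu in nal_units:
        out += len(nalu).to_bytes(length_size, 'big') + nalu
    return out

def sample_for_seek(inline_data, sample_data):
    """On seek or random access, prepend inline_data to the sync
    sample's data; during normal playout inline_data is ignored."""
    return inline_data + sample_data
```

This keeps the sample data untouched for normal playout while making the seek path a simple byte concatenation.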
Alternative to the generic extension for SyncSampleBox

An alternative to the extension of the SyncSampleBox may be to extend the SampleDependencyTypeBox 'sdtp' in a similar way as the SyncSampleBox is extended in the previous variant: extend it in a new version directly providing the additional data on which a sample may depend, as inline_data with its length. An extension may also consist in using the reserved value 3 of the sample_depends_on parameter to indicate that the sample does not depend on other pictures but depends on more general information like coding configurations or parameter sets. The presence of the inline_data may be conditioned on the sample_depends_on parameter being set to this value 3. As for the SyncSampleBox extension, the inline_data may directly provide the additional data to be used for random access onto a given random access or sync sample.
aligned(8) class SampleDependencyTypeBox extends FullBox('sdtp', version, 0) {
    for (i=0; i < sample_count; i++) {
        unsigned int(2) is_leading;
        unsigned int(2) sample_depends_on;
        unsigned int(2) sample_is_depended_on;
        unsigned int(2) sample_has_redundancy;
        if (version > 0) {
            if (sample_depends_on == 3) {
                unsigned int(16) length;
                unsigned int(8) inline_data[length];
            }
        }
    }
}

Live stream or fragmented file: specific 'aran' box

When encoding and packaging live streams, encapsulating the file into movie fragments is convenient to reduce latency. Using movie fragments reduces the set of encapsulation tools and ISOBMFF boxes, especially for the sample description. One possibility for fragmented files is to use the 'aran' box as defined above within a 'traf' box. The variants on arrays for APS NAL units only or for all non-VCL NAL units, also combined with past-only or past plus current, apply here.
A variant of the use of the 'aran' box within a 'traf' box consists, for track fragments, in assuming that only the first sample of the fragment is a sync sample (e.g. when the first-sample-flags-present flag is set in the TrackFragmentHeader box).

Box Type: 'aran' (or any other reserved four-character code not already in use)
Container: SampleTableBox or TrackFragmentBox
Mandatory: No
Quantity: Zero or one (per file or per track fragment)

aligned(8) class AdditionalRandomAccessNALUBox extends FullBox('aran', version, flags) {
    unsigned int(8) numArrays;
    for (int j=0; j < numArrays; j++) {
        unsigned int(8) NAL_unit_type;
        unsigned int(8) numNalus;
        for (int i=0; i < numNalus; i++) {
            unsigned int(16) nalUnitLength;
            bit(8*nalUnitLength) nalUnit;
        }
    }
}

This variant avoids the presence of the sample_number parameter, relying for example on the first-sample-flags-present flag of the TrackFragmentHeader box. Alternatively, a different name may be used for this box (to distinguish it from an 'aran' box that would only be used in SampleTableBox). Alternatively, a specific value for the flags parameter of the box may be used, in which case this specific value for flags controls the absence/presence of the sample_number parameter. For example, the value 0x000001, denoted first_sample_only_present, indicates that only one contiguous set of additional NAL units is described within the box and this set of additional NAL units applies to the first sample of a run of samples.
When this flags value is not set, then there may be several additional NAL units for some samples in a run of samples:

aligned(8) class AdditionalRandomAccessNALUBox extends FullBox('aran', version, flags) {
    if (flags & 0x000001 == 1) { // for the first sample
        unsigned int(8) numArrays;
        for (j=0; j < numArrays; j++) {
            unsigned int(8) NAL_unit_type;
            unsigned int(8) numNalus;
            for (i=0; i < numNalus; i++) {
                unsigned int(16) nalUnitLength;
                bit(8*nalUnitLength) nalUnit;
            }
        }
    } else { // for some given samples
        unsigned int(32) entry_count;
        for (int i=0; i < entry_count; i++) {
            unsigned int(32) sample_number;
            unsigned int(8) numArrays;
            for (j=0; j < numArrays; j++) {
                unsigned int(8) NAL_unit_type;
                unsigned int(8) numNalus;
                for (i=0; i < numNalus; i++) {
                    unsigned int(16) nalUnitLength;
                    bit(8*nalUnitLength) nalUnit;
                }
            }
        }
    }
}

The box may be further controlled by using a flag value indicating whether a single NAL unit type is present or not, for example using the flags value 0x000002. When this value is set, the box does not contain the loop on arrays because there is only one NAL unit type present. Only the loop on numNalus may remain. The NAL unit type is provided as a first parameter of the box.
aligned(8) class AdditionalRandomAccessNALUBox extends FullBox('aran', version, flags) {
    if (flags & 0x000001 == 1) { // for the first sample
        if (flags & 0x000002 == 1) { // only one NALU type
            unsigned int(8) NAL_unit_type;
            unsigned int(8) numNalus;
            for (i=0; i < numNalus; i++) {
                unsigned int(16) nalUnitLength;
                bit(8*nalUnitLength) nalUnit;
            }
        } else { // several NALU types
            unsigned int(8) numArrays;
            for (j=0; j < numArrays; j++) {
                unsigned int(8) NAL_unit_type;
                unsigned int(8) numNalus;
                for (i=0; i < numNalus; i++) {
                    unsigned int(16) nalUnitLength;
                    bit(8*nalUnitLength) nalUnit;
                }
            }
        }
    } else { // for some given samples
        unsigned int(32) entry_count;
        for (int i=0; i < entry_count; i++) {
            unsigned int(32) sample_number;
            if (flags & 0x000002 == 1) { // only one NALU type
                numArrays = 1;
            } else {
                unsigned int(8) numArrays;
            }
            for (j=0; j < numArrays; j++) {
                unsigned int(8) NAL_unit_type;
                unsigned int(8) numNalus;
                for (i=0; i < numNalus; i++) {
                    unsigned int(16) nalUnitLength;
                    bit(8*nalUnitLength) nalUnit;
                }
            }
        }
    }
}

The variants on arrays for APS NAL units only or for all non-VCL NAL units, also combined with past-only or past plus current, as described herein before, apply here.
In another variant, the additional non-VCL NAL units required only for random access are provided as a set of bytes with their length, as shown below, without indication of NAL unit-based data. The box may then be renamed AdditionalRandomAccessInformationBox, because it is no longer NAL unit specific. However, given the type of a track where this box and additional data are used, the encapsulation module takes care of providing data so that they can be merged at parsing time within a sample's data without breaking the samples. For example, in the case of NALU samples and of additional NAL units from a VVC bit-stream provided as inline_data, the concatenated inline_data must be a concatenation of NALU length followed by the additional NAL unit itself (header and payload). The length parameter indicates the number of bytes for all these concatenated additional NAL units required for random access. A parser de-encapsulating a movie fragment, when it encounters such a box during normal playout of the movie fragment, may ignore these inline_data. When seeking to this fragment or starting playout from this fragment, a parser prepends (e.g. step 655 in Figure 6b) the inline_data to the NALU sample corresponding to the first sample of the movie fragment.
Box Type: 'arai' (or any other reserved four-character code not already in use)
Container: TrackFragmentBox
Mandatory: No
Quantity: Zero or one

aligned(8) class AdditionalRandomAccessInformationBox extends FullBox('arai', version=0, flags=0) {
    unsigned int(16) length;
    unsigned int(8) inline_data[length];
}

Encapsulating additional non-VCL NALUs for random access within a data part of a file

Several alternatives are described for these embodiments according to which the additional NAL units that are required for random access are stored in the data part of the encapsulated file. These additional NAL units may be stored as a contiguous set of NAL units at the beginning of a sample. They may also be stored as a contiguous set of NAL units after one or more non-VCL NAL units of a sample. They may also be stored as non-contiguous NAL units within a sample, preferably before the first VCL NAL unit. In any case, the encapsulation module takes care of the NAL unit order specified by the video codec specification so that the bit-stream extracted by any ISOBMFF and ISO/IEC 14496-15 compliant parser is a conforming bit-stream.
Specific NALU type

In one embodiment, the structure to append the additional non-VCL NAL units required only for random access is a specific NAL unit. It is identified with a reserved NAL unit type value that is not used by the video codec specification. For example, when using VVC, a NAL unit type value in the range of unspecified non-VCL NAL unit types may be used. This particular NAL unit type, when set by an encapsulation module, warns a parser that this is a specific NAL unit. According to this embodiment, this NAL unit is dedicated to containing an additional NALU that is present in a sample only for random access onto, or for seeking to, this sample. A parser may decide to skip this specific NAL unit during normal playout. This specific NAL unit is special because its NAL unit type indicates that it is a NAL unit container. This NAL unit simply packages the whole original NAL unit into an optional NAL unit that can be skipped or appended depending on the usage, here a random access NAL unit.
The NAL unit header follows the syntax from the VVC specification. The nal_unit_type is a reserved one. The NAL unit payload simply consists in the whole original NAL unit (header + payload). Then, when a parser decides to process such a specific NAL unit, it first removes the NAL unit header with the specific NAL unit type and then processes the original NAL unit, for example an APS NAL unit required for random access. The NAL unit length preceding this specific NAL unit consists in the total length of the specific NAL unit (i.e. the length of its NAL unit header plus the length of the whole original NAL unit). This specific NAL unit type should not go into a video bit-stream. This means that, when ignored or skipped, it must be removed by a parser before feeding the NAL units of the sample to a video decoder. When processed, the NAL unit header with the specific NAL unit type must be removed and only the whole original NAL unit must be passed to the video decoder.
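The unwrap-or-skip behaviour described above can be sketched in Python as follows; the function name and the reserved container type value are assumptions, while the extraction of nal_unit_type from bits 3..7 of the second header byte follows the VVC NAL unit header layout:

```python
def process_container_nalu(nalu, container_type, random_access):
    """Handle one length-delimited NAL unit (header + payload).

    If its type equals the reserved container_type: during normal
    playout (random_access=False) return None, i.e. the unit is
    dropped and never forwarded to the decoder; on random access,
    strip the 2-byte wrapper header and return the whole original
    NAL unit it carries. Any other NAL unit passes through untouched.
    """
    nal_unit_type = (nalu[1] >> 3) & 0x1F  # VVC: 5 bits of byte 1
    if nal_unit_type != container_type:
        return nalu                 # regular NAL unit, keep as-is
    if not random_access:
        return None                 # skipped entirely in normal playout
    return nalu[2:]                 # drop wrapper header, keep original NALU
```

The caller must filter out None results before rebuilding the sample's NAL unit stream for the decoder.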
Specific Aggregated NALU

Figure 8 illustrates an example of a specific NALU-like structure 800 in the data part of a VVC track 805, aggregating the additional non-VCL NAL units required only for random access 810. This track has a specific sample entry type (e.g. 'vvi2' or 'war') indicating that it is self-contained and suitable for random access because it contains and allows identification of the additional non-VCL NAL units required for random access. In a particular embodiment, the additional non-VCL NAL units required only for random access are stored as NAL units in a specific structure in the data part of the file. A specific NAL unit type, not used by the video codec specification, is used for declaring, at encapsulation time, and identifying, at parsing time, an optional set of NAL units, that can be skipped or appended, that are required only for a seeking or random access operation into the media file. An example of such a specific NAL unit structure is described in Figure 8 with reference 800.
According to the illustrated example, a first parameter, skippableSize, provides the size of the skippable NAL unit. In addition to this parameter, the structure contains a NAL unit header, compliant with the video codec specification in use, except for the NAL unit type that is a specific value, not in use in the video codec specification.
This header is followed by the NAL unit payload consisting in a parameter that provides the number of bytes following this aggregation that should be considered as skippable in case this specific NAL unit is referenced by an extractor with data_length equal to zero. Additionally, there is a loop on the aggregated additional NAL units within the structure until the size of the whole structure is reached.
This specific NAL unit type should not go into a video bit-stream. This means that, when ignored or skipped, it must be completely removed by a parser before feeding the NAL units of the sample to a video decoder. When processed, the NAL unit header with the specific NAL unit type must be removed and only the aggregated NAL units must be passed to the video decoder.
Including additional NALUs required for random access by reference

Figure 9 illustrates an example of encapsulation of a video bit-stream into media segments, for example into media segments 900-1 to 900-3, containing several media fragments, in particular media fragments 905-1 and 905-2.
The encapsulation module, for example encapsulation module 205 in Figure 2, may provide one or more additional non-VCL NAL units, for example the additional non-VCL NAL unit 910, that are required for random access, only for the first sample of the segment. For other random access points in the segment, for example the first sample of a fragment 905-2 within the segment 900-2, one or more references to the additional NAL units required for random access are included as one or more specific NAL units, instead of being duplicated. For example, reference 920 to the additional NAL unit 910 is included as specific NAL unit 915 in the second fragment 905-2 of the segment 900-2. The specific NAL unit 915 may consist in a simple copy or in a rewriting instruction of the referenced NAL unit 910.
Such a specific NAL unit may be an Extractor NAL unit. It has a specific NAL unit type that does not conflict with the other types of NAL units in the video codec specification. It may simply consist in a copy of one or more NAL units, or it may be composed of an inline constructor plus one sample constructor. Using an inline constructor plus a sample constructor may be useful if a modification of the referenced NAL units is required when copying them to the sample, for example editing the NAL unit type of a referenced NAL unit (e.g. changing a suffix APS into a prefix one). For the sake of illustration, specific NAL unit 915 may be a new Extractor, for example signalled with a specific type of NAL unit that is different from a classical Extractor, to further indicate, in addition to the extraction operation, that it may be skipped. This means that it should be resolved by players or parsers only for a given playout mode, for example on seeking or on random access to a sync sample. In normal playout, it may be skipped or ignored by a parser. In any case, this specific NAL unit should not appear in the video bit-stream. It must be ignored and then removed from the bit-stream, or resolved by applying the copy or extraction operations into NAL units compliant with the video codec specification.
Either the Extractor NAL unit type indicates that it is a skippable, optional, or additional Extractor to be processed for random access, or it may be a classical Extractor but with a specific constructor type indicating that it is an optional or additional constructor to be applied only for random access. To signal this, an unused constructor type value is reserved for the signalling of such a constructor.
For example, the Extractor structure may be updated as follows:

class aligned(8) Extractor() {
    NALUnitHeader();
    do {
        unsigned int(8) constructor_type;
        if( constructor_type == 0 )
            SampleConstructor();
        else if( constructor_type == 2 )
            InlineConstructor();
        else if( constructor_type == 3 )
            SampleConstructorFromTrackGroup();
        else if( constructor_type == 4 )
            ReferenceConstructor();
        else if( constructor_type == 5 )
            DefaultReferenceConstructor();
        else if( constructor_type == 6 )
            NALUStartInlineConstructor();
        else if( constructor_type == 7 )
            OptionalConstructor();
    } while( !EndOfNALUnit() )
}

where the OptionalConstructor may have the same parameters as a SampleConstructor except that it is optional to process.
In a variant, the optional constructor has a simplified description. For example, assuming that it extracts non-VCL NAL units from the current track containing the Extractor NAL unit, the track_ref_index parameter is not indicated. The sample_offset is present, for example referencing a previous sync sample in a movie fragment or in a segment. The data_offset and data_length parameters are also present to indicate which byte range to copy, for example from a previous sync sample.
In another variant, the optional constructor may be a constructor with a copy mode set to the value 1 to copy full NAL units, or to the value 2 to copy only the NAL unit payload in case the NAL unit header needs to be rewritten (for example to change the NAL unit type from SUFFIX_APS to PREFIX_APS or from SUFFIX_SEI to PREFIX_SEI).
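How a parser might resolve such an optional constructor can be sketched as follows; the function name and the dict-based constructor representation are hypothetical, while sample_offset, data_offset and data_length follow the description above:

```python
def resolve_optional_constructor(ctor, samples, current_index, seeking):
    """Resolve one optional constructor for the current sample.

    ctor: hypothetical dict {'sample_offset': int (negative for a past
    sample), 'data_offset': int, 'data_length': int}.
    samples: list of sample byte strings in decoding order.
    During normal playout (seeking=False) the constructor is skipped
    and contributes no bytes; on seeking or random access it copies
    the referenced byte range from the earlier sample.
    """
    if not seeking:
        return b''
    ref_sample = samples[current_index + ctor['sample_offset']]
    start = ctor['data_offset']
    return ref_sample[start:start + ctor['data_length']]
```

The copied bytes would then be substituted for the Extractor NAL unit when rebuilding the bit-stream.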
Encapsulating additional non-VCL NALUs for random access within both a metadata and a data part of a file

Extended SyncSampleBox to refer to additional NALUs

In a particular embodiment, a description of the additional non-VCL NAL units required for random access is provided in the metadata part of the file, while these NAL units are stored in the media data part. The description may be provided in an extension of the SyncSampleBox. The SyncSampleBox still contains the list of indexes for samples that are sync samples. While the list of sample indexes is generic (agnostic to the video compression format), the extension proposed for the SyncSampleBox is specific to the video compression format. The SyncSampleBox then rewrites into (extensions or new parameters appear in bold):

aligned(8) class SyncSampleBox extends FullBox('stss', version, 0) {
    unsigned int(32) entry_count;
    int i;
    for (i=0; i < entry_count; i++) {
        unsigned int(32) sample_number;
        if (version > 0) {
            unsigned int(32) codec_specific_parameters;
        }
    }
}

where codec_specific_parameters is defined for VVC as a parameter called, for example, required_nalu_info, or as a combination of this required_nalu_info parameter with a reserved_bits parameter for extensibility. The parameter required_nalu_info (or any other name) provides means to identify, within a given sync sample (corresponding to the sample_number-th sample), the NAL units that are required only for random access. It consists in a 1-based index indicating, in the list of NAL units for a given sync sample, those that are required only for random access. It may be coded on the same number of bits as the number of bits used for codec_specific_parameters, or, when combined with reserved_bits, on a total number of bits for both parameters that should be equal to the number of bits used for codec_specific_parameters.
The required_nalu_info information is set by the encapsulation module when copying the additional non-VCL NAL units required for random access (e.g. step 320 in Figure 3 or step 440 in Figure 4). It is useful for a player or a reader to skip these NAL units during normal playout mode (e.g. step 655 in Figure 6b).
In a variant, the codec_specific_parameters parameter consists in a list of indexes of additional NAL units required only for random access, still described in the extended SyncSampleBox but defined here as an offset (or start_offset) and a length, expressed in number of NAL units. The offset is a 1-based index indicating the index of the first NAL unit required only for random access, and the length provides the number of additional NAL units. This variant is suitable when all the additional NAL units required only for random access are contiguous and is defined as follows:

unsigned int(8) ra_nalu_start_offset;
unsigned int(8) ra_nalu_length;
unsigned int(16) reserved_bits;

In this example, the offset and length for the indexes of the additional NAL units required only for random access are each coded on 8 bits (leaving room for other information in the codec_specific_parameters). This is because the non-VCL NAL units are usually placed before the VCL ones in the NALU sample and there are not so many additional NAL units required for random access. In the case of a video compression format, profile, level, or complex bit-stream requiring more than 256 additional NAL units, the length field (ra_nalu_length) or the offset field (ra_nalu_start_offset) may of course be coded on 16 bits (or any combination of numbers of bits between offset, length, and reserved bits (set to 0) that is no greater than 32 bits or no greater than the number of bits in use for codec_specific_parameters).
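The ra_nalu_start_offset / ra_nalu_length semantics can be illustrated with the following sketch of a player filtering a sync sample's NAL unit list; the function name and list representation are hypothetical:

```python
def filter_nalus_for_playout(nalus, ra_start_offset, ra_length, seeking):
    """Apply the offset/length description to a sync sample's NAL units.

    ra_start_offset is a 1-based index of the first NAL unit required
    only for random access, ra_length the count of such contiguous NAL
    units: keep them when seeking, drop them during normal playout.
    """
    if seeking:
        return list(nalus)
    start = ra_start_offset - 1          # convert 1-based index
    return nalus[:start] + nalus[start + ra_length:]
```

This is the skip step a reader would perform during normal playout mode (e.g. step 655 in Figure 6b).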
Extended SampleDependencyTypeBox 'sdtp' to refer to additional NALUs

An alternative to the extension of the SyncSampleBox may be to extend the SampleDependencyTypeBox 'sdtp' in a similar way as the SyncSampleBox is extended in the previous variant: extend it with a codec_specific_parameters field providing more information, for example on the dependent samples. An extension may also consist in using the reserved value 3 of the sample_depends_on parameter to indicate that the sample does not depend on other pictures but depends on more general information like coding configurations or parameter sets. The presence of the codec_specific_parameters may be conditioned on the sample_depends_on parameter being set to this value 3. As for the SyncSampleBox extension, the codec_specific_parameters may provide an indication on the additional non-VCL NAL units that are required only for random access.
aligned(8) class SampleDependencyTypeBox extends FullBox('sdtp', version, 0) {
  for (i=0; i < sample_count; i++) {
    unsigned int(2) is_leading;
    unsigned int(2) sample_depends_on;
    unsigned int(2) sample_is_depended_on;
    unsigned int(2) sample_has_redundancy;
    if (version > 0)
      if (sample_depends_on == 3)
        unsigned int(32) codec_specific_parameters;
  }
}
where codec_specific_parameters has the same semantics as in the extended SyncSampleBox.
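A parser for such extended 'sdtp' entries may be sketched as follows (an illustrative Python sketch; it assumes each entry is one byte with the four 2-bit fields packed most significant first, followed by a big-endian 32-bit codec_specific_parameters when present):

```python
import struct

def parse_sdtp_entries(payload: bytes, sample_count: int, version: int):
    """Parse extended 'sdtp' entries: one byte holding is_leading,
    sample_depends_on, sample_is_depended_on and sample_has_redundancy
    (2 bits each, most significant first), optionally followed by a
    32-bit codec_specific_parameters when version > 0 and
    sample_depends_on equals the reserved value 3."""
    entries, pos = [], 0
    for _ in range(sample_count):
        b = payload[pos]; pos += 1
        entry = {
            "is_leading": (b >> 6) & 3,
            "sample_depends_on": (b >> 4) & 3,
            "sample_is_depended_on": (b >> 2) & 3,
            "sample_has_redundancy": b & 3,
        }
        if version > 0 and entry["sample_depends_on"] == 3:
            # Sample depends on high-level information (e.g. parameter sets):
            # read the extra codec_specific_parameters word.
            (entry["codec_specific_parameters"],) = struct.unpack_from(">I", payload, pos)
            pos += 4
        entries.append(entry)
    return entries
```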
Use SubSampleInformation box
ISOBMFF provides a box for the description of subsample information ('subs') in an encapsulated file or even in an encapsulated fragmented file (i.e. an ISOBMFF file containing movie fragments). A subsample, or sub-sample, is a contiguous range of bytes of a sample. Depending on the flags value of the SubSampleInformationBox and depending on the video codec format, the subsample can be defined on a NAL unit basis, a slice basis, a tile basis, etc. To indicate which NAL units within a sample correspond to additional NAL units only required for random access, the subsample information box may be used as follows:
- the parameter entry_count is set to the number of sync samples when the 'subs' box is only used to describe additional NAL units required for random access.
- the sample_delta parameter indicates the distance in samples between two sync samples, or between two samples for which information at subsample level is available.
- the semantics of the subsample_count and subsample_size parameters are unchanged.
For example, the subsample_count may correspond to the number of additional NAL units required for random access for a given sample. When the additional NAL units are copied in the data by the encapsulation module as a contiguous byte range at the beginning of a sample, the subsample_count may be set to 1. When the additional NAL units are copied in the data by the encapsulation module as a contiguous byte range within a sample, subsample_count may be set to at least two: the first range corresponding to the range of NAL units preceding the contiguous range of additional NAL units (described as the second subsample). When the additional NAL units are not copied into the data as one contiguous byte range, the value of subsample_count may be set at least equal to the number of ranges corresponding to additional NAL units only required for random access plus the number of ranges corresponding to regular (always required, or from the original PU in the bit-stream corresponding to the current sample) NAL units that are placed between two ranges of additional NALUs.
- the subsample_priority is set to a low value for a subsample corresponding to additional NAL units required only for random access, indicating that the subsample does not have the highest priority for normal playout, because it is required only for random access or seeking operations. As such, the discardable flag is set to true, meaning that the subsample is not required to decode the current sample (at least for normal playout).
- the codec_specific_parameters may be set as follows: the flags value of the 'subs' box may be set to 0 to indicate NAL-unit based subsamples. The 32 bits dedicated to the codec_specific_parameters may then be used as follows: the bits already used for VVC subsample description are kept untouched and some reserved bits are used for the specific signaling of additional NAL units required only for random access.
The codec_specific_parameters field may be defined as follows:
if (flags == 0) { // NAL-unit based subsample
  unsigned int(1) VclNalUnitFlag;
  unsigned int(1) RapNalUnitFlag;
  unsigned int(1) GraNalUnitFlag;
  bit(28) reserved = 0;
  unsigned int(1) RaAdditionalNalUnitFlag;
} else {
  ... unchanged ...
}
where RaAdditionalNalUnitFlag equal to 0 indicates that all NAL units in the sub-sample are non-VCL NAL units that are required for normal playout. Value 1 indicates that all NAL units in the sub-sample are non-VCL NAL units that are only required for random access and that may be ignored or skipped during normal playout of the related video track.
From this codec_specific_parameters, players can rapidly check whether to further inspect the subsample information or not. For example, when discardable is set to true and codec_specific_parameters has the value 1, this indicates that the subsample may be skipped by a parser (e.g. step 655 in Figure 6b), because it corresponds to additional non-VCL NALUs only required for random access. This may be further confirmed by a sample entry type indicating a track containing additional non-VCL NAL units that are only required for random access (e.g. 'vvi1', 'vvi2', or 'vvcr'). As another example, when the codec_specific_parameters has the value 0, the subsample corresponds to non-VCL NAL units that may be required for decoding.
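The player-side check described above may be sketched as follows (an illustrative Python sketch; it assumes the first declared field of the codec_specific_parameters maps to the most significant bit, so that RaAdditionalNalUnitFlag, declared last, is the least significant bit):

```python
def is_ra_only_subsample(codec_specific_parameters: int, discardable: int) -> bool:
    """Return True when a NAL-unit based subsample holds non-VCL NAL units
    needed only for random access, i.e. the player may skip it during
    normal playout. RaAdditionalNalUnitFlag is assumed to be the least
    significant bit of the 32-bit codec_specific_parameters."""
    ra_additional = codec_specific_parameters & 1
    return bool(discardable) and ra_additional == 1
```

A parser performing normal playout would drop subsamples for which this returns True, and keep them only when seeking or performing random access.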
The use of the SubSampleInformation box may be useful when the encapsulation module (encapsulation module 205 in Figure 2) copies an additional non-VCL NAL unit required for random access not necessarily within a sync sample but in the sample that is the first to use this additional non-VCL NAL unit. Then, the number of entries in the 'subs' box may be greater than the number of sync samples, also considering samples located between two sync samples. Such copying may lead to a smoother bitrate in a group of images, avoiding a peak rate on the sync sample for the IRAP.
The extended use of the SubSampleInformation box would apply as well in the compact version of the SubSampleInformationBox (e.g. 'subs' box with version = 2).
Use flags from movie fragment boxes
The ISOBMFF tools for movie fragment description provide means to indicate, on a sample basis, whether a given sample has dependencies or not and whether it is a sync sample or not. For example, the track fragment header box, the TrackExtendsBox 'trex', or the TrackRunBox 'trun' define values in their flags parameter or fields in their definition for this purpose. Among these boxes, none provides description at NAL unit level of some "disposable", "skippable", or "additional" NAL units required only in some situations, like seeking or random access for example.
An embodiment to provide a self-contained single track providing random access in a fragmented file consists in extending some boxes for the movie fragment description to support NAL unit-level description of additional information, for example using some reserved bits in some flags values.
In a particular embodiment, for encapsulation of a fragmented file, the sample flags are extended as follows (in bold): an additional parameter is provided in the sample flags as a one-bit flag (note that this additional parameter may be placed at any position in the list after the reserved bits and preferably before the last 16-bit parameter, for parsing convenience):
bit(3) reserved = 0;
unsigned int(2) is_leading;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_non_sync_sample; // (replaces stss)
unsigned int(1) sample_has_additional_dependency;
unsigned int(16) sample_degradation_priority;
where the new parameter sample_has_additional_dependency indicates whether a given sample has dependencies to previous samples that are not image coded data like VCL NAL units but rather dependencies to high-level information like parameter sets or bit-stream configuration, for example non-VCL NAL units. In a variant, using another bit from the bits available in the reserved parameters, the sample flags contain another parameter, for example sample_contains_dependency_data, that indicates whether these additional dependencies are provided within the sample or should be found by parsing other information from the sample or track description (for example a track reference to a non-VCL track).
bit(2) reserved = 0;
unsigned int(2) is_leading;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_non_sync_sample;
unsigned int(1) sample_has_additional_dependency;
unsigned int(1) sample_contains_dependency_data;
unsigned int(16) sample_degradation_priority;
This additional parameter may be useful to handle random access for both single-track and multi-track encapsulated files.
The use of these two additional parameters, combined with the sample_is_non_sync_sample parameter, allows description of the following configurations:
- a single track contains random access or sync samples having dependencies to non-VCL NAL units in previous samples and these non-VCL NAL units are available in the data for this sample. This is indicated by setting the following values: sample_is_non_sync_sample=0, sample_has_additional_dependency=1, and sample_contains_dependency_data=1.
- a single track or video track in a multi-track encapsulation contains random access or sync samples having dependencies to non-VCL NAL units in previous samples but these non-VCL NAL units are not available in the data for this sample. This is indicated by setting the following values: sample_is_non_sync_sample=0, sample_has_additional_dependency=1, and sample_contains_dependency_data=0.
- a single track or video track in a multi-track encapsulation contains samples that are not necessarily sync or random access samples, having dependencies to non-VCL NAL units in previous samples before the last sync sample, but these non-VCL NAL units are not available in the data for this sample. This is indicated by setting the following values: sample_is_non_sync_sample=1, sample_has_additional_dependency=1, and sample_contains_dependency_data=0.
- a single track encapsulation contains samples that are not necessarily sync or random access samples, having dependencies to non-VCL NAL units in previous samples before the last sync sample, and these non-VCL NAL units are available in the data for this sample. This is indicated by setting the following values: sample_is_non_sync_sample=1, sample_has_additional_dependency=1, and sample_contains_dependency_data=1.
- a single track or video track in a multi-track encapsulation contains samples that are not necessarily sync or random access samples and that have no dependencies to non-VCL NAL units in previous samples before the last sync sample. This is indicated by setting the following values: sample_is_non_sync_sample=1, sample_has_additional_dependency=0, and sample_contains_dependency_data=0. This configuration may be the default case. As such, the default_sample_flags, in the TrackExtendsBox or in the TrackFragmentHeaderBox, may be initialized to this set of values. This allows explicit signaling of additional dependencies only on the samples that need it, for example in the 'trun' box. This 'trun' box may have its first-sample-flags-present flag not set.
- the configurations with sample_has_additional_dependency=0 and sample_contains_dependency_data=1 are not allowed; the configuration with sample_is_non_sync_sample=0, sample_has_additional_dependency=0, and sample_contains_dependency_data=0 indicates a real random access or synchronization sample having no dependencies of any kind to information from the bit-stream for previous picture units.
When the parameter sample_contains_dependency_data is set to 1, in the case of a VVC bit-stream, this means that the additional non-VCL NAL units are contained within the samples. For example, APS NAL units needed for the decoding of some samples in a current movie fragment that would come from a previous movie fragment are duplicated in the current movie fragment. When the parameter sample_contains_dependency_data is set to 0, in the case of a self-contained track, this means that the dependency data (e.g. additional non-VCL NAL units, like for example APS NAL units) are provided in some box in the metadata part. These dependency data may be provided in the 'aran' or the 'subs' box within the track fragment or directly inline in a new version of the 'trun' box. When the parameter sample_contains_dependency_data is set to 0 and the track is not a self-contained track (e.g. as indicated by a specific sample entry type like 'vvi2', 'vvcr', or an updated 'vvi1'), the non-self-contained track may have a track reference to one or more other tracks providing non-VCL NAL units. A parser must inspect these one or more other tracks to get the appropriate non-VCL NAL units required to decode the sample.
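The extended 32-bit sample flags sketched above may be decoded as follows (an illustrative Python sketch; the MSB-first field order and the positions of the two new one-bit fields, just above sample_degradation_priority, follow the second layout given above):

```python
def decode_sample_flags(flags: int) -> dict:
    """Decode the extended 32-bit sample flags: 2 reserved bits, then
    is_leading(2), sample_depends_on(2), sample_is_depended_on(2),
    sample_has_redundancy(2), sample_padding_value(3),
    sample_is_non_sync_sample(1), sample_has_additional_dependency(1),
    sample_contains_dependency_data(1), sample_degradation_priority(16),
    MSB first."""
    return {
        "is_leading": (flags >> 28) & 3,
        "sample_depends_on": (flags >> 26) & 3,
        "sample_is_depended_on": (flags >> 24) & 3,
        "sample_has_redundancy": (flags >> 22) & 3,
        "sample_padding_value": (flags >> 19) & 7,
        "sample_is_non_sync_sample": (flags >> 18) & 1,
        "sample_has_additional_dependency": (flags >> 17) & 1,
        "sample_contains_dependency_data": (flags >> 16) & 1,
        "sample_degradation_priority": flags & 0xFFFF,
    }
```

For instance, the first configuration above (a sync sample whose additional non-VCL dependencies are present in the sample data) corresponds to a flags word with only bits 17 and 16 set, i.e. 0x00030000.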
Using (sample group and) NALU mapping
ISO/IEC 14496-12 defines sample groups describing random access or synchronization samples (e.g. 'rap ' or 'sync' sample groups).
Figure 10 illustrates an example of a NAL unit mapping for indicating the additional non-VCL NAL units, denoted 1000. This NAL unit mapping may be used to indicate additional NAL units that are duplicated only for enabling random access. All the samples may be associated to a NAL unit mapping in the sample group box 1010.
Alternatively, only the sync samples may be associated with a NAL unit mapping in the sample group box 1010. The NAL unit mapping is stored in a SampleGroupDescriptionBox with grouping_type equal to 'nalm' (not represented) that contains as entries NAL unit mapping structures such as 1000. The group_description_index parameter in 1010 corresponds to an entry in this SampleGroupDescriptionBox with grouping_type equal to 'nalm'. This sample grouping and NAL unit mapping may be used in fragmented files (e.g. the track fragment flags may provide an indication that some samples (e.g. only the first one) contain additional data; this sample grouping and NALU mapping may provide the indication of which NAL units within a given sample in a track fragment are actually additional, optional, or skippable ones).
As illustrated, the VVC track denoted 1005 contains a 'sbgp' box 1010 of the type 'nalm' with a new grouping_type_parameter 'aran', or any four-character code reserved for describing additional random access NAL units. The NAL units mapped onto this new grouping_type_parameter are an indication for parsers that these NAL units are to be processed only for random access to, or seeking in, the encapsulated media file. In other words, players may ignore or skip these NAL units during normal playout. A NAL unit being mapped to groupID 0 (in 1000) by a NALUMapEntry implies that the NAL unit is required for decoding (by doing so, only the additional NAL units are mapped to a description in the SampleGroupDescriptionBox 1020). The corresponding sample group description box 1020 with grouping_type set to 'aran' (or the specific four-character code used in the grouping_type_parameter of the 'nalm' sample group 1010) may provide the sample_number of the sync sample requiring these additional NAL units. The VisualSampleGroupEntry of type 'aran' 1020-1 or 1020-2 in the 'sgpd' box 1020 may have no specific parameters (only those required by parent structures), just indicating that a NAL unit mapped onto this entry in the 'sgpd' is an optional, skippable, or additional NAL unit only required for random access.
By parsing this NAL unit mapping, parsers have an indication of whether a given NAL unit is always to be processed or should be processed only in case of random access. Moreover, a parser may keep only the mandatory NAL units to provide a simpler bit-stream to video decoders.
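The filtering step described above may be sketched as follows (an illustrative Python sketch; the groupID convention follows the mapping described above: groupID 0 means required for decoding, a non-zero groupID means mapped to an 'aran' description, i.e. required only for random access):

```python
def filter_nal_units(nal_units, group_ids, random_access: bool):
    """Given the per-NAL-unit groupID assignment from a 'nalm'
    NALUMapEntry (groupID 0 = required for decoding, non-zero =
    'aran', required only for random access), return the NAL units
    a parser should forward to the decoder."""
    assert len(nal_units) == len(group_ids)
    if random_access:
        return list(nal_units)  # keep everything, including the 'aran' NALUs
    # Normal playout: skip the additional random access NALUs.
    return [n for n, g in zip(nal_units, group_ids) if g == 0]
```

For example, a sync sample carrying a duplicated APS NAL unit ahead of its VCL NAL units would forward only the VCL NAL units during normal playout, and all NAL units when decoding starts at that sample.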
Specific NALU sample
In one embodiment, where the additional non-VCL NAL units required for random access are appended in the sample itself, the definition of the NALU sample from ISO/IEC 14496-15 is updated with an additional parameter. This new parameter allows indicating within a NALU sample whether it contains optional, skippable, or additional NAL units that may be skipped during normal playout. This parameter is illustrated in bold below:
aligned(8) class NALUSample {
  unsigned int PictureLength = sample_size; // Size of sample from SampleSizeBox (including flag below)
  unsigned int has_skippable_NALUs; // Flag indicating skippable (or additional) NALUs or not
  for (i=0; i<PictureLength-1; ) { // to end of the picture
    unsigned int((DecoderConfigurationRecord.LengthSizeMinusOne+1)*8) NALUnitLength;
    bit(NALUnitLength*8) NALUnit;
    i += (DecoderConfigurationRecord.LengthSizeMinusOne+1) + NALUnitLength;
  }
}
A parser encountering such a sample may process a subset of the sample's NAL units, for example by checking for the presence of a NAL unit mapping with grouping_type_parameter='aran' (as described hereinbefore by reference to Figure 10) or any metadata structure describing or referencing additional non-VCL NAL units only required for random access.
Using Segment Index Box
It is to be recalled that the extension of the segment index box is under consideration in the MPEG File Format group. For example, a new version (=2) of the 'sidx' box may provide a range of bytes sufficient to download a SAP (the part in bold below).
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
  unsigned int(32) reference_ID;
  unsigned int(32) timescale;
  if (version==0) {
    unsigned int(32) earliest_presentation_time;
    unsigned int(32) first_offset;
  } else {
    unsigned int(64) earliest_presentation_time;
    unsigned int(64) first_offset;
  }
  unsigned int(16) reserved = 0;
  unsigned int(16) reference_count;
  for (i=1; i <= reference_count; i++) {
    bit(1) reference_type;
    unsigned int(31) referenced_size;
    unsigned int(32) subsegment_duration;
    bit(1) starts_with_SAP;
    unsigned int(3) SAP_type;
    unsigned int(28) SAP_delta_time;
    if (version>=2)
      if (starts_with_SAP) {
        unsigned int(32) SAP_range;
      }
  }
}
where SAP_range provides a range of bytes, starting from the beginning of the sub-segment, sufficient to download the SAP (if any) associated with this sub-segment.
According to a particular embodiment, the SAP_range provides the range of bytes, from the beginning of the sub-segment, sufficient to download the SAP (if any) associated with this sub-segment, this byte range including the additional NAL units required for random access. Optionally, the byte offset and length are provided in a new version of the 'sidx' box as new ARAN_offset and ARAN_length parameters, as illustrated below.
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
  unsigned int(32) reference_ID;
  unsigned int(32) timescale;
  if (version==0) {
    unsigned int(32) earliest_presentation_time;
    unsigned int(32) first_offset;
  } else {
    unsigned int(64) earliest_presentation_time;
    unsigned int(64) first_offset;
  }
  unsigned int(16) reserved = 0;
  unsigned int(16) reference_count;
  for (i=1; i <= reference_count; i++) {
    bit(1) reference_type;
    unsigned int(31) referenced_size;
    unsigned int(32) subsegment_duration;
    bit(1) starts_with_SAP;
    unsigned int(3) SAP_type;
    unsigned int(28) SAP_delta_time;
    if (version>=2)
      if (starts_with_SAP) {
        unsigned int(32) SAP_range;
        unsigned int(16) ARAN_offset;
        unsigned int(16) ARAN_length;
      }
  }
}
In a variant to the ARAN byte offset and length corresponding to the additional NALUs required for random access, the new version of the 'sidx' box rather contains a flag indicating, when set, that the SAP is a true SAP point, in the sense that the required non-VCL NAL units are also part of the sample. When this flag is not set, it indicates that the additional NAL units have to be obtained elsewhere, for example in a dedicated metadata structure as in a variant of the embodiment in which the additional non-VCL NAL units are stored within a data part of a file, or in another track providing non-VCL NALUs (e.g., referenced in the track reference box of this track).
In a variant, also under consideration by the MPEG File Format group, the new version of the 'sidx' box providing a byte range for the SAP is expressed as follows:
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
  unsigned int(32) reference_ID;
  unsigned int(32) timescale;
  if (version==0) {
    unsigned int(32) earliest_presentation_time;
    unsigned int(32) first_offset;
  } else {
    unsigned int(64) earliest_presentation_time;
    unsigned int(64) first_offset;
  }
  unsigned int(16) reserved = 0;
  unsigned int(16) reference_count;
  for (i=1; i <= reference_count; i++) {
    bit(1) reference_type;
    unsigned int(31) referenced_size;
    unsigned int(32) subsegment_duration;
    bit(1) starts_with_SAP;
    unsigned int(3) SAP_type;
    unsigned int(28) SAP_delta_time;
  }
  if (flags & mask) {
    for (i=1; i <= reference_count; i++)
      if ((reference_type[i] == 0) && (starts_with_SAP[i] == 1))
        unsigned int(32) SAP_end_offset;
  }
}
where SAP_end_offset provides the position of the last byte of the SAP in the current sub-segment.
As in the previous variant, the new version of the 'sidx' box is extended with a byte offset and length (or first and last byte positions) indicating where the additional NAL units required for random access are located. The new 'sidx' box then rewrites as follows:
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
  unsigned int(32) reference_ID;
  unsigned int(32) timescale;
  if (version==0) {
    unsigned int(32) earliest_presentation_time;
    unsigned int(32) first_offset;
  } else {
    unsigned int(64) earliest_presentation_time;
    unsigned int(64) first_offset;
  }
  unsigned int(16) reserved = 0;
  unsigned int(16) reference_count;
  for (i=1; i <= reference_count; i++) {
    bit(1) reference_type;
    unsigned int(31) referenced_size;
    unsigned int(32) subsegment_duration;
    bit(1) starts_with_SAP;
    unsigned int(3) SAP_type;
    unsigned int(28) SAP_delta_time;
  }
  if (flags & mask) {
    for (i=1; i <= reference_count; i++)
      if ((reference_type[i] == 0) && (starts_with_SAP[i] == 1)) {
        unsigned int(32) SAP_end_offset;
        unsigned int(16) ARAN_offset;
        unsigned int(16) ARAN_length;
      }
  }
}
As another variant for this new version of the 'sidx' box, the ARAN_offset and ARAN_length parameters may be replaced by a flag indicating whether the SAP is self-contained (it contains the additional NALUs required for random access) or not.
Whatever the variant used to indicate the presence/absence or byte range of the ARAN, this may be controlled at the box level using reserved values for the flags parameter of the box. For example, a flag value has_self_contained_SAP indicates, when set, that the SAPs of each sub-segment of the segment are self-contained (they all contain in their NALUs the additional NALUs required for random access). This allows use of the ISOBMFF segment indexing wherever the additional NAL units required for random access are stored.
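A client using such an indexed segment may be sketched as follows (an illustrative Python sketch; the function name and byte-range convention are assumptions, with SAP_range counted from the beginning of the sub-segment as defined above):

```python
def random_access_request_range(subsegment_offset, referenced_size,
                                starts_with_sap, sap_range=None):
    """Compute the inclusive byte range (first_byte, last_byte) a client
    could request to start decoding at a sub-segment. When the extended
    'sidx' provides a SAP_range, that range already covers the SAP plus
    the additional NAL units required for random access; otherwise fall
    back to downloading the whole sub-segment (referenced_size bytes)."""
    if starts_with_sap and sap_range:
        return (subsegment_offset, subsegment_offset + sap_range - 1)
    return (subsegment_offset, subsegment_offset + referenced_size - 1)
```

For instance, a sub-segment of 500 bytes at offset 1000 whose SAP_range is 120 only needs the first 120 bytes to be fetched for a seek, rather than the full sub-segment.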
Although the present invention has been described herein above with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a person skilled in the art which lie within the scope of the present invention.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in claims not dependent upon each other does not indicate that a combination of these features cannot be advantageously used.

Claims (20)

1. A method for encapsulating a video bit-stream in a server, the method comprising: obtaining at least one network abstraction layer unit (NAL unit) of a first picture, buffering the at least one obtained NAL unit, obtaining at least one NAL unit of a second picture, the second picture being a random access picture following the first picture, and encapsulating the at least one obtained NAL unit of the second picture and the at least one buffered NAL unit in an encapsulation structure making it possible to generate a video bit-stream without processing the encapsulated at least one buffered NAL unit when processing the encapsulated video bit-stream.
2. The method of claim 1, wherein the at least one obtained NAL unit of the first picture is buffered upon determining that the at least one obtained NAL unit of the first picture is of a specific type.
3. The method of claim 1 or claim 2, wherein the at least one buffered NAL unit is encapsulated with the at least one obtained NAL unit of the second picture upon determining that the at least one obtained NAL unit of the second picture references the at least one buffered NAL unit.
4. The method of any one of claims 1 to, wherein the at least one obtained NAL unit of the second picture and the at least one buffered NAL unit of the first picture are encapsulated into a same self-contained track.
5. The method of claim 4, wherein the at least one obtained NAL unit of the second picture and at least one buffered NAL unit to be encapsulated within the encapsulation structure are encapsulated within a media data part of the track.
6. The method of claim 4 or claim 5, wherein at least one buffered NAL unit to be encapsulated within the encapsulation structure is encapsulated within a metadata part of the track.
7. The method of claim 2 or of any one of claims 3 to 6 depending on claim 2, further comprising determining an identifier of a type of data of an obtained NAL unit of the specific type, an obtained NAL unit being buffered as a function of a type and of an identifier.
8. The method of claim 2 or of any one of claims 3 to 7 depending on claim 2, wherein the specific type is the type of a parameter set NAL unit.
9. The method of any one of claims 1 to 8, further comprising identifying at least one picture as being the at least one random access picture.
10. A method for generating a video bit-stream from an encapsulated video bit-stream, in a client device, the method comprising: receiving an instruction for generating a video bit-stream starting from a selected random access picture, obtaining at least one network abstraction layer unit (NAL unit) of the selected random access picture, the obtained NAL unit being associated with a specific structure comprising at least one additional NAL unit, accessing the specific structure and obtaining the at least one additional NAL unit, and generating a video bit-stream comprising the at least one additional NAL unit and the at least one obtained NAL unit of the selected random access picture, wherein the specific structure is accessed only when the selected random access point is used as an access picture.
11. The method of claim 10, wherein the at least one additional NAL unit is of a specific type, wherein the at least one NAL unit of the selected random access picture and the at least one additional NAL unit are obtained from a same self-contained track, and wherein at least one additional NAL unit of the specific structure is obtained from a metadata part of the track.
12. The method of claim 11, wherein at least one additional NAL unit of the specific structure is obtained from a media data part of the track.
13. A method for generating a video bit-stream from an encapsulated video bit-stream, in a client device, the method comprising: obtaining a set of network abstraction layer units (NAL units) of an encapsulated picture, if the encapsulated picture is a random access picture that is not to be used as an access picture, filtering the obtained set of NAL units to skip at least one additional NAL unit, and generating a video bit-stream as a function of the filtered NAL units, wherein an additional NAL unit is a NAL unit of a picture that has been added to a set of NAL units of a following random access picture.
14. The method of claim 13, wherein the skipped at least one additional NAL unit is of a specific type, wherein at least one NAL unit of the filtered NAL units and at least one skipped additional NAL unit belong to a same self-contained track, and wherein the at least one skipped additional NAL unit belongs to a media data part of the track.
15. The method of any one of claims 2, 11, 12, or 14, or of any one of claims 3 to 9 depending on claim 2, wherein the video bit-stream is of the versatile video codec type, the NAL units being of the video coding layer type or of the non video coding layer type.
16. The method of claim 15, wherein the specific type is a type of network abstraction layer unit corresponding to an adaptation parameter set.
17. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing each of the steps of the method according to any one of claims 1 to 16 when loaded into and executed by the programmable apparatus.
18. A non-transitory computer-readable storage medium storing instructions of a computer program for implementing each of the steps of the method according to any one of claims 1 to 16.
19. A device for transmitting media data, the device comprising a processing unit configured for carrying out each of the steps of the method according to any one of claims 1 to 16.
20. A signal carrying an information dataset for media data, the information dataset comprising encoded media data encapsulated according to the method of any one of claims 1 to 16.
GB2005075.3A 2020-04-06 2020-04-06 Method, device, and computer program for improving random picture access in video streaming Active GB2593897B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
GB2005075.3A GB2593897B (en) 2020-04-06 2020-04-06 Method, device, and computer program for improving random picture access in video streaming
GB2009169.0A GB2595740A (en) 2020-04-06 2020-06-16 Method, device, and computer program for improving random picture access in video streaming
US17/917,168 US20230164371A1 (en) 2020-04-06 2021-04-06 Method, device, and computer program for improving random picture access in video streaming
EP21717402.8A EP4133743A1 (en) 2020-04-06 2021-04-06 Method, device, and computer program for improving random picture access in video streaming
PCT/EP2021/058977 WO2021204827A1 (en) 2020-04-06 2021-04-06 Method, device, and computer program for improving random picture access in video streaming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2005075.3A GB2593897B (en) 2020-04-06 2020-04-06 Method, device, and computer program for improving random picture access in video streaming

Publications (3)

Publication Number Publication Date
GB202005075D0 GB202005075D0 (en) 2020-05-20
GB2593897A true GB2593897A (en) 2021-10-13
GB2593897B GB2593897B (en) 2024-02-14

Family

ID=70768759

Family Applications (2)

Application Number Title Priority Date Filing Date
GB2005075.3A Active GB2593897B (en) 2020-04-06 2020-04-06 Method, device, and computer program for improving random picture access in video streaming
GB2009169.0A Pending GB2595740A (en) 2020-04-06 2020-06-16 Method, device, and computer program for improving random picture access in video streaming

Family Applications After (1)

Application Number Title Priority Date Filing Date
GB2009169.0A Pending GB2595740A (en) 2020-04-06 2020-06-16 Method, device, and computer program for improving random picture access in video streaming

Country Status (4)

Country Link
US (1) US20230164371A1 (en)
EP (1) EP4133743A1 (en)
GB (2) GB2593897B (en)
WO (1) WO2021204827A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220201308A1 (en) * 2020-12-18 2022-06-23 Lg Electronics Inc. Media file processing method and device therefor
GB2608399B (en) * 2021-06-29 2023-12-06 Canon Kk Method, device, and computer program for dynamically encapsulating media content data
WO2023200879A1 (en) * 2022-04-12 2023-10-19 Bytedance, Inc. Support of subsegments based streaming operations in edrap based video streaming
WO2023220000A1 (en) * 2022-05-10 2023-11-16 Bytedance Inc. Improved extended dependent random access point support in iso base media file format
WO2024020050A1 (en) * 2022-07-18 2024-01-25 Bytedance Inc. Drap and edrap in the isobmff

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2017065966A1 (en) * 2015-10-14 2017-04-20 Qualcomm Incorporated Signaling of parameter sets in files of multi-layer bitstreams

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20140098868A1 (en) * 2012-10-04 2014-04-10 Qualcomm Incorporated File format for video data
US9648348B2 (en) * 2013-10-23 2017-05-09 Qualcomm Incorporated Multi-layer video file format designs
GB2522014A (en) * 2014-01-07 2015-07-15 Canon Kk Method, device, and computer program for encoding inter-layer dependencies in encapsulating multi-layer partitioned timed media data
US10652631B2 (en) * 2016-05-24 2020-05-12 Qualcomm Incorporated Sample entries and random access
US20210105313A1 (en) * 2019-10-02 2021-04-08 Mediatek Singapore Pte. Ltd. Methods and apparatus for signaling a region in spatially grouped immersive media data tracks
WO2021187737A1 (en) * 2020-03-18 2021-09-23 엘지전자 주식회사 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method

Also Published As

Publication number Publication date
EP4133743A1 (en) 2023-02-15
GB2593897B (en) 2024-02-14
GB2595740A (en) 2021-12-08
GB202009169D0 (en) 2020-07-29
WO2021204827A1 (en) 2021-10-14
US20230164371A1 (en) 2023-05-25
GB202005075D0 (en) 2020-05-20

Similar Documents

Publication Publication Date Title
EP3603080B1 (en) Method and apparatus for encoding media data comprising generated content
GB2593897A (en) Method, device, and computer program for improving random picture access in video streaming
JP7444872B2 (en) Methods, apparatus, and computer programs for encapsulating media data into media files
US20230025332A1 (en) Method, device, and computer program for improving encapsulation of media content
JP2013532441A (en) Method and apparatus for encapsulating encoded multi-component video
JP7249413B2 (en) Method, apparatus and computer program for optimizing transmission of portions of encapsulated media content
US20230370659A1 (en) Method, device, and computer program for optimizing indexing of portions of encapsulated media content data
US20130097334A1 (en) Method and apparatus for encapsulating coded multi-component video
US20130091154A1 (en) Method And Apparatus For Encapsulating Coded Multi-Component Video
US20220337846A1 (en) Method and apparatus for encapsulating encoded media data in a media file
US20220166997A1 (en) Method and apparatus for encapsulating video data into a file
JP7348962B2 (en) Methods, apparatus, and computer programs for encapsulating media data into media files
WO2023052189A1 (en) Method, device, and computer program for optimizing media content data encapsulation in low latency applications
GB2620582A (en) Method, device, and computer program for improving indexing of portions of encapsulated media data
GB2623523A (en) Method and apparatus describing subsamples in a media file
WO2024012915A1 (en) Method, device, and computer program for optimizing dynamic encapsulation and parsing of content data
WO2023194179A1 (en) Method and apparatus for describing subsamples in a media file
CN118044211A (en) Method, apparatus and computer program for optimizing media content data encapsulation in low latency applications
JP2013534101A (en) Method and apparatus for encapsulating encoded multi-component video
GB2617359A (en) Method and apparatus for describing subsamples in a media file
GB2620583A (en) Method, device, and computer program for optimizing dynamic encapsulation and parsing of content data
WO2023052199A1 (en) Method and apparatus for encapsulation of media data in a media file