GB2617359A - Method and apparatus for describing subsamples in a media file - Google Patents

Publication number
GB2617359A
Authority
GB
United Kingdom
Prior art keywords
subsample
description
subsamples
sample
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2205003.3A
Other versions
GB202205003D0 (en
Inventor
Tocze Lionel
Bellessort Romain
Denoual Franck
Maze Frédéric
Ruellan Hervé
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to GB2205003.3A
Publication of GB202205003D0
Priority to PCT/EP2023/058166 (WO2023194179A1)
Publication of GB2617359A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2353 Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Library & Information Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Encapsulating media data in a file, or reading encoded media data from a file, the media data comprising samples (e.g. video frames, images or point cloud frames), at least some of the samples being organized into subsamples (e.g. spatial divisions such as tiles). Subsample descriptive metadata (e.g. in a SubSampleInformationBox 790) describes the subsample organization, the metadata comprising a subsample description for each subsample, where at least one subsample description comprises a reference to a set of property values. In embodiments the reference is a reference_id parameter 705 pointing to a reference table such as a SubsampleRef array containing the common properties. Flags 715 may be used on the description box (SubSampleInformationBox) to indicate other properties. An “applies_to_all_samples” flag may indicate that all samples are divided into subsamples, allowing an entry_count parameter 720 to be omitted. A “regular_sample_pattern” flag may indicate that samples containing subsamples are regularly spaced, so that a sample_delta parameter 730, 735 need only be included once. A “fixed_nb_subsamples_per_sample” flag 725, 745 may indicate that all samples are organized into the same number of subsamples, avoiding repetition of the subsample_count parameter 740. A “merged property” flag may indicate that certain parameters are merged into a single combined parameter for efficiency.

Description

METHOD AND APPARATUS FOR DESCRIBING SUBSAMPLES IN A MEDIA FILE
FIELD OF THE INVENTION
The present disclosure concerns a method and a device for encapsulating media data into a media file. It concerns more particularly the encapsulation of samples of media data where samples of media data are organized into subsamples.
BACKGROUND OF THE INVENTION
The ISO base media file format (ISOBMFF, also called file format) is a general file format specified for the encapsulation of media data into a media file. This file format is generic, and it has been used as the basis for a number of other more specific file formats (e.g., ISO/IEC 14496-15 for the carriage of NAL unit structured video, ISO/IEC 23008-12 for the Image File Format or High Efficiency Image File Format (HEIF), ISO/IEC 23090-2 for the Omnidirectional MediA Format (OMAF), ISO/IEC 23090-10 for the carriage of Visual Volumetric Video-based Coding or ISO/IEC 23090-18 for the carriage of G-PCC Data...). All these file formats use the generic structures and mechanisms defined in ISOBMFF possibly enriched with additional structures or mechanisms specific to a given type of media data.
ISOBMFF is standardized by the International Organization for Standardization (ISO) as ISO/IEC 14496-12. A media file comprises two different parts. A first part corresponds to the actual media data, while a second part corresponds to a description of the media data and is called the metadata part. Media data can be timed sequences of media data or untimed media data. For timed sequences of media data, such as audio-visual presentations, this metadata part contains characteristics of the timed sequences of media data like the timing, size, or media descriptive or transformative information. For untimed media data, this metadata part contains characteristics of the media data like size or media descriptive or transformative information.
An ISO base media file (that we later call media file or movie file or media presentation) may come as one file comprising the whole media presentation or as multiple segment files, each segment comprising a temporal portion of the media presentation. An ISO base media file is structured into "boxes". In the file format, the overall presentation is called a movie. It is logically divided into tracks; each track represents a timed sequence of media (frames of video, audio samples, subtitles, for example). Within each track, each timed unit is called a sample. The sample is defined as all the media data associated with a same presentation time for a track. Each track comprises one or more sample descriptions; each sample in the track is associated with a sample description by reference. All the structure-data or metadata, including that defining the placement and timing of the media, is contained in structured boxes. The media data (frames of video, for example) is referred to by this structure-data or metadata. The overall duration of each track is defined in the metadata. Each sample has a defined duration. The exact decoding timestamp of a sample is defined by summing the duration of the preceding samples. And the exact presentation timestamp of a sample is defined by adding an offset (0 per default) to its decoding timestamp.
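The timestamp rule stated above can be illustrated with a short sketch. It is only an illustration under simple assumptions: the function names and the duration and offset values are hypothetical, and durations are taken to be expressed in the track timescale.

```python
from itertools import accumulate


def decoding_timestamps(durations):
    """Decoding time of a sample = sum of the durations of the preceding samples."""
    # accumulate(..., initial=0) yields 0, d0, d0+d1, ...; the last value is the
    # total track duration, not a sample timestamp, so it is dropped.
    return list(accumulate(durations, initial=0))[:-1]


def presentation_timestamps(durations, offsets=None):
    """Presentation time = decoding time + per-sample offset (0 by default)."""
    dts = decoding_timestamps(durations)
    if offsets is None:
        offsets = [0] * len(durations)
    return [d + o for d, o in zip(dts, offsets)]


# Hypothetical values: four samples of equal duration, two of them reordered.
durations = [1000, 1000, 1000, 1000]
offsets = [1000, 3000, 0, 0]
dts = decoding_timestamps(durations)
pts = presentation_timestamps(durations, offsets)
print(dts, pts)
```

With all offsets left at their default of 0, the presentation timestamps equal the decoding timestamps.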
An ISOBMFF file may also comprise general untimed metadata, describing still images or sequences of untimed images through items.
A metadata structure (called subsample information box or 'subs') may be used in the metadata part of the media file to describe the subsample (or sub-sample) organization (or subsample information) and the content of subsamples (or sub-samples) inside samples of a track or of an item (as an associated item property). This subsample information box comprises a description of the subsample organization of each sample or item containing sub-samples. Each subsample is described by a set of properties. There may be cases where properties describing the subsample organization of a sample or item are identical for all or at least several subsamples. This invention focuses on optimizing the structure of this subsample information box so as to reduce the description cost and avoid the duplication of information.
SUMMARY OF THE INVENTION
The present invention has been devised to address one or more of the foregoing concerns.
According to a first aspect of the invention there is provided a method of encapsulating media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the method comprising: generating subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value.
In an embodiment, said at least one subsample description comprises information related to the size of the subsample.
In an embodiment, the subsample descriptive metadata comprises one or more subsample description boxes.
In an embodiment, the reference refers to another subsample description, the other subsample description comprising the set of at least one property value.
In an embodiment, the reference is the index of the other subsample description within the subsample descriptive metadata.
In an embodiment, the subsample description comprises a property indicating whether the subsample description comprises the reference.
In an embodiment, the subsample descriptive metadata comprises a flag indicating that all the samples are organized into subsamples.
In an embodiment, the subsample descriptive metadata comprises a flag indicating that the subsample description has a repetition pattern between samples for all the samples.
In an embodiment, the subsample descriptive metadata comprises a flag indicating that at least one property is a merge of several properties.
In an embodiment, the subsample descriptive metadata comprises a flag indicating that all the samples have a same number of subsamples.
In an embodiment, the method comprises: generating pattern descriptive metadata for describing patterns, the pattern descriptive metadata comprising at least one pattern description, a pattern description being a set of at least one property value; and wherein the reference refers to a pattern description in the pattern descriptive metadata.
In an embodiment, the pattern description corresponds to several successive subsamples.
In an embodiment, the pattern description corresponds to the subsamples of a sample.
According to another aspect of the invention there is provided a method of reading media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the method comprising: obtaining from the media file subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value; and obtaining the properties describing the subsample from the subsample description and the set of at least one property value.
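As a rough illustration of the reading aspect, the sketch below resolves subsample descriptions in which a description either carries its property values inline or refers, by index, to an earlier description. It is a simplified in-memory model, not the box syntax of the invention: the class name, the field names and the "unit_type" property are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubsampleDescription:
    size: int                          # per-subsample size, always given explicitly
    properties: Optional[dict] = None  # inline property values, if any
    reference: Optional[int] = None    # index of an earlier description (hypothetical encoding)


def resolve(descriptions):
    """Return one {size, properties} record per subsample, following references."""
    resolved = []
    for desc in descriptions:
        if desc.reference is not None:
            # Only backward references are supported in this sketch; a forward
            # reference would raise IndexError here.
            props = dict(resolved[desc.reference]["properties"])
        else:
            props = dict(desc.properties or {})
        resolved.append({"size": desc.size, "properties": props})
    return resolved


# Two hypothetical point cloud frames, each with one geometry and one attribute unit.
descriptions = [
    SubsampleDescription(size=120, properties={"unit_type": "GDU"}),
    SubsampleDescription(size=40, properties={"unit_type": "ADU"}),
    SubsampleDescription(size=118, reference=0),
    SubsampleDescription(size=42, reference=1),
]
resolved = resolve(descriptions)
print(resolved[2])
```

In this model the size stays in every description while the shared property values are stored once, which is the kind of saving the reference is meant to provide.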
According to another aspect of the invention there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention there is provided a computer-readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention there is provided a computer program which upon execution causes the method of the invention to be performed.
According to another aspect of the invention there is provided a device for encapsulating media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the device comprising a processor configured for: generating subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value.
According to another aspect of the invention there is provided a device for reading media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the device comprising a processor configured for: obtaining from the media file subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value; and obtaining the properties describing the subsample from the subsample description and the set of at least one property value.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entire hardware embodiment, an entire software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer-readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1a illustrates an example of samples organized in subsamples in a media file;
Figure 1b illustrates an example of encapsulated media data organized as a non-fragmented presentation in a media file according to the ISO Base Media File Format;
Figure 2 describes the content of the subsample information box 'subs' from the prior art;
Figure 3 is an example of a system for the encapsulation or storage of multimedia presentations, where the proposed method for describing subsamples may be used;
Figure 4 illustrates the encapsulation process according to one embodiment of the invention;
Figure 5 illustrates the encapsulation process according to a second embodiment of the invention;
Figure 6 illustrates a parsing method according to an embodiment of the invention;
Figure 7 describes a first syntax embodiment of a subsample information box to support reference to a previous subsample and its common properties;
Figures 8a and 8b illustrate a first syntax embodiment of a subsample information box to support reference to an explicit pattern description and its common properties;
Figure 9 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention;
Figure 10 is another example of metadata structures providing subsample description;
Figure 11 illustrates the main steps of a method for encapsulating subsample description using the data structure illustrated by Figure 10.
DETAILED DESCRIPTION OF THE INVENTION
Timed media data is encapsulated in tracks of samples, while untimed media data is encapsulated into items. While the wording "subsample" (or sub-sample) refers to samples, the subsample concept may apply to both samples and items. The subsample information box may be placed in a track to describe the subsample organization (or subsample information) of the samples of the track. It may also be placed into an item property container box to describe the subsample organization of an item. In the following, the description uses the wording sample in a generic way to refer to both track samples or items unless otherwise specified.
A subsample is a contiguous range of bytes inside a sample. The precise definition of a sub-sample (e.g. its type) is specific to each type of media data carried inside an ISOBMFF file. Depending on the kind of data a sample represents, a sub-sample may be a NAL unit (ISO/IEC 14496-15) from AVC, HEVC, VVC or any NAL-unit-based video codec, a type-length-value (TLV) unit or G-PCC unit (ISO/IEC 23090-18), any set of these units (e.g., decoding-unit-based, tile-based, slice-based or picture-based subsamples) or any arbitrary data chunk. We may use "data unit" as a shorthand for the data corresponding to a sub-sample, whatever the type of data: NAL units, TLV units, etc. The possible types for a subsample depend on the codec format: for example, in video samples, a sub-sample may correspond to a picture, to a sub-picture, to a tile, to a slice, to a coding tree unit or to a NAL unit. As another example, for volumetric media data, a sub-sample may correspond to a Point Cloud frame, to a Point Cloud subframe, to a tile, to a slice or to a TLV unit.
A subsample description comprises different properties describing the subsample. A first type of properties is generic and present in all subsample descriptions.
This is for example the case of the sub-sample size, which represents the number of bytes contained in a sub-sample. This information makes it possible to determine, for each sub-sample, its byte range inside a sample. The position of a sub-sample inside a sample is obtained by summing the sizes of all preceding sub-samples.
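The byte-range computation described above can be sketched as follows. This is a minimal illustration; the function name and the sizes are hypothetical, and ranges are expressed as half-open (start, end) offsets relative to the start of the sample.

```python
def subsample_byte_ranges(subsample_sizes):
    """Return (start, end) byte offsets of each sub-sample inside its sample.

    The start of a sub-sample is the sum of the sizes of the preceding
    sub-samples; the end offset is exclusive.
    """
    ranges = []
    offset = 0
    for size in subsample_sizes:
        ranges.append((offset, offset + size))
        offset += size
    return ranges


# Hypothetical sample made of four sub-samples.
sizes = [120, 40, 35, 60]
ranges = subsample_byte_ranges(sizes)
print(ranges)
```

The end offset of the last range equals the sum of the listed sub-sample sizes.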
A second type of properties depends on the type of the subsample. This second type of property is only present in the description of the subsamples of that type of subsample. The type of subsample is typically associated with the codec used to encode the sample. The type of subsample or the type of subsample information or properties may then depend on a sample entry type and possibly on an additional property, for example on the flags value of the 'subs' box in ISOBMFF.
A subsample information box may be provided at the beginning of a media file (e.g. in the sample description of tracks described under a 'moov' box) or may be provided in movie fragments (e.g. in 'traf' boxes). It may also be provided as an item property when media data is described as data which does not require timed processing, as opposed to a timed sample data (e.g. as items in a 'meta' box).
Figure 1a illustrates an example of samples organized in subsamples in a media file. A bitstream 100 to be encapsulated using the ISOBMFF file format may be composed of several samples (e.g. frames for video, or items for a collection of images) 105-1 to 105-m. Each sample 105 may be the concatenation of several subsamples, as for example the subsamples 110-11 to 110-14 for the sample 105-1 or the subsamples 110-m1 to 110-m4 for the sample 105-m. The subsample organization and content depend on the configuration of the encoder that generates the corresponding bitstream.
It may occur, due to the choice of the configuration or the encoder specification, that the organization into subsamples is the same for each sample and that therefore the description of the subsamples is repeated from one sample to the next.
It may be the case for example when using the point cloud file compression format as specified in ISO/IEC 23090-9, where samples 105-1 to 105-m may represent a point cloud frame. In that case, a frame is composed for example of one geometry data unit 110-11 (GDU) and several, three in the illustrated example, attribute data units (ADU) 110-12, 110-13 and 110-14. According to the point cloud compression specification, all the following frames 105-2 to 105-m shall also contain the same types of data units: one GDU and three ADUs. Therefore, the description of a subsample to specify the properties of the data unit it contains such as the type of the data units is repeated for all the samples.
This repetition of the subsample description may also occur in bitstreams corresponding to other media types, such as for example video encoded pictures using the VVC (or HEVC) file format, when using sub-pictures or tiles during encoding.
For example, in the case of VVC, samples 105-1 to 105-m may describe a 1920x1080 video sequence, where each subsample describes a quarter of the video.
Subsample 110-11 may describe the top-left part of the video frame 105-1. Subsample 110-12 may describe the top right of the video frame 105-1. Subsample 110-13 may describe the bottom-left part of the video frame 105-1, and subsample 110-14 may describe the bottom-right part of the same video frame 105-1. For each following sample the same sub-picture description may also apply. Therefore, in that case, the property that is repeated for each sample is the position of the sub-pictures it contains.
Figure 1b illustrates an example of encapsulated media data organized as a non-fragmented media file according to the ISO Base Media File Format.
The media data encapsulated in the media file 115 starts with a FileTypeBox ('ftyp') box, not illustrated, providing a set of brands identifying the precise specifications to which the encapsulated media data conforms, which are used by a reader to determine whether it can process the encapsulated media data. The 'ftyp' box is followed by a MovieBox ('moov') box referenced 120, a MetaBox 'meta' 130 and a MediaDataBox 'mdat' 140, which contains the media data (timed or untimed) that are described in the other metadata boxes.
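To make the box layout more concrete, the following sketch walks the top-level boxes of a byte buffer. It relies on the generic ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 announcing a 64-bit size and size 0 meaning "until the end of the file"). The helper names iter_boxes and make_box are hypothetical, and the test buffer contains empty placeholder boxes rather than a valid media file.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("ascii", "replace"), pos + header, pos + size
        pos += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (helper for this example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Placeholder file mirroring the order described above: ftyp, moov, meta, mdat.
data = (
    make_box(b"ftyp", b"isom\x00\x00\x00\x00")
    + make_box(b"moov")
    + make_box(b"meta")
    + make_box(b"mdat", b"\x01\x02\x03")
)
top_level = [box_type for box_type, _, _ in iter_boxes(data)]
print(top_level)
```

The same function can be applied recursively to the payload range of a container box such as 'moov' to reach nested boxes.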
The MovieBox box provides initialization information that is needed for a reader to initiate the processing of the encapsulated media data. In particular, it provides a description of the presentation content, the number of tracks, and information regarding their respective timelines and characteristics.
As illustrated, the 'moov' box 120 also contains one (or more) TrackBox ('trak') boxes 121 describing each track in the presentation. TrackBox box 121 contains in its box hierarchy a MediaBox 122 ('mdia') that describes the creation date of the ISOBMFF file and the duration of the media data encapsulated. The 'mdia' box contains a MediaInformationBox ('minf') which in turn contains a SampleTableBox ('stbl') box. This SampleTableBox contains descriptive and timing information of the timed media samples 141-1 to 141-N contained in the 'mdat' 140. More particularly for subsample description, the SampleTableBox 'stbl' contains a subsample information box called SubSampleInformationBox ('subs') 125 which describes a contiguous range of bytes of a sample, making it possible to know subsample information such as subsample sizes, as further described with Figure 2.
The MetaBox 'meta' 130 comprises general untimed metadata including metadata structures describing one or more still images encapsulated as items. This 'meta' box 130 contains an 'iinf' box (ItemInfoBox) 131 that describes several single images. Each single image is described by a metadata structure ItemInfoEntry also denoted items 131-1 and 131-2. Each item has a unique 16-bit or 32-bit identifier item_ID. The media data corresponding to these items are stored in a container for media data, e.g., the 'mdat' box 140 (for example corresponding to samples 145-1 and 145-M). The media data may also be stored in an 'idat' box, in an 'imda' box or in another file. An 'iloc' box (ItemLocationBox) 132 provides for each item the offset and length of its associated media data in the 'mdat', 'idat', or 'imda' box. The media data for an item may be fragmented into extents. In this case, the 'iloc' box 132 provides the number of extents for the item and for each extent its offset and length in the 'mdat', 'idat', or 'imda' box.
ISOBMFF provides a mechanism to describe and associate properties with items. These properties are called item properties. The ItemPropertiesBox 'iprp' 133 enables the association of any item with an ordered set of item properties. The ItemPropertiesBox consists of two parts: an item property container box 'ipco' 133-1 that contains an implicitly indexed list of item properties 133-3 or 133-4, and an item property association box 'ipma' 133-2 that contains one or more entries. Each entry in the item property association box associates an item with its item properties.
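The role of extents can be shown with a small sketch that rebuilds the data of an item from a list of (offset, length) pairs. The dictionary below is a hypothetical stand-in for the information carried by the item location box; a real box carries further fields (for example a construction method and base offsets) that are ignored here.

```python
def read_item_data(container, extents):
    """Concatenate the byte ranges listed as (offset, length) extents."""
    return b"".join(container[offset:offset + length] for offset, length in extents)


# Hypothetical media data container and item locations.
mdat_payload = b"AAAABBBBCCCCDDDD"
item_locations = {
    1: [(0, 4)],            # item 1 is stored as a single extent
    2: [(4, 4), (12, 4)],   # item 2 is fragmented into two extents
}
item1 = read_item_data(mdat_payload, item_locations[1])
item2 = read_item_data(mdat_payload, item_locations[2])
print(item1, item2)
```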
The ItemProperty 133-4 and ItemFullProperty (here illustrated for 'subs' support 135) boxes are designed for the description of an item property. ItemFullProperty allows defining several versions of the syntax of the box and may contain one or more properties whose presence is conditioned by either the version or the flags parameter. The ItemPropertyContainerBox 133-1 is designed for containing an implicitly indexed list of item properties (each inheriting from ItemProperty or ItemFullProperty).
The ItemPropertyAssociation box 133-2 is designed to associate items and/or entity groups with item properties. It provides the description of a list of item identifiers and/or entity group identifiers, each identifier (item_ID) being associated with a list of item property indices, each referring to an item property in the ItemPropertyContainerBox 133-1. One example of an item property that can be stored in the ItemPropertyContainerBox is the subsample information property defined in ISO/IEC 23008-12 defining the Image File Format, also known as High Efficiency Image File Format (HEIF). This subsample information property may benefit from the same optimizations as those described hereafter for the 'subs' box from ISOBMFF.
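The association mechanism can be pictured with the following sketch, which models the property container as an ordered list and the association box as a mapping from item identifier to property indices. The property labels and item identifiers are hypothetical, and the sketch assumes 1-based indices with 0 meaning that no property is associated.

```python
# Implicitly indexed list of item properties (position in the list = index - 1).
ipco = ["image size", "subsample information", "colour information"]

# Association entries: item_ID -> ordered list of 1-based property indices.
ipma = {
    1: [1, 2],
    2: [1, 3],
}


def properties_of(item_id, ipco, ipma):
    """Return the properties associated with an item, in association order."""
    # Index 0 is treated as "no property" and skipped (assumption of this sketch).
    return [ipco[index - 1] for index in ipma.get(item_id, []) if index != 0]


props_item1 = properties_of(1, ipco, ipma)
props_item2 = properties_of(2, ipco, ipma)
print(props_item1, props_item2)
```

Because several items may point at the same index, a property such as the subsample information is stored once in the container and shared between items.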
The MediaDataBox 'mdat' 140 comprises the actual media data, here the data for timed samples 141-1 to 141-N described by the 'moov' box and the data for items (data which does not require timed processing) 145-1 to 145-M described by the 'meta' box. These samples or items may correspond to one of the samples described in Figure 1a.
According to ISOBMFF, the media file 115 can be fragmented into a plurality of media files or fragments (i.e., a MovieBox 'moov' followed by a series of pairs of MovieFragmentBox 'moof' plus MediaDataBox 'mdat', or of MovieFragmentBox 'moof' plus IdentifiedMediaDataBox).
According to the ISOBMFF standard, the subsample information box ('subs') is not mandatory. Moreover, the 'subs' box allows a sparse representation of subsample information, avoiding describing subsample information for samples not containing subsamples or for which no subsample information is available.
However, the point cloud encapsulation specification ISO/IEC 23090-18 mandates the usage of the 'subs' box for single track and allows it as well for multi-track encapsulation of point cloud data. Therefore, for each sample of a point cloud as described previously in Figure 1a, the 'subs' box contains repeated properties that increase the cost of the point cloud description in the file. This description cost issue is the main problem solved by this invention. Generally speaking, to solve this issue, various mechanisms are provided to reduce the redundancy that occurs when describing subsample information.
Moreover, this problem is not specific to point cloud (or volumetric media data), as explained previously with the usage of the 'subs' box. It may also apply to VVC bitstreams for describing the organization of sub-pictures. Indeed, ISO/IEC 14496-15 may also use the 'subs' box for describing in ISOBMFF the organization in sub-pictures, by means of the codec_specific_parameters value when flags=4. In that case, even if the sub-picture organization is the same for the whole sequence contained in the bitstream, the content of the 'subs' box is not optimal, as the same position for a sub-picture is repeated for each image of the video sequence. The syntax of the subsample information box 'subs' does not make it possible to avoid repetition of data that is common to multiple samples or subsamples. However, the sizes of the subsamples of consecutive samples are unlikely to be the same. This is due to the fact that subsamples typically contain encoded data where the size of the encoded data depends on the actual content of the data being encoded. This means that a description of each sub-sample, at least in terms of size, generally has to be provided. Based on these constraints, there is a need to improve the description cost of the 'subs' box.
Figure 2 describes the content of the subsample information box 'subs' from the prior art.
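The description cost can be estimated with a rough calculation. The sketch below assumes the field widths of the existing 'subs' box as we understand them from ISO/IEC 14496-12: a 32-bit entry count, a 32-bit sample delta and a 16-bit subsample count per entry, and, per subsample, a 16-bit size (32-bit in version 1), an 8-bit priority, an 8-bit discardable field and 32 bits of codec-specific parameters. The function name and the frame counts are hypothetical.

```python
def subs_box_size(subsample_counts, version=0):
    """Estimate the size in bytes of a prior-art 'subs' box.

    subsample_counts holds one entry per described sample, giving its
    number of subsamples.
    """
    size_field = 4 if version == 1 else 2
    per_subsample = size_field + 1 + 1 + 4   # size, priority, discardable, codec-specific
    total = 8 + 4 + 4                        # box header, version/flags, entry count
    for count in subsample_counts:
        total += 4 + 2 + count * per_subsample   # sample delta, subsample count, subsamples
    return total


# Hypothetical sequence: 1000 frames, each with 4 subsamples.
frames, per_frame = 1000, 4
total_bytes = subs_box_size([per_frame] * frames)
size_bytes = frames * per_frame * 2          # the part that really varies per subsample
print(total_bytes, size_bytes, total_bytes - size_bytes)
```

Under these assumptions only the size fields need to differ from one frame to the next; most of the remaining bytes carry values that may be identical across samples.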
The Subsample information box describes, for a set of samples, the characteristics of their subsamples, each subsample being a contiguous range of bytes of a sample.
The 'subs' box is specified with a version, which is an integer that specifies the version of the box. It enables a reader to check if it is able to support the description provided by the box. It is also used to provide different descriptions of the subsamples contained in a sample and for keeping backward compatibility with previous versions of the same standard.
Then, entry_count is an integer that gives the number of entries (actually samples) that are described by the 'subs' box in the following loop.
For each sample, the 'subs' box specifies:
- a sample_delta, that indicates how many samples are skipped. This element provides a means to skip one or more samples for which no subsample description exists. Therefore, the subsample descriptions use a sparse representation. sample_delta is an integer that indicates the sample having a sub-sample structure. It is coded as the difference, in decoding order, between the desired sample number and the sample number indicated in the previous entry.
- a subsample_count, which is an integer that specifies the number of subsamples for the current sample. If there is no subsample structure, then this field takes the value 0. This provides another way to signal that a sample has no subsample information.
Then, for each sub-sample contained in the current sample, the 'subs' box specifies:
- the subsample_size 205, an integer that specifies the size, in bytes, of the current sub-sample. This information is useful to access a particular sub-sample of a sample, knowing the cumulated sizes of the preceding subsamples for this sample (also known as byte-range).
- a set of other characteristics 210, namely:
* a subsample_priority, which is an integer specifying the degradation priority for each sub-sample. Higher values of subsample_priority indicate subsamples which are important to, and have a greater impact on, the decoded quality.
* discardable, which indicates, when its value is 0, that the sub-sample is required to decode the current sample (otherwise the subsample is not required to decode the current sample but may be used for enhancements).
* codec_specific_parameters, which specifies information about a sub-sample for a given coding system. For example, in ISO/IEC 23090-18, depending on the value of the flags parameter, the codec_specific_parameters structure provides either (for flags=0) the type of G-PCC data unit which corresponds to the sub-sample (GDU/ADU), also known as G-PCC unit based subsamples, or (for flags=1) the tile identifier to which a subsample belongs (also known as tile-based subsamples).
As another example, for VVC encoded video, ISO/IEC 14496-15 provides several specific codec_specific_parameters definitions. Among them, some of interest describe information to obtain the tile or the sub-picture position (for flags=2 or flags=4 respectively).
The set of characteristics 210 depends mainly on the configuration of the encoder, whereas the subsample size depends on the content of the encoded media (either 2D video or point cloud (volumetric media data) in our examples). It is then most probable that for a fixed configuration, the set of characteristics 210 is the same for a given subsample from one sample to another. On the contrary, even with fixed configuration, the sizes of the subsamples may vary from one sample to another.
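By way of illustration only, the prior-art layout described in relation with Figure 2 may be sketched as a small reader. The sketch assumes the field widths of the ISOBMFF 'subs' box (32-bit entry_count and sample_delta, 16-bit subsample_count, 16-bit subsample_size for version 0 or 32-bit for version 1, 8-bit subsample_priority and discardable, 32-bit codec_specific_parameters) and assumes that the input starts right after the FullBox version and flags. The function name parse_subs_payload and the returned dictionary keys are hypothetical.

```python
import struct

def parse_subs_payload(payload: bytes, version: int = 0):
    """Parse the body of a prior-art 'subs' box (hypothetical helper).

    Returns a list of (sample_number, subsamples); each subsample is a
    dict holding its byte offset inside the sample, its size and the
    set of characteristics 210.
    """
    pos = 0
    (entry_count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    size_fmt, size_len = (">I", 4) if version == 1 else (">H", 2)
    sample_number = 0  # sample_delta is relative to the previous entry
    entries = []
    for _ in range(entry_count):
        sample_delta, subsample_count = struct.unpack_from(">IH", payload, pos)
        pos += 6
        sample_number += sample_delta  # sparse: skipped samples have no entry
        offset = 0  # cumulated sizes give the byte range of each subsample
        subsamples = []
        for _ in range(subsample_count):
            (size,) = struct.unpack_from(size_fmt, payload, pos)
            pos += size_len
            priority, discardable, csp = struct.unpack_from(">BBI", payload, pos)
            pos += 6
            subsamples.append({
                "offset": offset,
                "size": size,
                "subsample_priority": priority,
                "discardable": discardable,
                "codec_specific_parameters": csp,
            })
            offset += size
        entries.append((sample_number, subsamples))
    return entries
```

In such a reader, the six bytes of characteristics 210 are read again for every subsample of every described sample, which is the repetition that the mechanisms described hereafter aim at reducing.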
The invention proposes optimizations for the description of subsamples in a subsample information box that may be used to describe subsample information for track or item, based on the following ideas. It is proposed to identify some properties that are common among some subsamples or for a set of subsamples. The description of these common properties is made once, typically for the first subsample associated with these common properties. For the other subsamples associated with these common properties, the description of the subsample comprises a reference to the common properties. This reference may be a reference to a previous subsample. In this case, the common properties are implicitly stored in the referred subsample. Alternatively, an explicit description of the common properties is made in a new pattern box, and the reference refers to the corresponding pattern in this new pattern box for the subsamples or for the set of subsamples.
The invention also proposes enhancements in the description of the subsample information box 'subs' to reduce the cost of its description, adapted to the configuration of the bitstream.
In reference to Figure 2, the invention aims at grouping the description of similar properties in the set of characteristics 210 (called common properties in the rest of this document), and at using different reference mechanisms to use a single description of these properties for specifying as many subsamples as possible. The main advantage of using a reference mechanism is the reduction of the size of the 'subs' box inside a track or inside a collection of items of an ISOBMFF file. The different mechanisms improving 'subs' box are described in relation with Figure 4 and Figure 5.
It is to be noted that more than one SubSampleInformationBox ('subs') may be present in the same container box, assuming the value of the flags parameter differs in each of these 'subs' boxes. Therefore, the reference mechanism introduced by the invention may be used to optimize different 'subs' boxes independently, corresponding to different common properties.
In some embodiments, it may happen that even the size is the same for several sub-samples. For example, raw image data or an uncompressed point cloud (that is, image data or a point cloud that is not encoded) may be split into tiles with the same size for each tile. In this case, the sub-samples corresponding to tile data in the sample may all have the same size, and a whole subsample may be described by reference. This may for instance be achieved by considering, each time a subsample is described by reference, an additional Boolean value (in addition to the has_reference parameter) indicating whether the reference also describes the subsample size or not.
An alternative to the 'subs' box, using a mechanism to store common properties and to associate subsamples to these properties, is described in relation with Figure 10 and Figure 11.
Figure 3 is an example of a system for the encapsulation or storage of multimedia presentations, where the proposed method for describing subsamples may be used.
The file writer 300 takes as input the media data 305. The file writer 300 processes the media data 305 to prepare it for streaming or for storage. This step is called media encapsulation. It consists in adding metadata describing the media data in terms of kind of data, codec in use, size, data offsets, timing, etc. The media data 305 may be the raw data captured by sensors or may be data generated by a content creator or editing tools. For example, the input media data 305 can be video data, point cloud data or a sequence of images. The media data 305 may otherwise be available as compressed or encoded media data 315, possibly as different encoded versions. For example, for point cloud media data, the data may be compressed using the MPEG-I Part 9 standard.
As another example, for video/image data, the data may be compressed using VVC/HEVC or any other video/image compression standard. This means that the encoding or compression may be performed by the file writer itself using the encoder module 310 (possibly one per media type) or may be performed outside of the file writer. The compression may be live encoding (as well as the encapsulation). The file writer 300, through the encapsulation module 320, encapsulates the media data into a movie file or movie fragments, for example according to ISOBMFF and its extensions (e.g. possibly the NAL-unit based File Format for video data, MPEG-I Part 18 for point cloud data). The file writer 300 then generates a media file 325 or one or more segment files 325. The file writer 300 may optionally generate a streaming manifest like a DASH MPD or HLS playlist (not represented). The generated file 325, segment files or manifest may be stored on a network (340) for redistribution via on demand or live streaming.
The encapsulation process is further described in reference to Figure 4 or Figure 5, implementing the proposed methods for describing subsamples.
The file writer 300 may be connected, via a network interface (not represented), to a communication network 330 to which a media player 350 comprising a de-encapsulation module 360 is also connected, via a network interface (not represented).
The media player 350 is used for processing data received from the communication network 330, or read from a (local or remote) storage device, for example for processing the media file or media segments 325. The data may be streamed to the media player, thus involving a streaming module (not represented) in charge of parsing a streaming manifest, of determining requests to fetch the media streams and of adapting the transmission, according to indications in the manifest and to media player parameters such as available bandwidth, CPU, application needs or user preference. The received data is de-encapsulated in the de-encapsulation module 360 (also known as an ISOBMFF parser or as an ISOBMFF reader, or simply as a parser or as a reader). The de-encapsulated data (or parsed data) may be decoded by a decoder module for storage, display or output to an application or to one or more users. The decoder module (possibly one or more per media type) may be part of the media player or may be an external module or dedicated hardware. The de-encapsulated data often correspond to encoded media data 365 (e.g. video bitstream, point cloud bitstream, etc.). The de-encapsulation, decoding and rendering may be realized as a live processing of the media file while it is being received, for example by processing data chunks for each media stream in parallel and in a synchronized manner to minimize the latency between the recording of the media (i.e. media data 305) and its display to a user as media data 375.
It should be noted that the media file 325 may be communicated to the media player 350 in different ways. In particular, the file writer 300 may generate a media file 325 with a media description (e.g. DASH MPD) and communicate (or stream) it directly to the media player 350 upon receiving a request from media player 350. The media file 335 may also be downloaded, at once or progressively, by the media player and stored alongside the media player 350.
The parsing process performed by the de-encapsulation module 360 is further described in reference to Figure 6.
Figure 4 and Figure 5 describe in more detail the encapsulation process according to different embodiments of the invention.
Figure 4 illustrates the encapsulation process according to one embodiment of the invention. This process is performed by the encapsulation module 320.
The process starts in step 400 by configuring the file writer 300. This configuration may set up some parameters for the encapsulation module 320. This step may consist in setting a number of tracks or items. For example, for point cloud encapsulation, it may consist in defining the use of single-track encapsulation where all samples contain G-PCC data units (GDUs and corresponding number of ADUs), requiring the description of G-PCC unit based subsamples by a 'subs' box with flags=0. For VVC, it may also set the number of tracks to 1, where each sample will contain NAL units corresponding to several sub-pictures and where the 'subs' box with flags=4 will be used to describe the relative position of sub-pictures in the full picture. The configuration may also impact the encoder module 310. The encoder settings may define, for example in the case of a video stream, the number of sub-pictures and therefore the number of sub-samples that are contained in a sample. For a point cloud, they may specify the number of attributes encoded and therefore the number of ADUs present for one GDU in each sample. This configuration may also comprise settings indicating whether the number of sub-frames per frame is fixed or not, and whether the settings apply to all the samples (or items) of the sequence or only to some samples, for example at regular time intervals. These parameters may be hard-coded in the file writer 300 or specified by the user, for example through a command line, control scripts or a graphical user interface.
After the configuration of the encapsulation module, metadata structures of a media file, such as top-level boxes (e.g., 'ftyp' or 'styp', 'moov', 'mdat') and contained sub-boxes for track and sample description (like 'trak', 'stbl', 'stsd', etc.), are created during an initialization step (step 405). Such an initialization step may comprise reading parameter sets (e.g. geometry and attribute parameter sets) from an encoded bitstream of point cloud data or may comprise obtaining information about the sensor (for uncompressed data) like a number of points or the types of attributes associated with the points (e.g., colour, reflectance, timestamp, areas of interest, etc.). For an encoded bitstream of video data, the initialization step may read sequence parameter sets or picture parameter sets to obtain, for example, the size (height and width) of the image and of any sub-pictures that may be present when the sequence is not encoded by the file writer but received from an external encoder. It is to be noted that some of the setting parameters defined in the configuration step 400 may be reflected in the track descriptions or the sample descriptions.
According to this embodiment of the invention, during the initialization step 405, an empty SubsampleRef array is created to store common properties. This array typically has a maximum capacity, for instance a predetermined size of 128 entries. Each entry in said array may comprise a set of values for subsample_priority, discardable and codec_specific_parameters, a number of occurrences, and an appearance index. Considering the number of occurrences and the appearance index makes it possible, as described hereafter, to preserve entries that are more likely to be used as references.
Then, the encapsulation process is realized by checking if there is still a remaining sample to be read from the bitstream (step 410). If this is the case, the process reads one sample of the media bitstream, starting from the first remaining one. During the read operation of the sample, the file writer determines if the sample contains subsamples or not. For a point cloud compressed sample, the presence of subsamples may be determined by parsing the data unit header (also called TLVs) and determining from the header the type of the current data unit (GDU, ADU, parameter set) and the size of the unit in bytes. Then, the location of a next data unit of the sample may be determined, and the number of subsamples present in a sample may be determined. For a video sequence, a similar process may be realized using the NAL unit header to identify the type of the current data unit and searching for specific delimiter code in the bitstream, to determine the size and location of the next data unit in the sample. At the end of this step, the file writer has determined if subsamples are present or not in the current sample. Moreover, if the sample contains subsamples, the file writer has determined the number of subsamples and the characteristics of the subsamples such as for example their types and their sizes. These data are then stored in a SubsampleArray structure in memory. If there is no subsample, the SubsampleArray structure is empty.
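As a non-limiting sketch of the sample analysis described above, the following function splits a point cloud sample into its data units in order to fill a SubsampleArray-like list. It assumes that each data unit starts with a 5-byte header made of an 8-bit type followed by a 32-bit big-endian payload length, and that the size of a subsample covers the header and the payload; this layout, the function name split_tlv_units and the returned tuples are assumptions made for the illustration and should be checked against the codec specification in use.

```python
import struct

TLV_HEADER_LEN = 5  # assumed layout: 1-byte type + 4-byte payload length

def split_tlv_units(sample: bytes):
    """Return a list of (unit_type, offset, size) for one sample.

    An empty list means that no subsample was found in the sample.
    """
    units = []
    pos = 0
    while pos < len(sample):
        if pos + TLV_HEADER_LEN > len(sample):
            raise ValueError("truncated data unit header")
        unit_type, payload_len = struct.unpack_from(">BI", sample, pos)
        size = TLV_HEADER_LEN + payload_len
        if pos + size > len(sample):
            raise ValueError("data unit exceeds the sample")
        units.append((unit_type, pos, size))
        pos += size  # location of the next data unit of the sample
    return units
```

The number of subsamples of the sample is then simply the length of the returned list.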
If step 410 determines that there are no more samples in the bitstream, then the ISOBMFF file is finished (step 450), possibly with creation of some boxes for indexing or random access.
When the test 410 returns true, then the process checks in step 415 if there are still subsamples to describe for the current sample.
If there are no remaining subsamples, then the next step of the process is step 410, skipping all operations performed to describe subsamples. This may be the case when a sample does not contain any subsamples or for which the encapsulation setting does not require generating a subsample description.
This may also be the case when the last subsample of the current sample has been processed.
When subsamples are present (test 415 true), starting from the first subsample, step 415 reads a new sub-sample from the SubsampleArray to start the creation of the Subsample information box of the ISOBMFF file for the current sample.
If step 415 is positive, in step 420, the file writer 300 may then check if the current subsample description matches the description of a previous subsample, meaning that a set of common properties may be identified between the current sub-sample and any of the previous ones. This may be performed by checking if one of the stored set of common properties inside the SubsampleRef array corresponds to the values of corresponding characteristics of the current sub-sample and may therefore be used as a reference for the description of the current sub-sample.
If no previous subsample reference is found (i.e. when the check of step 420 is negative), then steps 425 and 430 are performed for the current sub-sample. This results in adding a new element describing the current sub-sample in the 'subs' box and in the SubsampleRef array. Step 425 adds the full description of the sub-sample to the 'subs' box. Step 430 adds this description to the SubsampleRef array in order to be able to use it for describing following subsamples sharing the same properties. Typically, this may first consist in creating a new entry based on the values of subsample_priority, discardable and codec_specific_parameters, with a number of occurrences of 0 and an appearance index equal to the number of sub-samples previously processed. Then, as a second step, if the array has a maximum capacity, it is checked whether said maximum capacity has been reached. If not, the new entry is appended to the array. On the other hand, if the maximum capacity has been reached, it is necessary to remove an existing entry prior to adding a new one. In this context, it is advantageous to put the new entry in place of the entry with the lowest number of occurrences (1st criterion) and the lowest appearance index (2nd criterion). This method of managing the array of references is provided as an example. Other methods may be used, for instance by enabling the writer to explicitly indicate which subsamples should be considered for addition to the array of references.
Otherwise, when step 420 is positive, at step 440, a new description is added to the SubSampleInformationBox for describing the current sub-sample. This description takes advantage of the stored common properties of a previous subsample found (in the SubsampleRef array at step 420). This description refers to the previous subsample to signal the common properties of the current sub-sample and defines any additional information, such as the size of the sub-sample. For example, the reference may be specified using the corresponding index in the SubsampleRef array. The syntax to indicate whether the subsample is using a reference or not and which set of common properties is used is further described in relation with Figure 7. It is to be noted that a reference to a previous subsample corresponds either to a sub-sample of the same sample, or to a sub-sample of any previous sample, or of any previous sample that is not before a sync sample (to guarantee random access). This provides more possibilities to group common properties between all subsamples.
At this step, the number of occurrences of the entry in the SubsampleRef array found at step 420 is increased by one. The appearance index of this entry is set to the number of subsamples previously processed.
After any of the steps 430 or 440, the step 415 is realized again. If another subsample exists for the current sample, it is processed by steps 420 to 440. Otherwise, if there are no more subsamples for the current sample, the process goes to step 410.
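Purely as an illustrative sketch, steps 420 to 440 and the management of the SubsampleRef array may be written as follows. The class and method names (RefEntry, SubsampleRefArray, describe) and the dictionaries returned to represent a description are hypothetical; only the replacement rule (lowest number of occurrences first, then lowest appearance index) is taken from the description above.

```python
from dataclasses import dataclass

@dataclass
class RefEntry:
    # (subsample_priority, discardable, codec_specific_parameters)
    props: tuple
    occurrences: int = 0
    appearance_index: int = 0

class SubsampleRefArray:
    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.entries = []
        self.processed = 0  # number of sub-samples previously processed

    def describe(self, size: int, props: tuple) -> dict:
        """Return the description to write in the 'subs' box."""
        # Step 420: look for a stored set of common properties.
        for idx, entry in enumerate(self.entries):
            if entry.props == props:
                # Step 440: describe by reference and update statistics.
                entry.occurrences += 1
                entry.appearance_index = self.processed
                self.processed += 1
                return {"size": size, "has_reference": 1, "reference_id": idx}
        # Steps 425 and 430: full description, then store it as a reference.
        new_entry = RefEntry(props, 0, self.processed)
        if len(self.entries) < self.capacity:
            self.entries.append(new_entry)
        else:
            victim = min(
                range(len(self.entries)),
                key=lambda i: (self.entries[i].occurrences,
                               self.entries[i].appearance_index),
            )
            self.entries[victim] = new_entry
        self.processed += 1
        return {"size": size, "has_reference": 0, "props": props}
```

A reader wishing to resolve a reference_id would have to maintain the same array with the same replacement rule, so that an index designates the same set of common properties on both sides.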
Possibly, the replaced entry at step 430 may be selected differently. For example, it may be selected using only the lowest appearance index.
Possibly, the number of occurrences and the appearance index are not stored in the SubsampleRef array. Instead, the SubsampleRef array is kept ordered according to the selection criteria used to decide which entry to replace in it. For example, at creation, a new entry is added at the end of the SubsampleRef array. At step 440, the entry used as a reference is moved at the end of the SubsampleRef array. At step 430, when the SubsampleRef array has reached its maximum capacity, its first entry is removed.
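The ordered variant described above may be sketched, under the same caveats, as a list in which the most recently used entry is always the last one. The class name OrderedSubsampleRef and the convention of returning the index held by the entry before it is moved are assumptions made for the illustration.

```python
class OrderedSubsampleRef:
    """SubsampleRef array kept ordered, without stored counters."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.entries = []  # first entry is the next one to be removed

    def lookup_or_add(self, props: tuple):
        """Return the index usable as a reference, or None if newly added."""
        if props in self.entries:
            idx = self.entries.index(props)
            # Step 440: the entry used as a reference moves to the end.
            self.entries.append(self.entries.pop(idx))
            return idx
        # Step 430: remove the first entry when the capacity is reached.
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)
        self.entries.append(props)
        return None
```

With this convention, no counter has to be stored, at the price of an index that changes each time an entry is moved; a reader would have to apply the same moves while parsing.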
Possibly, the SubsampleRef array may be reset to an empty array during the encoding. For example, the SubsampleRef array may be reset to an empty array when a random access point is created for the media to allow a reader to parse the 'subs' box without having to reconstruct the content of the SubsampleRef array from the beginning.
Figure 7 describes a first syntax embodiment of a subsample information box to support reference to a previous subsample and reusing its properties.
The syntax of the new SubSampleInformationBox, shown in 790 as proposed for the support of the invention, uses new version values 2 or 3 to enable the optimization of the 'subs' box. Using new values for the version parameter keeps backward compatibility with the 'subs' box as defined in the standard. Optionally, each of these new versions may also be used with new semantics for the flags parameter, as described hereafter, to support additional improvements of the 'subs' box.
For using the syntax shown in 790, the flags parameter 715 is defined as a generic part (for example the first byte) and a codec-specific part (for example the last two bytes). Each part defines its own set of values. For example, the new generic part defines the following values:
* applies_to_all_samples: Flag mask is 0x800000. When set, this value indicates that the subsample information box applies to all samples of the track or track fragment (this avoids repeating a sample_delta always equal to 1).
* regular_sample_pattern: Flag mask is 0x400000. When set, this value indicates that the sample_delta always takes the same value and is declared only once in the SubSampleInformationBox.
* fixed_nb_subsamples_per_sample: Flag mask is 0x200000. When set, this value indicates that the subsample_count can be encoded once for all samples having a subsample description and is not repeated in the loop of entries.
* merged_property: Flag mask is 0x100000. When set, this value indicates that the 'subs' box uses an optimized syntax for the coding of the set of characteristics 210.
The usage of the flags applies_to_all_samples, regular_sample_pattern, fixed_nb_subsamples_per_sample and merged_property is described later and introduces additional optimizations that may apply to any of the embodiments described here. In the following, they are supposed to be set to 0. The last two bytes may be used by derived specifications to define their own flags values, for example to identify the type of the subsamples (NAL units, pictures, tiles, subpictures, slices, G-PCC units, G-PCC tiles, G-PCC slices, G-PCC sub-frames, etc.).
In that case, syntax elements (or parameters, or fields) 700, 705, 706 and 710 are present in the description of subsamples when using the process of Figure 4.
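As a simple illustration, the split of the 24-bit flags parameter into a generic part and a codec-specific part may be decoded as sketched below. The mask values are those listed above; the function name decode_subs_flags, the constant names and the returned dictionary are hypothetical.

```python
APPLIES_TO_ALL_SAMPLES = 0x800000
REGULAR_SAMPLE_PATTERN = 0x400000
FIXED_NB_SUBSAMPLES_PER_SAMPLE = 0x200000
MERGED_PROPERTY = 0x100000
CODEC_SPECIFIC_MASK = 0x00FFFF  # last two bytes, left to derived specifications

def decode_subs_flags(flags: int) -> dict:
    """Split the 24-bit flags value of the box into named fields."""
    if not 0 <= flags <= 0xFFFFFF:
        raise ValueError("flags is a 24-bit value")
    return {
        "applies_to_all_samples": bool(flags & APPLIES_TO_ALL_SAMPLES),
        "regular_sample_pattern": bool(flags & REGULAR_SAMPLE_PATTERN),
        "fixed_nb_subsamples_per_sample": bool(
            flags & FIXED_NB_SUBSAMPLES_PER_SAMPLE),
        "merged_property": bool(flags & MERGED_PROPERTY),
        "codec_specific": flags & CODEC_SPECIFIC_MASK,
    }
```

For instance, a value of 0x800004 would be read as applies_to_all_samples being set together with a codec-specific value of 4.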
The element 710 is used for the description of subsamples that do not refer to any previous subsamples. In that case the description of a sub-sample is the same as the one of Figure 2.
The elements 700, 705 and 706 are the new syntax elements enabling the description of a subsample by referencing a previous subsample.
First, the flag has_reference 700 signals how a subsample is described. If the value of this parameter 700 is 0, then the description of the subsample uses the element 710. If the value of this parameter 700 is 1, then the description of the sub-sample is realized by referring to a previous subsample.
Then, when a subsample refers to a previous one, the value of the parameter reference_id 705 indicates the index, in the SubsampleRef array, of the entry containing the common properties.
The parameter reserved 706 may be added to keep the byte alignment of description 710 as required by the ISOBMFF standard. Its value may be set to 0. Using this syntax has the advantage of allowing a reference to any previous subsample of the bitstream, from the same sample or from any other sample, or from any other sample that is not prior to the previous sync sample (to guarantee random access).
A first alternative to this embodiment may avoid adding the reserved 706 field, by reducing the size of the subsample_size by one bit, using for example the following syntax (new or modified elements are shown in bold):
    for (int j=0; j<subsample_count; j++) {
        if (version==3) {
            unsigned int(31) subsample_size;
        } else {
            unsigned int(15) subsample_size;
        }
        unsigned int(1) has_reference;
        if (has_reference) {
            unsigned int(8) reference_id;
        } else {
            unsigned int(8) subsample_priority;
            unsigned int(8) discardable;
            unsigned int(32) codec_specific_parameters;
        }
    }
In this case, the size of the reference_id value is increased by 1 bit.
A second alternative for this embodiment may enable referencing not only a single subsample but a set of one or more subsamples, by using an internal counter and a pattern length, for example using the following syntax (new or modified elements are shown in bold):
    int k=0;
    for (int j=0; j<subsample_count; j++) {
        if (version==3) {
            unsigned int(32) subsample_size;
        } else {
            unsigned int(16) subsample_size;
        }
        if (k==0) {
            unsigned int(1) has_reference;
            if (has_reference) {
                unsigned int(7) pattern_length;
                unsigned int(8) reference_id;
                k = pattern_length;
            } else {
                unsigned int(7) reserved;
                unsigned int(8) subsample_priority;
                unsigned int(8) discardable;
                unsigned int(32) codec_specific_parameters;
                k = 1;
            }
        }
        k--;
    }
In this embodiment, a length is signaled in addition to the reference_id in order to indicate that a new sequence of subsamples is described by referencing a previously met sequence of subsamples. The first new subsample is described in reference to the subsample associated with reference_id, and the next N following subsamples are described by reference to the N subsamples occurring after the subsample associated with reference_id. In this case, the pattern_length variable is therefore typically equal to N+1, given that the described sequence of subsamples has a length of N+1. If the currently described subsample is not defined using a reference, a reserved (7 bits) field is added to the description to keep the byte alignment. Its value may be set to 0 by default.
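To make the role of the internal counter k more concrete, the subsample loop of this second alternative may be read as sketched below. The sketch assumes that fields are packed most significant bit first, so that has_reference is the top bit of the byte it shares with pattern_length or reserved, and it rejects a pattern_length of 0, a case that the syntax above does not address. The function name parse_pattern_loop and the record keys are hypothetical.

```python
import struct

def parse_pattern_loop(buf: bytes, subsample_count: int, version: int = 2):
    """Read the subsample loop of one sample; return (records, bytes_read)."""
    size_fmt, size_len = (">I", 4) if version == 3 else (">H", 2)
    pos = 0
    k = 0  # subsamples still covered by the current description
    records = []
    for _ in range(subsample_count):
        (size,) = struct.unpack_from(size_fmt, buf, pos)
        pos += size_len
        rec = {"size": size}
        if k == 0:
            first = buf[pos]
            pos += 1
            if first >> 7:  # has_reference == 1
                pattern_length = first & 0x7F
                if pattern_length == 0:
                    raise ValueError("pattern_length of 0 is not handled")
                rec["reference_id"] = buf[pos]
                rec["pattern_length"] = pattern_length
                pos += 1
                k = pattern_length
            else:  # full description, the 7 low bits are reserved
                prio, disc, csp = struct.unpack_from(">BBI", buf, pos)
                pos += 6
                rec["subsample_priority"] = prio
                rec["discardable"] = disc
                rec["codec_specific_parameters"] = csp
                k = 1
        else:
            rec["continues_pattern"] = True  # only the size is coded
        k -= 1
        records.append(rec)
    return records, pos
```

Within a pattern of length N+1, only the first subsample carries the two bytes of reference information; each of the N following subsamples is coded with its size alone.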
Alternative conventions may be chosen for describing the pattern length. For instance, it may be decided that pattern_length should not comprise the first subsample, i.e. be equal to N. As another variant, reference_id may be represented on 6 bits, so that 1 bit may be used to indicate whether a pattern length is indicated or not: if the bit value is 0, no pattern length is added, which means that a single subsample is described through this reference; on the other hand, if the bit value is 1, a pattern length is indicated on 8 bits. In yet another variant, the pattern_length and the reference_id may be stored on 7 bits, for instance using 4 bits for the reference_id and 3 bits for the pattern_length. This limits the range of possible values, but it allows a more compact representation, which may be a better trade-off depending on the considered data.
For this alternative, improvements described for the first alternative may also apply.
When using this alternative, the SubsampleRef array used by the steps described in reference to Figure 4 may contain some pattern usage information to prevent replacing an entry in the middle of an often used sequence of entries. For example, a pattern_usage value may be stored for each entry and may be increased each time the entry is referred to as part of a pattern. This pattern_usage value is then used as the first criterion for deciding which entry to replace inside the SubsampleRef array, selecting the entry with the smallest pattern_usage value.
Possibly, the number of occurrences, the appearance index and the pattern_usage are not stored in the SubsampleRef array. Instead, the SubsampleRef array is kept ordered according to the selection criteria used to decide which entry to replace in it. In this alternative, when several entries are referenced at once as a pattern, all these entries are moved as a block at the end of the SubsampleRef array.
A third alternative may be used when each sample has a similar set of subsamples, the subsamples differing only through their sizes. This alternative makes it possible to describe the common properties of the subsamples once, using for example the following syntax for the subsample information box (this syntax may be enclosed as a new version of the box):
    aligned(8) class SubSampleInformationBox extends FullBox('subs', version=2 or 3, flags) {
        unsigned int(16) subsample_count;
        for (int i = 0; i < subsample_count; i++) {
            unsigned int(8) subsample_priority;
            unsigned int(8) discardable;
            unsigned int(32) codec_specific_parameters;
        }
        unsigned int(32) entry_count;
        for (int i = 0; i < entry_count; i++) {
            for (int j = 0; j < subsample_count; j++) {
                if (version == 3) {
                    unsigned int(32) subsample_size;
                } else {
                    unsigned int(16) subsample_size;
                }
            }
        }
    }
In this alternative, a first loop describes the properties of the subsamples for all the samples. Then a second loop describes the size of each sub-sample for each sample.
In this case, only the subsample size parameter is specified for each sub-sample and the cost of the repetition of the common properties is reduced.
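The reduction of the description cost may be illustrated by counting payload bytes for both layouts. The sketch below compares the prior-art layout of Figure 2 (using the field widths of the ISOBMFF 'subs' box) with the third alternative above, in both cases excluding the FullBox header; the function names and the figures of 1000 samples and 4 subsamples per sample are arbitrary illustrative choices, not measurements.

```python
def prior_art_payload_bytes(entry_count: int, subsample_count: int,
                            wide_sizes: bool = False) -> int:
    """Payload size of the prior-art box, same subsample_count per entry."""
    size_len = 4 if wide_sizes else 2
    per_subsample = size_len + 1 + 1 + 4  # size, priority, discardable, csp
    per_entry = 4 + 2 + subsample_count * per_subsample  # delta, count, loop
    return 4 + entry_count * per_entry  # entry_count field + entries

def third_alternative_payload_bytes(entry_count: int, subsample_count: int,
                                    wide_sizes: bool = False) -> int:
    """Payload size when common properties are described only once."""
    size_len = 4 if wide_sizes else 2
    properties = 2 + subsample_count * (1 + 1 + 4)  # count + first loop
    sizes = 4 + entry_count * subsample_count * size_len  # second loop
    return properties + sizes

if __name__ == "__main__":
    print(prior_art_payload_bytes(1000, 4))          # 38004
    print(third_alternative_payload_bytes(1000, 4))  # 8030
```

In this illustrative configuration, the payload drops from 38004 bytes to 8030 bytes, the remaining cost being almost entirely the subsample sizes, which generally cannot be shared.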
An improvement to describe the size of a subsample that refers to a previous subsample, applicable to any of the previous alternative descriptions, may be to encode the size of the sub-sample as a difference with the size of the referred-to subsample. For example, a subsample_size_delta parameter, encoded as a signed integer, may encode this difference. The size of the sub-sample is the sum of the value of the subsample_size_delta parameter and of the value of the subsample_size for the referred-to subsample. This subsample_size_delta parameter may be specified in place of the subsample_size parameter.
Figure 5 illustrates the encapsulation process according to a second embodiment of the invention. This process is performed by the encapsulation module 320. According to this embodiment, common parts in the subsample descriptions are stored in a dedicated box. The description of the subsamples may refer to these common parts. In the following, these common parts are called patterns and the dedicated box is referred to as a pattern box, identified by a dedicated four-character code (for example 'sspb' for Sub-Sample Pattern Box or 'sspc' for Sub-Sample Pattern Container).
The process starts at step 500 by configuring the file writer 300. This step is similar to the one described in step 400 of Figure 4.
Then, step 510, similar to step 405 of Figure 4, initializes the information for the generation of some of the boxes of the ISOBMFF file 325.
The next step 520 is then realized, to create a new PatternBox description structure used to store common properties. The syntax of this pattern box is described hereafter in relation with Figure 8b. The PatternBox, when added to an ISOBMFF file, may apply to all the samples of the described bitstream. Therefore, it may appear in the box hierarchy inside a 'moov' box, or in a 'stbl' box. This PatternBox may be identified by a 4CC 'patt' or 'patb' or any dedicated 4CC for the description of subsample patterns that does not conflict with any existing 4CC.
In a variant, the PatternBox may also be defined in the box hierarchy inside a MovieFragmentBox 'moof', similarly to the SubSampleInformationBox, to apply only to the samples of the fragment. The same 4CC 'patt' or 'patb' may be used.
In a variant, the PatternBox may only apply to a group of samples, e.g., using the ISOBMFF sample group mechanism. The PatternBox can be defined as a SampleGroupDescriptionEntry in a SampleGroupDescriptionBox.
The PatternBox description structure is created depending for example on the encapsulation mode (single-track or multi-tracks) and on the characteristics of the bitstream such as the number of attributes for point cloud data or the number of sub-pictures for a video bitstream. This information is either provided as input through the user interface (in the case where the encoder is integrated in the file writer) or determined by reading parameter sets of the bitstream to encapsulate.
For example, for the encapsulation of a point cloud bitstream, the number of attributes may be detected by parsing information from the sequence parameter set.
Knowing the number of attributes (noted nb_adus), the number of subsamples for one sample containing a single point cloud frame is determined as follows. It corresponds to one geometry data unit plus the number of attribute data units, since point cloud bitstream conformance requires all attribute data units to be present (either as an ADU or as a defaulted ADU). In the case of several point cloud frames (for example when concatenating the point clouds captured by N different LiDARs in the same bitstream), a sample will contain N * (1 + nb_adus) subsamples.
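The count above can be written as a one-line computation. The function name and the nb_frames argument (standing for N) are introduced here for illustration only.

```python
def subsamples_per_sample(nb_adus: int, nb_frames: int = 1) -> int:
    """One geometry data unit plus nb_adus attribute data units, per frame."""
    if nb_adus < 0 or nb_frames < 1:
        raise ValueError("nb_adus must be >= 0 and nb_frames must be >= 1")
    return nb_frames * (1 + nb_adus)


print(subsamples_per_sample(2))      # single frame with two attributes
print(subsamples_per_sample(2, 4))   # four concatenated frames
```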
It is then possible to determine a subsample pattern describing common properties in the PatternBox. The subsample patterns in the PatternBox may be determined using different methods. A first method may determine exhaustively all the possible subsample patterns based on the considered technology, without parsing the bitstream to encapsulate. If said technology allows many types of subsamples, the number of patterns may be high, resulting in a significant description cost. Therefore, a second method may only determine a subset of the most likely patterns to be used as reference description based on generic assumptions for the considered technology (e.g. consider only the most frequently used subsample types). Finally, a third one may dynamically detect patterns while parsing the bitstream and may determine which patterns to store in the pattern box to minimize the description size.
As another example, for video bitstream encapsulation, the number of sub-pictures could be set by the user and the configuration of the encoder may require using one NAL unit per sub-picture. Therefore, for the video encoded bitstream, the number of subsamples may correspond to the number of sub-pictures. A PatternBox may then be determined in order to store the common properties applying to any of the samples of the bitstream.
The patterns may also be provided using a different granularity for subsample description. A first granularity may define patterns used for the whole sample, which may be of interest when all samples are encoded in the same way. A second granularity level may only describe a set of contiguous subsamples inside a sample.
Then in step 530, the encapsulation process is realized by reading one sample of the media bitstream starting from the first one. Similarly to step 410 of Figure 4, during the read operation of the sample, the file writer determines if the sample contains or not subsamples. At the end of this step, the file writer has determined if subsamples are present or not in the current sample. Moreover, if there are some subsamples, it has determined the number of subsamples and the characteristics of the subsamples such as for example the type and the size of each sub-sample. This information is then stored in a SubsampleArray structure in memory.
If there are no subsamples, then the next step of the process is step 580, skipping all operations performed to describe subsamples (this branch is not represented, for simplification). When subsamples are present, the process determines during step 550 the pattern of the PatternBox that corresponds to this set of subsamples. For this, knowing the length N of each pattern, the process checks for each pattern whether the set of characteristics of the N following subsamples, starting from the current sub-sample, matches the common properties of the pattern description. Length N is obtained from the pattern_length parameter, described in relation with Figures 8a and 8b.
When an exhaustive description of all the possible patterns is contained in the PatternBox, one pattern matching a sequence of subsamples starting at the current sub-sample will always be found. In the case where the PatternBox defines only a subset of all the possible patterns, it may happen that no reference is found for the sequence of subsamples. In this case, the sequence of subsamples is described directly inside the subsample information box without any reference to a pattern, for example using the same description as in the prior art.
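The matching of step 550, together with the fallback to an explicit description, may be sketched as follows. This is a minimal sketch under stated assumptions: the Props tuple, find_pattern and describe_sample are hypothetical names, a pattern is modelled as a plain list of property tuples, and the first matching pattern is taken (the text does not specify a preference order between several matching patterns).

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Props(NamedTuple):
    """Properties assumed to be shared through a pattern (illustrative choice)."""
    subsample_priority: int
    discardable: int
    codec_specific_parameters: int


def find_pattern(patterns: Sequence[Sequence[Props]],
                 subsamples: Sequence[Props], start: int) -> Optional[int]:
    """Return the index of the first pattern matching the subsamples at `start`."""
    for pattern_id, pattern in enumerate(patterns):
        n = len(pattern)  # plays the role of pattern_length
        # A slice that runs past the end is shorter than the pattern and cannot match.
        if n and tuple(subsamples[start:start + n]) == tuple(pattern):
            return pattern_id
    return None


def describe_sample(patterns: Sequence[Sequence[Props]],
                    subsamples: Sequence[Props]) -> List[Tuple[str, object]]:
    """Walk one sample, emitting pattern references or explicit descriptions."""
    out: List[Tuple[str, object]] = []
    pos = 0
    while pos < len(subsamples):
        pattern_id = find_pattern(patterns, subsamples, pos)
        if pattern_id is None:
            out.append(("explicit", subsamples[pos]))  # prior-art style description
            pos += 1
        else:
            out.append(("pattern", pattern_id))
            pos += len(patterns[pattern_id])
    return out


geom, attr, other = Props(0, 0, 1), Props(0, 0, 2), Props(0, 1, 9)
print(describe_sample([[geom, attr, attr]], [other, geom, attr, attr]))
```

The loop in describe_sample mirrors the iteration between steps 550 and 570: it stops once every subsample of the sample has been described.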
In all cases, during the next step 560, the file writer 300 adds to the subsample information box (125 or 135) the information describing the current set of subsamples, as later explained in relation with Figures 8a and 8b, with or without reference to the common subsample description(s) provided in the PatternBox.
Then in step 570, it is checked whether all the subsamples of the current sample are described in the subsample information box. This may be done by verifying that the processing of the sample has reached the end of the sample. If the test is negative, there is still some subsample information to add for describing the sample, and step 550 is performed again to process the remaining subsamples. Otherwise, all the subsamples have been described and added to the 'subs' box. Another check is then performed in step 580 to determine whether there are more samples to add to the ISO Base Media file.
If another sample is present (test 580 positive), then steps 530 to 570 are realized for processing the next sample and its subsample descriptions. Otherwise, the creation of ISO Base Media file ends, possibly with creation of boxes for indexing or random access information.
Figures 8a and 8b illustrate a first syntax embodiment of a subsample information box supporting references to an explicit pattern description and its common properties. The subsample information box syntax 800, as proposed for the support of the invention, uses new version values 2 or 3 to enable further optimizations of the 'subs' box using specific semantics for the flags parameter, as described for the element 790 of Figure 7. The version parameter also keeps backward compatibility with the previous definition of the 'subs' box.
The usage of the flags applies_to_all_sample, regular_sample_pattern, fixed_nb_subsamples_per_sample and merged_property is described later on and may apply to any other described embodiment. For this section, these generic flags values are not set.
In this case, the syntax elements 810, 840 and 845 in the subsample information box, and the PatternBox 805 are present in the ISOBMFF for supporting the process of Figure 5.
In the subsample information box, the element 845 is used for the description of subsamples that do not refer to any pattern from the PatternBox. In this case the description is the same as the one of Figure 2. This may occur when a non-exhaustive pattern description is produced in step 520. It reduces the PatternBox size and supports samples containing additional parameter set data units or occasional data units (such as a tile inventory data unit for point cloud data). In this case, the parameter has_reference 840 is set to 0.
The elements 840 and 810 are the new syntax elements in the subsample information box enabling to reference a set of common properties 825 contained in the PatternBox 805.
First, the flag has_reference 840 differentiates between the description of a set of subsamples where the whole description 845 is comprised in the 'subs' box (when set to 0) and one that refers to a pattern described in the additional PatternBox (when set to 1).
Then, when the set of subsamples refers to a pattern, the value pattern_id 810 indicates the identifier of the corresponding description in the PatternBox, which contains the common properties 825. Said identifier is typically the index of the pattern in the PatternBox.
The parameter reserved 850 is added to keep the byte alignment of the description 800 when the subsample has no reference to the PatternBox.
The PatternBox as described in 805 supports a granularity for subsample description at the sample level. It defines patterns used for a whole sample, as declared in the loop over the entry_count value, where each entry corresponds to a sample. When setting the pattern identifier 810, counter k is initialized to the pattern_length value of the corresponding pattern description in the PatternBox (defined in step 520).
This counter then makes it possible to determine, inside the pattern, which common properties a subsample refers to.
In an alternative, to support the description of a subset of subsamples inside a sample, the information 840, 810 and 850 may be removed from the entry_count loop and may be declared in the subsample_count loop as follows:
    for (int j=0; j < subsample_count; j++) {
        if (version == 3) {
            unsigned int(32) subsample_size;
        } else {
            unsigned int(16) subsample_size;
        }
        if (k == 0) {
            unsigned int(1) has_reference;
            if (has_reference == 1) {
                unsigned int(15) pattern_id;
                k = pattern_length;
            } else {
                unsigned int(7) reserved;
                unsigned int(8) subsample_priority;
                unsigned int(8) discardable;
                unsigned int(32) codec_specific_parameters;
                k = 1;
            }
        }
        k--;
    }
In another alternative, the pattern_id parameter may be repeated for each sub-sample in the pattern, and the implicit position inside the pattern is computed to determine which set of common properties a subsample refers to.
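For illustration, a reader for the subsample_count loop variant (the one in which has_reference and pattern_id are declared per run of subsamples and tracked with counter k) might look like the sketch below. It assumes big-endian fields, as is usual in ISOBMFF, and a pattern_lengths list standing in for the pattern_length values read from the PatternBox; the function name and the returned dictionaries are hypothetical.

```python
import struct
from typing import List, Optional, Sequence, Tuple


def parse_subsample_loop(buf: bytes, subsample_count: int, version: int,
                         pattern_lengths: Sequence[int]) -> List[dict]:
    """Parse the per-subsample fields, tracking the position inside a pattern."""
    pos = 0
    k = 0                                   # subsamples still covered by the run
    pattern_id: Optional[int] = None
    length = 1
    explicit: Optional[Tuple[int, int, int]] = None
    out: List[dict] = []
    for _ in range(subsample_count):
        fmt = ">I" if version == 3 else ">H"
        (size,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        if k == 0:
            if buf[pos] & 0x80:             # has_reference == 1
                (word,) = struct.unpack_from(">H", buf, pos)
                pos += 2
                pattern_id = word & 0x7FFF  # 15-bit pattern_id
                length = pattern_lengths[pattern_id]
                if length < 1:
                    raise ValueError("pattern_length must be at least 1")
                explicit = None
            else:
                # 1 flag bit + 7 reserved bits, then priority, discardable,
                # codec_specific_parameters (8 + 8 + 32 bits).
                explicit = struct.unpack_from(">BBI", buf, pos + 1)
                pos += 7
                pattern_id, length = None, 1
            k = length
        if pattern_id is None:
            out.append({"size": size, "properties": explicit})
        else:
            out.append({"size": size, "pattern_id": pattern_id,
                        "offset": length - k})
        k -= 1
    return out


data = bytes.fromhex("0064" "8000" "0032" "0007" "00" "01" "00" "00000005")
for item in parse_subsample_loop(data, 3, 2, [2]):
    print(item)
```

The "offset" value is the implicit position of the subsample inside its pattern, which is what lets the reader pick the right set of common properties.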
In another alternative, when an exhaustive pattern description is used in the PatternBox, the has_reference flag may be omitted and only the pattern_id parameter may be present (either for all samples or for each set of subsample descriptions). In this case the pattern_id parameter may be encoded using 16 bits. Indeed, in that case all samples necessarily refer to a pattern, hence has_reference is implicitly always 1.
The PatternBox syntax 805 proposed for pattern description may use the codec-specific part of the flags parameter 715 to make association between the subsample information box 800 and the referenced PatternBox 805. It enables to support different subsample information corresponding to different sets of common properties 825.
The PatternBox may comprise, for example, the following syntax elements:
pattern_count 815 defines the number of patterns described in the PatternBox.
pattern_length 820 defines the size of the pattern description. It supports different sizes of patterns for a sample, for a sequence of subsamples inside a sample or for a sequence of sub-samples spanning one or more samples.
The common properties 825 then signal the information describing the set of subsamples in the pattern, which corresponds to the subsample descriptions referencing the pattern.
In some cases, the use of the PatternBox has the drawback of being less efficient than the solution proposed by the first embodiment. Indeed, the description of the PatternBox introduces additional costs compared to referencing a previous subsample as in the first embodiment. However, using the PatternBox is advantageous when the number of samples to which it applies is high. In both embodiments, the shared properties are stored once and only once in the media file and shared by reference by all the corresponding subsample descriptions.
In addition, the PatternBox enables random access by a reader to any sample, as the referencing mechanism does not require parsing all the samples in sequence. This mechanism may be used for describing a pattern applying to a single subsample, by similarity to the first embodiment, by setting the parameter pattern_length to the value 1, which provides a random access capability at the subsample level.
A first alternative for the PatternBox mechanism is to define a parameter pattern_length 830 that applies to all the patterns instead of the parameter pattern_length 820 that applies to a single pattern. Using this common parameter may be signaled by setting the value of fixed_nb_subsamples_per_sample to 1. This may be the case, for example, depending on the pattern granularity, when either the pattern applies at the sample level and all the samples have the same number of subsamples, or when the pattern applies at the subsample level and all the patterns contain the same number of subsamples.
Note that a pattern may be restricted to apply only to a full sample. It may also apply only to a set of subsamples inside a single sample. It may also apply only to one or more full samples. It may also apply to any sequence of sub-samples contained in one or more samples.
Figure 6 illustrates a parsing method according to an embodiment of the invention. The media player 350 may use the de-encapsulation module 360 to parse the provided ISOBMFF file 335. The parsing process may use implicit references to a previous subsample, or explicit references to a pattern, to identify the information corresponding to subsamples, and may use this information for the correct reconstruction of a compliant bitstream or possibly to extract a subset of data corresponding to specific subsamples. Whether such references may be used is typically indicated through the version number of the 'subs' box; for instance, version 0 or 1 indicates explicit encoding of the sub-sample information, while version 2 or 3 indicates an encoding that may use references to previous subsamples. Version 2 or 3 may also make use of generic (as opposed to codec-specific) flags values. For example, in the case of point cloud data, the extraction may consist in keeping only the geometry data unit and a subset of the attribute data units. In the case of video data, it may consist in keeping only the NAL units corresponding to a specific set of sub-pictures.
The first step 600 comprises receiving the media file to be parsed. This media file may be streamed from a distant location using the communication network 330 or it may be read from a storage location.
The step 605 comprises initializing the de-encapsulation module 360. This is done by parsing the top-level boxes of the media file, for example the 'moov' box and its hierarchy of boxes, e.g., the 'trak' boxes and sample description boxes. When the media player contains a decoder module 370, the decoder may also be initialized during this step, for example using decoder configuration information from the sample description (e.g. the G-PCC configuration box 'gpcC').
At step 610 a sample is read. The de-encapsulation module 360 reads the sample description from the metadata part of the media file 335 to locate the corresponding data in a media data box.
At step 615, the de-encapsulation module 360 checks whether the usage of the ISOBMFF file requires access to details of the subsample information. Indeed, some samples may have no subsample description. For example, when the sample_delta information is set to a value greater than 1, the next subsample description does not apply to the following sample.
When the operation requires accessing subsample information, then step 620 reads the subsample description provided in the ISO Base Media file.
Then, in step 625, for each subsample composing the sample, a first check is done to verify whether the subsample description refers to common properties. This is for example realized, in either the first or the second embodiment, by checking whether the has_reference flag is set to 1 in the subsample description. As an example, in the first embodiment, if we assume that references are used to indicate similar values for subsample_priority, discardable and codec_specific_parameters, has_reference equal to 0 means that subsample_priority, discardable and codec_specific_parameters are explicitly indicated for the considered sub-sample. With the same assumptions, has_reference equal to 1 means that the values of these properties are identical to the corresponding ones for the sub-sample indicated by reference_id, which is an integer that specifies the index of a sub-sample in a reference table. Each entry in said table may comprise a set of values for subsample_priority, discardable and codec_specific_parameters, a number of occurrences, and an appearance index. The reference table is typically initially empty. In addition, to preserve random access, the reference table is also typically reset on each sync sample.
In the case where there is no reference to another sub-sample or set of subsamples, the description is directly read, at step 630, from the subsample description using the set of parameters 710 or 845. In addition, when using references to a previous subsample, as in the first embodiment, the current subsample description is stored in a SubsampleArray structure to be used if referenced from a later subsample. To provide more details on this case, and for illustrative purposes, let us consider that the reference table has a maximum number of entries equal to 128. When a subsample for which has_reference equals 0 is met, a new entry is created based on the values of subsample_priority, discardable and codec_specific_parameters, with a number of occurrences of 0, and an appearance index equal to the number of subsamples that have been previously processed. It is then checked whether the reference table comprises fewer than 128 entries. If so, the new entry is appended to the table. If not, the new entry replaces the entry of the table with the lowest number of occurrences (1st criterion) and the lowest appearance index (2nd criterion).
When a reference exists, then step 635 is performed to find the referenced description. This may comprise, in the case of a reference to a previous subsample, accessing the information stored in the SubsampleArray at the index indicated by reference_id. For instance, in this case, and if we assume the same context as the one described in the previous paragraph for illustrative purposes, when a sub-sample for which has_reference equals 1 is met, the entry of the reference table with the corresponding reference_id is updated by incrementing its number of occurrences by 1 and by setting its appearance index to the number of sub-samples previously processed. If the pattern_length parameter is also present, step 635 also comprises obtaining the common properties for the subsequent subsamples based on the values of the pattern_length parameter and the reference_id parameter. For an explicit pattern description, the PatternBox with the same codec-specific part of the flags parameter is identified and the pattern (containing the common properties) referred to by the value of the pattern_id parameter is extracted. If the pattern_id parameter is indicated only for the first subsample corresponding to a pattern, then the description for all the subsamples corresponding to the pattern is obtained. Otherwise, if the pattern_id parameter is indicated for each subsample corresponding to a pattern, step 635 is limited to retrieving the description for the current sub-sample.
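The reference table behaviour described for steps 630 and 635 may be sketched as follows. This is only an illustrative sketch: the class and method names are hypothetical, the properties are modelled as a tuple (subsample_priority, discardable, codec_specific_parameters), and the choice to keep the processed-subsample counter running across a reset is an assumption, since the text only states that the table itself is reset on sync samples.

```python
from dataclasses import dataclass
from typing import List, Tuple

Props = Tuple[int, int, int]  # (subsample_priority, discardable, codec_specific_parameters)


@dataclass
class Entry:
    props: Props
    occurrences: int
    appearance_index: int


class ReferenceTable:
    def __init__(self, max_entries: int = 128) -> None:
        self.max_entries = max_entries
        self.entries: List[Entry] = []
        self.processed = 0  # number of sub-samples processed so far

    def reset(self) -> None:
        """Empty the table, e.g. on a sync sample (counter kept: an assumption)."""
        self.entries.clear()

    def add_explicit(self, props: Props) -> int:
        """has_reference == 0: record the properties; return the slot used."""
        entry = Entry(props, 0, self.processed)
        if len(self.entries) < self.max_entries:
            self.entries.append(entry)
            slot = len(self.entries) - 1
        else:
            # Evict the entry with the lowest number of occurrences, then the
            # lowest appearance index.
            slot = min(range(len(self.entries)),
                       key=lambda i: (self.entries[i].occurrences,
                                      self.entries[i].appearance_index))
            self.entries[slot] = entry
        self.processed += 1
        return slot

    def resolve(self, reference_id: int) -> Props:
        """has_reference == 1: fetch the properties and update usage statistics."""
        entry = self.entries[reference_id]
        entry.occurrences += 1
        entry.appearance_index = self.processed
        self.processed += 1
        return entry.props


table = ReferenceTable(max_entries=2)
table.add_explicit((1, 0, 10))
table.add_explicit((2, 0, 20))
table.resolve(0)
print(table.add_explicit((3, 0, 30)))  # slot reused for the new entry
```

Because the writer and the reader apply the same deterministic update rules, both sides hold identical tables and a reference_id resolves to the same properties on each side.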
Then, using the common properties and the remaining additional description of the subsample information box (typically the subsample_size), the data unit corresponding to the sub-sample is retrieved and provided to the decoder at step 640. As described in reference to step 635, it may happen that the reference corresponds to several subsamples; in this case, step 640 is applied for all these subsamples. Then, step 645 performs a check to identify whether more subsamples are present for the current sample. If so, steps 620 to 645 are iterated.
If no further subsamples are present, an additional check 650 is performed by the media player to determine whether there is any further sample to process. If this is the case, the process continues at step 610. Otherwise, the parsing of the media file is finished. This step 650 is also performed after step 620 when access to the subsample information is not required.
In addition to all the previously described embodiments, further optimizations of the subsample information box may be contemplated.
Further modifications of the syntax of the subsample information box may reduce the cost of the subsample description. For that purpose, the different fields introduced in the generic part of the flags parameter may be used. These proposed modifications either reduce the size of the property description or remove information that may be deduced from other ISOBMFF boxes. The different fields may be used separately or simultaneously in any combination.
When the flag field applies_to_all_sample is set to 1 in the generic part of the flags parameter 715, it means that all the samples contain subsamples. It enables a first reduction of the subsample description by removing the parameters entry_count 720 and sample_delta 735. Indeed, when the description applies to all samples of a track, the sample_delta value may be inferred to have the value 1 and the entry_count (720) is the number of samples, which is known by the media player, for example by reading the entry count information contained in the TimeToSampleBox ('stts') or SampleSizeBox ('stsz') boxes.
For example, the file writer 300 may set this field to 1 when the user requires an encoding applying the same subsample organization to all samples, through an encoder configuration. For offline encapsulation, the writer may analyse the media file in a two-pass manner so as to determine whether all samples have a subsample description or not. For some applications, the writer may determine it from configuration settings or by construction: for example, when encapsulating point cloud data as a single track according to ISO/IEC 23090-18, it knows that all the G-PCC units will be described as sub-samples, for every sample.
In an alternative, the number of consecutive samples that are described in the 'subs' box may be signalled by a new sample_count parameter, for example using the following syntax:
    aligned(8) class SubSampleInformationBox extends FullBox('subs', version = 2 or 3, flags) {
        unsigned int(32) entry_count;
        for (int i=0; i < entry_count; i++) {
            unsigned int(32) sample_delta;
            unsigned int(32) sample_count;
            for (int k=0; k < sample_count; k++) {
                unsigned int(16) subsample_count;
                if (subsample_count > 0) {
                    for (int j=0; j < subsample_count; j++) {
                        if (version == 3) {
                            unsigned int(32) subsample_size;
                        } else {
                            unsigned int(16) subsample_size;
                        }
                        unsigned int(1) has_reference;
                        if (has_reference == 1) {
                            unsigned int(7) reference_id;
                        } else {
                            unsigned int(7) reserved;
                            unsigned int(8) subsample_priority;
                            unsigned int(8) discardable;
                            unsigned int(32) codec_specific_parameters;
                        }
                    }
                }
            }
        }
    }
Possibly, in the loop over the entry_count value, the variable i is increased by the value of the sample_count parameter.
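As a rough illustration of the sample_count variant, the sketch below serializes the payload of such a box. It assumes big-endian fields and omits the FullBox header (size, type, version and flags); pack_subsample and pack_payload are hypothetical helper names, and the input layout (a list of (sample_delta, samples) pairs) is chosen for the example only.

```python
import struct
from typing import Optional, Sequence, Tuple


def pack_subsample(size: int, version: int,
                   reference_id: Optional[int] = None,
                   props: Optional[Tuple[int, int, int]] = None) -> bytes:
    """Serialize one subsample, either by reference or with explicit properties."""
    out = struct.pack(">I" if version == 3 else ">H", size)
    if reference_id is not None:
        if not 0 <= reference_id < 128:
            raise ValueError("reference_id must fit in 7 bits")
        out += struct.pack(">B", 0x80 | reference_id)   # has_reference = 1
    else:
        if props is None:
            raise ValueError("explicit description requires props")
        priority, discardable, csp = props
        # has_reference = 0 and 7 reserved bits share the first byte.
        out += struct.pack(">BBBI", 0, priority, discardable, csp)
    return out


def pack_payload(entries: Sequence[Tuple[int, Sequence[Sequence[dict]]]],
                 version: int = 2) -> bytes:
    """entries: (sample_delta, samples); each sample is a list of subsample dicts."""
    out = struct.pack(">I", len(entries))                     # entry_count
    for sample_delta, samples in entries:
        out += struct.pack(">II", sample_delta, len(samples))  # delta, sample_count
        for subsamples in samples:
            out += struct.pack(">H", len(subsamples))          # subsample_count
            for sub in subsamples:
                out += pack_subsample(version=version, **sub)
    return out


payload = pack_payload([(1, [[{"size": 100, "props": (0, 0, 1)},
                              {"size": 40, "reference_id": 0}]])])
print(payload.hex())
```

In this example the second subsample costs 3 bytes (size plus one reference byte) instead of the 9 bytes of a fully explicit description.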
When the flag field regular_sample_pattern is set to 1 in flags 715, it means that the samples containing subsamples are regularly spaced inside the media and therefore the sample_delta parameter value is the same for all the entries of the 'subs' box. It reduces the subsample description by defining the sample_delta value 730 only once in the subsample information box and removing its definition 735 at the entry level. This modification avoids repeating the sample_delta parameter for all entries of the 'subs' box.
For example, this field may be set to 1 by file writer 300 when all the samples contain only one subsample except one in fifty samples that can be used as random access points and that contain a first subsample containing some encoding parameters and a second subsample containing the actual encoded media frame.
Note that when using this regular_sample_pattern field, an initial sample_delta parameter may also be encoded to indicate the value of the sample_delta for the first entry contained in the 'subs' box.
When the flag field fixed_nb_subsamples_per_sample is set to 1 in flags 715, it means that all the samples are organized with the same number of subsamples. It reduces the subsample description by defining subsample_count 740 at the higher level for all the samples, and by removing its definition 745 at the entry level.
This modification avoids repeating the subsample_count parameter for each entry in the 'subs' box.
For example, this field may be set to 1 by file writer 300 when a point cloud bitstream is encapsulated in a single track data where VolumetricVisualSampleEntry is defined with a sample entry type gpcli: each sample contains one subsample corresponding to a geometry data unit and one or more subsamples corresponding to attribute data units.
As a variant, the subsample_count parameter may be replaced by an array of subsample count values, making it possible to describe the number of subsamples for each sample when this number is not constant for all the samples but follows a regular pattern.
When the flag field merged_property is set to 1 in flags 715, it means that an optimized syntax is used for coding the description of a sub-sample. It reduces the subsample description by using a more compact description. In this case the set of properties 710 is replaced by a single sub_sample_property parameter 750.
This parameter may store only the value of the codec_specific_parameters parameter.
For example, for some standards like SVC, the codec_specific_parameters parameter already includes a DiscardableFlag parameter and a PriorityId parameter. Therefore the merged_property parameter may be set to 1 by the file writer when encapsulating this type of bitstream, avoiding the repetition of information or the presence of parameters that are not always specified for ISOBMFF-derived specifications (for example the discardable or subsample_priority parameters). In this case, the sub_sample_property parameter is used to store the value of the codec_specific_parameters parameter.
It is to be noted that the new versions of the 'subs' box described in the previous embodiments may be described as a new box, still for subsample description, but in a more compact way. This box may then have a different 4CC than the 'subs' box, for example 'csbs' for Compact Subsample information box. This box may contain the various embodiments encoding subsample information by reference. It may also contain the new usage of flags values with a generic part and a codec-specific part. It may be stored in tracks or track fragments. Several occurrences of this box may be present in the same container box provided that their codec-specific flags values differ. Having the compact subsample description in a dedicated box may simplify writers and parsers, by avoiding both the version handling and its combination with the new usage of the flags value (containing a generic part or not). The same distinction may apply to sub-sample properties for a collection or a sequence of media items. A brand may indicate when this new box is in use, so that parsers can determine whether they can process the subsample information or not.
Figure 9 is a schematic block diagram of a computing device 900 for the implementation of one or more embodiments of the invention. The computing device 900 may be a device such as a microcomputer, a workstation, or a light portable device. The computing device 900 comprises a communication bus 902 connected to:
- a central processing unit (CPU) 904, such as a microprocessor;
- a random access memory (RAM) 908 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encapsulating, indexing, de-encapsulating, and/or accessing data, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read-only memory (ROM) 906 for storing computer programs for implementing embodiments of the invention;
- a network interface 912 that is, in turn, typically connected to a communication network 914 over which digital data to be processed are transmitted or received. The network interface 912 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 904;
- a user interface (UI) 916 for receiving inputs from a user or for displaying information to a user;
- a hard disk (HD) 910; and/or
- an I/O module 918 for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read-only memory 906, on the hard disk 910 or on a removable digital medium such as a disk, for example. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 912, in order to be stored in one of the storage means of the communication device 900, such as the hard disk 910, before being executed.
The central processing unit 904 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 904 is capable of executing instructions from main RAM memory 908 relating to a software application after those instructions have been loaded from the program ROM 906 or the hard disk (HD) 910, for example. Such a software application, when executed by the CPU 904, causes the steps of the flowcharts shown in the previous figures to be performed.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Although the present invention has been described herein above with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a person skilled in the art which lie within the scope of the present invention.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
Figure 10 is another example of metadata structures providing subsample description. Contrary to the previous embodiments, this embodiment is not based on a subsample information box.
The subsamples are illustrated by reference 1010, which may correspond to a media data box ('mdat', 'idat' or 'imda') of the media file. Reference 1010 is a data container for bytes of data (e.g. video data, audio data, text data or volumetric data). Each sample may or may not have subsamples. For example, sample A has 3 subsamples, whereas sample E (1011) has no subsample. As another example, sample X (1012) has a subsample description only for its second part 1012-2.
Some samples may have the same number of subsamples (like samples A and Z, which have 3 subsamples, or samples M and N, which have 2 subsamples). Some subsamples in different samples may also share the same properties. For example, the first subsample of sample A and the first subsample of sample Z may both be a geometry data unit. As another example, the subsamples for samples N and M may correspond to sub-pictures or tiles or slices having the same position inside the images of an image sequence.
Most of the time, especially for compressed data, the size of the subsamples may vary within each sample.
In this embodiment, the metadata structures providing subsample description are generated by a file format writer during an encapsulation step. These metadata structures, when present in a media file, are used by file readers (or parsers) to get information on subsamples, for example for partial access or partial extraction or partial decoding.
According to the embodiment illustrated in Figure 10, there may be four metadata structures to describe the subsamples.
The box 1020 is a metadata structure grouping samples having the same subsample patterns. It may be implemented in ISOBMFF or specifications deriving from ISOBMFF as a SampleToGroupBox 'sbgp'. For example, samples A and Z may be mapped to the same entry in the Mapping Table 1030, for example using the group_description_index parameter of the SampleToGroupBox. This usage means that group_description_index=1 links samples having this group_description_index value to the first entry in the mapping table 1030 (subsample pattern 1031 in the example of Figure 10). A specific grouping type may be used to indicate that the purpose of sample grouping is for subsample description.
The specific grouping type value may be a 4CC value like 'subs' or 'ssso' for samples' sub-samples offsets. The grouping_type_parameter of the 'sbgp' may be used to indicate the type of the subsamples: picture, frame, subframe, sub-picture, slice, tile, coding tree unit, TLV unit, etc. A dedicated 4CC may be defined for each subsample type (for example, 'tlvu' for TLV unit, 'slce' for slice, 'tile' for tile, etc.) so that a writer and a reader can understand unambiguously the kinds of subsamples that are described in the media file. The entry_count and sample_count parameters keep the same semantics as defined in ISOBMFF. Samples that have no subsample or no subsample description, like sample E, may use a group_description_index with value 0 to indicate that they are not mapped to any subsample pattern.
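The mapping performed by the metadata structure 1020 can be sketched as follows. This is a minimal illustration in Python, not the method of the embodiment: it assumes the 'sbgp' entries have already been parsed into (sample_count, group_description_index) pairs, and the function names and the example runs are hypothetical.

```python
from typing import List, Optional, Tuple


def expand_sample_to_group(entries: List[Tuple[int, int]]) -> List[int]:
    """Expand (sample_count, group_description_index) runs into one index per sample."""
    per_sample: List[int] = []
    for sample_count, group_description_index in entries:
        per_sample.extend([group_description_index] * sample_count)
    return per_sample


def pattern_position(indices: List[int], sample_number: int) -> Optional[int]:
    """Return the 0-based position of the sample's pattern in the pattern list.

    A group_description_index of 0 means the sample is not mapped to any
    subsample pattern (like sample E), so None is returned.
    """
    gdi = indices[sample_number]
    return None if gdi == 0 else gdi - 1


# Hypothetical runs: one sample on pattern 1, one unmapped sample,
# then two consecutive samples on pattern 2.
runs = [(1, 1), (1, 0), (2, 2)]
indices = expand_sample_to_group(runs)
```

Because the runs follow sample order, a reader only needs the running sample number to find the pattern of the current sample.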
The box 1030 is a metadata structure defining a list of subsample patterns. A subsample pattern contains a number of subsamples with a list of their associated subsample properties. For example, subsample pattern 1031 consists of 3 subsamples: the first is associated with the first property of a property container, the second with the second property of the same property container, and the third with the third property of the same property container. A subsample pattern may contain an identifier. When no identifier is present, a subsample pattern may be referenced via an implicit index corresponding to its declaration order in the metadata structure 1030. For example, the first entry in the Mapping Table 1030 is the subsample pattern 1031, the second subsample pattern is 1032, etc. The Mapping Table 1030 may be implemented in ISOBMFF as a SampleGroupDescriptionBox 'sgpd'. The grouping_type value of this 'sgpd' may be used to indicate that the description is about subsample information. For this purpose it may be set to the same value as the one used in the 'sbgp' implementing the metadata structure grouping samples having the same subsample patterns 1020, for example 'subs' or 'ssso'. The 'sgpd' for subsample patterns may contain a number of patterns, for example indicated by the entry_count parameter. In the example 1030, the number of patterns is 3. Then, each subsample pattern may be defined as a specific SampleGroupEntry, as follows:
abstract class SubsampleSampleGroupEntry (unsigned int(32) grouping_type) extends SampleGroupDescriptionEntry (grouping_type) {
    unsigned int(16) nb_subsamples;
    unsigned int(16) property_index[nb_subsamples];
}
where:
- the nb_subsamples parameter indicates the number of subsamples within a pattern (3 for 1031, 2 for 1032 and 2 for 1033).
- property_index indicates, for each subsample of a pattern, a reference to a property or to a list of properties associated with this subsample ("(1)" for the 1st subsample of the subsample pattern 1031, "(6)" for the 2nd subsample of the last subsample pattern 1033 declared inside 1030). The reference to a property or to a list of properties may correspond to an entry in a property container 1040. A subsample may have no associated property, like the first subsample of subsample pattern 1033.
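As an illustration of the SubsampleSampleGroupEntry layout, the following Python sketch serializes and parses the payload of one pattern (a 16-bit nb_subsamples followed by nb_subsamples 16-bit property_index values, big-endian as is usual in ISOBMFF). The helper names are hypothetical, and the use of 0 to mean "no property" is an assumption made here for illustration; the description does not fix that value.

```python
import struct
from typing import List


def pack_pattern(property_indexes: List[int]) -> bytes:
    """Serialize nb_subsamples followed by one 16-bit property_index per subsample."""
    nb_subsamples = len(property_indexes)
    return struct.pack(f">H{nb_subsamples}H", nb_subsamples, *property_indexes)


def unpack_pattern(payload: bytes) -> List[int]:
    """Parse a pattern payload back into its list of property_index values."""
    (nb_subsamples,) = struct.unpack_from(">H", payload, 0)
    return list(struct.unpack_from(f">{nb_subsamples}H", payload, 2))


# Pattern 1031 of Figure 10: three subsamples referencing entries (1), (2), (3).
pattern_1031 = pack_pattern([1, 2, 3])
# Pattern 1033: first subsample without property (assumed 0), second references (6).
pattern_1033 = pack_pattern([0, 6])
```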
The box 1040 is a metadata structure defining properties or lists of properties that may be associated with subsamples. Elementary properties may be defined, like P1, P2... P5, and lists of properties may also be defined, like the 5th entry "Plist" 1045 consisting of two properties P1 and P4. This means that when a subsample is associated with this 5th entry (like, for example, the 1st subsample of the subsample pattern 1032), the corresponding subsample has these 2 properties. A list of properties may be defined using references to already defined properties in the property container 1040. For example, the entry (5) in 1040 may be defined by referencing (1) and (4). A list of properties may then consist of a number of properties followed by a list of references. The size of the list corresponds to the number of properties.
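Resolving a reference into the property container 1040 can be sketched as below. This is a minimal Python sketch under stated assumptions: the container is modelled as a list whose entries are either an elementary property (a string) or a list of 1-based references to other entries; the six-entry layout, the treatment of reference 0 as "no property", and the function name are illustrative only.

```python
from typing import List, Union

PropertyEntry = Union[str, List[int]]

# Assumed layout: entries (1) to (4) are elementary, entry (5) is a list
# referencing (1) and (4), entry (6) is another elementary property.
container: List[PropertyEntry] = ["P1", "P2", "P3", "P4", [1, 4], "P5"]


def resolve_properties(entries: List[PropertyEntry], reference: int) -> List[str]:
    """Return the elementary properties designated by a 1-based reference."""
    if reference == 0:  # assumption: 0 means no associated property
        return []
    entry = entries[reference - 1]
    if isinstance(entry, list):
        resolved: List[str] = []
        for inner_reference in entry:
            resolved.extend(resolve_properties(entries, inner_reference))
        return resolved
    return [entry]
```

With this layout, a subsample referencing entry (5) obtains both P1 and P4, while each of these properties is declared only once.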
In a first variant, the metadata structure defining properties or list of properties 1040 may be implemented in ISOBMFF or derived specifications as a new Box or FullBox dedicated to the declaration of subsample properties. This box is identified by a 4CC, for example 'sspo' for subsample properties. Each property may be declared as a 32-bit parameter. Optionally, in a specific version of the box, or depending on the flags value of the box, the 32-bit code may contain one bit (for example the most significant bit) dedicated to indicating an extension: when set to 1, this bit indicates that an additional 32-bit code follows for further describing the property; when set to 0, it indicates that there is no additional 32-bit code to describe the property. As an example, the codec_specific_parameters value of the subsample information box of the previous embodiments may be stored as a property inside this new box. Another example may be a combination of the codec_specific_parameters with the subsample_priority and the discardable parameters of the subsample information box. The former may not require the use of the extension bit while the latter does.
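The extension-bit mechanism can be illustrated with the following Python sketch. It assumes the most significant bit of each 32-bit code is the extension bit, leaving 31 bits of payload per code, and it further assumes that an additional code may itself carry the extension bit so that codes can be chained; the description only states that one additional code follows. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

EXTENSION_BIT = 0x80000000
PAYLOAD_MASK = 0x7FFFFFFF


def encode_property(words: List[int]) -> bytes:
    """Encode 31-bit words as 32-bit codes; every code but the last sets the extension bit."""
    out = b""
    for position, word in enumerate(words):
        if not 0 <= word <= PAYLOAD_MASK:
            raise ValueError("each word must fit in 31 bits")
        if position < len(words) - 1:
            word |= EXTENSION_BIT
        out += struct.pack(">I", word)
    return out


def decode_property(buffer: bytes, offset: int = 0) -> Tuple[List[int], int]:
    """Decode one property starting at offset; return its words and the next offset."""
    words: List[int] = []
    while True:
        (code,) = struct.unpack_from(">I", buffer, offset)
        offset += 4
        words.append(code & PAYLOAD_MASK)
        if not code & EXTENSION_BIT:
            return words, offset
```

A property that fits in a single code costs 4 bytes, and only properties that need more description pay for the additional code.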
In a second variant, the metadata structure defining properties or list of properties may use an existing container, for example the ItemPropertiesBox containing an 'ipco' and an 'ipma' box. Each property may then be described as a specific box or full box inheriting from ItemProperty or from ItemFullProperty boxes in the 'ipco'. In this variant, the association between a subsample and its properties may be described in the 'ipma', where the item_ID corresponds to an identifier of a subsample in the metadata structure defining a list of subsample patterns 1030. In this variant, the metadata structure defining a list of subsample patterns 1030 then consists of a list of subsample patterns that may only indicate a number of subsamples for the samples mapped to one entry of this box 1030. The reference to a property or to a list of properties may not be present. Likewise, the SubsampleSampleGroupEntry may only contain a number of subsamples and no property indexes. The ItemPropertiesBox may be declared in a 'meta' box of the track containing the samples with subsamples, for example.
The box 1050 is a metadata structure defining a list of byte ranges. It may contain byte ranges only for samples that are mapped to a subsample pattern in 1020. By default, each byte range is encoded on 32 bits. In some variants, where an indication of a maximum length for the data units is available, the byte ranges may be coded on fewer bits (e.g. 8 or 16). This maximum length may, for example, be determined from the DecoderConfigurationRecord in some sample entries (e.g. the lengthSizeMinusOne in 'avcC', 'hvcC' or 'vvcC'). Samples that are not mapped to any subsample pattern (like sample E) may not have their sample size or any byte range indicated in the metadata structure defining a list of byte ranges 1050. Samples containing both subsamples that are associated with subsample properties and subsamples not associated with any subsample property (e.g. sample X) may have all their byte ranges (1053) given in the metadata structure defining a list of byte ranges 1050. The metadata structure 1050 may be implemented in ISOBMFF or in a derived specification as a new version of the 'subs' box:
aligned(8) class SubSampleInformationBox extends FullBox('subs', version, flags) {
    if (version == new_version) {
        unsigned int(32) entry_count;
        unsigned int(num_bytes) subsample_size[entry_count];
    } else { // original 'subs' box
        unsigned int(32) entry_count;
        int i, j;
        for (i=0; i < entry_count; i++) {
            unsigned int(32) sample_delta;
            unsigned int(16) subsample_count;
            if (subsample_count > 0) {
                for (j=0; j < subsample_count; j++) {
                    if (version == 1) {
                        unsigned int(32) subsample_size;
                    } else {
                        unsigned int(16) subsample_size;
                    }
                    unsigned int(8) subsample_priority;
                    unsigned int(8) discardable;
                    unsigned int(32) codec_specific_parameters;
                }
            }
        }
    }
}
where:
- entry_count indicates the number of byte ranges (subsample sizes) declared in the box;
- subsample_size indicates the number of bytes for a subsample;
- num_bytes indicates the size of the field used to encode subsample_size. As stated above, this size may be 32 bits by default or may depend on information about the sample description.
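Reading and writing the byte-range list of the new box version can be sketched as follows in Python. The sketch covers only the payload that follows the FullBox header (a 32-bit entry_count followed by entry_count sizes), assumes big-endian fields, and assumes the field width is one of 8, 16 or 32 bits and is known to the reader from the sample description; the function names are hypothetical.

```python
import struct
from typing import List

# struct format characters for the field widths mentioned in the description
_FIELD_FORMATS = {8: "B", 16: "H", 32: "I"}


def build_byte_ranges(sizes: List[int], size_bits: int = 32) -> bytes:
    """Serialize entry_count followed by one subsample_size per byte range."""
    field = _FIELD_FORMATS[size_bits]
    return struct.pack(f">I{len(sizes)}{field}", len(sizes), *sizes)


def parse_byte_ranges(payload: bytes, size_bits: int = 32) -> List[int]:
    """Parse the payload back into the list of subsample sizes."""
    field = _FIELD_FORMATS[size_bits]
    (entry_count,) = struct.unpack_from(">I", payload, 0)
    return list(struct.unpack_from(f">{entry_count}{field}", payload, 4))


# Three byte ranges coded on 16 bits instead of the default 32 bits.
payload_16 = build_byte_ranges([10, 20, 30], size_bits=16)
```

With 16-bit fields the three ranges occupy 6 bytes instead of 12, which is where the saving for small data units comes from.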
This embodiment has the benefit of sharing the properties, which are declared once and referenced in multiple subsample patterns. It avoids indicating the number of subsamples for each sample having a subsample description, as is the case in the subsample information box.
It is to be noted that the properties defined in metadata structure 1040 could also be defined within each subsample pattern in metadata structure 1030. However, this would be somewhat less efficient than the approach illustrated by Figure 10 because less information is shared. For example, with such a variant, the property P1 would be declared twice in mapping table 1030: once for the first subsample pattern 1031 and once for the second subsample pattern 1032.
Figure 11 illustrates the main steps of a method for encapsulating subsample description using the data structure illustrated by Figure 10.
In 1100, the encapsulation is set up: this comprises obtaining the type of data to encapsulate, the number of tracks, the formats in use, etc., which will be used to create the 'moov' box and its sub-boxes for ISOBMFF and derived specifications. The encapsulation may generate one media file, fragmented or not, or many media segment files, this choice being part of the setup step 1100. The encapsulation module may create the metadata structures like those illustrated in Figure 10.
In step 1101, the encapsulation module reads first data to encapsulate. The data may be compressed data, for example compressed audio or video, or may be uncompressed data like text or even raw audio or video. The first data read in 1101 may correspond to information on the type of data (e.g. video, audio, text or volumetric data). For compressed data, the first data may provide information on the codec(s) in use (e.g. AVC, HEVC, VVC for video or some point cloud compression codec). The data may be read as soon as it is generated, captured, or encoded, or may be read from a storage medium.
In step 1102, the encapsulation module determines the type of subsamples for the media to encapsulate. When the data contains several media types, this may be done for each media type. When one media type is encapsulated into multiple tracks, the determination of subsample types may be made for each track or for a subset of tracks; otherwise it may be determined when reading a sample. When determined, the subsample type may be indicated in the grouping_type_parameter of the sample to group box 'sbgp' or, more generally, in a dedicated parameter used for indicating the subsample type in the metadata structure 1020.
In step 1103, the encapsulation module reads sample data. The sample data may be composed of data units matching subsample description (for example NAL units in NAL unit-based video codecs or TLV units for G-PCC bitstreams). Using sample data read in 1103, the encapsulation module may determine the number of subsamples with their size in bytes, respectively in steps 1104 and 1105. Then, by parsing information either in the initial data or in the sample data, the encapsulation module determines in step 1106 the properties for some or all the subsamples of the current sample. For example, for G-PCC input data, parsing a TLV type may provide the property for a G-PCC unit-based sub-sample. As another example, parsing the NAL unit type allows determination of the vcl_idc parameter for an HEVC slice subsample or a VVC sub-picture subsample.
In step 1107, the encapsulation module checks whether the metadata structure storing the subsample patterns (1030) already contains a pattern with the same number of subsamples and the same associated property or list of properties. If so, the current sample is mapped in the metadata structure 1020 to the found entry in the metadata structure 1030 in step 1108. If the test 1107 returns false, a new entry is created in the metadata structure 1030 with the determined number of subsamples and their associated properties or list of properties in 1109. If the properties are stored within a subsample pattern in the metadata structure 1030, the encapsulation module may generate the property values or list of property values within the subsample pattern in the 1030 metadata structure. If the properties are stored in their own metadata structure 1040 (the storage location of the properties may be a setting of the encapsulation in 1100), the encapsulation module may add any missing property or property list in the metadata structure 1040 and insert a reference to the property or the property list corresponding to each subsample in the metadata structure 1030. This process is repeated until no more samples are available for reading in 1110. Then, when no more samples remain to be read, the encapsulation module finalizes the media file or segment file in 1111. This may comprise, for example, finalizing boxes providing indexing information for a media file or for a media segment file, or computing additional sample description parameters to insert in the sample description.
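The loop of steps 1103 to 1110 can be sketched in Python as below, for the variant where properties are stored in their own structure 1040. This is only an in-memory illustration under stated assumptions: samples arrive as lists of (size, property) pairs, a property of None and a reference of 0 both mean "no property", and the class and attribute names are hypothetical. No box serialization is performed.

```python
from typing import List, Optional, Tuple


class SubsampleDescriptionWriter:
    """Accumulates the four structures of Figure 10 in memory."""

    def __init__(self) -> None:
        self.properties: List[str] = []             # structure 1040
        self.patterns: List[Tuple[int, ...]] = []   # structure 1030
        self.sample_groups: List[int] = []          # structure 1020, one index per sample
        self.byte_ranges: List[int] = []            # structure 1050

    def _property_reference(self, prop: Optional[str]) -> int:
        """Return the 1-based reference of a property, declaring it if missing."""
        if prop is None:
            return 0
        if prop not in self.properties:
            self.properties.append(prop)
        return self.properties.index(prop) + 1

    def add_sample(self, subsamples: List[Tuple[int, Optional[str]]]) -> None:
        """Map one sample; an empty list stands for a sample without subsamples."""
        if not subsamples:
            self.sample_groups.append(0)  # not mapped to any pattern
            return
        pattern = tuple(self._property_reference(prop) for _, prop in subsamples)
        if pattern not in self.patterns:   # test 1107
            self.patterns.append(pattern)  # step 1109
        self.sample_groups.append(self.patterns.index(pattern) + 1)  # step 1108
        self.byte_ranges.extend(size for size, _ in subsamples)


writer = SubsampleDescriptionWriter()
writer.add_sample([(100, "geometry"), (50, "attribute")])
writer.add_sample([])
writer.add_sample([(80, "geometry"), (40, "attribute")])
```

In this run the first and third samples share a single pattern, so only their byte ranges differ in the output structures.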
The parsing of a media file or media segment file encapsulated according to this embodiment may follow the same process as described in relation to Figure 6. For the initialization and the loop on data reading, the way a reader obtains subsample information is slightly different when the metadata structures described in reference to Figure 10 are in use, in particular in steps 620 to 645. These differences are now described.
Each time a sample is read in 610, the parser locates the corresponding sample in the metadata structure grouping samples having the same subsample patterns 1020.
This is done by keeping in memory the number of the read samples in the sequence of samples. The metadata structure 1020 follows the same sample order as the sample description in 'stbl' and its sub-boxes (or in 'trun' for fragmented files). The parser then obtains the group_description_index for this sample and uses it to locate a subsample pattern corresponding to the current sample. In parallel to reading the sample and obtaining the group_description_index, the parser also iterates over the metadata structure defining a list of byte ranges 1050. For example, from Figure 10, when the sample A is read, its corresponding subsample pattern obtained from 1020 is also read in 1030 and the byte ranges for the 3 subsamples of the sample A are read from 1050.
By doing so, the parser gets, for a sample, its subsamples with their byte ranges. To obtain the properties, the parser simply reads from the subsample pattern in 1030 the properties or list of properties, either directly for the variant including the properties inside the subsample patterns or by reference to a property container in the variant using references in the subsample pattern. When references are used, the parser then searches the property container 1040 to obtain the property values applying to a subsample of the current sample. For the specific case of sample E, which is not mapped, the parser may not read anything in the metadata structure defining a list of byte ranges 1050 for a variant not including byte ranges of non-mapped samples. This process is iterated over the samples of a movie fragment (fragment file) or of a whole sequence (non-fragmented file). In case the reader performs a seek or randomly accesses the media file, the parser may loop over the metadata structure grouping samples having the same subsample patterns 1020 to identify which entry in the Mapping Table 1030 the current sample is mapped to. It may also obtain the number of subsamples for the preceding samples to synchronize the reading in the metadata structure defining the list of byte ranges. To avoid this, in a variant, the metadata structure defining a list of byte ranges 1050 may contain a delimiter between samples, for example a 32-bit code equal to 0xFFFFFFFF. By detecting these separators, a parser can synchronize on the first byte range for the current sample, thus avoiding the need to determine the cumulated number of byte ranges between the first sample and the current sample parsed or randomly accessed by the parser.
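The reader-side lookup, including the synchronization needed for random access, can be sketched in Python as follows. The sketch assumes the structures 1020, 1030 and 1050 have already been parsed into plain lists (one group_description_index per sample, one tuple of property references per pattern, and a flat list of sizes), and that non-mapped samples contribute no byte range. The function names and the handling of the 0xFFFFFFFF delimiter as a pure separator are illustrative assumptions.

```python
from typing import List, Tuple

DELIMITER = 0xFFFFFFFF


def subsamples_for_sample(
    sample_number: int,
    group_indices: List[int],
    patterns: List[Tuple[int, ...]],
    byte_ranges: List[int],
) -> List[Tuple[int, int]]:
    """Return (size, property_reference) pairs for a 0-based sample number."""
    # Synchronize on 1050 by accumulating the subsample counts of preceding samples.
    cursor = 0
    for gdi in group_indices[:sample_number]:
        if gdi:
            cursor += len(patterns[gdi - 1])
    gdi = group_indices[sample_number]
    if gdi == 0:
        return []  # non-mapped sample, nothing to read in 1050
    pattern = patterns[gdi - 1]
    sizes = byte_ranges[cursor:cursor + len(pattern)]
    return list(zip(sizes, pattern))


def split_on_delimiters(byte_ranges: List[int]) -> List[List[int]]:
    """Group byte ranges per mapped sample when delimiters separate samples."""
    groups: List[List[int]] = []
    current: List[int] = []
    for value in byte_ranges:
        if value == DELIMITER:
            groups.append(current)
            current = []
        else:
            current.append(value)
    if current:
        groups.append(current)
    return groups
```

The first function shows the cumulative count that the delimiter variant is meant to avoid; with delimiters, a parser can instead scan for the separator preceding the sample of interest.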
Any step of the algorithms described herein may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").

Claims (19)

  1. A method of encapsulating media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the method comprising: generating subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value.
  2. The method of claim 1, wherein said at least one subsample description comprises an information related to the size of the subsample.
  3. The method of claim 1, wherein the subsample descriptive metadata comprises one or more subsample description boxes.
  4. The method of claim 1, wherein the reference refers to another subsample description, the another subsample description comprising the set of at least one property value.
  5. The method of claim 4, wherein the reference is the index of the another subsample description within the subsample descriptive metadata.
  6. The method of claim 1, wherein subsample description comprises a property indicating whether the subsample description comprises the reference.
  7. The method of claim 1, wherein the subsample descriptive metadata comprises a flag indicating that all the samples are organized into subsamples.
  8. The method of claim 1, wherein the subsample descriptive metadata comprises a flag indicating that the subsample description has a repetition pattern between samples for all the samples.
  9. The method of claim 1, wherein the subsample descriptive metadata comprises a flag indicating that at least one property is a merge of several properties.
  10. The method of claim 1, wherein the subsample descriptive metadata comprises a flag indicating that all the samples have a same number of subsamples.
  11. The method of claim 1, wherein the method comprises: generating pattern descriptive metadata for describing patterns, the pattern descriptive metadata comprising at least one pattern description, a pattern description being a set of at least one property value; and wherein the reference refers to a pattern description in the pattern descriptive metadata.
  12. The method of claim 11, wherein the pattern description corresponds to several successive subsamples.
  13. The method of claim 12, wherein the pattern description corresponds to the subsamples of a sample.
  14. A method of reading media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the method comprising: obtaining from the media file subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value; and obtaining the properties describing the subsample from the subsample description and the set of at least one property value.
  15. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 14, when loaded into and executed by the programmable apparatus.
  16. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 14.
  17. A computer program which upon execution causes the method of any one of claims 1 to 14 to be performed.
  18. A device for encapsulating media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the device comprising a processor configured for: -generating subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; -wherein at least one subsample description comprises a reference to a set of at least one property value.
  19. A device for reading media data in a media file, the media data comprising one or more samples, at least some of the one or more samples being organized into subsamples, the device comprising a processor configured for: obtaining from the media file subsample descriptive metadata describing the subsample organization, the subsample descriptive metadata comprising for each subsample a subsample description, the subsample description comprising properties; wherein at least one subsample description comprises a reference to a set of at least one property value; and obtaining the properties describing the subsample from the subsample description and the set of at least one property value.
GB2205003.3A 2022-04-05 2022-04-05 Method and apparatus for describing subsamples in a media file Pending GB2617359A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2205003.3A GB2617359A (en) 2022-04-05 2022-04-05 Method and apparatus for describing subsamples in a media file
PCT/EP2023/058166 WO2023194179A1 (en) 2022-04-05 2023-03-29 Method and apparatus for describing subsamples in a media file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2205003.3A GB2617359A (en) 2022-04-05 2022-04-05 Method and apparatus for describing subsamples in a media file

Publications (2)

Publication Number Publication Date
GB202205003D0 GB202205003D0 (en) 2022-05-18
GB2617359A true GB2617359A (en) 2023-10-11

Family

ID=81581553

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2205003.3A Pending GB2617359A (en) 2022-04-05 2022-04-05 Method and apparatus for describing subsamples in a media file

Country Status (1)

Country Link
GB (1) GB2617359A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3609187A1 (en) * 2017-07-09 2020-02-12 LG Electronics Inc. -1- Method for transmitting region-based 360-degree video, method for receiving region-based 360-degree video, region-based 360-degree video transmission device, and region-based 360-degree video reception device
US20210105492A1 (en) * 2019-10-02 2021-04-08 Nokia Technologies Oy Method and apparatus for storage and signaling of sub-sample entry descriptions
