WO2011159605A1 - Method and apparatus for encapsulating coded multi-component video - Google Patents


Info

Publication number
WO2011159605A1
Authority
WO
WIPO (PCT)
Prior art keywords
media data
file
layer
media
component
Application number
PCT/US2011/040168
Other languages
French (fr)
Inventor
Zhenyu Wu
Li Hua Zhu
Original Assignee
Technicolor Usa Inc
Application filed by Technicolor Usa Inc
Priority to EP11727605.5A (published as EP2580920A1)
Priority to JP2013515413A (published as JP2013532441A)
Priority to KR1020127032653A (published as KR20130088035A)
Priority to BR112012031874A (published as BR112012031874A2)
Priority to US13/703,929 (published as US20130097334A1)
Priority to CN2011800293844A (published as CN103098484A)
Publication of WO2011159605A1


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
                  • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                    • H04N21/234327 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
                • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
            • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
              • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
                • H04N21/647 Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
                  • H04N21/64784 Data processing by the network
                    • H04N21/64792 Controlling the complexity of the content stream, e.g. by dropping packets
            • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
                • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
                • H04N21/845 Structuring of content, e.g. decomposing content into time segments
                  • H04N21/8451 Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]
              • H04N21/85 Assembly of content; Generation of multimedia applications
                • H04N21/854 Content authoring
                  • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
        • H04N7/00 Television systems
          • H04N7/24 Systems for the transmission of television signals using pulse code modulation

Definitions

  • U.S. Provisional Patent Application Serial No. 61/354,422, entitled “Extension to the Extractor data structure of SVC/MVC file formats,” filed on June 14, 2010, and U.S. Provisional Patent Application Serial No. 61/354,424, entitled “Some extensions for ISO Base Media File Format for HTTP streaming,” filed on June 14, 2010.
  • The teachings of the above-identified provisional patent applications are expressly incorporated herein by reference.
  • The present invention relates generally to HTTP streaming. More specifically, the invention relates to encapsulating a media entity for coded multi-component video streams, such as scalable video coding (SVC) streams and multi-view coding (MVC) streams, for HTTP streaming.
  • SVC: scalable video coding
  • MVC: multi-view coding
  • An encoded video is often encapsulated and stored at the server side as a file that is compliant with BMFF, such as an MP4 file.
  • The file is usually divided into multiple movie fragments, and these fragments are further grouped into segments, which are addressable by client URL requests.
  • Different encoded representations of the video content are stored in these segments, so that a client can dynamically choose the desired representation to download and play back during a session.
  • Encoded layered video, such as an SVC or MVC bitstream, provides natural support for such bitrate adaptation by enabling different operating points, i.e., representations, in terms of temporal/spatial resolution, quality, views, etc., through decoding different subsets of the bitstream.
  • BMFF: ISO Base Media File Format
  • In the MP4 file format, the metadata for all the layers or representations of one media file are stored in the Movie Box ('moov'), while the media content data for all the layers or representations are stored in the Media Data Box ('mdat').
  • In HTTP streaming, when the client requests one layer, the whole file has to be sent, since all the layers or representations are mixed together and the client does not know where to find the required layer or representation.
  • Extractor is an internal file data structure defined in the SVC/MVC amendments to the AVC file format extension of BMFF (Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, 2008, pages 15-17). Extractor is designed to enable extraction of NAL units from other tracks by reference, without copying.
  • A track is a timed sequence of related samples in an ISO base media file. For media data, a track corresponds to a sequence of images or sampled audio.
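  • The syntax of Extractor is shown below:
        class aligned(8) Extractor () {
            NALUnitHeader();
            unsigned int(8) track_ref_index;
            signed int(8) sample_offset;
            unsigned int((lengthSizeMinusOne+1)*8) data_offset;
            unsigned int((lengthSizeMinusOne+1)*8) data_length;
        }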
  • NALUnitHeader: the NAL unit structure as specified in ISO/IEC 14496-10 Annex G for NAL units of type 20:
  • nal_unit_type shall be set to the extractor NAL unit type (type 31).
  • forbidden_zero_bit, reserved_one_bit, and reserved_three_2bits shall be set as specified in ISO/IEC 14496-10 Annex G.
  • See Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008, page 17.
  • track_ref_index specifies the index of the track reference of type 'scal' to use to find the track from which to extract data.
  • The sample in that track from which data is extracted is temporally aligned or nearest preceding in the media decoding timeline, i.e., using the time-to-sample table only, adjusted by an offset specified by sample_offset with the sample containing the Extractor.
  • The first track reference has the index value 1; the value 0 is reserved.
  • sample_offset gives the relative index of the sample in the linked track that shall be used as the source of information.
  • Sample 0 (zero) is the sample with the same, or the closest preceding, decoding time compared to the decoding time of the sample containing the Extractor; sample 1 (one) is the next sample, sample -1 (minus one) is the previous sample, and so on.
  • data_offset: the offset of the first byte within the referenced sample to copy. If the extraction starts with the first byte of data in that sample, the offset takes the value 0. The offset shall reference the beginning of a NAL unit length field.
  • data_length: the number of bytes to copy. If this field takes the value 0, the entire single referenced NAL unit is copied (i.e., the length to copy is taken from the length field referenced by the data offset, augmented by the additional_bytes field in the case of Aggregators). Further details can be found in Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008.
  • AVC: Advanced Video Coding
  • If a client has already downloaded one or more content components of a piece of media content from the server and is in the process of downloading another content component, the client needs to know whether the previously downloaded content components are among the set of dependent components of the new one, so that it can make further requests as necessary to download the complete component set.
  • This use case also requires a mechanism to signal an external dependent content component and its location information.
  • In BMFF, 'tref' is a box type used to provide a reference from the containing track to another track in the presentation. This box can be used to describe the dependencies among tracks; however, the dependency is limited to tracks within the same media file.
  • One approach is to signal such information using some out-of-band mechanism.
  • For example, the server can send a manifest file to the client before a session starts.
  • The manifest file contains the dependency and location information of each content component of the requested media content. The client is then able to request all the necessary component files.
  • However, this out-of-band approach is not applicable to local file playback, where no manifest file is available.
  • This invention is directed to methods and apparatuses for encapsulating component files from a media entity containing more than one layer and for reading a component file.
  • A method for encapsulating and creating component files from a media entity containing more than one layer extracts, from the media entity, metadata and the media data corresponding to the extracted metadata for each layer.
  • The extracted media data and metadata are associated to enable the creation, for each layer, of a component file containing the extracted metadata and the extracted media data.
  • The file encapsulator includes an extractor for extracting, from the media entity, metadata and the media data corresponding to the extracted metadata for each layer, and a correlator for associating the extracted media data with the extracted metadata to enable the creation, for each layer, of a component file.
  • Figure 1 shows an example MP4 file format.
  • Figure 2 shows one embodiment of the current invention to encapsulate a media entity.
  • Figure 3 shows the structure of an Encapsulator used to encapsulate or create component files from a media entity that contains multiple layers/representations.
  • Figure 4 shows an example of associating additional media data with component files based on dependency relationship.
  • Figure 5 shows an example of extracting a NAL unit, by reference, from a movie box/fragment other than the one in which the extractor resides.
  • Figure 6 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple component files using one of the new extractor data structures of the invention.
  • Figure 7 shows the structure of a file reader used to read the component files.
  • Figure 8 shows the process of reading an encapsulated component file for a video decoder involving one embodiment of the present invention.
  • Figure 9 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments using another preferred new extractor data structure.
  • Figure 10 shows the process of reading an encapsulated component file for a video decoder involving another embodiment of the present invention.
  • A media entity is, for example, a media file, a set of media files, or streaming media.
  • The term "component file" is used in a broad sense: it represents a fragment, a segment, a file, and other equivalent terms.
  • A media entity containing multiple representations or components is parsed to extract metadata and media data for each representation/component.
  • The representations/components include layers, such as layers with various temporal/spatial resolutions and qualities in SVC, and views in MVC. In the following, "layers" is also used to refer to representations/components, and these terms are used interchangeably.
  • The metadata describe, for example, what is contained in the media entity for each representation and how to use the media data contained therein.
  • The media data contain the media data samples required to serve the purpose of the media data, e.g., decoding of the content, or any necessary information on how to obtain the required data samples.
  • The extracted metadata and media data for each representation or layer are associated/correlated and stored together for user access.
  • For example, a media entity contains three layers: a base layer, enhancement layer 1, and enhancement layer 2.
  • The media entity is parsed to extract the metadata and media data for each of the three layers, and those data are stored separately as component files, with the metadata and corresponding media data associated together.
  • Fig. 3 shows the structure of a preferred encapsulator 300 used to encapsulate and create component files from a media entity that contains multiple layers, such as SVC-encoded video.
  • The input media entity 310 is passed to a metadata extractor 320 and a media data extractor 340.
  • The metadata extractor 320 extracts the metadata 330 for each layer.
  • The media data extractor 340 takes in the metadata 330 and extracts the corresponding media data 350.
  • The metadata extractor 320 and the media data extractor 340 may also be implemented as one extractor. Both the metadata 330 and the media data 350 are fed into a correlator 380, which associates these two types of data and creates the output component files 390, one component file for each layer.
  • A layered video, such as a video encoded with the SVC or MVC extensions of AVC, contains multiple media components (scalable layers or views).
  • Such an encoded bitstream can provide different operating points, i.e., representations or layers, in terms of temporal/spatial resolution, quality, views, etc., by decoding different subsets of the bitstream.
  • Coding dependencies exist among the layers of the bitstream, i.e., the decoding of one layer may depend on other layers. Therefore, requesting one of the representations of such a bitstream may require retrieving and decoding one or more components or media data from the encapsulated video file.
  • An encoded layered video is often encapsulated into an MP4 file in such a way that each layer is stored separately in different segments or component files.
  • Certain media data samples of the bitstream, such as NAL units, are required by, or related to, multiple segments or component files, due to the decoding dependencies described above or to other, application-specific dependencies.
  • FIG. 4 shows an example of this embodiment.
  • An SVC bitstream has three spatial layers: HD1080p, SD, and QVGA.
  • Three movie fragments or component files are formed corresponding to the three operating points, and each is addressable by a different URL.
  • All the media data samples required for decoding, NAL units in this example, are copied and stored as media samples contained in the 'mdat' box. So, when a client requests a particular operating point or representation by using the proper URL, the server can retrieve the corresponding movie fragment or component file and forward it to the client.
  • The media data extractor 340 in Fig. 3 further extracts from the input media entity 310, for each layer, additional media data related to the extracted media data of that layer.
  • The correlator 380 further associates the additional extracted media data for each layer to create the corresponding component files.
  • Alternatively, a reference is identified and built for the additional media data that are related to or required by the media data of a movie fragment or component file.
  • The reference, rather than the additional media data themselves, is associated with the component file along with its metadata and media data.
  • To this end, a Reference Identifier 360 is added to the structure of the Encapsulator 300.
  • The Reference Identifier 360 identifies, from the input media entity 310, references 370 to the additional media data that are related to the extracted media data 350 for each layer.
  • The references 370 are associated, via the correlator 380, with the extracted metadata 330 and extracted media data 350 for each layer, e.g., by embedding the references into the extracted media data 350, to create the corresponding component files 390.
  • An extension is added to give the Extractor data structure the extra capability to reference NAL units that reside in a movie box/fragment or component file other than the one in which the Extractor resides.
  • The extended Extractor is defined as follows:
        DataEntryBox(entry_version, entry_flags) data_entry;  // added extension
        unsigned int(8) track_ref_index;
  • data_entry is a Uniform Resource Locator (URL) or Uniform Resource Name (URN) entry.
  • Name is a URN, and is required in a URN entry.
  • Location is a URL; it is required in a URL entry and optional in a URN entry, where it gives a location to find the resource with the given name.
  • Each is a null-terminated string using UTF-8 characters. If the self-contained flag is set, the URL form is used and no string is present; the box terminates with the entry_flags field.
  • The URL type should be of a service that delivers a file. Relative URLs are permissible and are relative to the file containing the Movie Box/Fragment that contains the track to which the Extractor belongs.
  • FIG. 6 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments or component files using the new extractor data structure.
  • The process starts at step 601. Each NAL unit is read in one by one in step 610. If the end of the bitstream is reached in step 620, the process stops at step 690; otherwise, it proceeds to step 630. Decision step 630 determines whether the current NAL unit depends on NAL units from other tracks for decoding. If it does not, control is transferred to step 640, where a sample is formed using the current NAL unit and placed in the current track.
  • Otherwise, step 650 determines whether the track whose NAL units are required by the current NAL unit resides within the same movie fragment. If it does, step 670 fills in an extended Extractor to reference the NAL unit from that other track. If the track resides in a different movie fragment, the URL or URN of that movie fragment is identified in step 660, and the process proceeds to step 670, where the identified URL or URN is filled into an extended Extractor. After the extended Extractor is filled in, it is embedded into the current track in step 680. The process then starts over with the next NAL unit in step 610.
  • In another embodiment, the references 370 are embedded into the extracted metadata 330, and indices to the references 370 are added to the extracted media data 350 via the correlator 380, which further associates the metadata and the media data for each layer to create the corresponding component files 390.
  • A box called the HTTP Streaming Information Box is disclosed. This box contains information that can assist HTTP streaming of the ISO file. It is preferred that the HTTP Streaming Information Box be placed as early as possible in the component files, e.g., at the beginning of the files. The server can also use the box as a source of information when forming a manifest file for the client.
  • Another type of box, called the Media Reference Box, which is contained in the HTTP Streaming Information Box, is also disclosed. This box contains information about the external dependent files. The Extractor structure is further extended so that it can reference media samples across different component files. The information contained in the Media Reference Box can be used by extractors to save signaling overhead.
  • The HTTP Streaming Information Box aids the HTTP streaming operation of an ISO media file. It contains relevant information about HTTP streaming delivery of the file, including the Media Reference Box defined below, among other possible types of boxes.
  • The HTTP Streaming Information Box is preferably placed as early as possible in the file, for maximum utility.
  • The Media Reference Box is contained in the HTTP Streaming Information Box. It contains a table of data references, in the form of URLs, that declare the locations of the external files on which each track listed in the box depends. By reading this box, the file reader can identify the external dependent file sources of a track in the file, such as external component files, as well as the means to retrieve them.
  • entry_count is an integer that counts the actual entries.
  • track_ID is an integer that uniquely identifies the track in the file to which the box applies.
  • dependent_source_count is an integer that counts the external media sources on which the track identified by track_ID depends.
  • data_entry is a URL entry that points to one external media source on which the designated track depends. Each is a null-terminated string using UTF-8 characters.
  • The URL type should be of a service that delivers a file. Relative URLs are permissible and are relative to the file that contains this Media Reference Box.
  • The Media Reference Box is designed to facilitate HTTP streaming of a media entity containing more than one layer in a number of ways.
  • First, the in-file information from the box can easily be extracted for inclusion in a manifest file.
  • Such information in the manifest can help the client, before actual HTTP streaming, discover relevant service information and perform the corresponding service initialization, such as requesting all the associated component files and allocating the necessary buffer resources.
  • Second, when the client requests a different representation of some multi-component media content, another representation of which has already been delivered as a component file, the client can check the corresponding Media Reference Box in that file to see whether the file contains any of the dependent components of the new representation that can be reused.
  • Third, the box helps reduce the signaling overhead of the further extended Extractor structure defined below.
  • The Extractor is further extended so that it can reference data from tracks of external media files.
  • media_reference_index specifies an index into the reference table, contained in the Media Reference Box, whose track_ID value equals that of the track containing the extractor. If media_reference_index equals 0, the extractor references data from another track within the same file as the extractor; in this case, the Media Reference Box shall not contain a reference table with the same track_ID value as the track. If media_reference_index is between 1 and the value of dependent_source_count in the reference table associated with the track, the URL that media_reference_index selects from that table points to an external file containing the track from which the extractor extracts data.
  • With these extended extractors, it is now possible to link to and extract data from a track that belongs to an external component file. This is especially useful when the content components of an encoded piece of multi-component media content, such as one encoded by SVC or MVC, are encapsulated into different component files. With the extended extractors, extraction can take place across file boundaries, which avoids duplicating the same data in different component files.
  • Figure 9 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments or component files using the disclosed HTTP Streaming Information Box and Media Reference Box, as well as the further extended extractor data structure.
  • This process is similar to the process shown in Fig. 6, with a few modifications due to the boxes described above and the further extension of the extractor.
  • The location information (URL/URN) is identified in step 660.
  • The location information is then used to fill in the reference table in the 'mref' box (Media Reference Box) in step 965.
  • Step 970 fills in an Extractor with the indices to the location information in the reference table.
  • The extractor is then embedded into the current track.
  • The 'mref' box and its container, the 'hsin' box (HTTP Streaming Information Box), are embedded into the metadata of the component file.
  • To read a component file, a file reader 700, shown in Fig. 7, is employed.
  • A parser 710 first parses the component file to obtain the metadata, the media data, and a reference, if available. If, according to the decoded reference, the media data are related to media data of other component files, such as through a decoding dependency, a retriever 720 retrieves the related media data from the other component files indicated in the reference.
  • A processor 730 further processes the metadata and media data obtained from the component file, as well as the additional media data, if available.
  • The parsing operation by the parser 710 includes the various operations necessary to obtain the metadata and media data ready for the processor 730, and the reference ready for the retriever 720. This may include further parsing of the metadata and/or the media data when necessary.
  • In one embodiment, the reference is embedded in the media data, and thus the reference is obtained by parsing the media data. If a reference is available, the parsing step further includes analyzing the syntax of the reference and decoding it.
  • The processor 730 can contain a video decoder if the component file contains video content. In a different embodiment, the parser and the retriever can be incorporated into the processor.
  • FIG. 8 shows the process of reading an SVC/MVC type video bitstream for a video decoder according to the present invention.
  • Step 801 accesses a component video file, whose metadata and media data for each layer are identified in step 805.
  • The identified metadata and media data are parsed in step 810, and each NAL unit of the media data is read in one by one in step 815.
  • A decision is first made at step 820 to determine whether the end of the bitstream has been reached; if so, the process ends at step 825. Otherwise, the process proceeds to decision step 830 to determine whether the current NAL unit is an extractor.
  • If the current NAL unit is not an extractor, it is sent to the decoder at step 835. If it is an extractor, step 840 determines whether the current NAL unit depends on a NAL unit outside the current component file. If the required NAL unit is within the same component file, it is retrieved from the current file in step 845 and sent to the decoder at step 835. If the required NAL unit is in another component file, it is located using the reference information data_entry in the extractor in step 850, retrieved from the remote file in step 855, and then sent to the decoder in step 835.
  • When the references are embedded in the metadata and indices to them are carried in the media data, the reference is identified in the parser 710 by parsing the media data to get the embedded reference indices and then obtaining the corresponding references according to those indices.
  • The corresponding process of reading an SVC/MVC type video bitstream for a video decoder is shown in Fig. 10, which is similar to the process of Fig. 8.
  • The parsing of the metadata in step 810 enables the reference contained therein to be analyzed in parallel with the parsing of the media data.
  • Other component files that are referenced are identified in step 1014. Retrieval of those other component files is started in step 1012, in parallel with the remaining steps of the process.
  • After the location information of the component file on which the current NAL unit depends is accessed in step 850, local storage, such as a media buffer, is checked for the availability of that component file. If the required component file is available locally, the NAL unit is retrieved from the local copy; otherwise, it is retrieved from the remote file. Note that the local copy of the component file can be obtained by the parallel retrieval in step 1012, or from a previous request for that component file.


Abstract

A method and a device for encapsulating a media entity containing more than one layer into multiple component files, one for each layer, are described, along with the corresponding method and device for reading a component file. A new box for the ISO Base Media File Format (BMFF) and extensions to the Extractor data structure of the SVC/MVC file formats are proposed. The new box enables access to referenced component files in parallel with the processing of the current component file. The extractor extensions of the invention allow NAL units to be referenced across different component files. The present invention enables adaptive HTTP streaming of the media files.

Description

METHOD AND APPARATUS FOR ENCAPSULATING CODED MULTI-COMPONENT VIDEO
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application for patent claims the benefit of priority from U.S. Provisional Patent Application Serial No. 61/354,422, entitled "Extension to the Extractor data structure of SVC/MVC file formats," and filed on June 14, 2010, and U.S. Provisional Patent Application Serial No. 61/354,424, entitled "Some extensions for ISO Base Media File Format for HTTP streaming," and filed on June 14, 2010. The teachings of the above-identified provisional patent applications are expressly incorporated herein by reference.
The present application is related to the following co-pending, commonly owned U.S. Patent Application Serial No. I entitled "Method and Apparatus for Encapsulating Coded Multi-component Video", filed concurrently herewith (Attorney Docket No. PU100140). The teachings of the non-provisional patent applications identified immediately above are expressly incorporated herein by reference.
TECHNICAL FIELD
The present invention relates generally to HTTP streaming. More specifically, the invention relates to encapsulating a media entity for coded multi-component video streams, such as scalable video coding (SVC) streams and multi-view coding (MVC) streams, for HTTP streaming.
BACKGROUND OF THE INVENTION
In HTTP streaming applications, an encoded video is often encapsulated and stored at the server side as a file that is compliant with the ISO Base Media File Format (BMFF), such as an MP4 file. Moreover, to realize adaptive HTTP streaming, the file is usually divided into multiple movie fragments, and these fragments are further grouped into segments, which are addressable by client URL requests. In practice, different encoded representations of the video content are stored in these segments, so that a client can dynamically choose the desired representation to download and play back during a session. Encoded layered video, such as an SVC or MVC bitstream, provides natural support for such bitrate adaptation by enabling different operating points, i.e., representations, in terms of temporal/spatial resolution, quality, views, etc., through decoding different subsets of the bitstream. However, existing BMFF standards, such as the MP4 file format, do not support separate access to each layer or representation, and thus are not applicable to the HTTP streaming application. As shown in Fig. 1, in the MP4 file format, the metadata for all the layers or representations of one media file are stored in the moov (Movie) box, while the media content data for all the layers or representations are stored in the mdat (Media Data) box. In HTTP streaming, when the client requests one layer, the whole file has to be sent, since all the layers or representations are mixed together and the client does not know where to find the required layer or representation.
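To make the problem concrete, the following Python sketch (not part of the patent; the helper names are hypothetical) walks the top-level box headers of a BMFF byte string, using the standard size/type header, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of the file. Because the single moov box holds the metadata of every layer and the single mdat box holds the samples of every layer, addressing one box still yields data for all layers.

```python
import struct

def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a compact ISO BMFF box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def list_top_level_boxes(data: bytes):
    """Return (type, offset, size) for every top-level box in `data`."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit 'largesize' follows the type field
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize at offset {offset}")
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError(f"malformed box at offset {offset}")
        boxes.append((raw_type.decode("ascii"), offset, size))
        offset += size
    return boxes

# A toy file: the moov metadata and the mdat media data each cover all layers.
toy = (make_box("ftyp", b"isom\x00\x00\x00\x00")
       + make_box("moov", bytes(16))
       + make_box("mdat", b"base+enh1+enh2"))
for box_type, offset, size in list_top_level_boxes(toy):
    print(box_type, offset, size)
```

The toy payloads are placeholders; a real moov box contains nested trak boxes rather than zero bytes.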
As will be seen later, in adaptive HTTP streaming applications it is desirable to be able to reference media data samples, such as network abstraction layer (NAL) units, across movie fragment or component file boundaries. In the SVC/MVC context, such a reference may be built by using mechanisms like the "Extractor". The Extractor is an internal file data structure defined in the SVC/MVC amendments to the AVC file format extension of BMFF: Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, 2008, pages 15-17. The Extractor is designed to enable extraction of NAL units from other tracks by reference, without copying. Here, a track is a timed sequence of related samples in an ISO base media file. For media data, a track corresponds to a sequence of images or sampled audio. The syntax of the Extractor is shown below:
class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
The semantics of the Extractor data structure are:
NALUnitHeader: The NAL unit structure as specified in ISO/IEC 14496-10 Annex G for NAL units of type 20:
nal_unit_type shall be set to the extractor NAL unit type (type 31).
forbidden_zero_bit, reserved_one_bit, and reserved_three_2bits shall be set as specified in ISO/IEC 14496-10 Annex G.
Other fields (nal_ref_idc, idr_flag, priority_id, no_inter_layer_pred_flag, dependency_id, quality_id, temporal_id, use_ref_base_pic_flag, discardable_flag, and output_flag) shall be set as specified in B.4 of Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008, page 17.
track_ref_index specifies the index of the track reference of type 'scal' to use to find the track from which to extract data. The sample in that track from which data is extracted is the one that is temporally aligned with, or nearest preceding in the media decoding timeline (i.e., using the time-to-sample table only), the sample containing the Extractor, adjusted by the offset specified by sample_offset. The first track reference has the index value 1; the value 0 is reserved.
sample_offset gives the relative index of the sample in the linked track that shall be used as the source of information. Sample 0 (zero) is the sample with the same, or the closest preceding, decoding time compared to the decoding time of the sample containing the extractor; sample 1 (one) is the next sample, sample -1 (minus 1) is the previous sample, and so on.
data_offset: The offset of the first byte within the referenced sample to copy. If the extraction starts with the first byte of data in that sample, the offset takes the value 0. The offset shall reference the beginning of a NAL unit length field.
data_length: The number of bytes to copy. If this field takes the value 0, then the entire single referenced NAL unit is copied (i.e., the length to copy is taken from the length field referenced by the data offset, augmented by the additional_bytes field in the case of Aggregators). Further details can be found in Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008.
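The field layout above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the normative parser: it assumes the SVC extractor's NALUnitHeader occupies 4 bytes (1-byte NAL header plus 3-byte SVC extension), it assumes the referenced sample has already been located via track_ref_index and sample_offset, and it assumes the copied span includes the NAL unit length field that data_offset points at. Aggregator handling (additional_bytes) is omitted.

```python
import struct
from dataclasses import dataclass

# Assumption: the SVC extractor's NALUnitHeader is the 1-byte NAL header
# plus the 3-byte SVC extension, i.e. 4 bytes in total.
NAL_HEADER_BYTES = 4
_UINT = {1: "B", 2: "H", 4: "I"}  # field width in bytes -> struct code

@dataclass
class Extractor:
    track_ref_index: int
    sample_offset: int
    data_offset: int
    data_length: int

def parse_extractor(nal: bytes, length_size: int) -> Extractor:
    """Decode the fields after NALUnitHeader(); length_size = lengthSizeMinusOne + 1."""
    code = _UINT[length_size]
    track_ref_index, sample_offset = struct.unpack_from(">Bb", nal, NAL_HEADER_BYTES)
    data_offset, data_length = struct.unpack_from(f">{code}{code}", nal, NAL_HEADER_BYTES + 2)
    if track_ref_index == 0:
        raise ValueError("track_ref_index 0 is reserved")
    return Extractor(track_ref_index, sample_offset, data_offset, data_length)

def copy_referenced_bytes(ext: Extractor, sample: bytes, length_size: int) -> bytes:
    """Copy the bytes an extractor points at inside an already-located sample."""
    start = ext.data_offset
    if ext.data_length == 0:
        # Whole single NAL unit: its size is read from the length field at data_offset.
        nal_size = int.from_bytes(sample[start:start + length_size], "big")
        end = start + length_size + nal_size
    else:
        end = start + ext.data_length
    if end > len(sample):
        raise ValueError("extractor reaches past the end of the sample")
    return sample[start:end]
```

For example, with 2-byte length fields and a sample holding two length-prefixed NAL units, data_length 0 at data_offset 0 returns the first length field plus its 3-byte NAL unit.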
Currently extractors are only able to extract, by reference, the NAL units from other tracks, but within the same movie box/fragment. In other words, it is not possible to use extractors to extract NAL units from a different segment or file. This restriction limits the use of extractors in the above use case.
If a client has already downloaded one or more content components of a piece of media content from the server and is in the process of downloading another content component, the client needs to know whether the previously downloaded content components are among the set of dependent components of the new one, so that it can make other requests as necessary to download the complete component set. This use case also requires a mechanism to signal an external dependent content component and its location information.
In BMFF, there is a box type called 'tref', which is used to provide a reference from the containing track to another track in the presentation. This box can be used to describe the dependencies among tracks; however, the dependency is limited to tracks within the same media file.
One approach is to signal such information using some out-of-band mechanism. For example, for an HTTP streaming application, the server can send a manifest file to the client before a session starts. The manifest file is a file that contains dependency and location information of each content component of a requested media content. Then the client is able to request all the necessary component files. However, this out-of-band approach is not applicable for local file playback, where no manifest file is available.
Prior solutions to the problems mentioned above have not adequately been established in the art. It would be desirable to provide the ability to parse and encapsulate layers without sacrificing speed and transport efficiency. Such results have not heretofore been achieved in the art.
SUMMARY OF THE INVENTION
This invention is directed to methods and apparatuses for encapsulating component files from a media entity containing more than one layer and for reading a component file. According to an aspect of the present invention, there is provided a method for encapsulating and creating component files from a media entity containing more than one layer. The method extracts, from the media entity, metadata for each layer and the media data corresponding to the extracted metadata. The extracted media data and metadata are associated to enable the creation, for each layer, of a component file containing the extracted metadata and the extracted media data.
According to another aspect of the present invention, there is provided a file encapsulator. The file encapsulator includes an extractor for extracting metadata and media data corresponding to the extracted metadata for each layer from the media entity; and a correlator for associating the extracted media data with the extracted metadata to enable creation, for each layer, of a component file.
BRIEF DESCRIPTION OF THE DRAWINGS
The above features of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
Figure 1 shows an example MP4 file format.
Figure 2 shows one embodiment of the current invention to encapsulate a media entity.
Figure 3 shows the structure of an encapsulator used to encapsulate or create component files from a media entity that contains multiple layers/representations.
Figure 4 shows an example of associating additional media data with component files based on dependency relationship.
Figure 5 shows an example to extract an NAL unit, by reference, from a movie box/fragment that is different from the one that the extractor is within.
Figure 6 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple component files using one of the new extractor data structures.
Figure 7 shows the structure of a file reader used to read the component files.
Figure 8 shows the process of reading an encapsulated component file for a video decoder involving one embodiment of the present invention.
Figure 9 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments using another preferred new extractor data structure.
Figure 10 shows the process of reading an encapsulated component file for a video decoder involving another embodiment of the present invention.
DETAILED DESCRIPTION
In the present invention, a media entity, such as a media file, a set of media files, or streaming media, is divided or encapsulated into multiple movie component files, which are addressable by client URL requests. Here, the term component file is used in a broad sense to represent a fragment, a segment, a file, or any equivalent thereof.
In one embodiment of the present invention, a media entity containing multiple representations or components is parsed to extract metadata and media data for each representation/component. Examples of representations/components include layers, such as layers with various temporal/spatial resolutions and qualities in SVC, and views in MVC. In the following, the term layer is also used to refer to a representation/component, and these terms are used interchangeably. The metadata describe, for example, what is contained in the media entity for each representation and how to use the media data contained therein. The media data contain the media data samples required to serve the purpose of the media data, e.g., decoding of the content, or any necessary information on how to obtain the required data samples. The extracted metadata and media data for each representation or layer are associated/correlated and stored together for user access. The storing operation can be performed physically on a hard drive or other storage media, or virtually through a relationship management mechanism, so that the metadata and media data appear to be stored together when interfacing with other applications or modules even though they are actually located in different places on the storage media. Fig. 2 illustrates an example of this embodiment. In Fig. 2, a media entity contains three layers: a base layer, enhancement layer 1, and enhancement layer 2. The media entity is parsed to extract the metadata and media data for each of the three layers, and those data are stored separately as component files, with the metadata and the corresponding media data associated together.
Fig. 3 shows the structure of a preferred encapsulator 300 used to encapsulate and create component files from a media entity that contains multiple layers, such as SVC encoded video. The input media entity 310 is passed to a metadata extractor 320 and a media data extractor 340. The metadata extractor 320 extracts the metadata 330 for each layer. The media data extractor 340 takes in the metadata 330 and extracts the corresponding media data 350. Note that in a different embodiment, the metadata extractor 320 and the media data extractor 340 are implemented as one extractor. Both the metadata 330 and the media data 350 are fed into a correlator 380, which associates these two types of data and creates the output component files 390, one component file for each layer.
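The data flow of Fig. 3 can be illustrated with a short Python sketch. It is an illustration only: the in-memory model of a media entity (a "layers" dictionary of per-layer metadata and a list of (layer, sample) pairs) and all function names are hypothetical, chosen so that the metadata extractor 320, the media data extractor 340, and the correlator 380 each map to one function.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ComponentFile:
    layer: str
    metadata: dict
    media: list = field(default_factory=list)

def extract_metadata(entity: dict) -> dict:
    """Metadata extractor (320): one metadata record per layer."""
    return {layer: dict(info) for layer, info in entity["layers"].items()}

def extract_media(entity: dict, metadata: dict) -> dict:
    """Media data extractor (340): uses the metadata to pick each layer's samples."""
    media = defaultdict(list)
    for layer, sample in entity["samples"]:
        if layer in metadata:
            media[layer].append(sample)
    return media

def correlate(metadata: dict, media: dict) -> list:
    """Correlator (380): pair metadata with media data, one component file per layer."""
    return [ComponentFile(layer, metadata[layer], list(media.get(layer, [])))
            for layer in metadata]

entity = {
    "layers": {"base": {"res": "QVGA"}, "enh1": {"res": "SD"}, "enh2": {"res": "HD1080p"}},
    "samples": [("base", b"b0"), ("enh1", b"e1_0"), ("enh2", b"e2_0"), ("base", b"b1")],
}
metadata = extract_metadata(entity)
files = correlate(metadata, extract_media(entity, metadata))
```

Passing the extracted metadata into the media data extractor mirrors the arrow from metadata 330 to extractor 340 in the figure.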
In another embodiment of the present invention, additional media data required by a segment or a component file are extracted and associated with the segment or component file. Figure 4 shows an example of this embodiment. In the figure, an SVC bitstream has three spatial layers: HD1080p, SD, and QVGA. Three movie fragments or component files are formed corresponding to the three operating points, and each is addressable by a different URL. Inside each movie fragment or component file, all the media data samples required for decoding (NAL units in this example) are copied and stored as media samples contained in the "mdat" box. So, when a client requests a particular operating point or representation by using the proper URL, the server can retrieve the corresponding movie fragment or component file and forward it to the client. In this embodiment, the media data extractor 340 in Fig. 3 further extracts from the input media entity 310, for each layer, additional media data related to the extracted media data of that layer. The correlator 380 further associates the additional extracted media data for each layer to create the corresponding component files.
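A minimal Python sketch of the Fig. 4 approach, assuming a layer dependency map in which SD depends on QVGA and HD1080p depends on SD and QVGA (the exact dependencies are an assumption for illustration, as are the function names). Each operating point's component file receives copies of every NAL unit it needs, lower layers first within each access unit.

```python
def decoding_closure(layer: str, deps: dict) -> list:
    """Return `layer` plus every layer it depends on, lower layers first."""
    order = []

    def visit(current, path):
        if current in path:
            raise ValueError(f"cyclic dependency at {current}")
        for dep in deps[current]:
            visit(dep, path | {current})
        if current not in order:
            order.append(current)

    visit(layer, frozenset())
    return order

def build_self_contained_files(access_units: list, deps: dict) -> dict:
    """For each operating point, copy in every NAL unit it needs, access unit by access unit."""
    files = {}
    for layer in deps:
        needed = decoding_closure(layer, deps)
        files[layer] = [au[l] for au in access_units for l in needed if l in au]
    return files

deps = {"QVGA": [], "SD": ["QVGA"], "HD1080p": ["SD", "QVGA"]}
access_units = [{"QVGA": b"q0", "SD": b"s0", "HD1080p": b"h0"},
                {"QVGA": b"q1", "SD": b"s1", "HD1080p": b"h1"}]
files = build_self_contained_files(access_units, deps)
```

In this toy example the QVGA unit b"q0" ends up stored three times, once per component file; that duplication is the storage cost the following embodiment avoids by referencing instead of copying.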
For the sake of storage space saving, it is desirable to be able to reference media data samples, such as NAL units, across movie fragment or component file boundaries, without actually duplicating the same data in each component file. However, ISO Base Media File Format (BMFF) and its extensions currently do not support this feature. To solve this problem, in a further embodiment of the present invention, a reference is identified and built for those additional media data that are related to or required by the media data of a movie fragment or a component file. The reference, rather than those additional media data, is associated with the component file along with the metadata and media data thereof. One can embed the references into the extracted media data for each layer, and then associate the extracted metadata and extracted media data for each layer for creating corresponding component files.
In this embodiment, a Reference Identifier 360 is added to the structure of the Encapsulator 300. The Reference Identifier 360 identifies, from input media entity 310, references 370 to those additional media data that are related to extracted media data 350 for each layer. Then references 370 are associated, via correlator 380, with extracted metadata 330 and extracted media data 350 for each layer, e.g. by embedding said references into said extracted media data 350, for creating corresponding component files 390.
As discussed earlier, in the SVC/MVC context, such a reference may be built by using mechanisms like the "Extractor". Currently, extractors are only able to extract, by reference, NAL units from other tracks within the same movie box/fragment. In other words, it is not possible to use extractors to extract NAL units from a different segment or file. This restriction limits the use of extractors in other cases. Hereafter, an extension to the extractor data structure is disclosed; the extension aims to support efficient encapsulation of SVC/MVC type layered video content into multiple component files as described before.
The extension gives the extractor data structure the extra capability to reference NAL units that reside in a movie box/fragment or component file other than the one in which the extractor resides.
The extended Extractor is defined as the following:
Syntax:
aligned(8) class DataEntryUrlBox (bit(24) flags)
    extends FullBox('url ', version = 0, flags) {
    string location;
}
aligned(8) class DataEntryUrnBox (bit(24) flags)
    extends FullBox('urn ', version = 0, flags) {
    string name;
    string location;
}
class aligned(8) Extractor () {
    NALUnitHeader();
    DataEntryBox(entry_version, entry_flags) data_entry; // added extension
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
Semantics: data_entry is a Uniform Resource Locator (URL) or Uniform Resource Name (URN) entry. name is a URN and is required in a URN entry. location is a URL; it is required in a URL entry and optional in a URN entry, where it gives a location at which to find the resource with the given name. Each is a null-terminated string using UTF-8 characters. If the self-contained flag is set, the URL form is used and no string is present; the box terminates with the entry-flags field. The URL type should be of a service that delivers a file. Relative URLs are permissible and are relative to the file containing the Movie Box/Fragment that contains the track to which the Extractor belongs.
Other fields have the same semantics as in the original Extractor described before. With the extended extractor, it is now possible to extract a NAL unit, by reference, from a movie box/fragment that is different from the one in which the extractor resides. Figure 5 shows such an example, with the same SVC bitstream as Figure 4 but using the new extended Extractor data structure. As can be seen from the figure, the SD movie fragment can now reference the NAL units in the QVGA movie fragment. Likewise, the HD1080p movie fragment can use extractors to reference NAL units in both the QVGA and SD movie fragments. Compared to Figure 4, no NAL units are duplicated across these movie fragments, so storage space is saved.
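The data_entry semantics above can be sketched in Python for the URL form only. This is a minimal illustration, assuming the standard BMFF FullBox header (32-bit size, type 'url ', 8-bit version, 24-bit flags) and the BMFF self-contained flag value 0x000001; the function names are hypothetical. Relative locations are resolved with urllib's urljoin against the URL of the file holding the extractor's track, which is one reasonable reading of "relative to the file containing the Movie Box/Fragment".

```python
import struct
from typing import Optional, Tuple
from urllib.parse import urljoin

SELF_CONTAINED = 0x000001  # data-entry flag: media data is in the same file

def build_url_entry(location: Optional[str], flags: int = 0) -> bytes:
    """Serialize a DataEntryUrlBox as a FullBox('url ', version 0, flags)."""
    version_and_flags = struct.pack(">I", flags & 0xFFFFFF)  # version 0 in the top byte
    if flags & SELF_CONTAINED:
        payload = b""                                  # no string when self-contained
    else:
        payload = location.encode("utf-8") + b"\x00"   # null-terminated UTF-8
    body = version_and_flags + payload
    return struct.pack(">I4s", 8 + len(body), b"url ") + body

def parse_url_entry(box: bytes) -> Tuple[int, Optional[str]]:
    """Return (flags, location), with location None for a self-contained entry."""
    size, box_type, version_and_flags = struct.unpack_from(">I4sI", box, 0)
    if box_type != b"url ":
        raise ValueError("not a 'url ' box")
    flags = version_and_flags & 0xFFFFFF
    if flags & SELF_CONTAINED:
        return flags, None
    return flags, box[12:size].split(b"\x00", 1)[0].decode("utf-8")

def resolve_location(location: str, containing_file_url: str) -> str:
    """Resolve a relative location against the file that holds the extractor's track."""
    return urljoin(containing_file_url, location)
```

For example, an SD fragment at a hypothetical http://example.com/video/sd.mp4 that stores the relative location "qvga.mp4" resolves it to http://example.com/video/qvga.mp4.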
Figure 6 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments or component files using the new extractor data structure. The process starts at step 601. Each NAL unit is read in one by one in step 610. If the end of the bitstream is reached in step 620, the process stops at step 690; otherwise, the process proceeds to step 630. Decision step 630 determines whether the current NAL unit depends on NAL units from other tracks for decoding. If it does not, control is transferred to step 640, where a sample is formed using the current NAL unit and placed in the current track. If there is a dependency between the current NAL unit and NAL units from another track, the process goes on to step 650. Decision step 650 further determines whether the track holding the NAL units required by the current NAL unit resides within the same movie fragment. If it does, step 670 fills in an extended Extractor to reference the NAL unit from that other track. If the track resides in a different movie fragment, the URL or URN of that movie fragment is identified in step 660, and the process proceeds to step 670, where the identified URL or URN is filled in an extended Extractor. After the extended Extractor is filled in, it is embedded into the current track in step 680. The process then starts over with the next NAL unit in step 610.
In a different embodiment, references 370 are embedded into the extracted metadata 330, and indices to the references 370 are added to the extracted media data 350 via the correlator 380, which further associates the metadata and the media data for each layer to create the corresponding component files 390. In the context of the ISO Base Media File Format, a box called the HTTP Streaming Information Box is disclosed. This box contains information that can assist HTTP streaming of the ISO file. It is preferred that the HTTP Streaming Information Box be placed as early as possible in the component files, e.g., at the beginning of the files. The box can also serve as a source for the server when forming a manifest file for the client. Another type of box, called the Media Reference Box, which is contained in the HTTP Streaming Information Box, is also disclosed. This box contains information about the external dependent files. The extractor structure is further extended so that it can reference media samples across different component files. The information contained in the Media Reference Box can be utilized by extractors to save signaling overhead.
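The Fig. 6 loop can be sketched in Python as follows. This is a hypothetical model, not the patent's implementation: NAL units are (track, payload, dep_track) tuples, fragments are looked up through two plain dictionaries, and an extractor is represented as a tuple whose url is None in the same-fragment (self-contained) case. The flowchart does not say whether a dependent NAL unit is itself stored after its extractor; the sketch assumes it is.

```python
def encapsulate(nal_units, track_fragment, fragment_url):
    """
    nal_units: (track, payload, dep_track or None) tuples in decoding order (step 610).
    track_fragment: track -> id of the movie fragment/component file holding it.
    fragment_url: fragment id -> URL of that fragment.
    Returns track -> list of samples; an extractor sample is ("extractor", url, dep_track),
    where url None stands for the self-contained (same-fragment) case.
    """
    tracks = {}
    for track, payload, dep_track in nal_units:
        samples = tracks.setdefault(track, [])
        if dep_track is not None:                                  # step 630
            if track_fragment[dep_track] == track_fragment[track]:  # step 650
                url = None
            else:
                url = fragment_url[track_fragment[dep_track]]       # step 660
            samples.append(("extractor", url, dep_track))           # steps 670, 680
        # Assumption: the NAL unit itself is stored too (step 640 for independent units).
        samples.append(("nal", payload))
    return tracks
```

A real encapsulator would also record the sample offset and byte range of the referenced data, which this sketch leaves out.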
The detailed definition for the proposed HTTP streaming information box, media reference box and further improved extractors are as follows.
• HTTP Streaming Information Box
Definition:
Box Type: 'hsin'
Container: File
Mandatory: No
Quantity: Zero or one
The HTTP Streaming Information Box aids the HTTP streaming operation of an ISO media file. It contains relevant information about HTTP streaming delivery of the file, including Media Reference Box as defined below, among other possible types of boxes. The HTTP Streaming Information Box is preferably placed as early as possible in files, for maximum utility.
Syntax:
aligned(8) class HTTPStreamingInfoBox extends Box('hsin') {
}
• Media Reference Box
Definition:
Box Type: 'mref'
Container: 'hsin'
Mandatory: No
Quantity: Zero or one
Media Reference Box is contained in the HTTP Streaming Information Box and contains a table of data references, in the form of URLs, that declare the locations of the external files on which each track included in the box depends. By reading this box, the file reader is able to identify the external dependent file sources of a track in the file, such as external component files, as well as the means to retrieve them.
Syntax:
aligned(8) class DataEntryUrlBox (bit(24) flags) extends Box('url ') {
    string location;
}
aligned(8) class MediaReferenceBox extends Box('mref') {
    unsigned int(16) entry_count;
    for (i = 1; i <= entry_count; i++) {
        unsigned int(32) track_ID;
        unsigned int(16) dependent_source_count;
        for (j = 1; j <= dependent_source_count; j++) {
            DataEntryUrlBox data_entry;
        }
    }
}
Semantics:
entry_count: an integer that counts the actual entries;
track_ID: an integer that uniquely identifies the track in the file to which the entry applies;
dependent_source_count: an integer that counts the external media sources on which the track identified by track_ID depends;
data_entry: a URL entry that points to one external media source on which the designated track depends. Each is a null-terminated string using UTF-8 characters. The URL type should be of a service that delivers a file. Relative URLs are permissible and are relative to the file that contains this Media Reference Box.
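A minimal Python round-trip of the 'hsin'/'mref' layout above. It follows the syntax as written here, so the 'url ' entries are plain boxes (size, type, null-terminated UTF-8 string) with no version/flags field; that literal reading, the restriction of 'hsin' to a single child box, and all function names are assumptions of this sketch.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def url_entry(location: str) -> bytes:
    # Follows the syntax above literally: a plain Box (no version/flags field)
    # carrying one null-terminated UTF-8 string.
    return box(b"url ", location.encode("utf-8") + b"\x00")

def mref_box(table: dict) -> bytes:
    """table: track_ID -> list of dependent source URLs; list order gives indices 1..n."""
    payload = struct.pack(">H", len(table))                 # entry_count
    for track_id, urls in table.items():
        payload += struct.pack(">IH", track_id, len(urls))  # track_ID, dependent_source_count
        payload += b"".join(url_entry(u) for u in urls)     # data_entry boxes
    return box(b"mref", payload)

def hsin_box(table: dict) -> bytes:
    """HTTP Streaming Information Box carrying only a Media Reference Box."""
    return box(b"hsin", mref_box(table))

def parse_mref(data: bytes) -> dict:
    _, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"mref":
        raise ValueError("not an 'mref' box")
    (entry_count,) = struct.unpack_from(">H", data, 8)
    pos, table = 10, {}
    for _ in range(entry_count):
        track_id, count = struct.unpack_from(">IH", data, pos)
        pos += 6
        urls = []
        for _ in range(count):
            entry_size, entry_type = struct.unpack_from(">I4s", data, pos)
            if entry_type != b"url ":
                raise ValueError("expected a 'url ' entry")
            urls.append(data[pos + 8:pos + entry_size].split(b"\x00", 1)[0].decode("utf-8"))
            pos += entry_size
        table[track_id] = urls
    return table
```

Because 'hsin' is meant to sit at the start of the file, a client that has fetched only the first few hundred bytes can already run parse_mref on its payload.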
Media Reference Box, as defined above, is designed to facilitate, in a number of ways, HTTP streaming of a media entity containing more than one layer.
First, it can explicitly signal the dependency relationship among component files at the beginning of a component file through the reference table. Thus, once the client has downloaded a small portion of the component file, it is able to know all the related external component files of its track(s), and make corresponding requests to obtain the complete set(s) for playback through the references contained in the table, if necessary.
Second, the in-file information from the box can be easily extracted to be included in a manifest file. Such information in the manifest can help the client, before actual HTTP streaming, discover relevant service information and perform the corresponding service initialization, such as requesting all the associated component files, allocating necessary buffer resources, etc.
Third, when the client requests a different representation of some multi-component media content, of which another representation has already been delivered as a component file, the client can check the corresponding Media Reference Box in that file to see whether the file contains any of the dependent components of the new representation that can be reused.
Finally, the box helps reduce the signaling overhead of the extended Extractor structure as defined below.
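The first and third uses can be sketched as one small Python helper. This is an assumption-laden illustration: the table is taken to be the track_ID -> URL list decoded from a Media Reference Box, "already_have" is a hypothetical set of absolute URLs the client holds, and relative entries are resolved with urljoin against the URL of the file containing the box.

```python
from urllib.parse import urljoin

def files_still_needed(mref_table: dict, track_id: int, file_url: str, already_have: set) -> list:
    """Absolute URLs of the track's dependent component files not yet held by the client."""
    needed = []
    for location in mref_table.get(track_id, []):
        absolute = urljoin(file_url, location)   # relative URLs are relative to this file
        if absolute not in already_have and absolute not in needed:
            needed.append(absolute)
    return needed
```

A client switching to a hypothetical hd.mp4 that already holds qvga.mp4 would then request only sd.mp4.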
• Extractors
The Extractor is further extended so that it can reference data from tracks of external media files.
Extended syntax:
class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(16) media_reference_index;
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
Semantics:
media_reference_index: specifies the index of an entry in the reference table, contained in the Media Reference Box, that has the same track_ID value as the track containing the extractor. If media_reference_index equals 0, the extractor references data from another track within the same file as the extractor; in this case, there shall not be a reference table in the Media Reference Box with the same track_ID value as the track. If media_reference_index is between 1 and the value of dependent_source_count in the reference table associated with the track in the Media Reference Box, the URL referenced by media_reference_index in the reference table points to an external file containing the track from which the extractor extracts data.
The semantics of other fields remain the same as the original extractor definition.
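The media_reference_index rules above amount to a small lookup, sketched here in Python. The function name and the dictionary form of the reference table are hypothetical; the checks follow the stated semantics literally, including the rule that index 0 is only valid when the track has no reference table.

```python
from typing import Optional

def resolve_extractor_source(mref_table: dict, track_id: int,
                             media_reference_index: int) -> Optional[str]:
    """Return None for same-file extraction, else the URL of the external file."""
    urls = mref_table.get(track_id)
    if media_reference_index == 0:
        if urls:
            raise ValueError("index 0 used, but the mref box has a table for this track")
        return None
    if not urls or not 1 <= media_reference_index <= len(urls):
        raise ValueError("media_reference_index out of range")
    return urls[media_reference_index - 1]
```

Each extractor now carries a 16-bit index instead of a full data entry box with a URL string, which is where the reduction in signaling overhead comes from.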
With the further extended extractor structure, it is now possible to use extractors to link to and extract data from a track that belongs to an external component file. It is especially useful when content components from an encoded piece of multi-component media content, such as encoded by SVC or MVC, are encapsulated into different component files. With the extended extractors, extraction can take place across file boundaries. This avoids duplicating the same data in different component files.
Figure 9 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments or component files using the disclosed HTTP Streaming Information Box and Media Reference Box as well as the further extended extractor data structure. This process is similar to the process shown in Fig. 6, with a few modifications due to the boxes described above and the further extension of the extractor. After the location information (URL/URN) is identified in step 660, it is used to fill in the reference table in the mref box (Media Reference Box) in step 965. Step 970 then fills in an Extractor with the index to that location information in the reference table, and the extractor is embedded into the current track. When the end of the bitstream is reached at step 620, the mref box and its container, the hsin box (HTTP Streaming Information Box), are embedded into the metadata of the component file.
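Steps 965 and 970 can be sketched as a small Python builder that records each external location once per track and hands back the index an extractor should store. The class and method names are hypothetical, and the sketch does not enforce the rule that a track using index 0 has no table entry.

```python
class ReferenceTableBuilder:
    """Accumulates the mref reference table while NAL units are encapsulated."""

    def __init__(self):
        self.table = {}  # track_ID -> list of external URLs

    def media_reference_index(self, track_id: int, url=None) -> int:
        """0 for same-file extraction; otherwise the 1-based table index (steps 965, 970)."""
        if url is None:
            return 0
        urls = self.table.setdefault(track_id, [])
        if url not in urls:
            urls.append(url)          # step 965: record the location once
        return urls.index(url) + 1    # step 970: the extractor stores only this index
```

When step 620 reports the end of the bitstream, builder.table holds exactly the track_ID -> URL lists that go into the mref box inside hsin.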
To read a component file, a file reader 700 shown in Fig. 7 is employed. A parser 710 first parses the component file to get metadata and media data, and a reference if available. If, according to the decoded reference, the media data are related to media data of other component files such as through decoding dependency, a retriever 720 retrieves the related media data from other component files as indicated in the reference. A processor 730 further processes the metadata and media data obtained from the component file as well as the additional media data if available. The parsing operation by the parser 710 includes various necessary operations to obtain the metadata, the media data that are ready for the processor 730, and the reference ready for the retriever 720. It will include further parsing the metadata and/or the media data when necessary. In one embodiment, the reference is embedded in the media data, and thus the reference is obtained by parsing the media data. If a reference is available, the parsing step further includes analyzing the syntax of the reference and decoding the reference. The processor 730 can contain a video decoder if the component file contains video content. In a different embodiment, the parser and the retriever can be incorporated in the processor.
Figure 8 shows the process of reading an SVC / MVC type video bitstream for a video decoder employing the present invention. Step 801 accesses a component video file, and step 805 identifies its metadata and media data for each layer. The identified metadata and media data are parsed in step 810, and the NAL units of the media data are read in one by one in step 815. For the current NAL unit, step 820 first determines whether the end of the bitstream has been reached; if so, the process ends at step 825. Otherwise, the process proceeds to decision step 830 to determine whether the current NAL unit is an extractor. If it is not an extractor, it is a normal NAL unit containing data to be decoded, and it is sent to the decoder at step 835. If the current NAL unit is an extractor, step 840 determines whether it depends on a NAL unit outside the current component file. If the required NAL unit is within the same component file, it is retrieved from the current file in step 845 and sent to the decoder at step 835. If the required NAL unit is in another component file, it is located in step 850 using the reference information Data_entry in the extractor, retrieved from the remote file in step 855, and then sent to the decoder at step 835.
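The decision loop of Fig. 8 can be illustrated with the following minimal sketch. The names `NalUnit`, `Extractor`, `read_component_file`, `fetch_remote` and `decode` are hypothetical. The sketch also assumes that an extractor with `data_entry` set to `None` refers to a NAL unit in the same component file, and that a non-`None` value indexes the mref location table. Nested extractors are deliberately not resolved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class NalUnit:
    payload: bytes


@dataclass
class Extractor:
    data_entry: Optional[int]   # index into the mref table; None = same component file
    nal_index: int              # position of the required NAL unit in the target file


Unit = Union[NalUnit, Extractor]


def read_component_file(units: List[Unit],
                        mref: List[str],
                        fetch_remote: Callable[[str], List[Unit]],
                        decode: Callable[[bytes], None]) -> None:
    for unit in units:                          # steps 815/820: until end of bitstream
        if isinstance(unit, NalUnit):           # step 830: ordinary NAL unit
            decode(unit.payload)                # step 835
            continue
        if unit.data_entry is None:             # step 840: dependency inside this file
            target = units[unit.nal_index]      # step 845
        else:
            url = mref[unit.data_entry]         # step 850: resolve Data_entry
            target = fetch_remote(url)[unit.nal_index]   # step 855
        if not isinstance(target, NalUnit):
            raise ValueError("nested extractors are not resolved in this sketch")
        decode(target.payload)                  # step 835
```

Because this sketch calls `fetch_remote` once per cross-file extractor, a practical reader would cache or prefetch the referenced component files.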
In another embodiment, the parser 710 identifies the reference by parsing the media data to obtain the embedded reference indices and then obtaining the corresponding references according to those indices. The corresponding process of reading an SVC / MVC type video bitstream for a video decoder is shown in Fig. 10, which is similar to the process of Fig. 8. Since, according to a preferred embodiment, the reference is placed at the beginning of the component file, parsing the metadata in step 810 allows the reference contained therein to be analyzed in parallel with the parsing of the media data. When the reference is analyzed, the other component files it refers to are identified in step 1014, and retrieval of those component files is started in step 1012, in parallel with the remaining steps of the process. After the location information of the component file on which the current NAL unit depends is accessed in step 850, local storage, such as a media buffer, is checked for the availability of that component file. If the required component file is available locally, the NAL unit is retrieved from the local copy; otherwise, it is retrieved from the remote file. Note that the local copy of the component file can be obtained either through the parallel retrieval in step 1012 or from a previous request for that component file.
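The parallel retrieval and local-storage check of Fig. 10 can be approximated with a thread pool and a dictionary serving as the media buffer. This is a minimal sketch under stated assumptions, not the patent's mechanism. `ComponentFileCache`, its `download` callback and its methods are hypothetical, and each component file is modelled as a list of NAL unit payloads. Files referenced in the metadata are submitted for download as soon as they are identified. When an extractor is resolved, the local copy is used if present. An in-flight prefetch is awaited, and any other file is fetched on demand.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


class ComponentFileCache:
    """Hypothetical media buffer that prefetches referenced component files."""

    def __init__(self, download: Callable[[str], List[bytes]], max_workers: int = 4):
        self._download = download
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Dict[str, Future] = {}      # retrievals started in step 1012
        self._local: Dict[str, List[bytes]] = {}   # component files stored locally

    def prefetch(self, urls: Iterable[str]) -> None:
        # Steps 1014/1012: start retrieving every referenced file in parallel
        for url in urls:
            if url not in self._local and url not in self._pending:
                self._pending[url] = self._pool.submit(self._download, url)

    def get_nal(self, url: str, index: int) -> bytes:
        # After step 850: prefer the local copy, otherwise go to the remote file
        if url not in self._local:
            fut = self._pending.pop(url, None)
            self._local[url] = fut.result() if fut is not None else self._download(url)
        return self._local[url][index]

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In this sketch, waiting on an in-flight prefetch rather than issuing a second request is a design choice that avoids downloading the same component file twice. Files kept in `_local` also serve later requests, which corresponds to obtaining the local copy "from a previous request".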
Although preferred embodiments of the present invention have been described in detail herein, it is to be understood that this invention is not limited to these embodiments, and that other modifications and variations may be effected by one skilled in the art without departing from the scope of the invention as defined by the appended claims.

Claims

1. A method for creating component files from a media entity containing more than one layer, the method comprising the steps of:
extracting metadata for each layer from said media entity;
extracting media data from said media entity corresponding to the extracted metadata for each layer of said media entity; and
associating said extracted media data with said extracted metadata to enable creation, for said each layer, of a component file containing said extracted metadata and said extracted media data.
2. The method of claim 1, wherein said component file is at least one of a movie box, a movie fragment, a segment and a file.
3. The method of claim 1, further comprising the steps of:
extracting, for said each layer, from said media entity additional media data related to said extracted media data for said each layer; and
associating said extracted media data and said additional media data for each layer for creating corresponding component files.
4. The method of claim 1, further comprising the steps of:
identifying references to additional media data related to said extracted media data for each layer; and
associating said references with said extracted metadata and extracted media data for each layer for creating corresponding component files.
5. The method of claim 4, wherein said media data and additional media data comprise data samples.
6. The method of claim 5, wherein a data sample comprises a network abstraction layer unit.
7. The method of claim 6, wherein said references contain at least one of a uniform resource locator and a uniform resource name of said network abstraction layer units in said additional media data.
8. The method of claim 4, further comprising the steps of:
embedding said references into said extracted metadata for each layer; and
adding indices to said references in said extracted media data.
9. The method of claim 8, wherein said references are placed at a beginning of said component file for each layer.
10. The method of claim 8, wherein said references are filled into media reference boxes and said indices are filled into extractors.
11. A file encapsulator for creating component files from a media entity containing more than one layer, the encapsulator comprising:
an extractor for extracting metadata for each layer from said media entity and for extracting media data from said media entity corresponding to said extracted metadata for each layer of said media entity; and
a correlator for associating said extracted media data with said extracted metadata to enable creation, for said each layer, of a component file containing said extracted metadata and said extracted media data.
12. The file encapsulator of claim 11, wherein said component file is at least one of a movie box, a movie fragment, a segment and a file.
13. The file encapsulator of claim 11, wherein said extractor further extracts, for said each layer, from said media entity additional media data related to said extracted media data for each layer; and said correlator further associates said extracted media data and said additional media data for each layer for creating corresponding component files.
14. The file encapsulator of claim 11, further comprising:
a reference identifier for identifying a reference to additional media data, from said media entity, related to said extracted media data for each layer, wherein said reference is associated, via said correlator, with said extracted metadata and extracted media data for each layer for creating corresponding component files.
15. The file encapsulator of claim 14, wherein said media data and additional media data comprise data samples.
16. The file encapsulator of claim 15, wherein a data sample comprises a network abstraction layer unit.
17. The file encapsulator of claim 16, wherein said references contain at least one of a uniform resource locator and a uniform resource name of said network abstraction layer units in said additional media data.
18. The file encapsulator of claim 14, wherein said correlator further embeds said reference into said extracted metadata for each layer and adds indices to said references in said extracted media data.
19. The file encapsulator of claim 18, wherein said correlator places said references at the beginning of said component file for each layer.
20. The file encapsulator of claim 19, wherein said references are filled into media reference boxes and said indices are filled into extractors.
21. A method for reading a component file, comprising the steps of:
parsing said component file to obtain metadata, media data and references; and
if, according to said references, said media data of said component file are related to media data of other component files, retrieving said related media data from said other component files using said references.
22. The method of claim 21, wherein said media data of said component file are related to media data of other component files according to coding dependency.
23. The method of claim 21, wherein said media data and said related media data comprise data samples.
24. The method of claim 23, wherein a data sample comprises a network abstraction layer unit.
25. The method of claim 21, further comprising the step of parsing said metadata to obtain said references.
26. The method of claim 25, further comprising the steps of:
parsing said media data to get reference indices embedded therein; and
obtaining corresponding references according to said reference indices.
27. The method of claim 25, wherein the retrieving step comprises:
retrieving said other component files according to said references in parallel.
28. The method of claim 27, wherein the retrieving step further comprises:
checking local file storage;
if said local file storage contains said other component files, retrieving said other component files from said local storage.
29. A file reader, comprising:
a parser for parsing a component file to obtain metadata, media data and a reference;
a retriever for retrieving media data related to said media data from other component files according to said reference; and
a processor for processing said metadata, media data and said retrieved media data from other component files.
30. The file reader of claim 29, wherein said media data of said component file are related to media data of other component files in terms of coding dependency.
31. The file reader of claim 29, wherein said media data and said related media data comprise data samples.
32. The file reader of claim 31, wherein a data sample comprises a network abstraction layer unit.
33. The file reader of claim 29, wherein said processor comprises a video decoder.
34. The file reader of claim 29, wherein said parser further comprises means for obtaining said references.
35. The file reader of claim 34, wherein said parser further parses said media data to get reference indices embedded therein, and obtains corresponding references according to said reference indices.
36. The file reader of claim 34, wherein said retriever further retrieves said other component files according to said obtained references in parallel.
37. The file reader of claim 36, wherein the retriever further checks local file storage, and if said local file storage contains said other component files, retrieves said other component files from said local storage.
PCT/US2011/040168 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video WO2011159605A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP11727605.5A EP2580920A1 (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video
JP2013515413A JP2013532441A (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating encoded multi-component video
KR1020127032653A KR20130088035A (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video
BR112012031874A BR112012031874A2 (en) 2010-06-14 2011-06-13 method and apparatus for encapsulating encoded multicomponent video
US13/703,929 US20130097334A1 (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video
CN2011800293844A CN103098484A (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US35442210P 2010-06-14 2010-06-14
US35442410P 2010-06-14 2010-06-14
US61/354,422 2010-06-14
US61/354,424 2010-06-14

Publications (1)

Publication Number Publication Date
WO2011159605A1 true WO2011159605A1 (en) 2011-12-22

Family

ID=44454826

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/040168 WO2011159605A1 (en) 2010-06-14 2011-06-13 Method and apparatus for encapsulating coded multi-component video

Country Status (6)

Country Link
EP (1) EP2580920A1 (en)
JP (1) JP2013532441A (en)
KR (1) KR20130088035A (en)
CN (1) CN103098484A (en)
BR (1) BR112012031874A2 (en)
WO (1) WO2011159605A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2596633A1 (en) * 2010-07-20 2013-05-29 Nokia Corp. A media streaming apparatus
EP2680527A1 (en) * 2012-06-28 2014-01-01 Alcatel-Lucent Adaptive streaming aware node, encoder and client enabling smooth quality transition
US9202350B2 (en) 2012-12-19 2015-12-01 Nokia Technologies Oy User interfaces and associated methods
GB2538998A (en) * 2015-06-03 2016-12-07 Nokia Technologies Oy A method, an apparatus, a computer program for video coding
US10582231B2 (en) 2015-06-03 2020-03-03 Nokia Technologies Oy Method, an apparatus, a computer program for video coding
US11477253B2 (en) 2006-06-09 2022-10-18 Qualcomm Incorporated Enhanced block-request streaming system using signaling or block creation
US11743317B2 (en) 2009-09-22 2023-08-29 Qualcomm Incorporated Enhanced block-request streaming using block partitioning or request controls for improved client-side handling

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
JP2014230055A (en) * 2013-05-22 2014-12-08 ソニー株式会社 Content supply device, content supply method, program, and content supply system
GB2524531B (en) * 2014-03-25 2018-02-07 Canon Kk Methods, devices, and computer programs for improving streaming of partitioned timed media data
GB2527786B (en) 2014-07-01 2016-10-26 Canon Kk Method, device, and computer program for encapsulating HEVC layered media data
GB2579389B (en) * 2018-11-29 2022-07-27 Canon Kk Method, device and computer program for encapsulating media data into a media file

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
EP1481553A1 (en) * 2002-02-25 2004-12-01 Sony Electronics Inc. Method and apparatus for supporting avc in mp4
AU2003248055A1 (en) * 2002-07-12 2004-02-02 Matsushita Electric Industrial Co., Ltd. Data processing device
US7725593B2 (en) * 2005-07-15 2010-05-25 Sony Corporation Scalable video coding (SVC) file format
US20070022215A1 (en) * 2005-07-19 2007-01-25 Singer David W Method and apparatus for media data transmission
KR20050092688A (en) * 2005-08-31 2005-09-22 한국정보통신대학교 산학협력단 Integrated multimedia file format structure, its based multimedia service offer system and method
KR101198583B1 (en) * 2005-10-12 2012-11-06 한국과학기술원 Apparatus of multimedia middle ware using metadata and management method and storing medium thereof
JP4818373B2 (en) * 2006-01-09 2011-11-16 韓國電子通信研究院 SVC file data sharing method and file
JP2013534101A (en) * 2010-06-14 2013-08-29 トムソン ライセンシング Method and apparatus for encapsulating encoded multi-component video

Non-Patent Citations (9)

Title
"Information technology - Coding of audio-visual objects", ADVANCED VIDEO CODING (AVC) FILE FORMAT, AMENDMENT 2: FILE FORMAT SUPPORT FOR SCALABLE VIDEO CODING, ISO/IEC 14496-15:2004/AMD.2, 2008
"Information technology - Coding of audio-visual objects", ADVANCED VIDEO CODING (AVC) FILE FORMAT, AMENDMENT 2: FILE FORMAT SUPPORT FOR SCALABLE VIDEO CODING, ISO/IEC 14496-15:2004/AMD.2, 2008, pages 17
"SVC / MVC Amendments to the AVC file format extension of BMFF: Information Technology - coding of audio-visual objects", ADVANCED VIDEO CODING (A VC) FILE FORMAT, AMENDMENT 2: FILE FORMAT SUPPORT FOR SCALABLE VIDEO CODING, 2008, pages 15 - 17
AMON P ET AL: "File Format for Scalable Video Coding", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 17, no. 9, 1 September 2007 (2007-09-01), pages 1174 - 1185, XP011193013, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2007.905521 *
ANONYMOUS: "Text of ISO/IEC 14496-15/FDAM2 SVC File Format Extension", 83. MPEG MEETING;14-1-2008 - 18-1-2008; ANTALYA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. N9682, 12 March 2008 (2008-03-12), XP030016176 *
DAVID SINGER: "Editor's draft of the Part12 file format amendment", 86. MPEG MEETING; 13-10-2008 - 17-10-2008; BUSAN; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. M15812, 20 October 2008 (2008-10-20), XP030044409 *
GRÜNEBERG K ET AL: "Deliverable D3.2 MVC/SVC storage format", no. Project No: FP7-ICT-214063, 29 January 2009 (2009-01-29), pages 1 - 34, XP002599508, Retrieved from the Internet <URL:http://www.ist-sea.eu/Public/SEA_D3.2_HHI_FF_20090129.pd> [retrieved on 20100901] *
YE-KUI WANG ET AL: "Comments to the MVC file format draft", 88. MPEG MEETING; 20-4-2009 - 24-4-2009; MAUI; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. M16444, 17 April 2009 (2009-04-17), XP030045041 *
ZHENYU WU ET AL: "Some extensions to ISO Base Media File Format and MPEG-2 Transport Stream to support multi-component media content HTTP Streaming", 93. MPEG MEETING; 26-7-2010 - 30-7-2010; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. M17894, 22 July 2010 (2010-07-22), XP030046484 *

Cited By (14)

Publication number Priority date Publication date Assignee Title
US11477253B2 (en) 2006-06-09 2022-10-18 Qualcomm Incorporated Enhanced block-request streaming system using signaling or block creation
US11770432B2 (en) 2009-09-22 2023-09-26 Qualcomm Incorporated Enhanced block-request streaming system for handling low-latency streaming
US11743317B2 (en) 2009-09-22 2023-08-29 Qualcomm Incorporated Enhanced block-request streaming using block partitioning or request controls for improved client-side handling
EP2596633A4 (en) * 2010-07-20 2014-01-15 Nokia Corp A media streaming apparatus
EP2596633A1 (en) * 2010-07-20 2013-05-29 Nokia Corp. A media streaming apparatus
US9769230B2 (en) 2010-07-20 2017-09-19 Nokia Technologies Oy Media streaming apparatus
EP2680527A1 (en) * 2012-06-28 2014-01-01 Alcatel-Lucent Adaptive streaming aware node, encoder and client enabling smooth quality transition
WO2014001246A1 (en) * 2012-06-28 2014-01-03 Alcatel Lucent Adaptive streaming aware node, encoder and client enabling smooth quality transition
CN104429041A (en) * 2012-06-28 2015-03-18 阿尔卡特朗讯公司 Adaptive streaming aware node, encoder and client enabling smooth quality transition
US9202350B2 (en) 2012-12-19 2015-12-01 Nokia Technologies Oy User interfaces and associated methods
US9665177B2 (en) 2012-12-19 2017-05-30 Nokia Technologies Oy User interfaces and associated methods
GB2538998A (en) * 2015-06-03 2016-12-07 Nokia Technologies Oy A method, an apparatus, a computer program for video coding
US10979743B2 (en) 2015-06-03 2021-04-13 Nokia Technologies Oy Method, an apparatus, a computer program for video coding
US10582231B2 (en) 2015-06-03 2020-03-03 Nokia Technologies Oy Method, an apparatus, a computer program for video coding

Also Published As

Publication number Publication date
JP2013532441A (en) 2013-08-15
EP2580920A1 (en) 2013-04-17
BR112012031874A2 (en) 2017-11-28
CN103098484A (en) 2013-05-08
KR20130088035A (en) 2013-08-07

Legal Events

Code Title Description
WWE Wipo information: entry into national phase. Ref document number: 201180029384.4; Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 11727605; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2013515413; Country of ref document: JP; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2011727605; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 20127032653; Country of ref document: KR; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 13703929; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112012031874; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 112012031874; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20121213