KR20130088035A

KR20130088035A - Method and apparatus for encapsulating coded multi-component video

Info

Publication number: KR20130088035A
Application number: KR1020127032653A
Authority: KR
Inventors: Zhenyu Wu; Li Hua Zhu
Original assignee: Thomson Licensing
Priority date: 2010-06-14
Filing date: 2011-06-13
Publication date: 2013-08-07
Also published as: EP2580920A1; BR112012031874A2; JP2013532441A; WO2011159605A1; CN103098484A

Abstract

A method and apparatus for encapsulating a media entity containing one or more layers into multiple component files, one for each layer, are described along with the corresponding method and apparatus for reading a component file. A new box is proposed as an extension to the extractor data structure of the ISO BMFF and SVC/MVC file formats. The new box enables access to the referenced component files in parallel with the processing of the current component file. The extractor extension of the present invention allows NAL units to be referenced across different component files. The present invention enables adaptive HTTP streaming of media files.

Description

METHOD AND APPARATUS FOR ENCAPSULATING CODED MULTI-COMPONENT VIDEO

This patent application claims the benefit of priority from U.S. Provisional Patent Application Serial No. 61/354,422, entitled "Extension to the Extractor data structure of SVC/MVC file formats" and filed on June 14, 2010, and from U.S. Provisional Patent Application Serial No. 61/354,424, entitled "Some extensions for ISO Base Media File Format for HTTP streaming" and filed on June 14, 2010. The teachings of the above-identified provisional patent applications are expressly incorporated herein by reference.

This application is related to the following co-pending, commonly owned U.S. Patent Application Serial No. __/______ (Attorney Docket No. PU100140), entitled "Method and Apparatus for Encapsulating Coded Multi-component Video" and filed concurrently herewith. The teachings of the non-provisional patent application identified immediately above are expressly incorporated herein by reference.

The present invention relates generally to HTTP streaming. More specifically, the present invention relates to encapsulating a media entity for coded multi-component video streams, such as scalable video coding (SVC) streams and multi-view coding (MVC) streams, for HTTP streaming.

In HTTP streaming applications, encoded video is often encapsulated and stored on the server side as a BMFF-compliant file, such as an MP4 file. Moreover, to realize adaptive HTTP streaming, the file is usually divided into multiple movie fragments, and these fragments are further grouped into segments that are addressable by client URL requests. In practice, different encoded representations of the video content are stored in the segments, so that a client can dynamically choose the desired representation to download and play back during a session.

Encoded layered video, such as an SVC or MVC bitstream, provides natural support for such bitrate adaptation: decoding different subsets of the bitstream yields representations for different operating points, i.e. temporal/spatial resolution, quality, views, and so on. However, existing ISO Base Media File Format (BMFF) standards, such as the MP4 file format, do not support separate access to each layer or representation and therefore cannot be applied to HTTP streaming applications. As shown in FIG. 1, in the MP4 file format the metadata for all layers or representations of one media file is stored in the moov Movie Box, while the media content data for all layers or representations is stored in the mdat box. In HTTP streaming, when a client requests one layer, the entire file has to be transmitted, because all the layers or representations are mixed together and the client does not know where to find the required layer or representation.
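The moov/mdat layout described above can be inspected with a short top-level box scanner. The following is a minimal sketch in Python, assuming the usual BMFF box header (a 32-bit big-endian size followed by a four-character type). The function names and the synthetic sample bytes are illustrative assumptions and are not part of the original disclosure.

```python
import struct

def list_top_level_boxes(data: bytes):
    """Return (type, offset, size) for each top-level box in a BMFF byte string."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - pos
        if size < header:
            break  # malformed box; stop scanning
        boxes.append((box_type.decode("ascii", "replace"), pos, size))
        pos += size
    return boxes

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic file: all layers share one mdat, so a client cannot fetch one layer alone.
sample = (make_box(b"ftyp", b"isom")
          + make_box(b"moov", b"\x00" * 4)
          + make_box(b"mdat", b"layer0layer1layer2"))
print(list_top_level_boxes(sample))
```

The scan shows a single mdat box holding the data of every layer, which is the situation the paragraph above identifies as unsuitable for per-layer HTTP requests.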

As will be appreciated, in adaptive HTTP streaming applications it is desirable to be able to reference media data samples, such as network abstraction layer (NAL) units, across movie fragment or component file boundaries. In the SVC/MVC context, such a reference can be established by using mechanisms such as the "Extractor". The extractor is an in-file data structure defined in the SVC/MVC amendment to the AVC file format extension of BMFF: Information Technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, 2008, pages 15-17. The extractor is designed to enable extraction of NAL units from other tracks by reference, without duplication. Here, a track is a timed sequence of related samples in an ISO base media file. For media data, a track corresponds to a sequence of images or sampled audio. The syntax of the extractor is shown below:

class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
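The field layout of the extractor can be illustrated with a small parser. This is a sketch only, written in Python. It assumes that the SVC NAL unit header occupies 4 bytes (a 1-byte NAL header plus a 3-byte extension) and that lengthSizeMinusOne is supplied by the caller; the names `Extractor`, `parse_extractor`, and `NAL_HEADER_SIZE` are hypothetical.

```python
import struct
from dataclasses import dataclass

# Assumption: 1-byte NAL header + 3-byte SVC extension header.
NAL_HEADER_SIZE = 4

@dataclass
class Extractor:
    nal_unit_type: int
    track_ref_index: int
    sample_offset: int
    data_offset: int
    data_length: int

def parse_extractor(buf: bytes, length_size_minus_one: int) -> Extractor:
    """Parse the extractor fields that follow the NAL unit header."""
    n = length_size_minus_one + 1  # width in bytes of data_offset and data_length
    if len(buf) < NAL_HEADER_SIZE + 2 + 2 * n:
        raise ValueError("buffer too short for an extractor")
    nal_unit_type = buf[0] & 0x1F  # low 5 bits of the first header byte
    track_ref_index = buf[NAL_HEADER_SIZE]
    (sample_offset,) = struct.unpack_from(">b", buf, NAL_HEADER_SIZE + 1)  # signed
    pos = NAL_HEADER_SIZE + 2
    data_offset = int.from_bytes(buf[pos:pos + n], "big")
    data_length = int.from_bytes(buf[pos + n:pos + 2 * n], "big")
    return Extractor(nal_unit_type, track_ref_index, sample_offset,
                     data_offset, data_length)

# Example: type 31 header, first 'scal' reference, previous sample, whole NAL unit.
raw = (bytes([0x7F, 0x80, 0x00, 0x0F]) + bytes([1]) + struct.pack(">b", -1)
       + (0).to_bytes(4, "big") + (0).to_bytes(4, "big"))
parsed = parse_extractor(raw, length_size_minus_one=3)
print(parsed)
```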

The semantics of the extractor data structure are as follows:

NALUnitHeader: the NAL unit structure specified in ISO/IEC 14496-10 Annex G for NAL units of type 20:

nal_unit_type shall be set to the extractor NAL unit type (type 31).

forbidden_zero_bit, reserved_one_bit, and reserved_three_2bits shall be set as specified in ISO/IEC 14496-10 Annex G.

The other fields (nal_ref_idc, idr_flag, priority_id, no_inter_layer_pred_flag, dependency_id, quality_id, temporal_id, use_ref_base_pic_flag, discardable_flag, and output_flag) shall be set as specified in B.4 of Information Technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008, page 17.

track_ref_index specifies the index of the track reference of type 'scal' to use to find the track from which to extract data. The sample in that track from which data is extracted is temporally aligned with, or nearest preceding in the media decoding timeline, the sample containing the extractor, i.e. using the time-to-sample table only, adjusted by an offset specified by sample_offset. The first track reference has the index value 1; the value 0 is reserved.

sample_offset gives the relative index of the sample in the linked track that shall be used as the source of information. Sample 0 (zero) is the sample with the same, or the closest preceding, decoding time compared to the decoding time of the sample containing the extractor; sample 1 (one) is the next sample, and sample -1 (minus 1) is the previous sample.

data_offset: the offset of the first byte within the reference sample to copy. If the extraction starts with the first byte of data in that sample, the offset takes the value 0. The offset shall reference the beginning of a NAL unit length field.

data_length: the number of bytes to copy. If this field takes the value 0, the entire single referenced NAL unit is copied (i.e. the length to copy is taken from the length field referenced by the data offset, augmented by the additional_bytes field in the case of aggregators).

Additional details can be found in Information Technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008.
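The sample_offset, data_offset, and data_length semantics above can be sketched as two small Python helpers. This is an illustration under stated assumptions: decoding times are given as a sorted list, aggregators (and their additional_bytes field) are ignored, and a data_length of 0 is taken to copy the length field together with the NAL unit payload. The function names are hypothetical.

```python
from bisect import bisect_right

def resolve_sample_index(ref_decode_times, extractor_time, sample_offset):
    """Pick the sample in the referenced track for an extractor.

    ref_decode_times: sorted decoding times of the referenced track's samples.
    Sample 0 is the one with the same or nearest preceding decoding time.
    """
    base = bisect_right(ref_decode_times, extractor_time) - 1
    if base < 0:
        raise ValueError("no sample at or before the extractor's decoding time")
    index = base + sample_offset
    if not 0 <= index < len(ref_decode_times):
        raise IndexError("sample_offset points outside the referenced track")
    return index

def copy_referenced_bytes(sample, data_offset, data_length, length_size):
    """Copy bytes from the referenced sample as the extractor directs."""
    if data_length == 0:
        # Assumption: copy the whole NAL unit, including its length field.
        nal_len = int.from_bytes(sample[data_offset:data_offset + length_size], "big")
        data_length = length_size + nal_len
    end = data_offset + data_length
    if end > len(sample):
        raise ValueError("extractor reads past the end of the sample")
    return sample[data_offset:end]

times = [0, 40, 80, 120]
print(resolve_sample_index(times, 90, 0))    # sample decoded at time 80
print(resolve_sample_index(times, 90, -1))   # the sample before it

# A sample holding two length-prefixed NAL units (4-byte length fields).
sample = (3).to_bytes(4, "big") + b"abc" + (2).to_bytes(4, "big") + b"de"
print(copy_referenced_bytes(sample, 7, 0, 4))
```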

Currently, extractors can only extract, by reference, NAL units from other tracks that belong to the same movie box/fragment. In other words, it is not possible to use extractors to extract NAL units from a different segment or file. This restriction limits the use of extractors in the above use case.

When a client has already downloaded one or more content components of a piece of media content from the server and is in the process of downloading another content component, the client needs to know whether the previously downloaded content components are among the set of components on which the new content component depends, so that it can make any further requests necessary to download the complete set of components. This use case also requires a mechanism for signaling external, dependent content components and their location information.

In BMFF there is a box type called "tref", which is used to provide a reference from the containing track to another track in the presentation. This box can be used to describe dependencies between tracks, but the dependency is limited to tracks within the same media file.

One approach is to signal such information using some out-of-band mechanism. For example, in an HTTP streaming application, the server can send a manifest file to the client before the session starts. The manifest file is a file that contains dependency and location information for each content component of the requested media content. The client is thus able to request all the necessary component files. However, this out-of-band approach is not applicable to local file playback, in which case no manifest file is available.
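The out-of-band approach can be illustrated with a short sketch of how a client might use such a manifest. The manifest layout (a dictionary with `url` and `depends_on` keys), the example URLs, and the function name are assumptions made for illustration; the source does not define a manifest format.

```python
def files_to_request(manifest, wanted, downloaded):
    """Return the URLs still needed for `wanted`, dependencies first."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in manifest[name]["depends_on"]:
            visit(dep)
        ordered.append(name)

    visit(wanted)
    # Skip components the client already holds.
    return [manifest[name]["url"] for name in ordered if name not in downloaded]

# Hypothetical manifest for three dependent content components.
manifest = {
    "base": {"url": "http://example.com/video/base.mp4", "depends_on": []},
    "enh1": {"url": "http://example.com/video/enh1.mp4", "depends_on": ["base"]},
    "enh2": {"url": "http://example.com/video/enh2.mp4", "depends_on": ["enh1"]},
}

# The client already has the base component and now wants enh2.
print(files_to_request(manifest, "enh2", downloaded={"base"}))
```

With local file playback there is no manifest to consult, which is why the text looks for a way to carry the same dependency and location information inside the file itself.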

Prior solutions to the above-mentioned problems have not been adequately established in the art. It would be desirable to provide the ability to parse and encapsulate layers without sacrificing speed and transport efficiency. Such results have not heretofore been achieved in the art.

To save storage space, it is desirable to be able to reference media data samples, such as NAL units, across movie fragment or component file boundaries without actually duplicating the same data in each component file. However, the ISO Base Media File Format (BMFF) and its extensions currently do not support this feature. The present invention seeks to address this.

The present invention relates to methods and apparatuses for encapsulating component files from a media entity containing one or more layers, and for reading a component file.

According to one aspect of the present invention, a method is provided for encapsulating and creating component files from a media entity containing one or more layers. The method extracts, for each layer, metadata and the media data corresponding to the extracted metadata from the media entity. The extracted media data and metadata are associated for each layer to enable the creation of a component file containing the extracted metadata and the extracted media data.

According to another aspect of the present invention, a file encapsulator is provided. The file encapsulator comprises an extractor for extracting, for each layer, metadata and the media data corresponding to the extracted metadata from the media entity; and a correlator for associating the extracted media data with the extracted metadata for each layer to enable the creation of a component file.

The above features of the present invention will become more apparent by describing exemplary embodiments in detail with reference to the following accompanying drawings.

To solve the problem stated above, in an embodiment of the present invention, references to the additional media data that is related to, and required by, the media data of a movie fragment or component file are identified and built. The references, rather than the additional media data itself, are associated with the component file along with the metadata and the media data. For each layer, the references may be embedded in the extracted media data, after which the extracted metadata and the extracted media data are associated for each layer to create the corresponding component files.

FIG. 1 shows an example of the MP4 file format.
FIG. 2 shows one embodiment of the present invention for encapsulating a media entity.
FIG. 3 shows the structure of an encapsulator used to encapsulate or create component files from a media entity containing multiple layers/representations.
FIG. 4 shows an example of associating additional media data with component files based on a dependency relationship.
FIG. 5 shows an example of extracting, by reference, a NAL unit from a movie box/fragment different from the movie box/fragment in which the extractor resides.
FIG. 6 shows the encapsulation operations involved for an SVC/MVC type video bitstream into multiple component files, using one of the invented new extractor data structures.
FIG. 7 shows the structure of a file reader used to read component files.
FIG. 8 shows the process of reading an encapsulated component file for a video decoder, including one embodiment of the present invention.
FIG. 9 shows the encapsulation operations for an SVC/MVC type video bitstream into multiple movie fragments, using another preferred new extractor data structure.
FIG. 10 shows the process of reading an encapsulated component file for a video decoder, including another embodiment of the present invention.

In the present invention, a media entity, such as a media file, a set of media files, or streaming media, is divided or encapsulated into multiple movie component files that are addressable by client URL requests. Here, the term component file is used in a broad sense to cover fragments, segments, files, and other equivalent terms.

In one embodiment of the present invention, a media entity containing multiple representations or components is parsed to extract the metadata and media data for each representation/component. Examples of representations/components include layers, such as layers with various temporal/spatial resolutions and qualities in SVC, and views in MVC. In the following, the term layers is also used to refer to representations/components, and these terms are used interchangeably. The metadata describes, for example, what is contained in the media entity for each representation and how to use the media data contained therein. The media data contains the media data samples required to fulfill the purpose of the media data, e.g. decoding of the content, or any necessary information on how to obtain the required data samples. The extracted metadata and media data for each representation or layer are associated/correlated and stored for user access. The storing operation can be performed physically on a hard drive or other storage media. Alternatively, it can be performed virtually through a relationship management mechanism, so that the metadata and media data appear to be stored together when interfacing with other applications or modules even though they are actually located at different places on the storage media. FIG. 2 shows an example of this embodiment. In FIG. 2, the media entity contains three layers: a base layer, enhancement layer 1, and enhancement layer 2. The media entity is parsed to extract the metadata and media data for each of the three layers, and these data are stored separately as component files, each containing the metadata and the corresponding media data associated together.

FIG. 3 shows the structure of a preferred encapsulator 300 used to encapsulate and create component files from a media entity containing multiple layers, such as SVC encoded video. The input media entity 310 is passed to a metadata extractor 320 and a media data extractor 340. The metadata extractor 320 extracts the metadata 330 for each layer. The media data extractor 340 takes in the metadata 330 and extracts the corresponding media data 350. Note that in a different embodiment, the metadata extractor 320 and the media data extractor 340 are implemented as one extractor. Both the metadata 330 and the media data 350 are fed into a correlator 380, which associates the two types of data and creates the output component files 390, one component file for each layer.
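The data flow through the encapsulator 300 can be sketched in Python as three small functions that mirror the metadata extractor 320, the media data extractor 340, and the correlator 380. This is a simplified model, not the disclosed implementation: the media entity is assumed to be an in-memory dictionary, and the `ComponentFile` class, the dictionary keys, and the sample payloads are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ComponentFile:
    layer: str
    metadata: dict
    media_data: list

def extract_metadata(entity):
    """Metadata extractor (320): per-layer metadata from the media entity."""
    return {layer: dict(meta) for layer, meta in entity["metadata"].items()}

def extract_media_data(entity, metadata):
    """Media data extractor (340): samples belonging to each described layer."""
    return {
        layer: [s["payload"] for s in entity["samples"] if s["layer"] == layer]
        for layer in metadata
    }

def correlate(metadata, media_data):
    """Correlator (380): one component file per layer."""
    return [ComponentFile(layer, metadata[layer], media_data[layer])
            for layer in metadata]

def encapsulate(entity):
    metadata = extract_metadata(entity)
    return correlate(metadata, extract_media_data(entity, metadata))

# Hypothetical media entity with a base layer and two enhancement layers.
entity = {
    "metadata": {
        "base": {"role": "base layer"},
        "enh1": {"role": "enhancement layer 1"},
        "enh2": {"role": "enhancement layer 2"},
    },
    "samples": [
        {"layer": "base", "payload": b"b0"},
        {"layer": "enh1", "payload": b"e1"},
        {"layer": "enh2", "payload": b"e2"},
        {"layer": "base", "payload": b"b1"},
    ],
}
component_files = encapsulate(entity)
for cf in component_files:
    print(cf.layer, cf.media_data)
```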

Layered video, such as video encoded by the SVC or MVC extension of AVC, contains multiple media components (scalable layers or views). Such an encoded bitstream can provide different operating points, i.e. representations or layers, in terms of temporal/spatial resolution, quality, view, and so on, by decoding different subsets of the bitstream. Furthermore, there are coding dependencies among the layers of the bitstream, i.e. the decoding of a layer may depend on other layers. Therefore, requesting one of the representations of such a bitstream may require retrieving and decoding one or more components or media data from the encapsulated video file. To facilitate the extraction process for different representations, encoded layered video is often encapsulated into an MP4 file in such a way that each layer is stored separately in a different segment or component file. In this case, it has to be taken into account that certain media data samples of the bitstream, such as NAL units, are required by, and related to, multiple segments or component files, due to the decoding dependencies described above or other dependencies based on the application.

In another embodiment of the present invention, the additional media data required by a segment or component file is extracted and associated with that segment or component file. FIG. 4 shows an example of this embodiment. In this figure, the SVC bitstream contains three spatial layers: HD1080p, SD, and QVGA. Three movie fragments or component files corresponding to the three operating points are formed, each addressable by a different URL. Inside each movie fragment or component file, all the media data samples required for decoding (NAL units in this example) are duplicated and stored as media samples contained in the "mdat" box. Thus, when a client requests a particular operating point or representation by using the appropriate URL, the server can retrieve the corresponding movie fragment or component file and deliver it to the client. In this embodiment, the media data extractor 340 of FIG. 3 further extracts, for each layer, the additional media data related to the extracted media data for that layer from the input media entity 310. The correlator 380 further associates the additional extracted media data for each layer to create the corresponding component files.
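The duplication described for FIG. 4 can be sketched as follows. The dependency chain (HD1080p depends on SD, which depends on QVGA) is an assumption for illustration, since the text names the three layers but does not spell out their dependencies; the function names and payloads are likewise hypothetical.

```python
def required_layers(layer, depends_on):
    """Return the layer plus every layer it depends on, directly or indirectly."""
    needed, stack = set(), [layer]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(depends_on.get(name, []))
    return needed

def build_component_files_with_copies(nal_units, depends_on):
    """Each component file stores copies of every NAL unit needed to decode it."""
    files = {}
    for layer in depends_on:
        needed = required_layers(layer, depends_on)
        files[layer] = [payload for owner, payload in nal_units if owner in needed]
    return files

# Assumed dependency chain between the three spatial layers.
depends_on = {"QVGA": [], "SD": ["QVGA"], "HD1080p": ["SD"]}
nal_units = [
    ("QVGA", b"q0"), ("SD", b"s0"), ("HD1080p", b"h0"),
    ("QVGA", b"q1"), ("SD", b"s1"), ("HD1080p", b"h1"),
]

files = build_component_files_with_copies(nal_units, depends_on)
stored = sum(len(units) for units in files.values())
print(stored, "NAL units stored for", len(nal_units), "distinct NAL units")
```

In this toy case the QVGA units are stored three times and the SD units twice, which is the storage overhead that motivates referencing instead of copying.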

To save storage space, it is desirable to be able to reference media data samples, such as NAL units, across movie fragment or component file boundaries without actually duplicating the same data in each component file. However, the ISO Base Media File Format (BMFF) and its extensions currently do not support this feature. To solve this problem, in a further embodiment of the present invention, references to the additional media data that is related to, and required by, the media data of a movie fragment or component file are identified and built. The references, rather than the additional media data itself, are associated with the component file along with the metadata and the media data. For each layer, the references may be embedded in the extracted media data, after which the extracted metadata and the extracted media data are associated for each layer to create the corresponding component files.

In this embodiment, a reference identifier 360 is added to the structure of the encapsulator 300. The reference identifier 360 identifies, from the input media entity 310, references 370 to the additional media data related to the extracted media data 350 for each layer. The references 370 are then associated, via the correlator 380, with the extracted metadata 330 and the extracted media data 350 for each layer, for example by embedding the references in the extracted media data 350, to create the corresponding component files 390.
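The role of the reference identifier 360 can be sketched by storing, in each component file, only the layer's own NAL units plus references to the units it needs from other component files. This is an illustrative model under assumptions: the dependency chain, the component file URLs, the tuple shapes for "data" and "ref" entries, and the function names are all hypothetical.

```python
def required_layers(layer, depends_on):
    """Return the layer plus every layer it depends on, directly or indirectly."""
    needed, stack = set(), [layer]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(depends_on.get(name, []))
    return needed

def build_component_files_with_references(nal_units, depends_on, urls):
    """Store own NAL units as data; point at other files' units by reference."""
    files = {}
    for layer in depends_on:
        needed = required_layers(layer, depends_on)
        local_index = {name: 0 for name in depends_on}  # position within owner's units
        entries = []
        for owner, payload in nal_units:
            if owner == layer:
                entries.append(("data", payload))
            elif owner in needed:
                entries.append(("ref", urls[owner], local_index[owner]))
            local_index[owner] += 1
        files[layer] = entries
    return files

# Assumed dependency chain and hypothetical component file locations.
depends_on = {"QVGA": [], "SD": ["QVGA"], "HD1080p": ["SD"]}
urls = {"QVGA": "qvga.mp4", "SD": "sd.mp4", "HD1080p": "hd1080p.mp4"}
nal_units = [("QVGA", b"q0"), ("SD", b"s0"), ("HD1080p", b"h0")]

files = build_component_files_with_references(nal_units, depends_on, urls)
copies = sum(1 for entries in files.values() for e in entries if e[0] == "data")
print(files["HD1080p"])
print(copies, "payload copies stored")
```

Each payload is now stored once; the cost is that a reader must be able to follow a reference into another component file, which the existing extractor cannot do.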

As discussed previously, in the SVC/MVC context such a reference can be established by using mechanisms such as the "Extractor". Currently, extractors can only extract, by reference, NAL units from other tracks that belong to the same movie box/fragment. In other words, it is not possible to use extractors to extract NAL units from a different segment or file. This restriction limits the use of extractors in other cases. In the following, an extension to the extractor data structure is disclosed, where the extension aims to support efficient encapsulation of SVC/MVC type layered video content into multiple component files as described above.

The extension is added to give the extractor data structure the additional capability of referencing NAL units that reside in a movie box/fragment or component file other than the one in which the extractor resides.

확장된 추출기는 다음과 같이 정의된다:The extended extractor is defined as follows:

구문:construction:

aligned(8) class DataEntryUrlBox (bit(24) flags)
    extends FullBox('url', version = 0, flags) {
    string location;
}

aligned(8) class DataEntryUrnBox (bit(24) flags)
    extends FullBox('urn', version = 0, flags) {
    string name;
    string location;
}

class aligned(8) Extractor () {
    NALUnitHeader();
    DataEntryBox(entry_version, entry_flags) data_entry; // added extension
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8)
        data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8)
        data_length;
}

Semantics:

data_entry is a Uniform Resource Locator (URL) or Uniform Resource Name (URN) entry. name is a URN and is required in a URN entry. location is a URL; it is required in a URL entry and optional in a URN entry, where it gives a location at which to find the resource with the given name. Each is a null-terminated string using UTF-8 characters. If the self-contained flag is set, the URL form is used and no string is present; the box ends with the entry-flags field. The URL type should be that of a service that delivers a file. Relative URLs are permissible and are relative to the file containing the movie box/fragment that contains the track to which the extractor belongs.

The other fields have the same semantics as in the original extractor described above.
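As an illustration of the data_entry semantics, the following Python sketch serializes and parses a URL entry. It is a minimal sketch, not part of the disclosed syntax: it assumes the usual ISO BMFF FullBox layout (32-bit size, four-character type, 8-bit version, 24-bit flags), the four-character code 'url ' with a trailing space, and the value 0x000001 for the self-contained flag. The function names are hypothetical.

```python
import struct

SELF_CONTAINED = 0x000001  # assumed flag value for "self-contained"


def build_url_entry(location=None) -> bytes:
    """Serialize a URL entry; location=None sets the self-contained flag."""
    if location is None:
        payload, flags = b"", SELF_CONTAINED
    else:
        # location is a null-terminated UTF-8 string
        payload, flags = location.encode("utf-8") + b"\x00", 0
    # version (0) occupies the top byte of the 32-bit version/flags word
    return struct.pack(">I4sI", 12 + len(payload), b"url ", flags) + payload


def parse_url_entry(buf: bytes, offset: int = 0):
    """Return (location or None, offset of the next box)."""
    size, box_type, version_flags = struct.unpack_from(">I4sI", buf, offset)
    if box_type != b"url ":
        raise ValueError(f"unexpected box type {box_type!r}")
    end = offset + size
    if version_flags & 0xFFFFFF & SELF_CONTAINED:
        # No string is present; the box ends with the flags field.
        return None, end
    raw = buf[offset + 12:end]
    return raw.split(b"\x00", 1)[0].decode("utf-8"), end
```

A self-contained entry therefore occupies only the 12 header bytes, while an entry pointing at another component file carries the URL string after the flags.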

With the extended extractor, it is now possible to extract, by reference, NAL units from a movie box/fragment other than the one to which the extractor belongs. FIG. 5 shows an example with the same SVC bitstream as FIG. 4 but using the new extended extractor data structure. As can be seen from the figure, the SD movie fragment can now reference NAL units from the QVGA movie fragment. Likewise, the HD1080p movie fragment can use extractors to reference NAL units from both the QVGA and SD movie fragments. Compared with FIG. 4, no NAL units are duplicated across these movie fragments, so storage space is saved.

FIG. 6 illustrates the operations involved in encapsulating an SVC/MVC-type video bitstream into multiple movie fragments or component files using the new extractor data structure. The process starts at step 601. The NAL units are read one at a time at step 610. If the end of the bitstream is reached at step 620, the process ends at step 690; otherwise, it proceeds to step 630. Decision step 630 determines whether the current NAL unit depends on NAL units from another track for decoding. If it does not, control passes to step 640, where a sample is formed from the current NAL unit and placed in the current track. If step 630 finds a dependency between the current NAL unit and NAL units from another track, the process proceeds to step 650.

Decision step 650 further determines whether the track whose NAL units are required by the current NAL unit resides in the same movie fragment. If it does, step 670 fills in an extended extractor that references the NAL unit from the other track. If the track resides in a different movie fragment, the URL or URN of that movie fragment is identified at step 660, and the process proceeds to step 670, where the identified URL or URN is filled into the extended extractor. Once the extended extractor has been filled in, it is inserted into the current track at step 680. The process then returns to step 610 for the next NAL unit.
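The flow of FIG. 6 can be sketched in Python as follows. This is only an illustrative sketch: the NalUnit and ExtendedExtractor classes, the fragment_of and location_of lookup tables, and the function name are hypothetical stand-ins and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NalUnit:                         # hypothetical model of a parsed NAL unit
    track: str                         # track the NAL unit is stored in
    depends_on: Optional[str] = None   # track holding the NAL units it needs


@dataclass
class ExtendedExtractor:               # hypothetical stand-in for the extractor
    ref_track: str
    data_entry: Optional[str]          # URL/URN, or None for the same fragment


def encapsulate(nal_units, fragment_of, location_of):
    """Return {track: [samples and extractors]} following steps 610 to 680."""
    tracks = {}
    for nal in nal_units:                                  # step 610
        samples = tracks.setdefault(nal.track, [])
        if nal.depends_on is None:                         # step 630
            samples.append(nal)                            # step 640
            continue
        target = fragment_of[nal.depends_on]
        if target == fragment_of[nal.track]:               # step 650
            location = None                                # same movie fragment
        else:
            location = location_of[target]                 # step 660
        extractor = ExtendedExtractor(nal.depends_on, location)  # step 670
        samples.append(extractor)                          # step 680
    return tracks                                          # input exhausted
```

In this sketch an extractor whose data_entry is None corresponds to the same-fragment case, so existing single-fragment behaviour is unchanged.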

In another embodiment, the references 370 are inserted into the extracted metadata 330, and indices to the references 370 are added to the extracted media data 350 through the correlator 380, which further combines the metadata and media data for each layer to generate the corresponding component files 390. In the context of the ISO base media file format, a box called the HTTP streaming information box is disclosed. This box contains information that can assist HTTP streaming of an ISO file. The HTTP streaming information box is preferably placed as early as possible in the component files, for example at the beginning of the files. The box can also serve as a source for the server when it forms the manifest file for the client. Another type of box, called the media reference box (Media Reference Box) and contained in the HTTP streaming information box, is also disclosed. This box contains information about external dependent files. The extractor structure is further extended so that it can reference media samples across different component files. The information contained in the media reference box can be used by the extractors to avoid signaling overhead.

Detailed definitions of the proposed HTTP streaming information box, the media reference box, and the further extended extractors follow.

● HTTP Streaming Information Box

Definition:

Box type: 'hsin'

Container: File

Mandatory: No

Quantity: 0 or 1

The HTTP streaming information box assists the HTTP streaming operation for an ISO media file. It contains information relevant to HTTP streaming delivery of the file, including the media reference box defined below among other possible types of boxes. For maximum utility, the HTTP streaming information box is preferably placed as early as possible in the file.

Syntax:

aligned(8) class HTTPStreamingInfoBox extends Box('hsin') {
}

● Media Reference Box

Definition:

Box type: 'mref'

Container: 'hsin'

Mandatory: No

Quantity: 0 or 1

The media reference box is contained in the HTTP streaming information box and holds a table of data references, in URL form, that give the locations of the external files on which each track listed in this box depends. By reading this box, a file reader can identify the external dependent file sources, such as external component files, of a track in the file, together with the means for retrieving them.

Syntax:

aligned(8) class DataEntryUrlBox (bit(24) flags) extends Box('url') {
    string location;
}

aligned(8) class MediaReferenceBox extends Box('mref') {
    unsigned int(16) entry_count;
    for (i = 1; i <= entry_count; i++) {
        unsigned int(32) track_ID;
        unsigned int(16) dependent_source_count;
        for (j = 1; j <= dependent_source_count; j++) {
            DataEntryUrlBox data_entry;
        }
    }
}

Semantics:

entry_count: an integer that counts the actual entries.

track_ID: an integer that uniquely identifies the track in the file to which the entry applies.

dependent_source_count: an integer that counts the external media sources on which the track with the given track_ID depends.

data_entry: a URL entry pointing to one external media source on which the specified track depends. Each is a null-terminated string using UTF-8 characters. The URL type should be that of a service that delivers a file. Relative URLs are permissible and are relative to the file containing this media reference box.
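The table layout above can be exercised with a short Python sketch that writes and reads a media reference box. It is a simplified sketch under stated assumptions: each data_entry is written as a plain box (32-bit size, type 'url ', null-terminated UTF-8 string) with no flags field, the in-memory table is a dictionary from track_ID to a list of URLs, and the function names build_mref and parse_mref are hypothetical.

```python
import struct


def build_mref(table: dict) -> bytes:
    """table maps track_ID -> list of URLs of its dependent sources."""
    body = struct.pack(">H", len(table))                  # entry_count
    for track_id, urls in table.items():
        # track_ID (32 bits) and dependent_source_count (16 bits)
        body += struct.pack(">IH", track_id, len(urls))
        for url in urls:
            payload = url.encode("utf-8") + b"\x00"
            body += struct.pack(">I4s", 8 + len(payload), b"url ") + payload
    return struct.pack(">I4s", 8 + len(body), b"mref") + body


def parse_mref(buf: bytes) -> dict:
    """Inverse of build_mref for a buffer that starts with the 'mref' box."""
    _size, box_type = struct.unpack_from(">I4s", buf, 0)
    if box_type != b"mref":
        raise ValueError("not a media reference box")
    pos = 8
    (entry_count,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    table = {}
    for _entry in range(entry_count):
        track_id, source_count = struct.unpack_from(">IH", buf, pos)
        pos += 6
        urls = []
        for _source in range(source_count):
            (entry_size,) = struct.unpack_from(">I", buf, pos)
            raw = buf[pos + 8:pos + entry_size]
            urls.append(raw.rstrip(b"\x00").decode("utf-8"))
            pos += entry_size
        table[track_id] = urls
    return table
```

Because the box sits near the start of the file, a reader that has received only these bytes already knows every external file each track depends on.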

The media reference box defined above is designed to facilitate HTTP streaming of a media entity comprising one or more layers in several ways.

First, the reference table makes it possible to signal the dependencies between component files explicitly at the beginning of a component file. Thus, once a client has downloaded a small portion of the component file, it knows all the related external component files for the file's track(s) and, if necessary, can issue the corresponding requests, using the references in the table, to obtain the complete set(s) for playback.

Second, the in-file information from this box can easily be extracted for inclusion in the manifest file. This information in the manifest can help the client, before the actual HTTP streaming, to perform service initialization such as discovering the relevant service information, requesting all associated component files, and allocating the necessary buffer resources.

Third, when the client requests a different representation of some multi-component media content and another representation has already been delivered as a component file, the client can check the media reference box in that file to determine whether the file contains any dependent components of the new representation that can be reused.

Finally, this box helps to reduce the signaling overhead of the extended extractor structure defined below.

● Extractors

The extractor is further extended so that it can reference data from tracks in external media files.

Extended syntax:

class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(16) media_reference_index;
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8)
        data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8)
        data_length;
}

Semantics:

media_reference_index: specifies the index of an entry in the reference table, contained in the media reference box, whose associated track_ID value is the same as that of the track containing the extractor. If media_reference_index is equal to 0, the extractor references data from another track that belongs to the same file as the extractor. In this case, there will be no reference table in the media reference box with the same track_ID value as the track. If media_reference_index lies between 1 and the dependent_source_count value of the reference table associated with the track in the media reference box, the URL referenced by media_reference_index in that reference table points to an external file, and that external file contains the track from which the extractor extracts data.

The semantics of the other fields remain the same as in the original extractor definition.
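The index semantics can be made concrete with a small Python sketch that maps a media_reference_index to the file holding the referenced track. The dictionary form of the reference table, the file_url argument, and the function name resolve_reference are assumptions introduced for illustration; relative URLs are resolved against the current file with the standard urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def resolve_reference(media_reference_index, track_id, mref_table, file_url):
    """Return the URL of the file that holds the referenced track."""
    if media_reference_index == 0:
        # Index 0: the referenced track is in the same file as the extractor.
        return file_url
    sources = mref_table.get(track_id, [])
    if not 1 <= media_reference_index <= len(sources):
        raise IndexError("media_reference_index outside the reference table")
    # Entries are 1-based; relative URLs are relative to the current file.
    return urljoin(file_url, sources[media_reference_index - 1])
```

For example, with a table entry "qvga.mp4" for the track and a current file at http://example.com/video/hd.mp4 (a hypothetical address), index 1 resolves to http://example.com/video/qvga.mp4.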

With the further extended extractor structure, it is now possible to use extractors to link to, and extract, data from tracks that belong to an external component file. This is particularly useful when the content components of an encoded piece of multi-component media content, such as content encoded with SVC or MVC, are encapsulated into different component files. With the extended extractors, extraction can take place across file boundaries, which avoids duplicating the same data in different component files.

FIG. 9 illustrates the operations involved in encapsulating an SVC/MVC-type video bitstream into multiple movie fragments or component files using the disclosed HTTP streaming information box, media reference box, and further extended extractor data structure. The process is similar to the one shown in FIG. 6, with some modifications due to the boxes described above and the further extension of the extractor. After the location information (URL/URN) has been identified at step 660, it is used at step 965 to fill in the reference table in the mref box (Media Reference Box). Step 970 then fills the extractor with the index of that location information in the reference table. The extractor is then inserted into the current track. When the end of the bitstream is reached at step 620, the mref box and its container, the hsin box (HTTP Streaming Information Box), are inserted into the metadata of the component file.
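Steps 965 and 970 can be sketched in Python as a single helper that records a location in the reference table and returns the index to store in the extractor. The dictionary representation of the table and the name add_reference are hypothetical, and reusing an existing entry for a repeated URL is an assumption of this sketch rather than something the description states.

```python
def add_reference(mref_table, track_id, url):
    """Record url as a dependent source of track_id; return its 1-based index."""
    sources = mref_table.setdefault(track_id, [])
    if url not in sources:
        sources.append(url)          # step 965: fill in the reference table
    return sources.index(url) + 1    # step 970: index placed in the extractor
```

Each extractor then carries only a 16-bit media_reference_index instead of a full URL string, which is how the table reduces signaling overhead.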

To read a component file, the file reader 700 shown in FIG. 7 is used. The analyzer 710 first analyzes the component file to obtain the metadata, the media data, and, if available, the references. If, according to the decoded references, the media data is related to media data of other component files, for example through a decoding dependency, the retriever 720 retrieves the related media data from the other component files indicated by the references. The processor 730 then processes the metadata, the media data, and, if available, the additional media data obtained from the other component files. The analysis performed by the analyzer 710 includes the various operations necessary to obtain the metadata and media data prepared for the processor 730 and the references prepared for the retriever 720; this may include further analyzing the metadata and/or the media data when needed. In one embodiment, the references are embedded in the media data and are therefore obtained by analyzing the media data. If references are available, the analysis further includes parsing the syntax of the references and decoding them. If the component file contains video content, the processor 730 may include a video decoder. In a different embodiment, the analyzer and the retriever may be integrated into the processor.

FIG. 8 shows the process of reading an SVC/MVC-type video bitstream for a video decoder that incorporates the present invention. Step 801 accesses the component video file, and the metadata and media data of the component video file for each layer are identified at step 805. The identified metadata and media data are analyzed at step 810, and the NAL units of the media data are read one at a time at step 815. For the current NAL unit, a decision is first made at step 820 as to whether the end of the bitstream has been reached; if so, the process ends at step 825. Otherwise, the process proceeds to decision step 830, which determines whether the current NAL unit is an extractor. If it is not, the current NAL unit is an ordinary NAL unit containing data for decoding, and it is sent to the decoder at step 835. If the current NAL unit is an extractor, step 840 determines whether it depends on a NAL unit outside the same component file.

If the required NAL unit resides in the same component file, it is retrieved from the current file at step 845 and sent to the decoder at step 835. If the required NAL unit comes from another component file, it is located at step 850 using the reference information data_entry in the extractor, retrieved from the remote file at step 855, and then sent to the decoder at step 835.
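The reading loop of FIG. 8 can be sketched in Python as a generator that yields what would be sent to the decoder. The dictionary representation of a NAL unit, the local_tracks lookup, and the fetch_remote callback are hypothetical and stand in for real parsing and HTTP retrieval.

```python
def read_component_file(nal_units, local_tracks, fetch_remote):
    """Yield payloads for the decoder, resolving extractors (steps 815 to 855)."""
    for nal in nal_units:                                # step 815
        if nal["type"] != "extractor":                   # step 830
            yield nal["payload"]                         # step 835
            continue
        key = (nal["track"], nal["sample"])
        if nal.get("data_entry") is None:                # step 840
            yield local_tracks[key]                      # step 845, then 835
        else:
            # steps 850 and 855: locate via data_entry, fetch from remote file
            yield fetch_remote(nal["data_entry"], key)
```

The loop ends when the input is exhausted, which corresponds to the end-of-bitstream test of step 820.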

In another embodiment, the references are identified in the analyzer 710 by analyzing the media data to obtain the embedded reference indices and then obtaining the corresponding references according to those indices. The corresponding process of reading an SVC/MVC-type video bitstream for a video decoder is shown in FIG. 10 and is similar to the process of FIG. 8. Because, according to a preferred embodiment, the references are located at the beginning of the component file, the analysis of the metadata at step 810 allows the contained references to be analyzed in parallel with the analysis of the media data. When the references are analyzed, the other component files they refer to are identified at step 1014. Retrieval of those other component files is started at step 1012, in parallel with the remaining steps of the process. After the location information of the component file on which the current NAL unit depends has been accessed at step 850, local storage, such as a media buffer, is checked for the availability of that component file. If the required component file is available locally, the NAL unit is retrieved from the local copy; otherwise, it is retrieved from the remote file.

Note that the local copy of a component file may have been obtained by the parallel retrieval of step 1012 or by an earlier request for that component file.
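The parallel retrieval and the local-storage check can be sketched in Python with a thread pool. The ComponentCache class, its method names, and the download callback (a callable taking a URL and returning bytes) are assumptions made for illustration; a real reader would issue HTTP requests and store the results in its media buffer.

```python
from concurrent.futures import ThreadPoolExecutor


class ComponentCache:
    """Hypothetical local store of component files, keyed by URL."""

    def __init__(self, download):
        self._download = download                 # assumed: url -> bytes
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending = {}

    def prefetch(self, urls):
        # Start retrieving referenced files in the background (step 1012).
        for url in urls:
            if url not in self._pending:
                self._pending[url] = self._pool.submit(self._download, url)

    def get(self, url):
        # Use the local copy if one exists or is in flight; else fetch it now.
        if url not in self._pending:
            self._pending[url] = self._pool.submit(self._download, url)
        return self._pending[url].result()
```

A reader would call prefetch with the URLs found in the media reference box as soon as the metadata has been analyzed, and call get when an extractor actually needs one of the files.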

Although preferred embodiments of the present invention have been described in detail herein, it is to be understood that the invention is not limited to those embodiments, and that other modifications and variations may be effected by one skilled in the art without departing from the scope of the invention as defined by the appended claims.

300: Encapsulator 310: Media entity
320: Metadata extractor 330: Metadata
340: Media data extractor 350: Media data
360: Reference identifier 370: References
380: Correlator 390: Component files
700: File reader 710: Analyzer
720: Retriever 730: Processor

Claims

A method for generating component files from a media entity comprising one or more layers, the method comprising:
Extracting metadata for each layer from the media entity;
Extracting media data from the media entity, corresponding to the extracted metadata for each layer of the media entity; and
Associating the extracted media data with the extracted metadata to enable generation of a component file including the extracted metadata and extracted media data for each layer;
A method for generating component files from a media entity that includes one or more layers.

The method of claim 1, wherein the component file is at least one of a movie box, a movie fragment, a segment, and a file.
A method for generating component files from a media entity that includes one or more layers.

The method of claim 1, further comprising: extracting, from the media entity, for each layer, additional media data related to the extracted media data for each layer; And
Combining, for each layer, the extracted media data and the additional media data to generate corresponding component files;
A method for generating component files from a media entity that includes one or more layers.

The method of claim 1, further comprising:
Identifying references to additional media data related to the extracted media data for each layer; And
Combining the extracted metadata and extracted media data with the references, for each layer, to generate corresponding component files;
A method for generating component files from a media entity that includes one or more layers.

The method of claim 4, wherein the media data and additional media data comprise data samples.
A method for generating component files from a media entity that includes one or more layers.

The method of claim 5, wherein the data sample comprises a network abstraction layer (NAL) unit.
A method for generating component files from a media entity that includes one or more layers.

The method of claim 6, wherein the references include at least one of a uniform resource locator and a uniform resource name of the network abstraction layer unit in the additional media data.
A method for generating component files from a media entity that includes one or more layers.

The method of claim 4, further comprising:
Inserting the references into the extracted metadata for each layer; and
Adding indices to the references in the extracted media data;
A method for generating component files from a media entity that includes one or more layers.

The method of claim 8, wherein the references are located at the beginning of the component file for each layer.
A method for generating component files from a media entity that includes one or more layers.

The method of claim 8, wherein the references are populated in media reference boxes and the indices are populated in extractors.
A method for generating component files from a media entity that includes one or more layers.

A file encapsulator for generating component files from a media entity that includes one or more layers, the encapsulator comprising:
An extractor for extracting metadata for each layer from the media entity and for extracting, from the media entity, media data corresponding to the extracted metadata for each layer of the media entity; and
A correlator for associating the extracted media data with the extracted metadata to enable generation, for each layer, of a component file including the extracted metadata and the extracted media data;
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 11, wherein the component file is at least one of a movie box, a movie fragment, a segment, and a file.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 11, wherein the extractor further extracts, from the media entity and for each layer, additional media data related to the extracted media data for each layer, and the correlator further combines the extracted media data and the additional media data for each layer to generate corresponding component files,
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 11, further comprising a reference identifier for identifying, from the media entity, references to additional media data related to the extracted media data for each layer, wherein the references are combined, through the correlator, with the extracted metadata and extracted media data for each layer to generate corresponding component files,
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 14, wherein the media data and the additional media data comprise data samples.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 15, wherein the data sample comprises a network abstraction layer (NAL) unit.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 16, wherein the references include at least one of a uniform resource locator and a uniform resource name of the network abstraction layer unit in the additional media data.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 14, wherein the correlator further inserts the references into the extracted metadata for each layer and adds indices to the references in the extracted media data.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 18, wherein the correlator locates the references at the beginning of the component file for each layer.
File encapsulator for generating component files from a media entity that includes one or more layers.

The file encapsulator of claim 19, wherein the references are populated in media reference boxes and the indices are populated in extractors.
File encapsulator for generating component files from a media entity that includes one or more layers.

A method for reading a component file,
Analyzing the component file to obtain metadata, media data, and references; And
If, in accordance with the references, the media data of the component file is related to media data of other component files, retrieving the related media data from the other component files using the references;
Method for reading a component file.

The method of claim 21, wherein the media data of the component file is related to media data of other component files according to a coding dependency.
Method for reading a component file.

The method of claim 21, wherein the media data and the related media data comprise data samples.
Method for reading a component file.

24. The method of claim 23, wherein the data sample comprises a network abstraction layer (NAL) unit.

25. The method of claim 21, further comprising analyzing the metadata to obtain the references.

26. The method of claim 25, further comprising:
analyzing the media data to obtain embedded reference indices; and
obtaining the corresponding references according to the reference indices.

27. The method of claim 25, wherein the retrieving comprises retrieving the other component files simultaneously in accordance with the references.

28. The method of claim 27, wherein the retrieving further comprises:
checking a local file storage; and
if the local file storage includes the other component files, retrieving the other component files from the local file storage.

29. A file reader, comprising:
a parser for analyzing a component file to obtain metadata, media data, and references;
a searcher for retrieving, from other component files and in accordance with the references, media data related to the media data of the component file; and
a processor for processing the metadata, the media data, and the media data retrieved from the other component files.

30. The file reader of claim 29, wherein the media data of the component file is related to the media data of the other component files by a coding dependency.

31. The file reader of claim 29, wherein the media data and the related media data comprise data samples.

32. The file reader of claim 31, wherein the data sample comprises a network abstraction layer (NAL) unit.

33. The file reader of claim 29, wherein the processor comprises a video decoder.

34. The file reader of claim 29, wherein the parser is further configured to obtain the references.

35. The file reader of claim 34, wherein the parser further analyzes the media data to obtain embedded reference indices and obtains the corresponding references according to the reference indices.

36. The file reader of claim 34, wherein the searcher further retrieves the other component files simultaneously in accordance with the obtained references.

37. The file reader of claim 36, wherein the searcher further checks a local file storage and, if the local file storage includes the other component files, retrieves the other component files from the local file storage.
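
For illustration only, the reading flow recited in the method and file reader claims can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `ComponentFile` structure, the `ref_index` field, zero-based indexing of references, and the `fetch_remote` callback are all hypothetical names introduced here, and no actual ISO BMFF box parsing is performed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComponentFile:
    """Hypothetical in-memory view of an already parsed component file."""
    metadata: Dict[str, str]
    references: List[str]   # one URL or URN per reference (assumed layout)
    samples: List[dict]     # a sample may carry a "ref_index" (assumed field)


def resolve_references(component: ComponentFile) -> List[str]:
    """Map reference indices embedded in the media data to references.

    Assumes zero-based indices into component.references.
    """
    indices = sorted(
        {s["ref_index"] for s in component.samples if "ref_index" in s}
    )
    return [component.references[i] for i in indices]


def retrieve(ref: str,
             local_store: Dict[str, bytes],
             fetch_remote: Callable[[str], bytes]) -> bytes:
    """Check the local file storage first; fall back to a remote fetch."""
    if ref in local_store:
        return local_store[ref]
    return fetch_remote(ref)


def read_component(component: ComponentFile,
                   local_store: Dict[str, bytes],
                   fetch_remote: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Retrieve all referenced component files concurrently."""
    refs = resolve_references(component)
    with ThreadPoolExecutor(max_workers=max(1, len(refs))) as pool:
        data = list(
            pool.map(lambda r: retrieve(r, local_store, fetch_remote), refs)
        )
    return dict(zip(refs, data))
```

In this sketch a referenced component file already present in `local_store` is never requested through `fetch_remote`, and all remaining references are fetched concurrently by a thread pool.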