JP2013532441A

JP2013532441A - Method and apparatus for encapsulating encoded multi-component video

Info

Publication number: JP2013532441A
Application number: JP2013515413A
Authority: JP
Inventors: ウー，チエンユ; フアズー，リ
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2010-06-14
Filing date: 2011-06-13
Publication date: 2013-08-15
Also published as: WO2011159605A1; BR112012031874A2; KR20130088035A; CN103098484A; EP2580920A1

Abstract

A method and apparatus are described for encapsulating a media entity that includes multiple layers into multiple component files (one component file per layer), together with a method and apparatus for reading the corresponding component files. A new ISO BMFF box and an extension of the extractor data structure of the SVC/MVC file formats are proposed. The new box allows a referenced component file to be accessed in parallel with the processing of the current component file. The extractor extension of the present invention allows NAL units to be referenced across different component files. The present invention enables adaptive HTTP streaming of media files.

Description

(Cross-Reference to Related Applications)
This patent application claims the benefit of priority of U.S. Provisional Patent Application No. 61/354,422, entitled "Extension to the Extractor data structure of SVC/MVC file formats", filed June 14, 2010, and of U.S. Provisional Patent Application No. 61/354,424, entitled "Some extensions for ISO Base Media File Format for HTTP streaming", filed June 14, 2010.

This application is related to co-pending, commonly owned U.S. Patent Application No. __/__ (Attorney Docket No. PU100140), entitled "Method and Apparatus for Encapsulating Coded Multi-component Video", filed concurrently herewith.

The present invention relates generally to HTTP streaming. More particularly, it relates to encapsulating, for HTTP streaming, the media entities of coded multi-component video streams, such as scalable video coding (SVC) streams and multiview video coding (MVC) streams.

In HTTP streaming, coded video is often encapsulated and stored on the server side as a BMFF-compliant file, such as an MP4 file. To support adaptive HTTP streaming, the file is usually divided into multiple movie fragments, which are further grouped into segments that are addressable by client URL requests. In practice, these segments store different coded representations of the video content, so that the client can dynamically select the desired representation to download and play during a session.

Coded layered video, such as SVC and MVC bitstreams, naturally supports such bit-rate adaptation: decoding different subsets of the bitstream yields different operating points, or representations, in terms of temporal/spatial resolution, quality, view, and so on. However, existing ISO Base Media File Format (BMFF) standards, such as the MP4 file format, do not support separate access to each layer or representation and therefore cannot be applied to HTTP streaming. As shown in FIG. 1, in the MP4 file format the metadata for all layers or representations of a media file is stored in the 'moov' movie box, while the media content data for all layers or representations is stored in the 'mdat' media data box. In HTTP streaming, when a client requests one layer, the entire file must be sent, because all layers or representations are mixed together and the client does not know where the required layer or representation is located.
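To make the FIG. 1 layout concrete, the following is a minimal Python sketch that walks the top-level boxes of a BMFF/MP4 buffer. It relies only on the generic box header (a 32-bit big-endian size followed by a four-character type, where size 1 means a 64-bit largesize follows and size 0 means the box runs to the end of the file); the helper names `list_top_level_boxes` and `make_box` are illustrative, not part of any standard API. On a file laid out as in FIG. 1 it would report one 'moov' and one 'mdat' holding every layer, which is why a client cannot fetch a single layer on its own.

```python
import struct
from typing import List, Tuple


def list_top_level_boxes(data: bytes) -> List[Tuple[str, int, int]]:
    """Return (type, offset, size) for each top-level box in an ISO BMFF buffer."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            if pos + 16 > len(data):
                raise ValueError("truncated largesize header")
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError(f"malformed box at offset {pos}")
        boxes.append((box_type.decode("latin-1"), pos, size))
        pos += size
    return boxes


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a plain 32-bit size header (illustrative helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A toy file: all layers would share the single 'moov' and 'mdat' below.
toy = make_box(b"ftyp", b"isom\x00\x00\x00\x00") + make_box(b"moov", b"") + make_box(b"mdat", b"\x00" * 16)
print(list_top_level_boxes(toy))  # [('ftyp', 0, 16), ('moov', 16, 8), ('mdat', 24, 24)]
```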

As described below, in adaptive HTTP streaming it is desirable to be able to reference media data samples, such as network abstraction layer (NAL) units, across movie fragment or component file boundaries. In the SVC/MVC context, such references can be realized with a mechanism such as the "extractor". An extractor is an in-file data structure defined in the SVC/MVC amendment to the AVC file format extension of BMFF: Information Technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, 2008, pp. 15-17. Extractors are designed to allow NAL units to be extracted from other tracks by reference, without copying. Here, a track is a timed sequence of related samples in an ISO base media file; for media data, a track corresponds to a sequence of images or of sampled audio. The syntax of the extractor is as follows:
class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
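For readers implementing the syntax above, here is a minimal Python sketch that decodes an extractor's payload (the bytes after its own NAL unit length prefix). It assumes the 4-byte SVC-style NALUnitHeader (one NAL header byte plus the 3-byte Annex G extension) and that lengthSizeMinusOne has already been read from the track's decoder configuration; the class and function names are hypothetical.

```python
from dataclasses import dataclass

EXTRACTOR_NAL_UNIT_TYPE = 31
NAL_HEADER_SIZE = 4  # assumption: 1-byte NAL header + 3-byte SVC extension


@dataclass
class Extractor:
    nal_unit_type: int
    track_ref_index: int
    sample_offset: int
    data_offset: int
    data_length: int


def parse_extractor(payload: bytes, length_size_minus_one: int) -> Extractor:
    """Decode an extractor NAL unit payload (its own length prefix already removed)."""
    n = length_size_minus_one + 1  # byte width of data_offset and data_length
    if len(payload) < NAL_HEADER_SIZE + 2 + 2 * n:
        raise ValueError("payload too short for an extractor")
    nal_unit_type = payload[0] & 0x1F  # low 5 bits of the first header byte
    if nal_unit_type != EXTRACTOR_NAL_UNIT_TYPE:
        raise ValueError(f"not an extractor (nal_unit_type={nal_unit_type})")
    pos = NAL_HEADER_SIZE
    track_ref_index = payload[pos]  # unsigned int(8)
    sample_offset = int.from_bytes(payload[pos + 1:pos + 2], "big", signed=True)  # signed int(8)
    pos += 2
    data_offset = int.from_bytes(payload[pos:pos + n], "big")
    data_length = int.from_bytes(payload[pos + n:pos + 2 * n], "big")
    return Extractor(nal_unit_type, track_ref_index, sample_offset, data_offset, data_length)
```

The field widths of data_offset and data_length follow lengthSizeMinusOne, so the same parser handles 1-, 2- and 4-byte length fields.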

The semantics of the extractor data structure are as follows:
NALUnitHeader: The NAL unit structure specified in ISO/IEC 14496-10 Annex G for NAL units of type 20:
nal_unit_type shall be set to the extractor NAL unit type (type 31).
forbidden_zero_bit, reserved_one_bit, and reserved_three_2bits shall be set as specified in ISO/IEC 14496-10 Annex G.
The other fields (nal_ref_idc, idr_flag, priority_id, no_inter_layer_pred_flag, dependency_id, quality_id, temporal_id, use_ref_base_pic_flag, discardable_flag, and output_flag) shall be set as specified in B.4 of Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008, page 17.
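The bit layout behind these rules can be illustrated with a small packer for the 4-byte extractor NAL unit header. This sketch follows the H.264 Annex G field order (a 1-byte NAL header, then reserved_one_bit in the svc_extension_flag position, then the 23-bit SVC extension) and forces nal_unit_type to 31, forbidden_zero_bit to 0, reserved_one_bit to 1 and reserved_three_2bits to 3. The function name and its all-zero defaults are hypothetical; B.4 of the amendment governs how the remaining fields should actually be chosen.

```python
def pack_extractor_nal_header(nal_ref_idc=0, idr_flag=0, priority_id=0,
                              no_inter_layer_pred_flag=0, dependency_id=0,
                              quality_id=0, temporal_id=0, use_ref_base_pic_flag=0,
                              discardable_flag=0, output_flag=0) -> bytes:
    """Pack a 4-byte SVC-style NAL unit header with nal_unit_type = 31 (extractor)."""
    fields = [  # (value, bit width) in transmission order, 32 bits in total
        (0, 1),                         # forbidden_zero_bit
        (nal_ref_idc, 2),
        (31, 5),                        # nal_unit_type: extractor
        (1, 1),                         # reserved_one_bit
        (idr_flag, 1),
        (priority_id, 6),
        (no_inter_layer_pred_flag, 1),
        (dependency_id, 3),
        (quality_id, 4),
        (temporal_id, 3),
        (use_ref_base_pic_flag, 1),
        (discardable_flag, 1),
        (output_flag, 1),
        (3, 2),                         # reserved_three_2bits
    ]
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError(f"value {field} does not fit in {width} bits")
        value = (value << width) | field  # append the field below the bits packed so far
    return value.to_bytes(4, "big")
```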

track_ref_index specifies the index of the track reference of type 'scal' to use to find the track from which to extract data. The sample in that track from which data is extracted is the one that is temporally aligned with, or nearest preceding in the media decoding timeline (i.e., using the time-to-sample table only), the sample containing the extractor, adjusted by the offset specified by sample_offset. The first track reference has the index value 1; the value 0 is reserved.

sample_offset gives the relative index of the sample in the linked track that shall be used as the source of information. Sample 0 (zero) is the sample with the same decoding time as, or the closest preceding decoding time to, the decoding time of the sample containing the extractor; sample 1 (one) is the next sample, sample -1 (minus one) is the previous sample, and so on.
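The sample-selection rule above can be sketched in a few lines of Python. This assumes the linked track's decoding times have already been expanded from its time-to-sample table into a sorted list in the same timescale as the extractor's sample; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def resolve_referenced_sample(ref_decode_times: Sequence[int],
                              extractor_decode_time: int,
                              sample_offset: int) -> int:
    """Return the 0-based index of the referenced sample in the linked track."""
    # "Sample 0": the last sample whose decoding time is <= the extractor's.
    base = bisect_right(ref_decode_times, extractor_decode_time) - 1
    if base < 0:
        raise LookupError("no sample at or before the extractor's decoding time")
    index = base + sample_offset  # sample_offset counts forward (+) or back (-)
    if not 0 <= index < len(ref_decode_times):
        raise LookupError("sample_offset points outside the linked track")
    return index
```

For example, with linked-track decoding times [0, 1000, 2000, 3000], an extractor at time 2500 with sample_offset 0 resolves to index 2 (time 2000), and sample_offset -1 resolves to index 1.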

data_offset: The offset of the first byte within the referenced sample to copy. If the extraction starts with the first byte of data in that sample, the offset takes the value 0. The offset shall reference the beginning of a NAL unit length field.

data_length: The number of bytes to copy. If this field takes the value 0, the entire single referenced NAL unit is copied (i.e., the length to copy is taken from the length field referenced by the data offset, augmented by the additional_bytes field in the case of Aggregators).
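A minimal Python sketch of how data_offset and data_length select bytes from a length-prefixed sample follows. Because data_offset points at a NAL unit length field, this sketch assumes the copied range includes that length field; it ignores the Aggregator additional_bytes case, and the function name is hypothetical.

```python
def copy_extracted_bytes(sample: bytes, length_size: int,
                         data_offset: int, data_length: int) -> bytes:
    """Return the bytes an extractor pulls from a length-prefixed sample."""
    if data_length == 0:
        # Copy one whole NAL unit: its size comes from the length field at data_offset.
        field = sample[data_offset:data_offset + length_size]
        if len(field) != length_size:
            raise ValueError("data_offset does not point at a complete length field")
        data_length = length_size + int.from_bytes(field, "big")
    end = data_offset + data_length
    if end > len(sample):
        raise ValueError("extractor range runs past the end of the sample")
    return sample[data_offset:end]
```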

Further details can be found in Information technology - Coding of audio-visual objects - Part 15: Advanced Video Coding (AVC) file format, Amendment 2: File format support for Scalable Video Coding, ISO/IEC 14496-15:2004/Amd.2:2008.

Currently, an extractor can extract NAL units from other tracks by reference, but only from within the same movie box or movie fragment. In other words, an extractor cannot be used to extract NAL units from a different segment or file. This restriction limits the use of extractors in the use case described above.

Suppose a client has already downloaded one or more content components of a media content item from a server and is in the process of downloading another content component. The client needs to know whether the previously downloaded content components are included in the set of components on which the new content component depends, so that it can issue further requests to download the complete component set as needed. This use case also requires a mechanism for signaling externally dependent content components and their location information.

BMFF defines a box type called 'tref' that provides a reference from the containing track to another track in the presentation. This box can describe dependencies between tracks, but only between tracks within the same media file.
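The following Python sketch shows how a 'tref' payload maps onto the extractor's track_ref_index. Per ISO/IEC 14496-12, the 'tref' box contains track reference type boxes, each with a four-character reference type and an array of 32-bit track_IDs; track_ref_index is a 1-based index into the 'scal' array. Because those track_IDs are only meaningful among the tracks of the same file, this mechanism cannot point into another file. The function names are hypothetical.

```python
import struct
from typing import Dict, List


def parse_tref_payload(payload: bytes) -> Dict[str, List[int]]:
    """Map each reference_type in a 'tref' payload to its list of track_IDs."""
    refs: Dict[str, List[int]] = {}
    pos = 0
    while pos + 8 <= len(payload):
        size, ref_type = struct.unpack(">I4s", payload[pos:pos + 8])
        if size < 8 or pos + size > len(payload) or (size - 8) % 4:
            raise ValueError(f"malformed track reference type box at offset {pos}")
        count = (size - 8) // 4  # number of 32-bit track_IDs in this box
        refs[ref_type.decode("latin-1")] = list(
            struct.unpack(f">{count}I", payload[pos + 8:pos + size]))
        pos += size
    return refs


def track_for_extractor(refs: Dict[str, List[int]], track_ref_index: int) -> int:
    """Resolve an extractor's 1-based track_ref_index against the 'scal' references."""
    if track_ref_index == 0:
        raise ValueError("track_ref_index 0 is reserved")
    ids = refs.get("scal", [])
    if track_ref_index > len(ids):
        raise LookupError("no such 'scal' track reference")
    return ids[track_ref_index - 1]
```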

One approach is to signal this information through an out-of-band mechanism. For example, in HTTP streaming, the server can send a manifest file to the client before a session starts. The manifest file contains the dependency and location information for each content component of the requested media content, so the client can request all the component files it needs. However, this out-of-band approach cannot be applied to local file playback, where no manifest file is available.

Conventional solutions to the above problems are not yet well established in the art. It would be desirable to be able to parse and encapsulate multiple layers without sacrificing speed or transport efficiency. This has not yet been achieved in the art.

The present invention relates to methods and apparatus for encapsulating a media entity that includes multiple layers into component files, and for reading those component files.

According to one aspect of the invention, a method is provided for encapsulating a media entity that includes multiple layers to create component files. The method extracts, for each layer, metadata from the media entity, and extracts the media data corresponding to the extracted metadata. The extracted media data is associated with the extracted metadata, so that a component file containing the extracted metadata and extracted media data can be created for each layer.

According to another aspect of the invention, a file encapsulation apparatus is provided. The apparatus includes an extractor that extracts, for each layer, metadata from the media entity together with the media data corresponding to the extracted metadata, and a correlator that associates the extracted media data with the extracted metadata so that a component file can be created for each layer.

The above features of the present invention will become more apparent from the following detailed description of exemplary embodiments, with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an exemplary MP4 file format.
FIG. 2 is a diagram illustrating an embodiment of the present invention that encapsulates a media entity.
FIG. 3 is a diagram illustrating the structure of an encapsulation apparatus used to encapsulate, or create component files from, a media entity that includes multiple layers/representations.
FIG. 4 is a diagram illustrating an example of associating additional media data with a component file based on dependencies.
FIG. 5 is a diagram illustrating an example of extracting, by reference, NAL units from a movie box/fragment different from the movie box/fragment in which the extractor resides.
FIG. 6 is a diagram illustrating the associated encapsulation operation from an SVC/MVC-type video bitstream into multiple component files using one of the new extractor data structures of the present invention.
FIG. 7 is a diagram illustrating the structure of a file reader used to read component files.
FIG. 8 is a diagram illustrating a process by which a video decoder incorporating an embodiment of the present invention reads encapsulated component files.
FIG. 9 is a diagram illustrating an encapsulation operation from an SVC/MVC-type video bitstream into multiple movie fragments using another preferred new extractor data structure.
FIG. 10 is a diagram illustrating a process by which a video decoder incorporating another embodiment of the present invention reads encapsulated component files.

In the present invention, a media entity, such as a media file, a set of media files, or streaming media, is divided or encapsulated into multiple movie component files that are addressable by client URL requests. Here, the term component file is used in a broad sense to cover fragments, segments, files, and other equivalent terms.

In one embodiment of the invention, a media entity that includes multiple representations or components is parsed to extract the metadata and media data of each representation/component. Examples of such representations/components include layers with different temporal/spatial resolutions, quality layers in SVC, and views in MVC. Hereinafter, the term layer is also used to refer to a representation/component, and these terms are interchangeable. The metadata describes, for example, what each representation of the media entity contains and how to use the media data contained in it. The media data includes the media data samples needed to fulfill the purpose of the media data, such as decoding the content, or any necessary information on how to obtain the required data samples. The extracted metadata and media data of each representation or layer are associated/correlated and stored together for user access.
The storing may be performed physically on a hard drive or other storage medium, or it may be performed virtually through a relationship management mechanism, so that the metadata and media data appear to be stored together when interfacing with other applications or modules even though they are actually located at different places on the storage medium. FIG. 2 shows an example of this embodiment. In FIG. 2, the media entity includes three layers: a base layer, enhancement layer 1, and enhancement layer 2. The media entity is parsed to extract the metadata and media data of each of the three layers, and these are stored separately as component files, each comprising associated metadata and the corresponding media data.

FIG. 3 shows the structure of a preferred encapsulation apparatus 300 used to encapsulate a media entity that includes multiple layers, such as SVC-coded video, to create component files. The input media entity 310 is passed to the metadata extractor 320 and the media data extractor 340. The metadata extractor 320 extracts the metadata 330 of each layer. The media data extractor 340 takes the metadata 330 and extracts the corresponding media data 350. Note that in another embodiment, the metadata extractor 320 and the media data extractor 340 are implemented as a single extractor. The metadata 330 and the media data 350 are both sent to the correlator 380, which associates these two types of data to create the output component files 390, one for each layer.
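The data flow of FIG. 3 can be sketched as a small Python pipeline. This is an illustration under a hypothetical in-memory model, not the patent's implementation: the media entity is reduced to per-layer metadata dictionaries plus a list of (layer, sample) pairs, and every class and function name is made up. The three stages mirror metadata extractor 320, media data extractor 340 and correlator 380.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MediaEntity:
    """Hypothetical media entity 310: per-layer metadata plus layer-tagged samples."""
    layer_metadata: Dict[str, dict]
    samples: List[Tuple[str, bytes]]  # (layer name, sample payload) in stream order


@dataclass
class ComponentFile:
    """One output component file 390: a layer's metadata and its media data."""
    layer: str
    metadata: dict
    media_data: List[bytes] = field(default_factory=list)


def extract_metadata(entity: MediaEntity) -> Dict[str, dict]:
    # Metadata extractor 320: produce metadata 330 for every layer.
    return {layer: dict(meta) for layer, meta in entity.layer_metadata.items()}


def extract_media_data(entity: MediaEntity, metadata: Dict[str, dict]) -> Dict[str, List[bytes]]:
    # Media data extractor 340: use metadata 330 to collect media data 350 per layer.
    media: Dict[str, List[bytes]] = {layer: [] for layer in metadata}
    for layer, payload in entity.samples:
        if layer in media:
            media[layer].append(payload)
    return media


def correlate(metadata: Dict[str, dict], media: Dict[str, List[bytes]]) -> List[ComponentFile]:
    # Correlator 380: associate each layer's metadata with its media data.
    return [ComponentFile(layer, metadata[layer], media[layer]) for layer in metadata]


def encapsulate_layers(entity: MediaEntity) -> List[ComponentFile]:
    metadata = extract_metadata(entity)
    return correlate(metadata, extract_media_data(entity, metadata))
```

Keeping the stages as separate functions mirrors the separate blocks of FIG. 3; the single-extractor variant would simply merge the first two functions.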

Layered video, such as video coded with the SVC or MVC extensions of AVC, contains multiple media components (scalable layers or views). By decoding different subsets of such a coded bitstream, different operating points, i.e., representations or layers, can be obtained in terms of temporal/spatial resolution, quality, view, and so on. Furthermore, there are coding dependencies between the layers of the bitstream; that is, decoding one layer may depend on other layers. Therefore, requesting one representation of such a bitstream may require retrieving and decoding one or more components or media data from the encapsulated video file. To simplify extracting the different representations, coded layered video is often encapsulated into MP4 files in such a way that each layer is stored separately in a different segment or component file. In this case, it must be taken into account that, because of the decoding dependencies above or other application-specific dependencies, particular media data samples of the bitstream, such as NAL units, are required by, or associated with, multiple segments or component files.
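Finding everything a requested operating point needs is a transitive-closure problem over the layer dependencies. The Python sketch below returns the target layer and all layers it depends on, lowest first, using a depth-first search with cycle detection. The dependency map is hypothetical, with layer names borrowed from the FIG. 4 example.

```python
from typing import Dict, List, Set


def required_layers(target: str, depends_on: Dict[str, List[str]]) -> List[str]:
    """Return target plus every layer it transitively depends on, dependencies first."""
    order: List[str] = []
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(layer: str) -> None:
        if layer in done:
            return
        if layer in visiting:
            raise ValueError(f"cyclic dependency at layer {layer!r}")
        visiting.add(layer)
        for dep in depends_on.get(layer, []):
            visit(dep)  # place every dependency before the layer itself
        visiting.discard(layer)
        done.add(layer)
        order.append(layer)

    visit(target)
    return order


deps = {"HD1080p": ["QVGA", "SD"], "SD": ["QVGA"], "QVGA": []}
print(required_layers("HD1080p", deps))  # ['QVGA', 'SD', 'HD1080p']
```

For the client use case described earlier, the components still to fetch would be this list minus the components already downloaded.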

In another embodiment of the invention, the additional media data required by a segment or component file is extracted and associated with that segment or component file. FIG. 4 shows an example of this embodiment. Here, the SVC bitstream has three spatial layers: HD 1080p, SD, and QVGA. Three movie fragments or component files are formed, corresponding to these three operating points, each addressable by a different URL. Within each movie fragment or component file, all the media data samples needed for decoding (NAL units in this example) are copied and stored as media samples in the 'mdat' box. Thus, when a client requests a particular operating point or representation using the appropriate URL, the server can retrieve the corresponding movie fragment or component file and deliver it to the client. In this embodiment, the media data extractor 340 of FIG. 3 additionally extracts, for each layer, from the input media entity 310, additional media data related to the media data extracted for that layer. The correlator 380 then also associates this additional extracted media data of each layer to create the corresponding component file.
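A minimal Python sketch of the copy-based approach of FIG. 4 follows, for a single access unit. Each operating point's fragment receives copies of the NAL units of every layer it decodes, listed lowest layer first. The input model (one list of NAL units per layer, plus an explicit list of layers per operating point) and the byte sizes in the check below are hypothetical.

```python
from typing import Dict, List


def build_self_contained_fragments(nal_units: Dict[str, List[bytes]],
                                   needs: Dict[str, List[str]]) -> Dict[str, List[bytes]]:
    """Copy every NAL unit an operating point needs into that point's fragment.

    nal_units: NAL units of one access unit, grouped by layer.
    needs: for each operating point, all layers it decodes, lowest layer first.
    """
    fragments: Dict[str, List[bytes]] = {}
    for point, layers in needs.items():
        # Each fragment gets its own copy, so lower-layer data is duplicated.
        fragments[point] = [nal for layer in layers for nal in nal_units[layer]]
    return fragments
```

With toy NAL units of 10 (QVGA), 20 (SD) and 40 (HD 1080p) bytes, the three fragments hold 10 + 30 + 70 = 110 bytes for 70 bytes of unique data. This duplication is the storage cost that the reference-based embodiment avoids.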

To save storage space, it is desirable to be able to reference media data samples, such as NAL units, across movie fragment or component file boundaries, without actually duplicating the same data in each component file. However, the ISO Base Media File Format (BMFF) and its extensions do not currently support this. To solve this problem, yet another embodiment of the invention identifies and constructs references to the additional media data that is associated with, or required by, the media data of a movie fragment or component file. Instead of the additional media data itself, these references are associated with the component file together with its metadata and media data. The references can be embedded in the extracted media data of each layer, after which the extracted metadata and extracted media data of each layer can be associated to create the corresponding component file.

In this embodiment, a reference identifier 360 is added to the structure of the encapsulation apparatus 300. The reference identifier 360 identifies, from the input media entity 310, references 370 to the additional media data related to the extracted media data 350 of each layer. The correlator 380 then associates the references 370 with the extracted metadata 330 and the extracted media data 350 of each layer, for example by embedding the references 370 in the extracted media data 350, to create the corresponding component files 390.
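Continuing the same single-access-unit toy model, the reference-based embodiment can be sketched by storing a layer's own NAL units directly and emitting a reference record for each NAL unit that belongs to another layer's fragment. The `Reference` type is a hypothetical stand-in for a reference 370 (in practice, an extended extractor), and the fragment URLs are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for a reference 370 (e.g. an extended extractor)."""
    fragment_url: str
    nal_index: int  # position of the NAL unit within that fragment's own layer data


def build_referencing_fragments(nal_units: Dict[str, List[bytes]],
                                needs: Dict[str, List[str]],
                                urls: Dict[str, str]) -> Dict[str, List[Union[bytes, Reference]]]:
    """Store each layer's NAL units once; refer to lower layers instead of copying them.

    Assumes each layer's NAL units live in the fragment of the operating point
    with the same name, addressable at urls[layer].
    """
    fragments: Dict[str, List[Union[bytes, Reference]]] = {}
    for point, layers in needs.items():
        entries: List[Union[bytes, Reference]] = []
        for layer in layers:
            for i, nal in enumerate(nal_units[layer]):
                if layer == point:
                    entries.append(nal)  # own media data 350, stored directly
                else:
                    entries.append(Reference(urls[layer], i))  # reference 370 instead of a copy
        fragments[point] = entries
    return fragments
```

For the same toy input (10-, 20- and 40-byte NAL units), only 70 bytes of media data are stored across the three fragments, at the cost of resolving references when a fragment is read.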

As mentioned above, in the SVC/MVC context such references can be constructed using a mechanism such as the extractor. Currently, an extractor can extract NAL units from other tracks by reference, but only from within the same movie box/fragment; it cannot extract NAL units from a different segment or file. This restriction limits the use of extractors in other cases. An extension of the extractor data structure is disclosed below. It is intended to support efficient encapsulation of SVC/MVC-type layered video content into multiple component files, as described above.

This extension provides an extractor data structure with the additional capability of referencing NAL units that reside in a movie box/fragment or component file different from the one in which the extractor resides.

The extended extractor is defined as follows:

Syntax:
aligned(8) class DataEntryUrlBox (bit(24) flags)
    extends FullBox('url ', version = 0, flags) {
    string location;
}
aligned(8) class DataEntryUrnBox (bit(24) flags)
    extends FullBox('urn ', version = 0, flags) {
    string name;
    string location;
}
class aligned(8) Extractor () {
    NALUnitHeader();
    DataEntryBox(entry_version, entry_flags) data_entry; // added extension
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
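To show what the added data_entry field carries on the wire, here is a minimal Python sketch that serializes a DataEntryUrlBox. It follows the ISO BMFF FullBox layout (32-bit size, the four-character code 'url ' with a trailing space, an 8-bit version and 24-bit flags), writes the location as a null-terminated UTF-8 string, and omits the string when the self-contained flag (0x000001) is set. How the extended extractor embeds this box beyond the syntax above is not specified here, and the function name is hypothetical.

```python
import struct
from typing import Optional

SELF_CONTAINED = 0x000001  # flag: the media data is in the same file as this box


def data_entry_url_box(location: Optional[str] = None, flags: int = 0) -> bytes:
    """Serialize a DataEntryUrlBox ('url ' FullBox, version 0)."""
    if flags & SELF_CONTAINED:
        body = b""  # no string: the box ends right after the flags field
    else:
        if location is None:
            raise ValueError("a location is required unless the self-contained flag is set")
        body = location.encode("utf-8") + b"\x00"  # null-terminated UTF-8 string
    size = 12 + len(body)  # 4 (size) + 4 (type) + 1 (version) + 3 (flags) + body
    return struct.pack(">I4sB", size, b"url ", 0) + flags.to_bytes(3, "big") + body
```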

Semantics:
data_entry is a Uniform Resource Locator (URL) or Uniform Resource Name (URN) entry. name is a URN and is required in a URN entry. location is a URL. It is required in a URL entry and optional in a URN entry, where it gives a location at which a resource with the given name can be found. Each is a null-terminated string using UTF-8 characters. If the self-contained flag is set, the URL form is used, no string is present, and the box terminates with the entry-flags field. The URL type should be that of a service that delivers a file. Relative URLs are permissible and are relative to the file containing the movie box/fragment that contains the track to which the extractor belongs.
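A reader has to turn data_entry into a fetchable address. The Python sketch below resolves a possibly relative location against the URL of the file that contains the extractor, and treats a self-contained entry as pointing to that same file. Using standard RFC 3986 resolution via `urllib.parse.urljoin` is an assumption, since the text only says that relative URLs are relative to the containing file; the example.com URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_data_entry(containing_file_url: str, location: Optional[str],
                       self_contained: bool) -> str:
    """Return the URL of the file holding the NAL units an extended extractor points to."""
    if self_contained:
        return containing_file_url  # data lives in the extractor's own file
    if not location:
        raise ValueError("a non-self-contained URL entry needs a location")
    # Relative locations are resolved against the file that contains the extractor.
    return urljoin(containing_file_url, location)
```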

The other fields have the same semantics as in the original extractor described above.

With the extended extractor, NAL units can be extracted by reference from a movie box/fragment other than the one in which the extractor resides. FIG. 5 shows such an example. It uses the same SVC bitstream as FIG. 4, but with the new extended extractor data structure. As the figure shows, the SD movie fragment can reference NAL units in the QVGA movie fragment. Similarly, the HD 1080p movie fragment can use extractors to reference NAL units in both the QVGA and SD movie fragments. Compared with FIG. 4, no NAL units are duplicated across these movie fragments, which saves storage space.

FIG. 6 illustrates the associated encapsulation operation from an SVC/MVC-type video bitstream into multiple movie fragments or component files using the new extractor data structure of the present invention. The process begins at step 601. In step 610, NAL units are read one at a time. If the end of the bitstream is reached at step 620, the process stops at step 690; otherwise, it proceeds to step 630. Decision step 630 determines whether the current NAL unit depends, for decoding, on NAL units of another track. If it does not, control passes to step 640, where a sample is formed from the current NAL unit and placed in the current track. If there is such a dependency, the process proceeds to decision step 650, which further determines whether the track holding the NAL units required by the current NAL unit is in the same movie fragment. If it is, step 670 fills in the extended extractor so that it references the NAL units of that other track. If the track is in another movie fragment, step 660 identifies the URL or URN of that movie fragment, and the process proceeds to step 670, where the identified URL or URN is also written into the extended extractor. Once filled in, the extended extractor is embedded in the current track at step 680. The process then returns to step 610 for the next NAL unit.
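The FIG. 6 loop can be sketched as follows. This is a minimal illustration, not the patented implementation: the `NalUnit` and `Extractor` records, the `fragment_of_track` and `fragment_url` maps, and all names are hypothetical, and an extractor here records only the referenced track and location rather than the full sample/data offsets. The flowchart does not spell out what happens to the dependent NAL unit's own payload, so the sketch assumes it is placed in the track after its extractors.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NalUnit:
    track_id: int            # track (layer) this NAL unit belongs to
    depends_on: list[int]    # tracks whose NAL units it needs for decoding
    payload: bytes


@dataclass
class Extractor:
    ref_track_id: int        # track holding the referenced NAL unit
    location: Optional[str]  # URL/URN of another fragment; None = same fragment


Sample = Union[bytes, Extractor]


def encapsulate(nal_units, fragment_of_track, fragment_url):
    """Sketch of FIG. 6: returns {track_id: [samples or extractors]}.

    fragment_of_track: hypothetical map track_id -> fragment name
    fragment_url:      hypothetical map fragment name -> URL/URN
    """
    tracks: dict[int, list[Sample]] = {}
    for nal in nal_units:                       # step 610 (loop end: 620/690)
        current = tracks.setdefault(nal.track_id, [])
        for dep in nal.depends_on:              # step 630: cross-track dependency
            same = fragment_of_track[dep] == fragment_of_track[nal.track_id]  # 650
            location = None if same else fragment_url[fragment_of_track[dep]]  # 660
            current.append(Extractor(dep, location))                 # 670, 680
        current.append(nal.payload)             # step 640 (assumed for all units)
    return tracks
```

With the QVGA/SD/HD layering of FIG. 5, the HD track ends up with two extractors (one per lower layer) and no copied payloads.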

In another embodiment, correlator 380 embeds reference 370 in the extracted metadata 330 and adds an index of reference 370 to the extracted media data 350. Correlator 380 further associates the metadata and media data of each layer to create the corresponding component file 390. In the context of the ISO Base Media File Format, a box called the HTTP streaming information box is disclosed. This box contains information that can assist HTTP streaming of ISO files. The HTTP streaming information box is preferably placed as early as possible in the component file, for example at the beginning of the file. This box can also serve as a source of information when a server creates a manifest file for clients. Another type of box, called the media reference box, which is contained in the HTTP streaming information box, is also disclosed. This box contains information about the external files on which the file depends. The extractor structure is further extended so that media samples can be referenced across different component files. The extractor can use the information contained in the media reference box to reduce signaling overhead.

Detailed definitions of the proposed HTTP streaming information box, media reference box, and further improved extractor are given below.

HTTP Streaming Information Box
Definition:
Box Type: 'hsin'
Container: File
Mandatory: No
Quantity: Zero or one

The HTTP streaming information box assists HTTP streaming of ISO media files. It contains information relevant to HTTP streaming delivery of the file, including the media reference box defined below, possibly among other types of boxes. To maximize its availability, the HTTP streaming information box is preferably placed as early as possible in the file.

Syntax:
aligned(8) class HTTPStreamingInfoBox extends Box('hsin') {
    // container: holds a MediaReferenceBox ('mref') and possibly other boxes
}

Media Reference Box
Definition:
Box Type: 'mref'
Container: HTTP streaming information box ('hsin')
Mandatory: No
Quantity: Zero or one

The media reference box is contained in the HTTP streaming information box and holds a table of data references, in the form of URLs, that declare the locations of the external files on which each track listed in the box depends. By reading this box, a file reader can identify the external files on which the tracks in the file depend, such as external component files, and the means of retrieving them.

Syntax:
aligned(8) class DataEntryUrlBox (bit(24) flags) extends FullBox('url ', version = 0, flags) {
    string location;
}
aligned(8) class MediaReferenceBox extends Box('mref') {
    unsigned int(16) entry_count;
    for (i = 1; i <= entry_count; i++) {
        unsigned int(32) track_ID;
        unsigned int(16) dependent_source_count;
        for (j = 1; j <= dependent_source_count; j++) {
            DataEntryUrlBox data_entry;
        }
    }
}

Semantics:
entry_count is an integer giving the number of entries in the table.
track_ID is an integer that uniquely identifies the track, within the file, to which the entry applies.
dependent_source_count is an integer giving the number of external media sources on which the track identified by track_ID depends.
data_entry is a URL entry that points to one external media source on which the specified track depends. Each is a null-terminated string of UTF-8 characters. The URL type should be of a service that delivers files. Relative URLs are permitted and are resolved relative to the file containing this media reference box.
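To make the byte layout concrete, the following is a minimal serializer/parser sketch for the 'mref' box. It assumes, as in ISO/IEC 14496-12, that DataEntryUrlBox is serialized as a FullBox of type 'url ' (one version byte plus 24-bit flags) followed by the null-terminated location, and that plain boxes use a 32-bit size and no 64-bit largesize. The function names and the `{track_ID: [URLs]}` table representation are illustrative choices, not part of the proposal.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a plain ISOBMFF box: 32-bit size, 4-char type, payload."""
    assert len(box_type) == 4
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def url_entry(location: str, flags: int = 0) -> bytes:
    """DataEntryUrlBox as a FullBox: version 0, 24-bit flags, UTF-8 string."""
    version_and_flags = struct.pack(">I", flags & 0xFFFFFF)  # version byte = 0
    return box(b"url ", version_and_flags + location.encode("utf-8") + b"\x00")


def mref_box(table: dict[int, list[str]]) -> bytes:
    """Build an 'mref' box from {track_ID: [dependent source URLs]}."""
    payload = struct.pack(">H", len(table))                  # entry_count
    for track_id, urls in table.items():
        payload += struct.pack(">IH", track_id, len(urls))   # track_ID, count
        for url in urls:
            payload += url_entry(url)                        # data_entry
    return box(b"mref", payload)


def parse_mref(data: bytes) -> dict[int, list[str]]:
    """Parse a serialized 'mref' box back into {track_ID: [URLs]}."""
    _size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"mref":
        raise ValueError("not an 'mref' box")
    pos = 8
    (entry_count,) = struct.unpack_from(">H", data, pos)
    pos += 2
    table: dict[int, list[str]] = {}
    for _ in range(entry_count):
        track_id, count = struct.unpack_from(">IH", data, pos)
        pos += 6
        urls = []
        for _ in range(count):
            url_size, url_type = struct.unpack_from(">I4s", data, pos)
            if url_type != b"url ":
                raise ValueError("expected a 'url ' box")
            body = data[pos + 12 : pos + url_size]   # skip header + version/flags
            urls.append(body.split(b"\x00", 1)[0].decode("utf-8"))
            pos += url_size
        table[track_id] = urls
    return table
```

Wrapping the result as `box(b"hsin", mref_box(table))` yields the container 'hsin' box, which a writer would place near the start of the component file.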

The media reference box defined above is designed to facilitate HTTP streaming of media entities containing multiple layers in several ways.

First, through its reference table, the media reference box can explicitly signal the dependencies between component files at the beginning of a component file. Thus, after downloading only a small initial portion of a component file, a client knows all the external component files associated with its track or tracks and, if necessary, can use the references in the table to issue the corresponding requests and obtain one or more complete sets for playback.
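A client could turn a parsed reference table into a request list roughly as below. This sketch assumes the table has already been parsed into `{track_ID: [URLs]}` and resolves relative entries against the URL of the file carrying the 'mref' box, as the semantics above require; `dependency_urls` is a hypothetical helper name and the example URLs are placeholders.

```python
from urllib.parse import urljoin


def dependency_urls(component_url, mref_table, track_ids=None):
    """Absolute URLs of external component files the given tracks depend on.

    component_url: URL of the file that carries the 'mref' box
    mref_table:    {track_ID: [data_entry strings]}, possibly relative
    track_ids:     tracks the client intends to play (default: all listed)
    Returns URLs in first-seen order, without duplicates.
    """
    wanted = mref_table.keys() if track_ids is None else track_ids
    result = []
    for tid in wanted:
        for entry in mref_table.get(tid, []):
            absolute = urljoin(component_url, entry)  # relative to this file
            if absolute not in result:
                result.append(absolute)
    return result
```

De-duplicating matters because several tracks often depend on the same base-layer file, which should be requested only once.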

Second, the in-file information in this box can easily be extracted and included in a manifest file. Including this information in the manifest helps the client discover the relevant service information before the actual HTTP streaming begins and perform the corresponding service initialization, for example requesting all related component files and allocating the necessary buffer resources.

Third, when a client requests a different representation of some multi-component media content, another representation of which has already been delivered as component files, it can check the corresponding media reference box to determine whether the delivered files contain any dependency component of the new representation that can be reused.
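The reuse check amounts to comparing the new representation's dependency list, taken from its media reference box, with the set of component files the client already holds. A minimal sketch, assuming both sides are expressed as absolute URLs; the function name is hypothetical.

```python
def reusable_dependencies(new_deps, already_delivered):
    """Split a new representation's dependencies into reusable and to-fetch.

    new_deps:          absolute URLs from the new file's 'mref' table
    already_delivered: set of absolute URLs of component files already held
    Returns (reuse, fetch), each preserving the order of new_deps.
    """
    reuse = [url for url in new_deps if url in already_delivered]
    fetch = [url for url in new_deps if url not in already_delivered]
    return reuse, fetch
```

For example, switching from SD to HD after the QVGA base layer has been delivered leaves only the SD layer file to fetch.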

Finally, this box helps reduce the signaling overhead of the extended extractor structure defined below.

Extractor
It is further proposed to extend the extractor so that it can reference track data in external media files.

Extended syntax:
class aligned(8) Extractor () {
    NALUnitHeader();
    unsigned int(16) media_reference_index;
    unsigned int(8) track_ref_index;
    signed int(8) sample_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_offset;
    unsigned int((lengthSizeMinusOne + 1) * 8) data_length;
}
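The extended extractor's field layout can be decoded as sketched below. The NALUnitHeader() length is an assumption: 4 bytes is used by default (the one-byte NAL unit header plus the three-byte SVC/MVC extension) and is left as a parameter; `length_size_minus_one` is the value from the track's decoder configuration. The function name and dict output are illustrative only.

```python
import struct


def parse_extended_extractor(nal: bytes, length_size_minus_one: int,
                             header_len: int = 4) -> dict:
    """Decode the fields of an extended Extractor NAL unit (sketch).

    header_len: assumed size of NALUnitHeader() in bytes
    """
    width = length_size_minus_one + 1            # bytes for offset/length
    if len(nal) < header_len + 4 + 2 * width:
        raise ValueError("truncated extractor")
    pos = header_len                             # skip NALUnitHeader()
    # 16-bit unsigned, 8-bit unsigned, 8-bit signed, big-endian
    media_reference_index, track_ref_index, sample_offset = struct.unpack_from(
        ">HBb", nal, pos)
    pos += 4
    data_offset = int.from_bytes(nal[pos:pos + width], "big")
    pos += width
    data_length = int.from_bytes(nal[pos:pos + width], "big")
    return {
        "media_reference_index": media_reference_index,
        "track_ref_index": track_ref_index,
        "sample_offset": sample_offset,
        "data_offset": data_offset,
        "data_length": data_length,
    }
```

Compared with carrying a full URL in each extractor, the 16-bit media_reference_index keeps every extractor small; the URL itself is stored once in the 'mref' table.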

Semantics:
media_reference_index: specifies the index of an entry in the reference table, contained in the media reference box, whose associated track_ID equals that of the track containing the extractor. If media_reference_index is 0, the extractor references data of another track in the same file as the extractor; in this case, the media reference box shall contain no reference table with the same track_ID as this track. If media_reference_index lies between 1 and the value of dependent_source_count in the reference table associated with this track in the media reference box, the URL at entry media_reference_index of that table points to an external file, and that external file contains the track from which the extractor extracts data.

The semantics of the other fields remain the same as in the original extractor definition.

With this further extended extractor structure, an extractor can link to, and extract, data of tracks belonging to external component files. This is particularly useful when content components from one encoded multi-component media content, such as SVC- or MVC-encoded content, are encapsulated into different component files. Because the extended extractor can extract across file boundaries, the same data need not be duplicated in different component files.

FIG. 9 shows the associated encapsulation operation from an SVC/MVC-type video bitstream into multiple movie fragments or component files using the disclosed HTTP streaming information box and media reference box together with the further extended extractor data structure described above. The process is similar to that of FIG. 6, apart from some modifications arising from the new boxes and the further extension of the extractor. After the location information (URL/URN) is identified in step 660, it is used in step 965 to fill in the reference table in the mref box (media reference box). In step 970, the index of that location in the reference table is written into the extractor, and the extractor is then embedded in the current track. When the end of the bitstream is reached in step 620, the mref box and its container, the hsin box (HTTP streaming information box), are embedded in the metadata of the component file.
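Steps 965 and 970 can be sketched as a helper that records a location in the track's reference table and returns the 1-based index to write into the extractor. Reusing an existing entry when the same location appears again is a design choice of this sketch (it keeps the table short); the helper name and table shape are hypothetical.

```python
def reference_index(mref_table, track_id, location):
    """Steps 965/970 sketch: add location to the track's reference table
    if it is not already there, and return its 1-based index, which is
    the value written into the extractor's media_reference_index."""
    urls = mref_table.setdefault(track_id, [])
    if location not in urls:
        urls.append(location)            # step 965: fill in the 'mref' table
    return urls.index(location) + 1      # step 970: index for the extractor
```

After the last NAL unit, the accumulated table would be serialized into 'mref' inside 'hsin' and placed in the component file's metadata.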

To read the component files, the file reader 700 shown in FIG. 7 is used. Parser 710 first parses a component file to obtain its metadata and media data and, if available, its references. If the decoded references show that the media data is related to media data of other component files, for example through decoding dependencies, retriever 720 retrieves the related media data from the other component files indicated in the references. Processor 730 then processes the metadata and media data obtained from the component file, together with any additional media data that is available. The parsing performed by parser 710 includes the various operations needed to obtain the metadata and media data for processor 730 and the references for retriever 720, including further parsing of the metadata and/or media data as required. In one embodiment, the references are embedded in the media data and are therefore obtained by parsing the media data. If references are available, the parsing step further includes analyzing the syntax of the references and decoding them. If the component file contains video content, processor 730 may include a video decoder. In another embodiment, the parser and retriever may be incorporated into the processor.

FIG. 8 illustrates a process, incorporating the present invention, by which a video decoder reads an SVC/MVC-type video bitstream. In step 801, a component video file is accessed, and in step 805, the metadata and media data of each layer of the component video file are identified. In step 810, the identified metadata and media data are parsed, and in step 815, the NAL units of the media data are read one at a time. For the current NAL unit, step 820 first determines whether the end of the bitstream has been reached; if so, the process ends at step 825. Otherwise, the process proceeds to decision step 830, which determines whether the current NAL unit is an extractor. If it is not, the current NAL unit is an ordinary NAL unit containing data for decoding, and it is sent to the decoder in step 835. If the current NAL unit is an extractor, step 840 determines whether it depends on a NAL unit outside the current component file. If the required NAL unit is in the same component file, it is retrieved from the current file in step 845 and sent to the decoder in step 835. If the required NAL unit belongs to another component file, it is located in step 850 using the reference information data_entry in the extractor, retrieved from the remote file in step 855, and then sent to the decoder in step 835.
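The FIG. 8 loop can be sketched with the file access abstracted behind callbacks. This is an illustration under stated assumptions: the `Nal` record, its hypothetical `ref` key for the referenced NAL unit, and the `local_lookup`, `remote_fetch`, and `decode` callables are all stand-ins rather than a real file-format API, and NAL units are assumed to be already parsed out of the media data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Nal:
    payload: bytes = b""
    is_extractor: bool = False
    location: Optional[str] = None  # data_entry URL; None = same component file
    ref: Optional[tuple] = None     # hypothetical key of the referenced NAL unit


def read_component(nals: Iterable[Nal],
                   local_lookup: Callable[[tuple], bytes],
                   remote_fetch: Callable[[str, tuple], bytes],
                   decode: Callable[[bytes], None]) -> None:
    """Sketch of FIG. 8: feed NAL units to the decoder, resolving extractors."""
    for nal in nals:                                      # steps 815/820/825
        if not nal.is_extractor:                          # step 830
            decode(nal.payload)                           # step 835
        elif nal.location is None:                        # step 840: same file
            decode(local_lookup(nal.ref))                 # steps 845, 835
        else:
            decode(remote_fetch(nal.location, nal.ref))   # steps 850, 855, 835
```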

In another embodiment, parser 710 identifies the references by parsing the media data to obtain the embedded reference indices, and obtains the corresponding references according to those indices. The corresponding process by which a video decoder reads an SVC/MVC-type video bitstream is shown in FIG. 10; it is similar to the process of FIG. 8. Because in the preferred embodiment the references are placed at the beginning of the component file, parsing the metadata in step 810 allows the references it contains to be analyzed in parallel with parsing of the media data. While the references are analyzed, step 1014 identifies the other component files they reference, and in step 1012 retrieval of these other component files begins in parallel with the remaining steps of the process. In step 850, after the location information of the component file on which the current NAL unit depends is accessed, local storage such as a media buffer is checked to see whether this component file is available. If the required component file is available locally, the NAL unit is retrieved from the local copy; otherwise, it is retrieved from the remote file. Note that the local copy of the component file may have been obtained by the parallel retrieval of step 1012 or by an earlier request for that component file.
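The parallel retrieval of step 1012 and the local-storage check of step 850 can be sketched with a thread pool standing in for the media buffer. The `ComponentCache` class and the `fetch` callable are hypothetical; `fetch` takes a URL and returns the file bytes. As a design choice, a request for a file whose prefetch is still in flight waits for that download instead of issuing a duplicate request.

```python
from concurrent.futures import ThreadPoolExecutor


class ComponentCache:
    """Sketch of FIG. 10 retrieval: prefetch referenced files in parallel,
    then serve later requests from the local copy when one exists."""

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch                     # hypothetical: url -> bytes
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}                      # url -> Future (local storage)

    def prefetch(self, urls):
        """Steps 1014/1012: start retrieving every referenced file."""
        for url in urls:
            if url not in self._futures:
                self._futures[url] = self._pool.submit(self._fetch, url)

    def get(self, url):
        """Step 850 onward: use the local copy if present, else fetch."""
        future = self._futures.get(url)
        if future is None:                      # not held locally: fetch now
            future = self._pool.submit(self._fetch, url)
            self._futures[url] = future
        return future.result()                  # waits if still downloading

    def close(self):
        self._pool.shutdown(wait=True)
```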

Although preferred embodiments of the present invention have been described in detail herein, it is to be understood that the invention is not limited to these embodiments and that those skilled in the art may make other modifications and variations without departing from the scope of the invention as defined in the appended claims.

Claims

A method of creating component files from a media entity containing multiple layers, the method comprising:
extracting metadata for each layer from the media entity;
extracting, from the media entity, media data corresponding to the extracted metadata of each layer of the media entity; and
associating the extracted media data with the extracted metadata to enable creation, for each layer, of a component file including the extracted metadata and the extracted media data.

The method of claim 1, wherein the component file is at least one of a movie box, a movie fragment, a segment, and a file.

The method of claim 1, further comprising:
extracting, for each layer, from the media entity additional media data related to the extracted media data of that layer; and
associating the extracted media data and the additional media data of each layer to create a corresponding component file.

The method of claim 1, further comprising:
identifying, for each layer, a reference to additional media data related to the extracted media data; and
associating the reference with the extracted metadata and the extracted media data of each layer to create a corresponding component file.

The method of claim 4, wherein the media data and the additional media data include data samples.

The method of claim 5, wherein the data samples include network abstraction layer units.

The method of claim 6, wherein the reference includes at least one of a uniform resource locator and a uniform resource name of the network abstraction layer unit in the additional media data.

The method of claim 4, further comprising:
embedding the reference in the extracted metadata of each layer; and
adding an index of the reference to the extracted media data.

The method of claim 8, wherein the reference is placed at the beginning of the component file for each layer.

The method of claim 8, wherein the reference is entered in a media reference box and the index is entered in an extractor.

A file encapsulation device that creates component files from a media entity containing multiple layers, comprising:
an extractor that extracts metadata of each layer from the media entity and extracts, from the media entity, media data corresponding to the extracted metadata of each layer of the media entity; and
a correlator that associates the extracted media data with the extracted metadata to enable creation, for each layer, of a component file including the extracted metadata and the extracted media data.

The file encapsulation device of claim 11, wherein the component file is at least one of a movie box, a movie fragment, a segment, and a file.

The file encapsulation device of claim 11, wherein:
the extractor further extracts, for each layer, from the media entity additional media data related to the extracted media data of that layer; and
the correlator further associates the extracted media data and the additional media data of each layer to create a corresponding component file.

The file encapsulation device of claim 11, further comprising a reference identifier that identifies, from the media entity, a reference to additional media data related to the extracted media data of each layer, wherein the correlator associates the reference with the extracted metadata and the extracted media data of each layer to create a corresponding component file.

The file encapsulation device of claim 14, wherein the media data and the additional media data include data samples.

The file encapsulation device of claim 15, wherein the data samples include network abstraction layer units.

The file encapsulation device of claim 16, wherein the reference includes at least one of a uniform resource locator and a uniform resource name of the network abstraction layer unit in the additional media data.

The file encapsulation device of claim 14, wherein the correlator further embeds the reference in the extracted metadata of each layer and adds an index of the reference to the extracted media data.

The file encapsulation device of claim 18, wherein the correlator places the reference at the beginning of the component file for each layer.

The file encapsulation device of claim 19, wherein the reference is entered in a media reference box and the index is entered in an extractor.

A method of reading a component file, comprising:
parsing the component file to obtain metadata, media data, and a reference; and
retrieving, according to the reference, related media data from another component file when the media data of the component file is related to the media data of the other component file.

The method of claim 21, wherein the media data of the component file is related to media data of other component files according to encoding dependencies.

The method of claim 21, wherein the media data and the related media data comprise data samples.

The method of claim 23, wherein the data samples comprise network abstraction layer units.

The method of claim 21, further comprising parsing the metadata to obtain the reference.

The method of claim 25, further comprising:
parsing the media data to obtain a reference index embedded therein; and
obtaining a corresponding reference according to the reference index.

The method of claim 25, wherein the retrieving step comprises retrieving the other component file in parallel according to the reference.

The method of claim 27, wherein the retrieving step further comprises:
checking a local file storage; and
retrieving the other component file from the local file storage if the local file storage contains the other component file.

A file reader, comprising:
a parser that parses a component file to obtain metadata, media data, and a reference;
a retriever that retrieves, according to the reference, media data related to the media data from another component file; and
a processor that processes the metadata, the media data, and the media data retrieved from the other component file.

The file reader of claim 29, wherein the media data of the component file is related to media data of other component files according to encoding dependencies.

The file reader of claim 29, wherein the media data and the related media data include data samples.

The file reader of claim 31, wherein the data samples include network abstraction layer units.

The file reader of claim 29, wherein the processor includes a video decoder.

The file reader of claim 29, wherein the parser further comprises means for obtaining the reference.

The file reader of claim 34, wherein the parser further parses the media data to obtain a reference index embedded therein and obtains a corresponding reference according to the reference index.

The file reader of claim 34, wherein the retriever further retrieves the other component files in parallel according to the obtained reference.

The file reader of claim 36, wherein the retriever further checks a local file storage and, if the local file storage contains the other component file, retrieves the other component file from the local file storage.