JP4114868B2 - Multiplexer and multiplexing method

Info

Publication number: JP4114868B2
Application number: JP2003168432A
Authority: JP
Inventors: Tadamasa Toma (遠間 正真); Yoshinori Matsui (松井 義徳); Youji Notoya (能登屋 陽司)
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2002-06-26
Filing date: 2003-06-12
Publication date: 2008-07-09
Anticipated expiration: 2023-06-12
Also published as: JP2004350250A

Description

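[0001]
[Technical Field of the Invention]
The present invention relates to a multiplexer that multiplexes media data such as video data and audio data, and to a demultiplexer that reads and demultiplexes a bit stream in which media data such as video data and audio data are multiplexed.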
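[0002]
[Prior Art]
In recent years, with the growing capacity of communication networks and advances in transmission technology, video distribution services that deliver moving picture files containing multimedia content such as video, audio, text, and still images to personal computers over the Internet have spread remarkably. In addition, 3GPP (Third Generation Partnership Project), an international standardization body that aims to standardize so-called third-generation mobile communication systems, has defined TS 26.234 (Transparent end-to-end packet switched streaming service) as a standard for wireless video distribution, and video distribution services are expected to expand to mobile communication terminals such as mobile phones and PDAs.
[0003]
In a video distribution service, to distribute a moving picture file, a multiplexer must first take in media data such as video, still images, audio, and text, and multiplex the header information needed to play back the media data with the actual media data to create moving picture file data. The MP4 file format is attracting attention as the multiplexed file format for this moving picture file data.
[0004]
The MP4 file format is a multiplexed file format being standardized by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) JTC1/SC29/WG11. Because it has also been adopted in the above 3GPP TS 26.234, it is expected to become widely used.
[0005]
The data structure of an MP4 file is now described. This data structure is disclosed in Non-Patent Document 1.
An MP4 file stores header information and the actual media data in units of objects called boxes, and is constructed by arranging a plurality of boxes hierarchically.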
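[0006]
FIG. 18 illustrates the structure of a box constituting a conventional MP4 file.
A box 901 consists of a box header part 902, which stores the header information of the box 901, and a box data storage part 903, which stores the data contained in the box 901 (for example, lower-level boxes and fields describing information).
[0007]
The box header part 902 has the fields box size 904, box type 905, version 906, and flags 907.
The box size 904 is a field describing the size of the entire box 901, including the bytes allocated to this field itself.
[0008]
The box type 905 is a field describing an identifier that identifies the type of the box 901. This identifier is usually a string of four alphabetic characters. In the rest of this specification, a box may be referred to by this identifier.
[0009]
The version 906 is a field describing the version number of the box 901, and the flags 907 is a field describing flag information set for each box 901. The version 906 and the flags 907 are not mandatory for every box 901, so some boxes 901 do not have these fields.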
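The header layout described above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: it assumes a 32-bit big-endian box size followed by a four-character type and, for boxes that carry them, an 8-bit version and 24-bit flags packed into the next four bytes. The function names are hypothetical.

```python
import struct

def read_box_header(buf, offset=0):
    """Return (box_size, box_type) for the box that starts at offset.

    Assumes a 32-bit big-endian size that counts the header itself,
    followed by a four-character type identifier.
    """
    size, raw_type = struct.unpack_from(">I4s", buf, offset)
    return size, raw_type.decode("ascii")

def read_version_and_flags(buf, offset):
    """Return (version, flags) from the 4 bytes at offset.

    Only meaningful for boxes that actually have these optional fields.
    """
    (word,) = struct.unpack_from(">I", buf, offset)
    return word >> 24, word & 0xFFFFFF

# Illustrative 16-byte box: size, type "mvhd", version 1, flags 3, 4 payload bytes.
example = struct.pack(">I4sI", 16, b"mvhd", 0x01000003) + b"\x00" * 4
print(read_box_header(example))            # size and type of the box
print(read_version_and_flags(example, 8))  # version and flags after the 8-byte header
```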
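[0010]
An MP4 file made up of a sequence of boxes 901 of this structure can be broadly divided into a basic part, which is indispensable to the file, and an extension part, which is used as needed. The basic part is described first.
FIG. 19 illustrates the basic part of a conventional MP4 file.
[0011]
The basic part 911 of an MP4 file 910 consists of a file header part 912 and a file data part 913.
The file header part 912 stores header information for the whole file, such as the compression coding scheme of the video data, and consists of a file type box 914 and a movie box 915.
[0012]
The file type box 914 is identified by the identifier "ftyp" and stores information identifying the MP4 file. Standardization bodies and service providers may each define which media data an MP4 file stores and which compression coding schemes are used for the video data, audio data, and so on. The file type box 914 therefore stores information identifying which such definition the MP4 file was created under.
[0013]
The movie box 915 is identified by the identifier "moov" and stores header information for the actual data stored in the file data part 913, such as the presentation duration.
The file data part 913 consists of a movie data box 916 identified by the identifier "mdat". Instead of the file data part 913, an external file other than the MP4 file 910 may be referenced. In that case, the basic part 911 of the MP4 file 910 consists of the file header part 912 alone. This specification describes the case where the actual data is contained in the MP4 file 910, not the case where an external file is referenced.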
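[0014]
The movie data box 916 stores the actual media data in units called samples. A sample is the smallest access unit in an MP4 file and corresponds to a VOP (Video Object Plane) of video data coded with the MPEG (Moving Picture Experts Group)-4 Visual compression coding scheme, or to a frame of audio data.
[0015]
The structure of the movie box 915 is now described by going one level deeper into the basic part of a conventional MP4 file.
FIG. 20 illustrates the structure of the movie box in a conventional MP4 file.
[0016]
As shown in FIG. 20(a), the movie box 915 consists of the box header part 902 and box data storage part 903 described above. The box size 904 field of the box header part 902 describes the size of the movie box 915 (shown as "xxxx" in FIG. 20(a)), and the box type 905 field describes the identifier "moov" of the movie box 915.
[0017]
The box data storage part 903 of the movie box 915 stores, among others, a movie header box 917, which holds header information for the basic part 911 of the MP4 file 910, and track boxes 918, which hold header information for each track, such as a video track or an audio track. A track here means the whole of the sample data of one medium contained in the MP4 file 910; tracks of video, audio, and text are called the video track, audio track, and text track, respectively. When the MP4 file 910 contains more than one set of data of the same medium, there are multiple tracks for that medium. For example, if the MP4 file 910 contains two kinds of video data, there are two video tracks.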
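[0018]
The movie header box 917 likewise consists of the box header part 902 and the box data storage part 903. The box size 904 field describes the size of the movie header box 917 (shown as "xxx" in FIG. 20(a)), and the box type 905 field describes the identifier "mvhd". The box data storage part 903 of the movie header box 917 stores, for example, information on the time needed to play back the content contained in the basic part 911 of the MP4 file 910.
[0019]
In the box header part 902 of the track box 918, the box size 904 field describes the size of the track box 918 (shown as "xx" in FIG. 20(a)), and the box type 905 field describes the identifier "trak". The box data storage part 903 of the track box 918 stores a track header box 919.
[0020]
The track header box 919 has fields describing header information for each track and is identified by the identifier "tkhd". Its box data storage part 903 describes, for example, a field for the track ID that identifies the type of track and information on the time needed to play back the track.
[0021]
In this way, boxes 901 are arranged hierarchically in the movie box 915, and the track box 918 identified by "trak" stores the header information of each track, such as video or audio. Lower-level boxes contained in the track box 918 store header information for each sample of the track.
[0022]
Drawing the structure of the movie box 915 of FIG. 20(a) as a tree gives FIG. 20(b).
That is, the movie header box 917 and the track box 918 are arranged as the boxes below the movie box 915, and the track header box 919 is arranged below the track box 918, which shows that the boxes 901 are arranged hierarchically.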
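The tree of FIG. 20(b) can be reproduced with a short recursive walk over nested boxes. The sketch below is illustrative only: `make_box` and `walk` are hypothetical helpers, the set of container types is an assumption limited to the boxes named in this description, and the payloads are dummy bytes rather than real header fields.

```python
import struct

# Box types treated as containers of other boxes (assumption for this sketch).
CONTAINERS = {"moov", "trak", "moof", "traf"}

def make_box(box_type, payload=b""):
    """Build a box: 32-bit size (header included), 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def walk(buf, start=0, end=None, depth=0, out=None):
    """Collect (depth, type, size) for every box between start and end."""
    out = [] if out is None else out
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8:  # malformed size; stop rather than loop forever
            break
        name = raw_type.decode("ascii")
        out.append((depth, name, size))
        if name in CONTAINERS:
            walk(buf, pos + 8, pos + size, depth + 1, out)
        pos += size
    return out

# moov containing mvhd and a trak that contains tkhd, as in FIG. 20(b).
moov = make_box("moov",
                make_box("mvhd", b"\x00" * 4) +
                make_box("trak", make_box("tkhd", b"\x00" * 4)))
for depth, name, size in walk(moov):
    print("  " * depth + name, size)
```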
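[0023]
When the MP4 file format was first standardized, the MP4 file 910 consisted of the basic part 911 alone. However, as the amount of media data grows, the file becomes large, which causes various problems such as making the format hard to apply to streaming playback. The format was therefore improved by adding an extension part made up of a sequence of header box and data box pairs.
[0024]
FIG. 21 shows the structure of a conventional MP4 file that includes an extension part. As shown in FIG. 21, the improved MP4 file 920 consists of the basic part 911 and an extension part 921. Because all media data can be stored in the extension part 921 of such an MP4 file 920, the movie data box 916 of the basic part 911 may be omitted.
[0025]
The extension part 921 consists of a sequence of packets 922 delimited in predetermined units.
Each packet 922 is a pair of a movie fragment box 923 and a movie data box 916, and is also called a movie fragment.
[0026]
The movie data box 916 stores the samples of each track in the predetermined unit, and the movie fragment box 923 stores the header information corresponding to that movie data box 916 and is identified by the identifier "moof". The structure of the movie fragment box 923 is described in more detail below.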
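[0027]
FIG. 22 illustrates the structure of a conventional movie fragment box.
As shown in FIG. 22, the box data storage part 903 of the movie fragment box 923 stores a movie fragment header box 924 and a plurality of track fragment boxes 925.
[0028]
The movie fragment header box 924 is identified by "mfhd" and stores header information for the movie fragment box 923 as a whole.
The track fragment box 925 is identified by "traf" and stores header information for each track.
[0029]
Normally one track fragment box 925 is provided for the header information of one track, but a plurality of track fragment boxes 925 may be provided for it. When the header information of one track is divided among several track fragment boxes 925, they are arranged so that the decoding times of their first samples are in ascending order.
[0030]
The box data storage part 903 of the track fragment box 925 stores a track fragment header box 926 and one or more track fragment run boxes 927.
The track fragment header box 926 is identified by "tfhd" and stores, for example, a field for the track ID that identifies the type of track and information on default values such as the sample duration.
[0031]
The track fragment run box 927 is identified by "trun" and stores header information for each sample. It is described in detail with reference to FIG. 23.
FIG. 23 illustrates the structure of a conventional track fragment run box 927.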
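[0032]
The flags 907 is a field describing flag information set for each box 901. Here it describes flag information indicating whether each of the fields from the data offset 929 through the sample composition time offset 936, which follow the flags 907, is present in the track fragment run box 927.
[0033]
The sample count 928 is a field describing how many samples have header information stored in the track fragment run box 927.
The data offset 929 is a field describing pointer information indicating where, in the paired movie data box 916, the actual data of the first sample whose header information is stored in the track fragment run box 927 is located.
[0034]
The first sample flags 930 is a field that can override the value of the sample flags 935 field, described later, when the first sample of the track fragment run box 927 is a randomly accessible sample. Random access here means, for example, an operation in an MP4 file player that moves the playback position 10 seconds ahead during playback or starts playback from the middle of the data. A randomly accessible sample is a video sample that makes up a frame the player can decode on its own without referring to the data of other frames, that is, an intra-coded frame (a so-called intra frame). Every audio sample can be decoded on its own, so all audio samples are randomly accessible.
[0035]
The table 931 is a collection of entries 932, each giving the header information of one sample, as many as indicated by the sample count 928.
An entry 932 is a set of fields giving the header information of one sample; which fields it contains is indicated by the flags 907. The fields of an entry 932 are the sample duration 933, describing the playback duration of the sample; the sample size 934, describing the size of the sample; the sample flags 935, describing flag information indicating whether the sample is randomly accessible; and the sample composition time offset 936, describing the difference between the decoding time and the presentation time of the sample so that samples using bidirectional prediction can be handled.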
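How the flags select the optional fields of a track fragment run box can be sketched as below. The bit values and field order are not given in this description; they are taken from the ISO base media file format as commonly documented and should be treated as assumptions here, as should the function name `parse_trun_payload` and the sample values.

```python
import struct

# Flag bits assumed from the ISO base media file format "trun" definition.
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400
SAMPLE_CTO_PRESENT = 0x000800

PER_SAMPLE_FIELDS = [
    (SAMPLE_DURATION_PRESENT, "duration"),
    (SAMPLE_SIZE_PRESENT, "size"),
    (SAMPLE_FLAGS_PRESENT, "flags"),
    (SAMPLE_CTO_PRESENT, "composition_time_offset"),
]

def parse_trun_payload(payload):
    """Parse the part of a trun box that follows the size/type header."""
    word, sample_count = struct.unpack_from(">II", payload, 0)
    flags = word & 0xFFFFFF  # low 24 bits; the top 8 bits hold the version
    pos = 8
    run = {"sample_count": sample_count, "entries": []}
    if flags & DATA_OFFSET_PRESENT:
        (run["data_offset"],) = struct.unpack_from(">i", payload, pos)
        pos += 4
    if flags & FIRST_SAMPLE_FLAGS_PRESENT:
        (run["first_sample_flags"],) = struct.unpack_from(">I", payload, pos)
        pos += 4
    for _ in range(sample_count):
        entry = {}
        for bit, name in PER_SAMPLE_FIELDS:
            if flags & bit:  # field is stored only when its flag bit is set
                (entry[name],) = struct.unpack_from(">I", payload, pos)
                pos += 4
        run["entries"].append(entry)
    return run

# Two samples carrying only duration and size, plus a data offset.
flags = DATA_OFFSET_PRESENT | SAMPLE_DURATION_PRESENT | SAMPLE_SIZE_PRESENT
payload = struct.pack(">IIi", flags, 2, 16) + struct.pack(">IIII", 30, 300, 30, 250)
print(parse_trun_payload(payload))
```

Fields whose flag bit is clear are simply absent from each entry; a reader would then fall back to the default values held elsewhere.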
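[0036]
When these fields are not included in an entry 932, default values are used: the default values of these fields are described in the track fragment header box 926 or in the movie extends box (identifier "mvex") in the movie box 915.
[0037]
The track fragment run box 927 describes header information in order from the sample with the earliest decoding time. When a device that plays back an MP4 file searches for the header information of a sample, it therefore refers to the track ID in the track fragment header box 926 of each track fragment box 925, starting from the first one in the file, to find the track fragment box 925 that contains the header information of the wanted track, and then, within that track fragment box 925, searches the sample header information starting from the first track fragment run box 927.
[0038]
Even in an MP4 file 920 that includes the extension part 921, information needed for a whole track, such as initialization information for decoding, is stored in the movie box 915.
Next, configuration examples of an MP4 file that includes an extension part 921 of this structure are described.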
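[0039]
FIG. 24 shows configuration examples of the extension part of a conventional MP4 file that includes an extension part.
FIG. 24 shows two ways of storing content, and the playback duration of the content is assumed to be 60 seconds.
[0040]
The MP4 file 940 of FIG. 24(a) stores media data in both the basic part 941 and the extension part 942. mdat_1 (945) in the basic part 941 stores the media data for 0 to 30 seconds; in the extension part 942, mdat_2 (947) stores the media data for 30 to 45 seconds and mdat_3 (949) stores the media data for 45 to 60 seconds. The header information of mdat_1 (945) is stored in moov 944, that of mdat_2 (947) in moof_1 (946), and that of mdat_3 (949) in moof_2 (948).
[0041]
In contrast, the MP4 file 950 of FIG. 24(b) stores media data only in the extension part 952. The basic part 951 consists of ftyp 953 and moov 954 and contains no mdat; in the extension part 952, mdat_1 (956) stores the media data for 0 to 30 seconds and mdat_2 (958) stores the media data for 30 to 60 seconds. The header information of mdat_1 (956) is stored in moof_1 (955), and that of mdat_2 (958) in moof_2 (957).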
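[0042]
How the extension part of such an MP4 file is created is described with reference to FIGS. 25 to 27.
FIG. 25 is a block diagram showing the configuration of a conventional multiplexer.
The multiplexer 960 multiplexes media data to create the extension part data of an MP4 file. Here it is assumed to multiplex video data and audio data.
[0043]
A first input unit 961 takes video data into the multiplexer 960 and stores it in a first data storage unit 962, and a second input unit 964 takes audio data into the multiplexer 960 and stores it in a second data storage unit 965.
A first analysis unit 963 reads the video data from the first data storage unit 962 one sample at a time, analyzes it, and outputs the header information of each video sample to a packet unit determination unit 967. A second analysis unit 966 likewise reads the audio data from the second data storage unit 965 one sample at a time, analyzes it, and outputs the header information of each audio sample to the packet unit determination unit 967. The video and audio sample header information includes the size and playback duration of the sample, and the video sample header information also indicates whether the video sample is an intra frame.
[0044]
The packet unit determination unit 967 determines the packet units of the video data and the audio data so that each packet contains a fixed number of samples, and creates the header information of each packet from the acquired sample header information.
FIG. 26 shows the processing flow of the conventional packet unit determination unit. The number of samples stored in one packet is N; this value is predetermined and held in, for example, a memory of the multiplexer 960.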
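[0045]
First, when the first analysis unit 963 acquires one video sample (S901) and outputs its video sample header information to the packet unit determination unit 967, the packet unit determination unit 967 adds the video sample header information to a packet creation table (S902).
Next, the packet unit determination unit 967 updates the number of video samples in the packet (S903) and determines whether that number has reached N (S904).
[0046]
If the number of video samples in the packet is less than N (No in S904), steps S901 to S903 are repeated. When it reaches N (Yes in S904), the packet unit determination unit 967 packetizes the N video samples and ends the processing (S905).
[0047]
The packet unit determination unit 967 packetizes the audio samples in the same way by steps S901 to S905, and repeats this flow until all samples have been packetized.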
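The fixed-count packetization of steps S901 to S905 can be sketched as follows, together with the playback start time each resulting packet would have. The function name `packetize_by_count` and the sample durations (500 ms for video, 100 ms for audio, N = 5) are illustrative assumptions, chosen only to show how the start times of the video and audio packets drift apart when only the sample count is considered.

```python
def packetize_by_count(durations_ms, n):
    """Group samples into packets of n samples each (S901 to S905).

    Returns one dict per packet with the playback start time of its first
    sample and the number of samples it holds.
    """
    packets = []
    start_ms = 0
    for i in range(0, len(durations_ms), n):
        chunk = durations_ms[i:i + n]
        packets.append({"start_ms": start_ms, "count": len(chunk)})
        start_ms += sum(chunk)
    return packets

video_packets = packetize_by_count([500] * 10, 5)   # 5 s of video
audio_packets = packetize_by_count([100] * 50, 5)   # 5 s of audio
print([p["start_ms"] for p in video_packets])
print([p["start_ms"] for p in audio_packets][:4])
```

With these numbers the second video packet begins at 2500 ms while the second audio packet begins at 500 ms, so samples that belong together in time end up in packets that are far apart.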
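[0048]
FIG. 27 shows an example of a conventional packet creation table that stores the header information of video samples. For each video sample, the packet creation table 968a describes the sample size, the sample playback duration, and an intra-coded frame flag indicating whether the video sample is an intra frame. In this example, the first video sample stored in the packet has a size of 300 bytes and a playback duration of 30 ms and is not an intra-coded frame, and the second video sample is an intra-coded frame. The packet unit determination unit 967 adds this information to the packet creation table 968a sample by sample, and when the table has been completed up to the N-th sample, the last sample in the packet, it is output to a packet creation table storage unit 968.
[0049]
Referring again to FIG. 25, after describing the header information of N samples in the packet creation table 968a, the packet unit determination unit 967 outputs the packet creation table 968a to the packet creation table storage unit 968 and outputs a packet creation signal to a packet header creation unit 969.
[0050]
On receiving the packet creation signal, the packet header creation unit 969 reads the sample header information of the packet from the packet creation table 968a held in the packet creation table storage unit 968 and creates moof data. It outputs the moof data to a packet combining unit 971, and outputs to a packet data creation unit 970 mdat information that includes the sample sizes and pointer information indicating where in the first data storage unit 962 and the second data storage unit 965 the actual data of the samples in the packet is stored.
[0051]
The packet data creation unit 970 reads the actual sample data from the first data storage unit 962 and the second data storage unit 965 according to the mdat information, creates mdat data, and outputs it to the packet combining unit 971.
[0052]
The packet combining unit 971 combines the moof data and the mdat data and outputs MP4 extension part data for one packet.
Finally, the output MP4 extension part data is taken into a device that creates MP4 files, where the successively created packets are arranged in order to form the extension part of the MP4 file. The file creation device then combines the basic part and the extension part to create the MP4 file.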
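[0053]
[Non-Patent Document 1]
ISO/IEC JTC1/SC29/WG11 MPEG, N4854, "Proposed Revised Common Text Multimedia File Format Specification", March 21, 2002
[0054]
[Problems to Be Solved by the Invention]
However, playing back the extension part of an MP4 file multiplexed by such a conventional multiplexer has the following problems.
First, the conventional multiplexer multiplexes without regard to the playback start times of the samples in a packet, so an audio sample synchronized with a video sample of a given playback start time may be stored in a different packet from that video sample. This degrades the efficiency of data access during playback on the MP4 file player.
[0055]
Second, because the conventional multiplexer multiplexes on the basis of the number of samples per packet, the position within a packet of a randomly accessible sample, that is, a video sample corresponding to an intra frame, often varies from packet to packet. To find a randomly accessible sample, the MP4 file player must therefore search all video samples in the packet, and the amount of computation needed for the search becomes very large.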
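[0056]
These problems are explained in more detail with reference to FIG. 28.
FIG. 28 illustrates the problems of the conventional multiplexer.
FIG. 28(a) illustrates the first problem, the reduced efficiency of data access during playback.
[0057]
The header information of the samples in each mdat is stored in the immediately preceding moof. The header information of the video sample with a playback start time of 20 s, stored in mdat_1, is stored in moof_1 as the first sample, whereas the header information of the audio sample with a playback start time of 20 s, stored in mdat_10, is stored in moof_10 as the last sample.
[0058]
To play back the part of the content at 20 s, the MP4 file player must therefore, after obtaining the video sample header information from moof_1, search as far as moof_10 before it obtains the audio sample header information, which makes data access inefficient.
[0059]
FIG. 28(b) illustrates the second problem, the large amount of computation needed to search for a randomly accessible sample.
The header information of the i-th randomly accessible video sample, stored at the end of mdat_1, is stored in moof_1 as the last sample, and the header information of the (i+1)-th randomly accessible video sample, stored at the end of mdat_3, is stored in moof_3 as the last sample.
[0060]
To perform random access, the MP4 file player must therefore search up to the last sample of the moof, and the computation needed for the search becomes very large.
In addition to these first and second problems, the extension part created by the conventional multiplexer requires many seeks to obtain sample data, so it is unsuited to random access playback on devices with slow seek speeds, such as optical disc players.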
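[0061]
This problem is explained again with reference to FIG. 28(b). To randomly access the i-th randomly accessible video sample of moof_1, the player first moves the read pointer to the start of moof_1 and analyzes moof_1 in order, to obtain the header information of that sample. This requires a first seek.
[0062]
The player then obtains the location in mdat_1 of the actual data of the i-th randomly accessible video sample and moves the read pointer to the start of that data. Because the actual data is stored at the end of mdat_1, the player cannot reach it by moving the read pointer continuously from the start of moof_1, and a second seek is needed.
[0063]
That is, a seek is performed both when moving the read pointer to the start of moof_1 and when moving it to the start of the actual data, so random access playback takes time on a device with a slow seek speed. In particular, when the actual data of, for example, the audio sample synchronized with this video sample is stored away from the actual data of the video sample, such as in a different packet, further seeks are needed and fast random access playback becomes difficult.
[0064]
The present invention has been made in view of these problems, and its object is to provide a multiplexer that can multiplex media data so that the multiplexed file offers efficient data access during playback and requires little computation to search for samples.
[0065]
A further object is to provide a multiplexer that can multiplex media data so that the multiplexed file is suited to random access playback on devices with slow seek speeds.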
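[0066]
[Means for Solving the Problems]
To achieve the above objects, a multiplexer according to the present invention packet-multiplexes media data that includes image data and at least one of audio data and text data to create multiplexed data, and comprises: media data acquisition means for acquiring the media data; analysis means for analyzing the media data acquired by the media data acquisition means and acquiring playback start time information indicating the playback start time of each sample, a sample being the smallest access unit of the image data, audio data, and text data included in the media data; packet unit determination means for determining, based on the playback start time information acquired by the analysis means, the unit in which the media data is packetized, with the playback start times of the samples of the image data, audio data, and text data included in the media data aligned; packet header part creation means for creating a packet header part that stores the header of the media data in the packetization unit determined by the packet unit determination means; packet data part creation means for creating a packet data part that stores the actual data of the media data in the packetization unit determined by the packet unit determination means; and packetization means for creating a packet by combining the packet header part created by the packet header part creation means with the packet data part created by the packet data part creation means, wherein the packet unit determination means determines the unit with the playback start times of the samples of the image data, audio data, and text data aligned for all packets needed to store the media data.
[0067]
As a result, the image data and the audio data and text data included in the media data are stored in packets with their playback start times aligned, which improves the efficiency of data access during playback on the player.
[0068]
In the multiplexer according to the present invention, it is preferable that the image data is moving picture data; that the analysis means further analyzes the moving picture data acquired by the media data acquisition means and, when the moving picture data includes one or more samples containing intra frame information indicating an intra-coded sample, acquires the intra frame information; and that the packet unit determination means, when the analysis means has acquired the intra frame information, determines the unit in which the media data is packetized based on the intra frame information and the playback start time information, and places the sample of the moving picture data that contains the intra frame information at the head of the packetization unit.
[0069]
As a result, the first video sample in a packet is an intra frame video sample, which greatly reduces the computation the player needs to search for samples at random access.
It is more preferable that the packet data part creation means creates the packet data part by interleaving the samples of the media data included in the packetization unit so that their playback start times are in ascending order.
[0070]
As a result, video samples and audio samples are stored in the mdat in ascending order of playback start time, which reduces the number of seeks at random access on the player and enables fast random access playback even on a player with a slow seek speed.
[0071]
The present invention can be realized not only as such a multiplexer but also as a multiplexing method whose steps are the characteristic means of the multiplexer, or as a program that causes a computer to execute those steps. Such a program can of course be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.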
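[0072]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings. In these embodiments, MPEG-4 Visual coded data is used as the video data and MPEG-4 Audio coded data is used as the audio data. The embodiments mainly describe a device that multiplexes video data and audio data, but this is not intended to exclude the multiplexing of other media data such as text data.
[0073]
(Embodiment 1)
First, a multiplexer according to Embodiment 1 of the present invention is described with reference to FIGS. 1 to 5.
FIG. 1 is a block diagram showing the functional configuration of the multiplexer according to Embodiment 1.
The multiplexer 100 multiplexes video data and audio data to create the extension part data of an MP4 file, and comprises a first input unit 101, a first data storage unit 102, a first analysis unit 103, a second input unit 104, a second data storage unit 105, a second analysis unit 106, a packet unit determination unit 107, a packet creation table storage unit 111, a packet header creation unit 112, a packet data creation unit 113, and a packet combining unit 114.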
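[0074]
The first input unit 101 is an interface that takes coded video data into the multiplexer 100 from, for example, an image coding device, and stores the acquired video input data in the first data storage unit 102 in sequence.
The first data storage unit 102 is, for example, a cache memory or RAM (Random Access Memory) that temporarily holds the video input data.
[0075]
The first analysis unit 103 is a processing unit, realized by a CPU and memory, that reads video sample data (the data of one video sample) from the video input data held in the first data storage unit 102, analyzes it, and outputs the header information of the video sample. The video sample header information output by the first analysis unit 103 includes the size and playback duration of the video sample and information indicating whether it is an intra frame. For a sample that uses bidirectional prediction, it also includes the difference between the decoding time and the presentation time.
[0076]
The second input unit 104 is an interface that takes coded audio data into the multiplexer 100 from, for example, an audio coding device, and stores the acquired audio input data in the second data storage unit 105 in sequence.
The second data storage unit 105 is, for example, a cache memory or RAM that temporarily holds the audio input data.
[0077]
The second analysis unit 106 is a processing unit, realized by a CPU and memory, that reads audio sample data (the data of one audio sample) from the audio input data held in the second data storage unit 105, analyzes it, and outputs the header information of the audio sample. The audio sample header information output by the second analysis unit 106 includes the size and playback duration of the audio sample.
[0078]
The packet unit determination unit 107 is a processing unit, realized by a CPU and memory, that accumulates the header information of the video samples and audio samples in a packet and determines the packet units of the video data and audio data so that the playback start time of the video samples and that of the audio samples in the packet are aligned. It outputs the set of sample header information for the determined packet unit to the packet creation table storage unit 111 as a packet creation table and, after determining the packet unit, outputs to the packet header creation unit 112 a packet creation signal instructing creation of the packet header. The packet unit determination unit 107 comprises a time adjustment unit 108 that adjusts the packet unit in units of time, a video packet unit determination unit 109 that determines the packet unit of the video data, and an audio packet unit determination unit 110 that determines the packet unit of the audio data.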
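[0079]
The time adjustment unit 108 is a processing unit that adjusts the end time of a packet so that the packet fits within a set unit of time. It first outputs a predetermined time (the target time) to the video packet unit determination unit 109. The target time may be specified by the user. In that case, the multiplexer 100 receives the target time through an input device such as a keyboard, and the input device outputs a target time input signal indicating the specified target time to the time adjustment unit 108.
[0080]
The video packet unit determination unit 109 is a processing unit that acquires video sample header information from the first analysis unit 103 and determines the packet unit of the video data.
It acquires the target time from the time adjustment unit 108 and the video sample header information from the first analysis unit 103 and, counting the playback duration of each video sample given in its header information so that the video data fits in a packet within the target time, adds the header information to a video packet creation table in sequence up to that of the last video sample in the packet. After adding the header information of the last video sample in the packet to the video packet creation table, it outputs to the audio packet unit determination unit 110 video sample playback time information indicating the playback start time of the first video sample in the packet and the sum of the playback durations of the video samples in the packet.
[0081]
The audio packet unit determination unit 110 is a processing unit that acquires audio sample header information from the second analysis unit 106 and determines the packet unit of the audio data.
It acquires the video sample playback time information from the video packet unit determination unit 109 and the audio sample header information from the second analysis unit 106, and places at the head of the packet the audio sample whose playback start time is the same as or close to that of the first video sample in the packet. Counting the playback duration of each audio sample given in its header information, it then places the last audio sample of the packet so that the sum of the playback durations of the audio samples in the packet is the same as or close to the sum of the playback durations of the video samples in the packet.
[0082]
Here, an audio sample whose playback start time is close to that of a video sample means either the audio sample with the earliest playback start time at or after the playback start time of the video sample, or the audio sample with the latest playback start time at or before it.
[0083]
The audio packet unit determination unit 110 then adds the audio sample header information, from the first audio sample to the last audio sample in the packet, to an audio packet creation table in sequence.
The packet creation table storage unit 111 is, for example, a cache memory or RAM that temporarily holds the video packet creation table and the audio packet creation table output from the packet unit determination unit 107.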
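[0084]
The packet header creation unit 112 is a processing unit, realized by a CPU and memory, that creates the packet header part (moof) in which the header information of the packet is stored.
On receiving the packet creation signal from the packet unit determination unit 107, it refers to the packet creation table in the packet creation table storage unit 111, reads the sample header information of the packet, creates moof data, and outputs it to the packet combining unit 114.
[0085]
The packet header creation unit 112 also outputs mdat information to the packet data creation unit 113. The mdat information includes pointer information indicating where in the first data storage unit 102 and the second data storage unit 105 the actual data of the video samples and audio samples in the packet is stored, sample size information indicating the size of each sample, and a signal instructing creation of the packet data part (mdat).
[0086]
When creating the moof, for media data coded with a scheme in which the coding rate can switch partway through the data, such as AMR (Adaptive Multi-Rate), the packet header creation unit 112 can also store the header information in different trafs according to the coding rate.
[0087]
The packet data creation unit 113 is a processing unit, realized by a CPU and memory, that creates the packet data part (mdat) in which the actual data of the packet is stored.
On receiving the mdat information from the packet header creation unit 112, it reads the actual video data of the video samples in the packet from the first data storage unit 102 and the actual audio data of the audio samples in the packet from the second data storage unit 105, based on the pointer information and sample size information in the mdat information, creates mdat data, and outputs it to the packet combining unit 114.
[0088]
The packet combining unit 114 is a processing unit, realized by a CPU and memory, that combines moof data and mdat data to create MP4 extension part data for one packet. It acquires the moof data from the packet header creation unit 112 and the mdat data from the packet data creation unit 113, combines them to create MP4 extension part data for one packet, and outputs the successively created data to a device that creates MP4 files.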
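[0089]
The procedure by which the multiplexer 100 configured in this way creates the extension part of an MP4 file is described with reference to FIG. 2.
FIG. 2 is a flowchart showing the processing of the multiplexer 100.
First, the first input unit 101 and the second input unit 104 take video data and audio data, respectively, into the multiplexer 100 (S100); the first input unit 101 stores the video input data in the first data storage unit 102, and the second input unit 104 stores the audio input data in the second data storage unit 105.
[0090]
Next, the first analysis unit 103 reads and analyzes video sample data from the first data storage unit 102 and outputs video sample header information to the video packet unit determination unit 109 of the packet unit determination unit 107. The video packet unit determination unit 109 determines the packet unit of the video data based on the video sample header information acquired from the first analysis unit 103 and the target time acquired from the time adjustment unit 108 (S110). This processing is described in detail later.
[0091]
The video packet unit determination unit 109 then outputs the playback time information of the video samples in the packet whose unit has been determined to the audio packet unit determination unit 110 (S120).
The audio packet unit determination unit 110 determines the packet unit of the audio data based on the video sample playback time information acquired from the video packet unit determination unit 109 (S130). In doing so, it determines the packet unit so that the playback start time of the first audio sample in the packet is the same as or close to that of the first video sample in the packet.
[0092]
When the audio packet unit determination unit 110 has determined the packet unit of the audio data, the packet unit determination unit 107 outputs the packet creation table to the packet creation table storage unit 111 and outputs a packet creation signal to the packet header creation unit 112.
[0093]
The packet header creation unit 112 then creates moof data in the determined unit and outputs it to the packet combining unit 114, the packet data creation unit 113 creates mdat data in the determined unit and outputs it to the packet combining unit 114, and the packet combining unit 114 combines the moof data and the mdat data to create one packet in the determined unit (S140) and outputs it as MP4 extension part data for one packet.
[0094]
After creating one packet, the multiplexer 100 determines whether there is still data to be input from the first input unit 101 and the second input unit 104 (S150). If there is input data (No in S150), the multiplexer 100 clears, from the data held in the buffer memories (the first data storage unit 102, the second data storage unit 105, and the packet creation table storage unit 111), the data that has already been packetized (S160), and repeats steps S110 to S150.
[0095]
If there is no input data (Yes in S150), the multiplexer 100 ends the creation of the extension part of the MP4 file.
In this way, the multiplexer 100 first determines the packet unit of the video data and then that of the audio data, and multiplexes the media data to create the extension part of the MP4 file.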
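[0096]
The processing by which the video packet unit determination unit 109 determines the packet unit of the video data in step S110 of FIG. 2 is now described in detail.
FIG. 3 is a flowchart showing the processing of the video packet unit determination unit 109. Before this flow starts, the video packet unit determination unit 109 acquires the target time from the time adjustment unit 108.
[0097]
On acquiring video sample header information from the first analysis unit 103 (S111), the video packet unit determination unit 109 adds it to the video packet creation table (S112).
It then determines whether the sum of the playback durations of the video samples given in the video sample header information, that is, the total playback time of the video data in the packet, has reached or exceeded the target time acquired earlier (S113).
[0098]
If the total playback time of the video data in the packet has not reached the target time (No in S113), the video packet unit determination unit 109 acquires the next video sample header information (S111) and repeats S112 and S113.
[0099]
If it has reached the target time (Yes in S113), the video packet unit determination unit 109 sets the video sample indicated by the header information most recently added to the video packet creation table as the last video sample in the packet (S114) and ends the packet unit determination.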
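The loop of FIG. 3 (S111 to S114) can be sketched as below. The function name `decide_video_packet`, the dictionary key `duration_ms`, and the example values are hypothetical; the sketch only models the decision of where a video packet ends, not the creation of the packet itself.

```python
def decide_video_packet(headers, target_ms):
    """Collect video sample headers until the target time is reached.

    headers: iterable of dicts with a "duration_ms" key (assumed layout).
    Returns (table, total_ms): the video packet creation table and the sum
    of the playback durations of the samples in it.
    """
    table = []
    total_ms = 0
    for header in headers:            # S111: acquire the next header
        table.append(header)          # S112: add it to the table
        total_ms += header["duration_ms"]
        if total_ms >= target_ms:     # S113: target time reached or exceeded?
            break                     # S114: this sample is the last in the packet
    return table, total_ms

# Six 300 ms samples and a 1000 ms target: the packet closes after the fourth.
table, total_ms = decide_video_packet([{"duration_ms": 300}] * 6, 1000)
print(len(table), total_ms)
```

The returned total, together with the start time of the first sample, corresponds to the video sample playback time information that is handed to the audio side.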
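[0100]
Next, the processing by which the audio packet unit determination unit 110 determines the packet unit of the audio data in step S130 of FIG. 2 is described in detail.
FIG. 4 is a flowchart showing the processing of the audio packet unit determination unit 110.
[0101]
Before this flow starts, the audio packet unit determination unit 110 acquires the video sample playback time information from the video packet unit determination unit 109.
On acquiring audio sample header information from the second analysis unit 106 (S131), the audio packet unit determination unit 110 refers to the video sample playback time information (S132), reads the playback start time of the first video sample in the packet, and sets the audio sample whose playback start time is the same as or close to it as the first audio sample of the packet (S133).
[0102]
Having determined the first audio sample of the packet, the audio packet unit determination unit 110 acquires audio sample header information in sequence (S134) and adds it to the audio packet creation table (S135).
[0103]
It then refers to the video sample playback time information, reads the sum of the playback durations of the video samples in the packet (S136), determines the last audio sample of the packet so that the sum of the playback durations of the audio samples in the packet is the same as or close to that of the video samples (S137), and ends the packet unit determination.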
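The audio-side decision of FIG. 4 can be sketched in the same style. Two choices here are assumptions of this sketch rather than statements of the specification: the first audio sample is taken to be the earliest one starting at or after the first video sample (one of the two readings of "close" allowed in the description), and audio samples are added until their total duration reaches the video total. The function name and dictionary keys are hypothetical.

```python
def decide_audio_packet(audio_headers, video_start_ms, video_total_ms):
    """Pick the audio samples that belong to the same packet as the video.

    audio_headers: list of dicts with "start_ms" and "duration_ms",
    sorted by start time (assumed layout).
    """
    # S133: first audio sample = earliest one at or after the video start.
    first = next((i for i, h in enumerate(audio_headers)
                  if h["start_ms"] >= video_start_ms), None)
    if first is None:
        return []
    table = []
    total_ms = 0
    for header in audio_headers[first:]:   # S134: acquire headers in sequence
        if total_ms >= video_total_ms:     # S136/S137: audio total matches video total
            break
        table.append(header)               # S135: add to the audio table
        total_ms += header["duration_ms"]
    return table

# Audio samples of 100 ms starting at 3950 ms; video packet starts at 4000 ms, lasts 500 ms.
audio = [{"start_ms": 3950 + 100 * i, "duration_ms": 100} for i in range(12)]
packet = decide_audio_packet(audio, video_start_ms=4000, video_total_ms=500)
print([h["start_ms"] for h in packet])
```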
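[0104]
The extension part of an MP4 file created through this processing by the multiplexer 100 offers efficient data access on the player. The reason is explained using FIG. 5, which shows examples of the data structure of the MP4 file extension part created by the multiplexer 100.
[0105]
The MP4 file extension part 200 of FIG. 5(a) consists of a plurality of packets and is joined to the basic part of the MP4 file.
Each packet of the MP4 file extension part 200 consists of a moof as the packet header part and an mdat as the packet data part. Packet_1 denotes the first packet of the MP4 file extension part 200; the moof in packet_1 is denoted moof_1 and the mdat in packet_1 is denoted mdat_1. "V" in each mdat of FIG. 5(a) denotes a video sample and "A" denotes an audio sample (the same applies to the other figures).
[0106]
In mdat_1 of the MP4 file extension part 200, a video sample with a playback start time of 20 seconds is stored as the first video sample, and an audio sample with the same playback start time of 20 seconds is stored as the first audio sample. Likewise, in mdat_2, a video sample with a playback start time of 30 seconds is stored as the first video sample, and an audio sample with a playback start time of 30 seconds is stored as the first audio sample.
[0107]
Storing video samples and audio samples in one packet with their playback start times aligned in this way greatly reduces the computation the player needs for data access when playing back the MP4 file extension part 200.
[0108]
Moreover, because the media data is stored in packets with the playback start times aligned, the data can be split at any number of packets to adjust the MP4 file data to a desired size.
The MP4 file extension part created by the multiplexer 100 may instead have the data structure shown in FIG. 5(b).
[0109]
FIG. 5(b) shows a second example of the data structure of the MP4 file extension part created by the multiplexer 100.
In the MP4 file extension part 210 of FIG. 5(b), mdat_1 stores a video sample with a playback start time of 20 seconds as the first video sample, and mdat_2 stores an audio sample with a playback start time of 20 seconds as the first audio sample. Likewise, mdat_3 stores a video sample with a playback start time of 30 seconds as the first video sample, and mdat_4 stores an audio sample with a playback start time of 30 seconds as the first audio sample.
[0110]
Storing either video or audio data alone in each packet, and alternating packets of video data with packets of audio data whose playback start times are aligned with them, likewise greatly reduces the computation the player needs for data access when playing back the MP4 file extension part 210.
[0111]
As described above, the multiplexer 100 according to Embodiment 1 packetizes the media data with the playback start times of the media aligned, which makes data access on the player more efficient.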
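[0112]
(Embodiment 2)
Next, a multiplexer according to Embodiment 2 of the present invention is described with reference to FIGS. 6 to 9.
The multiplexer according to Embodiment 2 shares its main components with the multiplexer 100 according to Embodiment 1 but differs in the configuration of its packet unit determination unit. The description below focuses on this difference. Components identical to those of Embodiment 1 are given the same reference numerals and are not described again.
[0113]
FIG. 6 is a block diagram showing the functional configuration of the packet unit determination unit of the multiplexer according to Embodiment 2.
The packet unit determination unit 117 is a processing unit that accumulates the header information of the video samples and audio samples in a packet and determines the packet units of the video data and audio data so that their playback start times are aligned and the first video sample in the packet is an intra frame. It comprises the time adjustment unit 108, a video packet unit determination unit 119, and the audio packet unit determination unit 110.
[0114]
The video packet unit determination unit 119 is a processing unit that acquires video sample header information from the first analysis unit 103 and determines the packet unit of the video data on the basis of either time or intra frames. It comprises a time-based unit adjustment unit 120 and an I-frame-based unit adjustment unit 121.
[0115]
The time-based unit adjustment unit 120 is a processing unit that adjusts the packet unit of the video data based on the target time output from the time adjustment unit 108. It counts the playback durations given in the video sample header information and adjusts the packet unit so that the packet matches the set unit of time.
[0116]
The I-frame-based unit adjustment unit 121 is a processing unit that adjusts the packet unit of the video data according to whether the video sample header information output from the first analysis unit 103 indicates an intra frame. On acquiring video sample header information that indicates an intra frame, it switches packets at that intra frame video sample so that the first video sample of the next packet is the intra frame video sample.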
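[0117]
The processing by which the video packet unit determination unit 119 determines the packet unit of the video data in the multiplexer according to Embodiment 2, which has the packet unit determination unit 117 configured in this way, is now described in detail.
FIG. 7 is a flowchart showing the processing of the video packet unit determination unit 119.
[0118]
Before this flow starts, the video packet unit determination unit 119 acquires the target time from the time adjustment unit 108 and holds it in the time-based unit adjustment unit 120.
As in Embodiment 1, on acquiring video sample header information from the first analysis unit 103 (S201), the video packet unit determination unit 119 adds it to the video packet creation table (S202).
[0119]
The video packet unit determination unit 119 then determines, in the I-frame-based unit adjustment unit 121, whether the acquired video sample header information indicates an intra frame (S203).
If it does (Yes in S203), the video packet unit determination unit 119 determines, in the time-based unit adjustment unit 120, whether the total playback time of all video samples in the packet exceeds the target time acquired earlier (S205).
[0120]
If the header information does not indicate an intra frame (No in S203) or the target time is not exceeded (No in S205), the video packet unit determination unit 119 updates, in the time-based unit adjustment unit 120, the sum of the playback durations of the video samples in the packet by adding the playback duration given in the video sample header information (S204), acquires the next video sample header information (S201), and repeats the above processing.
[0121]
If the target time is exceeded (Yes in S205), the video packet unit determination unit 119 sets as the last video sample in the packet the video sample immediately preceding the one that the I-frame-based unit adjustment unit 121 determined to be an intra frame (S206), and ends the packet unit determination for the video data.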
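The first procedure of FIG. 7 can be sketched as a function that splits a whole sequence of video samples into packets. The name `split_on_intra`, the dictionary keys, and the sample values are hypothetical; the sketch assumes that the running total at S205 covers the samples already placed in the current packet and that the intra frame that triggers the split opens the next packet.

```python
def split_on_intra(samples, target_ms):
    """Close a packet just before an intra frame once the target time is exceeded.

    samples: list of dicts with "intra" (bool) and "duration_ms" (assumed layout).
    Returns a list of packets, each a list of samples.
    """
    packets = []
    current = []
    total_ms = 0
    for sample in samples:                           # S201/S202
        if sample["intra"] and total_ms > target_ms: # S203 and S205 both Yes
            packets.append(current)                  # S206: end before the intra frame
            current, total_ms = [], 0
        current.append(sample)
        total_ms += sample["duration_ms"]            # S204: update the running total
    if current:
        packets.append(current)
    return packets

# Seven 500 ms samples with intra frames at positions 0, 2, 5 and 6; target 1000 ms.
intra_positions = {0, 2, 5, 6}
samples = [{"intra": i in intra_positions, "duration_ms": 500} for i in range(7)]
packets = split_on_intra(samples, 1000)
print([len(p) for p in packets], [p[0]["intra"] for p in packets])
```

In this example the intra frame at position 2 does not trigger a split because the packet has not yet exceeded the target time, which is why a packet can grow long when intra frames are far apart.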
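[0122]
In the extension part of an MP4 file created through this processing of the video packet unit determination unit 119, the video sample stored at the head of each packet is always an intra frame video sample. At random access, the player can therefore start playback from the first video sample of a packet, which greatly reduces the computation needed to search for a randomly accessible video sample.
[0123]
Furthermore, because the video sample stored at the head of each packet is always an intra frame video sample, the packet header part (moof) need only describe information indicating random accessibility in the first sample flags field of the trun at the head of the traf that stores the header information of the video track; the sample flags field of each trun can be omitted by using default values. This lightens the load of creating moof data and also reduces the overall size of the MP4 file.
[0124]
With this processing, however, the playback duration per packet can become long when the interval between intra frames in the video data is large. The packet unit determination unit 117 may therefore operate as described below.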
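[0125]
FIG. 8 is a flowchart showing a second processing procedure of the video packet unit determination unit 119.
As in the first procedure, before this flow starts, the video packet unit determination unit 119 acquires the target time from the time adjustment unit 108 and holds it in the time-based unit adjustment unit 120.
[0126]
On acquiring video sample header information from the first analysis unit 103 (S211), the video packet unit determination unit 119 adds it to the video packet creation table (S212).
It then determines, in the time-based unit adjustment unit 120, whether the total playback time of all video samples in the packet exceeds the target time acquired earlier (S213).
[0127]
If the target time is exceeded (Yes in S213), the video packet unit determination unit 119 sets as the last video sample in the packet the video sample indicated by the header information immediately preceding the header information just acquired (S214), and ends the packet unit determination for the video data.
[0128]
If the target time is not exceeded (No in S213), the video packet unit determination unit 119 determines, in the I-frame-based unit adjustment unit 121, whether the acquired video sample header information indicates an intra frame (S215).
[0129]
If it does (Yes in S215), the video packet unit determination unit 119 sets as the last video sample in the packet the video sample immediately preceding the one that the I-frame-based unit adjustment unit 121 determined to be an intra frame (S214), and ends the packet unit determination for the video data.
[0130]
If it does not (No in S215), the video packet unit determination unit 119 updates, in the time-based unit adjustment unit 120, the sum of the playback durations of the video samples in the packet by adding the playback duration given in the video sample header information (S216), acquires the next video sample header information (S211), and repeats the above processing.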
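The second procedure of FIG. 8 can be sketched in the same way. The flowchart does not state whether the total checked at S213 already includes the sample just acquired; this sketch assumes it does (the packet is closed when adding the new sample would exceed the target), and it never closes an empty packet. The name `split_with_time_limit` and the data layout are hypothetical.

```python
def split_with_time_limit(samples, target_ms):
    """Close a packet before any sample that would exceed the target time,
    or before any intra frame, whichever comes first.

    samples: list of dicts with "intra" (bool) and "duration_ms" (assumed layout).
    """
    packets = []
    current = []
    total_ms = 0
    for sample in samples:                                       # S211/S212
        over_limit = total_ms + sample["duration_ms"] > target_ms  # S213
        if current and (over_limit or sample["intra"]):          # S213 Yes or S215 Yes
            packets.append(current)                              # S214: end before this sample
            current, total_ms = [], 0
        current.append(sample)
        total_ms += sample["duration_ms"]                        # S216
    if current:
        packets.append(current)
    return packets

# Ten 500 ms samples with intra frames at positions 0 and 7; target 2000 ms.
samples = [{"intra": i in {0, 7}, "duration_ms": 500} for i in range(10)]
packets = split_with_time_limit(samples, 2000)
print([len(p) for p in packets], [p[0]["intra"] for p in packets])
```

Here the second packet is cut by the time limit and so does not begin with an intra frame, which matches the description that the player then only has to check the first video sample of each packet.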
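[0131]
The extension part of an MP4 file created through this second procedure keeps the packet size at or below a desired size by creating packets under a set time limit, while still placing an intra frame video sample at the head of a packet whenever one exists. At random access, the player therefore only needs to check whether the first video sample of each packet is randomly accessible, which reduces the computation needed to search for a randomly accessible video sample.
[0132]
As in Embodiment 1, when the video packet unit determination unit 119 finishes determining the packet unit of the video data, it outputs the video sample playback time information to the audio packet unit determination unit 110, which then determines the packet unit of the audio data.
[0133]
The extension part of an MP4 file created through this processing by the packet unit determination unit 117 lightens the search load at random access on the player. The reason is explained using FIG. 9, which shows examples of the data structure of the MP4 file extension part created by the multiplexer according to Embodiment 2.
[0134]
In mdat_1 of the MP4 file extension part 220 of FIG. 9(a), an intra frame video sample is stored as the first video sample, and in mdat_2 an intra frame video sample is likewise stored as the first video sample.
[0135]
Because an intra frame video sample is stored as the first video sample of each packet, at random access the player only needs to look at the first video sample of a packet to obtain a randomly accessible video sample and no longer needs to search all video samples in the packet, which greatly lightens the sample search load at random access.
[0136]
In moof_1 and moof_2 of the MP4 file extension part 220, the sizes of moof_1 and moof_2 can also be reduced by describing information indicating random accessibility only in the first sample flags field of the trun at the head of the traf that stores the header information of the video track.
[0137]
The MP4 file extension part created by the multiplexer according to Embodiment 2 may instead have the data structure shown in FIG. 9(b).
In mdat_1 of the MP4 file extension part 230 of FIG. 9(b), an intra frame video sample is stored as the first video sample, and in mdat_3 an intra frame video sample is likewise stored as the first video sample. mdat_2 and mdat_4 store audio samples.
[0138]
Storing either video or audio data alone in each packet, with an intra frame video sample as the first video sample of each packet of video data, likewise greatly lightens the sample search load at random access on the player.
[0139]
In either of these data structure examples, aligning the playback start time of the first video sample in a packet with that of the first audio sample greatly reduces the computation the player needs for data access.
[0140]
As described above, the multiplexer according to Embodiment 2 creates packets with a randomly accessible video sample as the first video sample, which reduces the computation needed for sample search at random access on the player.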
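[0141]
(Embodiment 3)
A multiplexer according to Embodiment 3 of the present invention is described with reference to FIGS. 10 to 14.
The multiplexer according to Embodiment 3 shares its main components with the multiplexers according to Embodiments 1 and 2 but differs in the configuration of its packet data creation unit. The description below focuses on this difference. Components identical to those of Embodiments 1 and 2 are given the same reference numerals and are not described again.
[0142]
FIG. 10 is a block diagram showing the functional configuration of the packet data creation unit of the multiplexer according to Embodiment 3.
The packet data creation unit 130 is a processing unit that creates the packet data part (mdat) by interleaving the actual data of video samples with that of audio samples. It comprises an mdat information acquisition unit 131, a video actual data reading unit 132, an audio actual data reading unit 133, and an interleave arrangement unit 134.
[0143]
The mdat information acquisition unit 131 is a processing unit that acquires mdat information from the packet header creation unit 112 and outputs read instructions for actual data and playback time information to the other units of the packet data creation unit 130.
On acquiring mdat information from the packet header creation unit 112, it analyzes the mdat information to obtain playback time information indicating the playback start and end times of the video samples and audio samples, and based on this information sorts all video samples and audio samples in the packet into ascending order of playback start time.
[0144]
Following the sorted order, starting with the sample with the earliest playback start time, the mdat information acquisition unit 131 outputs either a video read instruction to the video actual data reading unit 132, instructing it to read the actual data of a video sample, or an audio read instruction to the audio actual data reading unit 133, instructing it to read the actual data of an audio sample. The video read instruction includes pointer information indicating where in the first data storage unit 102 the actual data of the video sample is stored and the size of the video sample; the audio read instruction includes pointer information indicating where in the second data storage unit 105 the actual data of the audio sample is stored and the size of the audio sample.
[0145]
The video actual data reading unit 132 is a processing unit that acquires a video read instruction from the mdat information acquisition unit 131 and reads actual video data from the first data storage unit 102. It refers to the pointer information and size information in the video read instruction, reads the actual video data from the first data storage unit 102, and outputs it to the interleave arrangement unit 134.
[0146]
The audio actual data reading unit 133 is a processing unit that acquires an audio read instruction from the mdat information acquisition unit 131 and reads actual audio data from the second data storage unit 105. It refers to the pointer information and size information in the audio read instruction, reads the actual audio data from the second data storage unit 105, and outputs it to the interleave arrangement unit 134.
[0147]
The interleave arrangement unit 134 is a processing unit that receives the video data and audio data read by the video actual data reading unit 132 and the audio actual data reading unit 133 in the order they are output, arranges them in interleaved form to create mdat data, and outputs the mdat data to the packet combining unit 114.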
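[0148]
The processing by which the packet data creation unit 130 creates the mdat in the multiplexer according to Embodiment 3 is now described in detail.
FIG. 11 is a flowchart showing the processing of the packet data creation unit 130.
[0149]
First, the mdat information acquisition unit 131 of the packet data creation unit 130 acquires mdat information from the packet header creation unit 112 (S301). It analyzes the mdat information and extracts the pointer information, size information, and playback time information of the samples. Based on the extracted playback time information, it sorts all video samples and audio samples in the packet into ascending order of playback start time. Then, following the sorted order and starting with the sample with the earliest playback start time, it outputs either a video read instruction containing the pointer information and size information of a video sample to the video actual data reading unit 132, or an audio read instruction containing the pointer information and size information of an audio sample to the audio actual data reading unit 133.
[0150]
On receiving a video read instruction, the video actual data reading unit 132 refers to the pointer information and size information, reads the actual video data from the first data storage unit 102, and outputs it to the interleave arrangement unit 134; on receiving an audio read instruction, the audio actual data reading unit 133 refers to the pointer information and size information, reads the actual audio data from the second data storage unit 105, and outputs it to the interleave arrangement unit 134 (S302).
[0151]
On receiving the read actual data from the video actual data reading unit 132 and the audio actual data reading unit 133, the interleave arrangement unit 134 arranges it in the order received (S303).
The interleave arrangement unit 134 continues arranging until all the video and audio actual data, that is, all the actual data to be stored in one packet, has been arranged (No in S304, S303).
[0152]
When all the actual data to be stored in one packet has been arranged (Yes in S304), the interleave arrangement unit 134 outputs the arranged data to the packet combining unit 114 as mdat data (S305) and ends the mdat creation.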
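The sort-then-concatenate behaviour of S301 to S305 can be sketched as follows. The function name `build_mdat`, the sample dictionaries, and the rule that a video sample precedes an audio sample with the same playback start time are assumptions of this sketch; the durations (500 ms video, 100 ms audio, starting at 4000 ms) are illustrative values.

```python
def build_mdat(video_samples, audio_samples):
    """Interleave video and audio sample data in ascending playback start time.

    Each sample is a dict with "label", "start_ms" and "data" (assumed layout).
    Returns (order, payload): the sample labels in storage order and the
    concatenated sample data.
    """
    # Tag each sample so that ties on start time put video before audio.
    tagged = [(s["start_ms"], 0, i, s) for i, s in enumerate(video_samples)]
    tagged += [(s["start_ms"], 1, i, s) for i, s in enumerate(audio_samples)]
    tagged.sort(key=lambda t: t[:3])                # S301: sort by start time
    order = [t[3]["label"] for t in tagged]
    payload = b"".join(t[3]["data"] for t in tagged)  # S302 to S305: read and arrange
    return order, payload

video = [{"label": f"V{i + 1}", "start_ms": 4000 + 500 * i, "data": b"v"} for i in range(2)]
audio = [{"label": f"A{i + 1}", "start_ms": 4000 + 100 * i, "data": b"a"} for i in range(10)]
order, payload = build_mdat(video, audio)
print(order)
```

With these values each video sample is followed by the five audio samples that play during it, so a reader that has located the start of the packet can continue sequentially without a further seek.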
【０１５３】
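The extension unit of an MP4 file created through this processing by the packet data creation unit 130 is suited to random access playback on devices with slow seeks, such as optical disc devices. The reason is explained with reference to FIG. 12, which outlines the data structure of the MP4 file extension unit created by the multiplexing device according to Embodiment 3.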
【０１５４】
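The MP4 file extension unit 240 shown in FIG. 12 is made up of a sequence of packets: packet 1 stores content data from 4 to 8 seconds, packet 2 stores content data from 8 to 12 seconds, packet 3 stores content data from 12 to 16 seconds, and so on.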
【０１５５】
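Each packet consists of a moof 241 and an mdat 242. The moof 241 stores tfhd (V) and traf (V-1, V-2) for the video track and tfhd (A) and traf (A-1, A-2) for the audio track. The entity data of the samples indicated by the header information stored in traf (V-1) and traf (A-1) is stored in mdat_1, and the entity data of the samples indicated by the header information stored in traf (V-2) and traf (A-2) is stored in mdat_2. In the mdat 242, the entity data of video samples and the entity data of audio samples are stored alternately, interleaved.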
【０１５６】
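In this case, when the playback device performs random access processing to start playback from the 4-second position, it only needs to move the read pointer to the head of moof_1. It can then analyze moof_1 and, by advancing the read pointer continuously, obtain the entity data needed for playback from mdat_1, which immediately follows moof_1.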
【０１５７】
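In other words, with this MP4 file extension unit 240, the playback device can achieve random access playback with a single seek operation that moves the read pointer to the head of moof_1, which makes the structure effective for devices with slow seeks, such as optical disc devices.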
【０１５８】
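In the mdat 242, the audio sample whose entity data is stored immediately after the entity data of a video sample has its playback start time aligned with that of the preceding video sample, so synchronized playback of video data and audio data is ensured. FIG. 13 shows how entity data is stored in mdat_1 of the MP4 file extension unit 240.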
【０１５９】
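As shown in FIG. 13, video sample 1, stored at the head of mdat_1, has a playback start time of 4000 ms, and audio sample 1, stored immediately after video sample 1, also has a playback start time of 4000 ms. The playback start times of video sample 1 and audio sample 1 are thus identical.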
【０１６０】
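Because video samples and audio samples usually have different sample rates, the playback duration of a video sample is taken here to be 500 ms and that of an audio sample 100 ms.
Accordingly, in mdat_1 of the MP4 file extension unit 240, audio samples 1 to 5 are interleaved and stored immediately after video sample 1, followed by video sample 2, audio samples 6 to 10, video sample 3, and so on.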
【０１６１】
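Here, video sample 2 has a playback start time of 4500 ms, and audio sample 6, stored immediately after video sample 2, also has a playback start time of 4500 ms. Each video sample and the audio sample immediately following it are always aligned to the same playback start time.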
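The arithmetic of this example can be checked with a short sketch. The function name `interleave_order` and the `V`/`A` labels are hypothetical, introduced only for illustration; the durations (500 ms video, 100 ms audio) and the 4000 ms start come from the text, and the one-second window is chosen just to keep the output short.

```python
def interleave_order(start_ms, end_ms, video_len_ms, audio_len_ms):
    """List sample labels in ascending playback start time, video first on ties."""
    video = [(t, 0, f"V{i}")
             for i, t in enumerate(range(start_ms, end_ms, video_len_ms), 1)]
    audio = [(t, 1, f"A{i}")
             for i, t in enumerate(range(start_ms, end_ms, audio_len_ms), 1)]
    # Tuples sort by start time, then by kind (0 = video before 1 = audio).
    return [label for _, _, label in sorted(video + audio)]


print(interleave_order(4000, 5000, 500, 100))
# ['V1', 'A1', 'A2', 'A3', 'A4', 'A5', 'V2', 'A6', 'A7', 'A8', 'A9', 'A10']
```

Each video sample spans five audio samples (500 ms / 100 ms), which is why audio samples 1 to 5 follow video sample 1 and audio sample 6 lines up with video sample 2 at 4500 ms.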
【０１６２】
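Because the sample rates of video and audio samples differ, the playback start time of a video sample may not coincide with that of the audio sample immediately following it. Even then, synchronized playback of video data and audio data can be ensured by choosing, as the audio sample immediately following the video sample, an audio sample whose playback start time approximates that of the video sample.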
【０１６３】
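FIG. 14 shows a second data structure illustrating how entity data is stored in mdat_1 of an MP4 file extension unit.
As shown in FIG. 14, video sample 1, stored at the head of mdat_1 of the MP4 file extension unit 250, has a playback start time of 4000 ms, and audio sample 1, stored immediately after video sample 1, has a playback start time of 4050 ms. That is, the audio sample stored immediately after video sample 1 is audio sample 1, which has the earliest playback start time at or after the playback start time of video sample 1.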
【０１６４】
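As in the case described above, the playback duration of a video sample is taken to be 500 ms and that of an audio sample 100 ms.
Accordingly, in mdat_1 of the MP4 file extension unit 250, audio samples 1 to 5 are interleaved and stored immediately after video sample 1, followed by video sample 2, audio samples 6 to 10, video sample 3, and so on.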
【０１６５】
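Here, video sample 2 has a playback start time of 4500 ms, and audio sample 6, stored immediately after video sample 2, has a playback start time of 4550 ms. Each video sample and the audio sample immediately following it are always aligned to approximately the same playback start time.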
【０１６６】
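Alternatively, the audio sample stored immediately after a video sample may be the audio sample that has the latest playback start time at or before the playback start time of the video sample. In that case, audio sample 1, stored immediately after video sample 1, has a playback start time of 3950 ms.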
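The two selection rules (earliest audio sample at or after the video start, or latest audio sample at or before it) can be sketched with a binary search. The function name `audio_after_video` and the `mode` strings are hypothetical; the sketch assumes the audio start times are already sorted in ascending order.

```python
import bisect


def audio_after_video(video_start_ms, audio_starts_ms, mode="at_or_after"):
    """Return the index of the audio sample to place right after a video sample.

    audio_starts_ms must be sorted ascending. Returns None if no sample fits.
    """
    if mode == "at_or_after":
        # Earliest audio sample starting at or after the video sample.
        i = bisect.bisect_left(audio_starts_ms, video_start_ms)
        return i if i < len(audio_starts_ms) else None
    if mode == "at_or_before":
        # Latest audio sample starting at or before the video sample.
        i = bisect.bisect_right(audio_starts_ms, video_start_ms) - 1
        return i if i >= 0 else None
    raise ValueError(f"unknown mode: {mode}")
```

With audio start times of 3950, 4050, and 4150 ms and a video sample at 4000 ms, the first rule picks the 4050 ms sample and the second picks the 3950 ms sample, matching the two cases in the text. When an audio sample starts exactly at the video start time, both rules pick that sample.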
【０１６７】
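As described above, the multiplexing device according to Embodiment 3 places, immediately after each video sample, an audio sample whose playback start time is the same as or approximates that of the video sample, and stores the video samples and audio samples in the mdat interleaved in ascending order of playback start time. It can therefore create an MP4 file extension unit whose data structure allows quick random access even on playback devices with slow seek speeds.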
【０１６８】
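(Embodiment 4)
Next, a demultiplexing device according to Embodiment 4 of the present invention will be described with reference to FIGS. 15 and 16.
FIG. 15 is a block diagram showing the functional configuration of the demultiplexing device according to Embodiment 4.
The demultiplexing device 300 acquires and analyzes MP4 file data that includes an MP4 file extension unit created by the multiplexing device according to Embodiment 1, 2, or 3, demultiplexes the media data, and outputs playback data. It comprises a file input unit 301, a file data storage unit 302, a header separation analysis unit 303, a moov analysis unit 304, a moof analysis unit 305, a traf analysis unit 306, a trun analysis unit 307, an RA search unit 308, and a sample acquisition unit 309.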
【０１６９】
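The file input unit 301 is an interface that acquires MP4 file data and sequentially stores the acquired MP4 file input data in the file data storage unit 302.
The file data storage unit 302 is a cache memory, RAM, or the like that temporarily holds the MP4 input data.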
【０１７０】
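The header separation analysis unit 303 is a processing unit that reads and analyzes the header data of the MP4 file from the MP4 input data held in the file data storage unit 302, separates it into the moov data of the basic unit header and the moof data of the extension unit header, and outputs these to the moov analysis unit 304 and the moof analysis unit 305, respectively. It is implemented by a CPU and memory.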
【０１７１】
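The moov analysis unit 304 is a processing unit that analyzes the moov of the MP4 file to obtain the media information needed to analyze the media data, such as the coding rate of the media data and the playback duration of the content. It is implemented by a CPU and memory. The moov analysis unit 304 outputs the obtained media information to the moof analysis unit 305.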
【０１７２】
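The moof analysis unit 305 is a processing unit that analyzes the moof of the MP4 file based on the media information obtained from the moov analysis unit 304 and outputs traf data, which is the header data for each track, to the traf analysis unit 306. It is implemented by a CPU and memory.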
【０１７３】
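The traf analysis unit 306 is a processing unit that analyzes the traf of the MP4 file and outputs trun data, which is the per-sample header data contained in the traf, to the trun analysis unit 307. It is implemented by a CPU and memory.
The trun analysis unit 307 is a processing unit that analyzes the trun of the MP4 file, obtains the information described in each field of the trun, and outputs trun analysis information to the sample acquisition unit 309. It is implemented by a CPU and memory. The trun analysis information includes, for example, the size of the sample, data offset information indicating where in the file data storage unit 302 the sample is stored, and, for a video sample, flag information indicating whether or not the sample is an intra frame.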
【０１７４】
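In addition, when the trun analysis unit 307 receives a playback start instruction from the RA search unit 308, described next, it analyzes the truns in order starting from the trun indicated by the playback start instruction and outputs trun analysis information to the sample acquisition unit 309. The playback start instruction is an output signal that indicates the playback start position after random access and instructs the start of playback.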
【０１７５】
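The RA search unit 308 is a processing unit that acquires target playback time information indicating the playback start time after random access, reads head sample information, and searches for the video sample that is to be the playback start position after random access. The head sample information consists of the playback start time of the head sample contained in the first trun in the first traf storing header information for the video track, together with information indicating whether that sample is an intra frame. The RA search unit 308 is implemented by a CPU and memory. On acquiring target playback time information from the input device of the demultiplexing device 300, which accepts random access instructions from the user, the RA search unit 308 sequentially acquires only the head sample information from the trun analysis unit 307, searches for a video sample whose playback start time is the same as or approximates the target playback time information, and outputs a playback start instruction to the trun analysis unit 307.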
【０１７６】
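The sample acquisition unit 309 is a processing unit that reads and decodes the entity data of samples based on the trun analysis information and outputs playback data to a display device such as a display. On acquiring trun analysis information from the trun analysis unit 307, the sample acquisition unit 309 refers to the data offset information it contains and reads the entity data of the sample from the file data storage unit 302. The start of acquisition of trun analysis information is treated as the instruction to start playback.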
【０１７７】
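The random access processing operation of the demultiplexing device 300 configured in this way will be described with reference to FIG. 16.
FIG. 16 is a flowchart showing the random access processing operation of the demultiplexing device 300. It is assumed that, before this flow begins, the demultiplexing device 300 has received a random access instruction from the user via the input device.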
【０１７８】
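First, when the file input unit 301 of the demultiplexing device 300 acquires the data of an MP4 file created by the multiplexing device according to Embodiment 1, 2, or 3 (S400), it sequentially stores the data in the file data storage unit 302.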
【０１７９】
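Next, the header separation analysis unit 303 separates and analyzes only the file header portion of the MP4 file (S410) and further separates it into the basic unit header and the extension unit header. The moov analysis unit 304 analyzes the basic unit header, and the moof analysis unit 305 analyzes the extension unit header (S420).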
【０１８０】
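Subsequently, the moof analysis unit 305 further separates the extension unit header into headers for each track, and the traf analysis unit 306 analyzes the track fragment, that is, the traf (S430). At this point, the traf analysis unit 306 further separates the track fragment, and the trun analysis unit 307 analyzes the trun.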
【０１８１】
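When target playback time information is input to the RA search unit 308, the trun analysis unit 307 outputs head sample information to the RA search unit 308, and the RA search unit 308 determines whether the head sample information indicates a playback start time that is the same as or approximates the target playback time information (S440).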
【０１８２】
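If the target sample is not found (No in S450), the RA search unit 308 acquires the head sample information from the extension unit header placed next in storage order within the file and determines whether it indicates a playback start time that is the same as or approximates the previously acquired target playback time information (S440).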
【０１８３】
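If the target sample is found (Yes in S450), the RA search unit 308 generates a playback start instruction and outputs it to the trun analysis unit 307. On receiving the playback start instruction from the RA search unit 308, the trun analysis unit 307 outputs trun analysis information to the sample acquisition unit 309 in order, starting from the trun for which the playback start instruction was received. Here, the trun for which the playback start instruction was received is the trun containing the sample at which the RA search unit 308 instructed playback to start.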
【０１８４】
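The sample acquisition unit 309 then refers to the data offset information contained in the trun analysis information, acquires the entity data of the target sample from the file data storage unit 302 (S460), decodes it, outputs playback data, and ends the random access processing operation.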
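The search loop of S440 and S450 can be sketched as a scan over the head sample of each packet. The names `HeadSampleInfo` and `find_start_packet` are hypothetical. The text only requires a start time that is "the same as or approximates" the target, so the rule used here, taking the last packet whose head sample starts at or before the target, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HeadSampleInfo:
    """Hypothetical record for the head video sample of one packet (moof)."""
    packet_index: int
    start_ms: int
    random_access: bool  # e.g. the head sample is an intra frame


def find_start_packet(heads: Iterable[HeadSampleInfo],
                      target_ms: int) -> Optional[int]:
    """Scan packets in storage order and pick the one to start playback from."""
    best = None
    for info in heads:                 # storage order, ascending start time
        if info.start_ms > target_ms:  # passed the target: stop scanning
            break
        if info.random_access:
            best = info.packet_index
    return best
```

Only one record per packet is inspected, which is the source of the reduced search load: the number of comparisons grows with the number of packets, not with the number of samples.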
【０１８５】
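As described above, when performing random access playback of an MP4 file that includes an MP4 file extension unit created by the multiplexing device according to Embodiment 1, 2, or 3, the demultiplexing device 300 according to Embodiment 4 can determine the video sample that should be the playback start position after random access by searching only the video samples stored at the head of each packet. This greatly reduces the sample search load during random access.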
【０１８６】
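(Application example)
An application example of the multiplexing device according to the present invention will now be described with reference to FIG. 17.
FIG. 17 shows an application example of the multiplexing device according to the present invention.
The multiplexing device according to the present invention can be applied to a mobile phone with recording function 403 or a personal computer 404 that acquires and multiplexes media data such as video data and audio data to create MP4 file data. The demultiplexing device according to the present invention can be applied to a mobile phone 407 that reads and plays back the created MP4 file data.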
【０１８７】
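The MP4 file data created by the mobile phone with recording function 403 and the personal computer 404 is stored on a recording medium such as an SD memory card 405 or a DVD-RAM 406, or is transmitted via a communication network 402 to an image distribution server 401 and distributed from the image distribution server 401 to other mobile phones 407 and the like.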
【０１８８】
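In this way, the multiplexing device and demultiplexing device according to the present invention are used as MP4 file creation devices or playback devices in image distribution systems and the like.
The multiplexing device and demultiplexing device according to the present invention have been described above based on the embodiments, but the present invention is not limited to these embodiments.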
【０１８９】
For example, the above embodiments use MPEG-4 Visual coded data as the video data, but coded data produced by other video compression coding schemes, such as MPEG-4 AVC (Advanced Video Coding) or H.263, may be used instead. In MPEG-4 AVC and H.263 coded data, one picture corresponds to one sample.
【０１９０】
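Similarly, the embodiments use MPEG-4 Audio coded data as the audio data, but coded data produced by other audio compression coding schemes, such as G.726, may be used instead.
Although the above embodiments are described using video data and audio data, the effects of the present invention can also be obtained when text data or the like is included, by processing it in the same way as the packetization of audio data.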
【０１９１】
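Furthermore, in Embodiment 2, when packetization is performed for each intra frame, the time reference unit adjustment unit 120 may be omitted from the components of the packet unit determination unit 117, and the processing of step S205 in FIG. 7 may be omitted.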
【０１９２】
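Furthermore, in Embodiment 3, when the MP4 file is to be played back according to a buffer model set in advance on the playback device side, the video sample data and audio sample data may be interleaved and stored in the mdat so as to satisfy that buffer model. Here, a buffer model is a model for guaranteeing that, when coded data is input under conditions defined by a standard, a playback device equipped with a buffer of the size defined by that standard can perform decoding without the buffer becoming empty (underflow) or overflowing (overflow).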
【０１９３】
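Embodiments 1, 2, and 3 do not mention the number of trafs stored in the moof of the extension unit of the created MP4 file, but it is preferable to store one traf per track in the moof. In this way, the header information for all samples of a track stored in the moof can be obtained by analyzing, for each track, only the first traf in the moof, which further improves efficiency when acquiring header information.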
【０１９４】
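Also, in Embodiments 1, 2, and 3, the entity data of samples whose header information is stored in a moof of the extension unit of the created MP4 file is stored in one mdat following the moof, but it may instead be divided and stored in multiple mdats following the moof. Specifically, the entity data of samples whose header information is stored in moof_1 may be stored in the order mdat_1, mdat_2, mdat_3, and the entity data of samples whose header information is stored in moof_2 may be stored in the order mdat_4, mdat_5, mdat_6.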
【０１９５】
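In Embodiments 2 and 3, when a packet contains an intra frame of moving image data, that intra frame is placed at the head of the packet. However, as long as random access is possible, a video sample other than an intra frame, such as a P (Predictive) frame or a B (Bidirectionally predictive) frame, may be placed at the head of the packet. This is explained below using MPEG-4 AVC coded data as the video data.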
【０１９６】
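In MPEG-4 AVC, decoding that starts from an intra picture does not always yield a correct decoding result. More specifically, MPEG-4 AVC has two kinds of intra pictures: IDR (Instantaneous Decoder Refresh) pictures and other intra pictures (hereinafter, non-IDR intra pictures). Decoding that starts from an IDR picture always yields a correct decoding result, but decoding that starts from a non-IDR intra picture may fail to yield correct results for the non-IDR intra picture itself and for several pictures that follow it in display order.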
【０１９７】
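For this reason, MPEG-4 AVC allows auxiliary information (Recovery Point Supplemental Enhancement Information, hereinafter "Recovery Point SEI") to be added that indicates from which picture decoding should start in order to obtain a correct decoding result from a non-IDR intra picture.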
【０１９８】
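For example, suppose five pictures Pic_1, Pic_2, Pic_3, Pic_4, and Pic_5 are contained in the video data in this order, Pic_5 is a non-IDR intra picture, and decoding must start from Pic_1 in order to correctly decode Pic_5 and the pictures that follow it in display order. Placing a Recovery Point SEI immediately before Pic_1 then indicates that decoding must start from Pic_1 in order to correctly decode Pic_5, which is the picture four pictures later in storage order within the video data, and the pictures that follow it in display order.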
【０１９９】
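In this case Pic_1 can be regarded as a randomly accessible sample. For MPEG-4 AVC coded data, therefore, the sample of an IDR picture or of a picture to which a Recovery Point SEI is attached may be placed at the head of a packet as a randomly accessible sample. Note that a Recovery Point SEI can also be attached to pictures other than intra pictures.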
【０２００】
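Here, the sample of the picture to which the Recovery Point SEI is attached and the sample of the picture from which correct decoding results become available when decoding starts at the picture with the Recovery Point SEI can be stored in the same packet, which reduces the amount of processing required to acquire sample data.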
【０２０１】
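Furthermore, IDR pictures and samples of pictures to which a Recovery Point SEI is attached can be distinguished by a specific flag value in the head sample flag 930 or the sample flag 935 (hereinafter, the non-sync sample flag). In MP4, the non-sync sample flag can be set to 0 only for those randomly accessible samples for which the sample at which random access is made coincides with the sample from which a correct decoding result is obtained. Accordingly, the two can be distinguished by setting the non-sync sample flag to 0 for IDR picture samples and to 1 for samples of pictures to which a Recovery Point SEI is attached.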
【０２０２】
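This identification method can distinguish not only IDR pictures from pictures with a Recovery Point SEI attached, but also randomly accessible samples with differing properties in general. In practice, it can be used as follows.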
【０２０３】
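The first case is fast-forward playback performed by playing back only specific samples. Here it is desirable that a decoded sample can be displayed immediately, so only samples whose non-sync sample flag is 0 are decoded and played back.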
【０２０４】
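The second case is starting playback from the middle of the content, or skipping a particular section and starting playback of the next section. Here, the sample at which decoding starts can differ from the sample from which a correct decoding result is obtained only at the start of playback. Playback may therefore start from either a sample whose non-sync sample flag is 0 or a randomly accessible sample whose non-sync sample flag is 1.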
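The two usage patterns can be sketched as two filters over the same sample list. Everything named here is hypothetical: `VideoSample`, its `random_access` field (how a player learns that a flag-1 sample is randomly accessible is not modeled), and the two function names. Only the flag semantics (0 for IDR, 1 for a Recovery Point SEI picture) come from the text.

```python
from dataclasses import dataclass


@dataclass
class VideoSample:
    name: str
    random_access: bool  # hypothetical: sample may be used as an entry point
    non_sync: int        # 0: correct output from this sample itself; 1: otherwise


def fast_forward_samples(samples):
    """Fast-forward: decode only samples that can be displayed immediately."""
    return [s.name for s in samples if s.non_sync == 0]


def seek_entry_samples(samples):
    """Seek or skip: any randomly accessible sample may start decoding."""
    return [s.name for s in samples if s.non_sync == 0 or s.random_access]


samples = [
    VideoSample("IDR", True, 0),
    VideoSample("P", False, 1),
    VideoSample("RecoveryPoint", True, 1),
]
print(fast_forward_samples(samples))  # ['IDR']
print(seek_entry_samples(samples))    # ['IDR', 'RecoveryPoint']
```

The seek case accepts more entry points because a short run of incorrectly decoded pictures is tolerable only once, at the start of playback.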
【０２０５】
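This storage method is not limited to the Recovery Point SEI of MPEG-4 AVC. It can be applied whenever the sample at which decoding starts differs from the sample from which a correct decoding result is obtained, for example to structures such as the Open GOP (Group Of Pictures) in MPEG2-Video.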
【０２０６】
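Furthermore, when identification information indicating that a sample is randomly accessible is attached, the sample indicated as randomly accessible by that identification information may be placed at the head of the packet.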
【０２０７】
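[Effects of the Invention]
As is clear from the above description, with the multiplexing device according to the present invention, the image data contained in the media data and the audio data and text data are stored in packets with their playback start times aligned, so data access during playback on the playback device side can be made more efficient.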
【０２０８】
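In addition, making the first video sample in a packet an intra-frame video sample greatly reduces the amount of computation required for sample search during random access on the playback device side.
Furthermore, because the video samples and audio samples contained in a packet are stored in ascending order of playback start time, the number of seek operations during random access on the playback device side can be reduced, achieving multiplexing that enables quick random access playback even on playback devices with slow seek speeds.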
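[Brief description of the drawings]
[FIG. 1] A block diagram showing the functional configuration of the multiplexing device according to Embodiment 1 of the present invention.
[FIG. 2] A flowchart showing the processing operation of the multiplexing device.
[FIG. 3] A flowchart showing the processing operation of the video packet unit determination unit.
[FIG. 4] A flowchart showing the processing operation of the audio packet unit determination unit.
[FIG. 5] (a) shows a first example of the data structure of the MP4 file extension unit created by the multiplexing device, and (b) shows a second example of that data structure.
[FIG. 6] A block diagram showing the functional configuration of the packet unit determination unit of the multiplexing device according to Embodiment 2.
[FIG. 7] A flowchart showing the first processing operation of the video packet unit determination unit.
[FIG. 8] A flowchart showing the second processing operation of the video packet unit determination unit.
[FIG. 9] (a) shows a first example of the data structure of the MP4 file extension unit created by the multiplexing device, and (b) shows a second example of that data structure.
[FIG. 10] A block diagram showing the functional configuration of the packet data creation unit of the multiplexing device according to Embodiment 3.
[FIG. 11] A flowchart showing the processing operation of the packet data creation unit.
[FIG. 12] A diagram outlining the data structure of the MP4 file extension unit created by the multiplexing device.
[FIG. 13] A diagram showing a first example of the data structure of the MP4 file extension unit created by the multiplexing device.
[FIG. 14] A diagram showing a second example of the data structure of the MP4 file extension unit created by the multiplexing device.
[FIG. 15] A block diagram showing the functional configuration of the demultiplexing device according to Embodiment 4.
[FIG. 16] A flowchart showing the processing operation of the demultiplexing device.
[FIG. 17] A diagram showing an application example of the multiplexing device according to the present invention.
[FIG. 18] A diagram for explaining the structure of a box constituting a conventional MP4 file.
[FIG. 19] A diagram for explaining the basic unit of a conventional MP4 file.
[FIG. 20] (a) is a diagram for explaining the structure of the movie box in a conventional MP4 file, and (b) shows the structure of the movie box in a conventional MP4 file as a tree.
[FIG. 21] A diagram showing the structure of a conventional MP4 file including an extension unit.
[FIG. 22] A diagram for explaining the structure of a conventional movie fragment box.
[FIG. 23] A diagram for explaining the structure of a conventional track fragment run box.
[FIG. 24] (a) shows a first configuration example of a conventional MP4 file including an extension unit, and (b) shows a second configuration example of a conventional MP4 file including an extension unit.
[FIG. 25] A block diagram showing the configuration of a conventional multiplexing device.
[FIG. 26] A flowchart showing the processing operation of a conventional packet unit determination unit.
[FIG. 27] A diagram showing an example of a conventional packet creation table storing the header information of video samples.
[FIG. 28] (a) is a diagram for explaining a first problem of the conventional multiplexing device, and (b) is a diagram for explaining a second problem of the conventional multiplexing device.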
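[Explanation of reference signs]
100, 960 Multiplexing device
101, 961 First input unit
102, 962 First data storage unit
103, 963 First analysis unit
104, 964 Second input unit
105, 965 Second data storage unit
106, 966 Second analysis unit
107, 117, 967 Packet unit determination unit
108 Time adjustment unit
109, 119 Video packet unit determination unit
110 Audio packet unit determination unit
111, 968 Packet creation table storage unit
112, 969 Packet header creation unit
113, 130, 970 Packet data creation unit
114, 971 Packet combining unit
120 Time reference unit adjustment unit
121 I-frame reference unit adjustment unit
131 mdat information acquisition unit
132 Video entity data reading unit
133 Audio entity data reading unit
134 Interleave arrangement unit
200, 210, 220, 230, 240, 250 MP4 file extension unit
241, 923, 946, 948, 955, 957 Movie fragment box
242, 916, 945, 947, 949, 956, 958 Movie data box
300 Demultiplexing device
301 File input unit
302 File data storage unit
303 Header separation analysis unit
304 moov analysis unit
305 moof analysis unit
306 traf analysis unit
307 trun analysis unit
308 RA search unit
309 Sample acquisition unit
401 Image distribution server
402 Communication network
403 Mobile phone with recording function
404 Personal computer
405 SD memory card
406 DVD-RAM
407 Mobile phone
901 Box
902 Box header portion
903 Box data storage unit
904 Box size
905 Box type
906 Version
907 Flag
910, 920, 940, 950 MP4 file
911, 941, 951 Basic unit
912 File header portion
913 File data portion
914, 943, 953 File type box
915, 944, 954 Movie box
917 Movie header box
918 Track box
919 Track header box
921, 942, 952 Extension unit
922 Packet
924 Movie fragment header box
925 Track fragment box
926 Track fragment header box
927 Track fragment run box
928 Sample count
929 Data offset
930 Head sample flag
931 Table
932 Entry
933 Sample duration
934 Sample size
935 Sample flag
936 Sample composition time offset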
968a Packet creation table
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a multiplexing device that multiplexes media data such as moving image data and audio data, and to a demultiplexing device that reads and demultiplexes a bit string in which media data such as moving image data and audio data is multiplexed.
[0002]
[Prior art]
In recent years, with the increase in communication network capacity and advances in transmission technology, video distribution services that deliver moving image files containing multimedia content such as video, audio, text, or still images to personal computers over the Internet have spread remarkably. In addition, 3GPP (Third Generation Partnership Project), an international standardization body that aims to standardize so-called third-generation mobile communication systems for mobile terminals and the like, has defined TS26.234 (Transparent end-to-end packet switched streaming service) as a standard for wireless video distribution, and video distribution services are expected to expand to mobile communication terminals such as mobile phones and PDAs.
[0003]
When a moving image file is distributed in a video distribution service, a multiplexing device must first capture media data such as moving images, still images, audio, and text, and multiplex the header information needed to play back the media data with the entity data of the media data to create moving image file data. The MP4 file format is attracting attention as the multiplexed file format for this moving image file data.
[0004]
This MP4 file format is a multiplexed file format being standardized by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) JTC1/SC29/WG11, an international standardization body. Because it has also been adopted in the 3GPP TS26.234 mentioned above, it is expected to become widespread.
[0005]
Here, the data structure of the MP4 file will be described. The data structure of this MP4 file is disclosed in Non-Patent Document 1.
An MP4 file stores header information and the entity data of media data in units of objects called boxes, and is configured by arranging a plurality of boxes hierarchically.
[0006]
FIG. 18 is a diagram for explaining the structure of a box constituting a conventional MP4 file.
The box 901 consists of a box header portion 902, in which the header information of the box 901 is stored, and a box data storage unit 903, in which the data contained in the box 901 (for example, boxes in the layer below that box and fields for describing information) is stored.
[0007]
The box header portion 902 has fields of a box size 904, a box type 905, a version 906, and a flag 907.
The box size 904 is a field in which the size information of the entire box 901 including the byte size assigned to this field is described.
[0008]
The box type 905 is a field in which an identifier for identifying the type of the box 901 is described. This identifier is usually represented by a string of four alphabetic characters. In the following description, each box may be referred to by this identifier.
[0009]
The version 906 is a field in which a version number indicating the version of the box 901 is described, and the flag 907 is a field in which flag information set for each box 901 is described. Since this version 906 and flag 907 are not mandatory fields for all boxes 901, there may be boxes 901 that do not have these fields.
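As a concrete illustration of the box header, the sketch below reads the four fields from a byte buffer. The field widths are not stated in this description; the sketch assumes the layout of the ISO base media file format (a 32-bit big-endian size, a 4-byte type, and, for boxes that carry them, an 8-bit version followed by 24-bit flags). The function name `parse_box_header` and the `full_box` argument are hypothetical, and special size values (0 and 1) are not handled.

```python
import struct


def parse_box_header(buf: bytes, offset: int = 0, full_box: bool = False) -> dict:
    """Read box size 904 and box type 905, plus version 906 and flag 907 if present."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header = {"size": size, "type": box_type.decode("ascii"), "header_len": 8}
    if full_box:
        # Assumed layout: 1 byte of version followed by 3 bytes of flags.
        (version_flags,) = struct.unpack_from(">I", buf, offset + 8)
        header["version"] = version_flags >> 24
        header["flags"] = version_flags & 0xFFFFFF
        header["header_len"] = 12
    return header
```

The caller has to know from the box type whether version and flags are present, which reflects the statement above that not every box has these fields.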
[0010]
An MP4 file composed of a plurality of boxes 901 having such a structure can be roughly divided into a basic part indispensable for the structure of the file and an extended part used as necessary. First, the basic part of the MP4 file will be described.
FIG. 19 is a diagram for explaining a basic part of a conventional MP4 file.
[0011]
The basic part 911 of the MP4 file 910 includes a file header part 912 and a file data part 913.
The file header portion 912 is a portion in which header information of the entire file, for example, information such as a compression encoding method for moving image (video) data, is stored, and includes a file type box 914 and a movie box 915.
[0012]
The file type box 914 is a box identified by the identifier “ftyp” and stores information for identifying the MP4 file. Standardization bodies and service providers can define for themselves what kinds of media data are stored in an MP4 file and which compression coding schemes are used for the moving image (video) data, audio data, and so on, so the file type box 914 stores information identifying which of these definitions the MP4 file was created under.
[0013]
The movie box 915 is a box identified by an identifier of “moov”, and stores header information of entity data stored in the file data portion 913, for example, information such as a display time length.
The file data portion 913 includes a movie data box 916 identified by an identifier “mdat”. Instead of the file data portion 913, an external file different from the MP4 file 910 can be referred to. In this way, when referring to an external file, the basic part 911 of the MP4 file 910 is composed of only the file header part 912. In this specification, a case will be described in which entity data is included in the MP4 file 910 instead of referring to the external file.
[0014]
The movie data box 916 is a box for storing the entity data of media data in units called samples. A sample is the smallest access unit in an MP4 file and corresponds to a VOP (Video Object Plane) of video data encoded with the MPEG (Moving Picture Experts Group)-4 Visual compression coding scheme, or to a frame of audio data.
[0015]
Here, the structure of the movie box 915 will be described by digging down the hierarchy of the structure of the basic part of the conventional MP4 file.
FIG. 20 is a diagram for explaining the structure of a movie box in a conventional MP4 file.
[0016]
As shown in FIG. 20A, the movie box 915 consists of the box header portion 902 and the box data storage unit 903 described above. The size information of the movie box 915 is described in the box size 904 field of the box header portion 902 (“xxxx” in FIG. 20A), and the identifier “moov” of the movie box 915 is described in the box type 905 field.
[0017]
The box data storage unit 903 of the movie box 915 stores, among others, a movie header box 917, which holds the header information of the basic unit 911 of the MP4 file 910, and track boxes 918, which hold header information for each track, such as a video track or an audio track. Here, a track is the whole of the sample data of one medium contained in the MP4 file 910; the tracks for moving images, audio, and text are called the video track, audio track, and text track, respectively. When the MP4 file 910 contains several sets of data of the same medium, several tracks exist for that medium. For example, when the MP4 file 910 contains two kinds of moving image data, there are two video tracks.
[0018]
The movie header box 917 also includes the box header portion 902 and the box data storage portion 903 described above, and the size information of the movie header box 917 is stored in the box size 904 field constituting the box header portion 902. In the box type 905 field, the identifier “mvhd” of the movie header box 917 is described. The box data storage unit 903 of the movie header box 917 stores information related to the length of time required to reproduce the content included in the basic unit 911 of the MP4 file 910.
[0019]
Similarly, the size information of the track box 918 is described in the box size 904 field of the box header portion 902 of the track box 918 (“xx” in FIG. 20A), and the identifier “trak” of the track box 918 is described in the box type 905 field. A track header box 919 is stored in the box data storage unit 903 of the track box 918.
[0020]
The track header box 919 is a box having a field for describing header information for each track, and is identified by an identifier of “tkhd”. In the box data storage section 903 of the track header box 919, a field for describing a track ID for identifying the type of track, information on a time length required for reproducing the track, and the like are described.
[0021]
As described above, the boxes 901 are hierarchically arranged in the movie box 915, and header information for each track such as video and audio is stored in the track box 918 identified by “trak”. In the lower box included in the track box 918, header information for each track sample is stored.
[0022]
When the structure of the movie box 915 shown in FIG. 20A is shown in a tree shape, a diagram as shown in FIG. 20B is obtained.
That is, the movie header box 917 and the track box 918 are arranged as lower boxes of the movie box 915, and the track header box 919 is arranged as a lower box of the track box 918, which shows that the boxes 901 are arranged hierarchically.
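The tree structure can be made concrete with a small recursive walker. This is a sketch only: it assumes the 32-bit size and 4-byte type layout of the ISO base media file format, and the set of container boxes it descends into (`CONTAINERS`) is a hypothetical, deliberately short list covering just the boxes named in this description.

```python
import struct

# Boxes whose data area holds further boxes (only those named in the text).
CONTAINERS = {"moov", "trak", "moof", "traf"}


def walk_boxes(buf: bytes, start: int = 0, end: int = None, depth: int = 0):
    """Return (depth, type, size) for every box found between start and end."""
    end = len(buf) if end is None else end
    found = []
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            break  # malformed or unsupported size: stop rather than guess
        name = raw_type.decode("ascii", "replace")
        found.append((depth, name, size))
        if name in CONTAINERS:
            # Child boxes start right after the 8-byte header.
            found += walk_boxes(buf, pos + 8, pos + size, depth + 1)
        pos += size  # box size 904 includes the header, so this is the next box
    return found
```

Applied to a movie box, the result lists "moov" at depth 0, "mvhd" and "trak" at depth 1, and "tkhd" at depth 2, which is the tree of FIG. 20B.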
[0023]
When standardization of the MP4 file format began, the MP4 file 910 consisted only of the basic unit 911. However, its size grew as the amount of media data increased, which caused various problems, such as making the format hard to apply to streaming playback. The format was therefore improved to allow the use of an extension unit consisting of multiple pairs of header boxes and data boxes.
[0024]
FIG. 21 is a diagram showing a structure of a conventional MP4 file including an extension unit. As shown in FIG. 21, the MP4 file 920 to which the above improvements are added is composed of a basic part 911 and an extension part 921. Since all media data can be stored in the extension unit 921 in the MP4 file 920 including the extension unit 921, the movie data box 916 of the MP4 file basic unit 911 may be omitted.
[0025]
The extension unit 921 is configured by a plurality of packets 922 separated in predetermined units.
This packet 922 includes a pair of a movie fragment box 923 and a movie data box 916, and is also referred to as a movie fragment.
[0026]
The movie data box 916 stores the samples of each track for the predetermined unit described above, and the movie fragment box 923 is a box that stores the header information corresponding to the movie data box 916 and is identified by the identifier “moof”. The structure of the movie fragment box 923 is described in more detail below.
[0027]
FIG. 22 is a diagram for explaining the structure of a conventional movie fragment box.
As shown in FIG. 22, a movie fragment header box 924 and a plurality of track fragment boxes 925 are stored in the box data storage unit 903 of the movie fragment box 923.
[0028]
The movie fragment header box 924 is a box identified by the identifier “mfhd”, and stores header information of the entire movie fragment box 923.
The track fragment box 925 is a box identified by an identifier “traf”, and stores header information for each track.
[0029]
Normally, one track fragment box 925 is provided for the header information of one track, but multiple track fragment boxes 925 may be provided for one track. When the header information of one track is divided among multiple track fragment boxes 925 in this way, the track fragment boxes 925 are arranged so that the decoding times of their first samples are in ascending order.
[0030]
In the box data storage unit 903 of the track fragment box 925, a track fragment header box 926 and one or more track fragment run boxes 927 are stored.
The track fragment header box 926 is a box identified by the identifier “tfhd” and stores a field describing a track ID for identifying the type of track, information on default values such as the playback duration of a sample, and so on.
[0031]
The track fragment run box 927 is a box identified by an identifier of “trun” and stores header information in units of samples. The track fragment run box 927 will be described in detail with reference to FIG.
FIG. 23 is a diagram for explaining the structure of a conventional track fragment run box 927.
[0032]
The flag 907 is a field describing flag information set for each box 901. Here, it describes flag information indicating whether each of the fields from the data offset 929 to the sample composition time offset 936 is present in the track fragment run box 927 after the flag 907.
[0033]
The sample count 928 is a field in which information indicating how many samples of header information are stored in the track fragment run box 927 is described.
The data offset 929 is a field describing pointer information that indicates where in the movie data box 916 the entity data of the sample located at the head of the track fragment run box 927, among the samples whose header information is stored in the track fragment run box 927, is stored.
[0034]
The head sample flag 930 is a field that can override the value of the sample flag 935 field, described later, when the head sample of the track fragment run box 927 is a randomly accessible sample. Here, random access refers to a processing operation in an MP4 file playback device such as moving the playback position 10 seconds ahead during playback or starting playback from the middle of the data. Among video samples, a randomly accessible sample is the sample of a frame that the playback device can decode on its own without referring to the data of other frames, that is, an intra-frame coded frame (a so-called intra frame). Because every audio sample can be decoded on its own, all audio samples can be regarded as randomly accessible samples.
[0035]
In the table 931, entries 932 indicating header information for each sample are accumulated for the number indicated in the sample count 928.
The entry 932 is a collection of fields indicating the header information for each sample, and which fields are included is indicated by the flag 907. The fields that may be included in the entry 932 are a sample duration 933 describing the playback duration of the sample, a sample size 934 describing the size of the sample, a sample flag 935 describing flag information indicating whether the sample is randomly accessible, and a sample composition time offset 936 describing the difference between the decoding time and the display time of the sample, which is used to handle samples that use bidirectional prediction.
[0036]
When these fields are not included in the entry 932, the default values described in the track fragment header box 926 and in the movie extend box (identifier “mvex”) in the movie box 915 are used as the header information of each sample.
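How the flag 907 controls which fields are present can be sketched with a small parser. The bit values and 32-bit field widths below are not given in this description; they are taken from the track fragment run box of the ISO base media file format and should be read as assumptions of this sketch, as should the function name `parse_trun` and the dictionary keys. The input is the box content starting at the version and flags, after the size and type.

```python
import struct

# Presence bits in flag 907 (values assumed from the ISO base media file format).
DATA_OFFSET = 0x000001         # data offset 929
FIRST_SAMPLE_FLAGS = 0x000004  # head sample flag 930
SAMPLE_DURATION = 0x000100     # sample duration 933
SAMPLE_SIZE = 0x000200         # sample size 934
SAMPLE_FLAGS = 0x000400        # sample flag 935
SAMPLE_CTO = 0x000800          # sample composition time offset 936


def parse_trun(body: bytes) -> dict:
    """Parse a trun body that starts at the version/flags word."""
    pos = 0

    def read(fmt: str = ">I") -> int:
        nonlocal pos
        (value,) = struct.unpack_from(fmt, body, pos)
        pos += 4
        return value

    flags = read() & 0xFFFFFF                 # drop the 8-bit version
    run = {"sample_count": read(), "entries": []}
    if flags & DATA_OFFSET:
        run["data_offset"] = read(">i")       # signed offset
    if flags & FIRST_SAMPLE_FLAGS:
        run["first_sample_flags"] = read()
    per_sample = [(SAMPLE_DURATION, "duration"), (SAMPLE_SIZE, "size"),
                  (SAMPLE_FLAGS, "flags"), (SAMPLE_CTO, "composition_time_offset")]
    for _ in range(run["sample_count"]):
        # Only the fields announced in the flags are stored for each entry.
        run["entries"].append({name: read() for bit, name in per_sample if flags & bit})
    return run
```

A field missing from an entry is exactly the case described above in which the default value from the track fragment header box or the movie extend box applies.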
[0037]
In the track fragment run box 927, header information is described in order starting from the sample with the earliest decoding time. Therefore, when a device that plays back the MP4 file searches for the header information of a sample, it refers to the track ID in the track fragment header box 926 of each track fragment box 925, in order from the first one in the file, to find the track fragment box 925 containing the header information of the desired track. Within that track fragment box 925, it then searches the header information of the samples in order from the first track fragment run box 927.
[0038]
Even in the case of the MP4 file 920 including the extension unit 921, information necessary for the entire track, such as initialization information at the time of decoding, is stored in the movie box 915.
Next, a configuration example of an MP4 file including the extension unit 921 having such a structure will be described.
[0039]
FIG. 24 is a diagram illustrating configuration examples of a conventional MP4 file including an extension unit.
In FIG. 24, the content storage method will be described with reference to two examples, and the playback time length of the content is assumed to be 60 seconds.
[0040]
The MP4 file 940 shown in FIG. 24A stores media data in both the basic unit 941 and the extension unit 942. That is, media data for 0 to 30 seconds is stored in mdat_1 (reference numeral 945) of the basic unit 941, media data for 30 to 45 seconds is stored in mdat_2 (reference numeral 947) of the extension unit 942, and media data for 45 to 60 seconds is stored in mdat_3 (reference numeral 949). The header information of mdat_1 (reference numeral 945) is stored in moov 944, that of mdat_2 (reference numeral 947) in moof_1 (reference numeral 946), and that of mdat_3 (reference numeral 949) in moof_2 (reference numeral 948).
[0041]
In contrast, the MP4 file 950 shown in FIG. 24B stores media data only in the extension unit 952. That is, the basic unit 951 consists of ftyp 953 and moov 954 and contains no mdat; media data for 0 to 30 seconds is stored in mdat_1 (reference numeral 956) of the extension unit 952, and media data for 30 to 60 seconds is stored in mdat_2 (reference numeral 958). The header information of mdat_1 (reference numeral 956) is stored in moof_1 (reference numeral 955), and the header information of mdat_2 (reference numeral 958) is stored in moof_2 (reference numeral 957).
[0042]
Here, how the extension part of the MP4 file is created will be described with reference to FIGS.
FIG. 25 is a block diagram showing a configuration of a conventional multiplexing apparatus.
The multiplexing device 960 is a device that multiplexes media data and creates MP4 file extension data. Here, it is assumed that the extension data of the MP4 file is created by multiplexing the video data and the audio data.
[0043]
The first input unit 961 takes video data into the multiplexing device 960 and stores it in the first data storage unit 962, and the second input unit 964 takes audio data into the multiplexing device 960 and stores it in the second data storage unit 965.
The first analysis unit 963 reads the video data from the first data storage unit 962 one sample at a time, analyzes it, and outputs the header information of the video sample to the packet unit determination unit 967. The second analysis unit 966 reads the audio data from the second data storage unit 965 one sample at a time, analyzes it, and outputs the header information of the audio sample to the packet unit determination unit 967. The header information of video samples and audio samples includes information indicating the size and playback duration of the sample, and the header information of a video sample also includes information indicating whether the video sample is an intra frame.
[0044]
The packet unit determination unit 967 determines packet units of video data and audio data so that the number of samples included in the packet is constant, and creates header information of each packet based on the acquired sample header information.
FIG. 26 shows a processing operation flow of the conventional packet unit determination unit. Here, the number of samples stored in one packet is N, and this value is determined in advance and held in the memory of the multiplexing apparatus 960 or the like.
[0045]
First, when the first analysis unit 963 acquires one video sample (S901) and outputs the video sample header information to the packet unit determination unit 967, the packet unit determination unit 967 adds the video sample header information to the packet creation table (S902).
Next, the packet unit determination unit 967 updates the number of video samples included in the packet (S903), and determines whether the number of video samples included in the packet has become N (S904).
[0046]
Here, when the number of video samples included in the packet is less than N (No in S904), the processes from S901 to S903 are repeated. When the number of video samples included in the packet reaches N (Yes in S904), the packet unit determination unit 967 packetizes the N video samples (S905) and ends the processing operation.
[0047]
Similarly, the packet unit determination unit 967 converts audio samples into packets by performing the processing operations from S901 to S905. Then, the packet unit determination unit 967 repeats the processing operation of this flow until packetization of all samples is completed.
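The conventional flow of S901 to S905 can be illustrated with a minimal Python sketch. The tuple layout of the sample header and the function name `packetize_fixed_count` are hypothetical, and emitting a final short packet when the input runs out is an assumption that the description above does not address.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical sample header: (size in bytes, playback time length in ms, is intra frame)
SampleHeader = Tuple[int, int, bool]


def packetize_fixed_count(headers: Iterable[SampleHeader], n: int) -> Iterator[List[SampleHeader]]:
    """Group sample headers into packets holding exactly n samples each."""
    if n <= 0:
        raise ValueError("n must be positive")
    table: List[SampleHeader] = []   # plays the role of the packet creation table
    for header in headers:           # S901: acquire one sample
        table.append(header)         # S902: add its header to the table
        if len(table) == n:          # S903/S904: update the count and compare with N
            yield table              # S905: packetize the N samples
            table = []
    if table:                        # assumption: leftover samples form a short packet
        yield table


headers = [(300, 30, False), (900, 30, True), (250, 30, False),
           (260, 30, False), (240, 30, False)]
packets = list(packetize_fixed_count(headers, 2))
```

Because the grouping depends only on the sample count, the intra frame in this example lands second in its packet, and nothing ties the packet boundary to a playback start time.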
[0048]
FIG. 27 shows an example of a conventional packet creation table for storing video sample header information. For each video sample, the packet creation table 968a describes the size of the sample, the playback time length of the sample, and an intra-frame encoded frame flag indicating whether or not the video sample is an intra frame. Here, the table shows that the first video sample stored in the packet has a size of 300 bytes and a playback time length of 30 ms and is not an intra-frame encoded frame, and that the second video sample is an intra-frame encoded frame. The packet unit determination unit 967 adds these pieces of information to the packet creation table 968a sequentially, and once the table has been filled in up to the Nth sample, which is the last sample included in one packet, the table is output to the packet creation table storage unit 968.
[0049]
Referring to FIG. 25 again, after describing the header information of N samples in the packet creation table 968a, the packet unit determination unit 967 outputs the packet creation table 968a to the packet creation table storage unit 968 and outputs a packet creation signal to the packet header creation unit 969.
[0050]
When the packet header creation unit 969 obtains the packet creation signal, it reads the sample header information of the packet from the packet creation table 968a held in the packet creation table storage unit 968 and creates moof data. The packet header creation unit 969 then outputs the created moof data to the packet combining unit 971, and outputs to the packet data creation unit 970 mdat information that includes pointer information indicating where in the first data storage unit 962 and the second data storage unit 965 the actual data of the samples included in the packet is stored, together with sample size information.
[0051]
Based on the acquired mdat information, the packet data creation unit 970 reads the sample substance data from the first data storage unit 962 and the second data storage unit 965, creates mdat data, and outputs the mdat data to the packet combining unit 971.
[0052]
Then, the packet combining unit 971 combines the moof data and the mdat data, and outputs mp4 extension unit data for one packet.
Finally, the output mp4 extension data for one packet is taken in by the apparatus that creates the MP4 file, and the sequentially created mp4 extension data is arranged in order, so that the extension part of the MP4 file is created. Thereafter, this file creation apparatus creates the MP4 file by combining the basic part and the extension part of the MP4 file.
[0053]
[Non-Patent Document 1]
ISO / IEC JTC1 / SC29 / WG11 MPEG, N4854 “Proposed Revised Common Text Multimedia File Format Specification”, March 21, 2002
[0054]
[Problems to be solved by the invention]
However, when reproducing the extended portion of the MP4 file multiplexed by such a conventional multiplexing device, there are the following problems.
As the first of these, the conventional multiplexing apparatus performs multiplexing without considering the reproduction start times of the samples included in a packet, so that, for example, an audio sample to be synchronized with a video sample at a certain reproduction start time may be stored in a different packet from that video sample. Therefore, there is a problem that the efficiency of data access at the time of reproduction deteriorates on the reproduction apparatus side of the MP4 file.
[0055]
In addition, in the conventional multiplexing apparatus, multiplexing is performed based on the number of samples included in the packet. Therefore, in many cases a randomly accessible sample, that is, a video sample corresponding to an intra frame, ends up at an arbitrary position among the other samples in the packet. Consequently, when searching for a randomly accessible sample, the MP4 file playback device must search all the video samples included in the packet, and there is also the problem that the amount of calculation required for the search becomes enormous.
[0056]
These problems will be described in more detail with reference to FIG.
FIG. 28 is a diagram for explaining problems of a conventional multiplexing apparatus.
FIG. 28A clarifies the first problem that the efficiency of data access during reproduction deteriorates.
[0057]
The header information of the samples included in each mdat is stored in the immediately preceding moof. The header information regarding the video sample with playback start time 20 s stored in mdat_1 is stored as the first sample in moof_1, and the header information regarding the audio sample with playback start time 20 s stored in mdat_10 is stored as the final sample in moof_10.
[0058]
Therefore, when the MP4 file playback apparatus tries to play back the portion at content playback time 20 s, it must search as far as moof_10 between acquiring the video sample header information stored in moof_1 and acquiring the audio sample header information, so the data access efficiency deteriorates.
[0059]
FIG. 28B clarifies the second problem that the amount of calculation required to search for a randomly accessible sample becomes enormous.
The header information related to the i-th randomly accessible video sample, which is stored at the end of mdat_1, is stored as the last sample in moof_1, and the header information related to the (i+1)-th randomly accessible video sample, which is stored at the end of mdat_3, is stored as the last sample in moof_3.
[0060]
Therefore, if the MP4 file playback device tries to perform random access, it must search to the last sample of moof, and the amount of calculation required for the search becomes enormous.
Further, in addition to the first and second problems, the configuration of the MP4 file extension created by the conventional multiplexing device increases the number of seeks needed to obtain sample data, and is therefore not suitable for random access reproduction on a device with a slow seek speed, such as an optical disk playback device.
[0061]
This problem will be described again with reference to FIG. When trying to randomly access the i-th randomly accessible video sample of moof_1, the playback apparatus first moves the read pointer to the top position of moof_1 in order to obtain the header information of that sample, and analyzes the contents of moof_1 in order. The first seek is required at this point.
[0062]
After that, the playback apparatus determines where in mdat_1 the actual data of the i-th randomly accessible video sample is stored, and moves the read pointer to the start position of the actual data. Since the actual data of the i-th randomly accessible video sample is stored at the end of mdat_1, it cannot be reached by simply advancing the read pointer continuously from the top position of moof_1, and a second seek is required.
[0063]
That is, since a seek operation is performed both when the read pointer is moved to the start position of moof_1 and when it is moved to the start position of the actual data, random access playback takes a long time if the playback device has a slow seek speed. In particular, when the actual data of a sample such as an audio sample that is synchronized with the i-th randomly accessible video sample is stored apart from the actual data of the video sample, for example in a different packet, a further seek operation becomes necessary, and it becomes difficult to perform random access reproduction quickly.
[0064]
Therefore, the present invention has been made in view of these problems, and an object of the present invention is to provide a multiplexing device that can multiplex media data so that the resulting multiplexed file offers efficient data access at the time of reproduction and requires less calculation for sample search.
[0065]
It is another object of the present invention to provide a multiplexing device capable of multiplexing media data so that the multiplexed file is suitable for random access reproduction in a device having a low seek speed.
[0066]
[Means for Solving the Problems]
In order to achieve the above object, a multiplexing apparatus according to the present invention is a multiplexing apparatus that creates multiplexed data by packet-multiplexing media data including image data and at least one of audio data and text data, and comprises: media data acquisition means for acquiring the media data; analyzing means for analyzing the media data acquired by the media data acquisition means and obtaining, for each sample that is the minimum access unit of the image data, audio data, and text data included in the media data, reproduction start time information indicating the reproduction start time of the sample; packet unit determining means for determining, based on the reproduction start time information obtained by the analyzing means, a unit for packetizing the media data such that the reproduction start times of the samples of the image data, audio data, and text data included in the media data are aligned; packet header part creating means for creating a packet header part that stores the header of the media data in the packetization unit determined by the packet unit determining means; packet data part creating means for creating a packet data part that stores the actual data of the media data in the packetization unit determined by the packet unit determining means; and packetizing means for creating a packet by combining the packet header part created by the packet header part creating means and the packet data part created by the packet data part creating means, wherein the packet unit determining means determines the unit by aligning the reproduction start times of the samples of the image data, audio data, and text data for all the packets necessary for storing the media data.
[0067]
As a result, the playback start times of the image data, audio data, and text data included in the media data are aligned and stored in the packet, so that the efficiency of data access during playback can be improved on the playback device side.
[0068]
In the multiplexing apparatus according to the present invention, it is preferable that the image data is moving image data; that the analyzing means further analyzes the moving image data acquired by the media data acquisition means and, when the moving image data includes at least one sample containing intra frame information indicating that the sample is intra-frame coded, acquires the intra frame information; and that, when the intra frame information has been acquired by the analyzing means, the packet unit determining means determines the unit for packetizing the media data based on the intra frame information and the reproduction start time information, and places the sample of the moving image data containing the intra frame information at the head of the packetization unit.
[0069]
As a result, since the first video sample included in the packet becomes an intra-frame video sample, the amount of calculation required for searching for the sample at the time of random access on the playback device side can be greatly reduced.
Furthermore, in the multiplexing device according to the present invention, the packet data section creation means stores the media data samples included in the packetization unit in an interleaved manner so that the playback start times of the samples are in ascending order. More preferably, the packet data part is created.
[0070]
As a result, the video samples and audio samples are stored in mdat in ascending order of playback start time, so the number of seek operations during random access on the playback device side can be reduced, and even a playback device with a slow seek speed can realize quick random access reproduction.
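As an illustration of interleaved storage in ascending order of playback start time, the following Python sketch merges the video and audio samples of one packetization unit. The `Sample` class, its field names, and the rule that a video sample precedes an audio sample with the same start time are assumptions made for this example only.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    track: str      # "V" for video, "A" for audio (hypothetical labels)
    start_ms: int   # playback start time in milliseconds
    payload: bytes  # actual data of the sample


def interleave(video: List[Sample], audio: List[Sample]) -> List[Sample]:
    """Merge two per-track lists, each already sorted by start time."""
    # Assumption: on equal start times the video sample is stored first.
    return list(heapq.merge(video, audio,
                            key=lambda s: (s.start_ms, s.track != "V")))


video = [Sample("V", 0, b"v0"), Sample("V", 40, b"v1")]
audio = [Sample("A", 0, b"a0"), Sample("A", 20, b"a1"), Sample("A", 40, b"a2")]
mdat_payload = b"".join(s.payload for s in interleave(video, audio))
```

With this ordering, a reader positioned at a video sample finds the audio samples for the same period immediately after it, which is the property that reduces seek operations.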
[0071]
The present invention can be realized not only as such a multiplexing apparatus, but also as a multiplexing method having, as its steps, the characteristic means of the multiplexing apparatus, or as a program that causes a computer to execute these steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
[0072]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that MPEG-4 Visual encoded data is used as video data in the present embodiment, and MPEG-4 Audio encoded data is used as audio data in the present embodiment. In the present embodiment, an apparatus for multiplexing video data and audio data will be mainly described. However, it is not intended to exclude multiplexing of other media data such as text data.
[0073]
(Embodiment 1)
First, a multiplexing apparatus according to Embodiment 1 of the present invention will be described with reference to FIGS.
FIG. 1 is a block diagram showing a functional configuration of a multiplexing apparatus according to Embodiment 1 of the present invention.
The multiplexing device 100 is a device that multiplexes video data and audio data to create MP4 file extension data, and includes a first input unit 101, a first data storage unit 102, a first analysis unit 103, a second input unit 104, a second data storage unit 105, a second analysis unit 106, a packet unit determination unit 107, a packet creation table storage unit 111, a packet header creation unit 112, a packet data creation unit 113, and a packet combining unit 114.
[0074]
The first input unit 101 is an interface for fetching encoded video data from the image encoding device or the like into the multiplexing device 100, and causes the first data storage unit 102 to sequentially store the acquired video input data.
The first data storage unit 102 is a cache memory or RAM (Random Access Memory) that temporarily holds video input data.
[0075]
The first analysis unit 103 is a processing unit that reads out video sample data, which is the data for one video sample, from the video input data held in the first data storage unit 102, analyzes it, and outputs header information of the video sample, and is realized by a CPU and a memory. Note that the video sample header information output from the first analysis unit 103 includes information indicating the size of the video sample, the playback time length, and whether it is an intra frame. Further, in the case of a sample using bi-directional prediction, the video sample header information includes difference information between the decoding time and the display time.
[0076]
The second input unit 104 is an interface for importing encoded audio data from the speech encoding device or the like into the multiplexing device 100, and causes the second data storage unit 105 to sequentially store the acquired audio input data.
The second data storage unit 105 is a cache memory, RAM, or the like that temporarily holds audio input data.
[0077]
The second analysis unit 106 is a processing unit that reads out audio sample data, which is the data for one audio sample, from the audio input data held in the second data storage unit 105, analyzes it, and outputs header information of the audio sample, and is realized by a CPU and a memory. Note that the audio sample header information output from the second analysis unit 106 includes information indicating the size and playback time length of the audio sample.
[0078]
The packet unit determination unit 107 is a processing unit that determines the packet units of the video data and audio data while accumulating the header information of the video samples and audio samples included in a packet, so that the playback start time of the video samples and the playback start time of the audio samples included in the packet are aligned, and is realized by a CPU and a memory. The packet unit determination unit 107 outputs the collected sample header information for the determined packet unit to the packet creation table storage unit 111 as a packet creation table and, after the packet unit is determined, outputs a packet creation signal instructing creation of a packet header to the packet header creation unit 112. The packet unit determination unit 107 includes a time adjustment unit 108 that adjusts the packet unit in time units, a video packet unit determination unit 109 that determines the packet unit of video data, and an audio packet unit determination unit 110 that determines the packet unit of audio data.
[0079]
The time adjustment unit 108 is a processing unit that adjusts the end time of a packet so that the packet falls within a predetermined time unit. First, the time adjustment unit 108 outputs a predetermined time (target time) to the video packet unit determination unit 109. The target time may be designated by the user. In this case, the multiplexing apparatus 100 accepts the target time specification via an input device such as a keyboard, and the input device outputs a target time input signal indicating the specified target time to the time adjustment unit 108.
[0080]
The video packet unit determination unit 109 is a processing unit that acquires video sample header information from the first analysis unit 103 and determines a packet unit of video data.
The video packet unit determination unit 109 acquires the target time from the time adjustment unit 108 and the video sample header information from the first analysis unit 103. While counting the playback time length of each video sample given in each piece of video sample header information so that the video samples stored in the packet fall within the target time, it sequentially adds the header information, up to that of the last video sample included in the packet, to the video packet creation table. When the video packet unit determination unit 109 has added the header information of the last video sample included in the packet to the video packet creation table, it outputs to the audio packet unit determination unit 110 video sample playback time information indicating the playback start time of the first video sample included in the packet and the total playback time length of the video samples included in the packet.
[0081]
The audio packet unit determination unit 110 is a processing unit that acquires the audio sample header information acquired from the second analysis unit 106 and determines the packet unit of audio data.
The audio packet unit determination unit 110 acquires video sample playback time information from the video packet unit determination unit 109 and audio sample header information from the second analysis unit 106. It places at the head of the packet an audio sample whose playback start time is the same as or close to the playback start time of the first video sample included in the packet. Then, while counting the playback time length of each audio sample given in each piece of audio sample header information, it determines the last audio sample included in the packet so that the sum of the playback time lengths of the audio samples included in the packet is the same as or close to the total playback time length of the video samples included in the packet.
[0082]
Here, an audio sample whose playback start time is close to the playback start time of the video sample means either the audio sample with the earliest playback start time after the playback start time of the video sample, or the audio sample with the latest playback start time before the playback start time of the video sample.
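The selection of the head audio sample can be sketched as follows in Python. The text above permits either neighbor of the video start time; preferring whichever neighbor is nearer, with ties going to the later one, is an assumption of this sketch, as are the function names and the use of a sorted list of start times in milliseconds.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def head_candidates(audio_starts: List[int], video_start: int) -> Tuple[Optional[int], Optional[int]]:
    """Return indices (latest before, earliest at or after) into sorted audio_starts."""
    i = bisect_left(audio_starts, video_start)
    before = i - 1 if i > 0 else None
    after = i if i < len(audio_starts) else None
    return before, after


def choose_head(audio_starts: List[int], video_start: int) -> int:
    """Pick the index of the audio sample to place at the head of the packet."""
    before, after = head_candidates(audio_starts, video_start)
    if before is None and after is None:
        raise ValueError("no audio samples available")
    if before is None:
        return after
    if after is None:
        return before
    if audio_starts[after] == video_start:
        return after  # identical playback start time
    # Assumption: choose the nearer neighbor; a tie goes to the later sample.
    d_before = video_start - audio_starts[before]
    d_after = audio_starts[after] - video_start
    return before if d_before < d_after else after


audio_starts = [0, 23, 46, 69]  # hypothetical audio frame start times in ms
```

For example, `choose_head(audio_starts, 50)` selects the sample starting at 46 ms, while `choose_head(audio_starts, 60)` selects the one starting at 69 ms.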
[0083]
Thereafter, the audio packet unit determination unit 110 sequentially adds audio sample header information from the first audio sample to the last audio sample included in the packet to the audio packet creation table.
The packet creation table storage unit 111 is a cache memory, a RAM, or the like that temporarily holds the video packet creation table and the audio packet creation table output from the packet unit determination unit 107.
[0084]
The packet header creation unit 112 is a processing unit that creates a packet header part (moof) in which packet header information is stored, and is realized by a CPU or a memory.
When the packet header creation unit 112 acquires the packet creation signal from the packet unit determination unit 107, it reads the sample header information of the packet by referring to the packet creation table in the packet creation table storage unit 111, creates moof data, and outputs the moof data to the packet combining unit 114.
[0085]
The packet header creation unit 112 also outputs to the packet data creation unit 113 mdat information that includes pointer information indicating where in the first data storage unit 102 and the second data storage unit 105 the actual data of the video samples and audio samples included in the packet is stored, sample size information indicating the sample sizes, and a signal instructing creation of the packet data part (mdat).
[0086]
Note that, when creating a moof, for media data encoded by a coding method in which the coding rate switches partway through the data, such as AMR (Adaptive Multi-Rate codec), the packet header creation unit 112 can also store the header information in separate trafs according to the coding rate.
[0087]
The packet data creation unit 113 is a processing unit that creates a packet data part (mdat) in which packet actual data is stored, and is realized by a CPU or a memory.
When the packet data creation unit 113 acquires the mdat information from the packet header creation unit 112, it reads, based on the pointer information and the sample size information contained in the mdat information, the video substance data of the video samples included in the packet from the first data storage unit 102 and the audio substance data of the audio samples included in the packet from the second data storage unit 105, creates mdat data, and outputs it to the packet combining unit 114.
[0088]
The packet combining unit 114 is a processing unit that combines the moof data and the mdat data to create mp4 extension data for one packet, and is realized by a CPU or a memory. The packet combining unit 114 acquires moof data from the packet header creation unit 112 and mdat data from the packet data creation unit 113, combines the moof data and the mdat data to create mp4 extension data for one packet, and outputs the sequentially created mp4 extension data to an apparatus that creates an MP4 file.
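The combining step can be pictured with a short Python sketch that serializes a box as a 4-byte size (counting the size field itself), a four-character type, and the payload, following the box structure described earlier, and then concatenates a moof box and an mdat box. The names `make_box` and `combine_packet` are hypothetical, and the moof payload is treated as an opaque byte string here; a real moof carries child boxes such as traf, which this sketch does not build.

```python
import struct
from typing import List


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 32-bit big-endian size, 4-character type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four characters")
    size = 8 + len(payload)  # the size covers the size and type fields too
    if size > 0xFFFFFFFF:
        raise ValueError("payload too large for a 32-bit box size")
    return struct.pack(">I4s", size, box_type) + payload


def combine_packet(moof_payload: bytes, sample_data: List[bytes]) -> bytes:
    """Join the packet header part (moof) and the packet data part (mdat)."""
    moof = make_box(b"moof", moof_payload)           # placeholder header contents
    mdat = make_box(b"mdat", b"".join(sample_data))  # actual data of the samples
    return moof + mdat


packet = combine_packet(b"\x00" * 4, [b"abc", b"de"])
```

Concatenating the outputs of successive `combine_packet` calls in order corresponds to arranging the mp4 extension data for each packet to form the extension part of the file.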
[0089]
With reference to FIG. 2, a description will be given of a processing procedure for creating an MP4 file extension in the multiplexing apparatus 100 configured as described above.
FIG. 2 is a flowchart showing the processing operation of the multiplexing apparatus 100.
First, in the multiplexing apparatus 100, when the first input unit 101 and the second input unit 104 capture video data and audio data, respectively (S100), the first input unit 101 stores the video input data in the first data storage unit 102, and the second input unit 104 stores the audio input data in the second data storage unit 105.
[0090]
Next, the first analysis unit 103 reads and analyzes the video sample data from the first data storage unit 102 and outputs the video sample header information to the video packet unit determination unit 109 of the packet unit determination unit 107. Then, the video packet unit determination unit 109 determines a packet unit of video data based on the video sample header information acquired from the first analysis unit 103 and the target time acquired from the time adjustment unit 108 (S110). The processing operation in which the video packet unit determination unit 109 determines the video data packet unit will be described in detail later.
[0091]
Thereafter, the video packet unit determination unit 109 outputs the reproduction time information of the video sample included in the packet whose packet unit is determined to the audio packet unit determination unit 110 (S120).
Then, the audio packet unit determination unit 110 determines the packet unit of the audio data based on the reproduction time information of the video sample acquired from the video packet unit determination unit 109 (S130). At this time, the audio packet unit determination unit 110 determines the packet unit so that the reproduction start time of the first audio sample included in the packet is the same as or close to the reproduction start time of the first video sample included in the packet.
[0092]
When the audio packet unit determination unit 110 determines the packet unit of the audio data, the packet unit determination unit 107 outputs the packet creation table to the packet creation table storage unit 111 and outputs the packet creation signal to the packet header creation unit 112.
[0093]
Thereafter, the packet header creation unit 112 creates the moof data in the determined unit and outputs it to the packet combining unit 114, and the packet data creation unit 113 creates the mdat data in the determined unit and outputs it to the packet combining unit 114. The packet combining unit 114 combines the moof data and the mdat data to create one packet in the determined unit (S140), and outputs it as mp4 extension data for one packet.
[0094]
When the creation of one packet is completed, the multiplexing apparatus 100 determines whether the input data from the first input unit 101 and the second input unit 104 has run out (S150). Here, when input data remains (No in S150), the multiplexing apparatus 100 clears, from the data held in the buffer memories, that is, the first data storage unit 102, the second data storage unit 105, and the packet creation table storage unit 111, the data that has already been packetized (S160), and repeats the processing operations from S110 to S150.
[0095]
On the other hand, if there is no input data (Yes in S150), the multiplexing apparatus 100 ends the MP4 file extension processing.
As described above, the multiplexing apparatus 100 first determines the packet unit of the video data, then determines the packet unit of the audio data, and multiplexes the media data, thereby creating the extension portion of the MP4 file.
[0096]
Here, the processing operation in which the video packet unit determination unit 109 determines the packet unit of video data in step S110 of FIG. 2 will be described in detail.
FIG. 3 is a flowchart showing the processing operation of the video packet unit determination unit 109. Prior to this flow, the video packet unit determination unit 109 acquires the target time from the time adjustment unit 108.
[0097]
When the video packet unit determination unit 109 acquires the video sample header information from the first analysis unit 103 (S111), the video packet unit determination unit 109 adds the video sample header information to the video packet creation table (S112).
At this time, the video packet unit determination unit 109 determines whether the sum of the playback time lengths of the video samples given in the video sample header information, that is, the total playback time of the video data included in the packet, has reached or exceeded the previously acquired target time (S113).
[0098]
When the total reproduction time of the video data included in the packet has not reached the target time (No in S113), the video packet unit determination unit 109 acquires the next video sample header information (S111), and performs the processing in S112 and S113. Repeat the operation.
[0099]
When the total playback time of the video data included in the packet has reached the target time (Yes in S113), the video packet unit determination unit 109 determines the video sample indicated by the video sample header information last added to the video packet creation table as the last video sample included in the packet (S114), and ends the processing operation for determining the packet unit.
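The loop of S111 to S114 can be sketched in Python as follows. The header tuple layout and the name `next_video_unit` are hypothetical, and returning a shorter unit when the input ends before the target time is reached is an assumption of this sketch.

```python
from typing import Iterator, List, Tuple

# Hypothetical video sample header: (size in bytes, playback time length in ms, is intra frame)
VideoHeader = Tuple[int, int, bool]


def next_video_unit(headers: Iterator[VideoHeader], target_ms: int) -> Tuple[List[VideoHeader], int]:
    """Collect headers for one packet until the total playback time reaches the target."""
    table: List[VideoHeader] = []  # video packet creation table
    total_ms = 0
    for header in headers:         # S111: acquire video sample header information
        table.append(header)       # S112: add it to the table
        total_ms += header[1]
        if total_ms >= target_ms:  # S113: target time reached or exceeded?
            break                  # S114: this sample is the last one in the packet
    return table, total_ms


stream = iter([(300, 30, False)] * 7)
first_unit, first_total = next_video_unit(stream, 100)
second_unit, second_total = next_video_unit(stream, 100)
```

Because `stream` is an iterator, the second call resumes where the first stopped. The returned total, together with the playback start time of the first sample, corresponds to the video sample playback time information passed on to the audio side.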
[0100]
Next, the processing operation in which the audio packet unit determination unit 110 determines the audio data packet unit in step S130 of FIG. 2 will be described in detail.
FIG. 4 is a flowchart showing the processing operation of the audio packet unit determination unit 110.
[0101]
Prior to this flow, the audio packet unit determination unit 110 acquires video sample playback time information from the video packet unit determination unit 109.
When the audio packet unit determination unit 110 acquires the audio sample header information from the second analysis unit 106 (S131), it refers to the previously acquired video sample playback time information (S132), reads the playback start time of the first video sample included in the packet, and determines an audio sample whose playback start time is the same as or close to that playback start time as the audio head sample of the packet (S133).
[0102]
When the audio packet unit determination unit 110 determines the audio head sample included in the packet, the audio packet unit determination unit 110 sequentially acquires the audio sample header information (S134), and adds the audio sample header information to the audio packet creation table (S135).
[0103]
Thereafter, the audio packet unit determination unit 110 refers to the video sample playback time information, reads the sum of the playback time lengths of the video samples included in the packet (S136), determines the last audio sample included in the packet so that the sum of the playback time lengths of the audio samples included in the packet is equal to or close to the sum of the playback time lengths of the video samples included in the packet (S137), and ends the processing operation for determining the packet unit.
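A minimal Python sketch of S131 to S137 follows. It assumes each audio header carries an explicit start time (the description above derives timing from the playback time lengths), picks the head sample nearest to the video start time, and stops adding samples as soon as one more would move the audio total farther from the video total. The function name and both selection rules are assumptions, since the text only requires "equal to or close to".

```python
from typing import List, Tuple

# Hypothetical audio sample header: (playback start time in ms, playback time length in ms)
AudioHeader = Tuple[int, int]


def determine_audio_unit(audio: List[AudioHeader], video_start_ms: int,
                         video_total_ms: int) -> Tuple[int, int, int]:
    """Return (first index, last index, total audio duration) for one packet."""
    if not audio:
        raise ValueError("no audio samples available")
    # S132/S133: head sample whose start time is nearest to the first video sample
    first = min(range(len(audio)), key=lambda i: abs(audio[i][0] - video_start_ms))
    total_ms = 0
    last = first
    for i in range(first, len(audio)):  # S134/S135: take headers in order
        duration = audio[i][1]
        # S136/S137: stop once another sample would worsen the match
        if i > first and abs(total_ms + duration - video_total_ms) > abs(total_ms - video_total_ms):
            break
        total_ms += duration
        last = i
    return first, last, total_ms


audio = [(i * 23, 23) for i in range(20)]  # 23 ms frames, purely illustrative
unit_1 = determine_audio_unit(audio, 0, 120)
unit_2 = determine_audio_unit(audio, 120, 120)
```

In this example the first packet takes audio samples 0 through 4 (115 ms against 120 ms of video), and the second packet starts at sample 5, whose start time of 115 ms is the closest to the video start time of 120 ms.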
[0104]
The MP4 file extension created through such processing by the multiplexing device 100 is excellent in data access efficiency on the playback device side. The reason will be described with reference to FIG. 5 showing an example of the data structure of the MP4 file extension created by the multiplexing apparatus 100.
[0105]
The MP4 file extension unit 200 shown in FIG. 5A is composed of a plurality of packets and is coupled to the basic part of the MP4 file.
Each packet constituting the MP4 file extension unit 200 is composed of a packet header part moof and a packet data part mdat. Here, packet_1 means the first packet of the MP4 file extension unit 200; the moof included in packet_1 is denoted moof_1, and the mdat included in packet_1 is denoted mdat_1. Also, "V" shown in each mdat in FIG. 5A indicates a video sample, and "A" indicates an audio sample (hereinafter, the same applies to other drawings).
[0106]
In mdat_1 of the MP4 file extension unit 200, a video sample with a playback start time of 20 seconds is stored as a video head sample, and an audio sample with a playback start time of 20 seconds is stored as an audio head sample. In mdat_2, a video sample with a playback start time of 30 seconds is stored as a video head sample, and an audio sample with a playback start time of 30 seconds is also stored as an audio head sample.
[0107]
In this way, by storing video samples and audio samples in one packet with their respective playback start times aligned, the amount of calculation required for data access when the MP4 file extension unit 200 is played back on the playback device side can be greatly reduced.
[0108]
Further, since the reproduction start times of the respective media data are aligned and stored in the packet, it is possible to divide the data by an arbitrary number of packets and adjust the size of the MP4 file data to a desired size.
Here, the MP4 file extension created by the multiplexing apparatus 100 may have the data structure shown in FIG.
[0109]
FIG. 5B is a diagram illustrating a second example of the data structure of the MP4 file extension created by the multiplexing apparatus 100.
In mdat_1 of the MP4 file extension unit 210 shown in FIG. 5B, a video sample with a playback start time of 20 seconds is stored as the video head sample, and in mdat_2, an audio sample with a playback start time of 20 seconds is stored as the audio head sample. In mdat_3, a video sample with a playback start time of 30 seconds is stored as the video head sample, and in mdat_4, an audio sample with a playback start time of 30 seconds is stored as the audio head sample.
[0110]
In this way, also when either video data or audio data is stored in one packet, and packets storing video data and packets storing audio data with the same playback start time are arranged alternately, the amount of calculation required for data access when the MP4 file extension unit 200 is played back on the playback device side can be greatly reduced.
[0111]
As described above, according to the multiplexing apparatus 100 of the first embodiment, the media data is packetized with the playback start times of the respective media data aligned, so that the efficiency of data access on the playback device side can be increased.
[0112]
(Embodiment 2)
Next, a multiplexing apparatus according to Embodiment 2 of the present invention will be described with reference to FIGS.
The multiplexing apparatus according to the second embodiment shares its main components with the multiplexing apparatus 100 according to the first embodiment, but has a characteristic configuration in the packet unit determination unit, and differs from the multiplexing apparatus 100 according to the first embodiment in this respect. Hereinafter, this difference will be mainly described. Components that are the same as in Embodiment 1 are given the same reference numerals, and their description is omitted.
[0113]
FIG. 6 is a block diagram showing a functional configuration of the packet unit determination unit of the multiplexing apparatus according to the second embodiment.
The packet unit determination unit 117 is a processing unit that accumulates the header information of the video samples and audio samples included in a packet and determines the packet unit of the video data and the audio data so that the respective playback start times are aligned and the first video sample included in the packet is an intra frame. It includes a time adjustment unit 108, a video packet unit determination unit 119, and an audio packet unit determination unit 110.
[0114]
The video packet unit determination unit 119 is a processing unit that acquires video sample header information from the first analysis unit 103 and determines the packet unit of video data based on either time or an intra frame. It includes a time reference unit adjustment unit 120 and an I frame reference unit adjustment unit 121.
[0115]
The time reference unit adjustment unit 120 is a processing unit that adjusts the packet unit of video data based on the target time output from the time adjustment unit 108. It counts the playback time length given in each piece of video sample header information and adjusts the packet unit so that it becomes a predetermined time unit.
[0116]
The I frame reference unit adjustment unit 121 adjusts the packet unit of video data based on whether the video sample header information output from the first analysis unit 103 includes information indicating an intra frame. When video sample header information including such information is acquired, the packet unit is switched at that intra-frame video sample, so that the first video sample of the next packet is the intra-frame video sample.
[0117]
The processing operation in which the video packet unit determining unit 119 determines the packet unit of video data in the multiplexing apparatus according to the second embodiment having the packet unit determining unit 117 configured as described above will be described in detail.
FIG. 7 is a flowchart showing the processing operation of the video packet unit determination unit 119.
[0118]
Prior to this flow, the video packet unit determination unit 119 acquires the target time from the time adjustment unit 108 and stores the target time in the time reference unit adjustment unit 120.
As in the first embodiment, when the video packet unit determination unit 119 acquires the video sample header information from the first analysis unit 103 (S201), it adds the video sample header information to the video packet creation table (S202).
[0119]
At this time, the video packet unit determination unit 119 determines in the I frame reference unit adjustment unit 121 whether the acquired video sample header information includes information indicating an intra frame (S203).
When information indicating an intra frame is included (Yes in S203), the video packet unit determination unit 119 determines, in the time reference unit adjustment unit 120, whether the total playback time of all video samples included in the packet exceeds the previously acquired target time (S205).
[0120]
Here, when information indicating an intra frame is not included (No in S203) or when the target time is not exceeded (No in S205), the video packet unit determination unit 119 updates, in the time reference unit adjustment unit 120, the sum of the playback time lengths of the video samples included in the packet by adding the playback time length of the video sample given in the video sample header information (S204), acquires the next video sample header information (S201), and repeats the above processing operation.
[0121]
On the other hand, when the target time is exceeded (Yes in S205), the video packet unit determination unit 119 determines, in the I frame reference unit adjustment unit 121, that the last video sample included in the packet is the video sample immediately before the video sample determined to be an intra frame (S206), and ends the processing operation for determining the packet unit of video data.
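The loop of steps S201 to S206 can be sketched as follows. The function name `split_on_intra` and the representation of a video sample as a `(duration_ms, is_intra)` tuple are assumptions chosen for illustration; the sketch also assumes that the input list starts at the head sample of the packet being built.

```python
def split_on_intra(samples, target_ms):
    """Return how many leading samples belong to the current packet.

    samples: list of (duration_ms, is_intra) tuples, starting at the
    head sample of the packet.
    """
    total = 0  # summed playback time of the samples accepted so far
    for i, (duration_ms, is_intra) in enumerate(samples):
        # S203 and S205: the packet is closed only at an intra frame,
        # and only once the accumulated time exceeds the target time.
        if is_intra and total > target_ms:
            # S206: the sample just before this intra frame is the last
            # sample of the packet; samples[i] starts the next packet.
            return i
        total += duration_ms  # S204
    return len(samples)  # input exhausted before a boundary was found


# 500 ms samples with an intra frame every 4 samples, 3 s target time:
samples = [(500, i % 4 == 0) for i in range(20)]
print(split_on_intra(samples, 3000))  # 8
```

In this example the packet is not closed at the intra frame at index 4, because only 2 seconds have accumulated, and is closed at the intra frame at index 8.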
[0122]
In the extension part of the MP4 file created through this processing operation of the video packet unit determination unit 119, the video sample stored at the head of each packet is always an intra-frame video sample. Therefore, at the time of random access, playback can be started from the first video sample of the packet, and the amount of calculation required to search for a randomly accessible video sample can be greatly reduced.
[0123]
Also, since the video sample stored at the head of the packet is always an intra-frame video sample, in the packet header part (moof) it is only necessary to describe information indicating that random access is possible in the head sample flag field of the trun located at the head of the traf storing the video track header information. Since the sample flag fields of each trun can be omitted by using the default value, the load of creating the moof data is reduced, and the file size of the entire MP4 file can also be reduced.
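The size effect described here can be illustrated with a small calculation. The sketch below assumes the trun flag bits of the ISO base media file format as commonly documented (0x000004 for a single first-sample flags field and 0x000400 for one 32-bit flags field per sample); these constants and the function name `flag_bytes` are assumptions that do not appear in this specification.

```python
# Assumed trun flag bits (ISO base media file format); not named in the text.
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004  # one 32-bit flags field for the head sample
SAMPLE_FLAGS_PRESENT = 0x000400        # one 32-bit flags field per sample


def flag_bytes(tr_flags, sample_count):
    """Bytes spent on sample flags in one trun for the given flag bits."""
    size = 0
    if tr_flags & FIRST_SAMPLE_FLAGS_PRESENT:
        size += 4
    if tr_flags & SAMPLE_FLAGS_PRESENT:
        size += 4 * sample_count
    return size


# A trun with 30 samples: head-sample flag only versus per-sample flags.
print(flag_bytes(FIRST_SAMPLE_FLAGS_PRESENT, 30))  # 4
print(flag_bytes(SAMPLE_FLAGS_PRESENT, 30))        # 120
```

Under these assumptions, signalling random accessibility only for the head sample costs 4 bytes per trun regardless of how many samples the trun describes.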
[0124]
With this processing operation, when the interval between intra frames included in the video data is long, the playback time length per packet may become long. Therefore, the packet unit determination unit 117 may instead perform the processing operation described below.
[0125]
FIG. 8 is a flowchart showing the second processing operation of the video packet unit determination unit 119.
Similar to the first processing operation, prior to this flow, the video packet unit determination unit 119 acquires the target time from the time adjustment unit 108 and holds it in the time reference unit adjustment unit 120.
[0126]
When the video packet unit determination unit 119 acquires the video sample header information from the first analysis unit 103 (S211), the video packet unit determination unit 119 adds the video sample header information to the video packet creation table (S212).
At this time, the video packet unit determination unit 119 determines, in the time reference unit adjustment unit 120, whether the total playback time of all video samples included in the packet exceeds the previously acquired target time (S213).
[0127]
When the target time is exceeded (Yes in S213), the video packet unit determination unit 119 determines that the last video sample included in the packet is the video sample indicated by the video sample header information immediately before the video sample header information acquired this time (S214), and ends the processing operation for determining the packet unit of video data.
[0128]
On the other hand, when the target time has not been exceeded (No in S213), the video packet unit determination unit 119 determines, in the I frame reference unit adjustment unit 121, whether the acquired video sample header information includes information indicating an intra frame (S215).
[0129]
Here, when information indicating an intra frame is included (Yes in S215), the video packet unit determination unit 119 determines, in the I frame reference unit adjustment unit 121, that the last video sample included in the packet is the video sample immediately before the video sample determined to be an intra frame (S214), and ends the processing operation for determining the packet unit of video data.
[0130]
On the other hand, when information indicating an intra frame is not included (No in S215), the video packet unit determination unit 119 updates, in the time reference unit adjustment unit 120, the sum of the playback time lengths of the video samples included in the packet by adding the playback time length of the video sample given in the video sample header information (S216), acquires the next video sample header information (S211), and repeats the above processing operation.
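The second processing operation (S211 to S216) can be sketched in the same style. The function name `split_on_time_or_intra` and the `(duration_ms, is_intra)` tuples are illustrative assumptions. The sketch further assumes that the head sample of the packet is exempt from the intra-frame check of S215, since otherwise a packet beginning with an intra frame would be closed immediately; the text does not spell out this detail.

```python
def split_on_time_or_intra(samples, target_ms):
    """Return how many leading samples belong to the current packet.

    samples: list of (duration_ms, is_intra) tuples, starting at the
    head sample of the packet.
    """
    total = 0  # summed playback time of the samples accepted so far
    for i, (duration_ms, is_intra) in enumerate(samples):
        if total > target_ms:
            return i  # S213 Yes, then S214: close before this sample
        if is_intra and i > 0:
            return i  # S215 Yes, then S214 (head sample exempt: assumption)
        total += duration_ms  # S216
    return len(samples)


# 500 ms samples, 3 s target time.
far_apart = [(500, i in (0, 20)) for i in range(30)]
close = [(500, i in (0, 4)) for i in range(30)]
print(split_on_time_or_intra(far_apart, 3000))  # 7  (time limit reached first)
print(split_on_time_or_intra(close, 3000))      # 4  (intra frame reached first)
```

The first call shows the time limit bounding the packet when intra frames are far apart, and the second shows the packet boundary snapping to an intra frame when one arrives before the limit.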
[0131]
The MP4 file extension unit created through the second processing operation of the video packet unit determination unit 119 is packetized under a predetermined time limit, which keeps the packet size at or below a desired size. In addition, when an intra-frame video sample exists, it is stored at the head of a packet, so that at the time of random access the playback device only has to determine whether the head video sample of each packet is randomly accessible. This reduces the amount of calculation required to search for a randomly accessible video sample.
[0132]
When the video packet unit determination unit 119 completes the processing operation for determining the packet unit of video data, it outputs the video sample playback time information to the audio packet unit determination unit 110, and the audio packet unit determination unit 110 performs the processing operation for determining the packet unit of audio data in the same manner as in the first embodiment.
[0133]
The MP4 file extension unit created through such processing operations by the packet unit determination unit 117 reduces the search load during random access on the playback device side. The reason will be described with reference to FIG. 9 showing an example of the data structure of the MP4 file extension created by the multiplexing apparatus according to the second embodiment.
[0134]
In mdat_1 of the MP4 file extension unit 220 shown in FIG. 9A, an intra-frame video sample is stored as the video head sample, and in mdat_2, an intra-frame video sample is likewise stored as the video head sample.
[0135]
In this way, by storing an intra-frame video sample as the first video sample of each packet, the playback device only has to examine the first video sample of each packet in order to obtain a randomly accessible video sample at the time of random access. There is no need to search all video samples included in the packet, and the sample search load during random access can be greatly reduced.
[0136]
At this time, in moof_1 and moof_2 of the MP4 file extension unit 220 as well, information indicating that random access is possible is described only in the head sample flag field of the trun located at the head of the traf storing the header information of the video track, so that the sizes of moof_1 and moof_2 can also be reduced.
[0137]
Here, the MP4 file extension created by the multiplexing apparatus according to the second embodiment may have the data structure shown in FIG.
In mdat_1 of the MP4 file extension unit 230 shown in FIG. 9B, an intra-frame video sample is stored as the video head sample, and in mdat_3, an intra-frame video sample is likewise stored as the video head sample. Audio samples are stored in mdat_2 and mdat_4.
[0138]
In this way, also when either video data or audio data is stored in one packet and an intra-frame video sample is stored as the first video sample of each packet storing video data, the sample search load during random access on the playback device side can be greatly reduced.
[0139]
Note that in any of these data structure examples of the MP4 file extension unit, aligning the playback start time of the first video sample stored in the packet with the playback start time of the first audio sample greatly reduces the amount of calculation required for data access on the playback device side.
[0140]
As described above, according to the multiplexing apparatus according to the second embodiment, since a packet is created using a randomly accessible video sample as the first video sample, it is necessary for the playback apparatus to search for a sample during random access. The amount of calculation can be reduced.
[0141]
(Embodiment 3)
Furthermore, a multiplexing apparatus according to Embodiment 3 of the present invention will be described with reference to FIGS.
The multiplexing apparatus according to the third embodiment shares its main components with the multiplexing apparatuses according to the first and second embodiments, but has a characteristic configuration in the packet data creation unit, and differs from the multiplexing apparatuses according to the first and second embodiments in this respect. Hereinafter, this difference will be mainly described. Components that are the same as in Embodiments 1 and 2 are given the same reference numerals, and their description is omitted.
[0142]
FIG. 10 is a block diagram showing a functional configuration of the packet data creation unit of the multiplexing apparatus according to the third embodiment.
The packet data creation unit 130 is a processing unit that creates a packet data part (mdat) by interleaving and storing video sample entity data and audio sample entity data. It includes an mdat information acquisition unit 131, a video entity data reading unit 132, an audio entity data reading unit 133, and an interleave arrangement unit 134.
[0143]
The mdat information acquisition unit 131 is a processing unit that acquires the mdat information from the packet header creation unit 112 and outputs an instruction to read the actual data and reproduction time information to the other units constituting the packet data creation unit 130.
When the mdat information acquisition unit 131 acquires the mdat information from the packet header creation unit 112, it analyzes the mdat information and acquires playback time information indicating the playback start time and the playback end time of each video sample and audio sample. Based on the playback time information, it rearranges all video samples and audio samples included in the packet so that their playback start times are in ascending order.
[0144]
Then, in the rearranged order, starting from the sample with the earliest playback start time, the mdat information acquisition unit 131 outputs either a video read instruction, which instructs the video entity data reading unit 132 to read the entity data of a video sample, or an audio read instruction, which instructs the audio entity data reading unit 133 to read the entity data of an audio sample. The video read instruction includes pointer information indicating where in the first data storage unit 102 the entity data of the video sample is stored, and size information of the video sample. The audio read instruction includes pointer information indicating where in the second data storage unit 105 the entity data of the audio sample is stored, and size information of the audio sample.
[0145]
The video entity data reading unit 132 is a processing unit that acquires a video read instruction from the mdat information acquisition unit 131 and reads the video entity data from the first data storage unit 102. The video entity data reading unit 132 reads the video entity data from the first data storage unit 102 with reference to the pointer information and the size information included in the video read instruction, and outputs the read video entity data to the interleave arrangement unit 134.
[0146]
The audio entity data reading unit 133 is a processing unit that acquires an audio read instruction from the mdat information acquisition unit 131 and reads the audio entity data from the second data storage unit 105. The audio entity data reading unit 133 reads the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information included in the audio read instruction, and outputs the read audio entity data to the interleave arrangement unit 134.
[0147]
The interleave arrangement unit 134 is a processing unit that sequentially acquires the read video entity data and the read audio entity data output from the video entity data reading unit 132 and the audio entity data reading unit 133, in the order in which they are output, creates mdat data by arranging them in an interleaved manner, and outputs the mdat data to the packet combining unit 114.
[0148]
In the multiplexing apparatus according to the third embodiment provided with the packet data creation unit 130 configured as described above, a processing operation in which the packet data creation unit 130 creates mdat will be described in detail.
FIG. 11 is a flowchart showing the processing operation of the packet data creation unit 130.
[0149]
First, the packet data creation unit 130 obtains mdat information from the packet header creation unit 112 in the mdat information acquisition unit 131 (S301). The mdat information acquisition unit 131 analyzes the acquired mdat information and extracts sample pointer information, size information, and reproduction time information. Then, the mdat information acquisition unit 131 rearranges all the video samples and audio samples included in the packet so that the playback start times are in ascending order based on the extracted playback time information of the samples. Subsequently, the mdat information acquisition unit 131 outputs a video reading instruction including pointer information and size information of the extracted video sample to the video entity data reading unit 132 in order from the sample with the smallest playback start time according to the rearranged order. Alternatively, an audio reading instruction including pointer information and size information of the extracted audio sample is output to the audio entity data reading unit 133.
[0150]
On acquiring the video read instruction, the video entity data reading unit 132 reads the video entity data from the first data storage unit 102 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134. On acquiring the audio read instruction, the audio entity data reading unit 133 reads the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134 (S302).
[0151]
When the interleaving arrangement unit 134 receives the read entity data from the video entity data reading unit 132 and the audio entity data reading unit 133, the interleaving arrangement unit 134 sequentially arranges the received entity data in the order received (S303).
Here, the interleave arrangement unit 134 continues arranging the entity data (S303) until the arrangement of all the video entity data and audio entity data, that is, all of the entity data to be stored in one packet, is completed (No in S304).
[0152]
When the arrangement of all entity data to be stored in one packet is completed (Yes in S304), the interleave arrangement unit 134 outputs the arranged entity data as mdat data to the packet combining unit 114 (S305), and the processing operation for creating the mdat ends.
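The ordering produced by steps S301 to S305 can be sketched as follows. The function name `interleave`, the `(start_ms, payload)` tuples, and the rule that a video sample precedes an audio sample with the same playback start time are assumptions made for illustration; the tie-breaking rule is chosen because it places the matching audio sample directly after each video sample.

```python
def interleave(video, audio):
    """Build an mdat payload from samples sorted by playback start time.

    video, audio: lists of (start_ms, payload_bytes) tuples.
    """
    # Tag each sample with a media rank so that, for equal start times,
    # the video sample (rank 0) is placed before the audio sample (rank 1).
    tagged = [(start, 0, data) for start, data in video]
    tagged += [(start, 1, data) for start, data in audio]
    tagged.sort(key=lambda t: (t[0], t[1]))  # ascending playback start time
    return b"".join(data for _, _, data in tagged)


# Video samples of 500 ms and audio samples of 100 ms, both from 4000 ms.
video = [(4000, b"V1"), (4500, b"V2")]
audio = [(4000 + 100 * i, f"a{i + 1}".encode()) for i in range(10)]
print(interleave(video, audio))  # b'V1a1a2a3a4a5V2a6a7a8a9a10'
```

With these inputs, five audio samples follow each video sample, because one video sample spans the playback time of five audio samples.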
[0153]
The MP4 file extension unit created through the processing operation of the packet data creation unit 130 is suitable for random access reproduction in an optical disk device or the like that takes time to seek. The reason for this will be described with reference to FIG. 12 showing an outline of the data structure of the MP4 file extension created by the multiplexing apparatus according to the third embodiment.
[0154]
The MP4 file extension unit 240 shown in FIG. 12 is configured by arranging a plurality of packets, including packet 1, which stores content data for 4 to 8 seconds, packet 2, which stores content data for 8 to 12 seconds, and packet 3, which stores content data for 12 to 16 seconds.
[0155]
Each packet is composed of a moof 241 and an mdat 242. The moof 241 stores tfhd (V) and traf (V-1, V-2) relating to a video track, and tfhd (A) and traf (A-1, A-2) relating to an audio track. The sample entity data indicated by the header information stored in traf (V-1) and traf (A-1) is stored in mdat_1, and the sample entity data indicated by the header information stored in traf (V-2) and traf (A-2) is stored in mdat_2. In the mdat 242, the entity data of the video samples and the entity data of the audio samples are interleaved alternately and stored.
[0156]
In this case, in a random access process in which playback starts from the position where the playback time is 4 seconds on the playback device side, once the read pointer is moved to the head position of moof_1, moof_1 can be analyzed and the read pointer can then simply be advanced continuously, so that the entity data necessary for playback is acquired from mdat_1, which is contiguous with moof_1.
[0157]
That is, with the MP4 file extension unit 240, the playback device can realize random access playback with only one seek operation, which moves the read pointer to the head position of moof_1. The structure is therefore effective for devices in which seeking takes time, such as optical disk devices.
[0158]
Here, in the mdat 242, the entity data of the audio sample stored immediately after the entity data of a video sample has its playback start time aligned with that of the immediately preceding video sample, so that synchronized playback of video data and audio data is guaranteed. FIG. 13 shows a state in which entity data is stored in mdat_1 of the MP4 file extension unit 240.
[0159]
As shown in FIG. 13, the playback start time of the video sample 1 stored at the head of mdat_1 is 4000 ms, and the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4000 ms. The playback start times of the video sample 1 and the audio sample 1 are the same.
[0160]
Usually, the sample rate of the video sample and the audio sample is often different, and here, the playback time length of the video sample is 500 ms, and the playback time length of the audio sample is 100 ms.
Therefore, in mdat_1 of the MP4 file extension unit 240, audio samples 1 to 5 are interleaved and stored immediately after the video sample 1, followed by the video sample 2, the audio samples 6 to 10, the video sample 3, and so on, in that order.
[0161]
At this time, the playback start time of the video sample 2 is 4500 ms, and the playback start time of the audio sample 6 stored immediately after the video sample 2 is also 4500 ms. The playback start times of each video sample and the audio sample stored immediately after it are thus always aligned so as to be the same.
[0162]
Further, since the sample rates of the video sample and the audio sample are different, the playback start time of a video sample and the playback start time of the audio sample immediately after it may not be the same. Even in such a case, synchronized playback of video data and audio data can be ensured by storing, immediately after the video sample, an audio sample whose playback start time approximates the playback start time of the video sample.
[0163]
FIG. 14 is a diagram illustrating a second data structure showing a state in which entity data is stored in mdat_1 of the MP4 file extension unit.
As shown in FIG. 14, the playback start time of the video sample 1 stored at the head of mdat_1 of the MP4 file extension unit 250 is 4000 ms, and the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4050 ms. That is, the audio sample 1, which has the earliest playback start time after the playback start time of the video sample 1, is stored as the audio sample immediately after the video sample 1.
[0164]
Here, similarly to the case described above, the playback time length of the video sample is 500 ms, and the playback time length of the audio sample is 100 ms.
Therefore, in mdat_1 of the MP4 file extension unit 250, the audio samples 1 to 5 are interleaved and stored immediately after the video sample 1, followed by the video sample 2, the audio samples 6 to 10, the video sample 3, and so on, in that order.
[0165]
At this time, the playback start time of the video sample 2 is 4500 ms, and the playback start time of the audio sample 6 stored immediately after the video sample 2 is 4550 ms. The playback start times of each video sample and the audio sample stored immediately after it are thus always aligned so as to approximate each other.
[0166]
Here, as the audio sample stored immediately after a video sample, the audio sample having the latest playback start time before the playback start time of the video sample may be stored instead. In this case, the audio sample 1 stored immediately after the video sample 1 has a playback start time of 3950 ms.
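The two selection rules described for FIG. 14 and for the alternative above can be sketched as a lookup over sorted audio start times. The function name `pick_audio` and the policy names "after" and "before" are hypothetical labels introduced here; when an audio sample starts at exactly the video start time, both policies in this sketch return that sample.

```python
import bisect


def pick_audio(audio_starts, video_start_ms, policy="after"):
    """Return the index of the audio sample to store right after a video sample.

    audio_starts: playback start times in ms, sorted in ascending order.
    Returns None when no audio sample satisfies the policy.
    """
    if policy == "after":
        # Earliest audio start time at or after the video start time.
        i = bisect.bisect_left(audio_starts, video_start_ms)
        return i if i < len(audio_starts) else None
    if policy == "before":
        # Latest audio start time at or before the video start time.
        i = bisect.bisect_right(audio_starts, video_start_ms) - 1
        return i if i >= 0 else None
    raise ValueError(f"unknown policy: {policy}")


starts = [3950, 4050, 4150]
print(starts[pick_audio(starts, 4000, "after")])   # 4050
print(starts[pick_audio(starts, 4000, "before")])  # 3950
```

The two printed values correspond to the 4050 ms and 3950 ms cases given in the text for a video sample starting at 4000 ms.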
[0167]
As described above, according to the multiplexing apparatus of the third embodiment, an audio sample having a playback start time that is the same as or close to the playback start time of a video sample is arranged immediately after that video sample, and the video samples and the audio samples are interleaved in ascending order of playback start time and stored in the mdat. Therefore, an MP4 file extension unit can be created with a data structure that allows quick random access even in a playback device with a low seek speed.
[0168]
(Embodiment 4)
Subsequently, a demultiplexing apparatus according to Embodiment 4 of the present invention will be described with reference to FIG. 15 and FIG.
FIG. 15 is a block diagram showing a functional configuration of the demultiplexing apparatus according to the fourth embodiment.
The demultiplexing apparatus 300 is an apparatus that acquires and analyzes MP4 file data including the MP4 file extension unit created by the multiplexing apparatus according to the first, second, or third embodiment, demultiplexes the media data, and plays it back. It includes a file input unit 301, a file data storage unit 302, a header separation analysis unit 303, a moov analysis unit 304, a moof analysis unit 305, a traf analysis unit 306, a trun analysis unit 307, an RA search unit 308, and a sample acquisition unit 309.
[0169]
The file input unit 301 is an interface for acquiring MP4 file data, and causes the file data storage unit 302 to store the acquired MP4 file input data sequentially.
The file data storage unit 302 is a cache memory or RAM that temporarily holds MP4 input data.
[0170]
The header separation analysis unit 303 is a processing unit that reads and analyzes the MP4 file header data from the MP4 input data held in the file data storage unit 302, separates it into the moov data, which is the header of the MP4 file basic part, and the moof data, which is the header of the extension part, and outputs them to the moov analysis unit 304 and the moof analysis unit 305, respectively. It is realized by a CPU or a memory.
[0171]
The moov analysis unit 304 is a processing unit that analyzes the moov of the MP4 file and acquires media information necessary for the analysis of the media data, such as the encoding rate of the media data and the playback time length of the content. It is realized by a CPU or a memory. The moov analysis unit 304 outputs the acquired media information to the moof analysis unit 305.
[0172]
The moof analysis unit 305 is a processing unit that analyzes the moof of the MP4 file based on the media information acquired from the moov analysis unit 304, and outputs the traf data, which is header data for each track, to the traf analysis unit 306. It is realized by a CPU or a memory.
[0173]
The traf analysis unit 306 is a processing unit that analyzes the traf of the MP4 file and outputs the trun data, which is header data for each sample included in the traf, to the trun analysis unit 307, and is realized by a CPU or a memory.
The trun analysis unit 307 is a processing unit that analyzes the trun of the MP4 file, acquires the information described in each field in the trun, and outputs the trun analysis information to the sample acquisition unit 309. It is realized by a CPU or a memory. The trun analysis information includes, for example, the size of the sample, data offset information indicating where the sample is stored in the file data storage unit 302, and, in the case of a video sample, flag information indicating whether it is an intra frame.
[0174]
Also, when the trun analysis unit 307 acquires, from the RA search unit 308 described below, a playback start instruction, which is an output signal that indicates the playback start position after random access and instructs the start of playback, it analyzes the truns sequentially, starting from the trun indicated by the playback start instruction, and outputs the trun analysis information to the sample acquisition unit 309.
[0175]
The RA search unit 308 is a processing unit that acquires target playback time information indicating the playback start time after random access, reads head sample information, that is, the playback start time of the first sample included in the first trun in the first traf that stores header information for the video track together with information indicating whether that sample is an intra frame, and searches for the video sample to serve as the playback start position after random access. It is realized by a CPU, a memory, and the like. When the RA search unit 308 acquires the target playback time information from the input device of the demultiplexer 300, which accepts random access instructions from the user, it sequentially acquires only the head sample information from the trun analysis unit 307, searches for a video sample whose playback start time is equal or close to the target playback time information, and outputs a playback start instruction to the trun analysis unit 307.
[0176]
The sample acquisition unit 309 is a processing unit that reads and decodes the actual data of a sample based on the trun analysis information, and outputs the playback data to a display device such as a display. When the sample acquisition unit 309 acquires the trun analysis information from the trun analysis unit 307, it refers to the data offset information included therein and reads the actual data of the sample from the file data storage unit 302. Here, the start of acquisition of the trun analysis information is treated as the instruction to start playback.
[0177]
A random access processing operation in the demultiplexing apparatus 300 configured as described above will be described with reference to FIG. 16.
FIG. 16 is a flowchart showing the random access processing operation of the demultiplexer 300. Prior to this flow, it is assumed that the demultiplexer 300 receives a random access instruction from the user via the input device.
[0178]
First, when the file input unit 301 acquires the data of the MP4 file created by the multiplexing device according to the first, second, or third embodiment (S400), the demultiplexing device 300 sequentially accumulates the data in the file data storage unit 302.
[0179]
Next, in the demultiplexing apparatus 300, the header separation analysis unit 303 separates and analyzes only the file header part of the MP4 file (S410) and further separates it into the basic part header and the extension part header. The moov analysis unit 304 then analyzes the basic part header, and the moof analysis unit 305 analyzes the extension part header (S420).
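The header separation of steps S410 and S420 can be sketched in Python as a walk over the top-level boxes of the file. This is a minimal sketch that assumes the ISO/IEC 14496-12 box header layout (a 32-bit size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 a box that runs to the end of the file); the function names are hypothetical and the whole file is assumed to be in memory.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level box in an MP4 byte string."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:    # a 64-bit size follows the type field
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - pos
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {pos}")
        yield box_type.decode("ascii", "replace"), pos, size
        pos += size


def split_headers(data: bytes):
    """Return the offset of the moov box and the offsets of all moof boxes."""
    moov, moofs = None, []
    for box_type, offset, _size in iter_top_level_boxes(data):
        if box_type == "moov":
            moov = offset
        elif box_type == "moof":
            moofs.append(offset)
    return moov, moofs
```

The moov offset would be handed to the analysis of the basic part header, and each moof offset to the analysis of one extension part header; the mdat boxes are skipped without being read.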
[0180]
Subsequently, in the demultiplexing apparatus 300, the moof analysis unit 305 further separates the extension part header into headers for each track, and the traf analysis unit 306 analyzes the track fragment, that is, the traf (S430). At this time, the traf analysis unit 306 further separates the track fragment, and the trun analysis unit 307 analyzes the trun.
[0181]
Here, when the target playback time information has been input to the RA search unit 308, the demultiplexing apparatus 300 outputs the head sample information from the trun analysis unit 307 to the RA search unit 308, and the RA search unit 308 determines whether the head sample information indicates a playback start time that is equal or close to the target playback time information (S440).
[0182]
If the target sample is not found (No in S450), the RA search unit 308 acquires the head sample information in the extension part header arranged next in storage order in the file, and again determines whether that head sample information indicates a playback start time that is equal or close to the previously acquired target playback time information (S440).
[0183]
On the other hand, if the target sample is found (Yes in S450), the RA search unit 308 generates a playback start instruction and outputs it to the trun analysis unit 307. On receiving the playback start instruction from the RA search unit 308, the trun analysis unit 307 outputs the trun analysis information to the sample acquisition unit 309 in order, starting from the trun designated by the playback start instruction. Here, the trun designated by the playback start instruction is the trun that includes the sample from which the RA search unit 308 has instructed playback to start.
[0184]
Thereafter, in the demultiplexer 300, the sample acquisition unit 309 refers to the data offset information included in the trun analysis information, acquires the actual data of the target sample from the file data storage unit 302 (S460), decodes it, and outputs the playback data, whereupon the random access processing operation ends.
[0185]
As described above, with the demultiplexing apparatus 300 according to the fourth embodiment, when random access playback is performed on an MP4 file that includes an MP4 file extension part generated by the multiplexing apparatus according to the first, second, or third embodiment, the video sample to serve as the playback start position after random access can be determined by searching only the video sample stored at the head of each packet. The load of the sample search is therefore greatly reduced.
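The search over head samples can be sketched as follows. This Python sketch assumes that one head sample record has already been extracted per packet; the `HeadSampleInfo` structure and the function name are hypothetical. The description only requires a playback start time "equal or close to" the target, so the sketch interprets this as the intra head sample with the smallest absolute time difference, and it scans every packet rather than stopping at the first match as the flowchart loop does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HeadSampleInfo:
    """Head sample information for one packet (hypothetical structure)."""
    moof_index: int   # position of the packet in storage order
    start_time: int   # playback start time of the first video sample, in ticks
    is_intra: bool    # True if the first video sample is an intra frame


def search_random_access_point(heads: Sequence[HeadSampleInfo],
                               target_time: int) -> Optional[HeadSampleInfo]:
    """Return the intra head sample whose start time is closest to target_time."""
    best = None
    for head in heads:  # one comparison per packet, not per sample
        if not head.is_intra:
            continue
        if best is None or abs(head.start_time - target_time) < abs(best.start_time - target_time):
            best = head
    return best
```

The cost is proportional to the number of packets, because samples other than the head sample of each packet are never inspected.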
[0186]
(Application example)
Here, an application example of the multiplexing apparatus according to the present invention will be described with reference to FIG. 17.
FIG. 17 is a diagram illustrating an application example of the multiplexing device according to the present invention.
The multiplexing device according to the present invention can be applied to a cellular phone 403 with a recording function and a personal computer 404 that acquire and multiplex media data such as video data and audio data and create MP4 file data. Further, the demultiplexing apparatus according to the present invention can be applied to a mobile phone 407 that reads and reproduces the created MP4 file data.
[0187]
Here, the MP4 file data created by the cellular phone with recording function 403 or the personal computer 404 is stored on a recording medium such as the SD memory card 405 or the DVD-RAM 406, or is stored in the image distribution server 401 via the communication network 402 and transmitted from the image distribution server 401 to another mobile phone 407 or the like.
[0188]
As described above, the multiplexing device and the demultiplexing device according to the present invention are used as an MP4 file creation device or playback device in an image distribution system or the like.
The multiplexing apparatus and the demultiplexing apparatus according to the present invention have been described above based on the respective embodiments, but the present invention is not limited to these embodiments.
[0189]
For example, in each of the above embodiments, MPEG-4 Visual encoded data is used as the video data, but data encoded by another moving image compression encoding scheme, such as MPEG-4 AVC (Advanced Video Coding) or H.263, may be used as the video data. In MPEG-4 AVC and H.263 encoded data, one picture corresponds to one sample.
[0190]
Similarly, MPEG-4 Audio encoded data is used as the audio data, but encoded data according to another audio compression encoding method such as G.726 may be used as the audio data.
Each of the above embodiments has been described for video data and audio data. However, even when text data or the like is included, the effects of the present invention can be obtained by processing it in the same manner as the packetization of the audio data.
[0191]
Further, in the second embodiment, when packetization is performed for each intra frame, the time reference unit adjustment unit 120 may be omitted from the components of the packet unit determination unit 117, and the processing of step S205 in the corresponding flowchart may also be omitted.
[0192]
Furthermore, in the third embodiment, when the MP4 file is to be played back in accordance with a buffer model set in advance on the playback device side, the video sample data and the audio sample data may be interleaved and stored in the mdat so as to satisfy the buffer model. Here, a buffer model is a model which guarantees that, when encoded data is input in accordance with the conditions defined in a standard, a playback apparatus equipped with a buffer of the size defined in the standard can decode the data without the buffer becoming empty (underflow) and without data overflowing from the buffer (overflow).
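As a rough illustration of what satisfying a buffer model means, the following Python sketch checks a decoding schedule against a very simple model. Everything in it is an assumption made for illustration: data is taken to arrive at a constant rate from time 0, each sample is removed from the buffer instantaneously at its decode time, and the function name and return values are hypothetical. Actual standards define their buffer models in more detail.

```python
def check_buffer_model(samples, buffer_size, input_rate):
    """Classify a decode schedule as 'ok', 'underflow', or 'overflow'.

    samples: list of (decode_time, size) tuples in decoding order.
    buffer_size: buffer capacity, in the same unit as size.
    input_rate: amount of data delivered per unit of time.
    """
    total = sum(size for _, size in samples)
    consumed = 0  # data already removed from the buffer by decoding
    for decode_time, size in samples:
        # Data delivered so far, capped at the total amount in the stream.
        delivered = min(input_rate * decode_time, total)
        if delivered < consumed + size:
            return "underflow"  # the sample has not fully arrived yet
        if delivered - consumed > buffer_size:
            return "overflow"   # occupancy just before decoding exceeds capacity
        consumed += size
    return "ok"
```

An interleaving order for the mdat could be validated with such a check before the packet data is written.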
[0193]
In the first, second, and third embodiments, the number of trafs stored in the moof of the extension part of the MP4 file to be created is not specified, but it is preferable to store one traf per track in each moof. In this way, analyzing only the first traf for each track in the moof yields the header information for all samples of that track stored in the moof, which further improves the efficiency of acquiring header information.
[0194]
Furthermore, in Embodiments 1, 2, and 3, the entity data of the samples whose header information is stored in a moof of the extension part of the MP4 file to be created is stored in one mdat that follows the moof, but it may instead be divided and stored in a plurality of mdats that follow the moof. More specifically, the entity data of the samples whose header information is stored in moof_1 may be stored in mdat_1, mdat_2, and mdat_3 in that order, and the entity data of the samples whose header information is stored in moof_2 may be stored in mdat_4, mdat_5, and mdat_6 in that order.
[0195]
In Embodiments 2 and 3, when a packet includes an intra frame of the moving image data, the intra frame is arranged at the head of the packet. However, as long as random access is possible, a video sample other than an intra frame, such as a P (Predictive) frame or a B (Bidirectionally predictive) frame, may be arranged at the head of the packet. This is described below, taking as an example the case where MPEG-4 AVC encoded data is used as the video data.
[0196]
In MPEG-4 AVC, a correct decoding result cannot always be obtained even when decoding starts from an intra picture. More specifically, MPEG-4 AVC has two types of intra pictures: IDR (Instantaneous Decoder Refresh) pictures and other intra pictures (hereinafter referred to as non-IDR intra pictures). When decoding starts from an IDR picture, a correct decoding result is always obtained, but when decoding starts from a non-IDR intra picture, a correct decoding result may not be obtained for that non-IDR intra picture and for a plurality of pictures that follow it in display order.
[0197]
Therefore, in MPEG-4 AVC, auxiliary information (Recovery Point Supplemental Enhancement Information, hereinafter referred to as "Recovery Point SEI") can be added to indicate the picture from which decoding should start in order to obtain a correct decoding result from a non-IDR intra picture onward.
[0198]
For example, suppose that five pictures, Pic_1, Pic_2, Pic_3, Pic_4, and Pic_5, are included in the video data in this order, that Pic_5 is a non-IDR intra picture, and that decoding must start from Pic_1 in order to correctly decode Pic_5 and the pictures that follow Pic_5 in display order. In this case, placing a Recovery Point SEI immediately before Pic_1 indicates that decoding must start from Pic_1 in order to correctly decode Pic_5, which is four pictures after Pic_1 in storage order in the video data, and the pictures that follow it in display order.
[0199]
That is, in this case, Pic_1 can be regarded as a randomly accessible sample. Therefore, for MPEG-4 AVC encoded data, the sample of an IDR picture or of a picture to which a Recovery Point SEI is added may be treated as a randomly accessible sample and arranged at the head of a packet. Note that a Recovery Point SEI can also be added to pictures other than intra pictures.
[0200]
At this time, by storing in the same packet the sample of the picture to which the Recovery Point SEI is added and the sample of the picture from which a correct decoding result is obtained when decoding starts from the picture to which the Recovery Point SEI is added, the amount of processing required to acquire the sample data can be reduced.
[0201]
Furthermore, the sample of an IDR picture and the sample of a picture to which a Recovery Point SEI is added can be distinguished by a specific flag value (hereinafter referred to as the non-sync sample flag) in the head sample flag 930 or the sample flag 935. In MP4, the non-sync sample flag can be set to 0 only for those randomly accessible samples in which the sample from which decoding starts coincides with the sample from which a correct decoding result is obtained. For this reason, the sample of an IDR picture can be identified by setting its non-sync sample flag to 0, and the sample of a picture to which a Recovery Point SEI is added by setting its non-sync sample flag to 1.
[0202]
The identification method described above is not limited to IDR pictures and pictures to which a Recovery Point SEI is added; it can distinguish randomly accessible samples with different properties in general. In practice, it can be used as follows.
[0203]
The first is a case where fast-forward playback is performed by playing back only a specific sample. At this time, since it is desirable that the decoded sample can be displayed immediately, only the sample whose non-sync sample flag is 0 is decoded and reproduced.
[0204]
The second is a case where playback is started from the middle of the content, or where a specific section is skipped and playback resumes from the next section. In this case, the sample from which decoding starts and the sample from which a correct decoding result is obtained may differ only at the start of playback. Playback can therefore be started from either a sample whose non-sync sample flag is 0 or a randomly accessible sample whose non-sync sample flag is 1.
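The two uses can be sketched in Python as two filters over the sample list. The `Sample` structure is hypothetical: `non_sync` holds the value of the non-sync sample flag, and `random_accessible` stands for whatever identification the file provides for randomly accessible samples, such as a sample placed at the head of a packet. How that identification is carried is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    index: int
    non_sync: int            # value of the non-sync sample flag (0 or 1)
    random_accessible: bool  # hypothetical: IDR or Recovery Point SEI picture


def fast_forward_samples(samples: List[Sample]) -> List[int]:
    """First use: only samples that can be displayed immediately after decoding."""
    return [s.index for s in samples if s.non_sync == 0]


def playback_start_candidates(samples: List[Sample]) -> List[int]:
    """Second use: any randomly accessible sample may serve as a start point."""
    return [s.index for s in samples if s.non_sync == 0 or s.random_accessible]
```

With this split, fast-forward playback touches only IDR picture samples, whereas a seek may also land on a picture to which a Recovery Point SEI is added and decode forward until correct output is reached.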
[0205]
Note that such a storage method is not limited to the Recovery Point SEI of MPEG-4 AVC and can be applied generally whenever the sample from which decoding starts differs from the sample from which a correct decoding result is obtained. For example, it can be applied to a structure such as an Open GOP (Group Of Pictures) in MPEG-2 Video.
[0206]
Further, when identification information indicating that a sample is randomly accessible is added, the sample indicated as randomly accessible by the identification information may be arranged at the head of the packet.
[0207]
[Effects of the invention]
As is apparent from the above description, according to the multiplexing device of the present invention, the image data, audio data, and text data included in the media data are stored in packets with their playback start times aligned, which makes data access during playback on the playback device side more efficient.
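The alignment of playback start times can be illustrated with a short Python sketch. It assumes the variant in which the audio sample placed at the head of a packet is the one whose playback start time is at or after, and closest to, that of the video sample at the head of the packet. The function names are hypothetical, the audio start times are assumed to be sorted in ascending order, and the first video head sample is assumed to start no later than the first audio sample.

```python
from bisect import bisect_left


def first_audio_at_or_after(audio_start_times, video_head_time):
    """Index of the first audio sample starting at or after the video head sample."""
    i = bisect_left(audio_start_times, video_head_time)
    return i if i < len(audio_start_times) else None


def split_audio_by_video_heads(audio_start_times, video_head_times):
    """Group audio sample indices into packets delimited by the video head times."""
    cuts = [first_audio_at_or_after(audio_start_times, t) for t in video_head_times]
    cuts = [len(audio_start_times) if c is None else c for c in cuts]
    cuts.append(len(audio_start_times))
    return [list(range(cuts[k], cuts[k + 1])) for k in range(len(video_head_times))]
```

For example, with audio samples starting at 0, 20, 40, 60, 80, and 100 and video head samples at 0 and 50, the second packet begins with the audio sample that starts at 60.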
[0208]
In addition, by making the first video sample included in a packet an intra-frame video sample, the amount of computation required for the sample search during random access on the playback device side can be significantly reduced.
In addition, because the video samples and audio samples included in a packet are stored in ascending order of playback start time, the number of seek operations during random access on the playback device can be reduced, and multiplexing that enables quick random access playback can be realized even on a playback device with a slow seek speed.
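Storing the samples of a packet in ascending order of playback start time amounts to merging the per-track sample lists. A minimal Python sketch follows, assuming that each track's samples are already sorted and are represented as hypothetical `(start_time, label)` tuples.

```python
import heapq


def interleave_by_start_time(video, audio):
    """Merge two sorted lists of (start_time, label) tuples into one ascending list."""
    return list(heapq.merge(video, audio, key=lambda sample: sample[0]))
```

A reader that consumes the resulting list front to back never has to seek backward within the packet data.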
[Brief description of the drawings]
FIG. 1 is a block diagram showing a functional configuration of a multiplexing apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing a processing operation of the multiplexing apparatus.
FIG. 3 is a flowchart showing a processing operation of a video packet unit determination unit.
FIG. 4 is a flowchart showing a processing operation of an audio packet unit determination unit.
FIG. 5A is a diagram illustrating a first example of the data structure of an MP4 file extension part created by the multiplexing device, and FIG. 5B is a diagram illustrating a second example of the data structure of an MP4 file extension part created by the multiplexing device.
FIG. 6 is a block diagram showing a functional configuration of a packet unit determination unit of the multiplexing apparatus according to the second embodiment.
FIG. 7 is a flowchart showing a first processing operation of a video packet unit determination unit.
FIG. 8 is a flowchart showing a second processing operation of the video packet unit determination unit.
FIG. 9A is a diagram illustrating a first example of the data structure of an MP4 file extension part created by the multiplexing device, and FIG. 9B is a diagram illustrating a second example of the data structure of an MP4 file extension part created by the multiplexing device.
FIG. 10 is a block diagram showing a functional configuration of a packet data creation unit of the multiplexing apparatus according to the third embodiment.
FIG. 11 is a flowchart showing a processing operation of a packet data creation unit.
FIG. 12 is a diagram showing an outline of a data structure of an MP4 file extension created by a multiplexing device.
FIG. 13 is a diagram illustrating a first example of a data structure of an MP4 file extension created by the multiplexing device.
FIG. 14 is a diagram illustrating a second example of the data structure of the MP4 file extension created by the multiplexing device.
FIG. 15 is a block diagram showing a functional configuration of a demultiplexing apparatus according to the fourth embodiment.
FIG. 16 is a flowchart showing the processing operation of the demultiplexer.
FIG. 17 is a diagram illustrating an application example of a multiplexing device according to the present invention.
FIG. 18 is a diagram for explaining the structure of a box constituting a conventional MP4 file.
FIG. 19 is a diagram for explaining a basic part of a conventional MP4 file.
FIG. 20A is a diagram for explaining the structure of a movie box in a conventional MP4 file, and FIG. 20B is a diagram showing the structure of a movie box in a conventional MP4 file as a tree.
FIG. 21 is a diagram illustrating a structure of a conventional MP4 file including an extension unit.
FIG. 22 is a diagram for explaining the structure of a conventional movie fragment box.
FIG. 23 is a diagram for explaining the structure of a conventional track fragment run box.
FIG. 24A is a diagram illustrating a first configuration example of a conventional MP4 file including an extension part, and FIG. 24B is a diagram illustrating a second configuration example of a conventional MP4 file including an extension part.
FIG. 25 is a block diagram showing a configuration of a conventional multiplexing apparatus.
FIG. 26 is a flowchart showing a processing operation of a conventional packet unit determination unit.
FIG. 27 is a diagram showing an example of a packet creation table for storing conventional header information of video samples.
FIG. 28A is a diagram for explaining a first problem of the conventional multiplexing device, and FIG. 28B is a diagram for explaining a second problem of the conventional multiplexing device.
[Explanation of symbols]
100, 960 Multiplexer
101, 961 First input unit
102, 962 First data storage unit
103, 963 First analysis unit
104, 964 Second input unit
105, 965 Second data storage unit
106, 966 Second analysis unit
107, 117, 967 Packet unit determination unit
108 Time adjustment unit
109, 119 Video packet unit determination unit
110 Audio packet unit determination unit
111, 968 Packet creation table storage unit
112, 969 Packet header creation unit
113, 130, 970 Packet data creation unit
114, 971 Packet combining unit
120 Time reference unit adjustment unit
121 I frame reference unit adjustment unit
131 mdat information acquisition unit
132 Video entity data reading unit
133 Audio entity data reading unit
134 Interleave arrangement unit
200, 210, 220, 230, 240, 250 MP4 file extension
241, 923, 946, 948, 955, 957 Movie fragment box
242, 916, 945, 947, 949, 956, 958 Movie data box
300 Demultiplexer
301 File input unit
302 File data storage unit
303 Header separation analysis unit
304 moov analysis unit
305 moof analysis unit
306 traf analysis unit
307 trun analysis unit
308 RA search unit
309 Sample acquisition unit
401 Image distribution server
402 Communication network
403 Mobile phone with recording function
404 Personal computer
405 SD memory card
406 DVD-RAM
407 Mobile phone
901 Box
902 Box header part
903 Box data storage part
904 Box size
905 Box type
906 Version
907 Flag
910, 920, 940, 950 MP4 file
911, 941, 951 Basic part
912 File header
913 File data part
914, 943, 953 File type box
915, 944, 954 Movie box
917 Movie header box
918 Track box
919 Track header box
921, 942, 952 Extension part
922 Packet
924 Movie fragment header box
925 Track fragment box
926 Track fragment header box
927 Track fragment run box
928 Sample count
929 Data offset
930 Head sample flag
931 Table
932 Entry
933 Sample duration
934 Sample size
935 Sample flag
936 Sample composition time offset
968a Packet creation table

Claims

A multiplexing apparatus that packet-multiplexes media data including image data and at least one of audio data and text data to create multiplexed data, the multiplexing apparatus comprising:
media data acquisition means for acquiring the media data;
analysis means for analyzing the media data acquired by the media data acquisition means and acquiring, for each sample that is the minimum access unit of the image data, audio data, and text data included in the media data, reproduction start time information indicating the reproduction start time of the sample;
packet unit determining means for determining a unit for packetizing the media data, based on the reproduction start time information acquired by the analysis means, by aligning the reproduction start times of the samples of the image data, audio data, and text data included in the media data;
packet header part creating means for creating a packet header part that stores a header of the media data in the packetization unit determined by the packet unit determining means;
packet data part creating means for creating a packet data part that stores the actual data of the media data in the packetization unit determined by the packet unit determining means; and
packetizing means for creating a packet by combining the packet header part created by the packet header part creating means and the packet data part created by the packet data part creating means,
wherein the packet unit determining means determines the unit by aligning the reproduction start times of the samples of the image data, audio data, and text data for all packets necessary for storing the media data.

The multiplexing apparatus according to claim 1, wherein the packet unit determining means aligns the reproduction start times of the audio data and text data samples arranged at the head of the packetization unit with the reproduction start time of the image data sample arranged at the head of the packetization unit.

The multiplexing apparatus according to claim 2, wherein the packet unit determining means uses, as the audio data and text data samples arranged at the head of the packetization unit, the samples whose reproduction start times are after the reproduction start time of the image data sample arranged at the head of the packetization unit and closest to the reproduction start time of that image data sample.

The multiplexing apparatus according to claim 2, wherein the packet unit determining means uses, as the audio data and text data samples arranged at the head of the packetization unit, the samples whose reproduction start times are before the reproduction start time of the image data sample arranged at the head of the packetization unit and closest to the reproduction start time of that image data sample.

The multiplexing apparatus, wherein
the image data is moving image data,
the analysis means further analyzes the moving image data acquired by the media data acquisition means and, when the moving image data includes one or more samples having intra frame information indicating that the sample is an intra-coded sample, acquires the intra frame information, and
the packet unit determining means determines the unit for packetizing the media data based on the intra frame information and the reproduction start time information when the analysis means has acquired the intra frame information.

The multiplexing apparatus according to claim 5, wherein the packet unit determining means arranges the sample of the moving image data including the intra frame information at the head of the packetization unit.

The multiplexing apparatus according to claim 6, wherein the packet unit determining means aligns the reproduction start times of the audio data and text data samples arranged at the head of the packetization unit with the reproduction start time of the moving image data sample including the intra frame information arranged at the head of the packetization unit.

The multiplexing apparatus according to claim 1, wherein the packet data part creating means creates the packet data part in which the samples of the media data included in the packetization unit are interleaved and stored so that the reproduction start times of the samples are in ascending order.

The multiplexing apparatus according to claim 8, wherein the packet data part creating means creates the packet data part in which the samples of the media data included in the packetization unit are interleaved and stored so as to satisfy a predetermined rule.

A multiplexing method for creating multiplexed data by packet-multiplexing media data including image data and at least one of audio data and text data, the multiplexing method comprising:
a media data acquisition step of acquiring the media data;
an analysis step of analyzing the media data acquired in the media data acquisition step and acquiring, for each sample that is the minimum access unit of the image data, audio data, and text data included in the media data, reproduction start time information indicating the reproduction start time of the sample;
a packet unit determining step of determining a unit for packetizing the media data, based on the reproduction start time information acquired in the analysis step, by aligning the reproduction start times of the samples of the image data, audio data, and text data included in the media data;
a packet header part creating step of creating a packet header part that stores a header of the media data in the packetization unit determined in the packet unit determining step;
a packet data part creating step of creating a packet data part that stores the actual data of the media data in the packetization unit determined in the packet unit determining step; and
a packetizing step of creating a packet by combining the packet header part created in the packet header part creating step and the packet data part created in the packet data part creating step,
wherein, in the packet unit determining step, the unit is determined by aligning the reproduction start times of the samples of the image data, audio data, and text data for all packets necessary for storing the media data.

The multiplexing method according to claim 10, wherein, in the packet unit determining step, the reproduction start times of the audio data and text data samples arranged at the head of the packetization unit are aligned with the reproduction start time of the image data sample arranged at the head of the packetization unit.

The multiplexing method, wherein
the image data is moving image data,
in the analysis step, the moving image data acquired in the media data acquisition step is further analyzed and, when the moving image data includes one or more samples having intra frame information indicating that the sample is an intra-coded sample, the intra frame information is acquired, and
in the packet unit determining step, the unit for packetizing the media data is determined based on the intra frame information and the reproduction start time information when the intra frame information has been acquired in the analysis step.

The multiplexing method according to claim 12, wherein, in the packet unit determining step, the sample of the moving image data including the intra frame information is arranged at the head of the packetization unit.

The multiplexing method according to claim 13, wherein, in the packet unit determining step, the reproduction start times of the audio data and text data samples arranged at the head of the packetization unit are aligned with the reproduction start time of the moving image data sample including the intra frame information arranged at the head of the packetization unit.

The multiplexing method according to claim 10, wherein, in the packet data part creating step, the packet data part is created in which the samples of the media data included in the packetization unit are interleaved and stored so that the reproduction start times of the samples are in ascending order.

A program for a multiplexing device that creates multiplexed data by packet-multiplexing media data including image data and at least one of audio data and text data, the program causing a computer to execute each step of a multiplexing method comprising:
a media data acquisition step of acquiring the media data;
an analysis step of analyzing the media data acquired in the media data acquisition step and acquiring, for each sample that is the minimum access unit of the image data, audio data, and text data included in the media data, reproduction start time information indicating the reproduction start time of the sample;
a packet unit determining step of determining a unit for packetizing the media data, based on the reproduction start time information acquired in the analysis step, by aligning the reproduction start times of the samples of the image data, audio data, and text data included in the media data;
a packet header part creating step of creating a packet header part that stores a header of the media data in the packetization unit determined in the packet unit determining step;
a packet data part creating step of creating a packet data part that stores the actual data of the media data in the packetization unit determined in the packet unit determining step; and
a packetizing step of creating a packet by combining the packet header part created in the packet header part creating step and the packet data part created in the packet data part creating step,
wherein, in the packet unit determining step, the unit is determined by aligning the reproduction start times of the samples of the image data, audio data, and text data for all packets necessary for storing the media data.