JP2005045401A

JP2005045401A - Object generating apparatus and method thereof

Info

Publication number: JP2005045401A
Application number: JP2003201159A
Authority: JP
Inventors: Masahiko Takaku; 雅彦高久
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-07-24
Filing date: 2003-07-24
Publication date: 2005-02-17

Abstract

PROBLEM TO BE SOLVED: When a still image or a moving image is embedded in a scene by using a provided BIFS source, tool, or the like, the user must prepare an object whose size matches the intent of the provider of the BIFS source.

SOLUTION: A BIFS source pre-process module 501 acquires URL information 508 and size information 509 for the entity of an object from a scene description 505 used for encoding the scene. An object transcoder module 502 changes the size of a reference object 510 on the basis of the acquired URL information 508 and size information 509 and outputs a modified reference object 511.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はオブジェクト生成装置およびその方法に関し、例えば、シーン記述により画面中に配置すべき静止画、動画、コンピュータグラフィクスなどのオブジェクトの生成に関する。
【０００２】
【従来の技術】
ディジタルデータ処理能力の向上と通信技術の発展によって、いわゆるマルチメディアを簡単に表現することができるようになり、複数の静止画、動画、音声、ベクトルグラフィクスなどを用いた複雑な表現が一般的になった。このような複合的なマルチメディア（あるいはマルチオブジェクト）表現の方法として、オブジェクトベース符号化方式がある。この具体的な例が、ＭＰＥＧ−４システムと呼ばれる標準規格（ＩＳＯ／ＩＥＣ１４４９６−１）である（例えば、非特許文献１参照）。
【０００３】
ＩＳＯ／ＩＥＣ１４４９６−１では、ＭＰＥＧ−４のオーディオビジュアルオブジェクトを空間にどのように配置して表示すべきかを示す、ＢＩＦＳ（ＢｉｎａｒｙＦｏｒｍａｔｆｏｒＳｃｅｎｅｓ）と呼ばれるバイナリ形式が規格化されている。ＢＩＦＳは、オブジェクト記述子（ＯｂｊｅｃｔＤｅｓｃｒｉｐｔｏｒ）およびその特殊な形式である初期オブジェクト記述子（ＩｎｉｔｉａｌＯｂｊｅｃｔＤｅｓｃｒｉｐｔｏｒ）、さらに、音声や映像をエンコードしたＭＰＥＧ−４ビジュアルやＭＰＥＧ−４オーディオの定義によるビットストリームとともに、複合化されたいわゆるマルチメディアのシーンを合成する。
【０００４】
ＢＩＦＳは、その定義の根底にＶＲＭＬ（ＩＳＯ／ＩＥＣ１４７７２−１ＶｉｒｔｕａｌＲｅａｌｉｔｙＭｏｄｅｌｉｎｇＬａｎｇｕａｇｅ）を使用したもので、ＶＲＭＬを拡張したものと考えてよい。
【０００５】
規格の詳細はＩＳＯ／ＩＥＣ１４４９６−１に定義されているので、ここでは説明しないが、以下では、発明の本質をより簡潔に説明するため、ＶＲＭＬを基本としたＢＩＦＳの表現をＢＩＦＳソースとよび、テキスト形式で表現する。ＢＩＦＳのテキスト表現には、ＸＭＴ（ＥｘｔｅｎｓｉｂｌｅＭＰＥＧ−４Ｔｅｘｔｕａｌｆｏｒｍａｔ）とよばれるＸＭＬ形式の表現があり、以下では、このようなＸＭＴによる表現も含めて、明示的にＸＭＴと表現している場合を除き、ＢＩＦＳソースとする。すなわち、ＢＩＦＳソースは、オブジェクトベース符号化方式において、オブジェクトの配置や動作を定義するシーンを符号化するためのシーン記述方式の具体的な例である。
【０００６】
図１から３はＶＲＭＬを基本としたＢＩＦＳソースの記述例を示す図である。
【０００７】
先頭に記述された「＃ＶＲＭＬＶ２．０ｕｔｆ８」はコメントで、ＶＲＭＬの規約上定められたものであって発明と直接関係はないが、一般的な記述法の例として記述されている。なお「ｕｔｆ８」は、ＩＳＯ１０６４６を基準とするＵＴＦ−８エンコーディングによる国際文字の指定である。
【０００８】
図４は、図１から３に示すＢＩＦＳソースをエンコードした場合に、その結果として期待される表示の効果を模式的に示す図である。以下では、図１から３のＢＩＦＳソースの例と図４を用いて詳細に説明する。
【０００９】
図１に示す二つ目のコメント行「＃ −−− 背景画像 −−−」以下は、背景の静止画像を示し、ｕｒｌで指定されるオブジェクト「ｂａｃｋｇｒｏｕｎｄ．ＪＰＧ」が図４に示す背景画２０１になる。
【００１０】
同様に、三つ目のコメント行「＃ −−− 静止画：画像１ −−−」以下は図４に示す静止画２０２に、四つ目のコメント行「＃ −−− 動画：画像２ −−−」以下は図４に示す動画２０３にそれぞれ対応する。
【００１１】
図２に示すコメント行「＃ −−− 動作 −−−」と、図３に示すコメント行「＃ −−− イベント間の接続 −−−」以下の記述は、ポインティングデバイスによってオブジェクトが選択された際の動作などをそれぞれ定義し、例えば、図４の拡大エリア２０４に、選択されたオブジェクトが拡大表示される、といった動作を定義する。
【００１２】
以下では、図１から３のＢＩＦＳソース例についてごく簡単に説明する。
【００１３】
まず、Ｇｒｏｕｐ｛ … ｝の部分は、表示されるシーン全体をグループ化するノードを示す。次のｃｈｉｌｄｒｅｎ［ … ］はグループの子ノードの集合を示し、次のＴｒａｎｓｆｏｒｍ２Ｄ｛ … ｝で最初のノードの座標変換を表す。
【００１４】
Ｔｒａｎｓｆｏｒｍ２Ｄは、さらに子ノードを保持し、Ｓｈａｐｅ｛ … ｝によって特定のオブジェクトをシーンに取り込む。これは、Ｓｈａｐｅで指定されたオブジェクトをＴｒａｎｓｆｏｒｍ２Ｄによってシーン中の座標にマッピングすることと等価である。Ｓｈａｐｅは、さらにａｐｐｅａｒａｎｃｅＡｐｐｅａｒａｎｃｅ｛ … ｝とｇｅｏｍｅｔｒｙＢｉｔｍａｐ｛ … ｝を保持し、前者はＳｈａｐｅの視覚特性、後者はその具体的な表示を指定する。視覚特性としては、ｔｅｘｔｕｒｅＩｍａｇｅＴｅｘｔｕｒｅ｛ … ｝によって具体的な画像オブジェクトを指定し、そのオブジェクトを示すｕｒｌが記述される。ｓｃａｌｅやｔｒａｎｓｌａｔｉｏｎはそれぞれ倍率と座標を記述する。
【００１５】
ここで着目すべき点は、Ｔｒａｎｓｆｏｒｍ２Ｄとｇｅｏｍｅｔｒｙに記述されたｓｃａｌｅやｔｒａｎｓｌａｔｉｏｎの値である。ｓｃａｌｅの値は、ＲＯＵＴＥコマンドにより、オブジェクト選択時から一定の時間で、選択されたオブジェクトの拡大と縮小を行う。その動作は、例えばＤＥＦＰＩＣ１ＳＣＡＬＥ（図２）で定義されたＰｏｓｉｔｉｏｎＩｎｔｅｒｐｏｌａｔｏｒ２Ｄ｛ … ｝によって値が変化することで行われる。この例では、縦横が０．５（半分）から１．０（等倍）へと変化し、さらに元の０．５へと戻される。
【００１６】
同様に、ｔｒａｎｓｌａｔｉｏｎの値も変化してオブジェクトの位置が変更される。ｔｒａｎｓｌａｔｉｏｎは、対象になるオブジェクトの平行移動を意味し、この値が変化することによって、オブジェクトの位置が移動する。
【００１７】
以上では、何ら具体的な説明なしに、オブジェクトという言葉を使って説明を行ったが、このオブジェクトの位置と大きさについてより詳しく説明するため、さらに具体的な表現を用いることにする。
【００１８】
図１から３に示すＢＩＦＳソースでは、我々が目にする具体的なオブジェクトとして、背景画のｂａｃｋｇｒｏｕｎｄ．ＪＰＧ、静止画のｐｉｃｔｕｒｅ１．ｉｍｇ、および、動画のｐｉｃｔｕｒｅ２．ｍｏｖｉｅが使用されている。これらのオブジェクトは、ｕｒｌの記述の後にＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）形式で記述される。この表記は、よく知られたＲＦＣに定義されたものであり、かつ、ＶＲＭＬの定義形式であるので、ここでは詳しく説明しない。
【００１９】
これら三つのオブジェクトは、この例では、ｂａｃｋｇｒｏｕｎｄ．ＪＰＧが横５１２ピクセル×縦３８４ピクセル、ｐｉｃｔｕｒｅ１．ｉｍｇおよびｐｉｃｔｕｒｅ２．ｍｏｖｉｅが横３２０ピクセル×縦２４０ピクセルと仮定されている。上述したｓｃａｌｅは、拡大縮小の割合を示すから、各オブジェクトは、１．０（等倍）で表現された後、それらを包含する領域としての背景画ｂａｃｋｇｒｏｕｎｄ．ＪＰＧを除き、初期値で０．５、すなわち半分のサイズに縮小される。背景画ｂａｃｋｇｒｏｕｎｄ．ＪＰＧはそのままのサイズであるが、静止画と動画は、半分に縮小され、初期段階のサイズは横１６０ピクセル×縦１２０ピクセルである。また、縮小された静止画と動画の位置は、ｔｒａｎｓｌａｔｉｏｎによって座標（１６８、７６）と座標（１６８、−７６）に指定されるため、表示領域の右上と右下に移動される。
【００２０】
このようにして、図４に示すような具体的な位置表現が行われる。注意すべきことは、このようなＢＩＦＳソースによって定義された表示の効果は、ｔｅｘｔｕｒｅ内のｕｒｌで指定されたオブジェクトの絶対的な大きさを何ら規定しない状態で表現されることである。ＩＳＯ／ＩＥＣ１４４９６−１の定義は、その原型となったＶＲＭＬの定義と同様に、その座標系に、いわゆる右手系のデカルト三次元座標を使用し、表現されるテクスチャは、元の大きさに対する比率でのみ表現される。そのため、明示的に、この座標系の中に有限の大きさで定義されるオブジェクトを配置して、そのオブジェクトにテクスチャを貼り付けるといった記述を行わない限り、そのオブジェクトが表示される大きさは、オブジェクトしか知り得ないことになる。
【００２１】
図１から３に示すＢＩＦＳソースの例では、背景画２０１が横５１２ピクセル×縦３８４ピクセルで、かつ、静止画２０２が横６４０ピクセル×縦４８０ピクセルである場合、背景画２０１は、その位置は静止画２０２と完全には重ならないとは言え、いわゆるオーバラップされた状態になり隠されてしまう。このような表示は、おそらく図１から３に示すＢＩＦＳソースの作成者の意図とは異なる。言い換えれば、ＢＩＦＳソースの作成者は、表示されるオブジェクトの絶対的な大きさを認識してソースを記述しているから意図した表現が行われる。この例では、ＢＩＦＳソースの作成者は、背景画２０１を横５１２ピクセル×縦３８４ピクセル、静止画２０２と動画２０３を横３２０ピクセル×縦２４０ピクセルと認識していて、それによりＢＩＦＳソースの作成者の意図が正しく表現される。
【００２２】
ここまでは、図１から３に示すＢＩＦＳソースの例に従い、オブジェクトの配置を中心に説明したが、ＩＳＯ／ＩＥＣ１４４９６−１の定義によれば、そのＢＩＦＳの表示領域については、ＤｅｃｏｄｅｒＣｏｎｆｉｇＤｅｓｃｒｉｐｔｏｒ（復号器設定記述子）内のＤｅｃｏｄｅｒＳｐｅｃｉｆｉｃＩｎｆｏ（復号器固有情報）に保持されるＢＩＦＳＣｏｎｆｉｇにおいて定義される。ＢＩＦＳＣｏｎｆｉｇの定義は次のようなものである。

【００２３】
もし、ｈａｓＳｉｚｅが真（ｔｒｕｅ）にセットされていれば（すなわちビットが立っていれば）、ｐｉｘｅｌＷｉｄｔｈとｐｉｘｅｌＨｅｉｇｈｔによって、表示される具体的な表現（シーン）の領域サイズが指定されることになる。
【００２４】
このように、エンコードされたビットストリーム中にサイズが指定されることにより、ＢＩＦＳソースで指示された表現は、さらに、その表示領域が指定されることになる。
【００２５】
このようにして、ＢＩＦＳソースもしくはシーン記述と、これに記述されないビットストリーム中のデータ表現により、複数のオブジェクトが複雑に表現された、いわゆるマルチメディアのシーンを自在に表現することが可能である。
【００２６】
上述したように、ＢＩＦＳソースの作成者は、オブジェクトの配置と動作に関する定義、すなわちシーン記述を、シーン記述から参照される静止画や動画などの外部参照オブジェクトのサイズなどを予め判断し、意図した状態になるように設計しなければならない。しかし、現実には、それでも不充分な場合がある。任意のあらゆる参照オブジェクトに対して意図が反映されるようにシーン記述を設計することは、全く不可能であるとは言えないが、そのためには、シーン記述であるＢＩＦＳソースの作成者の充分な配慮が必須である。例えば、図１から３に例示したＢＩＦＳソースは、横３２０×縦２４０ピクセルのオブジェクトを参照する場合、図４に示すような表示が行われ得る。しかし、オブジェクトのサイズが異なれば、極端な場合、参照されるオブジェクトのほんの一部が表示されるだけになる。
【００２７】
また、ＢＩＦＳソースの作成者が予め特定の意図をもったＢＩＦＳソースを作成し、しかる後、第三者が参照されるオブジェクトを入れ替えると、ＢＩＦＳソースの意図どおりに表現されない場合がある。この問題は、例えばＭＰＥＧ−４の規格をよく知るデザイナがある意図をもってＢＩＦＳソースを設計し、しかる後、エンコードされたＢＩＦＳやＢＩＦＳソースと、ＢＩＦＳを生成するエンコーダを含むＭＰＥＧ−４データの簡易生成ツールを提供するといった場合に顕著に現れる。このような場合、提供されたＢＩＦＳやＢＩＦＳソース、あるいは、ツールなどを利用して、その利用者が欲する静止画や動画を埋め込むといったことが想定されるが、その利用者は、提供者の意図に合った大きさのオブジェクトを予め準備しなければならない。
【００２８】
上記の問題に対処するため、ＢＩＦＳソースに期待するオブジェクトのサイズなどをコメントとして埋め込む手法や、提供するツールに期待するオブジェクトのサイズへの変換機能を埋め込む、といったことが行われる。しかし、これらの手法は、ツールに固有の機能でのみ実現されたり、利用者がＢＩＦＳソースを読んで独自に判断する必要が生じるなど、不便である。また、ツールに固有の機能で実現する場合は、ＢＩＦＳ（もしくはＢＩＦＳソース）に独自の拡張を施したり、ＢＩＦＳとは異なる別情報を用意したりする必要があり、さらに互換性の問題も発生する。
【００２９】
【非特許文献１】
標準勧告書ＩＳＯ／ＩＥＣ１４４９６−１
【００３０】
【発明が解決しようとする課題】
本発明は、上述の問題を個々にまたはまとめて解決するもので、シーン記述の意図に合うオブジェクトの生成を目的とする。
【００３１】
【課題を解決するための手段】
本発明は、前記の目的を達成する一手段として、以下の構成を備える。
【００３２】
本発明は、オブジェクトベース符号化に関連するオブジェクトを生成する際に、シーンを符号化するためのシーン記述からオブジェクトの実体の位置を示す情報を取得し、その位置情報に関連するサイズ情報を取得し、取得した位置情報およびサイズ情報に基づき、参照すべきオブジェクトの大きさを変更することを特徴とする。
【００３３】
【発明の実施の形態】
以下、本発明にかかる実施形態の画像処理を図面を参照して詳細に説明する。
【００３４】
［概要］
以下では、ＢＩＦＳソース中にＩＳＯ／ＩＥＣ１４４９６−１の定義に矛盾しない記述を付与し、それによってＢＩＦＳソースの作成者の意図に合うＭＰＥＧ−４によるシーン表現を実現することが可能なＭＰＥＧ−４データを生成する実施形態を説明する。
【００３５】
実施形態において説明するＭＰＥＧ−４データ生成装置は、オブジェクトベース符号化によるＩＳＯ／ＩＥＣ１４４９６（ＭＰＥＧ−４）に関するオブジェクトデータ生成装置である。ここで説明するＭＰＥＧ−４データ生成装置は、細部においては異なる構成で表現される場合もあるが、ハードウェアで構成しても、コンピュータ装置にソフトウェアを供給するなどしてソフトウェアで構成しても実現可能で、その構成要素は、本発明の本質に合致する限り、組み替え、統合、再配置が可能である。
【００３６】
なお、以下の説明では、簡略化のため、図１から３に示したＢＩＦＳソースと、図４に示した期待される表示の効果を説明の前提にする。
【００３７】
［ＸＭＴによる記述］
図１から３においては、ＩＳＯ／ＩＥＣ１４４９６−１の定義におけるＢＩＦＳの基本型とでもいうべきＶＲＭＬを起点とするＢＩＦＳソースを例示した。ＶＲＭＬのような形式の場合、定義されない形式のノードや命令文を使用することは、暗黙のうちに受け入れることはできない。一方、ＩＳＯ／ＩＥＣ１４４９６−１Ａｍｅｎｄｍｅｎｔ２のＴｅｘｔｕａｌＦｏｒｍａｔには、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）を利用したＸＭＴ（ＥｘｔｅｎｓｉｂｌｅＭＰＥＧ−４Ｔｅｘｔｕａｌｆｏｒｍａｔ）が定義されているが、これはＸＭＬによる記述であるから自由に拡張することができる。
【００３８】
図５は、図１から３に示した「静止画：画像１」の部分に含まれるＳｈａｐｅのＸＭＴ記述例を示す図である。
【００３９】
図５に示す記述は、ＢＩＦＳソースの記述と論理的に等価で、もし図１から３に示すＢＩＦＳソース全体をＸＭＴで書き換えるならば、図４と同等の表示の効果が期待できる。しかし、ＸＭＴによる記述を、一般的なＸＭＴを解釈することが可能なＢＩＦＳエンコーダを介してエンコードし、ｕｒｌに指定されるオブジェクトをそのまま利用すれば、図１から３に示すＢＩＦＳソースの場合と同様、ＢＩＦＳソースの作成者が意図した表示の効果を期待できるとは限らない。つまり、このような記述の場合、ＵＲＬ参照されるオブジェクトのサイズはＢＩＦＳソースを作成した段階で仮定されていて、その仮定されたサイズと異なるサイズのオブジェクトを使用する場合の結果は配慮されていない。
【００４０】
実施形態は、ＸＭＴによるＢＩＦＳソースに記述を加え、それをＸＭＴの解釈時に利用して、ＸＭＴの規格を逸脱しない範囲でＢＩＦＳソースの作成者の意図を反映するものである。
【００４１】
図６は実施形態のＸＭＴの記述例を示す図である。なお、図６に記述されたＸＭＴは、図５に記述されたＸＭＴと同等のものである。図６には、二つの例６０１および６０２を示すが、これらは本質的に同等であり、後述するＭＰＥＧ−４データ生成装置のアプリケーション処理の細部に若干の違いを必要とするだけである。
【００４２】
図６に示すＸＭＴによるＢＩＦＳソースは、ＸＭＴの規格に定義されない、期待されるオブジェクトの幅を与えるｗｉｄｔｈと、高さを与えるｈｅｉｇｈｔの属性を含む。これは、一見、ＸＭＴの規格を逸脱しているようにみえるが、ＸＭＴがＸＭＬによって定義されているため、定義されない属性は、暗黙のうちに無視される。当然のことながら、追加した属性が無視されるならば属性を追加したことの意味はない。そこで実施形態のＭＰＥＧ−４データ生成装置は、この無視される属性を利用するように構成する。
【００４３】
ところで、ＸＭＴの規格を逸脱しないという点において、実施形態のＭＰＥＧ−４データ生成装置を使用しない場合は追加の属性が無視される。言い換えれば、従来のＭＰＥＧ−４データ生成装置に、属性が追加されたＸＭＴの処理を行わせたとしても矛盾は生じない。
【００４４】
［ＭＰＥＧ−４データ生成装置］
図７は、ソフトウェアでＭＰＥＧ−４データ生成装置を構成する場合のコンピュータ装置の構成例を示すブロック図である。
【００４５】
ＣＰＵ１０１は、ＲＯＭ１０２やハードディスク（ＨＤ）１０９に記憶されたデータや制御プログラム、オペレーティングシステム（ＯＳ）、ＭＰＥＧ−４データ生成用のアプリケーションプログラムなどに従い、ＲＡＭ１０３、操作部１０４、モニタ１０６などを制御して、ＭＰＥＧ−４データ生成にかかわる各種の制御や処理を行う。
【００４６】
ＲＡＭ１０３は、ＣＰＵ１０１が各種プログラムを実行するための作業領域、操作部１０４から入力されるデータなどの一時保存領域である。
【００４７】
操作部１０４は、マウスやキーボードなどで、ＭＰＥＧ−４データ生成にかかわる各種の制御や処理の設定データなどを、ユーザが入力するためのものである。
【００４８】
モニタ１０６は、ＣＲＴやＬＣＤなどで、画像処理結果や、操作部１０４による操作の際のユーザインタフェイス画面などが表示される。
【００４９】
ネットワークインタフェイスカード（ＮＩＣ）１０８は、ＢＩＦＳデータ、ＭＰＥＧ−４データ、オブジェクトを含む各種データをネットワークを介して他のコンピュータ装置とやり取りするための通信インタフェイスである。
【００５０】
なお、図７には示さないが、操作部１０４、モニタ１０６、ＮＩＣ１０８およびＨＤ１０９は、それぞれ所定のインタフェイスを介して、ホストコンピュータのシステムバス１１０に接続されている。
【００５１】
図８はＭＰＥＧ−４データ生成装置の機能構成例を示すブロック図である。なお、ＭＰＥＧ−４データ生成装置の一般的な構成は、例えばＭＰＥＧ−４ビデオエンコーダのような様々な付加機能が追加し得るものの、図８に示す機能構成とほとんど変わらない。従って、図８においても、実施形態の説明に不要な付加機能は省略する。
【００５２】
図８において、実施形態に重要な構成はＢＩＦＳソースプリプロセスモジュール５０１である。言い換えれば、ＢＩＦＳソースプリプロセスモジュール５０１を除く他のモジュール群は、一般的に使用されるモジュールである。
【００５３】
まず、ＢＩＦＳソースプリプロセスモジュール５０１を除く他のモジュールについて簡単に説明する。
【００５４】
オブジェクトトランスコーダモジュール５０２は、ＢＩＦＳソース５０５のｕｒｌによって指定されるオブジェクトのＵＲＬ情報５０８に従い、参照オブジェクト５１０の縦横の大きさを変換し、異なるサイズに変換（トランスコード）する。勿論、ＭＰＥＧ−４データ生成装置の機能として、例えば、いわゆるモーションＪＰＥＧからＭＰＥＧ−４ビジュアルへ変換するといった圧縮符号化方式の変換（トランスコード）を行う可能性もあるし、動画だけでなく静止画の変換（トランスコード）を行う場合もある。このようなアプリケーション上の可能性に関しては、説明を簡略化するために一切配慮せず、単にトランスコードと表現する。
【００５５】
ＢＩＦＳエンコーダモジュール５０３は、ＢＩＦＳソースもしくはＢＩＦＳソースを解釈した結果を入力し、ＢＩＦＳデータ５０７を生成する。このようなモジュールは、ＩＳＯより出版されているリファレンスコードをはじめとして、様々な形で一般に使用されている。なお、ＢＩＦＳエンコーダの機能は、ＢＩＦＳソースを解釈してＢＩＦＳデータ５０７を生成することにあるが、ＢＩＦＳソースの解釈は、前段のＢＩＦＳソースプリプロセスモジュール５０１が行ってもよいし、ＢＩＦＳエンコーダモジュール５０３が行ってもよい。ＢＩＦＳソースプリプロセスモジュール５０１がＢＩＦＳソースを解釈する場合は、ＢＩＦＳエンコーダモジュール５０３の入力は、ＢＩＦＳソースというよりは、むしろＢＩＦＳソースを何らかの形式で内部記憶メモリ（例えばＲＡＭ１０３）などに保管したものになる。
【００５６】
ＭＰＥＧ−４データ生成モジュール５０４は、最終的に、有効なＭＰＥＧ−４データ５１２を生成する。例えば、ＩＳＯ／ＩＥＣ１４４９６−１に定義されるＭＰ４データファイルを作成する機能がこれに相当する。
【００５７】
次に、ＢＩＦＳソースプリプロセスモジュール５０１は、ＸＭＴで記述されたＢＩＦＳソース５０５に対して、一定の解釈を行う機能を有する。ＸＭＴは、ＸＭＬを使用して記述されるため、内部にＸＭＬパーザもしくはそれに類した機能を保持する。ＸＭＬパーザには、一般的に、ＤＯＭまたはＳＡＸと呼ばれるインタフェイス（Ｉ／Ｆ）形式が存在する。その詳細は実施形態に直接関係しないので、ここでは、木構造で表現されるＸＭＬの各エレメントを順次処理するＳＡＸ形式の動作に似た処理と仮定する。この仮定は、ＸＭＬの各エレメントを木構造のまま内部記憶メモリ（例えばＲＡＭ１０３）に格納するＤＯＭ形式であっても内部処理の違いだけであるので、とくに例外的なものとはならない。
【００５８】
［動作］
図９および１０はＢＩＦＳソースプリプロセスモジュール５０１の動作を説明するフローチャートである。
【００５９】
ＢＩＦＳソースプリプロセスモジュール５０１は、まず、ＢＩＦＳソース５０５を順次読み込み（Ｓ６０１）、その木構造を取得して内部メモリ（ＲＡＭ１０３など）に記憶する（Ｓ６０２）。このとき、ＸＭＴとして未知の属性を含む場合（Ｓ６０３）、その属性と値を取得する（Ｓ６０４）。図６に示した例６０１では、最初のＳｈａｐｅエレメントの解釈において、属性ｗｉｄｔｈと値”３２０”が取得される。
【００６０】
次に、ＢＩＦＳソースプリプロセスモジュール５０１は、この未知の属性が処理できるか否かを判断する（Ｓ６０５）。この例では、ｗｉｄｔｈとｈｅｉｇｈｔは処理可能な属性であるから、内部メモリに保管する（Ｓ６０６）。この処理により、例えば属性ｗｉｄｔｈ、値”３２０”の組み合わせが記憶される。
【００６１】
次に、ＢＩＦＳソースプリプロセスモジュール５０１は、現在処理中のエレメントにｕｒｌが含まれるか否かを判断し（Ｓ６０７）、ｕｒｌが含まれるならば、次にその親もしくは同じエレメントにおいて、処理可能な属性、すなわち、この例においてｗｉｄｔｈとｈｅｉｇｈｔが保管されているか否かを判断する（Ｓ６０８）。なお、親とは、ＸＭＴの木構造により直接辿ることができる親ノードまたはさらにその上位の親ノードなど「直系」の親である。
【００６２】
ＢＩＦＳソースプリプロセスモジュール５０１は、ＸＭＴの木構造の親に処理可能な属性が保管されている場合は、そのＵＲＬと、取得したｗｉｄｔｈ、ｈｅｉｇｈｔの値を記憶する（Ｓ６０９）。そして、ＸＭＬ木構造のエレメントの取得が終わりか否かを判定し（Ｓ６１０）、未了であれば処理をステップＳ６０２に戻し、ＸＭＬ木構造のエレメントの取得が終わるまでステップＳ６０２からＳ６１０を繰り返す。
【００６３】
ここで注意すべき点は、繰り返しエレメントを取得して処理を行う中で、親のエレメントで指定されたｗｉｄｔｈとｈｅｉｇｈｔによるサイズ情報は、子エレメントに指定されたｕｒｌが現れた段階で、正しい組み合わせとして記憶されるという点である。このような処理判断を行うことによって、図６に例示した複数のＢＩＦＳソース記述方法に対して柔軟に対処することができる。
【００６４】
エレメントがすべて解釈された後、ＢＩＦＳソースプリプロセスモジュール５０１は、改変ＢＩＦＳソース５０６を木構造から出力し（Ｓ６１１）、もし記憶されたＵＲＬ、ｗｉｄｔｈ、ｈｅｉｇｈｔの値があれば（Ｓ６１２）、それらを出力する（Ｓ６１３）。
【００６５】
改変ＢＩＦＳソース５０６は、入力のＢＩＦＳソース５０５と同じものであるかもしれないが、ＢＩＦＳソース５０５に未知の属性が含まれた場合は、未知の属性が木構造ではなく他の内部記憶メモリに保管されるため、未知の属性を含まない純粋なＸＭＴとして出力される。このように、ＢＩＦＳソースプリプロセスモジュール５０１には、ＸＭＴに含まれたＸＭＴに準拠しない情報を削除し、純粋なＸＭＴを出力する、副次的な効果も期待することができる。
【００６６】
勿論、アプリケーションによっては、このような改変ＢＩＦＳソース５０６ではなく、元のＢＩＦＳソース５０５をそのままＢＩＦＳエンコーダの入力に使いたい場合もあるだろう。そのような場合、ＢＩＦＳソースの書き出し（Ｓ６１１）は不要であるかも知れないが、こうしたアプリケーションの要請は発明の本質に影響を与えるものではない。
【００６７】
このようにして、ＢＩＦＳソースプリプロセスモジュール５０１によって出力された改変ＢＩＦＳソース５０６は、ＢＩＦＳエンコーダモジュール５０３によってエンコードされ、ＢＩＦＳデータ５０７が出力される。一方、ＵＲＬ情報５０８とサイズ情報５０９は、オブジェクトトランスコーダモジュール５０２の入力になる。なお、図１０に示すＵＲＬおよび処理可能な属性の値の書き出し（Ｓ６１３）の出力が、オブジェクトトランスコーダモジュール５０２に入力されるＵＲＬ情報５０８とサイズ情報５０９を意味することは言うまでもない。
【００６８】
こうして、ＢＩＦＳソース５０５からＵＲＬ情報５０８およびサイズ情報５０９が取得されたため、これらの情報を利用して、オブジェクトトランスコーダモジュール５０２は処理を行うことができるようになる。オブジェクトトランスコーダモジュール５０２は、ＵＲＬ情報５０８を参照してオブジェクトを取得し、取得したオブジェクトをサイズ情報５０９を使用してトランスコードする。具体的な例を挙げるならば、ＢＩＦＳソース５０５が図６の例６０２である場合、下記のように記述されていることから、ファイルｐｉｃｔｕｒｅ１．ｉｍｇを読み込み、横３２０×縦２４０の画像に変換する。
【００６９】
もし、ｐｉｃｔｕｒｅ１．ｉｍｇが横１６００×縦１２００サイズの画像であれば、縦横が２０％の大きさに変換されて、ＢＩＦＳソースの作成者の意図に合う表現が期待されるオブジェクトになる。
【００７０】
このようにして、参照オブジェクト５１０がトランスコードされる一方、ＢＩＦＳソースプリプロセスモジュール５０１によって出力された改変ＢＩＦＳソース５０６は、ＢＩＦＳエンコーダモジュール５０３によって、ＢＩＦＳデータ５０７にエンコードされる。
【００７１】
最終的に、ＢＩＦＳソースの作成者の意図に合うように変換された改変参照オブジェクト５１１と、ＢＩＦＳデータ５０７により、ＭＰＥＧ−４データ生成モジュール５０４がＭＰＥＧ−４データ５１２を出力し、最終的な結果が得られる。
【００７２】
以上では、ＢＩＦＳソースに記述されたＵＲＬがＢＩＦＳソースの利用者によって予め書き換えられているという仮定の基に説明を行ったが、当然、アプリケーションによってはその他の方法が使用される場合もある。例えば、ＵＲＬとしてＢＩＦＳソースの利用者が参照すべきオブジェクトを取得するためのＣＧＩプログラム（いわゆるＷｅｂサーバがクライアントシステムからの要求で起動するアプリケーションプログラム）を指定すれば、インターネット上のコンテンツを必要に応じて参照することが可能である。あるいは、特殊なプロトコルスキームを指定したＵＲＬを使用することにより、特定の記憶装置に保管されたファイルを自動参照させることも全く自然なことである。このような方法は、アプリケーションによって様々に対応可能であり、本発明は、そのような任意のアプリケーションにおいても、本発明の特徴を利用できる形態であれば適用可能である。以下では、具体例として、比較的単純な方法を例示する。
【００７３】
図１１は、ＢＩＦＳソース例７０１および７０３に含まれるＵＲＬの例と、アプリケーションが記憶する参照テーブル７０２および７０４との組み合わせを示す図である。
【００７４】
例７０１は、特殊なプロトコルスキームを指定したＵＲＬを使用する例で、ｕｒｌには「ｕｓｅｒ：」というスキームが指定されている。アプリケーションは、記憶する参照テーブル７０２を自動的に参照して、「ｕｓｅｒ：」に続く「ｃｏｎｔｅｎｔＩＤ？１」により、該当するＵＲＬになるｕｓｅｒ０１．ｊｐｇを取得する。参照テーブル７０２に記憶されたｕｓｅｒ０１．ｊｐｇが、まさしく、ＢＩＦＳソースの利用者が参照を望むオブジェクトのＵＲＬを示していることにより、正しいＵＲＬの取得が可能になる。
【００７５】
もし、ＢＩＦＳソースの作成者が同様の仕組みを用いて、参照テーブルにＢＩＦＳソースの作成者が使用したオブジェクトのＵＲＬを記述しておけば、アプリケーションの単純な制御により、参照テーブルの値を入れ替えるだけで、作成者と利用者のオブジェクトを入れ替えることが簡単に行える。
【００７６】
例７０３は、ｕｒｌに、図６に示す例６０１と同じＢＩＦＳソースのＵＲＬが記述されている。ＢＩＦＳソースの作成者はＢＩＦＳソースに記述されたｐｉｃｔｕｒｅ１．ｉｍｇを使用し、ＢＩＦＳソースの利用者は異なるＵＲＬのオブジェクトの参照を望むとする。この場合でも、オブジェクトトランスコーダモジュール５０２が参照テーブル７０４を利用するように設計されていれば、オブジェクトトランスコーダモジュール５０２は、参照テーブル７０４からｐｉｃｔｕｒｅ１．ｉｍｇに対応するＵＲＬであるｕｓｅｒ０１．ｊｐｇを取得して、処理を行うことができる。
【００７７】
このように、本実施形態は、アプリケーション設計に柔軟に対応することが可能である。
【００７８】
上記の説明においては、ＢＩＦＳソースプリプロセスモジュール５０１が出力するサイズ情報５０９や、ＢＩＦＳソース５０５に記述するサイズ情報そのものを、幅と高さで与えるように説明したが、これらの記述方法は、例えば幅３２０、縦横比３対４のように縦横どちらかの一辺の大きさと、残る辺の比率との組み合わせで表現してもよい。あるいは、何らかの仮想的な大きさと、それに対する縦横の倍率であってもよい。例えば前者の方法であれば、幅３２０に比率３／４を乗ずることで、横３２０×縦２４０を容易に計算することができる。このように、どのような方法であれ、ＢＩＦＳソースの作成者が、そのＢＩＦＳソースにおいて想定する大きさを与えることが可能であればよい。
【００７９】
最後に、本実施形態における付加的な説明を加える。
【００８０】
まず、ＸＭＴに非対応のＢＩＦＳエンコーダが使用される場合を説明する。
【００８１】
このようなケースでは、図８に示すＭＰＥＧ−４データ生成装置の基本的な動作に変わりはないが、ＢＩＦＳエンコーダモジュール５０３の入力として、ＸＭＴで記述された改変ＢＩＦＳソース５０６を使用することができない。先に説明したように、ＢＩＦＳエンコーダモジュール５０３は、ＢＩＦＳソースもしくはＢＩＦＳソースを解釈した結果を入力として、ＢＩＦＳを生成する機能をもつ。従って、もし入力がＢＩＦＳソースを解釈した結果の記述方法としてＢＩＦＳソースの記述方法に依存しない形式が用いられる場合と何ら変わるところはない。異なるのは、ＢＩＦＳエンコーダモジュール５０３が、例えばＶＲＭＬ形式のＢＩＦＳソースのみを解釈する場合である。
【００８２】
こうした場合でも、ＢＩＦＳソースプリプロセスモジュール５０１とオブジェクトトランスコーダモジュール５０２をフィルタモジュールとして、既存のＢＩＦＳエンコーダの前段として利用することで、本実施形態は適用可能である。
【００８３】
ＢＩＦＳソースプリプロセスモジュール５０１は、その内部にＢＩＦＳソースの解釈を行う機能を保持するため、ここで作成された木構造を基に、ＶＲＭＬ形式のＢＩＦＳソースを生成するようにアプリケーションを設計すればよい。
【００８４】
図１２は、このようなフィルタモジュールの結果例を示す図である。ＸＭＴに記述されたｗｉｄｔｈなどの属性は、ＢＩＦＳソースプリプロセスモジュール５０１により削除されている。このように、図８に示すモジュールが、それぞれ独立したモジュールとして、上述した限定された機能だけをもたなくても、本実施形態を適用可能である。
【００８５】
このように、本実施形態によれば、ＩＳＯ／ＩＥＣ１４４９６−１の定義に矛盾しない方法によって、ＢＩＦＳソースに期待されるオブジェクトの幅と高さを記述し、これを解釈することによって、オブジェクトの大きさを変更することができる。従って、予め作成済みの、あるいは、参照されるオブジェクトが未解決のＢＩＦＳソースであっても、オブジェクトが決定されたときに、参照されるオブジェクトに期待される幅と高さを利用する方法を提供し、作成者の意図に合う表現にすることができる。
【００８６】
なお、上記のＭＰＥＧ−４データ生成装置とそのモジュール構成は、必ずしもパーソナルコンピュータ上で動作するアプリケーションプログラムの分野に限定されるものではない。ＭＰＥＧ−４データ生成装置を、ディジタルカメラ、カムコーダ、携帯電話、ディジタルレコーダなどのハードウェア機器に適用することも可能である。
【００８７】
ＸＭＬにより記述されるシーン記述に、参照すべきオブジェクトの大きさを記述し、これをオブジェクトデータ生成装置が読み込んで、オブジェクトの実体の位置を示す情報と、関連する大きさの情報を抽出し、該当するオブジェクトの大きさを変更する。従って、予め作成済みの、あるいは、参照すべきオブジェクトが未解決のＢＩＦＳソースもしくはシーン記述であっても、オブジェクトが決定された時点で、参照すべきオブジェクトに期待される幅と高さを利用して、ＢＩＦＳソースの作成者の意図に合う表現が可能になる。
【００８８】
また、シーン記述に記述されるオブジェクトの実体の位置を示す情報と、関連する大きさの情報との抽出を、ノードの親子関係とシーン記述方式の定義に含まれない属性の判断によって行う。従って、シーン記述方式に矛盾しない方法によって、ＢＩＦＳソースの作成者の意図に合う表現が可能になる。
【００８９】
さらに、オブジェクトの大きさを変更する際、抽出されたオブジェクトの実体の位置を示す情報と、実際にシーンに適用されるオブジェクトの関連を与える参照テーブルを利用して大きさを変更するオブジェクトを決定すれば、アプリケーションプログラムに対する適用の可能性を高めることができる。
【００９０】
また、オブジェクトの実体の位置を示す情報と、関連する大きさの情報を抽出する際に、ＸＭＬにより記述されるシーン記述から、オブジェクトベース符号化によるシーンデータ符号化を行う手段が適用可能なシーン記述方式でシーン記述を出力すれば、アプリケーションプログラムに対する適用の可能性を高めることができる。
【００９１】
このように、オブジェクトベース符号化方式に矛盾することなく、オブジェクトベース符号化方式を利用しながら、予め作成済みの、あるいは、参照すべきオブジェクトが未解決のＢＩＦＳソースもしくはシーン記述であっても、オブジェクトが決定された時点で、参照すべきオブジェクトに期待される幅と高さを利用して、ＢＩＦＳソースの作成者の意図に合う表現が得られる。
【００９２】
ＭＰＥＧ−４データ生成装置を構成するモジュールが、複数の機器に分散されているとしても、適当な通信機能やデータ共有などによりそれらが対話的に通信可能であれば、それら複数の機器が本発明を構成する。あるいは、複数の機器の一部がインターネットやその他の通信機能によって離れた場所にあってもよい。
【００９３】
また、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても、達成されることは言うまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００９４】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００９５】
本発明を上記記憶媒体に適用する場合、その記憶媒体には、先に説明したフローチャートに対応するプログラムコードが格納されることになる。
【００９６】
【発明の効果】
以上説明したように、本発明によれば、シーン記述の意図に合うオブジェクトを生成することができる。
【図面の簡単な説明】
【図１】ＶＲＭＬを基本としたＢＩＦＳソースの記述例を示す図、
【図２】ＶＲＭＬを基本としたＢＩＦＳソースの記述例を示す図、
【図３】ＶＲＭＬを基本としたＢＩＦＳソースの記述例を示す図、
【図４】図１から３に示すＢＩＦＳソースをエンコードした場合に、その結果として期待される表示の効果を模式的に示す図、
【図５】図１から３に示した「静止画：画像１」の部分に含まれるＳｈａｐｅのＸＭＴ記述例を示す図、
【図６】実施形態のＸＭＴの記述例を示す図、
【図７】ソフトウェアでＭＰＥＧ−４データ生成装置を構成する場合のコンピュータ装置の構成例を示すブロック図、
【図８】ＭＰＥＧ−４データ生成装置の機能構成例を示すブロック図、
【図９】ＢＩＦＳソースプリプロセスモジュール５０１の動作を説明するフローチャート、
【図１０】ＢＩＦＳソースプリプロセスモジュール５０１の動作を説明するフローチャート、
【図１１】ＢＩＦＳソース例に含まれるＵＲＬの例と、アプリケーションが記憶する参照テーブルとの組み合わせを示す図
【図１２】フィルタモジュールの結果例を示す図である。

[0001]
[Technical Field of the Invention]
The present invention relates to an object generation apparatus and method, and, for example, to the generation of objects such as still images, moving images, and computer graphics that are to be arranged on a screen according to a scene description.
[0002]
[Prior art]
With improvements in digital data processing capability and advances in communication technology, so-called multimedia can now be presented easily, and complex presentations that combine multiple still images, moving images, audio, vector graphics, and the like have become common. Object-based encoding is one method for such composite multimedia (or multi-object) presentation. A specific example is the standard known as MPEG-4 Systems (ISO/IEC 14496-1) (see, for example, Non-Patent Document 1).
[0003]
ISO/IEC 14496-1 standardizes a binary format called BIFS (Binary Format for Scenes), which indicates how MPEG-4 audiovisual objects should be arranged and displayed in space. Together with the object descriptor (ObjectDescriptor), its special form the initial object descriptor (InitialObjectDescriptor), and the bitstreams defined by MPEG-4 Visual and MPEG-4 Audio that carry encoded video and audio, BIFS composes a combined, so-called multimedia scene.
[0004]
BIFS uses VRML (ISO/IEC 14772-1, Virtual Reality Modeling Language) as the basis of its definition and can be regarded as an extension of VRML.
[0005]
The details of the standard are defined in ISO/IEC 14496-1 and are not described here. In the following, to explain the essence of the invention more concisely, a VRML-based representation of BIFS is called a BIFS source and is expressed in text form. Text representations of BIFS also include an XML-format representation called XMT (Extensible MPEG-4 Textual format); in the following, such XMT representations are also referred to as BIFS sources, except where XMT is named explicitly. In other words, a BIFS source is a specific example of a scene description method for encoding a scene that defines the arrangement and behavior of objects in an object-based encoding scheme.
[0006]
FIGS. 1 to 3 are diagrams showing a description example of a VRML-based BIFS source.
[0007]
The “#VRML V2.0 utf8” written at the beginning is a comment. It is prescribed by the VRML conventions and is not directly related to the invention, but it is included as an example of common notation. Here, “utf8” designates international characters in the UTF-8 encoding based on ISO 10646.
[0008]
FIG. 4 is a diagram schematically showing the display effect expected when the BIFS source shown in FIGS. 1 to 3 is encoded. A detailed description follows, using the BIFS source example of FIGS. 1 to 3 and FIG. 4.
[0009]
The lines following the second comment line “# --- background image ---” shown in FIG. 1 describe the background still image, and the object “background.JPG” specified by url becomes the background image 201 shown in FIG. 4.
[0010]
Similarly, the lines following the third comment line “# --- still image: image 1 ---” correspond to the still image 202 shown in FIG. 4, and the lines following the fourth comment line “# --- moving image: image 2 ---” correspond to the moving image 203 shown in FIG. 4.
[0011]
The descriptions following the comment line “# --- operation ---” shown in FIG. 2 and the comment line “# --- connections between events ---” shown in FIG. 3 define, respectively, behavior such as what happens when an object is selected with a pointing device. For example, they define the behavior in which the selected object is displayed enlarged in the enlarged area 204 of FIG. 4.
[0012]
The BIFS source example of FIGS. 1 to 3 is described very briefly below.
[0013]
First, the Group { ... } part indicates a node that groups the entire displayed scene. The children [ ... ] that follows indicates the set of child nodes of the group, and the Transform2D { ... } after it represents the coordinate transformation of the first node.
[0014]
Transform2D in turn holds child nodes and brings a specific object into the scene by means of Shape { ... }. This is equivalent to mapping the object specified by Shape to coordinates in the scene by means of Transform2D. Shape further holds appearance Appearance { ... } and geometry Bitmap { ... }; the former specifies the visual properties of the Shape, and the latter specifies its concrete display. As a visual property, texture ImageTexture { ... } specifies a concrete image object, and the url indicating that object is written there. scale and translation describe the magnification and the coordinates, respectively.
[0015]
The point to note here is the values of scale and translation written in Transform2D and geometry. Through the ROUTE commands, the scale value enlarges and then reduces the selected object over a fixed period of time starting from when the object is selected. This behavior is produced by the value being changed by, for example, the PositionInterpolator2D { ... } defined by DEF PIC1SCALE (FIG. 2). In this example, the horizontal and vertical scale changes from 0.5 (half size) to 1.0 (full size) and then returns to the original 0.5.
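The scale animation described above can be sketched as a piecewise-linear interpolation. This is only an illustrative Python sketch, not the BIFS source of FIG. 2: the function name `interpolate_2d` and the key fractions 0.0, 0.5, and 1.0 are assumptions made for the example, and only the scale values 0.5, 1.0, and 0.5 come from the description.

```python
from bisect import bisect_right

def interpolate_2d(keys, key_values, fraction):
    """Piecewise-linear interpolation of 2D values over ascending keys."""
    # Clamp to the first or last value outside the key range.
    if fraction <= keys[0]:
        return key_values[0]
    if fraction >= keys[-1]:
        return key_values[-1]
    # Find the segment [keys[i], keys[i + 1]) that contains the fraction.
    i = bisect_right(keys, fraction) - 1
    t = (fraction - keys[i]) / (keys[i + 1] - keys[i])
    (x0, y0), (x1, y1) = key_values[i], key_values[i + 1]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# Assumed key fractions; the scale values follow the text (0.5 -> 1.0 -> 0.5).
KEYS = [0.0, 0.5, 1.0]
SCALE_VALUES = [(0.5, 0.5), (1.0, 1.0), (0.5, 0.5)]

if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(fraction, interpolate_2d(KEYS, SCALE_VALUES, fraction))
```

Under these assumed keys, the object reaches full size halfway through the animation and is back at half size at the end.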
[0016]
Similarly, the translation value also changes to change the position of the object. “translation” means a parallel movement of a target object, and the position of the object is moved by changing this value.
[0017]
In the above description, the term “object” is used without any specific explanation. However, in order to explain the position and size of the object in more detail, more specific expressions will be used.
[0018]
In the BIFS source shown in FIGS. 1 to 3, the concrete objects that the viewer actually sees are the background image background.JPG, the still image picture1.img, and the moving image picture2.movie. These objects are written in URL (Uniform Resource Locator) form after url. This notation is defined in a well-known RFC and is also the form defined by VRML, so it is not described in detail here.
[0019]
In this example, these three objects are assumed to have the following sizes: background.JPG is 512 pixels wide × 384 pixels high, and picture1.img and picture2.movie are each 320 pixels wide × 240 pixels high. Because the scale described above indicates the enlargement or reduction ratio, each object is first represented at 1.0 (full size), and then every object except the background image background.JPG, which serves as the area containing the others, is reduced by the initial value of 0.5, that is, to half size. The background image background.JPG keeps its size, while the still image and the moving image are reduced by half to an initial size of 160 pixels wide × 120 pixels high. The reduced still image and moving image are positioned by translation at coordinates (168, 76) and (168, -76), respectively, and are therefore moved to the upper right and lower right of the display area.
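The arithmetic in this paragraph can be checked with a short Python sketch. The helper names `scaled_size` and `bounding_box` are hypothetical, and the sketch assumes a pixel coordinate system whose origin is at the center of the scene with the y axis pointing up, which is consistent with (168, 76) landing in the upper right.

```python
def scaled_size(width, height, scale):
    """Return the displayed size of an object after a uniform scale."""
    return (width * scale, height * scale)

def bounding_box(center, size):
    """Return (left, bottom, right, top) for an object centered at `center`."""
    cx, cy = center
    w, h = size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

if __name__ == "__main__":
    size = scaled_size(320, 240, 0.5)            # initial scale from the text
    print("displayed size:", size)
    print("still image box:", bounding_box((168, 76), size))
    print("moving image box:", bounding_box((168, -76), size))
```

With a 512 × 384 scene spanning x from -256 to 256 and y from -192 to 192, both boxes lie entirely inside the right half of the scene.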
[0020]
In this way, a concrete layout such as that shown in FIG. 4 is produced. Note that the display effect defined by such a BIFS source is expressed without specifying in any way the absolute size of the object designated by the url in texture. Like the VRML definition on which it is based, the definition in ISO/IEC 14496-1 uses a so-called right-handed three-dimensional Cartesian coordinate system, and a rendered texture is expressed only as a ratio to its original size. Therefore, unless the description explicitly places an object defined with a finite size in this coordinate system and pastes the texture onto that object, the size at which the object is displayed is known only to the object itself.
[0021]
In the example of the BIFS source shown in FIGS. 1 to 3, if the background image 201 is 512 pixels wide × 384 pixels high and the still image 202 is 640 pixels wide × 480 pixels high, the background image 201 is overlapped and hidden by the still image 202, even though their positions do not coincide completely. Such a display probably differs from the intent of the creator of the BIFS source shown in FIGS. 1 to 3. In other words, the intended presentation is obtained because the creator of the BIFS source writes the source knowing the absolute sizes of the displayed objects. In this example, the creator assumes that the background image 201 is 512 pixels wide × 384 pixels high and that the still image 202 and the moving image 203 are 320 pixels wide × 240 pixels high, and under that assumption the creator's intent is presented correctly.
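To make the effect of a mismatched object size concrete, the following Python sketch estimates how much of the background a scaled object covers. It is a rough illustration under stated assumptions: the function name `covered_fraction` is hypothetical, the scene is taken to be centered on the origin with the y axis pointing up, and the scale 0.5 and translation (168, 76) are the values given in the description.

```python
def covered_fraction(scene_size, object_size, scale, center):
    """Fraction of the scene area covered by a scaled, centered object."""
    scene_w, scene_h = scene_size
    obj_w, obj_h = object_size[0] * scale, object_size[1] * scale
    cx, cy = center
    # Clip the object's bounding box to the scene rectangle.
    left = max(cx - obj_w / 2, -scene_w / 2)
    right = min(cx + obj_w / 2, scene_w / 2)
    bottom = max(cy - obj_h / 2, -scene_h / 2)
    top = min(cy + obj_h / 2, scene_h / 2)
    overlap = max(0.0, right - left) * max(0.0, top - bottom)
    return overlap / (scene_w * scene_h)

if __name__ == "__main__":
    scene = (512, 384)
    for obj in ((320, 240), (640, 480)):
        share = covered_fraction(scene, obj, 0.5, (168, 76))
        print(obj, "covers", round(share * 100, 1), "% of the background")
```

The 320 × 240 object assumed by the creator covers a small part of the background, whereas the 640 × 480 object covers several times as much and runs past the scene edge.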
[0022]
Up to this point, the description has focused on the arrangement of objects, following the example of the BIFS source shown in FIGS. 1 to 3. According to the definition in ISO/IEC 14496-1, the display area of the BIFS is defined in BIFSConfig, which is held in DecoderSpecificInfo (decoder specific information) within DecoderConfigDescriptor (decoder configuration descriptor). BIFSConfig is defined as follows.

[0023]
If hasSize is set to true (that is, if the bit is set), pixelWidth and pixelHeight specify the size of the area of the concrete presentation (scene) that is displayed.
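The role of these three fields can be illustrated with a small Python sketch. It is not the BIFSConfig syntax from ISO/IEC 14496-1: the class `BifsConfigSketch` and the function `scene_size` are hypothetical, and they model only the three fields named in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BifsConfigSketch:
    """Hypothetical holder for the three fields named in the text."""
    has_size: bool
    pixel_width: int = 0
    pixel_height: int = 0

def scene_size(config: BifsConfigSketch) -> Optional[Tuple[int, int]]:
    """Return the scene area size, or None when no size is signalled."""
    if config.has_size:
        return (config.pixel_width, config.pixel_height)
    return None

if __name__ == "__main__":
    print(scene_size(BifsConfigSketch(True, 512, 384)))
    print(scene_size(BifsConfigSketch(False)))
```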
[0024]
Thus, by specifying the size in the encoded bitstream, the display area of the expression indicated by the BIFS source is further specified.
[0025]
In this way, a so-called multimedia scene in which a plurality of objects are expressed in a complex manner can be freely expressed by a BIFS source or scene description and data expression in a bitstream not described in the BIFS source or scene description.
[0026]
As described above, the creator of a BIFS source must design the definitions of object arrangement and behavior, that is, the scene description, so that the intended result is obtained, judging in advance the sizes and other properties of the external reference objects, such as still images and moving images, that the scene description references. In reality, however, even that may be insufficient. Designing a scene description so that the intent is reflected for any arbitrary reference object is not entirely impossible, but it requires considerable care on the part of the creator of the BIFS source. For example, when the BIFS source illustrated in FIGS. 1 to 3 references objects of 320 × 240 pixels (width × height), a display such as that shown in FIG. 4 can result. If the object size differs, however, in an extreme case only a small part of the referenced object is displayed.
[0027]
Also, if the creator of a BIFS source creates it with a specific intent and a third party later replaces the referenced objects, the result may not be presented as the BIFS source intended. This problem is particularly noticeable when, for example, a designer familiar with the MPEG-4 standard designs a BIFS source with a certain intent and then provides the encoded BIFS or the BIFS source together with a simple MPEG-4 data generation tool that includes an encoder for generating BIFS. In such a case, a user is expected to embed the still images or moving images the user wants by using the provided BIFS, BIFS source, or tool, but the user must prepare in advance objects whose sizes match the provider's intent.
[0028]
To address this problem, approaches such as embedding the expected object sizes in the BIFS source as comments, or building into the provided tool a function that converts objects to the expected size, are used. These approaches are inconvenient, however: they are realized only through tool-specific functions, or they require the user to read the BIFS source and make an independent judgment. Moreover, realizing them through tool-specific functions requires making proprietary extensions to BIFS (or the BIFS source) or preparing separate information apart from BIFS, which also raises compatibility problems.
[0029]
[Non-Patent Document 1]
Standard recommendation ISO/IEC 14496-1
[0030]
[Problems to be solved by the invention]
The present invention solves the above problems individually or collectively, and aims to generate objects that match the intent of a scene description.
[0031]
[Means for Solving the Problems]
The present invention has the following configuration as one means for achieving the above object.
[0032]
When generating an object related to object-based encoding, the present invention acquires information indicating the location of the entity of the object from a scene description used for encoding a scene, acquires size information associated with that location information, and changes the size of the object to be referenced on the basis of the acquired location information and size information.
[0033]
[Embodiments of the Invention]
Hereinafter, image processing according to an embodiment of the present invention will be described in detail with reference to the drawings.
[0034]
[Overview]
The following describes an embodiment that generates MPEG-4 data capable of realizing an MPEG-4 scene presentation that matches the intent of the creator of the BIFS source, by adding to the BIFS source a description that does not contradict the definition in ISO/IEC 14496-1.
[0035]
The MPEG-4 data generation apparatus described in the embodiment is an object data generation apparatus for ISO/IEC 14496 (MPEG-4), which is based on object-based encoding. Although the apparatus described here may be realized with configurations that differ in detail, it can be implemented either in hardware or in software, for example by supplying software to a computer apparatus, and its components can be recombined, integrated, and rearranged as long as they remain consistent with the essence of the present invention.
[0036]
In the following description, for simplicity, the BIFS source shown in FIGS. 1 to 3 and the expected display effect shown in FIG. 4 are taken as the premise.
[0037]
[Description by XMT]
FIGS. 1 to 3 illustrated a BIFS source derived from VRML, which may be called the basic form of BIFS in the definition of ISO/IEC 14496-1. In a format such as VRML, the use of nodes or statements in an undefined form cannot be implicitly accepted. In contrast, the Textual Format of ISO/IEC 14496-1 Amendment 2 defines XMT (Extensible MPEG-4 Textual format), which uses XML (eXtensible Markup Language); because XMT is an XML description, it can be extended freely.
[0038]
FIG. 5 is a diagram illustrating an XMT description example of the Shape included in the “still image: image 1” portion illustrated in FIGS. 1 to 3.
[0039]
The description shown in FIG. 5 is logically equivalent to the description of the BIFS source. If the entire BIFS source shown in FIGS. 1 to 3 is rewritten in XMT, a display effect equivalent to that in FIG. 4 can be expected. However, if the XMT description is encoded by a general BIFS encoder capable of interpreting XMT and the object specified in url is used as it is, then, just as with the BIFS source shown in FIGS. 1 to 3, the display effect intended by the creator of the BIFS source cannot always be expected. In other words, such a description assumes the size of the object referenced by the URL at the time the BIFS source is created, and does not take into account what happens when an object of a different size is used.
[0040]
In the embodiment, a description is added to a BIFS source by XMT, and this is used when interpreting the XMT to reflect the intention of the creator of the BIFS source without departing from the XMT standard.
[0041]
FIG. 6 is a diagram illustrating a description example of the XMT according to the embodiment. Note that the XMT described in FIG. 6 is equivalent to the XMT described in FIG. 5. FIG. 6 shows two examples, 601 and 602, which are essentially equivalent and differ only slightly in the processing details required of the MPEG-4 data generation apparatus described below.
[0042]
The BIFS source by XMT shown in FIG. 6 includes a width attribute that gives an expected object width and a height attribute that gives a height, which are not defined in the XMT standard. This appears to deviate from the XMT standard, but since XMT is defined by XML, undefined attributes are silently ignored. Of course, if the added attribute is ignored, there is no point in adding the attribute. Therefore, the MPEG-4 data generation apparatus of the embodiment is configured to use this ignored attribute.
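The idea that added attributes are harmless to a conventional reader but usable by an extended one can be sketched as follows. This is only an illustration: FIG. 6 is not reproduced in this text, so the element nesting in `XMT_FRAGMENT` and the function names `read_known` and `read_extended` are assumptions, not the actual XMT of the embodiment.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMT-like fragment; the real nesting in FIG. 6 may differ.
XMT_FRAGMENT = """
<Shape width="320" height="240">
  <appearance>
    <Appearance>
      <texture>
        <ImageTexture url="picture1.img"/>
      </texture>
    </Appearance>
  </appearance>
</Shape>
""".strip()


def read_known(shape):
    """A conventional reader: it only looks up the url it knows about,
    so the extra width/height attributes never affect it."""
    texture = shape.find(".//ImageTexture")
    return texture.get("url")


def read_extended(shape):
    """A reader in the spirit of the embodiment: it also picks up the
    width and height attributes if both are present."""
    w, h = shape.get("width"), shape.get("height")
    size = (int(w), int(h)) if w is not None and h is not None else None
    return read_known(shape), size


shape = ET.fromstring(XMT_FRAGMENT)
print(read_known(shape))
print(read_extended(shape))
```

Both readers accept the same document; only the second one makes use of the added attributes.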
[0043]
Incidentally, with regard to not deviating from the XMT standard, the additional attributes are simply ignored when the MPEG-4 data generation apparatus of the embodiment is not used. In other words, no contradiction arises even if a conventional MPEG-4 data generation apparatus processes XMT containing the added attributes.
[0044]
[MPEG-4 data generator]
FIG. 7 is a block diagram showing a configuration example of a computer device when an MPEG-4 data generation device is configured by software.
[0045]
The CPU 101 controls the RAM 103, the operation unit 104, the monitor 106, and the like in accordance with the data, control programs, operating system (OS), application program for generating MPEG-4 data, and the like stored in the ROM 102 and the hard disk (HD) 109, and performs various types of control and processing related to MPEG-4 data generation.
[0046]
A RAM 103 is a work area for the CPU 101 to execute various programs and a temporary storage area for data input from the operation unit 104.
[0047]
The operation unit 104 is used by a user to input setting data for various controls and processes related to MPEG-4 data generation using a mouse or a keyboard.
[0048]
The monitor 106 is a CRT, LCD, or the like, and displays an image processing result, a user interface screen at the time of operation by the operation unit 104, and the like.
[0049]
A network interface card (NIC) 108 is a communication interface for exchanging various data including BIFS data, MPEG-4 data, and objects with other computer apparatuses via a network.
[0050]
Although not shown in FIG. 7, the operation unit 104, the monitor 106, the NIC 108, and the HD 109 are each connected to the system bus 110 of the host computer via a predetermined interface.
[0051]
FIG. 8 is a block diagram showing a functional configuration example of the MPEG-4 data generation apparatus. The general configuration of the MPEG-4 data generation apparatus is almost the same as the functional configuration shown in FIG. 8 although various additional functions such as an MPEG-4 video encoder can be added. Therefore, also in FIG. 8, additional functions unnecessary for the description of the embodiment are omitted.
[0052]
In FIG. 8, a configuration important for the embodiment is a BIFS source preprocessing module 501. In other words, the module group other than the BIFS source preprocess module 501 is a commonly used module.
[0053]
First, modules other than the BIFS source preprocess module 501 will be briefly described.
[0054]
The object transcoder module 502 acquires the reference object 510 according to the URL information 508 of the object specified by the url of the BIFS source 505, converts its vertical and horizontal sizes, and thereby converts (transcodes) the reference object 510 to a different size. Of course, the MPEG-4 data generation device may also convert (transcode) the compression encoding method, for example from so-called motion JPEG to MPEG-4 Visual. To simplify the explanation, such application possibilities are not considered further here, and the operation is simply referred to as transcoding.
[0055]
The BIFS encoder module 503 receives a BIFS source, or the result of interpreting a BIFS source, and generates BIFS data 507. Such modules are in general use in various forms, including the reference code published by ISO. The function of the BIFS encoder is to generate the BIFS data 507 by interpreting the BIFS source; however, the interpretation may be performed either by the preceding BIFS source preprocessing module 501 or by the BIFS encoder module 503. When the BIFS source preprocessing module 501 interprets the BIFS source, the input of the BIFS encoder module 503 is not the BIFS source itself but the interpreted BIFS source held in some form in an internal storage memory (e.g., RAM 103).
[0056]
The MPEG-4 data generation module 504 finally generates valid MPEG-4 data 512. For example, this corresponds to the function of creating an MP4 data file defined in ISO / IEC 14496-1.
[0057]
Next, the BIFS source preprocessing module 501 has a function of performing a certain interpretation on the BIFS source 505 described in XMT. Since XMT is described using XML, it retains an XML parser or similar function therein. An XML parser generally has an interface (I / F) format called DOM or SAX. Since the details are not directly related to the embodiment, it is assumed here that the processing is similar to the operation of the SAX format in which each element of XML expressed in a tree structure is processed sequentially. This assumption is not particularly exceptional even in the DOM format in which each element of XML is stored in an internal storage memory (for example, RAM 103) as a tree structure because it is only a difference in internal processing.
[0058]
[Operation]
FIGS. 9 and 10 are flowcharts for explaining the operation of the BIFS source preprocessing module 501.
[0059]
The BIFS source preprocessing module 501 first reads the BIFS source 505 sequentially (S601), acquires its tree structure, and stores it in an internal memory (such as the RAM 103) (S602). If an attribute that is unknown as XMT is encountered at this time (S603), the attribute and its value are acquired (S604). In the example 601 shown in FIG. 6, the attribute width and the value “320” are acquired when the first Shape element is interpreted.
[0060]
Next, the BIFS source preprocessing module 501 determines whether or not this unknown attribute can be processed (S605). In this example, width and height are attributes that can be processed, and are stored in the internal memory (S606). By this process, for example, a combination of the attribute width and the value “320” is stored.
[0061]
Next, the BIFS source preprocessing module 501 determines whether url is included in the element currently being processed (S607). If url is included, it then determines whether processable attributes, that is, width and height in this example, are stored for the same element or its parent (S608). Here, the parent means a parent node that can be traced directly through the XMT tree structure, or a node higher up along the same path.
[0062]
If the attributes that can be processed are stored in the parent of the XMT tree structure, the BIFS source preprocessing module 501 stores the URL and the acquired width and height values (S609). Then, it is determined whether or not the acquisition of the XML tree structure element is completed (S610). If the acquisition is not completed, the process returns to step S602, and steps S602 to S610 are repeated until the acquisition of the XML tree structure element is completed.
[0063]
It should be noted that, as elements are acquired repeatedly, the size information given by the width and height specified in a parent element is stored as the correct combination at the point when the url specified in a child element appears. By making this determination, it is possible to cope flexibly with the plural BIFS source description methods illustrated in FIG. 6.
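The pairing of a url with the width and height of the same element or an ancestor (steps S603 to S609) can be sketched with a SAX-style handler. This is a minimal sketch, not the flowchart of FIGS. 9 and 10 itself: the class `SizeCollector`, the function `collect_sizes`, and the two sample documents are hypothetical, and the sample markup is assumed rather than copied from FIG. 6.

```python
import xml.sax

PROCESSABLE = ("width", "height")  # attributes treated as processable here


class SizeCollector(xml.sax.ContentHandler):
    """Reads elements sequentially and pairs each url with the nearest
    width/height found on the same element or on an ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []   # processable attributes of each open element
        self.pairs = []   # collected (url, width, height) tuples

    def startElement(self, name, attrs):
        own = {k: int(attrs[k]) for k in PROCESSABLE if k in attrs}
        self.stack.append(own)
        if "url" in attrs:
            size = self._nearest_size()
            if size is not None:
                self.pairs.append((attrs["url"], size["width"], size["height"]))

    def endElement(self, name):
        self.stack.pop()

    def _nearest_size(self):
        # Search the current element first, then its ancestors.
        for own in reversed(self.stack):
            if all(k in own for k in PROCESSABLE):
                return own
        return None


def collect_sizes(xmt_text):
    handler = SizeCollector()
    xml.sax.parseString(xmt_text.encode("utf-8"), handler)
    return handler.pairs


# Two assumed description styles: size on an ancestor, size on the same element.
ON_PARENT = ('<Shape width="320" height="240"><texture>'
             '<ImageTexture url="picture1.img"/></texture></Shape>')
ON_SAME = ('<Shape><texture>'
           '<ImageTexture url="picture1.img" width="320" height="240"/>'
           '</texture></Shape>')

print(collect_sizes(ON_PARENT))
print(collect_sizes(ON_SAME))
```

Keeping one dictionary per open element means a size declared on a sibling subtree is never applied by mistake, because it is popped when that subtree closes.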
[0064]
After all the elements have been interpreted, the BIFS source preprocessing module 501 outputs the modified BIFS source 506 from the tree structure (S611) and, if URL, width, and height values have been stored (S612), outputs them as well (S613).
[0065]
The modified BIFS source 506 may be the same as the input BIFS source 505. However, if the BIFS source 505 includes an unknown attribute, the unknown attribute is stored in another internal storage memory instead of the tree structure. Therefore, it is output as pure XMT that does not include unknown attributes. As described above, the BIFS source preprocessing module 501 can also be expected to have a secondary effect of deleting information that does not conform to XMT included in the XMT and outputting pure XMT.
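The secondary effect described above, writing out a source that no longer carries the added attributes, might look like the following sketch. The function name `split_source` and the fixed list `EXTRA_ATTRS` are assumptions; a real implementation would decide what counts as unknown from the XMT definition rather than from a hard-coded list.

```python
import xml.etree.ElementTree as ET

# Assumed set of added attributes to strip.
EXTRA_ATTRS = ("width", "height")


def split_source(xmt_text):
    """Return (cleaned XMT text, list of removed attributes per element)."""
    root = ET.fromstring(xmt_text)
    removed = []
    for elem in root.iter():
        taken = {k: elem.attrib.pop(k) for k in EXTRA_ATTRS if k in elem.attrib}
        if taken:
            removed.append((elem.tag, taken))
    return ET.tostring(root, encoding="unicode"), removed


source = '<Shape width="320" height="240"><ImageTexture url="picture1.img"/></Shape>'
clean, removed = split_source(source)
print(clean)
print(removed)
```

The removed attributes are kept separately, mirroring the text's point that they are stored outside the tree structure.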
[0066]
Of course, depending on the application, not the modified BIFS source 506 but the original BIFS source 505 may be used as it is for the input of the BIFS encoder. In such a case, writing BIFS source (S611) may not be necessary, but such application requirements do not affect the essence of the invention.
[0067]
In this way, the modified BIFS source 506 output by the BIFS source preprocessing module 501 is encoded by the BIFS encoder module 503, and the BIFS data 507 is output. On the other hand, the URL information 508 and the size information 509 are input to the object transcoder module 502. Needless to say, the output of the URL and the processable attribute value writing (S613) shown in FIG. 10 means the URL information 508 and the size information 509 input to the object transcoder module 502.
[0068]
Thus, since the URL information 508 and the size information 509 are obtained from the BIFS source 505, the object transcoder module 502 can perform processing using this information. The object transcoder module 502 acquires an object by referring to the URL information 508, and transcodes the acquired object using the size information 509. As a specific example, when the BIFS source 505 is the example 602 in FIG. 6, the file picture1.img is read and converted into an image of horizontal 320 × vertical 240.
[0069]
If picture1.img is an image with a horizontal size of 1600, its vertical and horizontal sizes are each reduced to 20%, and an object that is expected to reflect the BIFS source creator's intention is obtained.
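The size arithmetic of this example can be checked with a short sketch. The source height of 1200 is an assumption (it is the value for which a 20% reduction yields 240), and `resize_nearest` uses plain nearest-neighbour sampling on a list of rows purely for illustration; the text does not say which resampling method the transcoder uses.

```python
from fractions import Fraction


def scale_factors(src_size, dst_size):
    """Horizontal and vertical scale factors as exact fractions."""
    (src_w, src_h), (dst_w, dst_h) = src_size, dst_size
    return Fraction(dst_w, src_w), Fraction(dst_h, src_h)


def resize_nearest(pixels, dst_w, dst_h):
    """Nearest-neighbour resize of a row-major grid of pixel values."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // dst_h][x * src_w // dst_w] for x in range(dst_w)]
        for y in range(dst_h)
    ]


# Assumed source size 1600 x 1200, target size 320 x 240 from the BIFS source.
print(scale_factors((1600, 1200), (320, 240)))

# Tiny 4 x 2 grid reduced to 2 x 1 to show the sampling.
print(resize_nearest([[1, 2, 3, 4], [5, 6, 7, 8]], 2, 1))
```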
[0070]
In this manner, the reference object 510 is transcoded, while the modified BIFS source 506 output by the BIFS source preprocessing module 501 is encoded into the BIFS data 507 by the BIFS encoder module 503.
[0071]
Finally, the MPEG-4 data generation module 504 outputs the MPEG-4 data 512 by using the BIFS data 507 and the modified reference object 511, which has been converted to suit the intention of the creator of the BIFS source, and the final result is obtained.
[0072]
In the above description, it was assumed that the user of the BIFS source has rewritten the URL described in the BIFS source in advance, but other methods may of course be used depending on the application. For example, if a CGI program (an application program that a Web server starts in response to a request from a client system) for acquiring the object that the BIFS source user wants to refer to is specified as the URL, content on the Internet can be referred to as needed. Alternatively, a file stored in a specific storage device can be referred to automatically by using a URL that specifies a special protocol scheme. Such methods can be applied in various ways depending on the application, and the present invention can be applied to any application as long as its features can be used. A comparatively simple method is illustrated below as a specific example.
[0073]
FIG. 11 is a diagram illustrating combinations of URL examples included in the BIFS source examples 701 and 703 and reference tables 702 and 704 stored in the application.
[0074]
Example 701 uses a URL that specifies a special protocol scheme: the scheme “user:” is specified in url. The application automatically refers to the stored reference table 702 and uses “contentID?1”, which follows “user:”, to obtain the corresponding URL user01.jpg. Since user01.jpg stored in the reference table 702 indicates the URL of the object that the BIFS source user wants to refer to, the correct URL can be acquired.
[0075]
If the BIFS source creator also registers, by the same mechanism, the URL of the object that the creator uses in the reference table, the application need only replace the values in the reference table, which makes it easy to switch between the creator's and the user's objects.
[0076]
In Example 703, the same URL as in the BIFS source of Example 601 shown in FIG. 6 is described in url. The creator of the BIFS source uses picture1.img, whereas the BIFS source user wants to refer to an object with a different URL. Even in this case, if the object transcoder module 502 is designed to use the reference table 704, it can acquire and process the JPEG URL that the reference table 704 associates with picture1.img.
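The two lookups described for FIG. 11 can be sketched with one small function. The function name `resolve_url`, the dictionary representation of the reference tables, the file name `replacement.jpg`, and the fallback of returning the original URL when no entry exists are all assumptions made for illustration.

```python
def resolve_url(url, table, scheme="user:"):
    """Map a URL from the BIFS source to the object actually used.

    A URL with the special scheme is looked up by the part after the
    scheme; any other URL is looked up as-is. If the table has no entry,
    the original URL is returned unchanged (an assumed fallback).
    """
    key = url[len(scheme):] if url.startswith(scheme) else url
    return table.get(key, url)


# In the manner of example 701 and reference table 702.
table_702 = {"contentID?1": "user01.jpg"}
print(resolve_url("user:contentID?1", table_702))

# In the manner of example 703 and reference table 704; the target name
# is hypothetical because the original file name is not legible here.
table_704 = {"picture1.img": "replacement.jpg"}
print(resolve_url("picture1.img", table_704))
```

Swapping the table contents is then enough to switch between the creator's and the user's objects without touching the BIFS source.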
[0077]
As described above, this embodiment can flexibly cope with application design.
[0078]
In the above description, the size information 509 output from the BIFS source preprocessing module 501, and the size information described in the BIFS source 505 itself, have been described as being given by width and height. However, the size may instead be expressed as a combination of the length of one side (vertical or horizontal) and the ratio to the other side, such as a width of 320 and an aspect ratio of 3 to 4. Alternatively, it may be expressed as some virtual size together with vertical and horizontal magnifications. In the former method, for example, the size of width 320 × height 240 is easily calculated by multiplying the width 320 by the ratio 3/4. In this way, any method is acceptable as long as the BIFS source creator can give the expected size in the BIFS source.
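The alternative size notations can all be reduced to a width and height pair, as the following sketch shows. The function `resolve_size` and its keyword names are hypothetical; `ratio` is taken as height divided by width, matching the 320 × 3/4 = 240 calculation in the text, and the virtual size of 640 × 480 with a magnification of 1/2 is an invented example.

```python
from fractions import Fraction


def resolve_size(width=None, height=None, ratio=None,
                 virtual=None, magnification=None):
    """Normalize the supported notations to (width, height).

    ratio is height / width; virtual is (w, h); magnification is
    (horizontal, vertical). Exact fractions avoid rounding surprises.
    """
    if width is not None and height is not None:
        return width, height
    if width is not None and ratio is not None:
        return width, round(width * ratio)
    if height is not None and ratio is not None:
        return round(height / ratio), height
    if virtual is not None and magnification is not None:
        (vw, vh), (mx, my) = virtual, magnification
        return round(vw * mx), round(vh * my)
    raise ValueError("not enough information to determine the size")


print(resolve_size(width=320, ratio=Fraction(3, 4)))
print(resolve_size(virtual=(640, 480),
                   magnification=(Fraction(1, 2), Fraction(1, 2))))
```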
[0079]
Finally, some supplementary remarks on this embodiment are given.
[0080]
First, a case where a BIFS encoder that does not support XMT is used will be described.
[0081]
In such a case, the basic operation of the MPEG-4 data generation apparatus shown in FIG. 8 does not change, but the modified BIFS source 506 described in XMT cannot be used as the input of the BIFS encoder module 503. As described above, the BIFS encoder module 503 generates BIFS data from a BIFS source or from the result of interpreting a BIFS source. Therefore, if the result of interpreting the BIFS source is described in a form that does not depend on the description method of the BIFS source, nothing changes. The only difference is that the BIFS encoder module 503 interprets only a BIFS source in, for example, VRML format.
[0082]
Even in such a case, the present embodiment can be applied by using the BIFS source preprocessing module 501 and the object transcoder module 502 as filter modules and as a pre-stage of an existing BIFS encoder.
[0083]
Since the BIFS source preprocessing module 501 internally has the function of interpreting the BIFS source, the application may be designed to generate a VRML-format BIFS source from the tree structure created there.
[0084]
FIG. 12 is a diagram illustrating a result example of such a filter module. Attributes such as width described in the XMT are deleted by the BIFS source preprocess module 501. Thus, the present embodiment can be applied even if the modules shown in FIG. 8 are independent modules and do not have only the limited functions described above.
[0085]
As described above, according to the present embodiment, the size of an object can be changed by interpreting the width and height expected of the object, which are described in the BIFS source by a method consistent with the definition of ISO/IEC 14496-1. Therefore, even for a BIFS source in which the object to be created or referenced is not yet determined, a method is provided that uses the expected width and height of the referenced object once the object is determined, so that an expression in accordance with the creator's intention can be obtained.
[0086]
The MPEG-4 data generation apparatus and its module configuration are not necessarily limited to the field of application programs that run on a personal computer. The MPEG-4 data generation apparatus can also be applied to hardware devices such as a digital camera, a camcorder, a mobile phone, and a digital recorder.
[0087]
The size of the object to be referred to is described in the scene description written in XML; the object data generation device reads it, extracts the information indicating the position of the object entity and the related size information, and changes the size of the corresponding object. Therefore, even for a BIFS source or scene description in which the object to be created or referenced is not yet determined, the width and height expected of the referenced object are used once the object is determined. Thus, an expression suited to the intention of the creator of the BIFS source becomes possible.
[0088]
Also, extraction of information indicating the position of the object entity described in the scene description and related size information is performed by determining attributes not included in the definition of the parent-child relationship of the nodes and the scene description method. Therefore, an expression suitable for the intention of the creator of the BIFS source becomes possible by a method that is consistent with the scene description method.
[0089]
In addition, when changing the size of an object, the information indicating the position of the extracted object and the reference table that gives the relationship of the object actually applied to the scene is used to determine the object whose size is to be changed. Then, the possibility of application to the application program can be increased.
[0090]
Furthermore, when the information indicating the position of the object entity and the related size information are extracted from the scene description written in XML, outputting a scene description in a description method to which the means for performing scene data encoding by object-based encoding can be applied increases the possibility of application to application programs.
[0091]
In this way, while using the object-based encoding method without contradicting the object-based encoding method, even if the object that has been created in advance or the object to be referenced is an unresolved BIFS source or scene description, When the object is determined, the width and height expected for the object to be referenced is used to obtain a representation that fits the intention of the BIFS source creator.
[0092]
Even if the modules constituting the MPEG-4 data generation apparatus are distributed over a plurality of devices, those devices together constitute the apparatus as long as they can communicate with one another by an appropriate communication function or by data sharing. Some of the plurality of devices may also be located remotely and connected via the Internet or another communication function.
[0093]
It goes without saying that the object of the present invention can also be achieved by supplying a storage medium (or recording medium), on which the program code of software that realizes the functions of the above-described embodiment is recorded, to a system or apparatus, and having the computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention. It also goes without saying that the invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiment are realized by that processing.
[0094]
Furthermore, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or a function expansion unit connected to the computer, the function is based on the instruction of the program code. It goes without saying that the CPU or the like provided in the expansion card or the function expansion unit performs part or all of the actual processing and the functions of the above-described embodiments are realized by the processing.
[0095]
When the present invention is applied to the storage medium, the storage medium stores program codes corresponding to the flowcharts described above.
[0096]
[Effects of the Invention]
As described above, according to the present invention, it is possible to generate an object that meets the intention of the scene description.
[Brief description of the drawings]
FIG. 1 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 2 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 3 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 4 is a diagram schematically showing a display effect expected as a result when the BIFS source shown in FIGS. 1 to 3 is encoded;
FIG. 5 is a view showing an XMT description example of Shape included in the “still image: image 1” portion shown in FIGS. 1 to 3;
FIG. 6 is a diagram showing a description example of XMT according to the embodiment;
FIG. 7 is a block diagram showing a configuration example of a computer device when an MPEG-4 data generation device is configured by software;
FIG. 8 is a block diagram illustrating a functional configuration example of an MPEG-4 data generation device;
FIG. 9 is a flowchart for explaining the operation of the BIFS source preprocessing module 501;
FIG. 10 is a flowchart for explaining the operation of the BIFS source preprocessing module 501;
FIG. 11 is a diagram showing a combination of an example of a URL included in a BIFS source example and a reference table stored by an application;
FIG. 12 is a diagram showing a result example of a filter module.

Claims

A generation method for generating an object related to object-based encoding,
Obtain information indicating the position of the object entity from the scene description for encoding the scene, obtain size information related to the position information,
A generation method characterized by changing a size of an object to be referred to based on acquired position information and size information.

The generation method according to claim 1, wherein the size information is acquired based on information included in the scene description that does not conform to a standard of a scene description method.

The generation method according to claim 1, wherein when the position information is included in a node of a scene to be processed, size information included in a parent node of the node is set as size information related to the position information.

The generation method according to claim 1, wherein the object to be referred to is determined by referring to the position information and a table that gives an association of objects actually applied to the scene.

In addition, scene data encoding by object-based encoding is performed,
5. The generation method according to claim 1, wherein encoded object data is generated from the encoded scene data and the object.

6. The generation method according to claim 1, wherein the object to be referred to includes at least one of a moving image, a still image, computer graphics, and a composite object or combination thereof in the object-based encoding.

The generation method according to claim 5 or 6, further comprising outputting a scene description of a scene description method that matches the scene data encoding when the scene data encoding is different from the scene description method.

The generation method according to any one of claims 1 to 7, wherein the object-based encoding conforms to, or is modeled on, MPEG-4.

The generation method according to any one of claims 1 to 8, wherein the scene description method is described in XML, and the size information is given to the scene description.

9. The generation method according to claim 1, wherein the scene description method is described by XMT.

The generation method according to claim 9 or 10, wherein the size information is described as the height and width of the object, the height or width and a ratio thereof, a virtual size and ratios for converting it into a height and a width, or a combination thereof.

The generation method according to claim 9 or 10, wherein the position information is described as a URL, and the size information is described as the height and width of the object, the height or width and a ratio thereof, a virtual size and ratios for converting it into a height and a width, or a combination thereof.

13. A program for controlling an information processing apparatus to realize generation of an object according to any one of claims 1 to 12.

A recording medium on which the program according to claim 13 is recorded.

A generation device for generating an object related to object-based encoding,
An acquisition means for acquiring information indicating a position of an object entity from a scene description for encoding a scene, and acquiring size information related to the position information;
A generation device comprising: changing means for changing a size of an object to be referred to based on the acquired position information and size information.