JP2005045400A - Object generating device and method therefor

Info

Publication number: JP2005045400A
Application number: JP2003201158A
Authority: JP
Inventors: Akira Kunimatsu; 亮國松; Masahiko Takaku; 雅彦高久
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-07-24
Filing date: 2003-07-24
Publication date: 2005-02-17

Abstract

PROBLEM TO BE SOLVED: To solve the problem that an object having a size matching the intention of the provider of a BIFS source must be prepared in advance when a still image or a moving image is embedded in a scene by using the provided BIFS source, tool, etc.

SOLUTION: A BIFS source interpretation module 704 and a node name extraction module 705A extract, from a scene description 703 for encoding a scene, a node name defined with a reserved word that defines an arbitrary node name. Size information 708 on an object and URL information 711 representing the location of the entity of the object are extracted from the extracted node name, and the size of an object 709 to be referred to is changed on the basis of the extracted size information 708 and URL information 711.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はオブジェクト生成装置およびその方法に関し、例えば、シーン記述により画面中に配置すべき静止画、動画、コンピュータグラフィクスなどのオブジェクトの生成に関する。
【０００２】
【従来の技術】
ディジタルデータ処理能力の向上と通信技術の発展によって、いわゆるマルチメディアを簡単に表現することができるようになり、複数の静止画、動画、音声、ベクトルグラフィクスなどを用いた複雑な表現が一般的になった。このような複合的なマルチメディア（あるいはマルチオブジェクト）表現の方法として、オブジェクトベース符号化方式がある。この具体的な例が、ＭＰＥＧ−４システムと呼ばれる標準規格（ＩＳＯ／ＩＥＣ１４４９６−１）である（例えば、非特許文献１参照）。
【０００３】
ＩＳＯ／ＩＥＣ１４４９６−１では、ＭＰＥＧ−４のオーディオビジュアルオブジェクトを空間にどのように配置して表示すべきかを示す、ＢＩＦＳ（ＢｉｎａｒｙＦｏｒｍａｔｆｏｒＳｃｅｎｅｓ）と呼ばれるバイナリ形式が規格化されている。ＢＩＦＳは、オブジェクト記述子（ＯｂｊｅｃｔＤｅｓｃｒｉｐｔｏｒ）およびその特殊な形式である初期オブジェクト記述子（ＩｎｉｔｉａｌＯｂｊｅｃｔＤｅｓｃｒｉｐｔｏｒ）、さらに、音声や映像をエンコードしたＭＰＥＧ−４ビジュアルやＭＰＥＧ−４オーディオの定義によるビットストリームとともに、複合化されたいわゆるマルチメディアのシーンを合成する。
【０００４】
ＢＩＦＳは、その定義の根底にＶＲＭＬ（ＩＳＯ／ＩＥＣ１４７７２−１ＶｉｒｔｕａｌＲｅａｌｉｔｙＭｏｄｅｌｉｎｇＬａｎｇｕａｇｅ）を使用したもので、ＶＲＭＬを拡張したものと考えてよい。
【０００５】
規格の詳細はＩＳＯ／ＩＥＣ１４４９６−１に定義されているので、ここでは説明しないが、以下では、発明の本質をより簡潔に説明するため、ＶＲＭＬを基本としたＢＩＦＳの表現をＢＩＦＳソースとよび、テキスト形式で表現する。ＢＩＦＳのテキスト表現には、ＸＭＴ（ＥｘｔｅｎｓｉｂｌｅＭＰＥＧ−４Ｔｅｘｔｕａｌｆｏｒｍａｔ）とよばれるＸＭＬ形式の表現があり、以下では、このようなＸＭＴによる表現も含めて、明示的にＸＭＴと表現している場合を除き、ＢＩＦＳソースとする。すなわち、ＢＩＦＳソースは、オブジェクトベース符号化方式において、オブジェクトの配置や動作を定義するシーンを符号化するためのシーン記述方式の具体的な例である。
【０００６】
図１から３は、ＶＲＭＬを基本としたＢＩＦＳソースの記述例を示す図である。
【０００７】
先頭に記述された「＃ＶＲＭＬＶ２．０ｕｔｆ８」はコメントで、ＶＲＭＬの規約上定められたものであって発明と直接関係はないが、一般的な記述法の例として記述されている。なお「ｕｔｆ８」は、ＩＳＯ１０６４６を基準とするＵＴＦ−８エンコーディングによる国際文字の指定である。
【０００８】
図４は、図１から３に示すＢＩＦＳソースをエンコードした場合に、その結果として期待される表示の効果を模式的に示す図である。以下では、図１から３のＢＩＦＳソースの例と図４を用いて詳細に説明する。
【０００９】
図１に示す二つ目のコメント行「＃ −−− 背景画像 −−−」以下は、背景の静止画像を示し、ｕｒｌで指定されるオブジェクト「ｂａｃｋｇｒｏｕｎｄ．ＪＰＧ」が図４に示す背景画２０１になる。
【００１０】
同様に、三つ目のコメント行「＃ −−− 静止画：画像１ −−−」以下は図４に示す静止画２０２に、四つ目のコメント行「＃ −−− 動画：画像２ −−−」以下は図４に示す動画２０３にそれぞれ対応する。
【００１１】
図２に示すコメント行「＃ −−− 動作 −−−」と、図３に示すコメント行「＃ −−− イベント間の接続 −−−」以下の記述は、ポインティングデバイスによってオブジェクトが選択された際の動作などをそれぞれ定義し、例えば、図４の拡大エリア２０４に、選択されたオブジェクトが拡大表示される、といった動作を定義する。
【００１２】
以下では、図１から３のＢＩＦＳソース例についてごく簡単に説明する。
【００１３】
まず、Ｇｒｏｕｐ｛ … ｝の部分は、表示されるシーン全体をグルーピングするノードを示す。次のｃｈｉｌｄｒｅｎ［ … ］はグループの子ノードの集合を示し、次のＴｒａｎｓｆｏｒｍ２Ｄ｛ … ｝で最初のノードの座標変換を表す。
【００１４】
Ｔｒａｎｓｆｏｒｍ２Ｄは、さらに子ノードを保持し、Ｓｈａｐｅ｛ … ｝によって特定のオブジェクトをシーンに取り込む。これは、Ｓｈａｐｅで指定されたオブジェクトをＴｒａｎｓｆｏｒｍ２Ｄによってシーン中の座標にマッピングすることと等価である。Ｓｈａｐｅは、さらにａｐｐｅａｒａｎｃｅＡｐｐｅａｒａｎｃｅ｛ … ｝とｇｅｏｍｅｔｒｙＢｉｔｍａｐ｛ … ｝を保持し、前者はＳｈａｐｅの視覚特性、後者はその具体的な表示を指定する。視覚特性としては、ｔｅｘｔｕｒｅＩｍａｇｅＴｅｘｔｕｒｅ｛ … ｝によって具体的な画像オブジェクトを指定し、そのオブジェクトを示すｕｒｌが記述される。ｓｃａｌｅやｔｒａｎｓｌａｔｉｏｎはそれぞれ倍率と座標を記述する。
【００１５】
ここで着目すべき点は、Ｔｒａｎｓｆｏｒｍ２Ｄとｇｅｏｍｅｔｒｙに記述されたｓｃａｌｅやｔｒａｎｓｌａｔｉｏｎの値である。ｓｃａｌｅの値は、ＲＯＵＴＥコマンドにより、オブジェクト選択時から一定の時間で、選択されたオブジェクトの拡大と縮小を行う。その動作は、例えばＤＥＦＰＩＣ１ＳＣＡＬＥ（図２）で定義されたＰｏｓｉｔｉｏｎＩｎｔｅｒｐｏｌａｔｏｒ２Ｄ｛ … ｝によって値が変化することで行われる。この例では、縦横が０．５（半分）から１．０（等倍）へと変化し、さらに元の０．５へと戻される。
【００１６】
同様に、ｔｒａｎｓｌａｔｉｏｎの値も変化してオブジェクトの位置が変更される。ｔｒａｎｓｌａｔｉｏｎは、対象になるオブジェクトの平行移動を意味し、この値が変化することによって、オブジェクトの位置が移動する。
【００１７】
以上では、何ら具体的な説明なしに、オブジェクトという言葉を使って説明を行ったが、このオブジェクトの位置と大きさについてより詳しく説明するため、さらに具体的な表現を用いることにする。
【００１８】
図１から３に示すＢＩＦＳソースでは、我々が目にする具体的なオブジェクトとして、背景画のｂａｃｋｇｒｏｕｎｄ．ＪＰＧ、静止画のｐｉｃｔｕｒｅ１．ｉｍｇ、および、動画のｐｉｃｔｕｒｅ２．ｍｏｖｉｅが使用されている。これらのオブジェクトは、ｕｒｌの記述の後にＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）形式で記述される。この表記は、よく知られたＲＦＣに定義されたものであり、かつ、ＶＲＭＬの定義形式であるので、ここでは詳しく説明しない。
【００１９】
これら三つのオブジェクトは、この例では、ｂａｃｋｇｒｏｕｎｄ．ＪＰＧが横５１２ピクセル×縦３８４ピクセル、ｐｉｃｔｕｒｅ１．ｉｍｇおよびｐｉｃｔｕｒｅ２．ｍｏｖｉｅが横３２０ピクセル×縦２４０ピクセルと仮定されている。上述したｓｃａｌｅは、拡大縮小の割合を示すから、各オブジェクトは、１．０（等倍）で表現された後、それらを包含する領域としての背景画ｂａｃｋｇｒｏｕｎｄ．ＪＰＧを除き、初期値で０．５、すなわち半分のサイズに縮小される。背景画ｂａｃｋｇｒｏｕｎｄ．ＪＰＧはそのままのサイズであるが、静止画と動画は、半分に縮小され、初期段階のサイズは横１６０ピクセル×縦１２０ピクセルである。また、縮小された静止画と動画の位置は、ｔｒａｎｓｌａｔｉｏｎによって座標（１６８、７６）と座標（１６８、−７６）に指定されるため、表示領域の右上と右下に移動される。
【００２０】
このようにして、図４に示すような具体的な位置表現が行われる。注意すべきことは、このようなＢＩＦＳソースによって定義された表示の効果は、ｔｅｘｔｕｒｅ内のｕｒｌで指定されたオブジェクトの絶対的な大きさを何ら規定しない状態で表現されることである。ＩＳＯ／ＩＥＣ１４４９６−１の定義は、その原型となったＶＲＭＬの定義と同様に、その座標系に、いわゆる右手系のデカルト三次元座標を使用し、表現されるテクスチャは、元の大きさに対する比率でのみ表現される。そのため、明示的に、この座標系の中に有限の大きさで定義されるオブジェクトを配置して、そのオブジェクトにテクスチャを貼り付けるといった記述を行わない限り、そのオブジェクトが表示される大きさは、オブジェクトしか知り得ないことになる。
【００２１】
図１から３に示すＢＩＦＳソースの例では、背景画２０１が横５１２ピクセル×縦３８４ピクセルで、かつ、静止画２０２が横６４０ピクセル×縦４８０ピクセルである場合、背景画２０１は、その位置は静止画２０２と完全には重ならないとは言え、いわゆるオーバラップされた状態になり隠されてしまう。このような表示は、おそらく図１から３に示すＢＩＦＳソースの作成者の意図とは異なる。言い換えれば、ＢＩＦＳソースの作成者は、表示されるオブジェクトの絶対的な大きさを認識してソースを記述しているから意図した表現が行われる。この例では、ＢＩＦＳソースの作成者は、背景画２０１を横５１２ピクセル×縦３８４ピクセル、静止画２０２と動画２０３を横３２０ピクセル×縦２４０ピクセルと認識していて、それによりＢＩＦＳソースの作成者の意図が正しく表現される。
【００２２】
ここまでは、図１から３に示すＢＩＦＳソースの例に従い、オブジェクトの配置を中心に説明したが、ＩＳＯ／ＩＥＣ１４４９６−１の定義によれば、そのＢＩＦＳの表示領域については、ＤｅｃｏｄｅｒＣｏｎｆｉｇＤｅｓｃｒｉｐｔｏｒ（復号器設定記述子）内のＤｅｃｏｄｅｒＳｐｅｃｉｆｉｃＩｎｆｏ（復号器固有情報）に保持されるＢＩＦＳＣｏｎｆｉｇにおいて定義される。ＢＩＦＳＣｏｎｆｉｇの定義は次のようなものである。

【００２３】
もし、ｈａｓＳｉｚｅが真（ｔｒｕｅ）にセットされていれば（すなわちビットが立っていれば）、ｐｉｘｅｌＷｉｄｔｈとｐｉｘｅｌＨｅｉｇｈｔによって、表示される具体的な表現（シーン）の領域サイズが指定されることになる。
【００２４】
このように、エンコードされたビットストリーム中にサイズが指定されることにより、ＢＩＦＳソースで指示された表現は、さらに、その表示領域が指定されることになる。
【００２５】
このようにして、ＢＩＦＳソースもしくはシーン記述と、これに記述されないビットストリーム中のデータ表現により、複数のオブジェクトが複雑に表現された、いわゆるマルチメディアのシーンを自在に表現することが可能である。
【００２６】
上述したように、ＢＩＦＳソースの作成者は、オブジェクトの配置と動作に関する定義、すなわちシーン記述を、シーン記述から参照される静止画や動画などの外部参照オブジェクトのサイズなどを予め判断し、意図した状態になるように設計しなければならない。しかし、現実には、それでも不充分な場合がある。任意のあらゆる参照オブジェクトに対して意図が反映されるようにシーン記述を設計することは、全く不可能であるとは言えないが、そのためには、シーン記述であるＢＩＦＳソースの作成者の充分な配慮が必須である。例えば、図１から３に例示したＢＩＦＳソースは、横３２０×縦２４０ピクセルのオブジェクトを参照する場合、図４に示すような表示が行われ得る。しかし、オブジェクトのサイズが異なれば、極端な場合、参照されるオブジェクトのほんの一部が表示されるだけになる。
【００２７】
また、ＢＩＦＳソースの作成者が予め特定の意図をもったＢＩＦＳソースを作成し、しかる後、第三者が参照されるオブジェクトを入れ替えると、ＢＩＦＳソースの意図どおりに表現されない場合がある。この問題は、例えばＭＰＥＧ−４の規格をよく知るデザイナがある意図をもってＢＩＦＳソースを設計し、しかる後、エンコードされたＢＩＦＳやＢＩＦＳソースと、ＢＩＦＳを生成するエンコーダを含むＭＰＥＧ−４データの簡易生成ツールを提供するといった場合に顕著に現れる。このような場合、提供されたＢＩＦＳやＢＩＦＳソース、あるいは、ツールなどを利用して、その利用者が欲する静止画や動画を埋め込むといったことが想定されるが、その利用者は、提供者の意図に合った大きさのオブジェクトを予め準備しなければならない。
【００２８】
上記の問題に対処するため、ＢＩＦＳソースに期待するオブジェトのサイズなどをコメントとして埋め込む手法や、提供するツールに期待するオブジェトのサイズへの変換機能を埋め込む、といったことが行われる。しかし、これらの手法は、ツールに固有の機能でのみ実現されたり、利用者がＢＩＦＳソースを読んで独自に判断する必要が生じるなど、不便である。また、ツールに固有の機能で実現する場合は、ＢＩＦＳ（もしくはＢＩＦＳソース）に独自の拡張を施したり、ＢＩＦＳとは異なる別情報を用意したりする必要があり、さらに互換性の問題も発生する。
【００２９】
【非特許文献１】
ＩＳＯ／ＩＥＣ１４４９６−１
【００３０】
【発明が解決しようとする課題】
本発明は、上述の問題を個々にまたはまとめて解決するもので、シーン記述の意図に合うオブジェクトの生成を目的とする。
【００３１】
【課題を解決するための手段】
本発明は、前記の目的を達成する一手段として、以下の構成を備える。
【００３２】
本発明は、オブジェクトベース符号化に関連するオブジェクトを生成する際に、シーンを符号化するためのシーン記述から、任意のノード名を定義する予約語によって定義されたノード名を抽出し、抽出したノード名からオブジェクトのサイズに関する情報、および、オブジェクトの実体の位置を示す情報を抽出し、抽出した位置情報およびサイズ情報に基づき、参照すべきオブジェクトの大きさを変更することを特徴とする。
【００３３】
【発明の実施の形態】
以下、本発明にかかる実施形態の画像処理を図面を参照して詳細に説明する。
【００３４】
第１実施形態は、任意のノード名称を定義する予約語によって定義する、外部参照オブジェクトに関わるノード名称の文字列を、期待されるオブジェクトの縦横のサイズ（大きさ）の値を付加した文字列として作成する。これにより、予めＢＩＦＳソース中に、期待されるオブジェクトサイズ情報を記述しておくことで、サイズ情報に基づき該当する外部参照オブジェクトの大きさを変更し、ＢＩＦＳ作成者が意図したサイズでオブジェクトを表現するＭＰＥＧ−４データを生成する。
【００３５】
第１実施形態において説明するＭＰＥＧ−４データ生成装置は、オブジェクトベース符号化によるＩＳＯ／ＩＥＣ１４４９６（ＭＰＥＧ−４）に関するオブジェクトデータ生成装置である。ここで説明するＭＰＥＧ−４データ生成装置は、細部においては異なる構成で表現される場合もあるが、ハードウェアで構成しても、コンピュータ装置にソフトウェアを供給するなどしてソフトウェアで構成しても実現可能で、その構成要素は、本発明の本質に合致する限り、組み替え、統合、再配置が可能である。
【００３６】
なお、以下の説明では、簡略化のため、図１から３に示したＢＩＦＳソースと、図４に示した期待される表示の効果を説明の前提にする。
【００３７】
【第１実施形態】
上述したように、ＢＩＦＳソースが例えばＶＲＭＬを基本とした表現で記述されるものとすると、特定のオブジェクトを貼り付ける部分は、例えば図５に示すように記述される。図１から３に示したＢＩＦＳソースの例では、Ｓｈａｐｅで記述されたノードを、その配置を行うＴｒａｎｓｆｏｒｍ２Ｄノードの子ノードとして記述したが、図５には、とくに説明に必要な部分のみを取り出して示す。
【００３８】
図６はＢＩＦＳソースの一例を示す図で、「ＤＥＦ」は、ＶＲＭＬにおいて任意のノード名称を定義する予約語である。ＢＩＦＳソースの解釈モジュールは、ＤＥＦによって定義された名称を参照する場合に備えて、これを保持する。
【００３９】
図７はＢＩＦＳソース記述方式におけるノード名称の作成方法を説明する図である。
【００４０】
この例のノードの名称は、キーワードになる文字列と、ＢＩＦＳ作成者が意図したオブジェクトの表示サイズの情報を基に作成される。キーワードには、文字列「ＰＬＡＣＥＨＯＬＤＥＲ」が使用されている。また、ＢＩＦＳ作成者が意図したオブジェクトの表示サイズは、キーワード文字列「ＰＬＡＣＥＨＯＬＤＥＲ」の直後に、縦横のサイズを示す二つの値を区切り文字「＿」（アンダスコア）を使って記述されている。図７に示す「＿３２０＿２４０」は縦２４０×横３２０ピクセルを示す。このように、ノード名称は、特定の文字列、期待する縦および横の大きさ、並びに、区切り文字から構成される。
【００４１】
なお、ノード名称の作成方法のうち、キーワード文字列および区切り文字の種類は、発明の本質ではなく、変更が可能である。また、キーワード文字列とサイズ情報の並び順も同様に、発明の本質ではなく、たとえ前後が逆であっても構わない。すなわち、ＤＥＦで定義するノード名称に、キーワード文字列とＢＩＦＳ作成者の期待するオブジェクトサイズを含むことが重要である。
【００４２】
ＢＩＦＳソースの文法上、ＤＥＦで定義するノード名称は、その名称の有効なスコープがグローバルであることなどにより、規格上一意に定めなければならないが、これは例えば次の方法で解決することができる。
【００４３】
図８はノード名称を一意に定める方法の一例を示す図である。
【００４４】
図８に示す例では、作成しようとするノード名称が重複する場合、新たに作成するノード名称のキーワードに一意の数を文字列として付加する。これにより、ノード名称の重複を防ぎ、一意性を保つ。例えば、作成しようとするノード名称「ＰＬＡＣＥＨＯＬＤＥＲ＿３２０＿２４０」と同名のノードが既に存在する場合、キーワード文字列の後ろに数字「１」を付加して、文字列「ＰＬＡＣＥＨＯＬＤＥＲ１＿３２０＿２４０」をノード名称にする。もし、以降もノード名称が重複する場合は、付加する数字をインクリメントして「ＰＬＡＣＥＨＯＬＤＥＲ２＿３２０＿２４０」のようなノード名称にすればよい。
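[0045]
The following description uses, as an example, a BIFS source whose node names are defined by this method.
[0046]
[Configuration]
FIG. 9 is a block diagram showing a configuration example of a computer apparatus in the case where the MPEG-4 data generation apparatus is configured by software.
[0047]
A CPU 101 controls a RAM 103, an operation unit 104, a monitor 106, and the like in accordance with data and control programs stored in a ROM 102 or a hard disk (HD) 109, an operating system (OS), an application program for MPEG-4 data generation, and the like, and performs the various kinds of control and processing involved in MPEG-4 data generation.
[0048]
The RAM 103 is a work area in which the CPU 101 executes various programs, and a temporary storage area for data input from the operation unit 104 and the like.
[0049]
The operation unit 104 is a mouse, a keyboard, or the like, with which the user inputs setting data and the like for the various kinds of control and processing involved in MPEG-4 data generation.
[0050]
The monitor 106 is a CRT, an LCD, or the like, and displays image processing results, user interface screens for operations on the operation unit 104, and the like.
[0051]
A network interface card (NIC) 108 is a communication interface for exchanging various data, including BIFS data, MPEG-4 data, and objects, with other computer apparatuses via a network.
[0052]
Although not shown in FIG. 9, the operation unit 104, the monitor 106, the NIC 108, and the HD 109 are each connected to a system bus 110 of the host computer via a predetermined interface.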
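[0053]
FIG. 10 is a block diagram showing a functional configuration example of the MPEG-4 data generation apparatus.
[0054]
In FIG. 10, a BIFS source editor module 701 creates a BIFS source 703 in response to a user operation 702. The BIFS source editor module 701 is included for the sake of explanation, but is not necessarily required in the first embodiment.
[0055]
A BIFS source interpretation module 704 reads the BIFS source 703 created in advance and interprets its contents. Modules that interpret BIFS sources in this way are in general use in various forms, including the reference code published by ISO as part of the MPEG-4 standard.
[0056]
A node name extraction module 705 extracts, from the BIFS source 703, node names defined by DEF that relate to external reference objects, together with the nodes themselves. A size information extraction module 707 extracts size information 708 from the character string of the node name. A URL information extraction module 710 extracts URL information 711 indicating the location of the entity of the external reference object 709 that is specified by URL for the extracted node.
[0057]
An object transcoder module 712 generates reference object data 714 by converting the input moving-image or still-image object 709 to a specified size. Such transcoders are also in general use in various forms.
[0058]
A BIFS encoder module 706 takes the BIFS source, or the result of interpreting the BIFS source, as input and generates BIFS data 713. Like the BIFS source interpretation module 704, such encoders are in general use in various forms, including the reference code published by ISO.
[0059]
An MPEG-4 data generation module 715 generates the final, valid MPEG-4 data 716.
[0060]
Each of the above modules, taken individually, is in general use and is not in itself the essence of the invention. The essence of the present invention lies in the combination of these modules and their functions.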
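[0061]
[Operation]
FIG. 11 is a flowchart for explaining the operation of the MPEG-4 data generation apparatus.
[0062]
First, the CPU 101 reads the BIFS source 703 and interprets it with the BIFS source interpretation module 704 (S81). In general, the BIFS source interpretation module 704 internally expands the so-called scene graph described in the BIFS source in memory (the RAM 103 or the like). Alternatively, it may generate events internally according to the contents of the BIFS source and call the corresponding processing functions in an event-driven manner.
[0063]
Next, the node name extraction module 705 extracts node names that are defined by DEF and contain the keyword character string, together with their nodes (S82). FIG. 12 is a diagram showing an example of the node name extraction process. In this example, the node name “PLACEHOLDER_320_240”, which contains the predefined keyword “PLACEHOLDER”, and its node are extracted.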
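Step S82 can be illustrated with a minimal sketch. It assumes the BIFS source is available as plain VRML-style text and scans it with a regular expression instead of building a full scene graph. The function name `extract_placeholder_names` and the sample source text are hypothetical, since the contents of the figures are not reproduced here, and comment lines are not treated specially.

```python
import re

# Matches "DEF <name>" as written in VRML-style text (a simplification).
DEF_PATTERN = re.compile(r"\bDEF\s+([A-Za-z_][A-Za-z0-9_]*)")

def extract_placeholder_names(bifs_source, keyword="PLACEHOLDER"):
    """Return the DEF-defined node names that contain the keyword string."""
    return [name for name in DEF_PATTERN.findall(bifs_source) if keyword in name]

# Hypothetical BIFS source fragment, written only for this illustration.
sample = """
DEF PIC1 Transform2D {
  children [
    DEF PLACEHOLDER_320_240 Shape {
      appearance Appearance {
        texture ImageTexture { url "picture1.img" }
      }
      geometry Bitmap {}
    }
  ]
}
"""

print(extract_placeholder_names(sample))  # ['PLACEHOLDER_320_240']
```

The name PIC1 is also defined by DEF but is skipped because it does not contain the keyword.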
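[0064]
Next, the size information extraction module 707 splits the character string of the extracted node name and extracts the size information 708 of the object (S83). FIG. 13 is a diagram showing an example of the extraction of the size information 708. In this example, the size values “320” and “240” are obtained from the node name “PLACEHOLDER_320_240” by using the delimiter “_” as a marker.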
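The splitting in step S83 might look like the following sketch. It assumes the width comes first and the height second, as in the “PLACEHOLDER_320_240” example, and that a numeric suffix on the keyword (as used to keep names unique) should be tolerated. The function name `parse_size` is an assumption for illustration.

```python
def parse_size(node_name, keyword="PLACEHOLDER", delimiter="_"):
    """Split a node name such as PLACEHOLDER_320_240 into (width, height)."""
    parts = node_name.split(delimiter)
    # Expect exactly: keyword (optionally with a numeric suffix), width, height.
    if len(parts) != 3 or not parts[0].startswith(keyword):
        raise ValueError(f"not a placeholder node name: {node_name!r}")
    width, height = int(parts[1]), int(parts[2])
    return width, height

print(parse_size("PLACEHOLDER_320_240"))   # (320, 240)
print(parse_size("PLACEHOLDER1_320_240"))  # (320, 240)
```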
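[0065]
Next, the URL information extraction module 710 searches the node referenced by the extracted node and the nodes below it, and extracts the URL information 711 indicating the location of the external reference object 709 itself (S84). FIG. 14 is a diagram showing an example of the extraction of the URL information 711. In this example, the nodes are traced from the node “Shape” associated with the extracted node name “PLACEHOLDER_320_240” down to its child node “ImageTexture”, and “picture1.img” is extracted as the URL information 711.
[0066]
FIG. 15 shows an example of a BIFS source in which the node defined by DEF refers to a node that describes the location information of the external reference object. In this case as well, the URL information can be searched for as in the example of FIG. 14, and the URL information extraction module 710 can be used without modification.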
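The search in step S84 can be sketched as a depth-first walk. The sketch assumes the interpreted scene graph is held as nested dictionaries and lists; that representation, the function name `find_url`, and the sample node are all assumptions made for illustration.

```python
def find_url(node):
    """Depth-first search for the first 'url' field at or below the given node."""
    if isinstance(node, dict):
        if "url" in node:
            return node["url"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # plain field value, nothing to descend into
    for child in children:
        found = find_url(child)
        if found is not None:
            return found
    return None

# Hypothetical in-memory form of the Shape node named PLACEHOLDER_320_240.
shape = {
    "type": "Shape",
    "appearance": {
        "type": "Appearance",
        "texture": {"type": "ImageTexture", "url": "picture1.img"},
    },
    "geometry": {"type": "Bitmap"},
}

print(find_url(shape))  # picture1.img
```

Because the walk does not depend on how deep the url field sits, the same function would also cover a layout in which the named node only refers to the node holding the location.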
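[0067]
The size information 708 and the URL information 711 extracted in steps S83 and S84 are passed to the object transcoder module 712, and the moving-image or still-image reference object 709 is converted to the specified size (S85). That is, the object transcoder module 712 reads the object that the user of the BIFS source actually wants to use on the basis of the URL information 711, converts the reference object 709 if necessary on the basis of the size information 708, and outputs reference object data 714 of the specified size. Of course, if the size of the reference object 709 input to the object transcoder module 712 is the same as the size indicated by the size information 708, no conversion is needed.
[0068]
When the object conversion process (S85) is performed, the reference object 709 is resized and an object whose size matches the intention of the creator of the BIFS source is generated. For example, if “picture1.img” is a JPEG image, the size intended by the creator of the BIFS source is 240 pixels high × 320 pixels wide, and the reference object used by the user of the BIFS source is an image 480 pixels high × 640 pixels wide, the reference object 709 is converted to an image 240 pixels high × 320 pixels wide, reduced to 1/2 in both height and width. If this size conversion were not performed, an object twice as high and twice as wide as the creator intended might be displayed.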
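The decision made in step S85 can be sketched without any image library: compare the actual size with the expected size and, if they differ, derive the scale factors to hand to a transcoder. The function name `plan_conversion` is hypothetical, and the actual resampling is deliberately left out.

```python
def plan_conversion(actual, expected):
    """Return None if no conversion is needed, else (x_factor, y_factor).

    actual and expected are (width, height) tuples in pixels.
    """
    if actual == expected:
        return None  # sizes already match, the transcoder can pass the object through
    return (expected[0] / actual[0], expected[1] / actual[1])

print(plan_conversion((640, 480), (320, 240)))  # (0.5, 0.5)
print(plan_conversion((320, 240), (320, 240)))  # None
```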
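[0069]
The processing of steps S82 to S85 is repeated, according to the determination in step S86, until the end of the BIFS source 703 is reached. The object conversion process (S85) need not immediately follow the URL information extraction process (S84); it may instead be performed immediately before the BIFS data generation process (S87).
[0070]
When the end of the BIFS source 703 is reached, the BIFS encoder module 706 encodes the BIFS source 703 in accordance with the interpretation by the BIFS source interpretation module 704 and outputs the BIFS data 713 (S87).
[0071]
Finally, the MPEG-4 data generation module 715 generates valid MPEG-4 data 716 from the BIFS data 713 and the reference object data 714, which has been converted from the reference object 709 so as to match the intention of the creator of the BIFS source (S88).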
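[0072]
[Second Embodiment]
Image processing according to a second embodiment of the present invention is described below. In the second embodiment, components substantially the same as those of the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.
[0073]
In the second embodiment, a node name is defined, by the reserved word that defines an arbitrary node name, using a character string in which the expected object size information is encoded. By describing the size information in the BIFS source in advance in this way, MPEG-4 data that represents the object at the size intended by the creator of the BIFS source is generated, as in the first embodiment.
[0074]
The second embodiment resembles the first in that the expected height and width of the object are obtained from a node name defined by the reserved word that defines an arbitrary node name, and the size of the external reference object is changed accordingly. Encoding the described size information, however, also offers secondary benefits: it can meet a wish to hide the size information from users or to express it in a form other than height and width, and it allows more flexible handling of the uniqueness that the standard requires of names.
[0075]
In the second embodiment, a node name is defined by “DEF” using a character string in which the expected object size information is encoded, and the size information is described in the BIFS source in advance, so that MPEG-4 data that represents the object at the size intended by the creator of the BIFS source is generated.
[0076]
The second embodiment differs from the first in that the object size information is encoded, so the method of creating node names and the processing of the size information extraction module 707 differ. Everything else is the same as in the first embodiment, and the MPEG-4 data generation apparatus can use the configurations of FIGS. 9 and 10 described in the first embodiment as they are.
[0077]
The method of creating node names and the processing of the size information extraction module 707, which differ from the first embodiment, are described in detail below.
[0078]
FIG. 16 shows an example of a BIFS source, and FIGS. 17 to 19 illustrate examples of creating a node name.
[0079]
A node name is created using the keyword, the object size information, and the delimiter; the difference from the first embodiment is that the object size information is encoded.
[0080]
FIG. 17 is a flowchart showing an example of a method of encoding the object size information. In this example, the height and width values representing the object size are each converted to a four-digit hexadecimal number (S91), and an eight-digit hexadecimal number (more precisely, the corresponding character string) is created with the width as the upper four digits and the height as the lower four digits (S92).
[0081]
FIG. 18 shows this encoding concretely. The values “240” and “320”, representing 240 pixels high × 320 pixels wide, are converted to the four-digit hexadecimal numbers “00F0” and “0140”, respectively, and these are joined to create the character string “014000F0”.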
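Steps S91 and S92 can be reproduced with a few lines. The sketch assumes each dimension fits in four hexadecimal digits (0 to 65535) and that uppercase digits are used, as in the “014000F0” example. The function name `encode_size` is an assumption for illustration.

```python
def encode_size(width, height):
    """Encode a size as 8 hex digits: width in the upper 4, height in the lower 4."""
    if not (0 <= width <= 0xFFFF and 0 <= height <= 0xFFFF):
        raise ValueError("width and height must each fit in four hex digits")
    # S91: four-digit hexadecimal for each value; S92: width first, then height.
    return f"{width:04X}{height:04X}"

print(encode_size(320, 240))  # 014000F0
```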
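[0082]
FIG. 19 illustrates the process of creating a node name using the encoded object size information. In this example, the node name “PLACEHOLDER_014000F0” is created from the size information of 240 pixels high × 320 pixels wide, the keyword “PLACEHOLDER”, and the delimiter “_”. Needless to say, the same technique as in the first embodiment can be used to make the created node name unique.
[0083]
FIGS. 20 and 21 illustrate the operation of the size information extraction module 707.
[0084]
The size information extraction module 707 splits the character string of the node name and extracts the object size information. The difference from the first embodiment is that decoding is required when extracting the size information: the module first extracts the character string representing the size information from the node name and then decodes that string to obtain the size information.
[0085]
FIG. 20 shows a concrete example of extracting size information from a node name. In this example, the size information extraction module 707 extracts the character string “014000F0” from the node name and decodes it to obtain the size information of 240 pixels high × 320 pixels wide. That is, the upper four digits of the extracted string are converted to a decimal number and interpreted as the width (S71), and the lower four digits are converted to a decimal number and interpreted as the height (S72).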
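The decoding in steps S71 and S72 is the inverse operation. The sketch below assumes the encoded string is the part of the node name after the last delimiter and is exactly eight hexadecimal digits long. The function names `decode_size` and `size_from_node_name` are hypothetical.

```python
def decode_size(encoded):
    """Decode an 8-digit hex string into (width, height)."""
    if len(encoded) != 8:
        raise ValueError("expected exactly eight hexadecimal digits")
    width = int(encoded[:4], 16)   # S71: upper four digits are the width
    height = int(encoded[4:], 16)  # S72: lower four digits are the height
    return width, height

def size_from_node_name(node_name, delimiter="_"):
    """Take the part after the last delimiter and decode it."""
    return decode_size(node_name.rsplit(delimiter, 1)[1])

print(size_from_node_name("PLACEHOLDER_014000F0"))  # (320, 240)
```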
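[0086]
Although conversion from decimal to hexadecimal has been described as the method of encoding the object size information, the method is not limited to this. Using a stronger encryption routine, for example, does not affect the essence. Such encoding also offers the secondary benefit of protecting the BIFS source as the creator's copyrighted work, because the size expected of the object is hidden rather than made explicit to users of the BIFS source. For example, although the string “014000F0” described above results from simple processing, it is not easy to read the object size from it.
[0087]
As described in the first embodiment, a secondary benefit can also be expected with respect to the uniqueness of names. Put simply, uniqueness can be maintained without changing the length of the character string by adding to it a sequence number that follows the order in which the definition names appear. In this case, the sequence number is subtracted again when the size information is extracted from the BIFS source.
[0088]
Furthermore, the encoding and decoding processes have implicitly assumed that the expected “size” is given as a height and a width, but a combination of height and aspect ratio, for example, may be used instead. Alternatively, an aspect ratio relative to a predetermined absolute size may be used. Because these specification methods are simple, a detailed description of examples using them is omitted.
[0089]
Thus, even if the object size information expected by the creator of the BIFS source is encoded in some form, it can be described in the BIFS source, and, as in the first embodiment, MPEG-4 data that represents the object at the size intended by the creator of the BIFS source can be generated.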
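[0090]
[Modifications]
The above embodiments did not explain the association between the URL of the object 709 referenced by the BIFS source 703 and the size-converted reference object data 714, output by the object transcoder module 712, that replaces it. Such an association can, however, be realized in various ways under the control of the application program. For example, it is easily realized by holding, in some form, a correspondence list between the URLs of reference objects 709 and the reference object data 714 that replace them. Alternatively, after the URL information 711 has been extracted, the URL of the reference object 709 contained in the data resulting from interpretation of the BIFS source 703 may be changed to the URL of the converted reference object data 714. It can also be realized by rewriting the URL of the reference object 709 in the BIFS source 703 itself. Furthermore, a URL does not necessarily indicate a file that stores the object; it may be an ID that refers to the object's bitstream, in which case leaving the URL unchanged may cause no inconsistency.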
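Two of these options, the correspondence list and rewriting the URL in the source text, can be sketched together. The sketch assumes URLs appear in the source as double-quoted strings. The converted file name and the function name `rewrite_urls` are hypothetical.

```python
def rewrite_urls(bifs_source, url_map):
    """Replace each quoted original URL in the source text with its converted URL."""
    for original, converted in url_map.items():
        bifs_source = bifs_source.replace(f'"{original}"', f'"{converted}"')
    return bifs_source

# Correspondence list: original URL -> URL of the size-converted object data.
url_map = {"picture1.img": "picture1_320x240.img"}

source = 'texture ImageTexture { url "picture1.img" }'
print(rewrite_urls(source, url_map))
# texture ImageTexture { url "picture1_320x240.img" }
```

Keeping the mapping separate from the rewrite step means the same list could instead be applied to the interpreted scene graph, leaving the source text untouched.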
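[0091]
It is also clear that the MPEG-4 data generation apparatus according to the present invention operates even if the various modules described above are not independent modules. The simplest example is the BIFS source interpretation module 704 and the BIFS encoder module 706. These two modules can be implemented as a single module as long as the functions for extracting the size information 708 and the URL information 711 are provided. The essence of the present invention is realized by performing, as a series of processes, the function of converting the reference object 709 contained in the BIFS source 703 to the size intended by the BIFS creator and generating the reference object data 714; the independence of the individual modules is not essential.
[0092]
In this way, the expected width and height of an object are described in the BIFS source by a method that does not conflict with the definitions of ISO/IEC 14496-1, and are then interpreted. As a result, even for a BIFS source that was created in advance, or whose referenced object is still unresolved, the expected width and height of the referenced object can be used once the object is determined, to produce a representation that matches the intention of the creator of the BIFS source.
[0093]
The MPEG-4 data generation apparatus and its module configuration are not necessarily limited to the field of application programs running on personal computers. The MPEG-4 data generation apparatus may also be applied to hardware devices such as digital cameras, camcorders, mobile phones, and digital recorders.
[0094]
Furthermore, the present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a scanner, and a printer) or to an apparatus consisting of a single device (for example, a mobile phone). Even if the modules constituting the MPEG-4 data generation apparatus are distributed over a plurality of devices, those devices together constitute the present invention as long as they can communicate interactively through suitable communication functions, data sharing, or the like. Some of the devices may also be at remote locations connected via the Internet or other communication functions.
[0095]
Needless to say, the object of the present invention is also achieved by supplying a storage medium (or recording medium) on which the program code of software implementing the functions of the above embodiments is recorded to a system or apparatus, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium. In this case, the program code read from the storage medium itself implements the functions of the above embodiments, and the storage medium storing the program code constitutes the present invention. Needless to say, this includes not only the case where the functions of the above embodiments are implemented by the computer executing the read program code, but also the case where the operating system (OS) or the like running on the computer performs part or all of the actual processing on the basis of the instructions of the program code and the functions of the above embodiments are implemented by that processing.
[0096]
Needless to say, it further includes the case where the program code read from the storage medium is written to a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion card or function expansion unit performs part or all of the actual processing on the basis of the instructions of the program code, and the functions of the above embodiments are implemented by that processing.
[0097]
When the present invention is applied to such a storage medium, the storage medium stores program code corresponding to the flowcharts described above.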
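[0098]
[Effects of the Invention]
As described above, according to the present invention, an object that matches the intention of a scene description can be generated.
[Brief Description of the Drawings]
[FIG. 1] A diagram showing a description example of a BIFS source based on VRML.
[FIG. 2] A diagram showing a description example of a BIFS source based on VRML.
[FIG. 3] A diagram showing a description example of a BIFS source based on VRML.
[FIG. 4] A diagram schematically showing the display effect expected when the BIFS source shown in FIGS. 1 to 3 is encoded.
[FIG. 5] A diagram showing an example of a BIFS source in the first embodiment.
[FIG. 6] A diagram showing an example of a BIFS source.
[FIG. 7] A diagram for explaining a method of creating a node name in the BIFS source description method.
[FIG. 8] A diagram showing an example of a method for uniquely determining a node name.
[FIG. 9] A block diagram showing a configuration example of a computer apparatus in the case where the MPEG-4 data generation apparatus is configured by software.
[FIG. 10] A block diagram showing a functional configuration example of the MPEG-4 data generation apparatus.
[FIG. 11] A flowchart for explaining the operation of the MPEG-4 data generation apparatus.
[FIG. 12] A diagram showing an example of the node name extraction process.
[FIG. 13] A diagram showing an example of the size information extraction process.
[FIG. 14] A diagram showing an example of the URL information extraction process.
[FIG. 15] A diagram showing an example of a BIFS source in which the node defined by DEF refers to a node that describes the location information of the external reference object.
[FIG. 16] A diagram showing an example of a BIFS source.
[FIG. 17] A diagram for explaining an example of creating a node name.
[FIG. 18] A diagram for explaining an example of creating a node name.
[FIG. 19] A diagram for explaining an example of creating a node name.
[FIG. 20] A diagram for explaining the operation of the size information extraction module.
[FIG. 21] A diagram for explaining the operation of the size information extraction module.
[0001]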
BACKGROUND OF THE INVENTION
The present invention relates to an object generation apparatus and method, for example, to generation of objects such as still images, moving images, and computer graphics to be arranged on a screen by scene description.
[0002]
[Prior art]
With the improvement of digital data processing capability and the development of communication technology, so-called multimedia can now be expressed easily, and complex presentations using multiple still images, moving images, audio, vector graphics, and the like have become common. Object-based encoding is one method for such composite multimedia (or multi-object) presentation. A specific example is the standard known as MPEG-4 Systems (ISO/IEC 14496-1) (see, for example, Non-Patent Document 1).
[0003]
ISO/IEC 14496-1 standardizes a binary format called BIFS (Binary Format for Scenes), which indicates how MPEG-4 audiovisual objects are to be arranged in space and displayed. Together with object descriptors (ObjectDescriptor), the initial object descriptor (InitialObjectDescriptor) that is a special form of them, and bitstreams defined by MPEG-4 Visual and MPEG-4 Audio that encode video and audio, BIFS composes a composite, so-called multimedia scene.
[0004]
BIFS has VRML (ISO/IEC 14772-1, Virtual Reality Modeling Language) at the root of its definition and can be regarded as an extension of VRML.
[0005]
The details of the standard are defined in ISO/IEC 14496-1 and are not described here. In the following, to explain the essence of the invention more concisely, the VRML-based representation of BIFS is called a BIFS source and is expressed in text form. Text representations of BIFS also include an XML-format representation called XMT (Extensible MPEG-4 Textual format); in the following, such XMT representations are also referred to as BIFS sources, except where XMT is explicitly named. In other words, a BIFS source is a specific example of a scene description method for encoding, in an object-based encoding scheme, a scene that defines the arrangement and behavior of objects.
[0006]
FIGS. 1 to 3 are diagrams illustrating a description example of a BIFS source based on VRML.
[0007]
“#VRML V2.0 utf8” described at the beginning is a comment required by the VRML conventions. It is not directly related to the invention, but appears here as an example of the usual notation. “utf8” specifies international characters in the UTF-8 encoding based on ISO 10646.
[0008]
FIG. 4 is a diagram schematically showing the display effect expected when the BIFS source shown in FIGS. 1 to 3 is encoded. A detailed description follows, using the BIFS source example of FIGS. 1 to 3 together with FIG. 4.
[0009]
The description following the second comment line “# --- background image ---” shown in FIG. 1 specifies the background still image, and the object “background.JPG” designated by url becomes the background image 201 shown in FIG. 4.
[0010]
Similarly, the description following the third comment line “# --- still image: image 1 ---” corresponds to the still image 202 shown in FIG. 4, and the description following the fourth comment line “# --- moving image: image 2 ---” corresponds to the moving image 203 shown in FIG. 4.
[0011]
The descriptions following the comment line “# --- operation ---” shown in FIG. 2 and the comment line “# --- connection between events ---” shown in FIG. 3 define behavior such as what happens when an object is selected with a pointing device, for example that the selected object is displayed enlarged in the enlargement area 204 of FIG. 4.
[0012]
The BIFS source example of FIGS. 1 to 3 is described only briefly below.
[0013]
First, the Group { ... } part indicates a node that groups the entire displayed scene. The following children [ ... ] indicates the set of child nodes of the group, and the following Transform2D { ... } represents the coordinate transformation of the first node.
[0014]
Transform2D in turn holds child nodes and brings a specific object into the scene with Shape { ... }. This is equivalent to mapping the object specified by Shape to coordinates in the scene by means of Transform2D. Shape in turn holds appearance Appearance { ... } and geometry Bitmap { ... }; the former specifies the visual properties of the Shape and the latter its concrete display. As a visual property, texture ImageTexture { ... } specifies a concrete image object, and the url indicating that object is described there. scale and translation describe the magnification and the coordinates, respectively.
[0015]
The point to note here is the values of scale and translation described in Transform2D and geometry. Through ROUTE commands, the scale value enlarges and then reduces the selected object over a fixed period starting when the object is selected. This is done by having the value changed by the PositionInterpolator2D { ... } defined, for example, as DEF PIC1SCALE (FIG. 2). In this example, the height and width change from 0.5 (half size) to 1.0 (full size) and then return to the original 0.5.
[0016]
Similarly, the translation value also changes to change the position of the object. “translation” means a parallel movement of a target object, and the position of the object is moved by changing this value.
[0017]
In the above description, the term “object” is used without any specific explanation. However, in order to explain the position and size of the object in more detail, more specific expressions will be used.
[0018]
In the BIFS source shown in FIGS. 1 to 3, the concrete objects that we actually see are the background image background.JPG, the still image picture1.img, and the moving image picture2.movie. These objects are written in URL (Uniform Resource Locator) format after the url keyword. This notation is defined in a well-known RFC and is the form used by VRML, so it is not described in detail here.
[0019]
In this example, of these three objects, background.JPG is assumed to be 512 pixels wide × 384 pixels high, and picture1.img and picture2.movie are assumed to be 320 pixels wide × 240 pixels high. Because the scale described above gives the ratio of enlargement or reduction, each object is first represented at 1.0 (full size) and then, except for the background image background.JPG, which serves as the region containing the others, reduced to the initial value of 0.5, that is, half size. The background image background.JPG keeps its original size, whereas the still image and the moving image are reduced by half, to an initial size of 160 pixels wide × 120 pixels high. The reduced still image and moving image are positioned by translation at coordinates (168, 76) and (168, -76), so they are moved to the upper right and lower right of the display area.
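The size arithmetic in this paragraph can be checked with a short sketch. The helper name `scaled_size` is an assumption introduced for illustration; it simply applies a uniform scale factor to a pixel size and is not part of any BIFS tool.

```python
def scaled_size(width, height, scale):
    """Return the (width, height) in pixels after applying a uniform scale factor."""
    return (round(width * scale), round(height * scale))

# Sizes assumed in the example: picture1.img and picture2.movie are 320 x 240.
initial = scaled_size(320, 240, 0.5)   # initial scale of 0.5
enlarged = scaled_size(320, 240, 1.0)  # scale once the interpolator reaches 1.0

print(initial)   # (160, 120)
print(enlarged)  # (320, 240)
```

The same scale applied to a 640 × 480 source gives 320 × 240, which shows that the displayed size depends on the referenced object as much as on the scene description.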
[0020]
In this way, the concrete layout shown in FIG. 4 is obtained. Note that the display effect defined by such a BIFS source is expressed without specifying in any way the absolute size of the object designated by the url in texture. Like the VRML definition from which it derives, the ISO/IEC 14496-1 definition uses a so-called right-handed three-dimensional Cartesian coordinate system, and a texture is expressed only as a ratio to its original size. Therefore, unless the description explicitly places an object of finite, defined size in this coordinate system and pastes the texture onto it, only the object itself determines the size at which it is displayed.
[0021]
In the BIFS source example of FIGS. 1 to 3, if the background image 201 is 512 pixels wide × 384 pixels high and the still image 202 is 640 pixels wide × 480 pixels high, the background image 201, although it does not coincide completely with the still image 202 in position, is overlapped by it and hidden. Such a display probably differs from what the creator of the BIFS source shown in FIGS. 1 to 3 intended. In other words, the intended presentation is obtained because the creator of the BIFS source wrote the source with the absolute sizes of the displayed objects in mind. In this example, the creator takes the background image 201 to be 512 pixels wide × 384 pixels high and the still image 202 and moving image 203 to be 320 pixels wide × 240 pixels high, and on that basis the creator's intention is expressed correctly.
[0022]
The description so far has focused on the arrangement of objects, following the BIFS source example of FIGS. 1 to 3. According to ISO/IEC 14496-1, the display area of the BIFS scene is defined in BIFSConfig, which is held in DecoderSpecificInfo (decoder-specific information) within DecoderConfigDescriptor (decoder configuration descriptor). BIFSConfig is defined as follows.

[0023]
If hasSize is set to true (that is, if a bit is set), the area size of a specific expression (scene) to be displayed is specified by pixelWidth and pixelHeight.
[0024]
Thus, by specifying the size in the encoded bitstream, the display area of the expression indicated by the BIFS source is further specified.
[0025]
In this way, a so-called multimedia scene in which a plurality of objects are expressed in a complex manner can be freely expressed by a BIFS source or scene description and data expression in a bitstream not described in the BIFS source or scene description.
[0026]
As described above, the creator of a BIFS source must design the definitions of object arrangement and behavior, that is, the scene description, by determining in advance the sizes and other properties of the external reference objects, such as still images and moving images, referenced from the scene description, so that the intended result is obtained. In practice, however, even this may be insufficient. Designing a scene description so that the intention is reflected for any arbitrary reference object is not entirely impossible, but it requires great care on the part of the creator of the BIFS source. For example, when the BIFS source illustrated in FIGS. 1 to 3 references objects 320 pixels wide × 240 pixels high, a display like that of FIG. 4 can be obtained. If the object sizes differ, however, in an extreme case only a small part of the referenced object is displayed.
[0027]
In addition, when the creator of a BIFS source creates it with a specific intention and a third party later replaces the referenced objects, the result may not be presented as the BIFS source intended. This problem is most apparent when, for example, a designer well versed in the MPEG-4 standard designs a BIFS source with a certain intention and then provides the encoded BIFS or the BIFS source together with a simple MPEG-4 data generation tool that includes an encoder for generating BIFS. In such a case, users are expected to use the provided BIFS, BIFS source, or tool to embed the still images and moving images they want, but they must prepare in advance objects of a size that matches the provider's intention.
[0028]
To deal with this problem, approaches such as embedding the expected object sizes in the BIFS source as comments, or building into the provided tool a function for converting objects to the expected size, are used. These approaches are inconvenient, however: they work only through functions specific to a tool, or they require users to read the BIFS source and decide for themselves. Moreover, a tool-specific implementation requires proprietary extensions to BIFS (or the BIFS source) or separate information distinct from BIFS, which also causes compatibility problems.
[0029]
[Non-Patent Document 1]
ISO/IEC 14496-1
[0030]
[Problems to be solved by the invention]
The present invention solves the above-mentioned problems individually or collectively, and aims to generate an object that meets the intention of scene description.
[0031]
[Means for Solving the Problems]
The present invention has the following configuration as one means for achieving the above object.
[0032]
In generating an object related to object-based encoding, the present invention is characterized by extracting, from a scene description for encoding a scene, a node name defined by a reserved word that defines an arbitrary node name; extracting, from the extracted node name, information on the size of the object and information indicating the location of the object's entity; and changing the size of the object to be referenced on the basis of the extracted location information and size information.
[0033]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, image processing according to an embodiment of the present invention will be described in detail with reference to the drawings.
[0034]
In the first embodiment, the character string of a node name related to an external reference object, defined by the reserved word that defines an arbitrary node name, is created as a character string to which the expected height and width (size) of the object are appended. Because the expected object size information is thus described in the BIFS source in advance, the size of the corresponding external reference object is changed on the basis of that size information, and MPEG-4 data that represents the object at the size intended by the BIFS creator is generated.
[0035]
The MPEG-4 data generation apparatus described in the first embodiment is an object data generation apparatus for ISO/IEC 14496 (MPEG-4), which uses object-based encoding. Although its configuration may differ in detail, the apparatus described here can be realized either in hardware or in software, for example by supplying software to a computer apparatus, and its components can be recombined, integrated, and rearranged as long as the essence of the present invention is preserved.
[0036]
For simplicity, the following description assumes the BIFS source shown in FIGS. 1 to 3 and the expected display effect shown in FIG. 4.
[0037]
[First Embodiment]
As described above, if the BIFS source is written in, for example, a VRML-based representation, the part that pastes in a specific object is written as shown in FIG. 5, for example. In the BIFS source example of FIGS. 1 to 3, the node described by Shape was written as a child node of the Transform2D node that positions it; FIG. 5 shows only the part needed for this explanation.
[0038]
FIG. 6 shows an example of a BIFS source. “DEF” is a reserved word that defines an arbitrary node name in VRML. The BIFS source interpretation module retains this in preparation for referencing names defined by DEF.
[0039]
FIG. 7 is a diagram for explaining a method for creating a node name in the BIFS source description method.
[0040]
The name of the node in this example is created from a keyword character string and information on the display size of the object intended by the BIFS creator. The character string “PLACEHOLDER” is used as the keyword. The display size intended by the BIFS creator is described immediately after the keyword “PLACEHOLDER” as two character strings giving the horizontal and vertical sizes, each preceded by the delimiter “_” (underscore). The “_320_240” shown in FIG. 7 represents vertical 240 × horizontal 320 pixels. Thus, the node name is composed of a specific character string, the expected vertical and horizontal sizes, and a delimiter character.
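The naming rule described above can be illustrated with a minimal sketch. The function name `make_node_name` and the constants are hypothetical and do not appear in the specification; the order of horizontal size followed by vertical size is an assumption based on the “_320_240” example being read as vertical 240 × horizontal 320 pixels.

```python
# Minimal sketch of the node-name rule; names are illustrative assumptions.
KEYWORD = "PLACEHOLDER"  # keyword character string used in the text
DELIMITER = "_"          # delimiter character used in the text


def make_node_name(width: int, height: int) -> str:
    """Build a DEF node name: keyword, then horizontal and vertical size."""
    if width <= 0 or height <= 0:
        raise ValueError("sizes must be positive integers")
    return f"{KEYWORD}{DELIMITER}{width}{DELIMITER}{height}"


print(make_node_name(320, 240))
```

As the text notes, the keyword, the delimiter, and the order of the fields could all be changed without affecting the idea.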
[0041]
Of the node name creation methods, the types of keyword character strings and delimiters are not the essence of the invention and can be changed. Similarly, the order of arrangement of the keyword character string and the size information is not the essence of the invention, and the order may be reversed. That is, it is important that the node name defined by DEF includes the keyword character string and the object size expected by the BIFS creator.
[0042]
In the BIFS source grammar, the standard requires that a node name defined by DEF be unique, because the effective scope of the name is global. This can be handled, for example, by the following method.
[0043]
FIG. 8 is a diagram illustrating an example of a method for uniquely determining a node name.
[0044]
In the example shown in FIG. 8, when the node names to be created overlap, a unique number is added as a character string to the keyword of the newly created node name. This prevents duplication of node names and maintains uniqueness. For example, when a node having the same name as the node name “PLACEHOLDER_320_240” to be created already exists, the number “1” is added after the keyword character string, and the character string “PLACEHOLDER1_320_240” is used as the node name. If the node name is duplicated again thereafter, the added number may be incremented to give a node name such as “PLACEHOLDER2_320_240”.
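The numbering scheme above might be sketched as follows. The function `unique_node_name` and the use of a set of already defined names are assumptions made for illustration only.

```python
# Minimal sketch of unique node-name generation; names are hypothetical.
def unique_node_name(width: int, height: int, existing: set) -> str:
    """Return a node name not present in `existing`, numbering the keyword if needed."""
    candidate = f"PLACEHOLDER_{width}_{height}"
    number = 0
    while candidate in existing:
        # Append an incrementing number directly after the keyword string.
        number += 1
        candidate = f"PLACEHOLDER{number}_{width}_{height}"
    return candidate


defined = {"PLACEHOLDER_320_240"}
name = unique_node_name(320, 240, defined)
defined.add(name)
print(name)
```

The caller is expected to add each returned name to the set so that later calls see it.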
[0045]
Hereinafter, a BIFS source in which a node name is defined by this method will be described as an example.
[0046]
[Constitution]
FIG. 9 is a block diagram showing a configuration example of a computer apparatus when an MPEG-4 data generation apparatus is configured by software.
[0047]
The CPU 101 controls the RAM 103, the operation unit 104, the monitor 106, and the like according to the data, control program, operating system (OS), application program for generating MPEG-4 data, and the like stored in the ROM 102 and the hard disk (HD) 109, and performs various types of control and processing related to MPEG-4 data generation.
[0048]
The RAM 103 is a work area for the CPU 101 to execute various programs and a temporary storage area for data input from the operation unit 104.
[0049]
The operation unit 104 is used by a user to input setting data for various controls and processes related to MPEG-4 data generation using a mouse or a keyboard.
[0050]
The monitor 106 is a CRT, LCD, or the like, and displays an image processing result, a user interface screen at the time of operation by the operation unit 104, and the like.
[0051]
A network interface card (NIC) 108 is a communication interface for exchanging various data including BIFS data, MPEG-4 data, and objects with other computer apparatuses via a network.
[0052]
Although the interfaces are not shown in FIG. 9, the operation unit 104, the monitor 106, the NIC 108, and the HD 109 are each connected to the system bus 110 of the host computer via a predetermined interface.
[0053]
FIG. 10 is a block diagram illustrating a functional configuration example of the MPEG-4 data generation apparatus.
[0054]
In FIG. 10, the BIFS source editor module 701 creates a BIFS source 703 in response to a user operation 702. The BIFS source editor module 701 is included for the sake of explanation, but it is not essential to the first embodiment.
[0055]
The BIFS source interpretation module 704 reads a BIFS source 703 created in advance and interprets its contents. Such a BIFS source interpretation module is generally used in various forms including a reference code published by ISO as an MPEG-4 standard.
[0056]
The node name extraction module 705 extracts the node name and node defined by the DEF related to the external reference object from the BIFS source 703. The size information extraction module 707 extracts the size information 708 from the character string of the node name. The URL information extraction module 710 extracts URL information 711 indicating the location of the entity of the external reference object 709 designated by the URL related to the extracted node.
[0057]
The object transcoder module 712 generates reference object data 714 obtained by converting an input moving image or still image object 709 into a specified size. Such transcoders are also commonly used in various forms.
[0058]
The BIFS encoder module 706 generates the BIFS data 713 by using the BIFS source or the interpretation result of the BIFS source as an input. Similar to the BIFS source interpretation module 704, it is generally used in various forms including reference codes published by ISO.
[0059]
The MPEG-4 data generation module 715 finally generates valid MPEG-4 data 716.
[0060]
Note that each of the above modules is a commonly used module and is not itself the essence of the invention. The essence of the invention resides in the combination of these modules and their functions.
[0061]
[Operation]
FIG. 11 is a flowchart for explaining the operation of the MPEG-4 data generation apparatus.
[0062]
First, the CPU 101 reads the BIFS source 703 and interprets it using the BIFS source interpretation module 704 (S81). In general, the BIFS source interpretation module 704 internally expands the so-called scene graph described in the BIFS source into memory (such as the RAM 103). Alternatively, events may be generated internally according to the contents described in the BIFS source, and processing functions may be called in an event-driven manner.
[0063]
Next, the node name extraction module 705 extracts each node name that is defined by DEF and includes the keyword character string, together with its node (S82). FIG. 12 is a diagram illustrating an example of node name extraction processing. In this example, the node name “PLACEHOLDER_320_240”, which includes the predefined keyword “PLACEHOLDER”, and its node are extracted.
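Step S82 can be illustrated with a rough sketch that scans the BIFS source text directly. The sample source below is a hypothetical VRML-like fragment and is not taken from the figures. The regular expression and the function `extract_placeholder_nodes` are also assumptions; an actual interpretation module would more likely work on the expanded scene graph.

```python
import re

# Hypothetical BIFS source text in a VRML-like form (not from the figures).
BIFS_SOURCE = '''
Transform2D {
  children [
    DEF PLACEHOLDER_320_240 Shape {
      appearance Appearance {
        texture ImageTexture { url "picture1.img" }
      }
      geometry Bitmap {}
    }
    DEF TITLE Shape { geometry Bitmap {} }
  ]
}
'''

# "DEF", then a name starting with the keyword, then the node type.
DEF_PATTERN = re.compile(r"\bDEF\s+(PLACEHOLDER\w*)\s+(\w+)")


def extract_placeholder_nodes(source: str) -> list:
    """Return (node name, node type) pairs for DEF names containing the keyword."""
    return DEF_PATTERN.findall(source)


print(extract_placeholder_nodes(BIFS_SOURCE))
```

DEF names that do not start with the keyword, such as `TITLE` in the sample, are ignored.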
[0064]
Next, the size information extraction module 707 decomposes the character string of the extracted node name to extract the object size information 708 (S83). FIG. 13 is a diagram showing an example of the size information 708 extraction process. In this example, the size information “320” and “240” is acquired from the node name “PLACEHOLDER_320_240” using the delimiter “_” as a marker.
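The decomposition in step S83 might look like the following sketch. The function `parse_size` is a hypothetical name, and treating the first number as the horizontal size and the second as the vertical size is an assumption based on the 320 × 240 example.

```python
# Minimal sketch of size extraction from a node name; names are hypothetical.
def parse_size(node_name: str) -> tuple:
    """Split the node name on '_' and return (horizontal, vertical) sizes."""
    parts = node_name.split("_")
    if len(parts) != 3 or not parts[0].startswith("PLACEHOLDER"):
        raise ValueError(f"unexpected node name: {node_name!r}")
    # int() raises ValueError if a field is not a decimal number.
    return int(parts[1]), int(parts[2])


print(parse_size("PLACEHOLDER_320_240"))
```

Because only the prefix of the first field is checked, numbered names such as `PLACEHOLDER1_320_240` are parsed the same way.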
[0065]
Next, the URL information extraction module 710 searches the nodes referenced by the extracted node and the nodes below it, and extracts URL information 711 indicating the location of the external reference object 709 itself (S84). FIG. 14 is a diagram showing an example of the URL information 711 extraction process. In this example, the search proceeds from the extracted node “PLACEHOLDER_320_240” through its “Shape” node down to the child node “ImageTexture”, and “picture1.img” is extracted as the URL information 711.
[0066]
FIG. 15 is a diagram illustrating an example of a BIFS source in which a node defined by DEF refers to a node describing position information of an external reference object. Also in this case, similarly to the example of FIG. 14, the URL information can be searched, and the URL information extraction module 710 can be used as it is.
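The search in step S84 can be sketched as a depth-first traversal. The nested-dictionary scene graph below is a hypothetical in-memory representation chosen for illustration; the specification does not define the data structure, and `find_url` is an assumed name.

```python
# Minimal sketch of the URL search; the dict-based scene graph is an assumption.
def find_url(node):
    """Depth-first search for the first 'url' field at or below `node`."""
    if isinstance(node, dict):
        if "url" in node:
            return node["url"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # plain field value such as a string or number
    for child in children:
        found = find_url(child)
        if found is not None:
            return found
    return None


shape = {
    "type": "Shape",
    "appearance": {
        "type": "Appearance",
        "texture": {"type": "ImageTexture", "url": "picture1.img"},
    },
    "geometry": {"type": "Bitmap"},
}
print(find_url(shape))
```

The same traversal applies whether the URL sits directly under the DEF node or in a node it refers to, provided the reference has been resolved into the structure being searched.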
[0067]
The size information 708 and the URL information 711 extracted in steps S83 and S84 are passed to the object transcoder module 712, and the moving image or still image reference object 709 is converted to the specified size (S85). That is, the object transcoder module 712 reads the object that the BIFS source user actually wants to use based on the URL information 711, converts the reference object 709 based on the size information 708 if necessary, and outputs the reference object data 714 having the specified size. Of course, if the size of the reference object 709 input to the object transcoder module 712 is the same as the size indicated by the size information 708, no conversion is needed.
[0068]
When the object conversion process (S85) is performed, the size of the reference object 709 is converted, and an object having a size that matches the intention of the creator of the BIFS source is generated. For example, if “picture1.img” is a JPEG image, the size intended by the creator of the BIFS source is vertical 240 × horizontal 320 pixels, and the reference object used by the BIFS source user is an image of vertical 480 × horizontal 640 pixels, the reference object 709 is reduced to one half vertically and horizontally and converted into an image of vertical 240 × horizontal 320 pixels. If this size conversion is not executed, an object twice the intended height and width, contrary to the intention of the BIFS source creator, may be displayed.
[0069]
Note that the processing from step S82 to step S85 is repeated until the determination in step S86 finds that the end of the BIFS source 703 has been reached. The object conversion process (S85) need not immediately follow the URL information extraction process (S84); it may instead be performed immediately before the BIFS data generation process (S87).
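The decision made before conversion can be expressed as a small calculation. This sketch only computes the scaling factors; the actual resampling is left to whatever transcoder is used, and the function name `scale_factors` is a hypothetical one.

```python
# Minimal sketch of the conversion decision; names are hypothetical.
def scale_factors(actual: tuple, expected: tuple):
    """Sizes are (horizontal, vertical). Return None when no conversion is needed."""
    if actual == expected:
        return None
    return expected[0] / actual[0], expected[1] / actual[1]


# The example in the text: a 640 x 480 image reduced to the intended 320 x 240.
print(scale_factors((640, 480), (320, 240)))
print(scale_factors((320, 240), (320, 240)))
```

A result of `None` corresponds to the case in which the input object already has the intended size and conversion is skipped.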
[0070]
When the BIFS source 703 ends, the BIFS encoder module 706 encodes and outputs BIFS data 713 from the BIFS source 703 in accordance with the interpretation of the BIFS source interpretation module 704 (S87).
[0071]
Finally, the MPEG-4 data generation module 715 generates valid MPEG-4 data 716 that meets the intention of the BIFS source creator from the reference object data 714, converted from the reference object 709, and the BIFS data 713 (S88).
[0072]
[Second Embodiment]
The image processing according to the second embodiment of the present invention will be described below. Note that in the second embodiment, the same reference numerals as those in the first embodiment denote the same components, and a detailed description thereof will be omitted.
[0073]
In the second embodiment, a node name is defined by a reserved word that defines an arbitrary node name using a character string obtained by encoding expected object size information, and size information is described in the BIFS source in advance. Similar to the first embodiment, MPEG-4 data representing an object in a size intended by the creator of the BIFS source is generated.
[0074]
The second embodiment is similar to the first embodiment in that the expected vertical and horizontal sizes of the object are obtained from the node name defined by the reserved word that defines an arbitrary node name, and the size of the external reference object is changed accordingly. However, by encoding the size information to be described, secondary effects are also expected, such as concealing the size information from the user and allowing the size to be expressed flexibly in forms other than vertical and horizontal sizes.
[0075]
In the second embodiment, a character string that encodes the expected size information of the object is used to define a node name with “DEF”, so that the size information is described in the BIFS source in advance and MPEG-4 data representing the object in the size intended by the BIFS source creator is generated.
[0076]
The second embodiment differs from the first embodiment in that the object size information is encoded; consequently, the node name creation method and the size information extraction module 707 are different. The rest of the configuration is the same as that of the first embodiment, and the MPEG-4 data generation apparatus can use the configurations of FIGS. 9 and 10 described in the first embodiment as they are.
[0077]
The node name creation method and the size information extraction module 707 processing different from the first embodiment will be described in detail below.
[0078]
FIG. 16 is a diagram illustrating an example of a BIFS source, and FIGS. 17 to 19 are diagrams illustrating examples of creating node names.
[0079]
The node name is created using a keyword, object size information, and a delimiter, but the object size information is encoded differently from the first embodiment.
[0080]
FIG. 17 is a flowchart illustrating an example of a method for encoding object size information. In this example, the vertical and horizontal numerical values representing the object size are each converted into a four-digit hexadecimal number (S91), and the encoding is completed by creating an eight-digit hexadecimal number (the corresponding character string) in which the horizontal size occupies the upper four digits and the vertical size the lower four digits (S92).
[0081]
FIG. 18 is a diagram specifically illustrating this encoding. The numerical values “240” and “320” representing vertical 240 × horizontal 320 pixels are converted into the four-digit hexadecimal numbers “00F0” and “0140”, respectively, and concatenated to create the character string “014000F0”.
[0082]
FIG. 19 is a diagram illustrating a process for creating a node name using the encoded object size information. In this example, the node name “PLACEHOLDER_014000F0” is created using the size information of vertical 240 × horizontal 320 pixels, the keyword “PLACEHOLDER”, and the delimiter “_”. It goes without saying that the same method as in the first embodiment can be used to make the created node name unique.
[0083]
FIGS. 20 and 21 are diagrams for explaining the operation of the size information extraction module 707.
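Steps S91 and S92 and the node name creation can be sketched together as follows. The function names `encode_size` and `make_encoded_node_name` are hypothetical, and the range check to 16 bits is an added assumption that follows from the four-digit hexadecimal limit.

```python
# Minimal sketch of the hexadecimal size encoding; names are hypothetical.
def encode_size(width: int, height: int) -> str:
    """Encode sizes as 8 hex digits: horizontal in the upper 4, vertical in the lower 4."""
    for value in (width, height):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("each size must fit in four hexadecimal digits")
    return f"{width:04X}{height:04X}"


def make_encoded_node_name(width: int, height: int) -> str:
    """Join the keyword and the encoded size with the delimiter."""
    return "PLACEHOLDER_" + encode_size(width, height)


print(make_encoded_node_name(320, 240))
```

For horizontal 320 and vertical 240 this reproduces the character string given in the text, since 320 is 0140 and 240 is 00F0 in hexadecimal.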
[0084]
The size information extraction module 707 decomposes the character string of the node name and extracts the size information of the object. The difference from the first embodiment is that a decoding process is necessary when extracting size information. The size information extraction module 707 extracts a character string indicating the size information from the character string of the node name, and then decodes the character string to extract the size information.
[0085]
FIG. 20 is a diagram showing a specific example of extracting size information from a node name. In this example, the size information extraction module 707 extracts the character string “014000F0” from the character string of the node name and decodes it to obtain size information of vertical 240 × horizontal 320 pixels. That is, the upper four digits of the extracted character string are converted into a decimal number and interpreted as the horizontal size (S71), and the lower four digits are converted into a decimal number and interpreted as the vertical size (S72).
[0086]
Although conversion of decimal numbers into hexadecimal numbers has been described as the encoding method for object size information, the present invention is not limited to this. Even if a stronger encryption routine is used, the essence is not affected. Such encoding hides the expected object size from the user of the BIFS source, and this concealment can also be expected to help protect the BIFS source as the creator's copyrighted work. For example, despite the simple processing, the object size is not easy to read from the character string “014000F0” described above.
[0087]
In addition, a secondary effect can be expected with respect to the uniqueness of names described in the first embodiment. Put simply, uniqueness can be retained without changing the length of the character string by adding to the encoded value a sequence number assigned in the order in which the defined names appear. In this case, the sequence number is subtracted when the size information is extracted from the BIFS source.
[0088]
Furthermore, although the encoding and decoding processes above implicitly assume that the vertical and horizontal sizes are used as the expected “size”, a combination of the vertical size and the aspect ratio may be used instead. Alternatively, it may be an aspect ratio with respect to a predetermined absolute size. Examples using these designation methods are straightforward, and a detailed description is omitted.
[0089]
As described above, even when the size information of the object expected by the creator of the BIFS source is encoded in some form, it can be described in the BIFS source, and, as in the first embodiment, MPEG-4 data representing the object in the size intended by the creator of the BIFS source can be generated.
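The decoding in steps S71 and S72 might be sketched as below. The function `decode_size` is a hypothetical name, and the sketch assumes a node name with exactly one delimiter followed by eight hexadecimal digits.

```python
# Minimal sketch of decoding the size from an encoded node name; names are hypothetical.
def decode_size(node_name: str) -> tuple:
    """Return (horizontal, vertical) from a name such as PLACEHOLDER_014000F0."""
    _keyword, _delimiter, code = node_name.partition("_")
    if len(code) != 8:
        raise ValueError(f"expected eight hexadecimal digits, got {code!r}")
    width = int(code[:4], 16)   # S71: upper four digits give the horizontal size
    height = int(code[4:], 16)  # S72: lower four digits give the vertical size
    return width, height


print(decode_size("PLACEHOLDER_014000F0"))
```

If a stronger encoding or encryption routine were used instead, only this function and its encoding counterpart would need to change.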
[0090]
[Modification]
The above embodiments do not describe how the URL of the object 709 referred to by the BIFS source 703 is associated with the size-converted reference object data 714 that is output from the object transcoder module 712 and replaces it. Such an association can be realized in various ways under the control of the application program. For example, it is easily realized by holding, in some form, a list that maps the URL of each reference object 709 to the reference object data 714 that replaces it. Alternatively, after the URL information 711 is extracted, the URL of the reference object 709 in the data obtained by interpreting the BIFS source 703 may be changed to the URL of the converted reference object data 714. It can also be realized by rewriting the URL of the reference object 709 in the BIFS source 703 itself. Furthermore, the URL does not necessarily indicate a file storing the object; it may be an ID that refers to the bit stream of the object. In such cases, leaving the URL unrewritten may cause no inconsistency.
[0091]
It is also clear that the MPEG-4 data generation apparatus according to the present invention operates even if the various modules described above are not implemented as independent modules. The simplest example is the BIFS source interpretation module 704 and the BIFS encoder module 706: these two modules can be implemented as a single module as long as the function of extracting the size information 708 and the URL information 711 is realized. The essence of the present invention is that a series of processes converts the reference object 709 referred to by the BIFS source 703 into the size intended by the BIFS creator and generates the reference object data 714; the independence of the modules is not the essence.
[0092]
In this way, the expected width and height of an object are described in the BIFS source in advance, in a manner consistent with the definition of ISO/IEC 14496-1, and are interpreted at generation time. Even for a BIFS source whose referenced objects are still unresolved, once an object is determined, the width and height expected for it can be used to produce a representation that fits the intention of the BIFS source creator.
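One of the associations mentioned above, a correspondence list applied to the interpreted scene data, could be sketched as follows. The dictionary-based scene graph, the function `rewrite_urls`, and the converted file name `picture1_320x240.img` are all hypothetical and serve only to illustrate the idea.

```python
# Minimal sketch of URL replacement via a correspondence list; all names are assumptions.
def rewrite_urls(node, replacements: dict):
    """Return a copy of the scene data with each 'url' replaced per `replacements`."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                # Keep the original URL when no converted object is registered.
                result[key] = replacements.get(value, value)
            else:
                result[key] = rewrite_urls(value, replacements)
        return result
    if isinstance(node, list):
        return [rewrite_urls(item, replacements) for item in node]
    return node


scene = {"type": "Shape", "texture": {"type": "ImageTexture", "url": "picture1.img"}}
table = {"picture1.img": "picture1_320x240.img"}  # original URL -> converted data
print(rewrite_urls(scene, table))
```

The function returns a new structure rather than modifying its input, so the original interpreted data remains available if the URL should not be rewritten.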
[0093]
Note that the MPEG-4 data generation apparatus and its module configuration are not necessarily limited to the field of application programs operating on a personal computer. The MPEG-4 data generation apparatus may be applicable to hardware devices such as a digital camera, a camcorder, a mobile phone, and a digital recorder.
[0094]
Furthermore, the present invention may be applied to a system composed of a plurality of devices (for example, a host computer, interface device, scanner, printer, etc.) or to an apparatus composed of a single device (for example, a mobile phone). Even if the modules constituting the MPEG-4 data generation apparatus are distributed among a plurality of devices, those devices together constitute the apparatus as long as they can communicate interactively through an appropriate communication function or data sharing. Some of the plurality of devices may also be located remotely and connected via the Internet or other communication functions.
[0095]
Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium (or recording medium) on which the program code of software realizing the functions of the above-described embodiments is recorded, and by having the computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. It also goes without saying that the invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiments are realized by that processing.
[0096]
Furthermore, it goes without saying that the invention also covers the case where, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or a function expansion unit connected to the computer, a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0097]
When the present invention is applied to the storage medium, the storage medium stores program codes corresponding to the flowcharts described above.
[0098]
[Effect of the Invention]
As described above, according to the present invention, it is possible to generate an object that meets the intention of the scene description.
[Brief description of the drawings]
FIG. 1 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 2 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 3 is a diagram showing a description example of a BIFS source based on VRML;
FIG. 4 is a diagram schematically showing a display effect expected as a result when the BIFS source shown in FIGS. 1 to 3 is encoded;
FIG. 5 is a diagram showing an example of a BIFS source in the first embodiment;
FIG. 6 is a diagram showing an example of a BIFS source;
FIG. 7 is a diagram for explaining a method for creating a node name in the BIFS source description method;
FIG. 8 is a diagram showing an example of a method for uniquely determining a node name;
FIG. 9 is a block diagram showing a configuration example of a computer device when an MPEG-4 data generation device is configured by software;
FIG. 10 is a block diagram illustrating a functional configuration example of an MPEG-4 data generation device;
FIG. 11 is a flowchart for explaining the operation of the MPEG-4 data generation apparatus;
FIG. 12 is a diagram showing an example of node name extraction processing;
FIG. 13 is a diagram showing an example of size information extraction processing;
FIG. 14 is a diagram showing an example of URL information extraction processing;
FIG. 15 is a diagram illustrating an example of a BIFS source in which a node defined by DEF refers to a node describing position information of an external reference object;
FIG. 16 is a diagram showing an example of a BIFS source;
FIG. 17 is a diagram for explaining an example of creating a node name;
FIG. 18 is a diagram for explaining an example of creating a node name;
FIG. 19 is a diagram for explaining an example of creating a node name;
FIG. 20 is a diagram for explaining the operation of the size information extraction module;
FIG. 21 is a diagram for explaining the operation of the size information extraction module;

Claims

An object generation method for generating an object related to object-based encoding,
From the scene description for encoding the scene, extract the node name defined by the reserved word that defines an arbitrary node name,
Extract information about the size of the object from the extracted node name and information indicating the position of the object's entity,
An object generation method characterized by changing the size of an object to be referred to based on the extracted position information and size information.

The object generation method according to claim 1, wherein the node name includes a character string that gives a size of the object.

The object generation method according to claim 1, wherein the node name includes a keyword character string, a character string that gives an object size, and a delimiter that separates the character strings.

The object generation method according to claim 1, wherein a character string for guaranteeing that the node name is unique is added to the node name.

The object generation method according to claim 1, wherein a part or all of the node name is encoded.

6. The object generating method according to claim 1, wherein the object-based encoding complies with or conforms to MPEG-4.

A program for controlling an information processing apparatus to execute generation of an object according to any one of claims 1 to 6.

A recording medium in which the program according to claim 7 is recorded.

An object generation device for generating an object related to object-based encoding,
First extraction means for extracting a node name defined by a reserved word that defines an arbitrary node name from a scene description for encoding a scene;
Second extraction means for extracting information on the size of the object from the extracted node name, and information indicating the position of the entity of the object;
An object generating apparatus comprising: changing means for changing the size of an object to be referred to based on the extracted position information and size information.