JP2005004487A

JP2005004487A - Apparatus and method for processing surround image photographed on capture path

Info

Publication number: JP2005004487A
Application number: JP2003167396A
Authority: JP
Inventors: Frank Nielsen; フランクニールセン
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-06-12
Filing date: 2003-06-12
Publication date: 2005-01-06

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus and method capable of processing full-view image data and of effectively sampling the surround image data obtained by processing the full-view image data.

SOLUTION: An image processing apparatus includes a position detection unit for detecting the position at which full-view image data are captured; a spatial factor detection unit for calculating a spatial factor corresponding to the detected position, the spatial factor relating to the geometric structure of the surround environment of which the full-view image is captured from the detected position; and a filter unit for reducing the processed image data to be output from the image processing apparatus according to the calculated spatial factor.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はサラウンド画像を処理する装置及び方法に係り、より具体的には、パスに沿って移動する全方位マルチヘッドカメラで撮影された注釈付ビデオ（ａｎｎｏｔａｔｅｄｖｉｄｅｏ）を撮像、空間フィルタ処理及びビューイングするための装置及び方法に関する。
【０００２】
【従来の技術】
コンピュータグラフィックにおける迫真性の追求は果てしのない目標である。オブジェクト空間レンダリング・アルゴリズムが“現実世界”感覚には欠けるが驚くほど鮮明な画像を提供する一方、他方では逆レンダリング手段を使った画像ソースのモデリング及び解析による画像ベースレンダリング（Ｉｍａｇｅ−ＢａｓｅｄＲｅｎｄｅｒｉｎｇ：ＩＢＲ）が進み、拡張性や相互作用性には欠けるものの、画像空間において衆目を引き付けるような三次元環境を提供している。
【０００３】
Ｓ．Ｅ．Ｃｈｅｎによる写真品質の背景描写の紹介（“ＱｕｉｃｋｔｉｍｅＶＲ ‐ ＡｎＩｍａｇｅ−ＢａｓｅｄＡｐｐｒｏａｃｈｔｏＶｉｒｔｕａｌＥｎｖｉｒｏｎｍｅｎｔＮａｖｉｇａｔｉｏｎ” ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．２９−３８，１９９５）、光線空間（Ｍ．Ｌｅｖｏｙ，Ｐ．Ｈａｎｒａｈａｎ， “Ｌｉｇｈｔｆｉｅｌｄｒｅｎｄｅｒｉｎｇ”，ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．３１−４２，１９９６；Ｓ．Ｊ．Ｇｏｒｔｌｅｒ，Ｒ．Ｇｒｚｅｓｚｃｚｕｋ，Ｒ．Ｓｚｅｌｉｓｋｉ，Ｍ．Ｆ．Ｃｏｈｅｎ， “Ｔｈｅｌｕｍｉｇｒａｐｈ”，ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．４３−５４，１９９６）、圧縮された光線空間平面（Ｗ．Ｃ．Ｃｈｅｎ，Ｊ．Ｙ．Ｂｏｕｇｕｅｔ，Ｍ．Ｈ．Ｃｈｕ，Ｒ．Ｇｒｚｅｓｚｃｚｕｋ， “ＬｉｇｈｔＦｉｅｌｄＭａｐｐｉｎｇ：ＥｆｆｉｃｉｅｎｔＲｅｐｒｅｓｅｎｔａｔｉｏｎａｎｄＨａｒｄｗａｒｅＲｅｎｄｅｒｉｎｇｏｆＳｕｒｆａｃｅＬｉｇｈｔＦｉｅｌｄｓ”，ＡＣＭＴｒａｎｓａｃｔｉｏｎｓｏｎＧｒａｐｈｉｃｓ．２１（３），ｐｐ．４４７−４５６，２００２）、及び、静的環境における写真品質の仮想通り抜け（Ｄ．Ｇ．Ａｌｉａｇａ，Ｔ．Ｆｕｎｋｈｏｕｓｅｒ，Ｄ．Ｙａｎｏｖｓｋｙ，Ｉ．Ｃａｒｌｂｏｍ， “ＳｅａＯｆＩｍａｇｅｓ”，ＩＥＥＥＶｉｓｕａｌｉｚａｔｉｏｎ，ｐｐ．３３１−３３８，２００２）により、ＩＢＲは一般的になっている。未だその揺籃期にはあるものの、多くの場合そのアルゴリズムは、最初パノラマフレームの位置参照（ｇｅｏ−ｒｅｆｅｒｅｎｃｅ）を行い、続いて特徴一致処理（ｆｅａｔｕｒｅｍａｔｃｈｉｎｇ）に基づきソース画像を歪ませて組み合わせ、新しい視点を合成するように進む。
【０００４】
本出願と同一出願人により出願された日本特許公開２００３−１４１５６２号公報には、三次元座標にマッピングされた円筒あるいは球状全方位画像の圧縮、蓄積及び再生に適用できる画像処理装置及び方法が開示されている。
【０００５】
【発明が解決しようとする課題】
仮想通り抜け（ＶｉｒｔｕａｌＷａｌｋｔｈｒｏｕｇｈ）システムの用途によっては、仮想サラウンド環境（Ｖｉｒｔｕａｌｓｕｒｒｏｕｎｄｅｎｖｉｒｏｎｍｅｎｔ）をレンダリングするために、ビルのような屋内環境の全領域を撮影することが必要になる。例えば、ビルの屋内環境を撮像するために全方位ビデオカメラを使用した場合、全方位ビデオカメラが毎秒６０フレームを撮影し、ビル内を移動するのに６０分かかるとすれば、画像フレームの総枚数は２１６０００枚に達する。
【０００６】
また、撮像されたサラウンド画像の中には、その他のものに比べてあまり新しい情報をもたらさないものが存在する。例えば、壁で囲まれた廊下の途中を移動している場合、そのサラウンド環境には視認可能な大きな変化はあまり存在しない。このような場所での仮想通り抜けシステムの廊下のレンダリングには、廊下の回り角あるいは部屋の入口等、より多くの環境変化が見られるビルのその他の場所と比べて、より少ない全方位画像データが必要とされる。
【０００７】
本発明は上記状況を考慮して構想されたものである。全方位画像データを処理でき、かつ、全方位画像データを処理して得られるサラウンド画像のデータサイズを効率的に削減できる装置及び方法を提供することが望ましい。
【０００８】
さらに、全方位画像データをサンプリング／フィルタ処理することができる、あるいは、アプリケーションで使用されるサラウンド画像データを適切に圧縮することができる装置及び方法を提供することが望ましい。
【０００９】
【課題を解決するための手段】
本発明の一実施形態によれば、全方位画像データ処理用の画像処理装置が提供される。本画像処理装置は、全方位画像が撮像された位置を検出する位置検出部と；サラウンド環境の幾何学的構成に関するものであり、それについて該検出位置から全方位画像が撮像される、該検出された位置に対応する空間係数を算出する空間係数検出部と；該算出された空間係数に基づき本画像処理装置から出力されるべき処理済み画像データの縮減を行わせるフィルタ部とを備える。
【００１０】
空間係数は、全方位画像が撮像されるパスの曲線長（ｃｕｒｖｅｌｅｎｇｔｈ）であっても良く、あるいは該パス内で算出された複数のビジビリティ・エンベロープ（ｖｉｓｉｂｉｌｉｔｙｅｎｖｅｌｏｐｅｓ）又はセル（ｃｅｌｌｓ）に基づき決定されてもよい。あるいは、空間係数が、上記パス内で算出されるビジビリティ多面体の集合（ａｓｅｔｏｆｐｏｌｙｈｅｄｒａ）に対応するサラウンド画像のテクスチャのパラメータ表現（ｐａｒａｍｅｔｅｒｉｚａｔｉｏｎ）に応じて決定されてもよい。
【００１１】
フィルタ部は、空間係数に応じて、全方位画像のサンプリングレートの決定あるいは全方位画像の選択をしてもよい。あるいは、フィルタ部が、空間係数に応じて、処理済み画像データを圧縮してもよい。
【００１２】
さらに、上記画像処理装置における処理では、処理済画像に重ね合わせるイベント・トリガー領域の追加を含んでいてもよい。イベント・トリガー領域は、全方位画像データの処理により得られたサラウンド画像の予め定めた位置に配置され、該領域がユーザ操作により選択されると予め定めたイベントがトリガーされるよう構成される。さらに本処理では、全方位画像データの処理により得られたサラウンド画像上への、動画テクスチャ及び／又は三次元オブジェクトのマッピングが含まれていてもよい。
【００１３】
上記画像処理装置には全方位画像撮像部が設けられていても良い。
【００１４】
本発明の他の実施形態によれば、全方位画像データ処理用の画像処理方法あるいはコンピュータプログラムが提供される。本画像処理法あるいはコンピュータプログラムは、全方位画像が撮像された位置を検出し；サラウンド環境の幾何学的構成に関連し、それについて該検出された位置から全方位画像が撮像される、該検出した位置に対応する空間係数を算出し；該算出した空間係数に基づき当該画像処理装置から出力されるべき処理済み画像データの縮減を行わせるステップを含む。
【００１５】
【発明の実施の形態】
本発明の一実施形態による装置を図面を参照して説明する。
【００１６】
図１は全方位カメラ（全方向カメラ）１０の一例を示す。全方位カメラ１０は１２個の五角形面からなる略正十二面体形状のフレームと、異なる面上に別々に積載された１１個のカメラとを含む。各カメラはサラウンド場面内での対応領域を撮影し、サラウンド画像の一部としてデータ出力する。これらの画像部分を張り合わせることにより、全天球型の全方位画像が得られる。
【００１７】
図２は全方位カメラ１０と全方位画像処理装置１００とを含むシステムの模式図を示す。図２に示すように、全方位カメラ１０は、複数のカメラ１１に加えて、複数のＶＴＲ１２とマイク１３とを備え、複数のビデオストリームと音声信号を記録する。記録されたビデオストリームは切換器１４によりビデオキャプチャーされ、コンピュータデータ（例えば、ビットマップファイル）として画像処理装置１００へ出力される。画像処理装置１００は、次の画像処理のための撮像画像の準備、画像貼り付け、処理済み画像の蓄積等の画像処理を行う。
【００１８】
さらに本発明において上記装置１００は、撮像した画像の重要度に応じて、撮像した画像のフィルタ処理又は処理済み画像の圧縮を行う。本実施形態においては、上記重要度は全方位カメラ１０のパスに沿って算出された空間係数により測られる。本実施形態によるフィルタ処理／圧縮方法の詳細は後述する。
【００１９】
画像処理装置１００は図３に示す構成を備えるコンピュータシステムにより実現できる。本装置１００は、ＣＰＵ１０１と、メモリ１０２と、表示コントローラ１０３と、入力装置インタフェース１０４と、ネットワークインタフェース１０５と、外部装置インタフェース１０７と、バス１０８と、ビデオキャプチャカード１０９と、表示装置１１１と、キーボード１１２と、マウス１１３と、ハードデスク駆動装置１１４と、メデアドライブ１１５とを含む。
【００２０】
画像処理装置１００は、画像処理用の多様なアプリケーションのダウンロードや、ビデオキャプチャカード１０９を介して受信する代わりに全方位画像をダウンロードしたり、あるいは処理済み画像データをネットワーク上で配信するために、ネットワークインタフェース１０５を介してＬＡＮあるいはインターネット１２０に接続されていても良い。
【００２１】
上記ＣＰＵ１０１はビデオキャプチャカード１０９を介して全方位カメラ１０から出力された複数画像の貼り付けや、空間係数に従っての画像フィルタ処理／圧縮等、多様なアプリケーションを実行する。
【００２２】
図４は、複数の撮像された画像を処理してサラウンド画像を算出する操作モード中のＣＰＵ１０１の機能模式図の一例を示す。本操作モードにおいて、複数の撮像された画像データからなる全方位画像データが全方位カメラ１０からサラウンド画像計算部４０１へ提供される。
【００２３】
サラウンド画像計算部４０１は全方位画像データからパノラマ画像あるいはサラウンド画像を計算し出力する。コンピュータグラフィック（ＣＧ）スクリプトを使用する場合には、この部４０１は該ＣＧスクリプトから直接サラウンド画像を算出する。
【００２４】
位置検出部４０２はカメラ１０の位置決めのため、パノラマ画像を使ってエゴモーション・リカバリ（ｅｇｏｍｏｔｉｏｎｒｅｃｏｖｅｒｙ）を行う。エゴモーション・リカバリ処理は、一定高さのカメラを使用することでかなり簡略化できる。エゴモーション・リカバリの詳細説明は後述する。
【００２５】
カメラ１０の位置は空間係数検出部４０３へ出力される。空間係数検出部４０３は位置検出部４０２により検出された位置における空間係数を検出する。本発明においては、空間係数はパノラマ画像が撮影されたサラウンド環境の重要性（有意性）レベルを計測するために導入された。空間係数は、例えば、（ａ）曲線長さ単位、（ｂ）ビジビリティ・セル、（ｃ）ビジビリティ・エンベロープ、あるいは（ｄ）テクスチャのパラメータ表現、に基づき算出される。ビジビリティ・セルあるいはエンベロープを使う場合、パスの全ての位置に対する容積及び面積属性が算出される。エンベロープの組合せ変更はビジビリティ図（ｖｉｓｉｂｉｌｉｔｙｇｒａｐｈ）（図８を参照）により定義される臨界的なビジビリティ・ベントの箇所だけで発生する。ビジビリティ・セル及びエンベロープの詳細説明は本明細書の後半部分で行う。
【００２６】
フィルタ処理部４０４は、空間係数検出部４０３において算出された空間係数に基づき、サラウンド画像のフィルタ処理を行う。例えば、最初にサラウンド画像を多めにサンプリングして、次にフィルタ処理部４０４で削減させてもよい。あるいは、フィルタ処理の代わりに、全方位カメラ１０から提供される全方位画像データのサンプリングするタイミングを空間係数に基づき制御するようにしてもよい。更には、各全方位画像を空間係数に応じて重み付けし、該重み付けに応じて他の画像と一緒に圧縮するようにしてもよい。
【００２７】
次に、本発明の他の実施形態による方法を説明する。本方法は、パスに沿って移動する全方位マルチヘッドカメラで撮影した注釈付きビデオを撮像、空間的にフィルタ処理し、ビューイングするステップを含んでいる。
【００２８】
以下の章節において、全方位カメラの軌道パス算出用に設計されたエゴモーション・リカバリ・アルゴリズム、及び幾何学的ビジビリティ・ベントに基づくプレノプティックパス（ｐｌｅｎｏｐｔｉｃｐａｔｈ）の効率的サンプリングについて詳細に説明する。適切なサンプリングによりパノラマ画像をフィルタし圧縮することが可能になり、冗長が避けられ、画像データベース内のメモリ領域の節約が可能になる。また、屋内撮影により得られた、あるいは、コンピュータグラフィックにより完全にレンダリングされたプレノプティックパスの幾つかのアプリケーション及び結果を説明する。
【００２９】
１．プレノプティック関数及びパス
プレノプティックの概念（Ｍ．Ｌｅｖｏｙ，Ｐ．Ｈａｎｒａｈａｎ． “Ｌｉｇｈｔｆｉｅｌｄｒｅｎｄｅｒｉｎｇ”，ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．３１−４２，１９９６；Ｓ．Ｊ．Ｇｏｒｔｌｅｒ，Ｒ．Ｇｒｚｅｓｚｃｚｕｋ，Ｒ．Ｓｚｅｌｉｓｋｉ，Ｍ．Ｆ．Ｃｏｈｅｎ， “ＴｈｅＬｕｍｉｇｒａｐｈ”，ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．４３−５４，１９９６）は、任意の時間ｔ、任意の方向（θ，φ）及び任意の波長λのスペクトル応答において、三次元空間Ｅ^３での各デカルト座標点（Ｘ，Ｙ，Ｚ）と関連する７次関数Ｌ（・）＝Ｌ（Ｘ，Ｙ，Ｚ，θ，φ，ｔ，λ）を把握できれば、三次元の幾何学的モデリングをバイパスすることが可能となり、プレノプティック関数Ｌ（・）から直接“レンダリング”することによりインタラクテイブな通り抜け（ｗａｌｋｔｈｒｏｕｇｈ）を提供することができる、という観察に基づいている。
【００３０】
実用上は、時間を凍結し（即ち、静的環境を考える）、一つの波長（一つのカラーチャンネル、例えば赤色）を選択し、レイ・サンプリング（Ｘ，Ｙ，Ｚ，θ，φ）をカメラパスに限定することで、この関数の高次元性を緩和する、即ち一次元プレノプティックパスＰ＝｛Ｐ_ｉ＝（ｘ，ｙ，ｚ）_ｉ｝に限定することができる。
【００３１】
パス沿いに全方位カメラを移動させることで、強制的にＰのサンプリングが実施でき、これによってＰを点ｐの集合Ｐ＝｛Ｐ_ｉ｝_{ｉ＝１，，ｐ}として離散化する。Ｌ_｜Ｐ（・）の３次元不連続サンプリングから、逆レンダリング問題（例えば、マクロ／メソ（Ｍａｃｒｏ／Ｍｅｓｏ）幾何及びテクスチャ属性、照明条件等の解明）を考慮するか、あるいは、視野合成（ｖｉｅｗｓｙｎｔｈｅｓｉｓ）のための関数の外挿処理を進めるかの何れかが可能になる。大量のサンプリング撮影は時間がかかり面倒であるため、全方位カメラを備えるパノラマヘッドを移動させる動力ロボットを使ってこの仕事をさせてもよい。このようなシステムの事例は次の文献に提示されている（Ｄ．Ｇ．Ａｌｉａｇａ，Ｔ．Ｆｕｎｋｈｏｕｓｅｒ，Ｄ．Ｙａｎｏｖｓｋｙ，Ｉ．Ｃａｒｌｂｏｍ， “ＳｅａＯｆＩｍａｇｅｓ”，ＩＥＥＥＶｉｓｕａｌｉｚａｔｉｏｎ，ｐｐ．３３１−３３８，２００２；Ｄ．Ｇ．Ａｌｉａｇａ，Ｄ．Ｙａｎｏｖｓｋｙ，Ｔ．Ｆｕｎｋｈｏｕｓｅｒ，Ｉ．Ｃａｒｌｂｏｍ， “ＩｎｔｅｒａｃｔｉｖｅＩｍａｇｅ−ＢａｓｅｄｒｅｎｄｅｒｉｎｇＵｓｉｎｇＦｅａｔｕｒｅＧｌｏｂａｌｉｚａｔｉｏｎ”，ＡＣＭＳｙｍｐ．Ｏｎ３ＤＧｒａｐｈｉｃｓ，２００３）。
【００３２】
２．プレノプティックパスの捕捉
２．１マルチヘッドカメラ
全方位カメラとして、図１に示すようなマルチヘッドカメラが全天球ビデオ撮影のために使用できる。パノラマヘッドは、略同一の光学的中心点（即ち、節点（ｎｏｄａｌｐｏｉｎｔ））を共有する１０個のＣＣＤＮＴＳＣブロックカメラからなり、互いに重なり合う視野の全方位画（４πステラジアン角度）を毎秒６０枚のインターレースフレームで撮影する。
【００３３】
本実施形態では全方位カメラが使用されているが、必ずしも４πステラジアン角を網羅する完全な全方位カメラを備える必要はない。最終的なアプリケーション次第では、本発明で使用されるカメラあるいは画像データがカバーする角度範囲は４πステラジアン角以下であってもよい。
【００３４】
カメラ及び記録システムを運搬車に搭載し、バッテリーで駆動することで、自由にサラウンド環境を撮影してもよい。その運搬車にはモータと駆動機構を備えるか、あるいは、人間の手で押してもよい。サラウンド環境マップの縫い合わせ、ビューイング及び符号化のアルゴリズムはＦ．Ｎｉｅｌｓｅｎによる次の文献に詳細に記載されている（“ＨｉｇｈＲｅｓｏｌｕｔｉｏｎＦｕｌｌＳｐｈｅｒｉｃａｌＶｉｄｅｏｓ”，ＩＥＥＥＩｎｔｌ．Ｃｏｎｆ．ｏｎＩｎｆｏｒｍａｔｉｏｎＴｅｃｈｎｏｌｏｇｙ；ＣｏｄｉｎｇａｎｄＣｏｍｐｕｔｉｎｇ，ｐｐ．２６０−２６７，２００２）。ｐ∝１００００のオーダの高解像度パノラマフレームＩ_ｉ、ｉ∈｛１，・・・，ｐ｝が記録される。例えば、２０４８×１０２４画像寸法を正矩形（ｅｑｕｉｒｅｃｔａｎｇｕｌａｒ）パノラマとして使用してもよい。図５（ａ）及び図５（ｂ）は、マルチヘッドカメラで撮影した全方位画像データから生成した正六面体サラウンド環境マップと正矩形サラウンド環境マップとの例を示す。
【００３５】
２．２コンピュータグラフィックス
また、レイ・トレーシングあるいはラジオシティソフトを使って全方位画像及びプレノプティックパスを算出してもよい。例えば、そのソースコードが開示されており、かなりなＣＧスクリプトも入手可能であるため、ＰＯＶ−Ｒａｙ（商標）を使ってサラウンド画像を算出してもよい。即ち、ＣＧ画像の各画素（ｘ、ｙ）に対しその対応する角座標（θ、ψ）をマッピングする全単射写像関数（ｂｉｊｅｃｔｉｖｅｆｕｎｃｔｉｏｎ）を定義して、我々の画像フォーマットを出力するためにファイル“ｒｅｎｄｅｒ．Ｃｐｐ”の工程“ｃｒｅａｔｅ＿ｒａｙ”を変更する。図１８に示すサラウンド画像の例は、サイトＩｎｔｅｒｎｅｔＲａｙＴｒａｃｉｎｇＣｏｍｐｅｔｉｔｉｏｎ（ＩＲＴＣ）から入手したＣＧスクリプトから算出された。
【００３６】
コンピュータグラフィックス産業界で使用されているＡｌｉａｓＷａｖｅｆｒｏｎｔＭａｙａ（商標）あるいはＤｉｓｃｒｅｅｔ３ＤＭＡＸ（商標）のような従来のツールも、それらのレンダリング装置あるいはＡＰＩを使って、サラウンド画像を出力するのに使用できる。
【００３７】
ＣＧ画像は完全な仮想サラウンドカメラ（即ち、装置間の物理的干渉がなく、実際のカメラ装置も必要としない）と正確な像（一定の照明で、視差がなく、ノイズも無く、振動等もない）を生成するという特徴があり、ＣＧ画像を使っての作業は相互性能評価（ベンチマーク）等には有用である。
【００３８】
３．エゴモーション・リカバリ（ｅｇｏｍｏｔｉｏｎｒｅｃｏｖｅｒｙ）
本実施形態によるカメラの外的（ｅｘｔｒｉｎｓｉｃ）位置を再捕捉（ｒｅｃｏｖｅｒｙ）するためのアルゴリズムを説明する。パノラマ画像には内的（ｉｎｓｔｒｉｎｓｉｃ）パラメータが無いため、結局我々は、空間的にインデックス付けされたパノラマ画像のユークリッドパス（倍率係数まで定義される）を得ることになる。
【００３９】
屋外環境のような大規模パノラマパスのビデオに、概略的な注釈を加えるのには、ＧＰＳシステムを使用してもよい。但し、一般的なＧＰＳシステムはあまりにも粗い位置しか与えられないので視野合成には使用できない（Ｍ．Ｈｉｒｏｓｅ， “ＳｐａｃｅＲｅｃｏｒｄｉｎｇＵｓｉｎｇＡｕｇｍｅｎｔｅｄＶｉｒｔｕａｌｉｔｙＴｅｃｈｎｏｌｏｇｙ”，Ｉｎｔｌ．ＭｉｘｅｄＲｅａｌｉｔｙＳｙｍｐ．，ｐｐ１０５−１１０，２００１；Ｄ．Ｋｉｍｂｅｒ，Ｊ．Ｆｏｏｔｅ，Ｓ．Ｌｅｒｔｓｉｔｈｉｃｈａｉ， “ＦｌｙＡｂｏｕｔ；ＳｐａｔｉａｌｌｙＩｎｄｅｘｅｄＰａｎｏｒａｍｉｃＶｉｄｅｏ”，ＡＣＭＭｕｌｔｉｍｅｄｉａ２００１，ｐｐ．３３９−３４１，２００１）。視覚効果を生むために仮想及び現実のカメラパスを一致させなければならない、移動一致を必要とする産業界（ｍａｔｃｈｍｏｖｉｎｇｉｎｄｕｓｔｒｙ）において、ビジョン・アルゴリズムが有用であるのが最近証明された。
【００４０】
特徴トラッキングに基づき、全方位画像の位置タグ（ｘ、ｙ、θ）_ｉ、ｉ∈｛１，・・・，ｐ｝が算出される。以下に、全方位カメラが固定高さの面の上を移動するように限定された場合の簡易かつ高速なグローバルエゴモーションアルゴリズムについて説明する。本エゴモーションアルゴリズムによれば、パス上での位置がサラウンド画像に登録される。
【００４１】
本アルゴリズムは次のステップを含む。
・概略回転シーケンスθ_ｉの算出（画素をベースとした方法）、
・概略並行移動シーケンス（ｘ、ｙ）_ｉの算出（特徴をベースとした方法）、
・初期推定値に基づく密度の高い広域最適化を実施することによる（ｘ、ｙ、θ）_ｉの微調整（全てのパラメータを同時に適切に考慮した特徴をベースとした方法：相対パス）、
・２又はそれ以上のランドマークを用いてのパスの固定化（絶対パス）。
【００４２】
我々はトラッキングするランドマークを指示する必要も無く、パスを事前に初期設定する必要も無いため、捕捉システムが柔軟で、拡張可能に、かつ、使用し易くなる。
【００４３】
３．１粗方位測定
絶対方位はパノラマ画像の“北”を指す。簡単ではあるが、本アルゴリズムは特徴マッチングを行わなくとも十分に機能する。例えば、複数の画像が寸法ｗ×ｈの正矩形フォーマット（緯度−経度とも呼ばれる）の場合、各パノラマフレームＩ_ｉに対して、各カラム画素は平均化されて対応する一次元のリング画像Ｒ_ｉとなる。
【００４４】
次に、リング画像Ｒ_ｉ、Ｒ_ｉ＋１は連続的に登録されて方位シフト（結果的にはサブ画素の精度で（Ｂ．Ｄ．Ｌｕｃａｓ，Ｔ．Ｋａｎａｄｅ， “ＡｎｉｔｅｒａｔｉｖｅＩｍａｇｅＲｅｇｉｓｔｒａｔｉｏｎＴｅｃｈｎｉｑｕｅｗｉｔｈａｎｄＡｐｐｌｉｃａｔｉｏｎｔｏＳｔｅｒｅｏＶｉｓｉｏｎ，” ７ｔｈＩｎｔｌＪｏｉｎｔＣｏｎｆｅｒｅｎｃｅｏｎＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ（ＩＪＣＡＩ），ｐｐ．６７６−６７９，１９８１））を得る。リング画像内の画素単位が２π／ωラジアンのシフトに対応。画素に対応した垂直方向に内在する角度長に従って画素の重み付けを行うことにより、限定された緯度範囲内でのカラムの平均が可能となる。
【００４５】
３．２粗並行移動
一旦、画像の略方位が決定されると、シーケンス（ｘ、ｙ）_ｉが決定される。モーションアルゴリズムでの殆どの構成と同様に、ユークリッド３次元点集合が再構成される。先ず、一次元並行移動がまとめて以下のように算出される。最初、方位があまり大きく変わらないところでは、（調整されたしきい値を用いて）そのパスがセグメント（連続画像シーケンス）に区分される。セグメント長λ_ｉが未だ算出されていない場合にはポリラインが定義される。算出アルゴリズムの詳細は図８を参照して後述する。
【００４６】
まとまった長さｋ＋１の両端の画像Ｉ_ｄ及びＩ_ｄ＋ｋについて、Ｉ_ｄ、…、Ｉ_ｄ＋ｋ（図６を参照）で共通に追跡される特徴から並行移動パラメータλ_ｄが標準数値解析法を用いて算出される。極からデカルトへの変換を行うことにより（θ、λ）_ｉからシーケンス（ｘ、ｙ）_ｉが求められる。図６（ａ）−６（ｃ）はプレノプティックパスシーケンスから追跡された、画像中にマーキングされた特徴の例を示す。
【００４７】
３．３パラメータ微調整
シーケンス（ｘ、ｙ、θ）_ｉは次の文献に記載された方法と同様な方法で回転及び並行移動を適切に相関付けることにより数値的に改善される（Ｃ．Ｊ．Ｔａｙｌｏｒ， “ＶｉｄｅｏＰｌｕｓ；ＡＭｅｔｈｏｄｆｏｒＣａｐｔｕｒｉｎｇｔｈｅＳｔｒｕｃｔｕｒｅａｎｄＡｐｐｅａｒａｎｃｅｏｆＩｍｍｅｒｓｉｖｅＥｎｖｉｒｏｎｍｅｎｔｓ”，ＩＥＥＥＴｒａｎｓ．ｏｎＶｉｓｕａｌｉｚａｔｉｏｎａｎｄＣｏｍｐｕｔｅｒＧｒａｐｈｉｃｓ，Ｖｏｌ．８（２），ｐｐ．１７１−１８２，２００２；Ｍ．Ａｎｔｏｎｅ，Ｓ．Ｔｅｌｌｅｒ， “ＳｃａｌａｂｌｅＥｘｔｒｉｎｓｉｃＣａｌｉｂｒａｔｉｏｎｏｆＯｍｎｉ−ＤｉｒｅｃｔｉｏｎａｌＩｍａｇｅＮｅｔｗｏｒｋｓ”，Ｉｎｔｌ．ＪｏｕｒｎａｌｏｆＣｏｍｐｕｔｅｒＶｉｓｉｏｎ，Ｖｏｌ．４９（２／３），ｐｐ．１４３−１７４，２００２）。ビジビリティ合成アプリケーションにおいて重要ではあるが、ここではパスに沿って発生する組合せイベント（ｃｏｍｂｉｎａｔｏｒｉａｌｅｖｅｎｔ）の概略分析が行われており、この最終ステップはフィルタ処理プロセスを大きく改善するものではない。
【００４８】
３．４絶対位置決定
ユーザが設定する２つ以上のランドマークを使って（例えば、基準フロア地図を使って）、フロア地図、概略再構成等により提供された利用可能な幾何学的情報の大きさ及び原点に一致するよう、回転及び並行移動パラメータを用いてパスが固定される。
【００４９】
本発明においては、エゴモーション・リカバリは上記の方法に限定されるものではなく、その方法が全てのサラウンド画像の基準角（例えば北極）及び（ｘ、ｙ）位置を定義できるものであれば、物理的装置、視点追跡装置、基準線（ｆｉｄｕｃｉａｌ）、距離計等を用いたその他の方法を用いてもよい。注意すべきは、このステップはコンピュータグラフィックススクリプトには不要である点である。なぜならば各ＣＧサラウンド画像は予め定めた位置ごとに算出されるからである。
【００５０】
４．プレノプティックパスの空間フィルタ処理
一旦、プレノプティックパスが求められると、“冗長”画像（新たな情報をそれ程もたらさない画像）やあまり重要でない画像を取り除けるように、該プレノプティックパスは適切にサンプリング／注釈付けが行われる。そのサンプリングはプレノプティックパス等のプログレッシブ・コーデングにも有用である。
【００５１】
プレノプティック関数のサンプリングはＣｈａｉ等によって研究された（Ｊ．−Ｘ．Ｃｈａｉ，Ｈ．−Ｙ．Ｓｈｕｍ，Ｘ．Ｔ． “Ｐｌｅｎｏｐｔｉｃｓａｍｐｌｉｎｇ”，ＡＣＭＳＩＧＧＲＡＰＨ，ｐｐ．３０７−３１８，２０００）。Ｃｈａｉ等は光照射野レンダリング用の最小サンプリングレート決定のためにスペクトル解析を用いてＬ（・）のサンプリング法を研究した。一方、本発明はプレノプティック関数の部分集合であり、かつ幾何学的分割が行われた、プレノプティックパスに対して特に適合させたものである。
【００５２】
サラウンド画像の数を削減する一つの方法は、曲線長に比例して、パスＰに沿って視点Ｐ_ｉを選択／配分することである（パス長のパラメータ表現）。この方法は幾何学的情報が利用できない場合には効率的である。ｌがパスの長さＰを示すものとすると、ｌはサラウンド画像の相対並行移動パラメータから、Σ_ｉ｛（ｔｘ_ｉ＋１−ｔｘ_ｉ）^２＋（ｔｙ_ｉ＋１−ｔｙ_ｉ）^２｝^１／２を持つとして、定義される。
【００５３】
幾何学的情報が利用できない場合には、ｌに従ってサンプリングすることが好ましい。例えば、記録されたｎ個のサラウンド画像の中からｍ個の部分集合画像を選択する必要がある場合、パスＰはｍ個の等しい長さの間隔に分割されて、各間隔内において一つのサラウンド画像が選択される。そうすることにより、捕捉中に発生した非均一的な人為的影響に対する補正が可能になる。例えば、全方位カメラの運搬車の速度を増せば粗いデータが得られ、他方、速度を落とせばサンプリングレートが増加する。
【００５４】
サラウンド画像の数を削減するもう一つの方法はビジビリティセル（ｖｉｓｉｂｉｌｉｔｙｃｅｌｌ）を用いることである。以下、Ｐに沿って発生する組合せイベントの幾何学解析手法を紹介する。
【００５５】
Ｆが“自由”空間を表す、すなわち、その空間が場面Ｓ＝｛Ｓ_１，．．．，Ｓ_ｎ｝のｎ個のオブジェクトの何れによっても遮られていないものとする。Ｆ＝Ｅ^３＼∪^ｎ _ｉ＝１Ｓ_ｉ（Ｅ^３はユークリッド３次元空間）。ν⊆Ｆを“視認可能な”空間、即ちユーザが相互作用を行いながら移動できる自由空間の一部とする。位置Ｐ∈νが与えられると、ε（Ｐ）はＰを囲む下限エンベロープを示すものとする（幾何学用語に関しては、Ｊ．Ｄ．Ｂｏｉｓｓｏｎｎａｔ，Ｍ．Ｙｖｉｎｅｃ著 “Ａｌｇｏｒｉｔｈｍｉｃｇｅｏｍｅｔｒｙ”，ＣａｍｂｒｉｄｇｅＵｎｉｖｅｒｓｉｔｙＰｒｅｓｓ，１９９８を参照）。即ち、ε（Ｐ）は、ある与えられた（θ、φ）角座標に対し、位置Ｐから方向（θ、φ）に発する光線が最初にヒットするオブジェクトＳまでの距離、そのオブジェクトが存在する場合、その距離を求める動径関数ｒ（θ、φ）として定義される。なお、ε（・）は必ずしも連続である必要はない。
【００５６】
図７はエンベロープε（Ｏ）の例を示すもので、Ｏは中心を示している。太い実線７００は場面（ｓｃｅｎｅｓ）を記述するポリラインである。点線星型形状の多角形は中心Ｏから生ずるエンベロープε（Ｏ）である。プレノプティックパス内の位置の移動に伴ってエンベロープε（・）は変化する。例えば図９（ａ）及び９（ｂ）はエンベロープε（・）の変化を示す。図の陰影部分はビル９００内のプレノプティックパスの異なる位置でのエンベロープε（・）を示す。
【００５７】
Ｐを僅かに移動させると、ビジビリティ・イベント（閉鎖（ｏｃｃｌｕｓｉｏｎ）／非閉鎖（ｄｉｓｏｃｃｌｕｓｉｏｎ））で定義された臨界的な組合せイベントに到達するまで、エンベロープε（Ｐ）はなめらかに変化する。Ａ（Ｓ）をビジビリティ・セル要素へのνの分割であるとする。図８はビジビリティ図とそのセル分解を示す模式図である。図中、数字８００はビジビリティ・セルの一つを示す。
【００５８】
ＳがＥ^３の三角形ｎ個からなるとすると、Ａ（Ｓ）はＯ（ｎ^９）の複雑さを持つ。しかし、直線で切断された２次元セルに限定すると、その複雑さはＯ（ｎ^３）まで下がり、御しやすくなる。ゾーン理論（Ｚｏｎｅｔｈｅｏｒｅｍ）（Ｊ．Ｄ．Ｂｏｉｓｓｓｏｎｎａｔ，Ｍ．Ｙｖｉｎｅｃ，
“Ａｌｇｏｒｉｔｈｍｉｃｇｅｏｍｅｔｒｙ”，ＣａｍｂｒｉｄｇｅＵｎｉｖｅｒｓｉｔｙＰｒｅｓｓ，１９９８）によれば，直線でカットされた全てのエンベロープの組合せのサイズはＯ（ｎ^３）になる。
【００５９】
ここでビジビリティ・セルが最小の“幅”を持つように制限されると、その複雑さは線形、すなわち、上記パス長に比例することに注目すべきである。屋内撮影に対して、ビルのフロア地図はしばしば図面交換フォーマット（ＤＸＦフォーマット）で利用であり、図８に示すようなビジビリティ図が算出できるようになる。ここで、小さいｎに対しては、素二次方程式アルゴリズム（ｎａｉｖｅｑｕａｄｒａｔｉｃａｌｇｏｒｉｔｈｍ）を適用してポリラインに対する制限を算出できることに注目すべきである。体積ｖ、長さｌでそれと交差するプレノプティックパス長を持つ、あるビジビリティ・セルに対しては、このセル内のパスは比率ｌ^３／ｖに従ってサンプリングしてもよい。
【００６０】
更に、ビジビリティ情報が大きく変わる部分では（即ち、一つの大きなビジビリティセルから他のセルに移動する場合）、サンプリングレートを局所的に増やすことで、移行がよりスムースになるようにしてもよい。（例えば、ある壁面が現在の視点の接線となるような場合、カメラが少しでも並行移動すると、その壁面の一方があきらかになるため、我々はより多くのサンプルを必要とする。）
このようにして画像データベースは、その意味合い（ｓｅｍａｎｔｉｃｓ）を維持したまま、一桁あるいは二桁のオーダーで削減できる。
【００６１】
極めて大きなｎの、非自明（ｎｏｎ−ｔｒｉｖｉａｌ）なビジビリティ図を持つ複雑なＣＧスクリプトに対してもやはり、現状のビジビリティセル容積ｖが以下のように概略推定される。
【００６２】
各画素の色及び深さ情報ｄ_ｉ，ｊが格納されている、ＲＧＢＺ環境画像の各画素ｅ_ｉ，ｊ、は対応する立体角ａ_ｉ，ｊを張る。その立体角は全単位球を小区分するから、それらの区分は単純加算されてｖ＝（１／４π）Σ_ｉ，ｊａ_ｉ，ｊｄ_ｉ，ｊを得る。
【００６３】
容積が大きく変化する場合、これは視点が一つのビジビリティ・セルから他のビジビリティ・セルへの移動を意味するが、ある与えられたプレノプティックパスに沿って最初の組合せイベントが検索され、サンプリングされる（即ちＰをレンダリングする）。ＣＧスクリプトに対するサンプリングは、次の最善の視点がそのプレノプティックパスＰに沿って逐次付加的に決定されるため、オンラインで行っても良い。
【００６４】
ビジビリティ・セルを用いた上記のアルゴリズムは、例えば、図１０に示すステップにより実現される。この例示ステップにおいては、最初、プレノプティックパス上で全方位画像Ｉが撮影される（ステップ１００１）。次に、ステップ１００２において、例えば前述したエゴモーション・リカバリ手法を用いて、プレノプティックパスの位置が決定される。ステップ１００３において、プレノプティックパスに沿ったビジビリティ・セルが算出される。ステップ１００４において、対応するビジビリティ・セルに画像が割り当てられる。最後に、ｖ及びｌに応じて画像が選択される。ここで、ｖはビジビリティ・セルの容積であり、ｌはビジビリティ・セルと交差するプレノプティックパスの長さである。その設定において、画像の重要度はその場面で起こる組合せ変化に応じて設定される。
【００６５】
もう一つの可能性は、取得された画像の位置（ｔｘ、ｔｙ）で全てのエンベロープの体積を算出して、そのエンベロープの体積あるいはエンベロープの導関数に応じて画像を選択あるいは重み付けすることである。エンベロープ体積がより大きくなれば、より多くのサンプルあるいはより良い品質の画像を得ることが望ましく、一方エンベロープ体積がより少なくなれば、より少ない画像あるいは画像当たりの品質をより下げることが好ましい。更に、連続及び組合せサンプリング手法の組み合わせを組み合わせてもよい。
【００６６】
オブジェクト集合のビジビリティ図を算出する簡単で効率的な方法は、最初プリミテイブ（線分、弧等）をカラー画像にラスタリングすることである。ここで各プリミテイブは異なるカラー番号を持つ（図１１のステップ１１０１）。次に、（背景カラーとして検出された）オブジェクトの描かれていない箇所の、画像の各画素位置（ｐｘ、ｐｙ）に対して、図１２に示すような離散的エンベロープが算出される（ステップ１１０２）。各エンベロープはオブジェクト・シーケンスによって注釈付けされる。そのシーケンスは循環数列なので逓増順に分類される。２つのビジビリティ・セルの境界上ではエンベロープが異なる注釈を持つことが分かる。従って、その算出は注釈（線毎に）走査線内で行われる。連続する注釈間の注釈が異なる度に、ビジビリティ図と対応するカラーの画素が書き込まれる（ステップ１１０３）。一旦、個別のビジビリティ図が算出されると、個別の面積あるいは体積がそれぞれのセルに対して算出できる（ステップ１１０４）。
【００６７】
ビジビリティ・セルの体積ｖを算出して比率ｌ^３／ｖに比例したサンプリングを行う代わりに、ビジビリティ・セルの面積ａを算出してＩ^２／ａに比例してプレノプティックパスをサンプリングしてもよい。
【００６８】
更に、ビジビリティ多面体に対するテクスチャのパラメータ表現を、サラウンド画像のフィルタ処理／選択の制御に用いてもよい。テクスチャのパラメータ表現は次のように行われる。ある与えられたビジビリティ多面体及び精度δに対し、そのビジビリティ多面体に該当する全ての画像がマッピングされる。各ビジビリティ多面体はトポロジーの種数（ｇｅｎｕｓ）が０であり、それに対するテクスチャにおいて概略δの精度のパラメータ表現が存在する。そのビジビリティ多面体に該当するサラウンド画像各々について、そのサラウンド画像がそのテクスチャのパラメータ表現へ逆マッピングされる。最後に、全ての逆マッピングを平均化する。
【００６９】
テクスチャのパラメータ表現の利用は、粗い三次元再構成でも十分な品質が提供できる（廊下のような）プレノプティックパス領域において役立つ。
【００７０】
あるいは、パノラマ画像の数を削減する代わりに、各画像ｉに対し、曲線長さあるいはエンベロープ体積／面積に応じた重み付け要素ｗ_ｉを割り当て、次に全ての画像をそれらｗ_ｉに応じて非可逆的に圧縮してもよい。
【００７１】
５．アプリケーション及び結果
捕捉／合成したシーケンスに関する実験結果が図１３に示されている。図１３はプレノプティックパス１３００と、幾つかの主要フレーム１３０２と、その対応位置１３０１とを示す。
【００７２】
実施にあたっては、Ｗｉｎｄｏｗｓ（商標）／Ｉｎｔｅｌ（商標）の一般ＰＣ上でＯｐｅｎＧＬ（商標）を使用するＣ＋＋で行った。図１４に示すようなビューイング／地図ウインドウでマウスを使ったり、あるいは、ジャイロを備えたヘッドマウンティング表示装置（ＨＭＤ）を使って、ユーザは相互作用を行いながら、６０ｆｐｓのリフレッシュレートにて、プレノプティックパス上を移動できる。
【００７３】
例えば、図１４に示すように、ビューイング／地図ウインドウ１４００はビューイング・ウインドウ１４０１と、地図ウインドウ１４０２と、ナビゲーション・ウインドウ１４０３とを備えてもよい。ビューイング・ウインドウ１４０１は現在の視点位置と角度に対する画像を表示する。地図ウインドウ１４０２はフロア地図上に重ね合わせたプレノプティックパス及び現在位置を表示する。ナビゲーション・ウインドウ１４０３は現在位置でのサラウンド画像とパノラマ画像に重畳された擬似フレーム１４０５とを表示して、ビューイング・ウインドウ１４０１の対応領域を示す。ＨＭＤを使用する場合、そのＨＭＤの傾きを利用して、視点を前進させるか後退させるかを示してもよい。
【００７４】
ムービーテクスチャ（ホモグラフィを使って補正し重畳されたもの）あるいはイベントに応じてトリガーされた画像ベース操作のようなマルチメデア・アドオンを用いることにより、仮想通り抜け経験をより豊かなものにできる。例えば、エレベータのボタンを押すことでエレベータのドアを開閉する等も選択できる。
【００７５】
そのようなマルチメデア・アドオンの例が図１５（ａ）〜１５（ｂ）に示されている。図１５（ａ）では、あるサラウンド画像のトリガーされた領域がポリゴン区域１５０１（ここでは四角形）により設定される。ポリゴン区域１５０１はサラウンド画像からホモグラフィを使って合成された仮想ピンホール視野のトリガー領域を示す。対応するトリガー区域同士が交差すると（即ち、図１５（ｂ）に示すように、区域１５０１と区域１５１１が交差すると）、予め定めたイベントが発生する。これらのイベントは音楽の演奏でも、３次元オブジェクトの描画等であってもよい。
【００７６】
図１６はサラウンド画像の集合に対し区域１５０１をどのように定義するかを説明するフローチャートである。最初、ユーザが最初のサラウンド画像内に４つの点を初期設定する（ステップ１６０１）。次いでその４点がサラウンド画像シーケンスに沿ってトラッキングされる（ステップ１６０２）。最終的に、そのトラッキングは、それぞれの画像で得られる４角形を一緒に登録することにより、よりしっかりと行われる（ステップ１６０３）。このステップによりジッタ効果が避けられる。最後にそれらの四角形座標と対応するサラウンド画像の数がＸＭＬファイルフォーマットで保存される（ステップ１６０４）。
【００７７】
このようなマルチメデア・アドオンのもう一つの例が図１７に示されている。プレノプティックパス上に視点ｐ及び位置ｑに３次元オブジェクト１７０２が与えられると、次の合成ステップが行われる。
【００７８】
・視点ｐから見える範囲のオブジェクト１７０２の部分を算出する。このステップは大雑把な再構成、フロア地図等から得られた概略的なビジビリティ情報に基づいて行われる。Ｉ_Ｏをそのオブジェクト画像とする。Ｉ_Ｏは、そのオブジェクトの部分が描画されているＩ_Ｏ内の箇所を特定し、そのオブジェクトを背景から分離するための、アルファチャンネル（マスク画像）を備える。
【００７９】
・視点位置ｐに関するサラウンド画像からの仮想カメラ視野を生成する（例えば、仮想ピンホールカメラ視野）。Ｉ_Ｃをその画像とする。
【００８０】
・最初Ｉ_Ｃを描画し、次にアルファチャネルを用いてＩ_Ｏを描画する。そうすることにより、例えば窓ガラスの描画などの透明効果を得ることも可能になる。
【００８１】
ここで述べられた技法は多くのオブジェクトに拡張できる。例えば、この技法を用いて廊下内に入り込む数体のロボット（ＣＧ画像による）を追加することも可能である。
【００８２】
上記の技法は、サラウンド画像をトラッキングしてその上にホモグラフィーにより他の画像をマッピングする機能とは異なるものである。ここでは、重畳されるＣＧオブジェクトは３次元である。
【００８３】
本発明の実施形態を用いて説明したように、空間係数に基づきサンプリングが制御されるプレノプティックパスの捕捉あるいは合成のフレームワーク。本発明においては、ユーザがそれらプレノプティックパスに沿って相互作用しながら通り抜けできるように、画像ベースによるレンダリング・ブラウザーが導入される。上述された実施形態は大部分が仮想通り抜けシステムに関するものであるが、本発明はそれ以外の如何なるアプリケーション、例えばゲームやテレプレゼンス（ｔｅｌｅｐｒｅｓｅｎｃｅ）のようなアプリケーションにも適用可能である。
【００８４】
【発明の効果】
本発明によれば、全方位画像データを処理でき、かつその全方位画像データの処理により得られたサラウンド画像のデータサイズを効果的に削減できる装置及び方法が提供される。
【００８５】
さらに、本発明によれば、全方位画像データのサンプリング／フィルタ処理が可能な、あるいはアプリケーションで使用されるサラウンド画像データの圧縮が可能な装置及び方法が提供される。
【図面の簡単な説明】
【図１】全方位カメラの例を示す概観図である。
【図２】全方位画像処理システムを示すブロック図である。
【図３】本発明による全方位処理装置の構成を示すブロック図である。
【図４】図３の全方位処理装置における本発明の一実施形態による機能ブロックを示す模式機能図である。
【図５】図５（ａ）：立方体環境マップ画像の一例を示す図である。
図５（ｂ）：正矩形環境マップ画像の一例を示す図である。
【図６】図６（ａ）：画像内にマーキングされ、プレノプティックパス・シーケンスからトラッキングされた特徴例を示す図である。
図６（ｂ）：画像内にマーキングされ、プレノプティックパス・シーケンスからトラッキングされた特徴例を示す図である。
図６（ｃ）：画像内にマーキングされ、プレノプティックパス・シーケンスからトラッキングされた特徴例を示す図である。
【図７】パス内のある位置で算出されたエンベロープの一例を示す模式図である。
【図８】パスに沿って算出されたビジビリティ図の一例を示す模式図である。
【図９】図９（ａ）：パス内のある位置で算出された図７のエンベロープを示す模式図である。
図９（ｂ）：図９（ａ）の同じパス内の別の位置で算出された図７のエンベロープを示す模式図である。
【図１０】パスに沿って算出されたビジビリティ・セル毎の画像フィルタ処理のステップ例を示すフローチャートである。
【図１１】オブジェクト集合のビジビリティ図を算出するためのステップ例を示すフローチャートである。
【図１２】個別エンベロープ内にどのように注釈付けがなされるかの一例を示す模式図である。
【図１３】プレノプティックパス例及び対応する位置における幾つかの主要なフレーム画像例を示す模式図である。
【図１４】本発明の実施形態によるビューイング／マッピング・ウインドウの一例を示す概要図である。
【図１５】図１５（ａ）：本発明の実施形態によるイベント・トリガーされる領域の一例を示す模式図である。
図１５（ｂ）：あるイベントがトリガーされるときの、図１５（ａ）のイベント・トリガーされる領域を示す模式図である。
【図１６】図１５（ａ）のイベントトリガーされる領域の設定ステップの一例を示すフローチャートである。
【図１７】本発明の実施形態による３次元オブジェクト画像をどのようにサラウンド画像内に合成するかに関する一例を示す模式図である。
【図１８】ＣＧスクリプトから算出されたサラウンド画像の一例を示す模式図である。
【記号の説明】
４０１：サラウンド画像合成装置、４０２：位置検出部、４０３：空間係数検出部、４０４：フィルタ部。
[0001]
[Technical field of the invention]
The present invention relates to an apparatus and method for processing surround images and, more specifically, to an apparatus and method for capturing, spatially filtering, and viewing annotated video shot with an omnidirectional multi-head camera moving along a path.
[0002]
[Prior art]
The pursuit of realism in computer graphics is a never-ending goal. On the one hand, object-space rendering algorithms deliver remarkably crisp images that nevertheless lack a “real world” feel. On the other hand, image-based rendering (IBR), which models and analyzes image sources by means of inverse rendering, has advanced and provides compelling three-dimensional environments in image space, although it lacks scalability and interactivity.
[0003]
IBR became popular with the introduction of photo-quality backdrops by S. E. Chen (“QuickTime VR - An Image-Based Approach to Virtual Environment Navigation”, ACM SIGGRAPH, pp. 29-38, 1995), light fields (M. Levoy, P. Hanrahan, “Light field rendering”, ACM SIGGRAPH, pp. 31-42, 1996; S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. F. Cohen, “The lumigraph”, ACM SIGGRAPH, pp. 43-54, 1996), compressed surface light fields (W. C. Chen, J. Y. Bouguet, M. H. Chu, R. Grzeszczuk, “Light Field Mapping: Efficient Representation and Hardware Rendering of Surface Light Fields”, ACM Transactions on Graphics 21(3), pp. 447-456, 2002), and photo-quality virtual walkthroughs of static environments (D. G. Aliaga, T. Funkhouser, D. Yanovsky, I. Carlbom, “Sea Of Images”, IEEE Visualization, pp. 331-338, 2002). Although IBR is still in its infancy, in many cases its algorithms proceed by first geo-referencing the panoramic frames and then warping and combining the source images on the basis of feature matching to synthesize new viewpoints.
[0004]
Japanese Patent Publication No. 2003-141562, filed by the same applicant as the present application, discloses an image processing apparatus and method applicable to the compression, storage, and reproduction of cylindrical or spherical omnidirectional images mapped to three-dimensional coordinates.
[0005]
[Problems to be solved by the invention]
Depending on the application of a virtual walkthrough system, it may be necessary to capture the entire area of an indoor environment, such as a building, in order to render a virtual surround environment. For example, if an omnidirectional video camera is used to capture the indoor environment of a building, and the camera shoots 60 frames per second and takes 60 minutes to move through the building, the total number of image frames reaches 216,000.
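The frame count quoted above follows directly from the frame rate and the capture duration. A minimal sketch of that arithmetic is given below; the function name is an illustrative assumption and not part of the specification.

```python
def total_frames(frames_per_second: int, minutes: int) -> int:
    """Number of frames recorded at a constant frame rate over the given time."""
    seconds = 60 * minutes
    return frames_per_second * seconds


# 60 frames per second for 60 minutes, as in the example above.
print(total_frames(60, 60))  # 216000
```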
[0006]
Moreover, some captured surround images bring little new information compared with others. For example, while the camera moves along the middle of a corridor enclosed by walls, the surround environment shows few large visible changes. Rendering such a corridor in a virtual walkthrough system requires less omnidirectional image data than rendering other places in the building where more environmental change is seen, such as a corner of the corridor or the entrance of a room.
[0007]
The present invention has been conceived in view of the above situation. It is desirable to provide an apparatus and method that can process omnidirectional image data and that can efficiently reduce the data size of a surround image obtained by processing omnidirectional image data.
[0008]
Furthermore, it would be desirable to provide an apparatus and method that can sample / filter omnidirectional image data, or that can properly compress surround image data used in an application.
[0009]
[Means for Solving the Problems]
According to one embodiment of the present invention, an image processing apparatus for processing omnidirectional image data is provided. The image processing apparatus includes: a position detection unit that detects the position at which an omnidirectional image was captured; a spatial coefficient detection unit that calculates a spatial coefficient corresponding to the detected position, the spatial coefficient relating to the geometric structure of the surround environment of which the omnidirectional image is captured from the detected position; and a filter unit that reduces the processed image data to be output from the image processing apparatus based on the calculated spatial coefficient.
[0010]
The spatial coefficient may be the curve length of the path along which the omnidirectional images are captured, or it may be determined based on a plurality of visibility envelopes or cells calculated along the path. Alternatively, the spatial coefficient may be determined according to a parameterization of the texture of the surround images corresponding to a set of visibility polyhedra calculated along the path.
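As a rough illustration of the curve-length option, the sketch below accumulates the Euclidean distance between consecutive capture positions. The specification only states that the path length is derived from the relative translation parameters of the surround images; the function name and the list-of-(x, y)-tuples input are assumptions made for this example.

```python
import math


def cumulative_path_length(positions):
    """Return the arc length travelled up to each capture position.

    positions: list of (x, y) tuples, one per panorama, in capture order
    (a hypothetical input format chosen for this sketch).
    """
    if not positions:
        return []
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        # Add the length of the straight segment between neighbouring samples.
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


print(cumulative_path_length([(0, 0), (3, 4), (3, 10)]))  # [0.0, 5.0, 11.0]
```

The last element of the returned list is the total path length; the intermediate values give the curve-length parameter of each image.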
[0011]
The filter unit may determine the sampling rate of the omnidirectional image or select the omnidirectional image according to the spatial coefficient. Alternatively, the filter unit may compress the processed image data according to the spatial coefficient.
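One way the selection could work when only the path length is available is sketched below: the path is divided into m intervals of equal length and one image is kept per interval, which compensates for uneven cart speed during capture. Keeping the first image that falls in each interval, and skipping intervals that contain no image, are assumptions of this sketch rather than details given in the specification.

```python
import math


def select_by_arc_length(positions, m):
    """Pick at most m image indices, one per equal-length path interval.

    positions: list of (x, y) capture positions in capture order.
    m: desired number of images (hypothetical parameter name).
    """
    if not positions or m <= 0:
        return []
    # Arc length travelled up to each capture position.
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    total = s[-1]
    if total == 0.0:
        return [0]  # the camera never moved, so one image suffices
    selected = []
    seen_intervals = set()
    for index, arc in enumerate(s):
        # Interval that this image falls into; the end point joins the last one.
        k = min(int(arc / total * m), m - 1)
        if k not in seen_intervals:
            seen_intervals.add(k)
            selected.append(index)
    return selected


# Slow start (densely sampled), then fast motion (sparsely sampled).
path = [(0, 0), (1, 0), (2, 0), (3, 0), (50, 0), (100, 0)]
print(select_by_arc_length(path, 2))  # [0, 4]
```

In the usage example the four closely spaced images at the start collapse to a single representative, while the sparsely sampled remainder still contributes one image.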
[0012]
Further, the processing in the image processing apparatus may include adding an event trigger region to be superimposed on the processed image. The event trigger region is placed at a predetermined position of the surround image obtained by processing the omnidirectional image data, and is configured so that a predetermined event is triggered when the region is selected by a user operation. The processing may also include mapping a movie texture and/or a three-dimensional object onto the surround image obtained by processing the omnidirectional image data.
[0013]
The image processing apparatus may be provided with an omnidirectional image capturing unit.
[0014]
According to another embodiment of the present invention, an image processing method or computer program for processing omnidirectional image data is provided. The image processing method or computer program includes the steps of: detecting the position at which an omnidirectional image was captured; calculating a spatial coefficient corresponding to the detected position, the spatial coefficient relating to the geometric structure of the surround environment of which the omnidirectional image is captured from the detected position; and reducing the processed image data to be output from the image processing apparatus based on the calculated spatial coefficient.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
An apparatus according to an embodiment of the present invention will be described with reference to the drawings.
[0016]
FIG. 1 shows an example of an omnidirectional camera 10. The omnidirectional camera 10 includes a frame in the shape of a substantially regular dodecahedron made up of 12 pentagonal faces, and 11 cameras mounted individually on different faces. Each camera captures its corresponding region of the surrounding scene and outputs the data as one part of the surround image. Stitching these image parts together yields a full-spherical omnidirectional image.
[0017]
FIG. 2 is a schematic diagram of a system including the omnidirectional camera 10 and the omnidirectional image processing apparatus 100. As shown in FIG. 2, the omnidirectional camera 10 includes, in addition to the plurality of cameras 11, a plurality of VTRs 12 and microphones 13, and records a plurality of video streams and audio signals. The recorded video streams are captured through the switcher 14 and output to the image processing apparatus 100 as computer data (for example, bitmap files). The image processing apparatus 100 performs image processing such as preparing the captured images for subsequent processing, stitching the images, and storing the processed images.
[0018]
Further, in the present invention, the device 100 performs filtering of the captured image or compression of the processed image according to the importance of the captured image. In the present embodiment, the degree of importance is measured by a spatial coefficient calculated along the path of the omnidirectional camera 10. Details of the filtering / compression method according to this embodiment will be described later.
[0019]
The image processing apparatus 100 can be realized by a computer system having the configuration shown in FIG. The apparatus 100 includes a CPU 101, a memory 102, a display controller 103, an input device interface 104, a network interface 105, an external device interface 107, a bus 108, a video capture card 109, a display device 111, and a keyboard. 112, a mouse 113, a hard disk drive 114, and a media drive 115.
[0020]
The image processing apparatus 100 may be connected to a LAN or the Internet 120 via the network interface 105 in order to download various image processing applications, to download omnidirectional images instead of receiving them via the video capture card 109, or to distribute processed image data over the network.
[0021]
The CPU 101 executes various applications, such as stitching the plurality of images output from the omnidirectional camera 10 via the video capture card 109, and filtering/compressing images according to the spatial coefficient.
[0022]
FIG. 4 shows an example of a functional schematic diagram of the CPU 101 in an operation mode in which a plurality of captured images are processed to calculate a surround image. In this operation mode, omnidirectional image data including a plurality of captured image data is provided from the omnidirectional camera 10 to the surround image calculation unit 401.
[0023]
The surround image calculation unit 401 calculates and outputs a panoramic image or a surround image from the omnidirectional image data. When a computer graphic (CG) script is used, the unit 401 directly calculates a surround image from the CG script.
[0024]
The position detection unit 402 performs egomotion recovery using the panoramic images in order to locate the camera 10. The egomotion recovery process can be simplified considerably by keeping the camera at a constant height. Egomotion recovery is described in detail later.
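To give a feel for the first egomotion step described later (coarse orientation), the sketch below averages each column of an equirectangular panorama into a one-dimensional ring image and finds the circular shift that best aligns two consecutive rings; one pixel of shift corresponds to 2π/w radians for an image of width w. This is a simplified stand-in: it uses a brute-force integer search instead of the sub-pixel registration cited in the specification, it averages columns without latitude weighting, and all function names are hypothetical.

```python
import math


def ring_image(panorama):
    """Average each column of a grayscale panorama given as a list of rows."""
    height = len(panorama)
    width = len(panorama[0])
    return [sum(row[c] for row in panorama) / height for c in range(width)]


def best_circular_shift(ring_a, ring_b):
    """Integer shift k minimizing the sum of (ring_a[c] - ring_b[(c + k) % w])**2."""
    w = len(ring_a)

    def cost(k):
        return sum((ring_a[c] - ring_b[(c + k) % w]) ** 2 for c in range(w))

    return min(range(w), key=cost)


def rotation_between(pano_a, pano_b):
    """Approximate rotation (radians) between two panoramas of equal width."""
    w = len(pano_a[0])
    k = best_circular_shift(ring_image(pano_a), ring_image(pano_b))
    return k * 2.0 * math.pi / w


# Second panorama is the first one shifted circularly by two columns.
a = [[0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7]]
b = [[6, 7, 0, 1, 2, 3, 4, 5], [6, 7, 0, 1, 2, 3, 4, 5]]
print(rotation_between(a, b))  # a quarter turn, i.e. pi / 2
```

The brute-force search costs O(w²) per image pair, which is acceptable for a sketch but would normally be replaced by an iterative or frequency-domain registration.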
[0025]
The position of the camera 10 is output to the spatial coefficient detection unit 403, which determines the spatial coefficient at the position detected by the position detection unit 402. In the present invention, the spatial coefficient is introduced to measure the level of importance (significance) of the surround environment in which a panoramic image was captured. The spatial coefficient is calculated based on, for example, (a) curve length, (b) visibility cells, (c) visibility envelopes, or (d) a texture parameterization. When visibility cells or envelopes are used, volume and area attributes are calculated for every position on the path. Combinatorial changes of the envelope occur only at the critical visibility events defined by the visibility graph (see FIG. 8). Visibility cells and envelopes are described in detail later in this specification.
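For the visibility-cell option, the specification later states that the part of the path inside a cell of volume v, crossed by a path segment of length l, may be sampled according to the ratio l³/v (or l²/a when the cell area a is used instead). The sketch below distributes a fixed image budget across cells in proportion to that ratio. The budget parameter, the rounding, and the rule of keeping at least one image per traversed cell are assumptions of this example, so the returned counts need not sum exactly to the budget.

```python
def allocate_samples(cells, budget):
    """Distribute an image budget across visibility cells.

    cells: list of (l, v) pairs, where l is the length of the path segment
           inside the cell and v is the cell volume (v must be positive).
    budget: approximate total number of images to keep (hypothetical parameter).
    """
    ratios = []
    for l, v in cells:
        if v <= 0:
            raise ValueError("cell volume must be positive")
        ratios.append(l ** 3 / v)
    total = sum(ratios)
    if total == 0:
        return [0 for _ in cells]
    # Keep at least one image for every cell the path actually crosses.
    return [max(1, round(budget * r / total)) if r > 0 else 0 for r in ratios]


print(allocate_samples([(1.0, 1.0), (2.0, 1.0)], 9))  # [1, 8]
```

Replacing `l ** 3 / v` with `l ** 2 / a` gives the area-based variant mentioned in the specification.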
[0026]
The filter processing unit 404 filters the surround images based on the spatial coefficient calculated by the spatial coefficient detection unit 403. For example, the surround images may first be oversampled and then thinned out by the filter processing unit 404. Alternatively, instead of filtering, the timing at which the omnidirectional image data provided by the omnidirectional camera 10 are sampled may be controlled based on the spatial coefficient. Furthermore, each omnidirectional image may be weighted according to the spatial coefficient and compressed together with the other images according to that weighting.
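The weighting variant can be pictured as mapping each image's weight to a lossy-compression quality setting, so that images from more significant locations are stored at higher quality. The specification does not define such a mapping; the linear rule and the JPEG-style quality range below are purely illustrative assumptions.

```python
def quality_from_weights(weights, q_min=30, q_max=95):
    """Map per-image weights linearly onto a quality range [q_min, q_max].

    weights: one spatial-coefficient-derived weight per image.
    q_min, q_max: hypothetical quality bounds for a lossy codec.
    """
    if not weights:
        return []
    w_lo, w_hi = min(weights), max(weights)
    if w_hi == w_lo:
        # All images are equally important: use the best quality for all.
        return [q_max for _ in weights]
    span = w_hi - w_lo
    return [round(q_min + (w - w_lo) / span * (q_max - q_min)) for w in weights]


print(quality_from_weights([1.0, 2.0, 3.0], q_min=30, q_max=90))  # [30, 60, 90]
```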
[0027]
Next, a method according to another embodiment of the present invention will be described. The method includes the steps of imaging, spatially filtering and viewing an annotated video taken with an omnidirectional multihead camera moving along a path.
[0028]
The following sections describe in detail an egomotion recovery algorithm designed to compute the trajectory of an omnidirectional camera, and the efficient sampling of a plenoptic path based on geometric visibility events. Appropriate sampling makes it possible to filter and compress the panoramic images, which avoids redundancy and saves storage space in the image database. Several applications and results are also described for plenoptic paths obtained by indoor shooting or rendered entirely by computer graphics.
[0029]
1. Plenoptic functions and paths
The concept of plenoptics (M. Levoy, P. Hanrahan, “Light field rendering”, ACM SIGGRAPH, pp. 31-42, 1996; S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. F. Cohen, “The Lumigraph”, ACM SIGGRAPH, pp. 43-54, 1996) is based on the following observation. Suppose the seven-dimensional function L(·) = L(X, Y, Z, θ, φ, t, λ) is available, which associates with each Cartesian point (X, Y, Z) of three-dimensional space E³ the spectral response at any time t, in any direction (θ, φ) and at any wavelength λ. Then modeling can be bypassed, and interactive rendering can be provided by “rendering” directly from the plenoptic function L(·).
[0030]
In practice, the high dimensionality of this function is reduced by freezing time (i.e., considering a static environment), selecting one wavelength (one color channel, e.g., red), and restricting the ray sampling (X, Y, Z, θ, φ) to a camera path, that is, to the one-dimensional plenoptic path P = {P_i = (x, y, z)_i}.
[0031]
Moving the omnidirectional camera along the path forces a sampling of P, so that P is discretized as a set of p points, P = {P_i}, i = 1, ..., p. From this three-dimensional discrete sampling of L restricted to P, written L_{|P}, one may either consider inverse rendering problems (e.g., recovering macro/meso geometry, texture attributes, lighting conditions, etc.) or proceed to extrapolate the function for view synthesis. Since taking a large number of samples is time-consuming and cumbersome, this work may be performed using a motorized robot that moves a panoramic head equipped with an omnidirectional camera. Examples of such systems are presented in the following literature (D. G. Aliaga, T. Funkhouser, D. Yanovsky, I. Carlbom, “Sea Of Images”, IEEE Visualization, pp. 331-338, 2002; D. G. Aliaga, D. Yanovsky, T. Funkhouser, I. Carlbom, “Interactive Image-Based Rendering Using Feature Globalization”, ACM Symp.).
[0032]
2. Capture plenoptic path
2.1 Multi-head camera
As the omnidirectional camera, a multi-head camera as shown in FIG. 1 can be used for omnidirectional video capture. The panoramic head consists of 10 CCD NTSC block cameras that share approximately the same optical center (i.e., nodal point) and capture mutually overlapping images covering all directions (a solid angle of 4π steradians) at 60 interlaced frames per second.
[0033]
Although an omnidirectional camera is used in this embodiment, a complete omnidirectional camera covering a solid angle of 4π steradians is not always necessary. Depending on the final application, the angular range covered by the camera or image data used in the present invention may be less than 4π steradians.
[0034]
The surround environment may be captured freely by mounting the camera and the recording system on a transport vehicle and powering them from a battery. The transport vehicle may be provided with a motor and a drive mechanism, or may be pushed by hand. The algorithms for stitching, viewing and coding the surround environment maps are described in detail in the following article by Nielsen (“High Resolution Full Spherical Videos”, IEEE Intl. Conf. on Information Technology: Coding and Computing, pp. 260-267). On the order of p ~ 10000 high-resolution panoramic frames I_i, i ∈ {1, ..., p}, are obtained. For example, an image size of 2048 × 1024 may be used for an equirectangular panorama. FIGS. 5A and 5B show examples of a cubic surround environment map and an equirectangular surround environment map generated from omnidirectional image data captured by a multi-head camera.
[0035]
2.2 Computer graphics
Alternatively, omnidirectional images and plenoptic paths may be computed using ray tracing or radiosity software. For example, the surround images may be computed using POV-Ray (trademark), since its source code is available and a large number of CG scripts exist for it. Specifically, the procedure “create_ray” in the file “render.cpp” is modified so as to define a bijective function that maps each pixel (x, y) of the CG image to the corresponding angular coordinates (θ, φ), and so as to output our image format. The example surround image shown in FIG. 18 was computed from a CG script obtained from the Internet Ray Tracing Competition (IRTC) site.
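As an illustration of such a bijective pixel-to-angle mapping, the following is a minimal sketch for an equirectangular (latitude-longitude) image. It is not the modified POV-Ray code; the function names, the half-pixel offset convention and the axis orientation are assumptions made for this example only.

```python
import math

def pixel_to_angles(x, y, w, h):
    """Map pixel (x, y) of a w x h equirectangular image to (theta, phi).

    theta is the longitude in [-pi, pi) and phi the latitude in
    (-pi/2, pi/2). Pixel centres are placed at half-integer offsets,
    which is an assumed convention.
    """
    theta = 2.0 * math.pi * (x + 0.5) / w - math.pi
    phi = math.pi / 2.0 - math.pi * (y + 0.5) / h
    return theta, phi

def angles_to_pixel(theta, phi, w, h):
    """Inverse mapping, returning continuous pixel coordinates."""
    x = (theta + math.pi) * w / (2.0 * math.pi) - 0.5
    y = (math.pi / 2.0 - phi) * h / math.pi - 0.5
    return x, y

def ray_direction(theta, phi):
    """Unit direction of the ray for angular coordinates (theta, phi)."""
    return (math.cos(phi) * math.cos(theta),
            math.cos(phi) * math.sin(theta),
            math.sin(phi))
```

A renderer would call ray_direction(*pixel_to_angles(x, y, w, h)) for every pixel to obtain the ray to trace.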
[0036]
Conventional tools used in the computer graphics industry, such as Alias Wavefront Maya ™ or Discreet 3DMAX ™, can also output surround images through their rendering engines or APIs.
[0037]
CG rendering provides a perfect virtual surround camera (there is no physical interference between devices, and no actual camera device is required) and exact images (constant illumination, no parallax, no noise, no vibration, etc.). Working with CG images is therefore useful for comparative performance evaluation (benchmarking) and the like.
[0038]
3. Egomotion recovery
An algorithm for recovering the extrinsic position of the camera according to the present embodiment will now be described. Since panoramic images have no intrinsic parameters, the result is a spatially indexed Euclidean path of panoramic images (defined up to a scale factor).
[0039]
A GPS system may be used to add coarse annotations to video of large panoramic paths, such as those in outdoor environments. However, since a typical GPS system provides only a rough position, it cannot be used for view synthesis (M. Hirose, “Space Recording Using Augmented Virtuality Technology”, Int. Mixed Reality Symp., pp. 105-110; D. Kimber, J. Foote, S. Lertsithichai, “FlyAbout: Spatially Indexed Panoramic Video”, ACM Multimedia 2001, pp. 339-341). Vision algorithms have recently proved useful in industries that require match moving, where the virtual and real camera paths must be matched to produce visual effects.
[0040]
Based on feature tracking, each omnidirectional image is tagged with a position (x, y, θ)_i, i ∈ {1, ..., p}. A simple and fast global egomotion algorithm for the case where the omnidirectional camera is constrained to move on a plane of fixed height is described below. With this egomotion algorithm, each surround image is registered with its position on the path.
[0041]
The algorithm includes the following steps.
・ Coarse rotation sequence θ_i (pixel-based method),
・ Coarse translation sequence (x, y)_i (feature-based method),
・ Fine-tuning of (x, y, θ)_i by dense global optimization starting from the initial estimates (a feature-based method that takes all parameters into account simultaneously: relative path),
・ Path anchoring (absolute path) using two or more landmarks.
[0042]
We do not need to indicate the landmark to be tracked, nor do we need to pre-initialize the path, which makes the acquisition system flexible, scalable and easy to use.
[0043]
3.1 Coarse orientation measurement
Absolute orientation refers to the “north” of the panoramic image. Although simple, the algorithm works well and requires no feature matching. For example, when the images are in equirectangular format (also called latitude-longitude) with dimensions w × h, the pixels of each column of each panorama frame I_i are averaged to obtain the corresponding one-dimensional ring image R_i.
[0044]
Next, consecutive ring images R_i and R_{i+1} are registered to obtain the azimuth shift with sub-pixel accuracy (B. D. Lucas, T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision”, 7th Int. Joint Conf. on Artificial Intelligence (IJCAI), pp. 676-679, 1981). One pixel in the ring image corresponds to a shift of 2π/w radians. The columns may also be averaged over a limited latitude range, with each pixel weighted according to the angular length it subtends.
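The coarse orientation step can be sketched as follows. This is only an illustration under stated assumptions: the frames are grey-level NumPy arrays in equirectangular format, the row weight is taken to be the cosine of the latitude, and the registration uses circular cross-correlation at integer-pixel accuracy in place of the sub-pixel Lucas-Kanade registration cited in the text. The names ring_image and azimuth_shift are hypothetical.

```python
import numpy as np

def ring_image(frame, max_lat=np.pi / 2):
    """Collapse an h x w equirectangular grey image into a ring of length w.

    Rows outside the latitude range [-max_lat, max_lat] are ignored; the
    remaining rows are weighted by cos(latitude) (an assumed weighting).
    """
    h, w = frame.shape
    lat = np.pi / 2 - (np.arange(h) + 0.5) * np.pi / h  # latitude of each row
    weight = np.where(np.abs(lat) <= max_lat, np.cos(lat), 0.0)
    return (weight[:, None] * frame).sum(axis=0) / weight.sum()

def azimuth_shift(ring_a, ring_b):
    """Return (shift in pixels, shift in radians) such that ring_b is
    approximately ring_a circularly shifted by that many pixels."""
    a = ring_a - ring_a.mean()
    b = ring_b - ring_b.mean()
    # Circular cross-correlation computed with the FFT.
    corr = np.fft.ifft(np.fft.fft(b) * np.conj(np.fft.fft(a))).real
    shift = int(np.argmax(corr))
    w = len(a)
    if shift > w // 2:          # report the shift as a signed value
        shift -= w
    return shift, 2.0 * np.pi * shift / w
```

Accumulating the per-frame shifts along the sequence gives the coarse rotation sequence θ_i; a sub-pixel refinement could be applied around the correlation peak.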
[0045]
3.2 Coarse parallel movement
Once the approximate orientation of each image has been determined, the sequence (x, y)_i is determined. As in most structure-from-motion algorithms, a Euclidean three-dimensional point set is reconstructed. First, the one-dimensional translations are calculated as follows. Initially, the path is divided into segments (consecutive image sequences) over which the orientation does not change much (using an adjusted threshold). This defines a polyline whose segment lengths λ_i have not yet been calculated. Details of the calculation algorithm will be described later with reference to FIG.
[0046]
For the images I_d and I_{d+k} at the two ends of a segment of length k + 1, the translation parameter λ_d is calculated by standard numerical analysis methods from the features tracked in common across I_d, ..., I_{d+k} (see FIG. 6). The sequence (x, y)_i is then obtained from (θ, λ)_i by polar-to-Cartesian conversion. FIGS. 6(a)-6(c) show examples of features marked in images and tracked along a plenoptic path sequence.
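The polar-to-Cartesian conversion can be sketched as below. The sketch assumes that θ_i is the absolute heading of segment i in radians and λ_i its length, and that the path starts at an arbitrary origin; the function name path_from_polar is hypothetical.

```python
import math

def path_from_polar(thetas, lambdas, start=(0.0, 0.0)):
    """Accumulate segment headings and lengths into positions (x, y)_i."""
    x, y = start
    path = [(x, y)]
    for theta, lam in zip(thetas, lambdas):
        x += lam * math.cos(theta)   # advance by lam along heading theta
        y += lam * math.sin(theta)
        path.append((x, y))
    return path
```

For example, a segment of length 1 heading along the x axis followed by a segment of length 2 heading along the y axis ends near (1, 2).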
[0047]
3.3 Fine parameter adjustment
The sequence (x, y, θ)_i is improved numerically by appropriately correlating rotation and translation, in a manner similar to that described in the following documents (C. J. Taylor, “VideoPlus: A Method for Capturing the Structure and Appearance of Immersive Environments”, IEEE Trans. on Visualization and Computer Graphics, Vol. 8 (2), pp. 171-182, 2002; Journal of Computer Vision, Vol. 49 (2/3), pp. 143-174, 2002). Although this step is important in view synthesis applications, only a coarse analysis of the combinatorial events occurring along the path is performed here, so this final step does not greatly improve the filtering process.
[0048]
3.4 Absolute position determination
Two or more landmarks set by the user (e.g., using a reference floor map) are used to anchor the path. The path is fixed using rotation and translation parameters so that it matches the scale and origin of the available geometric information provided by the floor map, a rough reconstruction, or the like.
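One possible way to anchor a relative path is sketched below, assuming exactly two landmarks whose positions are known both in the relative path frame and on the floor map. The rotation, scale and translation are encoded with complex numbers; with more than two landmarks a least-squares fit would be needed instead. The function name anchor_path and its arguments are hypothetical.

```python
def anchor_path(path, rel_a, rel_b, abs_a, abs_b):
    """Map a relative path onto absolute coordinates using two landmarks.

    rel_a, rel_b: landmark positions in the relative path frame.
    abs_a, abs_b: the same landmarks in floor-map coordinates.
    """
    p1, p2 = complex(*rel_a), complex(*rel_b)
    q1, q2 = complex(*abs_a), complex(*abs_b)
    if p1 == p2:
        raise ValueError("the two landmarks must be distinct")
    s = (q2 - q1) / (p2 - p1)    # rotation and scale as one complex factor
    t = q1 - s * p1              # translation
    anchored = []
    for x, y in path:
        z = s * complex(x, y) + t
        anchored.append((z.real, z.imag))
    return anchored
```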
[0049]
In the present invention, egomotion recovery is not limited to the above method. Other methods using physical devices, viewpoint tracking devices, fiducials, distance meters, and the like may be used, provided that the method can define the reference direction (for example, north) and the (x, y) position of every surround image. Note that this step is not necessary for computer graphics scripts, because each CG surround image is computed at a predetermined position.
[0050]
4. Spatial filtering of plenoptic paths
Once a plenoptic path has been obtained, it is sampled and annotated appropriately so that “redundant” images (images that do not bring much new information) or less important images can be removed. Such sampling is also useful for progressive coding of plenoptic paths.
[0051]
Sampling of plenoptic functions was studied by Chai et al. (J.-X. Chai, H.-Y. Shum, X. Tong, “Plenoptic sampling”, ACM SIGGRAPH, pp. 307-318, 2000). Chai et al. studied the sampling of L(·) using spectral analysis in order to determine the minimum sampling rate for light field rendering. The present invention, on the other hand, deals with a subset of the plenoptic function and is adapted in particular to plenoptic paths that are partitioned geometrically.
[0052]
One way to reduce the number of surround images is to select or distribute the viewpoints P_i along the path P in proportion to the curve length (path-length parameterization). This method is efficient when geometric information is not available. If l denotes the length of the path P, l is calculated from the relative translation parameters (tx_i, ty_i) of the surround images as $l = \sum_i \sqrt{(tx_{i+1} - tx_i)^2 + (ty_{i+1} - ty_i)^2}$.
[0053]
If geometric information is not available, it is preferable to sample according to l. For example, if a subset of m images needs to be selected from n recorded surround images, the path P is divided into m intervals of equal length and one surround image is selected within each interval. Doing so compensates for non-uniformity introduced during acquisition. For example, when the transporter carrying the omnidirectional camera speeds up, the captured data become sparse, whereas when it slows down, the sampling rate increases.
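Sampling by path length can be sketched as follows. The sketch assumes that the image positions (tx_i, ty_i) are given in acquisition order and, as a simplification, picks for each of the m equal-length intervals the image whose cumulative path length is nearest to the interval midpoint. The function name path_length_samples is hypothetical.

```python
import math

def path_length_samples(positions, m):
    """Return (path length l, indices of m images spread evenly along l)."""
    cum = [0.0]                       # cumulative length at each image
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    picks = []
    for k in range(m):
        target = (k + 0.5) * total / m        # midpoint of interval k
        best = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        picks.append(best)
    return total, picks
```

If the acquisition is very uneven, the same image can be the nearest one for two intervals; a practical implementation would remove such duplicates.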
[0054]
Another way to reduce the number of surround images is to use visibility cells. A geometric analysis of the combinatorial events occurring along P is introduced below.
[0055]
Let F denote the “free” space, i.e., the space that is not occupied by any of the n objects of the scene S = {S_1, ..., S_n}: $F = E^3 \setminus \bigcup_{i=1}^{n} S_i$ (E³ is the Euclidean three-dimensional space). Let ν ⊆ F be the “visible” space, i.e., the part of the free space in which the user can move while interacting. Given a position P ∈ ν, let ε(P) denote the lower envelope surrounding P (for geometric terms, see J.-D. Boissonnat, M. Yvinec, “Algorithmic Geometry”, Cambridge University Press, 1998). That is, ε(P) is defined as the radial function r(θ, φ) which, for given angular coordinates (θ, φ), gives the distance from the position P to the first object of S hit by a ray emitted from P in the direction (θ, φ), if such an object exists. Note that ε(·) is not necessarily continuous.
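For a two-dimensional floor map, the envelope can be sampled by ray casting, as in the following sketch. It assumes that the scene is given as a list of line segments and that the radial function is sampled at n_rays equally spaced angles; the function names and the tolerance values are choices made for this example.

```python
import math

def envelope(p, segments, n_rays=360):
    """Sample r(theta) around p for a 2D scene of segments ((ax, ay), (bx, by)).

    Returns one distance per ray; math.inf where the ray hits nothing.
    """
    px, py = p
    radii = []
    for k in range(n_rays):
        theta = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(theta), math.sin(theta)
        nearest = math.inf
        for (ax, ay), (bx, by) in segments:
            ex, ey = bx - ax, by - ay
            den = dx * ey - dy * ex
            if abs(den) < 1e-12:
                continue                      # ray parallel to the segment
            t = ((ax - px) * ey - (ay - py) * ex) / den   # distance on ray
            u = ((ax - px) * dy - (ay - py) * dx) / den   # position on segment
            if t > 1e-12 and -1e-9 <= u <= 1.0 + 1e-9:
                nearest = min(nearest, t)
        radii.append(nearest)
    return radii

def envelope_area(radii):
    """Approximate the area enclosed by the sampled envelope."""
    dtheta = 2.0 * math.pi / len(radii)
    return sum(0.5 * r * r * dtheta for r in radii if math.isfinite(r))
```

For a viewpoint at the center of a square room of side 2, the sampled envelope has r = 1 toward the walls and an area close to 4.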
[0056]
FIG. 7 shows an example of the envelope ε (O), where O indicates the center. A thick solid line 700 is a polyline describing a scene. The dotted star-shaped polygon is an envelope ε (O) originating from the center O. The envelope ε (·) changes as the position in the plenoptic path moves. For example, FIGS. 9A and 9B show changes in the envelope ε (·). The shaded portion of the figure shows the envelope ε (•) at different positions of the plenoptic path in the building 900.
[0057]
If P is moved slightly, the envelope ε(P) changes smoothly until a critical combinatorial event defined by a visibility event (occlusion/disocclusion) is reached. Let A(S) be the partition of ν into visibility cells. FIG. 8 is a schematic diagram showing the visibility diagram and its cell decomposition. In the figure, numeral 800 indicates one of the visibility cells.
[0058]
When S lies in E³, A(S) has complexity O(n⁹). However, when restricted to the two-dimensional cells cut by a straight line, the complexity is O(n³) and the problem becomes tractable. According to the zone theorem (J.-D. Boissonnat, M. Yvinec, “Algorithmic Geometry”, Cambridge University Press, 1998), the total size of all envelope combinations cut by a straight line is O(n³).
[0059]
It should be noted that when the visibility cells are constrained to have a minimum “width”, the complexity is linear, i.e., proportional to the path length. For indoor photography, a floor map of the building is often available in drawing exchange format (DXF), from which a visibility diagram as shown in FIG. 8 can be calculated. Note also that for small n, the restriction of the visibility diagram to a polyline can be calculated by applying a naive quadratic algorithm. For a visibility cell of volume v that is crossed by a portion of the plenoptic path of length l, the path within this cell may be sampled according to the ratio l³/v.
[0060]
Further, in portions where the visibility information changes greatly (that is, when moving from one large visibility cell to another), the transition may be made smoother by locally increasing the sampling rate. For example, if a wall is tangent to the line of sight from the current viewpoint, more samples are needed, because one side of the wall is revealed as soon as the camera translates even slightly.
In this way, the image database can be reduced by one or two orders of magnitude while maintaining its semantics.
[0061]
Even for a complex CG script with very large n and a non-trivial visibility diagram, the volume v of the current visibility cell can be roughly estimated as follows.
[0062]
Each pixel e_{i,j} of the RGBZ environment image carries color information and depth information d_{i,j}. The depth of each pixel is multiplied by the corresponding solid angle a_{i,j}. Since the solid angles partition the entire unit sphere, these products are simply summed to obtain $v = \frac{1}{4\pi} \sum_{i,j} a_{i,j} d_{i,j}$.
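The estimate can be sketched as below for a depth map stored in equirectangular format, where the solid angle of a pixel depends only on its row. The sketch implements the formula exactly as stated in the text, which amounts to a solid-angle-weighted mean depth used as a rough proxy; for comparison, the exact volume of a star-shaped region would weight the cubed depth, (1/3) Σ a_{i,j} d_{i,j}³. The function names are hypothetical.

```python
import numpy as np

def pixel_solid_angles(h, w):
    """Solid angle of every pixel of an h x w equirectangular image."""
    lat_edges = np.pi / 2 - np.arange(h + 1) * np.pi / h  # row boundaries
    band = np.sin(lat_edges[:-1]) - np.sin(lat_edges[1:])  # per-row band height
    return np.repeat((2.0 * np.pi / w) * band[:, None], w, axis=1)

def cell_volume_estimate(depth):
    """Rough estimate v = (1 / 4 pi) * sum of a_ij * d_ij, as in the text."""
    h, w = depth.shape
    a = pixel_solid_angles(h, w)
    return float((a * depth).sum() / (4.0 * np.pi))
```

Evaluating this estimate at successive viewpoints and watching for large jumps gives a simple indicator that the viewpoint has crossed into another visibility cell.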
[0063]
A large change in this volume means that the viewpoint is moving from one visibility cell to another. The first combinatorial event is therefore searched for along the given plenoptic path, and the path is sampled there (i.e., the image at P is rendered). Sampling for CG scripts may be performed online, because the next best viewpoint is determined incrementally along the plenoptic path P.
[0064]
The above algorithm using visibility cells is realized, for example, by the steps shown in FIG. 10. In this example, omnidirectional images I are first taken along the plenoptic path (step 1001). Next, in step 1002, the positions on the plenoptic path are determined using, for example, the egomotion recovery technique described above. In step 1003, the visibility cells along the plenoptic path are calculated. In step 1004, each image is assigned to the corresponding visibility cell. Finally, images are selected according to v and l, where v is the volume of the visibility cell and l is the length of the plenoptic path that intersects the visibility cell. In this scheme, the importance of an image is set according to the combinatorial changes occurring in the scene.
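The final selection step can be sketched as follows. The text only states that images are selected according to v and l; this sketch makes the additional assumption that a total image budget is shared among the cells in proportion to l³/v, with at least one image per cell, and that the images kept within a cell are evenly spaced. The dictionary keys and the function name are hypothetical.

```python
def select_images_per_cell(cells, budget):
    """Select image indices cell by cell, in proportion to l**3 / v.

    cells: list of dicts with keys "v" (cell volume), "l" (path length
    inside the cell) and "images" (image indices assigned to the cell).
    """
    ratios = [c["l"] ** 3 / c["v"] for c in cells]
    total = sum(ratios)
    selected = []
    for c, ratio in zip(cells, ratios):
        n = len(c["images"])
        # Number of images kept in this cell: at least 1, at most n.
        k = min(n, max(1, round(budget * ratio / total)))
        selected.extend(c["images"][int((i + 0.5) * n / k)] for i in range(k))
    return selected
```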
[0065]
Another possibility is to calculate the volume of the envelope at each acquired image position (tx, ty) and to select or weight the images according to the envelope volume or its derivative. Where the envelope volume is larger, it is desirable to obtain more samples or better-quality images; where it is smaller, it is preferable to use fewer images or a lower quality per image. Furthermore, continuous and combinatorial sampling techniques may be combined.
[0066]
A simple and efficient way to calculate the visibility diagram of a set of objects is to first rasterize the primitives (lines, arcs, etc.) into a color image, giving each primitive a different color number (step 1101 in FIG. 11). Next, a discrete envelope as shown in FIG. 12 is calculated for each pixel position (px, py) of the image at which no object is drawn (detected by the background color) (step 1102). Each envelope is annotated with the sequence of objects it meets. Since this sequence is cyclic, it is sorted in increasing order. The envelopes on either side of a boundary between two visibility cells have different annotations. The annotated image is therefore scanned line by line, and each time two successive annotations differ, a colored pixel of the visibility diagram is written (step 1103). Once the discrete visibility diagram has been calculated, the area or volume of each cell can be calculated (step 1104).
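Steps 1101 to 1104 can be sketched on a small label grid as follows. Several simplifications are assumed: the rasterized image is a list of rows of integers (0 for background, k > 0 for object k), each envelope is annotated with the sorted set of labels first hit by a fixed number of rays, only horizontally adjacent free pixels are compared, and the area of a cell is approximated by counting pixels that share an annotation. All names are hypothetical.

```python
import math
from collections import Counter

def envelope_annotation(grid, px, py, n_rays=64, step=0.5):
    """Sorted tuple of object labels first hit by rays cast from (px, py)."""
    h, w = len(grid), len(grid[0])
    seen = set()
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        t = step
        while True:
            x = int(math.floor(px + t * dx + 0.5))   # nearest pixel column
            y = int(math.floor(py + t * dy + 0.5))   # nearest pixel row
            if not (0 <= x < w and 0 <= y < h):
                break                                # ray left the image
            if grid[y][x] != 0:
                seen.add(grid[y][x])                 # first object hit
                break
            t += step
    return tuple(sorted(seen))

def visibility_diagram(grid, n_rays=64):
    """Return (annotation image, boundary mask) for a label grid."""
    h, w = len(grid), len(grid[0])
    ann = [[envelope_annotation(grid, x, y, n_rays) if grid[y][x] == 0 else None
            for x in range(w)] for y in range(h)]
    boundary = [[False] * w for _ in range(h)]
    for y in range(h):                               # scan line by line
        for x in range(1, w):
            a, b = ann[y][x - 1], ann[y][x]
            if a is not None and b is not None and a != b:
                boundary[y][x] = True
    return ann, boundary

def cell_areas(ann):
    """Pixel count per annotation, used as an approximate cell area."""
    return Counter(a for row in ann for a in row if a is not None)
```

A fuller implementation would also compare vertically adjacent pixels and label connected components, so that distinct cells sharing the same annotation are not merged.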
[0067]
Instead of calculating the volume v of the visibility cell and sampling in proportion to the ratio l³/v, the area a of the visibility cell may be calculated and the plenoptic path sampled in proportion to l²/a.
[0068]
Furthermore, a texture parameter representation of the visibility polyhedron may be used to control the filtering and selection of the surround images. The texture parameter representation is obtained as follows. For a given visibility polyhedron and accuracy δ, all images corresponding to that visibility polyhedron are mapped onto it. Each visibility polyhedron has topological genus 0, so a texture parameter representation with an accuracy of approximately δ exists for it. Each surround image corresponding to the visibility polyhedron is inverse-mapped onto this texture parameter representation. Finally, all the inverse mappings are averaged.
[0069]
A texture parameter representation is useful in regions of the plenoptic path (such as corridors) where a rough three-dimensional reconstruction provides sufficient quality.
[0070]
Alternatively, instead of reducing the number of panoramic images, a weighting factor w_i based on the curve length or on the envelope volume/area may be assigned to each image i, and all images may then be compressed lossily according to w_i.
[0071]
5. Application and results
The experimental results for the captured / synthesized sequence are shown in FIG. FIG. 13 shows a plenoptic path 1300, several main frames 1302, and their corresponding positions 1301.
[0072]
The implementation was written in C++ using OpenGL ™ on a general-purpose Windows ™ / Intel ™ PC. Using a mouse in a viewing/map window as shown in FIG. 14, or using a head-mounted display (HMD) equipped with a gyro, the user can move interactively along the plenoptic path at a refresh rate of 60 fps.
[0073]
For example, as shown in FIG. 14, the viewing / map window 1400 may include a viewing window 1401, a map window 1402, and a navigation window 1403. A viewing window 1401 displays an image for the current viewpoint position and angle. A map window 1402 displays the plenoptic path and the current position superimposed on the floor map. The navigation window 1403 displays a surround image at the current position and a pseudo frame 1405 superimposed on the panorama image to indicate a corresponding area of the viewing window 1401. When using the HMD, the inclination of the HMD may be used to indicate whether to move the viewpoint forward or backward.
[0074]
By using multimedia add-ons such as movie textures (corrected and superimposed using homography) or image-based operations triggered in response to events, the virtual walk-through experience can be enriched. For example, it is possible to select to open and close the elevator door by pressing an elevator button.
[0075]
Examples of such multimedia add-ons are shown in FIGS. 15 (a) -15 (b). In FIG. 15A, a triggered region of a certain surround image is set by a polygon area 1501 (here, a rectangle). A polygon area 1501 indicates a trigger area of a virtual pinhole field of view synthesized from a surround image using homography. When the corresponding trigger areas intersect (that is, when the area 1501 and the area 1511 intersect as shown in FIG. 15B), a predetermined event occurs. These events may be performance of music or drawing of a three-dimensional object.
[0076]
FIG. 16 is a flowchart explaining how the area 1501 is defined for a set of surround images. First, the user sets four points in the first surround image (step 1601). The four points are then tracked along the surround image sequence (step 1602). Next, the tracking is made more robust by registering together the quadrilaterals obtained in the individual images (step 1603). This step avoids jitter effects. Finally, the quadrilateral coordinates and the numbers of the corresponding surround images are stored in an XML file (step 1604).
[0077]
Another example of such a multimedia add-on is shown in FIG. 17. Given a viewpoint p on the plenoptic path and a three-dimensional object 1702 at a position q, the following synthesis steps are performed.
[0078]
・ Calculate the portion of the object 1702 that is visible from the viewpoint p. This step is performed on the basis of rough visibility information obtained from a rough reconstruction, a floor map, or the like. Let I_O be the resulting object image. I_O is provided with an alpha channel (mask image) that identifies where in I_O the object is drawn and separates the object from the background.
[0079]
・ Generate a virtual camera view (e.g., a virtual pinhole camera view) from the surround image for the viewpoint position p. Let I_C be that image.
[0080]
・ First draw I_C, and then draw I_O using the alpha channel. Doing so makes it possible to obtain transparency effects, for example when drawing window glass.
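The last synthesis step corresponds to standard alpha blending of I_O over I_C. A minimal sketch follows, assuming that both images are floating-point NumPy arrays of shape (height, width, 3) and that the alpha channel is an array of shape (height, width) with values in [0, 1]; the function name composite is hypothetical.

```python
import numpy as np

def composite(i_c, i_o, alpha):
    """Draw the object image i_o over the camera view i_c using alpha."""
    a = alpha[..., None].astype(np.float64)   # broadcast over color channels
    return a * i_o + (1.0 - a) * i_c
```

Intermediate alpha values blend the object with the background, which is how a partially transparent window glass would be rendered.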
[0081]
The technique described here can be extended to multiple objects. For example, several robots (rendered as CG images) entering the hallway can be added using this technique.
[0082]
The above technique is different from the function of tracking a surround image and mapping another image on it by homography. Here, the superimposed CG object is three-dimensional.
[0083]
As described above using the embodiments of the present invention, a framework has been presented for capturing or synthesizing plenoptic paths whose sampling is controlled based on spatial coefficients. In the present invention, an image-based rendering browser is introduced so that the user can move interactively along these plenoptic paths. Although the above-described embodiments mostly relate to virtual walk-through systems, the present invention is applicable to any other application, such as games or telepresence.
[0084]
[Effect of the invention]
According to the present invention, there is provided an apparatus and method capable of processing omnidirectional image data and effectively reducing the data size of a surround image obtained by processing the omnidirectional image data.
[0085]
Furthermore, according to the present invention, there is provided an apparatus and method capable of sampling / filtering omnidirectional image data or compressing surround image data used in an application.
[Brief description of the drawings]
FIG. 1 is an overview diagram showing an example of an omnidirectional camera.
FIG. 2 is a block diagram showing an omnidirectional image processing system.
FIG. 3 is a block diagram showing a configuration of an omnidirectional processing apparatus according to the present invention.
FIG. 4 is a schematic functional diagram showing functional blocks according to an embodiment of the present invention in the omnidirectional processing apparatus of FIG. 3.
FIG. 5A is a diagram illustrating an example of a cube environment map image.
FIG. 5B is a diagram showing an example of an equirectangular environment map image.
FIG. 6A is a diagram showing an example of features marked in an image and tracked from a plenoptic path sequence.
FIG. 6B is a diagram showing an example of the feature marked in the image and tracked from the plenoptic path sequence.
FIG. 6C is a diagram showing an example of the feature marked in the image and tracked from the plenoptic path sequence.
FIG. 7 is a schematic diagram illustrating an example of an envelope calculated at a certain position in a path.
FIG. 8 is a schematic diagram showing an example of a visibility diagram calculated along a path.
FIG. 9A is a schematic diagram showing the envelope of FIG. 7 calculated at a certain position in the path.
FIG. 9B is a schematic diagram showing the envelope of FIG. 7 calculated at another position in the same path of FIG. 9A.
FIG. 10 is a flowchart showing an example of image filtering processing for each visibility cell calculated along a path.
FIG. 11 is a flowchart showing an example of steps for calculating a visibility diagram of an object set.
FIG. 12 is a schematic diagram showing an example of how annotation is made within an individual envelope.
FIG. 13 is a schematic diagram showing an example of a plenoptic path and some examples of main frame images at corresponding positions.
FIG. 14 is a schematic diagram illustrating an example of a viewing / mapping window according to an embodiment of the present invention.
FIG. 15A is a schematic diagram showing an example of an event-triggered area according to the embodiment of the present invention.
FIG. 15B is a schematic diagram showing the event-triggered region in FIG. 15A when a certain event is triggered.
FIG. 16 is a flowchart showing an example of an event triggered region setting step of FIG.
FIG. 17 is a schematic diagram illustrating an example of how to synthesize a three-dimensional object image in a surround image according to an embodiment of the present invention.
FIG. 18 is a schematic diagram illustrating an example of a surround image calculated from a CG script.
[Explanation of symbols]
401: Surround image synthesizing device, 402: Position detection unit, 403: Spatial coefficient detection unit, 404: Filter unit.

Claims

An image processing apparatus for processing omnidirectional image data, comprising:
a position detection unit that detects a position at which the omnidirectional image data is captured;
a spatial coefficient detection unit that calculates a spatial coefficient corresponding to the detected position, the spatial coefficient relating to the geometric configuration of the surround environment whose omnidirectional image is captured from the detected position; and
a filter unit that reduces the processed image data to be output from the image processing apparatus based on the calculated spatial coefficient.

The image processing apparatus according to claim 1, wherein the spatial coefficient is a curve length of a path where an omnidirectional image is captured.

The image processing apparatus according to claim 1, wherein the spatial coefficient is determined according to a plurality of visibility envelopes calculated in a path where an omnidirectional image is captured.

The image processing apparatus according to claim 1, wherein the spatial coefficient is determined according to a plurality of visibility cells calculated in a path where an omnidirectional image is captured.

The image processing apparatus according to claim 1, wherein the spatial coefficient is determined according to a texture parameter representation of a surround image corresponding to a set of polyhedrons calculated in a path where an omnidirectional image is captured.

The image processing apparatus according to claim 1, wherein the filter unit determines a sampling rate of the omnidirectional image according to the spatial coefficient.

The image processing apparatus according to claim 1, wherein the filter unit selects the omnidirectional image according to the spatial coefficient.

The image processing apparatus according to claim 1, wherein the filter unit compresses the processed image data according to the spatial coefficient.

The image processing apparatus according to claim 1, wherein the processing of the omnidirectional image data includes the addition of an event trigger region to be superimposed on the processed image,
the event trigger region being arranged at a predetermined position of the surround image obtained by processing the omnidirectional image data, and a predetermined event being triggered when the region is selected by a user operation.

The processing of the omnidirectional image data includes mapping of a moving image texture and / or a three-dimensional object on a surround image obtained by the processing of the omnidirectional image data. Image processing apparatus.

An apparatus comprising an omnidirectional image capturing unit and an image processing unit,
wherein the image processing unit includes the image processing apparatus according to claim 1.

An image processing method for processing omnidirectional image data, comprising the steps of:
detecting a position at which the omnidirectional image data is captured;
calculating a spatial coefficient corresponding to the detected position, the spatial coefficient relating to the geometric configuration of the surround environment whose omnidirectional image is captured from the detected position; and
reducing the processed image data to be output from the image processing apparatus based on the calculated spatial coefficient.

A computer program for causing a computer to execute an image processing method for processing omnidirectional image data, the image processing method comprising: detecting a position at which the omnidirectional image data is captured;
calculating a spatial coefficient corresponding to the detected position, the spatial coefficient relating to the geometric configuration of the surround environment whose omnidirectional image is captured from the detected position; and
reducing the processed image data to be output from the image processing apparatus based on the calculated spatial coefficient.