JP3848092B2

JP3848092B2 - Image processing apparatus and method, and program

Info

Publication number: JP3848092B2
Application number: JP2001071120A
Authority: JP
Inventors: 泰弘奥野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-03-13
Filing date: 2001-03-13
Publication date: 2006-11-22
Anticipated expiration: 2021-03-13
Also published as: JP2002269593A

Description

【０００１】
【産業上の利用分野】
本発明は、画像処理装置及び方法、並びにプログラムに関し、特に、現実空間の映像と、３次元的にモデリングされたコンピュータグラフィックス（以下「ＣＧ」という。）によって生成された映像とを重畳し、且つ両者の映像を位置合わせしつつ表示することができる画像処理装置及び方法、並びにプログラムに関する。
【０００２】
【従来の技術】
従来、現実空間の映像と、三次元モデリングされたＣＧ映像によって生成された映像とを重畳し、且つ両者の映像を位置合わせしつつ表示して、あたかも現実の世界の中にＣＧで描かれた物体（仮想物体）が存在しているかのように見せることができる複合現実感提示装置がある。この装置は、現実の映像を撮影するための現実映像撮影手段（例えば、ビデオカメラ）と、現実の映像を撮影している位置から見たようにＣＧ映像を作り出すＣＧ映像生成手段と、両者を合成して表示することのできる映像表示手段（例えば、ヘッドマウントディスプレイ（ＨＭＤ）又はモニタ）から成る。ここで、映像表示手段は、現実映像撮影手段の視線位置が変わってもＣＧ映像と現実の映像を正しい位置関係で表示するようになっており、現実映像撮影手段は、視線位置や視線方向を検出するための視線位置姿勢検出手段（例えば、位置姿勢センサ）を備えている。
【０００３】
ＣＧ映像生成手段は、三次元モデリングされたＣＧ映像を現実空間と同じスケールの仮想空間に置き、視線位置姿勢検出手段によって検出された視線位置や視線方向から観察されたものとしてＣＧ映像をレンダリングする。このようにして生成されたＣＧ映像と現実の映像とを重畳すると、結果として、現実映像撮影手段がどの視線位置や視線方向から観察した場合でも、現実空間の中に正しくＣＧ映像の対象が置かれているような映像を表示することができる。
【０００４】
また、ＣＧ映像を出現させたい場所にもさらなる位置姿勢検出装置を取り付け、そこから得られた位置にＣＧ映像を出現させることも行われている。例えば、手に位置姿勢検出装置（位置姿勢センサ）を取り付け、そのセンサの位置にＣＧ映像を生成することによって、あたかも手の上に常にＣＧ映像の対象が乗っているように、即ち手をどのように動かしても手の上にＣＧ映像が乗っているような映像を表示することができる。
【０００５】
現実空間の撮影対象を撮影する現実映像撮影手段としてのビデオカメラは、その視線方向にある映像を撮影し、撮影された映像データはメモリ中にキャプチャするようになっている。
【０００６】
現実の映像とＣＧ映像とを合成して表示する映像表示装置としては、例えばＨＭＤが用いられる。通常のモニタでなくＨＭＤを用いて、さらに上記ビデオカメラをＨＭＤの視線方向に装着することで、観察者が向いている方向の映像をＨＭＤに映し出すことができ、かつ、観察者がその方向を向いたときのＣＧ映像の表示も行えるため、観察者の没入感を高めることができる。
【０００７】
位置姿勢検出手段としては、磁気方式による位置姿勢センサなどが用いられ、これを上記ビデオカメラ、又はビデオカメラが取り付けられているＨＭＤに取り付ることによって、ビデオカメラの視線の位置姿勢の値を検出する。磁気方式の位置姿勢センサとは、磁気発生装置（発信機）と磁気センサ（受信機）との間の相対位置及び姿勢を検出するものであり、米国ポヒマス（PolhemuＳ）社の製品FAＳTRAKなどがあげられる。これは特定の領域内で、センサの３次元位置（Ｘ，Ｙ，Ｚ）と姿勢（ローリング、ピッチング、ヨーイング）をリアルタイムに検出する装置である。
【０００８】
上記の構成により、観察者は、ＨＭＤを通じて現実の映像とＣＧ映像が重畳された世界を観察することができるようになる。観察者が周囲を見回すと、ＨＭＤに備え付けられた現実映像撮影装置（ビデオカメラ）が現実の映像を撮影し、ＨＭＤに備え付けられた視線位置姿勢検出手段（位置姿勢センサ）がビデオカメラの位置視線方向を検出し、これに応じてＣＧ映像生成手段がその視線位置・姿勢から見たＣＧ映像を生成し、これを現実の映像に重畳して表示する。
【０００９】
また、観察者を複数設けることも可能である。現実空間を撮影するビデオカメラと表示装置（ＨＭＤ等）、位置姿勢センサを観察者数だけ用意し、夫々の観察者の視点から現実空間の撮影して現実の映像とＣＧ映像の生成を行い、これらを合成して、夫々の観察者に表示すればよい。
【００１０】
【発明が解決しようとしている課題】
しかしながら、上記従来の複合現実感提示装置では、視線位置や視線方向（姿勢）を検出するための視線位置姿勢センサから得られる値に基づいてＣＧ映像をレンダリングしており、これらの値に誤差があると、生成されるすべてのＣＧ映像の位置がずれて、誤差の分だけずれた位置や方向から見たＣＧ映像が生成される。
【００１１】
また、視線位置姿勢センサに誤差が少ない状態であっても、ＣＧ映像を出現させたい場所を検出するために設置したＣＧ映像位置姿勢センサに誤差があると、その位置に出現させようとしたＣＧ映像の位置や方向がずれる。
【００１２】
このような位置姿勢センサから得られる値には、一般的に多かれ少なかれ誤差が含まれ、例えば、磁気方式による位置姿勢センサは、位置姿勢が検出できる領域が磁気発信機を中心とした空間領域に限られており、さらにその領域の中であっても、磁気発信機から受信機（センサ）が離れるほど検出誤差が大きい。そのため、観察位置、即ち現実映像を撮影するためのカメラの位置が発信機の位置から遠くなればなるほど、その位置姿勢センサは発信機から遠くなり、結果として視線位置姿勢の値に誤差が多くなる。ＣＧ映像を出現させる位置に位置姿勢センサを置いている場合は、観察者位置が発信機の位置に近くて誤差が少ない範囲から観察しているとしても、ＣＧ映像を出現させる位置が発信機位置から遠くなると位置姿勢センサの誤差が大きくなり、結果としてＣＧ映像が現れる位置にずれが生じる。
【００１３】
従来の複合現実感提示装置では、観察者の移動やＣＧ映像位置の移動によって位置姿勢センサの誤差が大きくなったときでもＣＧ映像描画を行ってしまうため、センサの誤差によってＣＧ映像と現実空間の位置あわせがずれてしまったような、例えば、手の上にあるように計画されたＣＧ映像表示が、手の上でないところに表示されるような不自然な映像であってもそのまま表示してしまうという問題がある。現実空間との位置あわせが不要なバーチャルリアリティ（Virtual Riality）システム（現実の映像がなく、ＣＧ映像のみを表示するシステム）の場合は、少々の誤差があってもさほど不自然でないため大きな問題とならないが、複合現実感システムの場合は現実物体との高精度の位置あわせを必要とするため、位置姿勢センサの有効測定領域内で生じるような少々の誤差であっても映像が不自然になってしまい、実用に耐えなくなる場合がある。
【００１４】
また、例えば、視線位置に誤差がある場合はＣＧ映像すべての位置あわせに誤差が出るが、ＣＧ映像位置姿勢に誤差がある場合はそのＣＧ映像の描画に誤差が出るだけであるように、ＣＧ映像の位置姿勢センサに誤差が大きくなった場合と、視線位置姿勢センサの誤差が大きくなった場合とでは、発生する現象が異なり、夫々の場合に好適な対処をするシステムは存在しない。
【００１５】
本発明の目的は、不自然な映像の表示を防止することができる画像処理装置及び方法、並びにプログラムを提供することにある。
【００１６】
【課題を解決するための手段】
上記目的を達成するために、請求項１記載の画像処理装置は、所定の視線位置で現実の映像を取得する現実映像撮影手段と、前記所定の視線位置を検出する視線位置検出手段と、前記検出された視線位置に応じたＣＧ映像を生成するＣＧ映像生成手段と、前記生成されたＣＧ映像を前記取得された現実映像に重畳して表示装置に表示する映像表示手段とを備える画像処理装置において、複数の位置で囲まれる前記視線位置の有効領域を設定する有効視線位置領域設定手段と、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にあるか否かを判別する判別手段とを備え、前記映像表示手段は、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にないときは、前記ＣＧ映像を前記現実映像に重畳せず、前記現実映像のみを表示することを特徴とする。
【００１７】
請求項２記載の画像処理装置は、請求項１記載の画像処理装置において、前記ＣＧ映像位置の有効領域を設定する有効ＣＧ映像位置領域設定手段と、前記重畳すべきＣＧ映像の位置を検出するＣＧ映像位置検出手段と、前記検出されたＣＧ映像の位置が前記設定されたＣＧ映像位置の有効領域内にあるか否かを判別する他の判別手段とを備え、前記映像表示手段は、前記検出されたＣＧ映像の位置が前記設定されたＣＧ映像位置の有効領域内にないときは、当該検出されたＣＧ映像を前記現実映像に重畳せず、前記現実映像のみを表示することを特徴とする。
【００１８】
上記目的を達成するために、請求項３記載の画像処理方法は、所定の視線位置で現実の映像を取得する現実映像撮影工程と、前記所定の視線位置を検出する視線位置検出工程と、前記検出された視線位置に応じたＣＧ映像を生成するＣＧ映像生成工程と、前記生成されたＣＧ映像を前記取得された現実映像に重畳して表示装置に表示する映像表示工程とを備える画像処理方法において、複数の位置で囲まれる前記視線位置の有効領域を設定する有効視線位置領域設定工程と、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にあるか否かを判別する判別工程とを備え、前記映像表示工程は、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にないときは、前記ＣＧ映像を前記現実映像に重畳せず、前記現実映像のみを表示することを特徴とする。
【００１９】
請求項４記載の画像処理方法は、請求項３記載の画像処理方法において、前記ＣＧ映像位置の有効領域を設定する有効ＣＧ映像位置領域設定工程と、前記重畳すべきＣＧ映像の位置を検出するＣＧ映像位置検出工程と、前記検出されたＣＧ映像の位置が前記設定されたＣＧ映像位置の有効領域内にあるか否かを判別する他の判別工程とを備え、前記映像表示工程は、前記検出されたＣＧ映像の位置が前記設定されたＣＧ映像位置の有効領域内にないときは、当該検出されたＣＧ映像を前記現実映像に重畳せず、前記現実映像のみを表示することを特徴とする。
【００２０】
上記目的を達成するために、請求項５記載のプログラムは、コンピュータに画像処理方法を実行させるためのプログラムにおいて、所定の視線位置で現実の映像を取得する現実映像撮影モジュールと、前記所定の視線位置を検出する視線位置検出モジュールと、前記検出された視線位置に応じたＣＧ映像を生成するＣＧ映像生成モジュールと、前記生成されたＣＧ映像を前記取得された現実映像に重畳して表示装置に表示する映像表示モジュールと、複数の位置で囲まれる前記視線位置の有効領域を設定する有効視線位置領域設定モジュールと、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にあるか否かを判別する判別モジュールとを備え、前記映像表示モジュールは、前記検出された視線位置が前記複数の位置で囲まれる前記視線位置の有効領域内にないときは、前記ＣＧ映像を前記現実映像に重畳せず、前記現実映像のみを表示することを特徴とする。
【００２１】
【発明の実施の形態】
以下、本発明の実施の形態に係る画像処理（複合現実感提示）装置を図を用いて詳述する。
【００２２】
図１は、本発明の実施の形態に係る画像処理（複合現実感提示）装置の概略構成を示すブロック図である。
【００２３】
図１において、本発明の実施の形態に係る複合現実感提示装置は、ＣＰＵ１０１、メモリ１０３、メモリ１０４、位置姿勢センサ本体１０５ａ、ヘッドマウントディスプレイ（ＨＭＤ）１０６、及びビデオカメラ１０７を有し、これらは計算機バス１０２を介して互いに接続されている。また、位置姿勢センサ本体１０５ａには、位置姿勢センサ１０５ｂ，１０５ｃ，１０５ｄが接続され、これらのうち、位置姿勢センサ１０５bはビデオカメラ１０７の視線位置姿勢を検出するためにＨＭＤ１０６に取付けられている。位置姿勢センサ１０５c，１０５dはどちらもＣＧ映像位置姿勢検出用のものである。
【００２４】
メモリ１０３とメモリ１０４とはハード構成を同じくするが、メモリ１０３は、後述する図２の処理を実行するプログラムとして、視線位置姿勢検出モジュール１１０、現実映像モジュール１１１、視線位置有効判定モジュール１１２、ＣＧ映像位置姿勢検出モジュール１１３、ＣＧ映像位置有効判定モジュール１１４、ＣＧ映像生成モジュール１１５、映像表示モジュール１１６、有効視線位置領域設定モジュール１１７、及び有効ＣＧ映像位置領域設定モジュール１１８を格納しており、メモリ１０４は、上記プログラム中で使用されるデータ領域として、センサ有効領域中心データ領域１２０、映像メモリ領域１２１、視線位置姿勢データ領域１２２、ＣＧ映像位置姿勢データ領域１２３、有効視線位置データ領域１２４、有効ＣＧ映像位置データ領域１２５、視線位置距離データ領域１２６、ＣＧ映像位置距離データ領域１２７を有する。
【００２５】
図２は、図１の複合現実感提示装置によって実行される映像表示処理のフローチャートである。
【００２６】
図２において、まず、視線位置姿勢検出モジュール１１０によって、従来から用いられている技術、例えば磁気センサ等を用いてビデオカメラ１０７の視線の３次元位置（Ｘ，Ｙ，Ｚ）及び姿勢（ローリング、ピッチング、ヨーイング）を検出して、ビデオカメラ１０７の位置及び姿勢データをメモリ１０４の視線位置姿勢データ領域１２２に書き込み（ステップＳ２０１）、次いで、現実映像撮影モジュール１１１によって、ビデオカメラ１０７で撮影された映像をキャプチャし、現実空間の映像（現実の映像）をメモリ１０４の映像メモリ領域１２１に書き込む（ステップＳ２０２）。
【００２７】
続くステップＳ２０３では、後述する図３の視線位置有効判定モジュール１１２によって、検出されたビデオカメラ１０７の視線位置が有効領域内にあるか否かを判別し、視線位置が有効領域内にあるときは、ステップＳ２０４に進み、ＣＧ用の位置姿勢センサの数を数えるための正の整数からなる変数Ｎを１に初期化する（ステップＳ２０４）。本実施の形態では、位置姿勢センサの数は、位置姿勢センサ１０５ｃ，１０５ｄの２つである。
【００２８】
次いで、ＣＧ映像位置姿勢検出モジュール１１３によって、位置姿勢センサ１０５ｃ，１０５ｄを用いて第Ｎ番目のＣＧ映像を表示すべき位置姿勢を検出し、メモリ１０４中のＣＧ映像位置姿勢データ領域１２３に格納する（ステップＳ２０５）。ＣＧ映像位置姿勢検出モジュール１１３は前述した視線位置姿勢検出モジュール１１０と構成が同じである。
【００２９】
ステップＳ２０６では、後述する図４のＣＧ映像位置有効判定モジュール１１４によって、検出されたＣＧ映像位置が有効領域内にあるか否かを判別し、ＣＧ映像位置が有効領域内にあるときは、ＣＧ映像生成モジュール１１５によって、視線位置姿勢データ領域１２２のデータとＣＧ映像位置姿勢データ領域１２３のデータに基づいて、３次元モデリングされた第Ｎ番目のＣＧ映像を映像メモリ領域１２１に重ね描きして（ステップＳ２０７）、ステップＳ２０８に進む。一方、ステップＳ２０６の判別の結果、ＣＧ映像位置が有効領域内になければ、ステップＳ２０７をスキップして、ステップＳ２０８に進む。
【００３０】
ＣＧ映像生成モジュール１１５は、ＣＧ映像を、ＣＧ映像位置姿勢データ領域１２３のデータが示す位置姿勢に置かれたものを視線位置姿勢データ領域１２２のデータが示す視線から観察したようにレンダリングする。映像メモリ領域１２１には、すでにステップＳ２０３において現実映像撮影モジュール１１１によって現実空間の映像が書き込まれており、ＣＧ映像はこの上に重ねて描画されることになる。
【００３１】
続くステップＳ２０８では、変数Ｎが１だけインクリメントされ、次のステップＳ２０９で、変数Ｎが２（ＣＧ映像の数）以下であるか否かを判別し、２以下であるときは、ステップＳ２０５以降の処理を繰り返す一方、２を超えるときは、ステップＳ２１０に進む。
【００３２】
ステップＳ２０３の判別の結果、視線位置が有効領域内にないときは、ステップＳ２０４以降の処理（第Ｎ番目のＣＧ映像の映像メモリ領域への格納）を実行することなく、直ちにステップＳ２１０に進む。
【００３３】
続くステップＳ２１０では、映像表示モジュール１１６によって、映像メモリ領域１２１に書き込まれた映像をＨＭＤ１０６のフレームメモリに書き込み、実際の表示を行う。
【００３４】
次いで、ステップＳ２１１では全体の処理を終了するか否かを判別し、終了しない場合は再びステップＳ２０１以降の処理を繰り返す。処理を繰り返すことによってビデオカメラ１０７から次々と現実映像をキャプチャし、これにＣＧ映像を重ね書きし、表示するというサイクルを連続して行うことになる。
【００３５】
図２の処理によれば、位置姿勢センサ１０５ｃ，１０５ｄの視線位置が有効領域内にないときは（ステップＳ２０３でＮＯ）、全てのＣＧ映像を描画することなく現実映像のみをＨＭＤに表示し（ステップＳ２１０）、ＣＧ映像位置が有効領域内にないときは（ステップＳ２０６でＮＯ）、そのＣＧ映像のみの描画を停止し且つ他のＣＧ映像を描画する（ステップＳ２０６〜Ｓ２０７）ので、不自然な映像の表示を防止することができる。
【００３６】
図３は、図２のステップＳ２０３における視線位置有効判定モジュール１１２のフローチャートである。
【００３７】
メモリ１０４の有効視線位置データ領域１２４には、事前に、有効視線位置領域設定モジュール１１７によってビデオカメラ１０７の視線位置の有効領域の値が記録されているものとする。また、メモリ１０４のセンサ有効領域中心データ領域１２０には、位置姿勢センサの有効領域の中心となる座標（Ｘ，Ｙ，Ｚ）が格納されているものとする。
【００３８】
有効視線位置データ領域１２４のデータは、例えば、センサ有効領域中心からのビデオカメラ１０７の視線位置の有効距離の値であり、この場合１つの数値となる。
【００３９】
図３において、まず、メモリ１０４の視線位置姿勢データ領域１２２のデータとセンサ有効領域中心データ領域１２０のデータから、ビデオカメラ１０７の視線位置とセンサ有効領域中心の間の距離を計算し、メモリ１０４の視線位置距離データ領域１２６に格納する（ステップＳ３０１）。
【００４０】
次いで、視線位置距離データ領域１２６のデータが有効視線位置データ領域１２４のデータ未満であるか否かを判別し（ステップＳ３０２）、ビデオカメラ１０７の視線位置距離が有効視線位置距離未満であるときは、視線位置が有効領域内にあると判定し（ステップＳ３０３）、視線位置距離が有効視線位置距離以上であるときは、有効領域内にないと判定して（ステップＳ３０４）、本処理を終了する。
【００４１】
有効視線位置データ領域１２４が保持するデータは、本実施の形態に挙げたようなセンサ有効領域中心からの距離に限らず、特定の領域を示す複数の位置の組などであってもよい。その場合は、視線位置有効判定モジュール１１２は、視線位置姿勢がその複数の位置で囲まれる領域内にあるか否かを判定することとなる。
【００４２】
図４は、図２のステップＳ２０６におけるＣＧ映像位置有効判定モジュール１１４のフローチャートである。
【００４３】
メモリ１０４の有効ＣＧ映像位置データ領域１２５には、事前に、有効ＣＧ映像位置領域設定モジュール１１８によって、ＣＧ映像位置の有効領域の値が格納されているものとする。また、メモリ１０４のセンサ有効領域中心１２０には、位置姿勢センサの有効領域の中心となる座標（Ｘ，Ｙ，Ｚ）が格納されているものとする。
【００４４】
有効ＣＧ映像位置データ領域１２５は、例えば、センサ有効領域中心からのＣＧ映像位置の有効距離の値であり、この場合は１つの数値となる。
【００４５】
図４において、ステップＳ４０１では、メモリ１０４のＣＧ映像位置姿勢データ領域１２３のデータとセンサ有効領域中心データ領域１２０のデータから、ＣＧ映像位置とセンサ有効領域中心の間の距離を計算し、メモリ１０４のＣＧ映像位置距離データ領域１２７に格納する（ステップＳ４０１）。
【００４６】
次いで、ＣＧ映像位置姿勢データ領域１２３のデータが有効ＣＧ映像位置データ領域１２５のデータ未満であるか否かを判別し（ステップＳ４０２）、ＣＧ映像位置距離が有効ＣＧ映像位置距離未満であるときは、ＣＧ映像位置が有効領域内にあると判定し（ステップＳ４０３）、ＣＧ映像位置距離が有効ＣＧ映像位置距離以上であるときは、有効領域内にないと判定して（ステップＳ４０４）、本処理を終了する。
【００４７】
上記実施の形態において、視線位置姿勢やＣＧ映像位置姿勢を検出するための位置姿勢センサ１０５b，１０５c，１０５dがセンサの有効領域外に出てしまうと、センサが正しい値を検出できなくなって検出値が不定になるため、有効視線位置データ領域１２４と有効ＣＧ映像位置データ領域１２５はセンサの有効領域内にするのが普通である。また、位置姿勢センサ１０５ｂ〜１０５ｃは有効領域の境界近傍で検出値に誤差が多く含まれる場合あるため、有効視線位置領域と有効ＣＧ映像位置領域は、位置姿勢センサ１０５ｂ〜１０５ｃの検出値に基づいて描画するＣＧ映像が不自然にならないように位置姿勢センサの設置状況等に応じて領域を設定するのがよい。
【００４８】
また、位置姿勢センサ１０５ｂ〜１０５ｃの１つ１つに個別に有効領域を設定してもよく、この場合は、メモリ１０４の有効ＣＧ映像位置データ領域１２５を位置姿勢センサ毎に複数用意すればよい。
【００４９】
また、本発明は、前述した実施の形態を実現するソフトウェアのプログラムモジュールを記憶した記憶媒体を、システム又は装置にプログラムを供給することによって達成される場合にも適用できることはいうまでもない。この場合、記憶媒体から読み出されたプログラムモジュール自体が本発明の新規な機能を実現することになり、そのプログラムを記憶した記憶媒体は本発明を構成することになる。
【００５０】
上記実施の形態では、プログラムモジュールはメモリ１０３に格納されるが、プログラムモジュールを供給する記憶媒体としては、フロッピーディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＭＯ、ＣＤ−Ｒ、ＤＶＤ、磁気テープ、不揮発性のメモリカード等の様々なものが考えられるが、特定のものに限定する必要はなく、上記プログラムを記憶できるものであればよい。
【００５１】
【発明の効果】
本発明によれば、検出した現実の映像の視線位置が、設定された現実の映像の視線位置の有効領域内にないときは、視線位置に応じて生成されたＣＧ映像を現実映像に重畳せず、現実映像のみを表示するので、不自然な映像が表示されることを防止することができる。
さらに、本発明によれば、複数の位置で囲まれる視線位置の有効領域を設定し、検出された視線位置が複数の位置で囲まれる有効領域内にあるか否かを判別するので、有効領域を空間的に任意に設定することができる。
【００５２】
請求項２記載の画像処理装置及び請求項４記載の画像処理方法によれば、検出されたＣＧ映像の位置が、設定されたＣＧ映像位置の有効領域内にないときは、当該検出されたＣＧ映像を現実映像に重畳せず、現実映像のみを表示するので、そのＣＧ映像の描画を停止し且つ他のＣＧ映像を描画することにより不自然な映像の表示を防止することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る画像処理（複合現実感提示）装置の概略構成を示すブロック図である。
【図２】図１の複合現実感提示装置によって実行される映像表示処理のフローチャートである。
【図３】図２のステップＳ２０３における視線位置有効判定モジュール１１２のフローチャートである。
【図４】図２のステップＳ２０６におけるＣＧ映像位置有効判定モジュール１１４のフローチャートである。
【符号の説明】
１０１ＣＰＵ
１０２計算機バス
１０３，１０４メモリ
１０５ａ位置姿勢センサ本体
１０５ｂ、１０５ｃ、１０５ｄ位置姿勢センサ
１０６ＨＭＤ
１０７ビデオカメラ

[0001]
[Industrial application fields]
The present invention relates to an image processing apparatus and method, and a program, and in particular to an image processing apparatus and method, and a program, capable of superimposing video of real space on video generated by three-dimensionally modeled computer graphics (hereinafter referred to as “CG”) and displaying the two while keeping them aligned.
[0002]
[Prior art]
Conventionally, there are mixed reality presentation apparatuses that superimpose video of real space on video generated from three-dimensionally modeled CG and display the two in alignment, so that an object drawn in CG (a virtual object) appears to exist in the real world. Such an apparatus comprises real video capturing means (for example, a video camera) for capturing real video, CG video generation means for creating CG video as seen from the position where the real video is captured, and video display means (for example, a head mounted display (HMD) or a monitor) that can composite and display both. The video display means displays the CG video and the real video in the correct positional relationship even when the line-of-sight position of the real video capturing means changes, and the real video capturing means is provided with line-of-sight position/orientation detection means (for example, a position/orientation sensor) for detecting the line-of-sight position and line-of-sight direction.
[0003]
The CG video generation means places the three-dimensionally modeled CG in a virtual space having the same scale as the real space, and renders the CG video as observed from the line-of-sight position and line-of-sight direction detected by the line-of-sight position/orientation detection means. When the CG video generated in this way is superimposed on the real video, the result is an image in which the CG object appears to be correctly placed in the real space, whatever line-of-sight position or direction the real video capturing means observes from.
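As a rough illustration of rendering "as observed from the detected line-of-sight position and direction", the following minimal sketch transforms a world-space point into the camera frame and projects it with a pinhole model. The function names, the roll/pitch/yaw convention (R = Rz(yaw)·Ry(pitch)·Rx(roll)), and the pinhole parameters are assumptions made for this example; the source does not specify a rendering method.

```python
import math

def rotation_from_rpy(roll, pitch, yaw):
    """Camera-to-world rotation matrix from angles in radians.

    Assumed convention (not from the source): R = Rz(yaw) * Ry(pitch) * Rx(roll).
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def world_to_camera(point, cam_pos, cam_rot):
    """Express a world-space point in the camera frame: R^T (p - c)."""
    d = [point[i] - cam_pos[i] for i in range(3)]
    return [sum(cam_rot[r][c] * d[r] for r in range(3)) for c in range(3)]

def project(point_cam, focal_px, cx, cy):
    """Pinhole projection with the camera looking along +Z.

    Returns pixel coordinates, or None if the point is not in front of the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (cx + focal_px * x / z, cy + focal_px * y / z)

# A CG point 2 m in front of the camera and 1 m to the side.
rot = rotation_from_rpy(0.0, 0.0, 0.0)
pixel = project(world_to_camera((1.0, 0.0, 2.0), (0.0, 0.0, 0.0), rot), 100.0, 320.0, 240.0)
```

Because the projected pixel depends directly on the detected camera position, any error in that position moves every CG object on screen.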
[0004]
In addition, a further position/orientation detection device is sometimes attached at the place where CG video is to appear, and the CG video is made to appear at the position obtained from it. For example, by attaching a position/orientation detection device (position/orientation sensor) to the hand and generating CG video at the position of that sensor, it is possible to display an image in which the CG object always appears to rest on the hand, however the hand is moved.
[0005]
The video camera serving as the real video capturing means shoots video in its line-of-sight direction, and the captured video data is stored in memory.
[0006]
An HMD, for example, is used as the video display device that composites and displays the real video and the CG video. By using an HMD instead of an ordinary monitor and mounting the video camera along the line-of-sight direction of the HMD, the video in the direction the observer is facing can be shown on the HMD, and the CG video for that facing direction can be displayed as well, which heightens the observer's sense of immersion.
[0007]
As the position/orientation detection means, a magnetic position/orientation sensor or the like is used; attaching it to the video camera, or to the HMD on which the video camera is mounted, allows the position and orientation of the video camera's line of sight to be detected. A magnetic position/orientation sensor detects the relative position and orientation between a magnetic field generator (transmitter) and a magnetic sensor (receiver); one example is the FASTRAK product from Polhemus of the United States. Such a device detects the three-dimensional position (X, Y, Z) and orientation (roll, pitch, yaw) of the sensor in real time within a specific region.
[0008]
With the above configuration, the observer can observe, through the HMD, a world in which the real video and the CG video are superimposed. When the observer looks around, the real video capturing device (video camera) mounted on the HMD captures the real video, the line-of-sight position/orientation detection means (position/orientation sensor) mounted on the HMD detects the position and line-of-sight direction of the video camera, and the CG video generation means accordingly generates CG video as seen from that line-of-sight position and orientation and displays it superimposed on the real video.
[0009]
It is also possible to have a plurality of observers. A video camera for capturing the real space, a display device (such as an HMD), and a position/orientation sensor are prepared for each observer; the real space is captured and CG video is generated from each observer's viewpoint, and the two are composited and displayed to each observer.
[0010]
[Problems to be solved by the invention]
However, the above-described conventional mixed reality presentation apparatus renders CG video based on values obtained from the line-of-sight position/orientation sensor, which detects the line-of-sight position and line-of-sight direction (orientation). If these values contain an error, the positions of all the generated CG video are shifted: the CG video is generated as seen from a position and direction displaced by the amount of the error.
[0011]
Further, even when the line-of-sight position/orientation sensor has little error, if the CG video position/orientation sensor installed to detect where CG video should appear has an error, the position and direction of the CG video intended to appear at that location are shifted.
[0012]
The values obtained from such position/orientation sensors generally contain some degree of error. With a magnetic position/orientation sensor, for example, the region in which position and orientation can be detected is limited to a spatial region centered on the magnetic transmitter, and even within that region the detection error grows as the receiver (sensor) moves away from the transmitter. Consequently, the farther the observation position (that is, the position of the camera capturing the real video) is from the transmitter, the farther its position/orientation sensor is from the transmitter, and the larger the error in the line-of-sight position and orientation values. When a position/orientation sensor is placed where CG video is to appear, even if the observer is close to the transmitter and observing from a range with little error, the sensor error grows as the CG video position moves away from the transmitter, and the position where the CG video appears is shifted as a result.
[0013]
In the conventional mixed reality presentation apparatus, CG video is drawn even when the error of a position/orientation sensor has grown because the observer or the CG video position has moved. As a result, there is a problem that an unnatural image in which the CG video and the real space are misaligned by sensor error is displayed as is; for example, CG video intended to sit on the hand is displayed somewhere other than on the hand. In a virtual reality system that requires no alignment with real space (a system that displays only CG video, with no real video), a small error is not particularly unnatural and is not a serious problem. A mixed reality system, however, requires high-precision alignment with real objects, so even the small errors that arise inside the effective measurement region of the position/orientation sensor can make the image unnatural and unfit for practical use.
[0014]
Moreover, the resulting phenomenon differs depending on which sensor's error has grown: an error in the line-of-sight position causes misalignment of all CG video, whereas an error in a CG video position/orientation causes an error only in the drawing of that CG video. No existing system takes the measure appropriate to each of these cases.
[0015]
An object of the present invention is to provide an image processing apparatus and method, and a program capable of preventing unnatural video display.
[0016]
[Means for Solving the Problems]
In order to achieve the above object, the image processing apparatus according to claim 1 comprises real video capturing means for acquiring real video at a predetermined line-of-sight position, line-of-sight position detection means for detecting the predetermined line-of-sight position, CG video generation means for generating CG video corresponding to the detected line-of-sight position, and video display means for superimposing the generated CG video on the acquired real video and displaying it on a display device, and is characterized by further comprising effective line-of-sight position area setting means for setting an effective area of the line-of-sight position enclosed by a plurality of positions, and determination means for determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position enclosed by the plurality of positions, wherein, when the detected line-of-sight position is not within that effective area, the video display means does not superimpose the CG video on the real video and displays only the real video.
[0017]
The image processing apparatus according to claim 2 is the image processing apparatus according to claim 1, further comprising effective CG video position area setting means for setting an effective area of the CG video position, CG video position detection means for detecting the position of the CG video to be superimposed, and further determination means for determining whether or not the detected position of the CG video is within the set effective area of the CG video position, wherein, when the detected position of the CG video is not within the set effective area of the CG video position, the video display means does not superimpose that CG video on the real video and displays only the real video.
[0018]
In order to achieve the above object, the image processing method according to claim 3 comprises a real video capturing step of acquiring real video at a predetermined line-of-sight position, a line-of-sight position detection step of detecting the predetermined line-of-sight position, a CG video generation step of generating CG video corresponding to the detected line-of-sight position, and a video display step of superimposing the generated CG video on the acquired real video and displaying it on a display device, and is characterized by further comprising an effective line-of-sight position area setting step of setting an effective area of the line-of-sight position enclosed by a plurality of positions, and a determination step of determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position enclosed by the plurality of positions, wherein, when the detected line-of-sight position is not within that effective area, the video display step does not superimpose the CG video on the real video and displays only the real video.
[0019]
The image processing method according to claim 4 is the image processing method according to claim 3, further comprising an effective CG video position area setting step of setting an effective area of the CG video position, a CG video position detection step of detecting the position of the CG video to be superimposed, and a further determination step of determining whether or not the detected position of the CG video is within the set effective area of the CG video position, wherein, when the detected position of the CG video is not within the set effective area of the CG video position, the video display step does not superimpose that CG video on the real video and displays only the real video.
[0020]
In order to achieve the above object, the program according to claim 5 is a program for causing a computer to execute an image processing method, and comprises a real video capturing module for acquiring real video at a predetermined line-of-sight position, a line-of-sight position detection module for detecting the predetermined line-of-sight position, a CG video generation module for generating CG video corresponding to the detected line-of-sight position, a video display module for superimposing the generated CG video on the acquired real video and displaying it on a display device, an effective line-of-sight position area setting module for setting an effective area of the line-of-sight position enclosed by a plurality of positions, and a determination module for determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position enclosed by the plurality of positions, wherein, when the detected line-of-sight position is not within that effective area, the video display module does not superimpose the CG video on the real video and displays only the real video.
[0021]
[Embodiments of the Invention]
Hereinafter, an image processing (mixed reality presentation) apparatus according to an embodiment of the present invention will be described in detail with reference to the drawings.
[0022]
FIG. 1 is a block diagram showing a schematic configuration of an image processing (mixed reality presentation) apparatus according to an embodiment of the present invention.
[0023]
In FIG. 1, the mixed reality presentation apparatus according to the embodiment of the present invention includes a CPU 101, a memory 103, a memory 104, a position/orientation sensor main body 105a, a head mounted display (HMD) 106, and a video camera 107, which are connected to one another via a computer bus 102. Position/orientation sensors 105b, 105c, and 105d are connected to the position/orientation sensor main body 105a. Of these, the position/orientation sensor 105b is attached to the HMD 106 in order to detect the line-of-sight position and orientation of the video camera 107, and the position/orientation sensors 105c and 105d are both used for detecting CG video position and orientation.
[0024]
The memory 103 and the memory 104 have the same hardware configuration. The memory 103 stores, as the programs that execute the processing of FIG. 2 described later, a line-of-sight position/orientation detection module 110, a real video capturing module 111, a line-of-sight position validity determination module 112, a CG video position/orientation detection module 113, a CG video position validity determination module 114, a CG video generation module 115, a video display module 116, an effective line-of-sight position area setting module 117, and an effective CG video position area setting module 118. The memory 104 provides, as data areas used by these programs, a sensor effective area center data area 120, a video memory area 121, a line-of-sight position/orientation data area 122, a CG video position/orientation data area 123, an effective line-of-sight position data area 124, an effective CG video position data area 125, a line-of-sight position distance data area 126, and a CG video position distance data area 127.
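The data areas 120 to 127 held in the memory 104 can be pictured as fields of a single record. The sketch below is only one possible layout: the class and field names, the use of a plain list for the video memory, and the representation of each effective area as a single distance are assumptions made for illustration, with the reference numerals given in comments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Pose:
    """Position (X, Y, Z) and orientation (roll, pitch, yaw) of one sensor."""
    position: Vec3 = (0.0, 0.0, 0.0)
    orientation: Vec3 = (0.0, 0.0, 0.0)

@dataclass
class WorkingData:
    """Hypothetical layout of the data areas in memory 104."""
    sensor_area_center: Vec3 = (0.0, 0.0, 0.0)         # area 120
    video_memory: List = field(default_factory=list)   # area 121
    gaze_pose: Pose = field(default_factory=Pose)      # area 122
    cg_pose: Pose = field(default_factory=Pose)        # area 123
    valid_gaze_distance: float = 0.0                   # area 124
    valid_cg_distance: float = 0.0                     # area 125
    gaze_distance: float = 0.0                         # area 126
    cg_distance: float = 0.0                           # area 127

data = WorkingData(valid_gaze_distance=1.5, valid_cg_distance=1.0)
```

Keeping the thresholds (124, 125) separate from the per-frame measurements (126, 127) mirrors the split between values set in advance and values recomputed on every cycle.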
[0025]
FIG. 2 is a flowchart of the video display process executed by the mixed reality presentation apparatus of FIG. 1.
[0026]
In FIG. 2, first, the line-of-sight position/orientation detection module 110 detects the three-dimensional position (X, Y, Z) and orientation (roll, pitch, yaw) of the line of sight of the video camera 107 using a conventional technique, for example a magnetic sensor, and writes the position and orientation data of the video camera 107 into the line-of-sight position/orientation data area 122 of the memory 104 (step S201). Next, the real video capturing module 111 captures the video shot by the video camera 107 and writes the video of the real space (the real video) into the video memory area 121 of the memory 104 (step S202).
[0027]
In the subsequent step S203, the line-of-sight position validity determination module 112 of FIG. 3, described later, determines whether or not the detected line-of-sight position of the video camera 107 is within the effective area. When the line-of-sight position is within the effective area, a variable N, a positive integer used to count the position/orientation sensors for CG, is initialized to 1 (step S204). In the present embodiment, there are two such sensors, the position/orientation sensors 105c and 105d.
[0028]
Next, the CG video position/orientation detection module 113 uses the position/orientation sensors 105c and 105d to detect the position and orientation at which the Nth CG video is to be displayed, and stores the result in the CG video position/orientation data area 123 of the memory 104 (step S205). The CG video position/orientation detection module 113 has the same configuration as the line-of-sight position/orientation detection module 110 described above.
[0029]
In step S206, the CG video position validity determination module 114 of FIG. 4, described later, determines whether or not the detected CG video position is within the effective area. When the CG video position is within the effective area, the CG video generation module 115 draws the three-dimensionally modeled Nth CG video over the video memory area 121, based on the data in the line-of-sight position/orientation data area 122 and the data in the CG video position/orientation data area 123 (step S207), and the process proceeds to step S208. If, on the other hand, the determination in step S206 finds that the CG video position is not within the effective area, step S207 is skipped and the process proceeds to step S208.
[0030]
The CG video generation module 115 renders the CG video as an object placed at the position and orientation indicated by the data in the CG video position/orientation data area 123, observed from the line of sight indicated by the data in the line-of-sight position/orientation data area 122. The video of the real space has already been written into the video memory area 121 by the real video capturing module 111 in step S202, so the CG video is drawn on top of it.
[0031]
In the subsequent step S208, the variable N is incremented by 1, and in the next step S209 it is determined whether or not the variable N is 2 (the number of CG videos) or less. If N is 2 or less, the processing from step S205 onward is repeated; if N exceeds 2, the process proceeds to step S210.
[0032]
As a result of the determination in step S203, if the line-of-sight position is not within the effective area, the process immediately proceeds to step S210 without executing the processing after step S204 (storage of the Nth CG video in the video memory area).
[0033]
In the subsequent step S210, the video display module 116 writes the video written in the video memory area 121 into the frame memory of the HMD 106 and performs actual display.
[0034]
Next, in step S211, it is determined whether or not the entire process is to be ended; if not, the processing from step S201 onward is repeated. Repeating the processing yields a continuous cycle in which real video is captured frame after frame from the video camera 107, CG video is drawn over it, and the result is displayed.
[0035]
According to the processing of FIG. 2, when the line-of-sight position detected by the position/orientation sensor 105b is not within the effective area (NO in step S203), only the real video is displayed on the HMD and none of the CG video is drawn (step S210); when a CG video position is not within the effective area (NO in step S206), drawing of that CG video alone is stopped while the other CG video is still drawn (steps S206 to S207). Display of unnatural video can thus be prevented.
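The branching of FIG. 2 for a single frame can be sketched as follows. This is an illustrative outline, not the patented implementation: `compose_frame`, its parameters, and the helper functions are hypothetical names, the frame is modeled as a plain list of drawn layers, and the validity checks are passed in as callables so that any region test can be plugged in.

```python
import math

def compose_frame(real_frame, gaze_position, cg_positions,
                  gaze_is_valid, cg_is_valid, draw_cg):
    """Build one output frame following steps S202 to S210 of FIG. 2."""
    frame = list(real_frame)                 # S202: start from the captured real video
    if not gaze_is_valid(gaze_position):     # S203: line-of-sight position check
        return frame                         # skip all CG, show real video only (S210)
    for n, cg_position in enumerate(cg_positions, start=1):  # S204, S208, S209
        if cg_is_valid(cg_position):         # S206: per-CG position check
            draw_cg(frame, n, gaze_position, cg_position)    # S207: draw over the frame
    return frame                             # S210: hand the frame to the display

# Example helpers (assumptions): a spherical effective area around the origin,
# and a "renderer" that just records which CG object was drawn.
def within(radius):
    return lambda p: math.dist(p, (0.0, 0.0, 0.0)) < radius

def draw_label(frame, n, gaze_position, cg_position):
    frame.append(f"cg{n}")

frame = compose_frame(["real"], (0.2, 0.0, 0.0),
                      [(0.1, 0.0, 0.0), (9.0, 0.0, 0.0)],
                      within(1.0), within(1.0), draw_label)
```

In this example the second CG position lies outside the effective area, so only the first CG object is drawn over the real video; an out-of-range line-of-sight position would suppress both.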
[0036]
FIG. 3 is a flowchart of the gaze position validity determination module 112 in step S203 of FIG.
[0037]
It is assumed that the value defining the effective area of the line-of-sight position of the video camera 107 has been recorded in advance in the effective line-of-sight position data area 124 of the memory 104 by the effective line-of-sight position area setting module 117. It is also assumed that the coordinates (X, Y, Z) of the center of the effective area of the position/orientation sensor are stored in the sensor effective area center data area 120 of the memory 104.
[0038]
The data in the effective line-of-sight position data area 124 is, for example, the effective distance of the line-of-sight position of the video camera 107 from the center of the sensor effective area, in which case it is a single numerical value.
[0039]
In FIG. 3, first, the distance between the line-of-sight position of the video camera 107 and the center of the sensor effective area is calculated from the data in the line-of-sight position / orientation data area 122 of the memory 104 and the data in the sensor effective area center data area 120, and the result is stored in the line-of-sight position distance data area 126 (step S301).
[0040]
Next, it is determined whether or not the data in the line-of-sight position distance data area 126 is less than the data in the effective line-of-sight position data area 124 (step S302). When the line-of-sight position distance of the video camera 107 is less than the effective line-of-sight position distance, it is determined that the line-of-sight position is within the effective area (step S303); when the line-of-sight position distance is equal to or greater than the effective line-of-sight position distance, it is determined that the line-of-sight position is not within the effective area (step S304).
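The determination of steps S301 to S304 amounts to comparing a Euclidean distance with a single threshold. The following is a minimal sketch of that comparison; the function and parameter names are hypothetical, and the coordinates are assumed to be expressed in the same units as the effective distance.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def is_within_effective_area(
    position: Vec3, center: Vec3, effective_distance: float
) -> bool:
    """Hypothetical helper mirroring steps S301 to S304."""
    # Step S301: distance from the sensor effective area center.
    distance = math.dist(position, center)
    # Step S302: strictly less than the effective distance means "within"
    # (step S303); equal or greater means "not within" (step S304).
    return distance < effective_distance
```

Note that a position lying exactly at the effective distance is treated as outside the effective area, consistent with the "equal to or greater than" condition of step S304.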
[0041]
The data stored in the effective line-of-sight position data area 124 is not limited to the distance from the center of the sensor effective area as described in the present embodiment, but may be a set of a plurality of positions indicating a specific area. In that case, the gaze position validity determination module 112 determines whether or not the gaze position / posture is within an area surrounded by the plurality of positions.
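How an area "surrounded by a plurality of positions" is tested is not specified here. As one possible illustration only, the sketch below assumes the positions form an ordered polygon on a horizontal plane (height is ignored) and uses the standard ray-casting test; the function name `is_inside_polygon` and the two-dimensional simplification are assumptions introduced for this example.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def is_inside_polygon(point: Vec2, vertices: Sequence[Vec2]) -> bool:
    """Ray-casting point-in-polygon test (hypothetical 2D simplification)."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside
```

A region defined this way can take an arbitrary shape, for example one that avoids an obstacle near the sensor, which a single distance threshold cannot express.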
[0042]
FIG. 4 is a flowchart of the processing of the CG video position validity determination module 114 executed in step S206 of FIG. 2.
[0043]
It is assumed that the effective CG video position area setting module 118 has stored the value of the effective area of the CG video position in the effective CG video position data area 125 of the memory 104 in advance. It is further assumed that the coordinates (X, Y, Z) of the center of the effective area of the position and orientation sensor are stored in the sensor effective area center data area 120 of the memory 104.
[0044]
The data in the effective CG video position data area 125 is, for example, the value of the effective distance of the CG video position from the center of the sensor effective area. In this case, the value is a single numerical value.
[0045]
In FIG. 4, first, the distance between the CG video position and the center of the sensor effective area is calculated from the data in the CG video position / orientation data area 123 and the data in the sensor effective area center data area 120 of the memory 104, and the result is stored in the CG video position distance data area 127 (step S401).
[0046]
Next, it is determined whether or not the data in the CG video position distance data area 127 is less than the data in the effective CG video position data area 125 (step S402). When the CG video position distance is less than the effective CG video position distance, it is determined that the CG video position is within the effective area (step S403); when the CG video position distance is equal to or greater than the effective CG video position distance, it is determined that the CG video position is not within the effective area (step S404), and the processing ends.
[0047]
In the above embodiment, if the position / orientation sensors 105b, 105c, and 105d for detecting the gaze position / orientation and the CG video position / orientation go out of the effective area of the sensor, the sensor cannot detect a correct value, so the detected value is not reliable. Therefore, the areas indicated by the effective line-of-sight position data area 124 and the effective CG video position data area 125 are usually set within the effective area of the sensor. In addition, since the detection values of the position and orientation sensors 105b to 105c may include large errors near the boundary of the effective area, it is preferable to set the effective line-of-sight position area and the effective CG video position area according to the installation conditions of the position and orientation sensors, so that the CG video drawn on the basis of the detection values of the position and orientation sensors 105b to 105c does not become unnatural.
[0048]
In addition, an effective area may be set individually for each of the position and orientation sensors 105b to 105c. In this case, a plurality of effective CG video position data areas 125 may be prepared in the memory 104, one for each position and orientation sensor.
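Per-sensor effective areas can be pictured as a lookup from sensor to threshold. The sketch below is an illustration under assumptions: the sensor keys `"sensor_a"` and `"sensor_b"`, the function name, and the use of a distance threshold for every sensor are hypothetical and are not taken from the embodiment.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def sensors_within_own_area(
    readings: Dict[str, Vec3],
    center: Vec3,
    effective_distance: Dict[str, float],
) -> List[str]:
    """Return the sensors whose detected position lies inside their own area.

    `effective_distance` plays the role of several effective CG video
    position data areas, one entry per sensor (hypothetical layout).
    """
    return [
        sensor_id
        for sensor_id, position in readings.items()
        if math.dist(position, center) < effective_distance[sensor_id]
    ]


# Example: both sensors report the same position, but only the sensor
# with the larger individual threshold is accepted.
accepted = sensors_within_own_area(
    {"sensor_a": (1.0, 0.0, 0.0), "sensor_b": (1.0, 0.0, 0.0)},
    (0.0, 0.0, 0.0),
    {"sensor_a": 2.0, "sensor_b": 0.5},
)
```

Keeping one threshold per sensor allows a sensor mounted in a less favourable location to be given a tighter area than the others.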
[0049]
It goes without saying that the present invention can also be achieved by supplying a system or apparatus with a storage medium storing the software program modules that realize the above-described embodiment. In this case, the program modules themselves read from the storage medium realize the novel function of the present invention, and the storage medium storing the program constitutes the present invention.
[0050]
In the above embodiment, the program module is stored in the memory 103. Various storage media can be used for supplying the program module, such as a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, an MO, a CD-R, a DVD, a magnetic tape, and a non-volatile memory card. The storage medium need not be limited to any specific one; it suffices that it can store the program.
[0051]
[Effect of the invention]
According to the present invention, when the detected line-of-sight position of the real video is not within the set effective area of the line-of-sight position, the CG video generated according to the line-of-sight position is not superimposed on the real video and only the real video is displayed, so that display of an unnatural video can be prevented.
Furthermore, according to the present invention, an effective area of the line-of-sight position surrounded by a plurality of positions is set, and it is determined whether or not the detected line-of-sight position is within the effective area surrounded by the plurality of positions, so that the effective area can be set arbitrarily in space.
[0052]
According to the image processing apparatus of claim 2 and the image processing method of claim 4, when the position of the detected CG video is not within the set effective area of the CG video position, the detected CG video is not superimposed on the real video and only the real video is displayed, so drawing of that CG video is stopped while the other CG video is drawn, thereby preventing an unnatural video from being displayed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an image processing (mixed reality presentation) apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of video display processing executed by the mixed reality presentation apparatus of FIG. 1.
FIG. 3 is a flowchart of the processing of the gaze position validity determination module 112 in step S203 of FIG. 2.
FIG. 4 is a flowchart of the processing of the CG video position validity determination module 114 in step S206 of FIG. 2.
[Explanation of symbols]
101 CPU
102 Computer bus
103, 104 Memory
105a Position / orientation sensor main body
105b, 105c, 105d Position / orientation sensor
106 HMD
107 Video camera

Claims

An image processing apparatus comprising:
real video photographing means for acquiring a real video at a predetermined line-of-sight position;
line-of-sight position detecting means for detecting the predetermined line-of-sight position;
CG video generation means for generating a CG video according to the detected line-of-sight position;
video display means for superimposing the generated CG video on the acquired real video and displaying the result on a display device;
effective line-of-sight position area setting means for setting an effective area of the line-of-sight position surrounded by a plurality of positions; and
determining means for determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position surrounded by the plurality of positions,
wherein the video display means displays only the real video, without superimposing the CG video on the real video, when the detected line-of-sight position is not within the effective area of the line-of-sight position surrounded by the plurality of positions.

The image processing apparatus according to claim 1, further comprising:
effective CG video position area setting means for setting an effective area of the CG video position;
CG video position detecting means for detecting the position of the CG video to be superimposed; and
other determining means for determining whether or not the position of the detected CG video is within the set effective area of the CG video position,
wherein, when the position of the detected CG video is not within the set effective area of the CG video position, the video display means does not superimpose the detected CG video on the real video and displays only the real video.

An image processing method comprising:
a real video photographing step of acquiring a real video at a predetermined line-of-sight position;
a line-of-sight position detecting step of detecting the predetermined line-of-sight position;
a CG video generation step of generating a CG video according to the detected line-of-sight position;
a video display step of superimposing the generated CG video on the acquired real video and displaying the result on a display device;
an effective line-of-sight position area setting step of setting an effective area of the line-of-sight position surrounded by a plurality of positions; and
a determination step of determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position surrounded by the plurality of positions,
wherein the video display step displays only the real video, without superimposing the CG video on the real video, when the detected line-of-sight position is not within the effective area of the line-of-sight position surrounded by the plurality of positions.

The image processing method according to claim 3, further comprising:
an effective CG video position area setting step of setting an effective area of the CG video position;
a CG video position detecting step of detecting the position of the CG video to be superimposed; and
another determination step of determining whether or not the position of the detected CG video is within the set effective area of the CG video position,
wherein, in the video display step, when the position of the detected CG video is not within the set effective area of the CG video position, the detected CG video is not superimposed on the real video and only the real video is displayed.

A program for causing a computer to execute an image processing method, the program comprising:
a real video photographing module for acquiring a real video at a predetermined line-of-sight position;
a line-of-sight position detection module for detecting the predetermined line-of-sight position;
a CG video generation module for generating a CG video according to the detected line-of-sight position;
a video display module for superimposing the generated CG video on the acquired real video and displaying the result on a display device;
an effective line-of-sight position area setting module for setting an effective area of the line-of-sight position surrounded by a plurality of positions; and
a determination module for determining whether or not the detected line-of-sight position is within the effective area of the line-of-sight position surrounded by the plurality of positions,
wherein the video display module displays only the real video, without superimposing the CG video on the real video, when the detected line-of-sight position is not within the effective area of the line-of-sight position surrounded by the plurality of positions.