JP3793158B2

JP3793158B2 - Information processing method and information processing apparatus

Info

Publication number: JP3793158B2
Application number: JP2003037406A
Authority: JP
Inventors: 清秀佐藤; 裕之山本; 登志一大島; 尚郷谷口; 昭宏片山
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1997-09-01
Filing date: 2003-02-14
Publication date: 2006-07-05
Anticipated expiration: 2018-03-16
Also published as: JP2003308514A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばコンピュータグラフィックスによる仮想画像を現実の空間に結合させた複合現実感を作業者に提示する技術に関する。
【０００２】
【従来の技術】
近年、現実空間と仮想空間の繋ぎ目のない(seemless)結合を目的とした複合現実感（以下、「ＭＲ」(Mixed Reality)と称す）に関する研究が盛んになっている。ＭＲは、従来、現実空間と切り離された状況でのみ体験可能であったバーチャルリアリティ（以下ＶＲと略す）の世界と現実空間との共存を目的とし、ＶＲを増強する技術として注目されている。
【０００３】
ＭＲの応用としては、患者の体内の様子を透視しているように医師に提示する医療補助の用途や、工場において製品の組み立て手順を実物に重ねて表示する作業補助の用途など、今までのＶＲとは質的に全く異なった新たな分野が期待されている。これらの応用に対して共通に要求されるのは、現実空間と仮想空間の間の“ずれ”をいかにして取り除くかという技術である。“ずれ”は、位置ずれ、時間ずれ、質的ずれに分類可能であり、この中でも最も基本的な要求といえる位置ずれの解消（即ち、位置合わせ）については、従来から多くの取り組みが行われてきた。
【０００４】
ビデオカメラで撮影された映像に仮想物体を重畳するビデオシースルー(Video-See-Through)方式のＭＲの場合、位置合せの問題は、そのビデオカメラの３次元位置を正確に求める問題に帰結される。半透過型のＨＭＤ(Head Mount Display)を用いる光学シースルー(Optic-See-Through)方式のＭＲの場合における位置合せの問題は、ユーザーの視点の３次元位置を求める問題といえ、それらの計測法としては、磁気センサや超音波センサ、ジャイロといった３次元位置方位センサ利用が一般的であるが、これらの精度は必ずしも十分とはいえず、その誤差が位置ずれの原因となる。
【０００５】
一方、ビデオシースルー方式の場合には、このようなセンサを用いずに画像情報を元に画像上での位置合わせを直接行う手法も考えられる。この手法では位置ずれを直接取り扱えるために、位置合わせが精度よく行える反面、実時間性や信頼性の欠如などの問題があった。近年になって、位置方位センサと画像情報の併用により、両者の欠点を互いに補って精度よい位置合わせを実現する試みが報告されている。
【０００６】
１つの試みとして、「Dynamic Registration Correction in Video-Based-Augmented Reality Systems」(Bajura MichaelとUlrich Neuman, IEEE Computer Graphics and Applications 15, 5, pp. 52-60, 1995)（以下、第１文献と呼ぶ）は、ビデオシースルー方式のＭＲにおいて、磁気センサの誤差によって生じる位置ずれを画像情報によって補正する手法を提案した。
【０００７】
また、「Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking」(State Andrei等, Proc. of SIGGRAPH 96,pp. 429-438, 1996)（以下、第２文献と呼ぶ）は、さらにこの手法を発展させ、画像情報による位置推定の曖昧性をセンサ情報によって補う手法を提案した。上記第２文献は、位置方位センサposition-azimuth sensorのみを用いてビデオシースルー方式のＭＲ提示システムを構築した場合において、そのセンサの誤差が原因となって画像上に発生する位置ずれを解消するために、３次元位置が既知であるランドマークを現実空間に設定する。このランドマークは、その位置ずれを画像情報から検出するための手掛かりとなる。
【０００８】
位置方位センサの出力に誤差が含まれていないとすると、画像上で実際に観測されるランドマークの座標（Ｑ_Iとする）と、そのセンサ出力に基づいて得られるカメラ位置とランドマークの３次元位置とから導きだされるランドマークの観測予測座標（Ｐ_Iとする）とは、同一となるはずである。しかし、実際にはセンサ出力に基づいて得られたカメラ位置は正確ではないため、ランドマークの座標Ｑ_Iと観測予測座標Ｐ_Iは一致しない。このＰ_IとＱ_Iのずれは、ランドマーク位置における仮想空間と現実空間の位置ずれを表しており、このために、画像からランドマーク位置を抽出することで、ずれの向きと大きさが算出できる。
【０００９】
このように、画像上での位置ずれを定量的に計測することにより、位置ずれを解消するようなカメラ位置の補正が可能となる。方位センサと画像を併用する最も単純な位置合わせ方式は、１点のランドマークを用いたセンサ誤差の補正と考えられ、画像上のランドマークの位置ずれに応じてカメラ位置を平行移動または回転させる手法が第１文献によって提案されている。
【００１０】
第１図に、１点のランドマークを用いた位置ずれ補正の基本的な考え方を示す。以下では、カメラの内部パラメータを既知として、歪みなどの影響を除外した理想的な撮像系によって画像撮影が行われているものと仮定する。カメラの視点位置をＣ、画像上でのランドマークの観測座標をＱ_I 、現実空間のランドマーク位置をＱ_Iとすると、点Ｑ_Iは点Ｃと点Ｑ_Iを結ぶ直線ｌ_Q上に存在する。一方、位置方位センサによって与えられるカメラ位置からは、カメラ座標系におけるランドマーク位置Ｐ_Cと、その画像上での観測座標Ｐ_Iとが推測できる。以下では、点Ｃから点Ｑ_I、点Ｐ_Iへの３次元ベクトルを、それぞれｖ₁、ｖ₂と表記する。この方法では、補正後のランドマークの観測予測座標符Ｐ'_IがＱ_Iに一致するように（すなわち、カメラ座標系における補正後のランドマーク予測位置Ｐ'_Cが、直線ｌ_Q上に乗るように）、カメラと物体の相対的な位置情報を修正する事によって、位置ずれが補正される。
【００１１】
ランドマークの位置ずれを、カメラ位置の回転によって補正することを考える。これは、二つのベクトルｖ₁、ｖ₂の成す角θだけカメラが回転するように、カメラの位置情報に修正を加えることにより実現できる。実際の計算では、上記ベクトルｖ₁、ｖ₂を正規化したベクトルｖ_1n、ｖ_2nを用いて、その外積ｖ_1n×ｖ_2nを回転軸に、内積ｖ_1n・ｖ_2nを回転角として、点Ｃを中心にカメラを回転させる。
【００１２】
ランドマークの位置ずれを、カメラ位置の相対的な平行移動によって補正することを考える。これは、仮想世界中の物体位置をｖ＝ｎ（ｖ₁−ｖ₂）だけ平行移動させることで実現できる。ここでｎは、次式によって定義されるスケールファクタである。
【００１３】
【数１】

【００１４】
ここで、｜ＡＢ｜は点Ａと点Ｂの間の距離を示す記号とする。また、カメラが−ｖだけ平行移動するようにカメラの位置情報に修正を加えることでも、同様の補正が可能となる。これは、この操作によって、相対的に仮想物体がｖだけ移動したことに等しくなるためである。以上の２つの手法は、ランドマーク上での位置ずれを２次元的に一致させる手法であり、３次元的に正しい位置にカメラ位置を補正することではない。しかし、センサ誤差が小さい場合には十分な効果が期待できるものであり、また、補正のための計算コストは非常に小さなものであり、実時間性に優れた手法である。
【００１５】
【発明が解決しようとする課題】
しかしながら、上記文献に示された手法では、唯一のマーカの撮像画像内での位置を捕捉することが必要であるから、そのマーカが常にカメラに撮影されていなくてはならないという制約があるため、ごく限られた範囲の空間しか見ることができなかった。
【００１６】
ましてや、複数の作業者が共通の複合現実空間を共有する場合には、１つのマーカのみでは上記制約は致命的である。
【００１７】
本発明は、このような事態に鑑みてなされたもので、補正値の急激な変化が緩和され、三次元仮想画像の急激な変化による不自然な移動を解消することができる情報処理方法及び情報処理装置を提供することを目的とする。
【００１８】
【課題を解決するための手段】
上記課題を解決するため、本発明に係る情報処理方法は、複数のマーカが配置された現実空間を撮影装置によって撮影することによって得られる撮影画像を取得する第１の取得工程と、前記現実空間内の対象物の位置姿勢を取得する第２の取得工程と、前記撮影画像に含まれている前記マーカの位置を検出する検出工程と、前記マーカを用いて前記対象物の位置姿勢の第１の補正値を算出する第１の算出工程と、前記撮影画像のフレームの前フレームの撮影画像から算出された前記対象物の位置姿勢の第２の補正値を取得する補正値取得工程と、前記第２の補正値を用いて前記第１の補正値を補正することにより第３の補正値を算出する第２の算出工程と、前記第３の補正値を用いて前記対象物の位置姿勢を補正する位置姿勢補正工程と、前記位置姿勢補正工程によって位置姿勢が補正された前記対象物の該位置姿勢に応じた仮想画像を生成する第１の生成工程と、前記仮想画像と前記撮影画像とを合成した複合空間画像を生成する第２の生成工程とを有することを特徴とする。
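[0023]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A system in which the mixed reality presentation method and the HMD of the present invention are applied to an air hockey game apparatus is described below. Air hockey is a two-player competitive game: compressed air supplied from below normally floats a puck, the players hit the puck back and forth, and a player scores by driving the puck into the opponent's goal. The player with the higher score wins. In the MR air hockey game of this embodiment, a virtual puck is presented to the players as a virtual three-dimensional image superimposed on a table in the real environment, and the players hit this virtual puck back and forth with real mallets.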
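[0024]
The features of this game apparatus are as follows.
(1) A shared camera captures the real world common to a plurality of operators, and the working actuators manipulated by those operators (mallets in this embodiment) are located in that shared image, so that a single mixed reality world is produced and can be shared by several people.
(2) To accurately detect the viewpoint position of an operator who moves widely within a large real space, a camera is mounted on the operator's head in addition to the magnetic sensor that detects the position and orientation of the head. This camera captures at least one of a plurality of markers provided on the game table, and the head position and orientation detected by the magnetic sensor (that is, the position and orientation of the operator's viewpoint) are corrected from the difference between the image coordinates of the captured marker and the known position of that marker.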
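[0025]
〈Configuration of the game apparatus〉
FIG. 2 is a side view of the game apparatus portion of the system of this embodiment. In the mixed reality air hockey game, two opponents 2000 and 3000 face each other across a table 1000, each holding a mallet (260L, 260R). The opponents 2000 and 3000 wear head-mounted displays (hereinafter abbreviated as HMDs) 210L and 210R on their heads. Each mallet of this embodiment has an infrared emitter at its tip. As described later, the mallet position is detected by image processing in this embodiment; if the mallet has a distinctive shape or color, the mallet position can instead be detected by pattern recognition using those features.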
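[0026]
The HMD 210 of this embodiment is a see-through type, as shown in FIG. 4. Both opponents 2000 and 3000 can observe the surface of the table 1000 while wearing the HMDs 210L and 210R. A three-dimensional virtual image is input to each HMD 210 from an image processing system described later. The opponents 2000 and 3000 therefore see the three-dimensional image displayed on the display screen of the HMD 210 superimposed on the view of the real space through the optical system of the HMD 210 (not shown in FIG. 2).
[0027]
FIG. 3 shows the scene that the left player 2000 sees through his or her HMD 210L. The two players 2000 and 3000 hit a virtual puck 1500 back and forth. Player 2000 (player 3000) hits the puck 1500 with the real mallet 260L (260R) held in the hand. Player 2000 holds the mallet 260L. The goal 1200R is seen immediately in front of the opposing player 3000. The image processing system described later (not shown in FIG. 3) generates three-dimensional CG and displays it on the HMD 210L so that the goal 1200R appears near the opponent.
[0028]
The opposing player 3000 likewise sees the goal 1200L near player 2000 through the HMD 210R. The puck 1500 is also generated by the image processing system described later and displayed on each HMD.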
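[0029]
〈HMD with magnetic sensor〉
FIG. 4 shows the configuration of the HMD 210. This HMD 210 is obtained by attaching a magnetic sensor 220, via a support rod 221, to the body of the HMD disclosed in, for example, Japanese Patent Laid-Open No. 7-333551. In the figure, 211 denotes an LCD display panel. Light from the LCD display panel enters an optical member 212, is reflected by a total reflection surface 214, is reflected by the total reflection surface of a concave mirror 213, passes through the total reflection surface 214, and reaches the observer's eye.
[0030]
In this embodiment, a Fastrak magnetic sensor from Polhemus is used as the magnetic sensor 220. Because magnetic sensors are susceptible to magnetic noise, the plastic support rod 221 keeps the sensor away from the display panel 211 and the camera 240, which are noise sources. The arrangement of FIG. 4, in which a magnetic sensor and/or a camera is attached to the HMD, is not limited to optical see-through (transparent) HMDs: a magnetic sensor and/or a camera can also be mounted on a video see-through (shielded) HMD for the purpose of accurately detecting the head position and orientation.
[0031]
In FIG. 2, each HMD 210 is fixed to the player's head with a band (not shown). The magnetic sensor 220 shown in FIG. 4 and the CCD camera 240 (240L, 240R) shown in FIG. 2 are each fixed to the player's head. The field of view of the camera 240 is set toward the front of the player. When an HMD equipped with the magnetic sensor 220 and the camera 240 is used for the air hockey game, each player looks at the top surface of the table 1000, so the camera 240 also captures an image of the surface of the table 1000. The magnetic sensor 220 (220L, 220R) senses changes in the AC magnetic field generated by an AC magnetic field source 250.
[0032]
As described later, the image captured by the camera 240 is used to correct the head position and orientation detected by the magnetic sensor 220. When a player looks obliquely downward to see the surface of the table 1000, the field of view through the HMD 210 contains the surface of the table 1000, the virtual puck 1500, the real mallet 260 (260L, 260R), and the virtual goal 1200 (1200L, 1200R). When the player translates the head within a horizontal two-dimensional plane, or tilts, yaws, or rolls the head, the change is first detected by the magnetic sensor 220 and is also observed as a change in the image captured by the CCD camera 240 as the head orientation changes. In other words, the signal from the magnetic sensor 220 that represents the head position is corrected by image processing of the image from the camera 240.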
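[0033]
〈Plural markers〉
Each mallet 260 held by a player has an infrared emitter at its tip, and the position of each mallet on the table 1000 (its position in the two-dimensional plane) is obtained by a CCD camera 230 that detects the infrared light from the mallets. That is, the camera 230 is provided to detect the position of each player's hand (the mallet position). Detecting the mallet positions makes it possible to determine the progress of the hockey game.
[0034]
The CCD camera 240, on the other hand, outputs an image called a marker image. FIG. 5 shows an example of markers arranged on the table 1000. In FIG. 5, the five landmarks, or markers, drawn as circles (1600 to 1604) are used to assist in detecting the head position of player 2000, and the five markers drawn as squares (1650 to 1654) are used to assist in detecting the head position of player 3000. When plural markers are arranged as in FIG. 5, which marker is visible depends on the head position and, in particular, on its orientation. In other words, the output signal of the magnetic sensor that detects the player's head orientation can be corrected by identifying the marker in the image captured by the CCD camera 240 worn by each player and detecting its position in the image.
[0035]
The circles and squares in FIG. 5 are used only for illustration; the markers have no distinctive shape and may have any shape. The marker group (1600 to 1604) and the marker group (1650 to 1654) assigned to the two players (2000, 3000) are painted in different colors. In this embodiment, the markers for the left player (player #1) are red and the markers for the right player (player #2) are green, which makes the markers easy to distinguish in image processing. The markers may also be distinguished by shape or texture instead of color.
[0036]
A major feature of this embodiment is that plural markers are arranged. Arranging plural markers guarantees that at least one marker falls within the field of view of the CCD camera 240 as long as the player moves on the table 1000 within the range of motion of the air hockey game. FIG. 6 shows how the image processing range used for marker detection moves as the player moves his or her head in various ways. As shown in the figure, each image contains at least one marker. In other words, the number of markers, the spacing between markers, and so on should be set according to the size of the table 1000, the field angle of the camera 240, and the size of the player's range of movement given the nature of the game. In the case of FIG. 5, a wider area enters the field of view the farther it is from the player, so the spacing between markers must be made wider with distance. Making the distance in the image between nearby markers equal to the distance in the image between distant markers keeps down the number of markers captured in an image of the distant field of view and prevents a drop in marker detection accuracy. In this way the density of markers captured in the image is made substantially the same for distant and nearby markers, and unnecessarily many markers are prevented from being captured in the same frame.
[0037]
As described later, it suffices for this system that at least one marker exists in the image obtained by the camera 240L (240R) and that the marker can be identified. It is therefore not necessary to keep tracking one particular marker while the player moves his or her head (while the camera 240 moves).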
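[0038]
〈MR image generation system〉
FIG. 7 shows the configuration of the three-dimensional image generation and presentation system in the game apparatus shown in FIG. 2. This system outputs three-dimensional virtual images (the puck 1500 and the goals 1200 in FIG. 3) to the display devices of the HMD 210L of the left player 2000 and the HMD 210R of the right player 3000. The left and right parallax images for the three-dimensional virtual image are generated by image generation units 5050L and 5050R. In this embodiment, an ONYX2 computer system from Silicon Graphics (USA) is used for each image generation unit 5050.
[0039]
The image generation unit 5050 receives the puck position information and other data generated by a game state management unit 5030, and the corrected viewpoint position and head direction information generated by two correction processing units 5040L and 5040R. The game state management unit 5030 and the correction processing units 5040L and 5040R are each implemented on an ONYX2 computer system.
[0040]
The CCD camera 230, fixed above the center of the table 1000, covers the entire surface of the table 1000 in its field of view. The mallet information acquired by the camera 230 is input to a mallet position measurement unit 5010, which is implemented on an O2 computer system, also from Silicon Graphics. The measurement unit 5010 detects the mallet positions, that is, the hand positions, of the two players. The hand position information is input to the game state management unit 5030, which manages the game state. In other words, the game state and the progress of the game are basically determined by the mallet positions.
[0041]
A position and orientation detection unit 5000, implemented on an O2 computer system from Silicon Graphics, receives the outputs of the two magnetic sensors 220L and 220R (the position and orientation of each sensor 220 itself), detects the viewpoint position (X, Y, Z) and orientation (p, r, φ) at the camera (240L, 240R) worn by each player, and outputs them to the correction processing units 5040L and 5040R.
[0042]
Meanwhile, the CCD cameras 240L and 240R fixed to the players' heads acquire marker images. Each marker image is processed by the marker position detection unit 5060L or 5060R, which detects the position of the tracked marker for the respective player within the field of view of the respective camera 240. The tracked marker position information is input to the correction processing unit 5040 (5040L, 5040R).
[0043]
The marker position detection units 5060 (5060L, 5060R) that track the markers are implemented on O2 computer systems.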
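[0044]
〈Mallet position measurement〉
FIGS. 8 to 10 are flowcharts showing the control procedure for measuring the mallet positions. Tracking the mallets with one shared camera makes it possible to present a common mixed reality to plural operators. The measurement of the mallet position in this embodiment is described with reference to these flowcharts.
[0045]
In the air hockey game, a player never advances his or her mallet into the other player's area. The search for the mallet 260L (260R) of the left player 2000 (right player 3000) can therefore concentrate on the image data IL of the left field (image data IR of the right field), as shown in FIG. 11. Since the CCD camera 230 is at a fixed position, it is easy to divide the acquired image into two areas as shown in FIG. 11.
[0046]
Accordingly, in the flowchart of FIG. 8, the mallet 260L of player #1 (player 2000) is searched for in step S100, and the mallet 260R of player #2 (player 3000) is searched for in step S200. For convenience, the search for the right player's mallet (step S200) is described as an example.
[0047]
First, in step S210, a multi-valued image of the surface of the table 1000 captured by the camera 230 is acquired. In step S212, the subroutine "search in local area" is applied to the image data IR of the right half of the multi-valued image. The details of "search in local area" are shown in FIG. 9. If the mallet position coordinates (x, y) in the image coordinate system are found in step S212, the flow advances from step S214 to step S220, where the mallet position coordinates (x, y) in the image coordinate system are converted to the coordinate position (x', y') in the coordinate system of the table 1000 (see FIG. 13) according to the following equation.
[0048]
[Expression 2]
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = M_T \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
[0049]
Here the matrix M_T is a known 3×3 transformation matrix that calibrates the image coordinate system against the table coordinate system. The coordinate position (x', y') obtained in step S220 (shown as the "hand position" in FIG. 3) is sent to the game state management unit 5030. If the mallet is not found in the local area, "search in global area" is performed in step S216. If the mallet is found by the global search, its coordinate position is converted to the table coordinate system in step S220. The coordinate position found in the local or global area is used for the local-area mallet search in the next frame.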
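The image-to-table conversion of step S220 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `image_to_table`, the sample matrix, and the division by the third homogeneous component are all hypothetical, since the expression itself is not reproduced in this text.

```python
def image_to_table(m_t, x, y):
    """Map image coordinates (x, y) to table coordinates (x', y').

    m_t is a 3x3 calibration matrix given as a list of rows (assumed
    known, as the text states for M_T). The point is treated as the
    homogeneous vector (x, y, 1); dividing by the third component is an
    assumption that also covers a projective (not only affine) M_T.
    """
    xh = m_t[0][0] * x + m_t[0][1] * y + m_t[0][2]
    yh = m_t[1][0] * x + m_t[1][1] * y + m_t[1][2]
    wh = m_t[2][0] * x + m_t[2][1] * y + m_t[2][2]
    if wh == 0:
        raise ValueError("point maps to infinity under m_t")
    return xh / wh, yh / wh


# Hypothetical calibration: scale by 2 and shift by (1, 3).
M_T = [[2.0, 0.0, 1.0],
       [0.0, 2.0, 3.0],
       [0.0, 0.0, 1.0]]
table_xy = image_to_table(M_T, 4.0, 5.0)
```

Because the overhead camera 230 is fixed, such a matrix would only need to be calibrated once.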
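[0050]
FIG. 9 shows the process of searching for the mallet in the local area (the details of step S212). For convenience the process is shown for the right field, but the mallet search in the left field is substantially the same. In step S222, a rectangular area of (2A+1) × (2B+1) pixels defined by the following expression is extracted.
[0051]
[Expression 3]
I'_x - A \le x \le I'_x + A, \quad I'_y - B \le y \le I'_y + B
Here I'_x and I'_y are the coordinates of the mallet detected in the previous frame, and A and B are constants that determine the size of the search area, which is as shown in FIG. 12.
[0052]
Step S230 extracts, from all pixels (x, y) in the rectangular area defined in step S222, the pixels whose feature evaluation value I_S(x, y) satisfies a given condition. For the purpose of finding the mallet, a suitable feature is the similarity of the pixel value (infrared intensity value). Since the mallet of this embodiment uses an infrared emitter, anything that exhibits the characteristic infrared intensity is provisionally judged to be the mallet.
[0053]
That is, step S232 looks for pixels whose similarity I_S is at or above a predetermined threshold, meaning that they are close to the mallet. Each time such a pixel is found, the counter N is incremented, and the x and y coordinates of the pixel are accumulated in the registers SUMx and SUMy. That is,
[0054]
[Expression 4]
N \leftarrow N + 1, \quad SUMx \leftarrow SUMx + x, \quad SUMy \leftarrow SUMy + y
When step S230 ends, the number N of all pixels in the area of FIG. 12 that resemble the infrared pattern from the mallet, and the accumulated coordinate values SUMx and SUMy, have been obtained. If N = 0, the result "Not Found" is output in step S236. If N > 0, something that looks like the mallet has been found, and in step S238 the mallet position (I_x, I_y) is computed according to
[0055]
[Expression 5]
I_x = SUMx / N, \quad I_y = SUMy / N
The computed mallet position (I_x, I_y) is converted to the table coordinate system in step S220 (FIG. 8), and the converted value is passed to the management unit 5030 as the signal representing the "hand position". FIG. 10 shows the detailed procedure of the global area search of step S216. In step S240 of FIG. 10, among the pixels in the right-field image IR that satisfy
[0056]
[Expression 6]
\{ (x, y) \mid x = nC, \; y = mD, \; 0 < x < Width, \; 0 < y < Height, \; n, m \text{ integers} \}
the maximum value of the feature evaluation value I_S is stored in the register Max. Here C and D are constants that determine the coarseness of the search, and Width and Height are defined in FIG. 15. Specifically, step S242 judges whether the feature value I_S exceeds the threshold stored in the register Max. If such a pixel is found, step S244 makes that feature value the new threshold by setting
[0057]
[Expression 7]
Max \leftarrow I_S(x, y), \quad I_x \leftarrow x, \quad I_y \leftarrow y
In step S246, the coordinates (I_x, I_y) of the most mallet-like pixel found in the global search are passed to step S220.
[0058]
In this way the mallet is found in the image, and its coordinates, converted to the table coordinate system, are passed to the game state management unit 5030.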
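The local and global searches described above can be sketched in a few lines. This is a minimal sketch, not the patent's code: the function names, the list-of-rows image representation, the clipping of the window to the image bounds, and the grid starting at pixel 0 are assumptions made for illustration.

```python
def local_search(image, prev_xy, a, b, threshold):
    """Centroid of pixels >= threshold in a (2a+1) x (2b+1) window.

    image is a list of rows of intensity values (image[y][x]); prev_xy is
    the position found in the previous frame. Returns None when no pixel
    qualifies ("Not Found").
    """
    px, py = prev_xy
    height, width = len(image), len(image[0])
    n = sum_x = sum_y = 0
    for y in range(max(0, py - b), min(height, py + b + 1)):
        for x in range(max(0, px - a), min(width, px + a + 1)):
            if image[y][x] >= threshold:
                n += 1
                sum_x += x
                sum_y += y
    if n == 0:
        return None
    return sum_x / n, sum_y / n


def global_search(image, c, d):
    """Coarse scan: best pixel on a grid with steps c (x) and d (y)."""
    best, best_xy = None, None
    for y in range(0, len(image), d):
        for x in range(0, len(image[0]), c):
            if best is None or image[y][x] > best:
                best, best_xy = image[y][x], (x, y)
    return best_xy


def find_mallet(image, prev_xy, a, b, threshold, c, d):
    """Local search first, falling back to the global search."""
    found = local_search(image, prev_xy, a, b, threshold)
    return found if found is not None else global_search(image, c, d)


# Synthetic 10 x 10 frame with two bright pixels at (6, 4) and (7, 4).
frame = [[0] * 10 for _ in range(10)]
frame[4][6] = frame[4][7] = 255
```

The local search costs only (2A+1)(2B+1) pixel tests per frame, which is why the full-frame scan is kept as a fallback and run on a coarse grid.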
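[0059]
〈Game state management〉
FIG. 13 shows the game field of the air hockey game of this embodiment. The field is defined on a two-dimensional plane on the table 1000 and has x and y axes. It also has two virtual goal lines 1200L and 1200R on the left and right, and virtual walls 1300a and 1300b at the top and bottom of FIG. 13. The coordinates of the virtual goal lines 1200L and 1200R and the virtual walls 1300a and 1300b are known and do not move. Within this field, the virtual image of the puck 1500 moves in response to the movement of the mallets 260R and 260L.
[0060]
The puck 1500 has coordinate information P_p and velocity information v_p of its current position, the left mallet 260L has coordinate information P_SL and velocity information v_SL of its current position, and the right mallet 260R has coordinate information P_SR and velocity information v_SR of its current position. FIG. 14 is a flowchart explaining the processing procedure in the game state management unit 5030.
[0061]
In step S10, the initial position P_p0 and initial velocity v_p0 of the puck 1500 are set. The puck moves at constant velocity v_p. When the puck hits a wall or a mallet it undergoes a perfectly elastic collision, that is, its velocity direction is reversed. The game state management unit 5030 obtains the velocity information v_S from the position information P_S of each mallet measured by the mallet position measurement unit 5010.
[0062]
Step S12 is executed every Δt until the winner of the game is decided (until one player is first to score three points in step S50). In step S12 the puck position is updated to
[0063]
[Expression 8]
P_p \leftarrow P_p + v_p \, \Delta t
After the initial position and initial velocity have been set, the puck position is in general given by
[0064]
[Expression 9]
P_p = P_{p0} + v_{p0} \, \Delta t
Step S14 checks whether the updated puck position P_p is in the field of player #1 (the left player). The case in which the puck 1500 is on the left player's side is described first.
[0065]
Step S16 checks whether the current puck position interferes with the left player's mallet 1100L. If the puck 1500 is at a position that interferes with the mallet 1100L, the left player 2000 has operated the mallet 260L so as to hit the puck; to reverse the motion of the puck 1500, step S18 inverts the sign of the x-direction component v_px of the puck velocity v_p, and the flow advances to step S20.
[0066]
Instead of simply inverting the sign of the x-direction component v_px of the velocity v_p, the puck may be made to travel in the opposite direction with the mallet's x-direction operating speed v_SLx superimposed on the puck's x-direction speed v_px, by setting
[0067]
[Expression 10]
v_{px} \leftarrow -v_{px} + v_{SLx}
If the current puck position does not interfere with the left player's mallet 1100L (NO in step S16), the flow advances directly to step S20.
[0068]
Step S20 checks whether the puck position P_p collides with the virtual wall 1300a or 1300b. If the judgment in step S20 is YES, step S22 inverts the y component of the puck velocity. Next, step S24 checks whether the current puck position is inside the left player's goal line. If YES, step S26 adds a point to the opposing player, that is, the right (#2) player. Step S50 checks whether either player has reached three points; if so, the game ends.
[0069]
If step S14 determines that the puck position P_p is on the right player's side (player #2's side), step S30 and the following steps are executed. Steps S30 to S40 operate in substantially the same way as steps S16 to S26. The progress of the game is managed in this way. The game state consists of the puck position and the mallet positions and, as described above, is input to the image generation unit 5050 (5050L, 5050R).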
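One update cycle of the game state management (steps S12 to S40) might look like the sketch below. It is illustrative only: the field layout (x from 0 to `width`, walls at y = 0 and y = `height`), the `radius` interference test, and all names are assumptions, and only the simple sign-inversion variant of the mallet collision is shown.

```python
import math


def step_puck(pos, vel, mallets, dt, width, height, radius, scores):
    """Advance the puck by dt and apply the rules of steps S12 to S40.

    pos, vel: (x, y) tuples. mallets: {"left": (x, y), "right": (x, y)}.
    The field spans 0..width in x (goal lines at both ends) and
    0..height in y (walls). radius is a hypothetical interference
    distance between puck and mallet. scores is updated in place.
    """
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt  # P_p <- P_p + v_p*dt
    vx, vy = vel
    side = "left" if x < width / 2 else "right"
    mx, my = mallets[side]
    if math.hypot(x - mx, y - my) < radius:   # mallet hit: reverse x motion
        vx = -vx
    if y <= 0 or y >= height:                 # wall hit: reverse y motion
        vy = -vy
    if side == "left" and x <= 0:             # inside left goal line
        scores["right"] += 1
    elif side == "right" and x >= width:      # inside right goal line
        scores["left"] += 1
    return (x, y), (vx, vy)


def game_over(scores, target=3):
    """True once either player has reached the target score."""
    return max(scores.values()) >= target
```

A caller would invoke `step_puck` every Δt with the latest mallet positions from the measurement unit and stop when `game_over` returns True.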
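[0070]
〈Correction of head position〉
FIG. 16 shows the overall control procedure of the processing in the correction processing unit 5040 (5040L, 5040R). The output of the magnetic sensor 220 contains errors. The correction performed by the correction processing unit 5040 corrects the viewpoint position data and head orientation data that the measurement unit 5000 computes from that output, on the basis of the marker position in the image obtained from the CCD camera 240. That is, the correction process obtains a correction value for the position of the camera 240 (which is closely related to the head position) from the marker position in the image acquired by the camera 240, and uses that correction value to modify the viewing transformation matrix of the viewpoint. The modified viewing transformation matrix represents the corrected viewpoint position and orientation data; in other words, it yields the virtual image at the corrected viewpoint position.
[0071]
FIG. 26 explains the principle of correcting the observer's viewpoint position and orientation in the first embodiment. Here, correcting the observer's viewpoint position and orientation is equivalent to obtaining a corrected viewing transformation matrix. In FIG. 26, suppose that the player's camera 240 captures the marker 1603 in an image 300. The position of the marker 1603 in the image 300 is expressed in the image coordinate system as, for example, (x₀, y₀). If the marker captured in the image 300 is known to be marker 1603, its coordinates (X₀, Y₀, Z₀) in the world coordinate system are known. Since (x₀, y₀) are image coordinates and (X₀, Y₀, Z₀) are world coordinates, they cannot be compared directly. In the first embodiment, the viewing transformation matrix M_C of the camera 240 is obtained from the output of the magnetic sensor 220, and the world coordinates (X₀, Y₀, Z₀) are converted with this matrix M_C into coordinates (x'₀, y'₀) in the image coordinate system. The error between (x₀, y₀) and (x'₀, y'₀) expresses the error in the output of the magnetic sensor 220, so a correction matrix ΔM_C that corrects it is obtained.
[0072]
As is clear from FIG. 26, the marker captured in the image 300 must be identified as marker 1603. In the first embodiment, as described later, the three-dimensional positions of all markers in the world coordinate system are converted to the image coordinate system with the viewing transformation matrix M_C, and the marker whose converted image coordinates are closest to (x₀, y₀) is identified as the marker captured in the image 300. This process is explained with reference to FIGS. 19 and 20.
[0073]
The processing procedure of the correction processing unit 5040 is described in detail with reference to FIG. 16. In step S400, the viewing transformation matrix (4×4) of the camera 240 is calculated from the output of the magnetic sensor 220. In step S410, the position coordinates (in the image coordinate system) at which each marker should be observed are predicted from the viewing transformation matrix obtained in step S400, the ideal perspective transformation matrix of the camera 240 (known), and the three-dimensional position of each marker (known).
[0074]
Meanwhile, the marker position detection unit 5060 (5060L, 5060R) tracks the marker in the image obtained from the camera 240 (240L, 240R) attached to the player's head, and passes the detected marker position to the correction processing unit 5040 (in step S420). In step S420, the correction processing unit 5040 (5040L, 5040R) determines, from the marker position information it receives, which marker is currently being observed, that is, which marker serves as the reference for correction. In step S430, a correction matrix ΔM_C for correcting the position and orientation of the camera 240 detected by the magnetic sensor 220 is obtained from the difference between the predicted marker coordinates computed in step S410 and the observed marker coordinates detected by the marker position detection unit 5060. The position and orientation of the camera 240 can be corrected in this way because the coordinates of the marker observed by the marker position detection unit 5060 (marker 1603 in the example of FIG. 26) and the marker coordinates based on the head position detected by the magnetic sensor should coincide if the sensor output were accurate, so the difference computed in step S430 reflects the error of the magnetic sensor 220. The relative relationship between the position and orientation of the camera and those of the viewpoint is known and is expressed by a three-dimensional coordinate transformation. Accordingly, in step S440 the viewing transformation matrix of the viewpoint computed in step S400 is corrected on the basis of the correction matrix ΔM_C for the camera position and orientation, and the corrected transformation matrix is passed to the image generation unit 5050 (5050L, 5050R).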
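[0075]
FIG. 17 shows the processing procedure for marker position detection in the marker position detection unit 5060. In step S500, the color image acquired by the camera 240 is captured. Then "local area search" is performed in step S502 and "global area search" in step S506 to detect the marker position (x, y) expressed in the image coordinate system. Because the "local area search" of step S502 and the "global area search" of step S506 are procedurally substantially the same as the "local area search" (FIG. 9) and "global area search" (FIG. 10) of the mallet search, those figures are relied on here and separate illustrations are omitted. However, for player #1 (left), the feature value I_S used for the marker search in the borrowed control procedure (step S232) is the following function of the pixel value of the pixel of interest:
[0076]
[Expression 11]

For player #1 the markers (1600 to 1604) are red, so this feature value expresses the degree of redness. For player #2 (right) the markers (1650 to 1654) are green, so the feature value
[0077]
[Expression 12]

is used instead. These two quantities are also used as the feature value I_S(x, y) in the global search. In step S510, the marker coordinates obtained in steps S502 and S506 are converted to an ideal, distortion-free image coordinate system using a matrix M for distortion correction (of size 3×3, for example). The conversion is given by
[0078]
[Expression 13]

Next, the details of step S410 of FIG. 16 are described with reference to FIG. 18. As described above, the transformation matrix M_C from the world coordinate system to the camera coordinate system (a 4×4 viewing transformation matrix) has been obtained in step S400. The transformation matrix P_C (4×4) from the camera coordinate system to the image coordinate system is also given as a known value, as is the three-dimensional coordinate position (X, Y, Z) of the marker of interest.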
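[0079]
Let the angle r be the rotation (roll) about the Z axis at the position of the camera 240, the angle p the rotation (pitch) about the X axis, the angle φ the rotation (yaw) about the Y axis, and (X₀, Y₀, Z₀) the position of the camera 240. The viewing transformation matrix M_C of the camera 240 (that is, the transformation matrix from the world coordinate system to the camera coordinate system) is then
[0080]
[Expression 14]

Letting d be the focal length of the camera 240, w the width of its imaging surface, and h the height of its imaging surface, the transformation matrix P_C from the camera coordinate system to the image coordinate system is
[0081]
[Expression 15]

Accordingly, in step S520 of FIG. 18 (that is, step S410 of FIG. 16), the coordinate position (X, Y, Z) of the marker of interest is converted to the position (x_h, y_h, z_h) on the image plane according to the following equation.
[0082]
[Expression 16]
(x_h, y_h, z_h, 1)^T = P_C \, M_C \, (X, Y, Z, 1)^T
[0083]
In step S522, the predicted observation coordinates x and y of the marker in the image coordinate system are obtained as
[0084]
[Expression 17]
x = x_h / z_h, \quad y = y_h / z_h
Step S410 thus yields the predicted observation coordinates (x_i, y_i) of each marker i in the image coordinate system. Next, the "marker determination" process of step S420 is described. FIG. 19 shows the case in which the camera 240 of one player has acquired an image 600 on the table 1000.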
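The prediction of steps S520 and S522 amounts to two matrix products followed by a perspective division. The sketch below illustrates that chain only; the concrete matrices are hypothetical stand-ins (an identity viewing matrix and a simple pinhole matrix with focal length d), because the actual forms of M_C and P_C are not reproduced in this text.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def predict_image_xy(p_c, m_c, marker_xyz):
    """Predicted image coordinates of a marker at world position (X, Y, Z).

    m_c: 4x4 world-to-camera (viewing) matrix.
    p_c: 4x4 camera-to-image matrix.
    Computes (x_h, y_h, z_h, 1) = P_C * M_C * (X, Y, Z, 1) and then
    x = x_h / z_h, y = y_h / z_h.
    """
    X, Y, Z = marker_xyz
    cam = mat_vec(m_c, [X, Y, Z, 1.0])
    x_h, y_h, z_h, _ = mat_vec(p_c, cam)
    if z_h == 0:
        raise ValueError("marker lies in the camera plane")
    return x_h / z_h, y_h / z_h


# Hypothetical matrices: camera at the world origin looking along +Z,
# and a simple pinhole matrix with focal length d = 2.
d = 2.0
M_C = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
P_C = [[d, 0, 0, 0], [0, d, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
predicted = predict_image_xy(P_C, M_C, (1.0, 2.0, 4.0))
```

Running this for every known marker gives the list of predicted positions against which the detected marker position is compared.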
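[0085]
Let the markers provided on the table 1000 be, for example, M₁ to M₇, drawn as triangles. The three-dimensional position M_i of each marker is known. The image 600 contains the markers M₂, M₃, M₆, and M₇. The predicted observation position of each marker M_i, obtained in step S520, is denoted P_i. Q denotes the marker position detected by, and passed from, the marker position detection unit 5060.
[0086]
The "marker determination" of step S420 judges which P_i (that is, which M_i) corresponds to the marker position Q detected by the marker position detection unit 5060. In FIG. 19, e_i denotes the length of the vector from the detected marker position Q to the predicted position P_i of each marker, that is, the distance between them. The details of step S420 are shown in FIG. 20. The process of FIG. 20 searches the markers i (i = 0 to n) that fall within the image 600 for the marker with the smallest distance e_i and outputs its identifier i. That is,
[0087]
[Expression 18]
e_{min} = \min_i e_i
In the example of FIG. 19, the distance e₂ to P₂ is the shortest, so marker M₂ is used as the data for correcting the magnetic sensor output. Thus, however the player moves within the range of activity (the field), the camera 240 captures at least one marker in the image, and the size of the field no longer needs to be narrowly restricted as in the prior art.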
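The marker determination of step S420 is a nearest-neighbour lookup in image coordinates. A minimal sketch follows; the function name and the sample coordinates are made up for illustration.

```python
import math


def identify_marker(observed, predicted):
    """Return (index, distance) of the predicted position closest to Q.

    observed: detected marker position Q as (x, y).
    predicted: list of predicted positions P_i as (x, y), assumed to be
    restricted already to markers expected inside the image.
    """
    if not predicted:
        raise ValueError("no predicted marker positions")
    distances = [math.hypot(px - observed[0], py - observed[1])
                 for px, py in predicted]
    i_min = min(range(len(distances)), key=distances.__getitem__)
    return i_min, distances[i_min]


# Hypothetical predicted positions for four markers and one detection.
P = [(10.0, 10.0), (52.0, 41.0), (90.0, 15.0), (30.0, 80.0)]
Q = (50.0, 40.0)
index, e_min = identify_marker(Q, P)
```

The returned distance corresponds to e_min; the vector from P_i to Q, not just its length, is what a subsequent correction step would actually need.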
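[0088]
Next, in step S430, a transformation matrix ΔM_C that represents the correction of the camera position and orientation is obtained from the error distance e_min given by Expression 18, by the same process as explained with FIG. 1. Meanwhile, in step S432, the viewing transformation matrix M_V at the player's viewpoint position is obtained from the magnetic sensor output. Letting M_VC be the (known) transformation matrix from the camera coordinate system to the viewpoint coordinate system, step S440 uses M_VC to derive the corrected viewing transformation matrix M_V' of the viewpoint by the following equation.
[0089]
[Expression 19]
M_V' = M_{VC} \, \Delta M_C \, M_{VC}^{-1} \, M_V
[0090]
As is clear from FIG. 26, and as will also become clear from the second embodiment described later, the first embodiment (the process of FIG. 16) obtains the error distance e after conversion to the image coordinate system; conversely, the corrected viewing transformation matrix of the viewpoint can be obtained in the same way by converting to the world coordinate system and obtaining the error distance e there.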
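[0091]
〈Improving head position detection accuracy〉 ... Second embodiment
In the first embodiment, each HMD 210L (210R) is provided with one camera 240L (240R) for forward monitoring. The processing unit 5060 processes the image of the markers on the table 1000 acquired by this camera 240, identifies the marker in the image (step S420), and determines the orientation of the player's head, that is, the orientation of the camera mounted on the head, or in other words the matrix representing the viewing transformation of a camera with that orientation. However, because the first embodiment uses only the error in the image coordinate system, a three-dimensional deviation remains in the positional relationship between the camera and the marker.
[0092]
Moreover, depending on the application for which mixed reality is presented, markers may be placed at arbitrary positions in three-dimensional space, in which case the marker identification method of the first embodiment shown in FIG. 16 becomes less reliable. The second embodiment proposed next solves this problem of three-dimensional deviation: each player wears two cameras and marker detection is performed in the world coordinate system. The second embodiment also relaxes the constraint that the markers must be arranged on a plane.
[0093]
Specifically, the two players use two HMDs, each fitted with two cameras arranged on the left and right. That is, as shown in FIG. 21, two cameras 240LR and 240LL (240RR and 240RL) are mounted on the HMD 210L (210R) of player 2000 (3000), and the orientation of the cameras 240LR and 240LL (240RR and 240RL) is corrected from the stereo images obtained from them.
[0094]
Although the system of the second embodiment can handle markers arranged three-dimensionally, it is applied here, as in the first embodiment, to the air hockey game using plural markers arranged on a plane, in order to make the similarities to and differences from the processing procedure of the first embodiment clear. FIG. 22 shows part of the image processing system of the second embodiment, namely the portion changed from the image processing system of the first embodiment (FIG. 7). Comparing FIG. 7 with FIG. 22, the image processing system of the second embodiment differs from the first in that each player is provided with two cameras and in that it has marker position detection units 5060L' (5060R') and correction processing units 5040L' (5040R'). However, these units differ from the marker position detection units 5060L (5060R) and correction processing units 5040L (5040R) of the first embodiment only in their software processing.
[0095]
FIG. 23 shows the part of the processing procedure of the second embodiment that is the control procedure for the left player 2000, corresponding to the control procedure of FIG. 16 in the first embodiment, and explains the cooperation among the marker position detection unit 5060', the position and orientation detection unit 5000, and the correction processing unit 5040L'. In FIG. 23, the position and orientation detection unit 5000, which is the same as in the first embodiment, calculates the viewing transformation matrix of the viewpoint from the output of the magnetic sensor 220L in step S398. In step S400', the inverse of the viewing transformation matrix of the camera 240LR is calculated from the output of the magnetic sensor 220L. This transformation matrix is sent to the correction processing unit 5040'.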
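[0096]
The images from the two cameras 240LL and 240LR are sent to the marker position detection unit 5060L'. In step S402, the detection unit 5060' extracts a marker image m_R from the image R from the right camera 240LR. The coordinates of the extracted marker (the observed coordinates) are denoted I_mR. In step S404, the detection unit 5060' extracts the corresponding marker image m_L from the image L from the left camera 240LL. The coordinates of this extracted marker are denoted I_mL. Since the marker images m_R and m_L are images of the same marker m_X, step S406 derives the three-dimensional position C_m of the observed marker in the coordinate system of the camera 240LR from the observed pair of marker coordinates (I_mR, I_mL) by the principle of triangulation.
[0097]
In step S404, the corresponding point of the marker image m_L is searched for using a general stereo vision technique; to speed up the process, the search range may be limited using the well-known epipolar constraint. Steps S410', S420', S422, and S430' of FIG. 23 show the processing in the correction processing unit 5040L'.
[0098]
First, in step S410', the three-dimensional position C_m of the observed marker in the camera coordinate system is converted to the three-dimensional position W_m in the world coordinate system using the transformation matrix derived in step S400' (the inverse of the viewing transformation matrix). In step S420', the known three-dimensional positions W_mi of all markers m_i in the world coordinate system are read from a predetermined memory, and the W_mi that minimizes the Euclidean distance |W_mi − W_m| between each marker m_i and the observed marker m_X is determined. That is, the known marker closest to the observed marker m_X is identified.
[0099]
W_mi and W_m are inherently the same position, but an error vector D (corresponding to e in the first embodiment) arises because of the sensor error. Therefore step S420' determines the marker whose coordinates W_mi are closest to the three-dimensional (world) coordinates of the observed (tracked) marker, and step S430' computes the difference vector D between the observed marker and the determined marker as
[0100]
[Expression 20]
D = W_{mi} - W_m
and obtains the transformation matrix ΔM_C that moves the camera position by this vector. In step S440', the viewing transformation matrix of the viewpoint is corrected by the same method as in the first embodiment.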
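Steps S410' to S430' of the second embodiment can be sketched as follows, assuming the triangulated camera-space position C_m is already available. The function names, the sample matrix, and the sample marker positions are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def to_world(cam_to_world, c_m):
    """Apply a 4x4 camera-to-world matrix to a camera-space point C_m."""
    v = [c_m[0], c_m[1], c_m[2], 1.0]
    w = [sum(cam_to_world[r][c] * v[c] for c in range(4)) for r in range(3)]
    return tuple(w)


def nearest_marker_error(w_m, known_markers):
    """Return (index i, D) with D = W_mi - W_m for the closest known marker."""
    i = min(range(len(known_markers)),
            key=lambda k: math.dist(known_markers[k], w_m))
    d = tuple(a - b for a, b in zip(known_markers[i], w_m))
    return i, d


# Hypothetical data: the inverse viewing matrix is a pure translation by
# (1, 0, 0), and three known markers lie on a line at Z = 2.
CAM_TO_WORLD = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
MARKERS = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
W_m = to_world(CAM_TO_WORLD, (0.1, 0.0, 2.0))
marker_index, D = nearest_marker_error(W_m, MARKERS)
```

Because the comparison is made in world coordinates, the same lookup works unchanged when the markers are not coplanar.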
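[0101]
Thus, by using an HMD fitted with two cameras, the present invention can detect the position of the observed marker three-dimensionally, so that a more accurate viewpoint position and orientation can be detected and the virtual and real images of the MR are joined smoothly.
[0102]
〈Variation 1〉
The present invention is not limited to the first and second embodiments described above.
[0103]
In the first embodiment, the process of detecting a marker in the image takes the first marker found as the marker to be tracked, as shown in FIG. 17. Suppose, for example, that an image 800 containing the marker M₁ is obtained in one frame, as shown in FIG. 24. If the marker M₁ is still contained in the image area 810 of a subsequent frame, even at the edge of that area, there is no problem in using M₁ as the reference marker for the correction process. If, however, an image 820 is obtained in a later frame and the marker M₁ has left that area while the marker M₂ has entered it, the reference marker for correction must be changed to M₂. Such a change of marker is also necessary when tracking fails, and the newly tracked marker is then used to correct the positional deviation.
[0104]
A problem with switching the marker used for correction in this way is that an abrupt change in the correction value at the moment of switching can make the virtual object move unnaturally. As a variation, it is therefore proposed that the correction values up to the previous frame be reflected in the setting of the next correction value, in order to keep the correction values consistent over time.
[0105]
That is, letting v_t be the correction value in a given frame (a three-dimensional vector representing a translation in the world coordinate system) and v'_{t-1} the correction value in the previous frame, v'_t obtained from the following equation is used as the new correction value.
[0106]
[Expression 21]
v'_t = \alpha \, v'_{t-1} + (1 - \alpha) \, v_t
Here α is a constant with 0 ≤ α < 1 that defines the degree of influence of past information. The equation means that the correction value v'_{t-1} of the previous frame contributes with weight α and the correction value v_t obtained in the current frame contributes with weight (1 − α).
[0107]
This moderates abrupt changes in the correction value and eliminates abrupt changes (unnatural movement) of the three-dimensional virtual image. Setting the constant α to an appropriate value prevents unnatural movement of objects caused by marker switching.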
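The temporal blending of correction values described in this variation is a first-order recursive filter, v'_t = α v'_{t-1} + (1 − α) v_t. The sketch below shows its effect on a sudden jump; the function name, the choice α = 0.5, the initialization with the first raw value, and the sample data are assumptions for illustration.

```python
def smooth_correction(prev_smoothed, current, alpha):
    """Blend the previous smoothed correction with the current one.

    Implements v'_t = alpha * v'_(t-1) + (1 - alpha) * v_t componentwise.
    alpha must satisfy 0 <= alpha < 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return tuple(alpha * p + (1.0 - alpha) * c
                 for p, c in zip(prev_smoothed, current))


# Hypothetical marker switch: the raw correction jumps from 0 to 8 in x.
raw = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (8.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
smoothed = [raw[0]]
for v_t in raw[1:]:
    smoothed.append(smooth_correction(smoothed[-1], v_t, alpha=0.5))
```

A larger α spreads the jump over more frames at the cost of a slower response to genuine changes; α = 0 reproduces the unfiltered behaviour.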
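[0108]
〈Variation 2〉
In the embodiments above, when the local search fails to find a marker, the process of detecting a marker in the image (FIG. 17) takes the point with the highest similarity in the whole frame as the marker to be tracked, regardless of the marker position in the previous frame. This variation instead searches for the marker outward from the position of the marker found in the previous frame. The reason is that, even if the image frame shifts as the player moves, the marker is likely to be at a position not far from where it was in the previous frame.
[0109]
FIG. 25 explains the principle of searching the current frame for the marker found in the previous frame. The search follows the search path shown there, and when a point with a similarity at or above a certain threshold is found, that point is taken as the marker to be tracked.
[0110]
〈Variation 3〉
The embodiments above use an optical see-through HMD, but the present invention is not limited to optical see-through HMDs and is also applicable to video see-through HMDs.
[0111]
〈Variation 4〉
The embodiments above are applied to an air hockey game, but the present invention is not limited to air hockey. Because the present invention captures the work of plural persons (for example, mallet operations) with a single camera means, it can reproduce that work in a single virtual space. The present invention is therefore also suitable for cooperative work that presupposes two or more operators (for example, an MR presentation of design work by several people, or a multi-player competitive game).
[0112]
The process of the present invention that corrects the head position and orientation on the basis of plural markers is not suitable only for cooperative work by plural persons. It is also applicable to a system that presents mixed reality to a single operator (or player).
[0113]
〈Other variations〉
The second embodiment uses two cameras, but three or more cameras may be used.
[0114]
As described above, it suffices that the player's camera 240 captures at least one marker. If there are too many markers, more markers are captured in the image, and the tracked marker is more likely to be misidentified in the identification process of step S420 in FIG. 16 or step S420' in FIG. 23. Therefore, if the task allows the movement of the camera 240 to be restricted to some extent, the number of markers may be reduced so that the camera always captures exactly one marker.
[0115]
The position and orientation detection apparatus of the embodiments above outputs the corrected viewing transformation matrix at the player's viewpoint position, but the present invention is not limited to this and is also applicable to an apparatus that outputs the corrected viewpoint position of the player in the form (X, Y, Z, r, p, φ). The markers may have any shape as long as the system can recognize them as markers or marks, and may be physical objects rather than marks.
[0116]
[Effects of the invention]
As described above, the present invention moderates abrupt changes in the correction value and eliminates abrupt changes (unnatural movement) of the three-dimensional virtual image. In addition, setting the constant α to an appropriate value prevents unnatural movement of objects caused by marker switching.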
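[Brief description of the drawings]
FIG. 1 is a diagram explaining the principle of camera position correction used in the prior art and applied to the embodiments of the present invention.
FIG. 2 is a side view showing the configuration of the game apparatus used in the embodiments of the present invention.
FIG. 3 is a diagram explaining the scene seen in the field of view of the left player in the game apparatus of FIG. 2.
FIG. 4 is a diagram explaining the configuration of the HMD used in the game apparatus of FIG. 2.
FIG. 5 is a diagram explaining the arrangement of the markers provided on the table of the game apparatus of FIG. 2.
FIG. 6 is a diagram explaining how the markers contained in the image captured by the camera mounted on the player's head change as the player moves over the table of FIG. 5.
FIG. 7 is a diagram explaining the configuration of the three-dimensional image generation apparatus for the game apparatus of the embodiment.
FIG. 8 is a flowchart explaining the processing procedure of the mallet position measurement unit of the embodiment.
FIG. 9 is a flowchart explaining a subroutine (local search) that forms part of the processing procedure of the mallet position measurement unit of the embodiment.
FIG. 10 is a flowchart explaining a subroutine (global search) that forms part of the processing procedure of the mallet position measurement unit of the embodiment.
FIG. 11 is a diagram explaining the division of the processing target area used in the process of the flowchart of FIG. 8.
FIG. 12 is a diagram showing how the target area used in the process of the flowchart of FIG. 8 is set.
FIG. 13 is a diagram explaining the configuration of the virtual game field in the game of this embodiment.
FIG. 14 is a flowchart explaining the control procedure of game management in the game state management unit of the embodiment.
FIG. 15 is a diagram explaining the technique for mallet detection.
FIG. 16 is a flowchart giving an overall explanation of the processing procedure of the correction processing unit in the embodiment.
FIG. 17 is a flowchart explaining in detail a part (marker tracking) of the flowchart of FIG. 16.
FIG. 18 is a flowchart explaining in detail a part (marker position prediction) of the flowchart of FIG. 16.
FIG. 19 is a diagram explaining the principle of detecting the reference marker used for correction.
FIG. 20 is a flowchart explaining the principle of detecting the reference marker.
FIG. 21 is a diagram showing the configuration of the HMD used in the second embodiment.
FIG. 22 is a block diagram showing the main configuration of the image processing system of the second embodiment.
FIG. 23 is a flowchart showing part of the control of the image processing system of the second embodiment.
FIG. 24 is a diagram explaining the transition of the reference marker applied in a variation of the embodiment.
FIG. 25 is a diagram explaining the principle of the marker search applied in a variation of the embodiment.
FIG. 26 is a diagram explaining the principle of the correction process of the first embodiment.
[0001]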
BACKGROUND OF THE INVENTION
The present invention relates to a technique for presenting to an operator a mixed reality in which, for example, a virtual image produced by computer graphics is merged with the real space.
[0002]
[Prior art]
In recent years, research on mixed reality (hereinafter "MR") aimed at the seamless merging of real space and virtual space has become active. MR aims at the coexistence of the real space with the world of virtual reality (hereinafter "VR"), which until now could be experienced only in isolation from the real space, and is attracting attention as a technique that augments VR.
[0003]
Expected applications of MR include medical assistance, in which the inside of a patient's body is presented to the doctor as if seen through, and work assistance, in which the assembly procedure of a product is displayed over the real object in a factory. These are new fields qualitatively quite different from conventional VR. What these applications require in common is a technique for removing the "deviation" between the real space and the virtual space. The deviation can be classified into positional deviation, temporal deviation, and qualitative deviation. Eliminating positional deviation (that is, registration) is the most basic of these requirements, and many efforts have been devoted to it.
[0004]
In video see-through MR, in which a virtual object is superimposed on video captured by a video camera, the registration problem reduces to the problem of accurately obtaining the three-dimensional position of the video camera. In optical see-through MR, which uses a semi-transparent HMD (head-mounted display), the registration problem is that of obtaining the three-dimensional position of the user's viewpoint. Three-dimensional position and orientation sensors such as magnetic sensors, ultrasonic sensors, and gyros are commonly used for these measurements, but their accuracy is not always sufficient, and their errors cause positional deviation.
[0005]
In the video see-through method, on the other hand, registration can also be performed directly on the image from image information, without such sensors. Because this approach handles the positional deviation directly, registration is accurate, but it suffers from problems such as a lack of real-time performance and reliability. In recent years, attempts have been reported to achieve accurate registration by combining a position and orientation sensor with image information so that each compensates for the other's weaknesses.
[0006]
As one such attempt, "Dynamic Registration Correction in Video-Based-Augmented Reality Systems" (Bajura Michael and Ulrich Neuman, IEEE Computer Graphics and Applications 15, 5, pp. 52-60, 1995) (hereinafter, the first document) proposed a method that uses image information to correct the positional deviation caused by magnetic sensor error in video see-through MR.
[0007]
"Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking" (State Andrei et al., Proc. of SIGGRAPH 96, pp. 429-438, 1996) (hereinafter, the second document) developed this method further and proposed using sensor information to resolve the ambiguity of position estimation from image information. In the second document, to eliminate the positional deviation that appears on the image because of sensor error when a video see-through MR presentation system is built with a position and orientation sensor alone, landmarks whose three-dimensional positions are known are placed in the real space. These landmarks serve as clues for detecting the positional deviation from image information.
[0008]
If the output of the position and orientation sensor contained no error, the coordinates of the landmark actually observed on the image (denoted Q_I) would coincide with the predicted observation coordinates of the landmark (denoted P_I) derived from the camera position obtained from the sensor output and the three-dimensional position of the landmark. In practice the camera position obtained from the sensor output is not accurate, so the landmark coordinates Q_I and the predicted observation coordinates P_I do not coincide. The deviation between P_I and Q_I represents the positional deviation between the virtual space and the real space at the landmark position, so the direction and magnitude of the deviation can be calculated by extracting the landmark position from the image.
[0009]
By quantitatively measuring the positional deviation on the image in this way, the camera position can be corrected so as to eliminate the deviation. The simplest registration scheme that combines an orientation sensor with an image is to correct the sensor error using a single landmark, and the first document proposes translating or rotating the camera position according to the positional deviation of the landmark on the image.
[0010]
FIG. 1 shows the basic idea of positional deviation correction using a single landmark. In the following, the internal parameters of the camera are assumed to be known and the image is assumed to be captured by an ideal imaging system free from distortion and similar effects. Let C be the viewpoint position of the camera, Q_I the observed coordinates of the landmark on the image, and Q_C the position of the landmark in the real space. The point Q_C then lies on the straight line l_Q that connects the point C and the point Q_I. From the camera position given by the position and orientation sensor, on the other hand, the landmark position P_C in the camera coordinate system and its observed coordinates P_I on the image can be estimated. Below, the three-dimensional vectors from the point C to the points Q_I and P_I are written v₁ and v₂, respectively. In this method, the positional deviation is corrected by modifying the relative position information of the camera and the object so that the corrected predicted observation coordinates P'_I of the landmark coincide with Q_I (that is, so that the corrected predicted landmark position P'_C in the camera coordinate system lies on the straight line l_Q).
[0011]
Consider correcting the positional deviation of the landmark by rotating the camera. This is achieved by modifying the camera position information so that the camera rotates by the angle θ between the two vectors v₁ and v₂. In the actual calculation, the normalized vectors v_1n and v_2n of v₁ and v₂ are used, and the camera is rotated about the point C with the outer product v_1n × v_2n as the rotation axis and the angle θ = arccos(v_1n · v_2n) given by their inner product as the rotation angle.
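The rotation correction can be sketched numerically as below. This is a sketch under assumptions rather than the method of the cited document: the helper names are hypothetical, the angle is taken as the arccosine of the inner product of the normalized vectors, and whether the rotation (or its inverse) is applied to the camera or to the scene is a sign convention that the text leaves open.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotation_correction(v1, v2):
    """Axis and angle between the directions v1 (C->Q_I) and v2 (C->P_I).

    Returns (unit axis, theta) with axis along v1n x v2n and
    theta = arccos(v1n . v2n). For (anti)parallel vectors the axis is
    undefined and None is returned in its place.
    """
    v1n, v2n = _normalize(v1), _normalize(v2)
    theta = math.acos(max(-1.0, min(1.0, _dot(v1n, v2n))))
    axis = _cross(v1n, v2n)
    if math.sqrt(_dot(axis, axis)) < 1e-12:
        return None, theta
    return _normalize(axis), theta


def rotate(v, axis, theta):
    """Rotate v about a unit axis by theta (Rodrigues' formula)."""
    k_x_v = _cross(axis, v)
    k_dot_v = _dot(axis, v)
    c, s = math.cos(theta), math.sin(theta)
    return tuple(v[i] * c + k_x_v[i] * s + axis[i] * k_dot_v * (1 - c)
                 for i in range(3))
```

With this convention, rotating the direction of v₁ about the returned axis by θ yields the direction of v₂, and rotating by −θ maps v₂ back onto v₁.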
[0012]
Next, consider correcting the positional deviation of the landmark by a relative translation of the camera position. This is achieved by translating the object position in the virtual world by v = n(v₁ − v₂), where n is a scale factor defined by the following equation.
[0013]
[Expression 1]
n = \frac{|C P_C|}{|C P_I|}
[0014]
Here |AB| denotes the distance between points A and B. The same correction can be made by modifying the camera position information so that the camera translates by −v, because this operation is equivalent to moving the virtual object by v relative to the camera. Both methods make the positions coincide two-dimensionally at the landmark; they do not correct the camera to the three-dimensionally correct position. When the sensor error is small, however, they can be expected to be sufficiently effective, and because the computational cost of the correction is very small they are well suited to real-time operation.
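The translation correction can be illustrated with a small numeric sketch. It assumes that the scale factor is n = |C P_C| / |C P_I|, the ratio of the predicted landmark's distance to the distance of its image-plane projection; this is a reconstruction, because the expression itself is missing from this text. The function name and sample coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def translation_correction(c, q_i, p_i, p_c):
    """Translation v = n (v1 - v2) that moves P_C onto the line C-Q_I.

    c: camera viewpoint C; q_i, p_i: observed and predicted landmark
    positions on the image plane; p_c: predicted landmark position in
    camera space. All are 3D points in the same coordinate system.
    """
    v1 = tuple(a - b for a, b in zip(q_i, c))
    v2 = tuple(a - b for a, b in zip(p_i, c))
    n = math.dist(c, p_c) / math.dist(c, p_i)   # n = |C P_C| / |C P_I|
    return tuple(n * (a - b) for a, b in zip(v1, v2))


# Hypothetical setup: image plane at z = 1, predicted landmark at depth 5.
C = (0.0, 0.0, 0.0)
Q_I = (0.2, 0.0, 1.0)
P_I = (0.1, 0.0, 1.0)
P_C = (0.5, 0.0, 5.0)
v = translation_correction(C, Q_I, P_I, P_C)
corrected = tuple(a + b for a, b in zip(P_C, v))
```

In this example the corrected point is 5 · Q_I, so it lies on the line through C and Q_I, which is the two-dimensional agreement the text describes.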
[0015]
[Problems to be solved by the invention]
However, because the method shown in the above document relies on the position of a single marker in the captured image, the marker must always be within the view of the camera, so that only a very limited space could be observed.
[0016]
Furthermore, when a plurality of workers share a common mixed reality space, the above constraint is fatal with only one marker.
[0017]
The present invention has been made in view of such a situation, and its object is to provide an information processing method and an information processing apparatus that can mitigate an abrupt change in a correction value and thereby eliminate unnatural movement of a three-dimensional virtual image caused by such an abrupt change.
[0018]
[Means for Solving the Problems]
In order to solve the above problems, an information processing method according to the present invention comprises: a first acquisition step of acquiring a captured image obtained by photographing, with an imaging device, a real space in which a plurality of markers are arranged; a second acquisition step of acquiring the position and orientation of an object in the real space; a detection step of detecting the position of a marker contained in the captured image; a first calculation step of calculating a first correction value for the position and orientation of the object using the detected marker position; a correction value acquisition step of acquiring a second correction value for the position and orientation of the object, calculated from a captured image of a frame preceding the frame of the captured image; a second calculation step of calculating a third correction value by correcting the first correction value using the second correction value; a position and orientation correction step of correcting the position and orientation of the object using the third correction value; a first generation step of generating a virtual image corresponding to the position and orientation of the object as corrected in the position and orientation correction step; and a second generation step of generating a composite space image by combining the virtual image and the captured image.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a system according to an embodiment in which the mixed reality presentation method and the HMD of the present invention are applied to an air hockey game apparatus will be described. Air hockey is a competitive game played against an opponent. Normally, compressed air is supplied from below to float a puck, the players hit the puck, and a player scores when the puck enters the opponent's goal; the player with the higher score wins. In the air hockey game to which the MR of the present embodiment is applied, a virtual puck is presented to the players as a virtual three-dimensional image superimposed on a table in the real environment, and the players hit the virtual puck back and forth with real mallets.
[0024]
The features of this game device are as follows.
(1): The real world shared by a plurality of workers is captured with a common camera, and the working actuators (mallets in this embodiment) operated by those workers are located in the common image, so that a single mixed reality world is presented and can be shared by multiple people.
(2): In order to accurately detect the viewpoint position of a worker who moves widely within a large real space, a camera is mounted on the worker's head in addition to the magnetic sensor that detects the position and posture of the head. This camera images at least one of a plurality of markers provided on the game table, and the position and posture of the head detected by the magnetic sensor (that is, the position and posture of the worker's viewpoint) are corrected based on the difference between the image coordinates of the imaged marker and the known position of that marker.
[0025]
<Configuration of game device>
FIG. 2 is a side view of the game device portion of the system of this embodiment. In the mixed reality air hockey game, two opponents 2000 and 3000 face each other, each holding a mallet (260L, 260R). The two opponents 2000 and 3000 wear head-mounted displays (hereinafter abbreviated as HMDs) 210L and 210R on their heads. The mallet of this embodiment has an infrared light emitter at its tip. As will be described later, in this embodiment the mallet position is detected by image processing. If the mallet has a distinctive shape or color, its position can also be detected by pattern recognition using those features.
[0026]
The HMD 210 of the embodiment is a see-through type as shown in FIG. Both opponents 2000 and 3000 can observe the surface of the table 1000 even while wearing the HMDs 210L and 210R. A three-dimensional virtual image is input to the HMD 210 from an image processing system described later. Accordingly, the opponents 2000 and 3000 see the three-dimensional image displayed on the display screen of the HMD 210 superimposed on the view of the real space through the optical system of the HMD 210 (not shown in FIG. 2).
[0027]
FIG. 3 shows the image seen by the left player 2000 through his or her HMD 210L. The two players 2000 and 3000 hit the virtual puck 1500 back and forth. The real mallet 260L (260R) held by the player 2000 (player 3000) is used to hit the puck 1500. Player 2000 holds the mallet 260L in his or her hand. A goal 1200R is seen just in front of the opponent player 3000. An image processing system (not shown in FIG. 3) described later generates three-dimensional CG and displays it on the HMD 210L so that the goal 1200R is seen in the vicinity of the opponent.
[0028]
Likewise, the opposing player 3000 sees the goal 1200L in the vicinity of the player 2000 via the HMD 210R. The puck 1500 is also generated by the image processing system described later and displayed on each HMD.
[0029]
<HMD with magnetic sensor>
FIG. 4 shows the configuration of the HMD 210. The HMD 210 is obtained by attaching a magnetic sensor 220, via a support column 221, to the main body of an HMD such as that disclosed in Japanese Patent Laid-Open No. 7-333551. In the figure, reference numeral 211 denotes an LCD display panel. Light from the LCD display panel enters the optical member 212, is reflected by the total reflection surface 214, is reflected by the total reflection surface of the concave mirror 213, and then passes through the total reflection surface 214 to reach the observer's eye.
[0030]
In this embodiment, the magnetic sensor 220 is a Fastrak magnetic sensor manufactured by Polhemus. Since the magnetic sensor is vulnerable to magnetic noise, it is separated from the display panel 211 and the camera 240, which are noise sources, by the plastic support column 221. The configuration shown in FIG. 4, in which the magnetic sensor and/or camera is attached to the HMD, is not limited to optical see-through (transparent) HMDs; the magnetic sensor and/or camera may equally be attached to a video see-through (shielded) HMD for the purpose of accurately detecting the head position and posture.
[0031]
In FIG. 2, each HMD 210 is fixed to the player's head by a band (not shown). A magnetic sensor 220 is fixed to each player's head, as shown in FIG. 4, and a CCD camera 240 (240L, 240R) is fixed as shown in FIG. The field of view of the camera 240 is set in the forward direction of the player. When an HMD including such a magnetic sensor 220 and camera 240 is used for an air hockey game, each player looks at the top surface of the table 1000, so the camera 240 also captures an image of the surface of the table 1000. The magnetic sensor 220 (220L, 220R) senses changes in the alternating magnetic field generated by the alternating magnetic field generation source 250.
[0032]
As will be described later, the image taken by the camera 240 is used to correct the position/posture of the head detected by the magnetic sensor 220. When the player looks obliquely downward at the surface of the table 1000, the field of view through the HMD 210 contains the surface of the table 1000, the virtual puck 1500, the real mallet 260 (260L, 260R), and the virtual goal 1200 (1200L, 1200R). When the player translates the head in the horizontal two-dimensional plane, or performs a tilting, yaw, or rolling motion, the change is first detected by the magnetic sensor 220, and it is also observed as a change in the image captured by the CCD camera 240 accompanying the change in posture. That is, the signal representing the head position from the magnetic sensor 220 is corrected by performing image processing on the image from the camera 240.
[0033]
<Multiple markers>
Each mallet 260 held by a player has an infrared light emitter at its tip, and the position (two-dimensional plane position) of each mallet on the table 1000 is determined by a CCD camera 230 that detects the infrared light from each mallet. That is, the camera 230 is for detecting the position of each player's hand (the position of the mallet). The progress of the hockey game can be determined by detecting the mallet positions.
[0034]
On the other hand, the CCD camera 240 outputs an image called a marker image. FIG. 5 shows an example of the markers arranged on the table 1000. In FIG. 5, the five landmarks, i.e., markers (1600 to 1604), indicated by ◯ are used for auxiliary detection of the head position of the player 2000, and the five landmarks, i.e., markers (1650 to 1654), indicated by □ are used for auxiliary detection of the head position of the player 3000. When a plurality of markers are arranged as shown in FIG. 5, which marker is visible depends on the position of the head, and particularly on its posture. In other words, by identifying the marker in the image captured by the CCD camera 240 attached to each player and detecting its position in the image, the output signal of the magnetic sensor that detects the player's head posture can be corrected.
[0035]
The circles and squares in FIG. 5 are used only for illustration; the marker shapes have no particular significance, and any shape may be used. The marker groups (1600 to 1608) and (1650 to 1658) assigned to the two players (2000, 3000) are painted in different colors. In the present embodiment, the markers for the left player (player #1) are red, and the markers for the right player (player #2) are green. This makes the markers easy to distinguish in the image processing. The markers may also be distinguished by shape or texture instead of color.
[0036]
A major feature of this embodiment is that a plurality of markers are arranged. Arranging a plurality of markers ensures that at least one marker falls within the field of view of the CCD camera 240 as long as the player plays at the table 1000 within the range of motion of the air hockey game. FIG. 6 shows how the image processing range for detecting a marker moves as the player moves his or her head in various ways. As shown in the figure, at least one marker is included in each image. In other words, the number of markers, the interval between markers, and so on should be set according to the size of the table 1000, the viewing angle of the camera 240, and the extent of the player's range of movement given the nature of the game. In the case of FIG. 5, the farther a region is from the player, the wider the range that falls within the field of view, so the interval between markers must be made larger there. Making the in-image distance between nearby markers about the same as the in-image distance between distant markers keeps the number of markers captured in an image of the far field low, which prevents a decrease in marker detection accuracy. In this way, the density of markers captured in the image is substantially the same for distant and nearby markers, and unnecessarily many markers are not imaged in the same frame.
[0037]
As will be described later, in this system, it is sufficient that at least one marker exists in the image obtained by the camera 240L (240R) and that the marker can be specified. Accordingly, there is no need to keep tracking a specific marker while the player moves his head (while moving the camera 240).
[0038]
<MR image generation system>
FIG. 7 shows the configuration of the three-dimensional image generation/presentation system in the game apparatus shown in FIG. This image generation/presentation system outputs three-dimensional virtual images (the puck 1500 and the goals 1200 in FIG. 3) to the respective display devices of the HMD 210L of the left player 2000 and the HMD 210R of the right player 3000. The left and right parallax images for the three-dimensional virtual image are generated by the image generation units 5050L and 5050R. In this embodiment, an “ONYX2” computer system manufactured by Silicon Graphics, Inc. is used for each of the image generation units 5050.
[0039]
The image generation unit 5050 receives the puck position information and the like generated by the game state management unit 5030, and the information on the corrected viewpoint position and head direction generated by the two correction processing units 5040L and 5040R. Each of the game state management unit 5030 and the correction processing units 5040L and 5040R is configured by an ONYX2 computer system.
[0040]
The CCD camera 230 fixed above the center of the table 1000 places the entire surface of the table 1000 in the field of view. Mallet information acquired by the camera 230 is input to the mallet position measurement unit 5010. The measurement unit 5010 is similarly configured by an “O2” computer system manufactured by Silicon Graphics. The measurement unit 5010 detects the mallet position of two players, that is, the hand position. Information regarding the position of the hand is input to the game state management unit 5030, where the game state is managed. That is, the game state / game progress is basically determined by the position of the mallet.
[0041]
The position/orientation detection unit 5000, configured by an O2 computer system manufactured by Silicon Graphics, receives the outputs of the two magnetic sensors 220L and 220R (the position and orientation of the sensor 220 itself), detects the viewpoint position (X, Y, Z) and posture (p, r, φ) at the cameras (240L, 240R), and outputs them to the correction processing units 5040L, 5040R.
[0042]
On the other hand, the CCD cameras 240L and 240R fixed to the heads of the players acquire marker images, which are processed by the marker position detection units 5060L and 5060R, respectively, to detect the position of the tracking marker for each player that lies within the field of view of the respective camera 240. Information regarding the tracking marker position is input to the correction processing unit 5040 (5040L, 5040R).
[0043]
The marker position detection unit 5060 (5060L, 5060R) for tracking the marker is configured by an O2 computer system.
[0044]
<Mallet position measurement>
FIGS. 8 to 10 are flowcharts showing the control procedure for measuring the mallet position. By tracking the mallets with one common camera, a common mixed reality can be presented to a plurality of workers. The measurement of the mallet position according to this embodiment will be described with reference to the flowcharts of FIGS.
[0045]
In an air hockey game, a player does not advance his mallet to the area of another player. Therefore, the process of searching for the mallet 260L (260R) of the left player 2000 (right player 3000) may be performed by concentrating the process on the image data IL (image data IR) in the left field as shown in FIG. It is easy to divide the image acquired by the CCD camera 230 at the fixed position into two areas as shown in FIG.
[0046]
Accordingly, in the flowchart of FIG. 8, the mallet 260L of player #1 (player 2000) is searched for in step S100, and the mallet 260R of player #2 (player 3000) is searched for in step S200. For convenience, the search for the right player's mallet (step S200) is described as an example.
[0047]
First, in step S210, a multi-valued image of the surface of the table 1000 captured by the TV camera 230 is acquired. In step S212, the subroutine “search in the local area” is applied to the right half IR of the multi-valued image data. Details of the “search in the local area” process are shown in FIG. If the coordinates (x, y) of the mallet position in the image coordinate system are found in step S212, the process proceeds from step S214 to step S220, where the mallet position coordinates (x, y) in the image coordinate system are converted into the coordinate position (x′, y′) in the coordinate system of the table 1000 according to the following equation (see FIG. 13).
[0048]
[Expression 2]

[0049]
Here, the matrix M_T is a known 3 × 3 transformation matrix that calibrates the image coordinate system against the table coordinate system. The coordinate position (x′, y′) obtained in step S220 (indicated as the “hand position” in FIG. 3) is sent to the game state management unit 5030. If no mallet is found in the local area, a “search in the global area” is performed in step S216. If a mallet is found by the “search in the global area”, its coordinate position is converted into the table coordinate system in step S220. Note that the coordinate position found in the local or global area is used for the local-area mallet search in the next frame.
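As an illustration of the conversion in step S220, the following sketch applies a 3 × 3 matrix to an image point. It assumes, and this is only an assumption because the equation itself is not reproduced here, that the matrix acts on homogeneous coordinates (x, y, 1) and that the result is divided by its third component. The function name `image_to_table` is hypothetical and NumPy is assumed.

```python
import numpy as np

def image_to_table(x, y, M_T):
    """Map image coordinates (x, y) to table coordinates (x', y').

    Assumption: M_T is a 3x3 matrix applied to the homogeneous point
    (x, y, 1), followed by division by the third component.
    """
    hx, hy, h = np.asarray(M_T, dtype=float) @ np.array([x, y, 1.0])
    if abs(h) < 1e-12:
        raise ValueError("point maps to infinity under M_T")
    return hx / h, hy / h
```

Because of the final division, scaling M_T by any nonzero constant leaves the result unchanged, which is why a calibration of this kind is usually determined only up to scale.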
[0050]
FIG. 9 shows the process of searching for the mallet in the local area (details of step S212). For convenience, the search process for the right field is shown, but the mallet search process for the left field is substantially the same. In step S222, a rectangular area of (2A + 1) × (2B + 1) pixels defined by the following equation is extracted.
[0051]
[Equation 3]

Here, I′_x and I′_y in the above equation are the coordinate values of the mallet detected in the previous frame, A and B are constants that determine the size of the search area, and the search area is as shown in FIG.
[0052]
Step S230 extracts, from all the pixels (x, y) in the rectangular area defined in step S222, those pixels whose feature evaluation value I_S(x, y) satisfies a certain condition. For the purpose of searching for the mallet, the feature value is preferably the similarity of the pixel value (the infrared light intensity value). In this embodiment, since an infrared light emitter is used for the mallet, a pixel exhibiting the characteristic infrared light intensity is judged to belong to the mallet.
[0053]
That is, step S232 finds pixels whose similarity I_S exceeds a predetermined threshold, that is, pixels that closely resemble the mallet. When such a pixel is found, the number of occurrences is accumulated in the counter N. Further, the x coordinate value and the y coordinate value of such a pixel are accumulated in the registers SUMx and SUMy. That is,
[0054]
[Expression 4]

When step S230 is completed, the number N of all pixels in the region of FIG. 12 that resemble the infrared light pattern from the mallet, and the cumulative values SUMx and SUMy of their coordinate values, have been obtained. If N = 0, the result “Not Found” is output in step S236. If N > 0, something resembling a mallet has been found, and in step S238 the mallet position (I_x, I_y) is calculated according to
[0055]
[Equation 5]

The calculated mallet position (I_x, I_y) is converted into the table coordinate system in step S220 (FIG. 8), and the converted value is passed to the management unit 5030 as a signal representing the “hand position”. FIG. 10 shows the detailed procedure of the global area search in step S216. In step S240 of FIG. 10, in the image IR of the right field,
[0056]
[Formula 6]

the maximum feature value I_S found among the pixels satisfying the above condition is stored in the register Max. Here, C and D are constants that determine the coarseness of the search, and the definitions of Width and Height are shown in FIG. That is, in step S242 it is checked whether the feature amount I_S exceeds the threshold value stored in the threshold value storage register Max. If such a pixel is found, then in step S244 that feature amount is set as the new threshold by
[0057]
[Expression 7]

In step S246, the coordinates (I_x, I_y) of the pixel that most resembles the mallet are passed to step S220.
[0058]
In this way, the mallet is found in the image, and the coordinate value converted into the table coordinate system is passed to the game state management unit 5030.
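The local and global searches of FIGS. 9 and 10 can be sketched roughly as follows. This is a simplified sketch under assumptions, not the procedure of the flowcharts themselves: the feature values I_S are assumed to be given as a 2-D NumPy array indexed [y, x], the local search returns the centroid SUM/N of the pixels above the threshold inside the (2A + 1) × (2B + 1) window, and the global search samples every C-th column and D-th row starting from pixel 0 (the exact sampling set is an assumption). The names `local_search` and `global_search` are hypothetical.

```python
import numpy as np

def local_search(feature, prev_x, prev_y, A, B, threshold):
    """Centroid of above-threshold pixels in the window around the previous
    position; returns None when nothing is found ("Not Found")."""
    height, width = feature.shape
    x0, x1 = max(prev_x - A, 0), min(prev_x + A, width - 1)
    y0, y1 = max(prev_y - B, 0), min(prev_y + B, height - 1)
    window = feature[y0:y1 + 1, x0:x1 + 1]
    ys, xs = np.nonzero(window > threshold)
    n = len(xs)                                    # counter N
    if n == 0:
        return None
    return float(x0 + xs.sum() / n), float(y0 + ys.sum() / n)   # SUMx/N, SUMy/N

def global_search(feature, C, D, threshold):
    """Coarse scan of the whole image: best sampled pixel above threshold."""
    sampled = feature[::D, ::C]                    # every D-th row, C-th column
    iy, ix = np.unravel_index(np.argmax(sampled), sampled.shape)
    if sampled[iy, ix] <= threshold:
        return None
    return int(ix) * C, int(iy) * D
```

A caller would try `local_search` first, fall back to `global_search` when it returns None, and feed whichever position is found into the next frame's local search, mirroring steps S212 to S216.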
[0059]
<Game state management>
FIG. 13 shows the game field of the air hockey game of the present embodiment. This field is defined on a two-dimensional plane on the table 1000 and has x and y axes. It also has two virtual goal lines 1200L and 1200R on the left and right, and virtual walls 1300a and 1300b provided in the vertical direction of FIG. The virtual goal lines 1200L and 1200R and the virtual walls 1300a and 1300b have known coordinate values and do not move. In this field, the virtual image of the puck 1500 moves in accordance with the movement of the mallets 260R and 260L.
[0060]
The puck 1500 has coordinate information P_p of its current position and velocity information v_p; the left mallet 260L has coordinate information P_SL of its current position and velocity information v_SL; and the right mallet 260R has coordinate information P_SR of its current position and velocity information v_SR. FIG. 14 is a flowchart for explaining the processing procedure in the game state management unit 5030.
[0061]
In step S10, the initial position P_p0 and the initial velocity v_p0 of the puck 1500 are set. The puck moves at a constant velocity v_p. When the puck hits a wall or a mallet, it undergoes a perfectly elastic collision, that is, the direction of its velocity is reversed. The game state management unit 5030 obtains the velocity information v_S from the position information P_S of each mallet measured by the mallet position measurement unit 5010.
[0062]
Step S12 is executed every Δt until the outcome of the game is decided (until one of the players is first to reach three points, as checked in step S50). In step S12, the position of the puck is updated to
[0063]
[Equation 8]

The puck position after the initial position and initial velocity have been set is generally represented by
[0064]
[Equation 9]

In step S14, it is checked whether the updated puck position P_p is in the field on the player #1 (left player) side. The case where the puck 1500 is on the left player's side is described first.
[0065]
In step S16, it is checked whether the current puck position is at a position where it interferes with the left player's mallet 1100L. The puck 1500 being at a position where it interferes with the mallet 1100L means that the left player 2000 has operated the mallet 260L so as to strike the puck. In that case, the sign of the x-direction velocity component v_px of the velocity v_p of the puck 1500 is inverted, and the process proceeds to step S20.
[0066]
Note that, instead of simply inverting the sign of the x-direction velocity component v_px of the velocity v_p,
[0067]
[Expression 10]

may be used, so that the puck rebounds in the opposite direction with the mallet's x-direction operating velocity v_SLx superimposed on the puck's x-direction velocity v_px. On the other hand, if the current puck position is not at a position where it interferes with the left player's mallet 1100L (NO in step S16), the process proceeds directly to step S20.
[0068]
In step S20, it is checked whether the puck position P_p is at a position where it collides with the virtual wall 1300a or 1300b. If the determination in step S20 is YES, the y component of the puck velocity is reversed in step S22. Next, in step S24, it is checked whether the current puck position is within the goal line of the left player. If YES, a point is added in step S26 to the score of the opposing player, that is, the right (#2) player. In step S50, it is checked whether either player has been first to reach 3 points. If so, the game ends.
[0069]
If the determination in step S14 finds that the puck position P_p is on the right player's side (player #2 side), step S30 and the subsequent steps are executed. Steps S30 to S40 are substantially the same in operation as steps S16 to S26. The progress of the game is managed in this way. The game progress state consists of the puck position and the mallet positions, and is input to the image generation unit 5050 (5050L, 5050R) as described above.
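The game state management of FIG. 14 can be sketched as a single update step. The following is a minimal illustration under several assumptions that do not come from the text: interference with a mallet is judged by a simple distance test against a hypothetical `hit_radius`, the field is split at the midpoint between the goal lines, and the puck is returned to the centre of the field after a goal so that the sketch does not score repeatedly. The names `GameState` and `step` are likewise hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class GameState:
    px: float                      # puck position P_p (x)
    py: float                      # puck position P_p (y)
    vx: float                      # puck velocity v_p (x)
    vy: float                      # puck velocity v_p (y)
    score: list = field(default_factory=lambda: [0, 0])   # [left, right]

def step(s, dt, mallets, hit_radius, wall_y, goal_x):
    """Advance the puck by dt; return True once a player has 3 points."""
    s.px += s.vx * dt              # constant-velocity position update
    s.py += s.vy * dt
    mid_x = 0.5 * (goal_x[0] + goal_x[1])
    side = 0 if s.px < mid_x else 1             # whose half the puck is in
    mx, my = mallets[side]
    if math.hypot(s.px - mx, s.py - my) < hit_radius:
        s.vx = -s.vx               # mallet hit: reverse x velocity
    if s.py <= wall_y[0] or s.py >= wall_y[1]:
        s.vy = -s.vy               # wall hit: reverse y velocity
    if s.px <= goal_x[0]:
        s.score[1] += 1            # left goal line crossed: right player scores
        s.px, s.py = mid_x, 0.5 * (wall_y[0] + wall_y[1])   # assumed re-serve
    elif s.px >= goal_x[1]:
        s.score[0] += 1            # right goal line crossed: left player scores
        s.px, s.py = mid_x, 0.5 * (wall_y[0] + wall_y[1])   # assumed re-serve
    return max(s.score) >= 3
```

Calling `step` once per Δt with the latest mallet positions in table coordinates reproduces the order of checks described above: mallet, wall, goal, then the three-point end condition.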
[0070]
<Head position correction>
FIG. 16 shows the overall control procedure of the processing in the correction processing unit 5040 (5040L, 5040R). Because the output of the magnetic sensor 220 contains an error, the correction performed by the correction processing unit 5040 corrects the viewpoint position data and head posture data, which the measurement unit 5000 calculates from that output, based on the marker position in the image obtained from the CCD camera 240. That is, this correction process obtains a correction value for the position of the camera 240 (which is closely related to the position of the head) from the marker position in the image acquired by the camera 240, and uses that correction value to change the viewing transformation matrix of the viewpoint. The changed viewing transformation matrix represents the corrected viewpoint position and posture data. In other words, the corrected viewing transformation matrix yields a virtual image as seen from the corrected viewpoint position.
[0071]
FIG. 26 explains the principle of the correction of the viewpoint position/posture of the observer in the first embodiment. Here, correcting the viewpoint position/posture of the observer is equivalent to obtaining a corrected viewing transformation matrix. In FIG. 26, assume that the camera 240 of the player captures the marker 1603 in the image 300. Let the position of the marker 1603 in the image coordinate system of the image 300 be, for example, (x₀, y₀). On the other hand, if the marker captured in the image 300 is known to be the marker 1603, its coordinates (X₀, Y₀, Z₀) in the world coordinate system are known. Since (x₀, y₀) is an image coordinate value and (X₀, Y₀, Z₀) is a world coordinate value, these coordinates cannot be compared directly. In the first embodiment, the viewing transformation matrix M_C of the camera 240 is obtained from the output of the magnetic sensor 220, and the world coordinates (X₀, Y₀, Z₀) are converted into coordinates (x′₀, y′₀) in the image coordinate system using this viewing transformation matrix M_C. The difference between (x₀, y₀) and (x′₀, y′₀) then expresses the error in the output of the magnetic sensor 220, so a correction matrix ΔM_C for correcting this error is obtained.
[0072]
As is clear from FIG. 26, it is necessary to identify the marker captured in the image 300 as the marker 1603. In the first embodiment, as described later, the three-dimensional positions of all the markers in the world coordinate system are converted into the image coordinate system by the viewing transformation matrix M_C, and the marker whose converted coordinate value is closest to (x₀, y₀) is identified as the marker captured in the image 300. This process will be described with reference to FIGS.
[0073]
The processing procedure of the correction processing unit 5040 will be described in detail with reference to FIG. In step S400, the viewing transformation matrix (4 × 4) of the camera 240 is calculated based on the output of the magnetic sensor 220. In step S410, the coordinates (in the image coordinate system) at which each marker should be observed are predicted from the viewing transformation matrix obtained in step S400, the ideal perspective transformation matrix (known) of the camera 240, and the three-dimensional position (known) of each marker.
[0074]
On the other hand, the marker position detection unit 5060 (5060L, 5060R) tracks the marker in the image obtained from the camera 240 (240L, 240R) attached to the head of the player, and passes the detected marker position to the correction processing unit 5040 (in step S420). In step S420, the correction processing unit 5040 (5040L, 5040R) determines, based on the marker position information passed to it, which marker is currently being observed, that is, the marker serving as the reference for correction. In step S430, a correction matrix ΔM_C for correcting the position and orientation of the camera 240 detected by the magnetic sensor 220 is obtained based on the difference between the predicted coordinate value of the marker calculated in step S410 and the observed coordinate value of the marker detected by the marker position detection unit 5060. The position and orientation of the camera 240 can be corrected in this way because the marker coordinates detected by the marker position detection unit 5060 (those of the marker 1603 in the example of FIG. 26) and the marker coordinates predicted from the head position detected by the magnetic sensor should coincide if the sensor output were accurate; the difference calculated in step S430 therefore reflects the error of the magnetic sensor 220. The relative relationship between the position and orientation of the camera and the position and orientation of the viewpoint is known, and is expressed as a three-dimensional coordinate transformation. Accordingly, in step S440, the viewing transformation matrix of the viewpoint calculated in step S400 is corrected based on the correction matrix ΔM_C for the position and orientation of the camera, and the corrected transformation matrix is passed to the image generation unit 5050 (5050L, 5050R).
[0075]
FIG. 17 shows the processing procedure for marker position detection in the marker position detection unit 5060. In step S500, the color image acquired by the camera 240 is captured. Thereafter, a “local area search” is performed in step S502 and a “global area search” in step S506, and the marker position (x, y) expressed in the image coordinate system is detected. The “local area search” in step S502 and the “global area search” in step S506 are substantially the same as the “local area search” (FIG. 9) and the “global area search” (FIG. 10) of the mallet search; those procedures are therefore reused, and their illustration is omitted. However, as the feature quantity I_S for the marker search in the reused control procedure (step S232), for player #1 (left), the following quantity computed from the pixel values of the pixel of interest is used:
[0076]
[Expression 11]

Since red is used for the markers (1600 to 1604) of player #1, this feature quantity represents the degree of redness. Since player #2 (right) uses green markers (1650 to 1654), the following quantity is used:
[0077]
[Expression 12]

These two quantities are also used as the feature quantity I_S(x, y) in the global search. In step S510, the marker coordinate values obtained in steps S502 and S506 are converted into an ideal, distortion-free image coordinate system using a distortion-correcting matrix M (for example, of size 3 × 3). The conversion formula at this time is
[0078]
[Formula 13]

Next, details of the processing in step S410 in FIG. 16 will be described with reference to FIG. As described above, in step S400, the transformation matrix M_C from the world coordinate system to the camera coordinate system (a 4 × 4 viewing transformation matrix) is obtained. The transformation matrix P_C (4 × 4) from the camera coordinate system to the image coordinate system is also given as a known value, as is the three-dimensional coordinate position (X, Y, Z) of the marker of interest.
[0079]
That is, let the angle r be the rotation in the Z-axis direction at the position of the camera 240, the angle p the rotation in the X-axis direction at the position of the camera 240, the angle φ the rotation (yaw) in the Z-axis direction at the position of the camera 240, and (X₀, Y₀, Z₀) the position of the camera 240. Then the viewing transformation matrix M_C of the camera 240 (that is, the transformation matrix from the world coordinate system to the camera coordinate system) is
[0080]
[Expression 14]

Letting d be the focal length of the camera 240, w the width of the imaging surface of the camera, and h its height, the transformation matrix P_C from the camera coordinate system to the image coordinate system is
[0081]
[Expression 15]

Accordingly, in step S520 of FIG. 18 (that is, step S410 of FIG. 16), the coordinate position (X, Y, Z) of the marker of interest is converted into the position (x_h, y_h, z_h) by the following equation.
[0082]
[Expression 16]

[0083]
In step S522, the predicted observation coordinate values x and y of the marker in the image coordinate system are obtained as
[0084]
[Expression 17]

Thus, in step S410, the predicted observation coordinate values (x_i, y_i) of each marker M_i can be obtained. Next, the “marker discrimination” process of step S420 will be described. FIG. 19 shows a case where the camera 240 of one player has acquired the image 600 of the table 1000.
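The prediction of steps S520 and S522 described above can be sketched as follows. This is only an illustration under stated assumptions: the matrices of Expressions 14 and 15 are not reproduced here, so the example uses an identity viewing matrix and a hypothetical pinhole matrix `P_C` with focal length `FOCAL`, and it assumes that the image coordinates are obtained by dividing x_h and y_h by z_h. The function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def predict_marker_image_position(m_c, p_c, marker_xyz):
    """Predict where a known marker should appear in the image.

    m_c: 4x4 viewing transformation (world -> camera), as obtained in step S400.
    p_c: 4x4 transformation (camera -> image), assumed to be known.
    marker_xyz: known world position (X, Y, Z) of the marker of interest.
    """
    X, Y, Z = marker_xyz
    cam = mat_vec(m_c, [X, Y, Z, 1.0])       # world -> camera coordinates
    x_h, y_h, z_h, _ = mat_vec(p_c, cam)     # homogeneous image coordinates (S520)
    if z_h == 0:
        raise ValueError("marker lies in the camera plane and cannot be projected")
    return x_h / z_h, y_h / z_h              # assumed normalisation (S522)


# Hypothetical example values, not taken from the specification.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
FOCAL = 2.0
P_C = [
    [FOCAL, 0.0, 0.0, 0.0],
    [0.0, FOCAL, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
x_pred, y_pred = predict_marker_image_position(IDENTITY, P_C, (1.0, 0.5, 4.0))
```

Running the prediction once per known marker M_i yields the positions P_i used in the marker discrimination that follows.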
[0085]
The markers provided on the table 1000 are, for example, M₁ to M₇, and are represented by triangles (△) in the figure. The three-dimensional position of each marker M_i is known. The image 600 contains the markers M₂, M₃, M₆, and M₇. The predicted observation position of each marker M_i is obtained in step S520 and is denoted P_i. Q denotes the marker position detected by, and passed from, the marker position detection unit 5060.
[0086]
In the “marker discrimination” of step S420, it is determined which P_i (that is, which M_i) the marker position Q detected by the marker position detection unit 5060 corresponds to. In FIG. 19, e_i denotes the length of the vector from the detected marker position Q to the predicted position P_i of each marker, that is, the distance between them. Details of step S420 are shown in FIG. 20. That is, the process of FIG. 20 searches the markers i (i = 0 to n) that fall within the image 600 for the one whose distance e_i is smallest, and outputs the identifier i of that marker. That is,
[0087]
[Formula 18]
e_min = min_i e_i   (i = 0, …, n)
In the example of FIG. 19, the distance e₂ to P₂ is the shortest, so the marker M₂ is used as the data for correcting the magnetic sensor output. Thus, no matter how the player moves within the range of activity (field), the camera 240 captures at least one marker in the image. It is therefore no longer necessary to restrict the field to a small size as in the past.
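The marker discrimination of step S420 amounts to a nearest-neighbor search over the predicted positions. The following is a minimal sketch of that idea; the function name, the dictionary layout, and the pixel coordinates are hypothetical and chosen only to mirror the situation of FIG. 19.

```python
import math


def discriminate_marker(q, predicted):
    """Return (identifier, distance) of the predicted position closest to q.

    q: detected marker position (x, y) in image coordinates.
    predicted: dict mapping marker identifier i -> predicted position P_i,
               restricted to the markers expected to fall inside the image.
    """
    if not predicted:
        raise ValueError("no predicted marker positions inside the image")
    best = min(predicted, key=lambda i: math.dist(q, predicted[i]))
    return best, math.dist(q, predicted[best])


# Hypothetical predicted positions for the markers M2, M3, M6 and M7.
predicted_positions = {2: (100.0, 80.0), 3: (220.0, 90.0),
                       6: (110.0, 200.0), 7: (230.0, 210.0)}
marker_id, e_min = discriminate_marker((104.0, 83.0), predicted_positions)
```

Here the detected position lies closest to the prediction for marker 2, so that marker would be used to correct the sensor output.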
[0088]
Next, in step S430, in the same manner as described with reference to FIG. 1, the transformation matrix ΔM_C representing the correction of the camera position and orientation is obtained on the basis of the error distance e_min. Meanwhile, in step S432, the viewing transformation matrix M_V at the viewpoint position of the player is obtained on the basis of the magnetic sensor output. Letting M_VC be the (known) transformation matrix from the camera coordinate system to the viewpoint coordinate system, in step S440 the corrected viewing transformation matrix M_V′ of the viewpoint is derived using this M_VC by the following equation.
[0089]
[Equation 19]

[0090]
As is apparent from FIG. 26 and from the second embodiment described later, although the first embodiment (the process of FIG. 16) obtains the error distance e in the image coordinate system, the corrected viewing transformation matrix of the viewpoint can be obtained in the same manner when the error distance e is obtained in the world coordinate system.
[0091]
<Improvement of head position detection accuracy> ... Second Embodiment
In the first embodiment, the HMD 210L (210R) is provided with one camera 240L (240R) for monitoring the area in front of the player. The processing unit 5060 processes the image of the markers on the table 1000 acquired by the camera 240, identifies the marker in the image (step S420), and determines the posture of the player's head, that is, the posture of the camera attached to the head; in other words, a matrix representing the viewing transformation by a camera having this posture is determined. However, because the first embodiment uses only the error in the image coordinate system, a three-dimensional discrepancy remains in the positional relationship between the camera and the marker.
[0092]
Further, depending on the application that presents the mixed reality, markers may be placed at arbitrary positions in three-dimensional space. In such a case, the marker identification method of the first embodiment shown in FIG. 16 becomes less reliable. The second embodiment proposed next solves this problem of three-dimensional discrepancy: two cameras are attached to each player, and marker detection is performed in the world coordinate system. The second embodiment also relaxes the constraint that the markers must be placed on a plane.
[0093]
Specifically, an HMD equipped with two cameras, left and right, is used for each of the two players. That is, as shown in FIG. 21, two cameras 240LR and 240LL (240RR and 240RL) are mounted on the HMD 210L (210R) of the player 2000 (3000), and the posture of the cameras 240LR and 240LL (240RR and 240RL) is corrected from the stereo images obtained from these cameras.
[0094]
Note that the system of the second embodiment can cope with markers arranged three-dimensionally; however, to make the difference from the processing procedure of the first embodiment clear, it is applied, as in the first embodiment, to an air hockey game using a plurality of markers arranged on a plane. FIG. 22 shows part of the image processing system according to the second embodiment, namely the part changed from the image processing system of the first embodiment (FIG. 7). Comparing FIG. 7 with FIG. 22, the image processing system of the second embodiment differs from the first embodiment in that each player is provided with two cameras and in that it has a marker position detection unit 5060L′ (5060R′) and a correction processing unit 5040L′ (5040R′). The marker position detection unit 5060L′ (5060R′) and the correction processing unit 5040L′ (5040R′) of the second embodiment differ from the marker position detection unit 5060L (5060R) and the correction processing unit 5040L (5040R) of the first embodiment only in software processing.
[0095]
FIG. 23 shows, of the processing procedure of the second embodiment, the control procedure for the left player 2000. It corresponds to the control procedure of FIG. 16 of the first embodiment and illustrates the cooperation of the marker position detection unit 5060′, the position/posture detection unit 5000, and the correction processing unit 5040L′. In FIG. 23, the position/posture detection unit 5000, which is the same as in the first embodiment, calculates the viewing transformation matrix of the viewpoint on the basis of the output of the magnetic sensor 220L in step S398. In step S400′, the inverse of the viewing transformation matrix of the camera 240LR is calculated on the basis of the output of the magnetic sensor 220L. This transformation matrix is sent to the correction processing unit 5040′.
[0096]
The images from the two cameras 240LL and 240LR are sent to the marker position detection unit 5060L′. In step S402, the detection unit 5060′ extracts the marker image m_R from the image R from the right camera 240LR. The coordinates of the extracted marker (that is, the observation coordinates) are denoted I_mR. In step S404, the detection unit 5060′ extracts the corresponding marker image m_L from the image L from the left camera 240LL. The coordinates of the extracted marker are denoted I_mL. Because the marker images m_R and m_L originate from the same marker m_X, in step S406 the three-dimensional position C_m of the observed marker in the coordinate system of the camera 240LR is derived from the pair of observed marker coordinates (I_mR, I_mL) on the principle of triangulation.
[0097]
In step S404, the corresponding point of the marker image m_L is searched for using a general stereo matching method; to speed up the processing, the search range may be limited using the well-known epipolar constraint. Steps S410′, S420′, S422, and S430′ of FIG. 23 show the processing in the correction processing unit 5040L′.
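The triangulation of step S406 can be illustrated with the simplest stereo geometry. The sketch below is not the procedure of the specification; it assumes an ideal rectified pair (parallel optical axes, equal focal length, image coordinates measured from the principal point, left camera displaced from the right camera by `baseline` along the negative x axis), so that the epipolar constraint reduces to a search along the same image row. All names and numbers are hypothetical.

```python
def triangulate_rectified(i_mr, i_ml, focal_length, baseline):
    """Recover the marker position C_m in the right-camera coordinate system.

    i_mr, i_ml: observed marker coordinates (x, y) in the right and left images.
    focal_length: focal length in pixels (assumed equal for both cameras).
    baseline: distance between the two camera centers.
    """
    x_r, y_r = i_mr
    x_l, _ = i_ml
    disparity = x_l - x_r                    # positive for a point in front of the cameras
    if disparity <= 0:
        raise ValueError("non-positive disparity; the points do not correspond")
    z = focal_length * baseline / disparity  # depth from similar triangles
    return (x_r * z / focal_length, y_r * z / focal_length, z)


# Hypothetical observation: 500-pixel focal length, 6 cm baseline.
c_m = triangulate_rectified((50.0, 20.0), (80.0, 20.0), 500.0, 0.06)
```

With real, non-parallel cameras the two viewing rays would instead be intersected using the calibrated relative pose of the cameras.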
[0098]
First, in step S410′, the three-dimensional position C_m of the observed marker in the camera coordinate system is converted into the three-dimensional position W_m in the world coordinate system using the transformation matrix derived in step S400′. In step S420′, the three-dimensional positions W_mi (known) of all markers m_i in the world coordinate system are read from a predetermined memory, and the W_mi that minimizes the Euclidean distance |W_mi − W_m| between each marker m_i and the observed marker m_X is determined. That is, the known marker closest to the observed marker m_X is identified.
[0099]
W_mi and W_m are essentially the same position, but an error vector D (corresponding to e in the first embodiment) arises because of the sensor error. Therefore, in step S420′ the marker whose coordinate value W_mi is closest to the three-dimensional coordinates (world coordinates) of the observed (tracked) marker is determined, and in step S430′ the difference vector D between the observed marker and the determined marker is obtained by
[0100]
[Expression 20]
D = W_mi − W_m
and the transformation matrix ΔM_C that moves the camera position by this vector amount is obtained. In step S440′, the viewing transformation matrix of the viewpoint is corrected by the same method as in the first embodiment.
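Steps S410′ to S430′ described above can be sketched as follows. This is an illustration only: the function names and sample coordinates are hypothetical, the camera-to-world matrix stands in for the inverse viewing transformation of step S400′, and the sign convention D = W_mi − W_m is an assumption.

```python
import math


def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def correction_vector(cam_to_world, c_m, known_markers):
    """Identify the known marker nearest to the observed one and return (id, D).

    cam_to_world: 4x4 inverse viewing transformation (camera -> world), step S400'.
    c_m: observed marker position in camera coordinates (from triangulation).
    known_markers: dict mapping marker identifier -> known world position W_mi.
    """
    w_m = tuple(mat_vec(cam_to_world, [*c_m, 1.0])[:3])       # step S410'
    marker_id = min(known_markers,
                    key=lambda i: math.dist(known_markers[i], w_m))  # step S420'
    w_mi = known_markers[marker_id]
    d = tuple(a - b for a, b in zip(w_mi, w_m))                # step S430' (assumed sign)
    return marker_id, d


# Hypothetical data: the camera frame is the world frame shifted by 1 along x.
CAM_TO_WORLD = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
KNOWN = {1: (0.0, 0.0, 2.0), 2: (1.25, 0.0, 2.0), 3: (3.0, 0.0, 2.0)}
nearest_id, d_vec = correction_vector(CAM_TO_WORLD, (0.0, 0.0, 2.0), KNOWN)
```

Because the comparison is made in world coordinates, the same search works whether or not the markers lie on a plane.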
[0101]
Thus, according to the present invention, using an HMD equipped with two cameras allows the position of the observed marker to be detected in three dimensions. The position and orientation of the viewpoint can therefore be detected more accurately, and the virtual image and the real image of the MR are joined smoothly.
[0102]
<Modification 1>
The present invention is not limited to the first and second embodiments described above.
[0103]
In the first embodiment, as shown in FIG. 17, the process for detecting a marker in an image uses the first marker found as the marker to be tracked. For this reason, as shown for example in FIG. 24, when an image 800 containing the marker M₁ is obtained in a certain frame, there is no problem in using the marker M₁ as the reference marker for the correction process as long as it is still contained in the image area 810 of the subsequent frame. However, when, for example, an image 820 is obtained in a subsequent frame and the marker M₁ falls outside its area while the marker M₂ is included instead, the reference marker for correction must be changed to the marker M₂. Such a change of marker is also necessary when tracking fails, in which case the newly tracked marker is used to correct the positional deviation.
[0104]
A problem with switching the marker used for correction in this way is that the virtual object may move unnaturally because the correction value changes abruptly at the time of switching. Therefore, to maintain the temporal consistency of the correction value, this modification proposes reflecting the correction values up to the previous frame in the setting of the next correction value.
[0105]
That is, letting v_t be the correction value (a three-dimensional vector representing a translation in the world coordinate system) obtained in a certain frame and v′_{t−1} the correction value used in the previous frame, v′_t obtained by the following equation is used as the new correction value.
[0106]
[Expression 21]
v′_t = α · v′_{t−1} + (1 − α) · v_t
Here, α is a constant satisfying 0 ≦ α < 1 that defines the degree of influence of past information. The above equation means that the correction value v′_{t−1} of the previous frame is used with a contribution of α, and the correction value v_t obtained in the current frame is used with a contribution of (1 − α).
[0107]
By doing so, abrupt changes in the correction value are moderated and abrupt changes (unnatural movement) of the three-dimensional virtual image are eliminated. By setting α to an appropriate value, unnatural movement of the object due to marker switching can be prevented.
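The effect of this weighting can be seen in a short numerical sketch. It assumes that the new correction value is the weighted sum of the previous correction value (weight α) and the current one (weight 1 − α), as the description of the contributions states; the function name and the sample values are hypothetical.

```python
def smooth_correction(v_t, v_prev, alpha):
    """Blend the current correction v_t with the previous smoothed value v_prev.

    alpha: constant with 0 <= alpha < 1 giving the influence of past information.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return tuple(alpha * p + (1.0 - alpha) * c for p, c in zip(v_prev, v_t))


# Hypothetical marker switch: the raw correction jumps from 0 to 10 along x.
smoothed = (0.0, 0.0, 0.0)
history = []
for _ in range(3):
    smoothed = smooth_correction((10.0, 0.0, 0.0), smoothed, alpha=0.5)
    history.append(smoothed[0])
# history is [5.0, 7.5, 8.75]: the jump is spread over several frames.
```

A larger α gives smoother but slower convergence to the new correction value, while α = 0 reproduces the unsmoothed behavior.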
[0108]
<Modification 2>
In the above embodiment, as shown in FIG. 17, when the marker cannot be found by the local search, the process for detecting a marker in an image searches the entire screen regardless of the position of the marker in the previous frame and uses the point with the highest similarity as the marker to be tracked. This modification instead performs the marker search centered on the position of the marker found in the previous frame. This is because, even when the image frame shifts as the player moves, the marker is highly likely to be at a position not far from where it was in the previous frame.
[0109]
FIG. 25 illustrates the principle of searching for the marker found in the previous frame in the current frame. When a search is performed using such a search route and a point having a similarity equal to or higher than a certain threshold is found, this point is set as a tracking target marker.
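One way to realize such a search is to visit pixels in order of increasing distance from the previous marker position and stop at the first point whose similarity reaches the threshold. The sketch below uses expanding square rings as the search route; this route is an assumption and is not necessarily the route of FIG. 25. The `similarity` callable and all other names are hypothetical.

```python
def search_near_previous(similarity, width, height, prev_xy, threshold):
    """Search outward from prev_xy and return the first point meeting threshold.

    similarity: callable (x, y) -> similarity score of that pixel to the marker.
    width, height: image size in pixels.
    prev_xy: marker position found in the previous frame.
    Returns (x, y), or None if no point reaches the threshold.
    """
    px, py = prev_xy
    max_radius = max(px, width - 1 - px, py, height - 1 - py)
    for radius in range(max_radius + 1):
        for y in range(py - radius, py + radius + 1):
            for x in range(px - radius, px + radius + 1):
                # Only the border of the current square ring is new.
                if max(abs(x - px), abs(y - py)) != radius:
                    continue
                if 0 <= x < width and 0 <= y < height and similarity(x, y) >= threshold:
                    return (x, y)
    return None


# Hypothetical 8x8 image with marker-like responses at (5, 4) and (0, 0).
def toy_similarity(x, y):
    return 1.0 if (x, y) in {(5, 4), (0, 0)} else 0.0


found = search_near_previous(toy_similarity, 8, 8, (4, 4), threshold=0.8)
```

In this example the search returns the candidate next to the previous position rather than the equally strong but distant one, which is the behavior the modification aims for.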
[0110]
<Modification 3>
Although the above embodiments use an optical HMD, the present invention is not limited to optical HMDs and can also be applied to a video see-through HMD.
[0111]
<Modification 4>
Although the above embodiments are applied to an air hockey game, the present invention is not limited to air hockey games. Because the present invention captures the work of a plurality of people (for example, mallet operation) with a single camera unit, the work of those people can be reproduced in a single virtual space. The present invention is therefore also suitable for collaborative work by two or more workers (for example, MR presentation of design work by a plurality of people, or a multiplayer game).
[0112]
The process of correcting the position and posture of the head on the basis of a plurality of markers according to the present invention is suitable not only for collaborative work by a plurality of people. The present invention is also applicable to a system that presents mixed reality to a single worker (or player).
[0113]
<Other variations>
In the second embodiment, two cameras are used, but three or more cameras may be used.
[0114]
As described above, it is sufficient that at least one marker is captured in the image of the player's camera 240. If there are too many markers, the number of markers captured in the image increases, and the possibility that a marker is erroneously identified in the tracking marker identification process of S430 in FIG. 16 or S430′ in FIG. 23 becomes higher. Therefore, if the work allows the movement of the camera 240 to be restricted to some extent, the number of markers may be reduced so that only one marker is captured by the camera at any time.
[0115]
Further, the position/orientation detection apparatus according to the above embodiments outputs the corrected viewing transformation matrix at the viewpoint position of the player. However, the present invention is not limited to this and can also be applied to an apparatus that outputs the viewpoint position of the player in the form of corrected values (X, Y, Z, r, p, φ). In addition, the marker may have any shape as long as the system can recognize it as a marker or mark.
[0116]
[Effect of the invention]
As described above, according to the present invention, abrupt changes in the correction value are moderated and abrupt changes (unnatural movement) of the three-dimensional virtual image are eliminated. Further, by setting α to an appropriate value, unnatural movement of the object due to marker switching can be prevented.
[Brief description of the drawings]
FIG. 1 is a view for explaining the principle of camera position correction applied in the prior art and in an embodiment of the present invention;
FIG. 2 is a side view showing a configuration of a game device used in an embodiment of the present invention.
FIG. 3 is a diagram for explaining a scene that can be seen in the field of view of the left player in the game apparatus of FIG. 2;
FIG. 4 is a diagram illustrating a configuration of an HMD used in the game device of FIG.
FIG. 5 is a view for explaining the arrangement of markers provided on the table of the game apparatus of FIG. 2;
FIG. 6 is a diagram for explaining the transition of markers included in an image captured by a camera mounted on the player's head as the player moves on the table of FIG. 5;
FIG. 7 is a diagram illustrating a configuration of a three-dimensional image generation device for the game device according to the embodiment.
FIG. 8 is a flowchart illustrating a processing procedure of the mallet position measurement unit of the embodiment.
FIG. 9 is a flowchart for explaining a partial subroutine (local search) of the processing procedure of the mallet position measurement unit of the embodiment.
FIG. 10 is a flowchart for explaining a partial subroutine (global search) of the processing procedure of the mallet position measurement unit of the embodiment;
FIG. 11 is a diagram for explaining division of a processing target region used in the processing of the flowchart of FIG. 8;
FIG. 12 is a diagram showing a target area setting technique used in the processing of the flowchart of FIG. 8;
FIG. 13 is a diagram illustrating the configuration of a virtual game field in the game according to the present embodiment.
FIG. 14 is a flowchart for explaining a game management control procedure in the game state management unit of the embodiment;
FIG. 15 is a diagram illustrating a technique for detecting mallet.
FIG. 16 is a flowchart for generally explaining a processing procedure of a correction processing unit in the embodiment;
FIG. 17 is a flowchart for explaining in detail a part (marker tracking) of the flowchart of FIG. 16;
FIG. 18 is a flowchart for explaining in detail a part of the flowchart of FIG. 16 (prediction of marker position).
FIG. 19 is a diagram for explaining the principle of detection of a reference marker used for correction.
FIG. 20 is a flowchart for explaining the principle of detecting a reference marker.
FIG. 21 is a diagram showing a configuration of an HMD used in the second embodiment.
FIG. 22 is a block diagram showing the main configuration of an image processing system according to the second embodiment.
FIG. 23 is a flowchart showing a part of the control of the image processing system of the second embodiment.
FIG. 24 is a diagram for explaining the transition of a reference marker applied to a modification of the embodiment.
FIG. 25 is a view for explaining the principle of marker search applied to a modification of the embodiment.
FIG. 26 is a view for explaining the principle of correction processing according to the first embodiment;

Claims

A first acquisition step of acquiring a photographed image obtained by photographing a real space in which a plurality of markers are arranged with a photographing device;
A second acquisition step of acquiring the position and orientation of the object in the real space;
A detection step of detecting a position of the marker included in the captured image;
A first calculation step of calculating a first correction value of the position and orientation of the object using the marker;
A correction value acquisition step of acquiring a second correction value of the position and orientation of the object calculated from the captured image of the previous frame of the frame of the captured image;
A second calculation step of calculating a third correction value by correcting the first correction value using the second correction value;
A position and orientation correction step of correcting the position and orientation of the object using the third correction value;
A first generation step of generating a virtual image corresponding to the position and orientation of the object whose position and orientation have been corrected by the position and orientation correction step;
An information processing method characterized by comprising a second generation step of generating a composite space image obtained by compositing the captured image and the virtual image.

The information processing method according to claim 1, wherein the detection step specifies the area of the captured image in which a marker is to be detected by using the marker detection result from the captured image of the frame preceding that captured image.

The information processing method according to claim 1, wherein the detection step detects the same marker as the marker detected in the captured image of the frame preceding the captured image.

The information processing method according to claim 1, wherein the second acquisition step acquires the detection result of a sensor that detects the position and orientation of the object, and
the position and orientation correction step corrects the position and orientation of the object by correcting the detection result using the third correction value.

First acquisition means for acquiring a photographed image obtained by photographing a real space in which a plurality of markers are arranged with a photographing device;
Second acquisition means for acquiring the position and orientation of the object in the real space;
Detecting means for detecting a position of the marker included in the captured image;
First calculating means for calculating a first correction value of the position and orientation of the object using the marker;
Correction value acquisition means for acquiring a second correction value of the position and orientation of the object calculated from the captured image of the previous frame of the frame of the captured image;
Second calculating means for calculating a third correction value by correcting the first correction value using the second correction value;
Position and orientation correction means for correcting the position and orientation of the object using the third correction value;
First generation means for generating a virtual image corresponding to the position and orientation of the object whose position and orientation are corrected by the position and orientation correction means;
An information processing apparatus characterized by comprising second generation means for generating a composite space image obtained by compositing the captured image and the virtual image.