JP3762382B2

JP3762382B2 - Image processing method and apparatus

Info

Publication number: JP3762382B2
Application number: JP2003102140A
Authority: JP
Inventors: 清秀佐藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-04-04
Filing date: 2003-04-04
Publication date: 2006-04-05
Anticipated expiration: 2021-03-06
Also published as: JP2003319388A

Description

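[0001]
[Technical Field of the Invention]
The present invention relates to a specific point detection method and apparatus for detecting, from an image, a specific point of a stationary object, such as a landmark.
[0002]
[Prior Art]
In recent years, mixed reality (hereinafter, MR technology), which aims to superimpose additional information and virtual objects (hereinafter collectively called virtual images) on real space, has been actively studied. Of particular interest are systems in which an observer wears a video see-through head-mounted display (hereinafter, HMD), a virtual image is superimposed on the real image captured by a camera built into or attached to the HMD with the real space and the virtual space aligned three-dimensionally, and the resulting mixed reality image (hereinafter, MR image) is displayed on the HMD in real time (in this specification, such an apparatus is called an MR system).
[0003]
Aligning the virtual image with the real image is the greatest technical challenge in an MR system, and achieving it requires accurate measurement of the position and orientation of the camera viewpoint. In general, if the positions on the captured image of a plurality of points whose three-dimensional positions are known (theoretically three or more, and six or more for a stable solution) can be obtained, the position and orientation of the camera viewpoint can be determined from those correspondences (in this specification, such points are called landmarks). The alignment problem therefore reduces to how accurately landmarks can be tracked or detected in images captured by a moving camera and their positions obtained.
[0004]
The present inventors have so far developed devices that apply MR technology in fields such as games. These devices were premised on indoor use.
[0005]
For indoor use as described above, distinctive markers (often markers with distinctive colors such as red or green, arranged singly or in combination, or distinctive patterns such as checkerboards or concentric circles) are placed in the target space and used as landmarks. This makes landmark detection by image processing easy and stable, and enables highly accurate alignment.
[0006]
One known method of detecting color-based markers is to capture a marker under a certain illumination environment, extract and store the representative color of the marker region in that image, and then detect the marker in a captured image as a region having the same color as the stored representative color (or a color close to it). For pattern-based markers, each marker is captured under a certain illumination environment and the neighborhood of the marker in that image is stored as a template image, so that the marker can be detected by template matching. That is, a similarity is computed between the template image and partial regions of the captured image, and the position of the partial region most similar to the template image is detected as the marker position. In this specification, image features used as cues for detecting a marker, such as the representative color of the marker region or the template image above, are collectively called "detection parameters".
[0007]
Meanwhile, there is growing demand for MR systems intended for outdoor use, for example systems that display a virtual image of a guide on an HMD to guide visitors around a university campus or a tourist site.
[0008]
Outdoors, it is often difficult to attach artificial markers to the environment. A known technique for measuring the position and orientation of the observer's viewpoint in such situations uses, as landmarks, points in the captured image that have features detectable by image processing (for example, corners of structures, highly textured points on structures, and points where the color changes locally). Template matching can be applied to detect such landmarks in the captured image.
[0009]
[Problems to Be Solved by the Invention]
In an outdoor environment, however, the appearance (brightness and color) of a landmark changes as ambient light changes with the weather (sunny/cloudy/rainy) and the time of day (morning/daytime/night). Consequently, when landmarks are detected by template matching, even if template images are prepared in advance as detection parameters, changes in ambient light prevent correct matching and the landmarks can no longer be detected. As a result, the correct position and orientation of the viewpoint cannot be obtained, and the real image and the virtual image cannot be aligned correctly. The same problem arises even when artificial markers are used indoors if the lighting environment changes.
[0010]
The present invention has been made in view of the above problems, and its object is to enable reliable detection of a specific point in a captured image even when the appearance of a landmark or the like used as the specific point changes because the environment at the time of capture changes.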
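[0011]
[Means for Solving the Problems]
An image processing method according to the present invention for achieving the above object comprises, for example, the following steps. Namely, it is
an image processing method for calculating the orientation of an imaging unit using a plurality of specific points arranged in real space, the method comprising:
a holding step of holding a plurality of detection parameters for detecting each of the plurality of specific points from a captured image;
an input step of inputting a captured image captured by the imaging unit;
an average luminance calculation step of calculating the average luminance of the captured image;
a selection step of selecting, from the plurality of held detection parameters, a detection parameter corresponding to the average luminance for each of the plurality of specific points;
a detection step of detecting the specific points from the captured image input in the input step, using the detection parameters selected in the selection step; and
a calculation step of calculating the orientation of the imaging unit using the positions of the detected specific points in the captured image.
[0012]
An image processing apparatus according to the present invention for achieving the above object has, for example, the following configuration. Namely, it is
an image processing apparatus for calculating the orientation of an imaging unit using a plurality of specific points arranged in real space, the apparatus comprising:
a holding unit that holds a plurality of detection parameters for detecting each of the plurality of specific points from a captured image;
an input unit that inputs a captured image captured by the imaging unit;
an average luminance calculation unit that calculates the average luminance of the captured image;
a selection unit that selects, from the plurality of held detection parameters, a detection parameter corresponding to the average luminance for each of the plurality of specific points;
a detection unit that detects the specific points from the captured image input by the input unit, using the detection parameters selected by the selection unit; and
a calculation unit that calculates the orientation of the imaging unit using the positions of the detected specific points in the captured image.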
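[0013]
[Embodiments of the Invention]
Preferred embodiments of the present invention are described below with reference to the accompanying drawings.
[0014]
<First Embodiment>
In the embodiment described below, template images for template matching are used as the detection parameters, and landmark detection accuracy is improved by updating these template images dynamically.
[0015]
FIG. 1 is a block diagram illustrating the configuration of the MR system according to the first embodiment. In FIG. 1, reference numeral 101 denotes a fixed camera, which corresponds to the second imaging means of the present invention; its installation position, viewing orientation, focal length, and so on are fixed so that the same location in the scene is always observed. That is, in the captured image obtained from the fixed camera 101 (hereinafter, fixed viewpoint image I_S), each landmark P_i to be detected (i = 1 to the number of landmarks) is always captured at the same coordinates (x_i, y_i).
[0016]
Reference numeral 102 denotes a template image creation module, which generates a template image T_i corresponding to each landmark P_i from the fixed viewpoint image I_S. Template images can be generated in various ways, as described later; in this embodiment, the observed coordinates (x_i, y_i) of landmark P_i are assumed to be known, and template image T_i is generated by extracting from I_S a rectangular region R_i of fixed extent centered on (x_i, y_i). This template image T_i is used in the template matching process for landmark detection, as described later. The template image T_i is updated at predetermined timing, for example every frame of the fixed camera 101.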
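[0017]
Reference numeral 110 denotes an HMD worn by the observer, which includes an observer viewpoint camera 111 and a display 112. The observer viewpoint camera 111 is fixed to the HMD 110, and its captured image corresponds to the observer's viewpoint position and direction (hereinafter, observer viewpoint image I). The observer viewpoint camera 111 corresponds to one form of the first imaging means, and the observer viewpoint image corresponds to the target image in which specific points (landmarks) are to be detected.
[0018]
Reference numeral 113 denotes a landmark detection module, which detects landmark P_i in the observer viewpoint image I provided by the observer viewpoint camera 111 by performing a template-matching search using the template image T_i provided by the template image creation module 102. Because the template image creation module 102 updates the template images at predetermined timing as described above, the landmark detection module can perform template matching using template images captured at almost the same time as the observer viewpoint image I (that is, under almost the same lighting conditions as the observer viewpoint image I). Stable template matching is therefore possible at all times even where the lighting changes dynamically, as in an outdoor environment, and landmark positions can be detected accurately.
[0019]
The landmark detection module 113 further obtains the coordinates (u_i, v_i) of the detected landmark P_i on the observer viewpoint image I and sends them to the viewpoint position estimation module 114. Here (u_i, v_i) is the center of the region that matched the template image.
[0020]
The viewpoint position estimation module 114 calculates the observer's viewpoint position and orientation by a well-known method, based on the image coordinates of the plurality of landmarks provided by the landmark detection module 113 and the positions of the landmarks in real space, which have been measured in advance and are held as known information. Theoretically, the viewpoint position and orientation for an observer viewpoint image I can be calculated if the coordinates of three landmarks on that image are available.
[0021]
The viewpoint position and orientation calculated in this way are provided to the virtual image generation module 115. The virtual image generation module 115 superimposes on the observer viewpoint image I the virtual image that would be observed from the viewpoint position and orientation provided by the viewpoint position estimation module 114, and displays the result on the display 112 of the HMD 110. As a result, an MR image in which the real space and the virtual space are merged with accurate alignment is displayed on the display 112 for the observer to view.
[0022]
Assuming that the observer moves around outdoors, the unit containing the fixed camera 101 and the template image creation module 102 (the fixed part) is preferably separate from the unit containing the HMD 110 and the landmark detection module 113 (the part worn by the observer). In that case, template images are transmitted from the template image creation module 102 to the landmark detection module 113 over a wired or wireless link.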
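[0023]
FIG. 2 is a diagram outlining the landmark detection process according to the first embodiment. Reference numeral 201 denotes a fixed viewpoint image I_S captured by the fixed camera 101; in this example, seven landmarks (P₁ to P₇) are set. As described above, the landmark positions (x_i, y_i) in the fixed viewpoint image 201 are known. The template image creation module 102 can therefore generate template images T₁ to T₇ by extracting predetermined regions R₁ to R₇ centered on the respective landmark positions (x_i, y_i) in the fixed viewpoint image 201. In this way, the template image creation module 102 generates the template images T_i from the latest fixed viewpoint image I_S at predetermined timing.
[0024]
The landmark detection module 113 applies template matching to the observer viewpoint image I (202) obtained from the observer viewpoint camera 111 of the HMD 110, using the latest template images T_i generated as described above, and thereby detects the landmarks.
[0025]
FIG. 3 is a flowchart illustrating the template image creation procedure performed by the template image creation module 102. First, step S301 determines whether it is time to update the template images. In this embodiment the update timing coincides with the frame period of the fixed camera 101, but it is of course not limited to this. Clearly, many variations are possible: for example, updating the template images every time a predetermined period elapses, every time the fixed camera 101 finishes capturing a predetermined number of frames, when the difference in average luminance between the fixed viewpoint image at the previous update and the current fixed viewpoint image reaches or exceeds a predetermined value, or any combination of these.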
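The update-timing test of step S301 could be sketched as follows. This is only an illustration under stated assumptions: the function names, the grayscale list-of-rows image format, and the default values `frame_interval=30` and `luminance_delta=10.0` are hypothetical and do not come from the specification. The sketch combines two of the triggers mentioned above (a frame count and an average-luminance difference).

```python
def mean_luminance(image):
    """Average of all pixel values in a 2-D grayscale image (list of rows)."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def should_update(frames_since_update, current_image, last_update_image,
                  frame_interval=30, luminance_delta=10.0):
    """Return True when either illustrative update trigger fires.

    frame_interval and luminance_delta are assumed tuning values.
    """
    # Trigger 1: a predetermined number of frames has been captured.
    if frames_since_update >= frame_interval:
        return True
    # Trigger 2: the average luminance has drifted since the last update.
    diff = abs(mean_luminance(current_image) - mean_luminance(last_update_image))
    return diff >= luminance_delta
```

With both thresholds set very low the check degenerates to updating every frame, which is the behavior the embodiment itself uses.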
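[0026]
If step S301 determines that it is time to update, the process advances to step S302, where the fixed viewpoint image I_S from the fixed camera 101 is input. In step S303, the image of a predetermined rectangular region R_i corresponding to landmark P_i (for example, the pixels (x, y) satisfying x_i - n < x < x_i + n and y_i - n < y < y_i + n, where n is a constant) is extracted from image I_S and used as template image T_i. In step S304, the template image T_i obtained in step S303 is output to the landmark detection module 113.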
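The region extraction of step S303 can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows indexed `image[y][x]`; the function name `extract_template` and the decision to raise an error when R_i leaves the image are assumptions made for the example. The strict inequalities in the text give a square of side 2n - 1 pixels.

```python
def extract_template(image, x_i, y_i, n):
    """Return the pixels with x_i-n < x < x_i+n and y_i-n < y < y_i+n.

    image is a list of rows (image[y][x]); the result is a (2n-1) x (2n-1)
    patch centered on the known landmark coordinates (x_i, y_i).
    """
    height, width = len(image), len(image[0])
    # The whole region R_i must lie inside the fixed viewpoint image.
    if not (n - 1 <= x_i <= width - n and n - 1 <= y_i <= height - n):
        raise ValueError("region R_i falls outside the image")
    return [row[x_i - n + 1:x_i + n] for row in image[y_i - n + 1:y_i + n]]
```

Because the fixed camera never moves, the same (x_i, y_i) and n can be reused for every frame, so updating a template is just a repeated crop.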
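[0027]
Step S305 determines whether template images have been generated for all landmarks P_i. If any landmark remains unprocessed, step S306 moves on to that landmark and the process returns to step S303 and repeats. Once template images have been generated and output for all landmarks, the process returns from step S305 to step S301 and waits for the next update timing.
[0028]
Through the above processing, template images updated at predetermined timing (every frame in this embodiment) are provided to the landmark detection module 113.
[0029]
In the above embodiment, the rectangular region R_i extracted from image I_S in step S303 is used as template image T_i as is, but template images may be generated in other ways. For example, an average image or weighted average image may be created from the rectangular regions R_i extracted from the fixed viewpoint images I_S of several past frames and used as template image T_i. This can be expected to remove noise contained in the fixed viewpoint images I_S.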
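The averaged template described in this paragraph might look like the sketch below. The function name `average_templates` and the representation of each patch as a list of rows are assumptions; the weights are left to the caller because the specification does not say how they would be chosen.

```python
def average_templates(patches, weights=None):
    """Pixel-wise (weighted) average of equally sized patches R_i.

    patches: list of 2-D lists taken from successive fixed viewpoint frames.
    weights: optional list of non-negative weights, one per patch
             (a plain average is used when omitted).
    """
    if weights is None:
        weights = [1.0] * len(patches)
    total = sum(weights)
    height, width = len(patches[0]), len(patches[0][0])
    return [[sum(w * p[y][x] for w, p in zip(weights, patches)) / total
             for x in range(width)]
            for y in range(height)]
```

Giving larger weights to recent frames would make the template follow lighting changes faster, at the cost of less noise suppression.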
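[0030]
In the above embodiment, step S304 outputs every template image generated in step S303, but the output method is not limited to this. For example, the dissimilarity e between the most recently output template image T_i' and the template image T_i generated in step S303 may be calculated, and the template image may be output only when the dissimilarity is at or above a certain value (e ≥ TH₁), which is taken to mean that the lighting has changed. Omitting unnecessary data transmission in this way reduces network traffic. In addition, to prevent a template image from being updated to an erroneous image of an obstacle when an obstacle comes between the landmark and the fixed camera 101 and the landmark is not observed in the fixed viewpoint image I_S, the landmark may be judged to be occluded when the dissimilarity is at or above a certain value (e ≥ TH₂), and the template image may then not be output. The dissimilarity between template images can be computed with well-known image processing techniques such as cross-correlation or the sum of absolute differences of pixel values.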
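One way to read the two thresholds is sketched below. Applying both tests together, with TH₁ < TH₂, is an assumption of this example; the text presents them as separate options. The sum of absolute differences is one of the dissimilarity measures the paragraph names, and the function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def should_send(new_template, last_sent, th1, th2):
    """Decide whether to output a freshly extracted template.

    e < th1        : lighting essentially unchanged, skip to save traffic.
    th1 <= e < th2 : lighting changed, send the new template.
    e >= th2       : change so large that occlusion is assumed, do not send.
    """
    e = sad(new_template, last_sent)
    if e >= th2:
        return False
    return e >= th1
```

Suitable values for `th1` and `th2` depend on patch size and pixel range and would have to be tuned for a given installation.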
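[0031]
Next, the processing performed by the landmark detection module 113 is described. FIG. 4 is a flowchart illustrating the landmark detection procedure performed by the landmark detection module.
[0032]
Steps S401 and S402 store a template image T_i in memory for use in template matching when one is output from the template image creation module 102 described above. In this embodiment, each template image is output as soon as it is obtained in FIG. 3 (steps S303 and S304), so the update in steps S401 and S402 is performed one template image at a time. The update procedure is not limited to this, however. For example, if the template image creation module 102 outputs the template images together after generating them for all landmarks in the fixed viewpoint image I_S, the landmark detection module 113 updates all template images at once.
[0033]
If no template image has been received in step S401, or after step S402 is finished, the process advances to step S403, which determines whether an observer viewpoint image I has been input. As described above, the observer viewpoint image I is image data output from the observer viewpoint camera 111, and landmarks are detected in it by steps S404 to S407. In this embodiment, therefore, landmark detection is performed each time an observer viewpoint image is input from the observer viewpoint camera 111 (that is, every frame).
[0034]
In step S404, landmark P_i is detected in the observer viewpoint image I using template image T_i. Any well-known template matching technique may be used. For example, for each pixel (u_j, v_j) in the observer viewpoint image I, a region of the same size as template image T_i centered on that pixel is extracted as partial image Q_j, and the dissimilarity e_j between partial image Q_j and template image T_i is calculated. The dissimilarity may be obtained from the cross-correlation between the two images, from the sum of the absolute differences between the luminance values of corresponding pixels, or, when the input image is a color image, from the sum of the RGB distances between corresponding pixels. The dissimilarity e_j is computed for all pixels (u_j, v_j) in the observer viewpoint image I, and the pixel that minimizes e_j (that is, the center coordinates (u_j, v_j) of the partial image Q_j that best matches template image T_i) is taken as the detected position (u_i, v_i) of landmark P_i in the observer viewpoint image I.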
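The exhaustive search of step S404 can be sketched with the sum of absolute differences as the dissimilarity. The sketch assumes grayscale images stored as lists of rows and, as a simplification not stated in the text, only visits positions where the template fits entirely inside the image. The name `match_template` and the optional `max_dissimilarity` cut-off are hypothetical.

```python
def match_template(image, template, max_dissimilarity=None):
    """Return (u, v, e) for the best match, or None if nothing qualifies.

    (u, v) is the center of the best-matching partial image Q_j and e is
    its sum-of-absolute-differences dissimilarity to the template.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            # Dissimilarity e_j between Q_j and T_i at this position.
            e = sum(abs(image[top + dy][left + dx] - template[dy][dx])
                    for dy in range(th) for dx in range(tw))
            if best is None or e < best[0]:
                best = (e, left + tw // 2, top + th // 2)
    if best is None:
        return None
    if max_dissimilarity is not None and best[0] > max_dissimilarity:
        return None  # no region is similar enough to count as the landmark
    e, u, v = best
    return u, v, e
```

Returning `None` when every dissimilarity exceeds the cut-off corresponds to reporting that the landmark is not present in the observer viewpoint image.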
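[0035]
In step S405, the coordinates (u_i, v_i) are output to the viewpoint position estimation module 114 as the detected position of landmark P_i in the observer viewpoint image I. If step S404 determines that no part of the observer viewpoint image I matches template image T_i (for example, when every dissimilarity e_j exceeds a set threshold), information indicating that landmark P_i is not present in the observer viewpoint image I is output, or this step is skipped. Step S406 determines whether detection has finished for all landmarks P_i. If any landmark remains unprocessed, the process advances to step S407, selects an unprocessed landmark P_i as the detection target, and repeats step S404 onward. When all landmarks P_i have been processed, the process returns to step S401.
[0036]
The present invention becomes more effective when the template image creation module 102 and the landmark detection module 113 operate in synchronization. That is, after a template image is received in step S401, inputting in step S403 an observer viewpoint image I captured at the same time as the fixed viewpoint image I_S from which that template image was derived allows template matching with a template image captured under the same lighting as the observer viewpoint image I. Needless to say, to achieve this strictly, image capture by the fixed camera 101 and the observer viewpoint camera 111 should preferably be electrically synchronized.
[0037]
Although detection is performed for all landmarks in the above embodiment, processing may be terminated once the predetermined number of landmarks needed to calculate the observer's viewpoint position has been detected.
[0038]
In the above processing, the template images in the landmark detection module 113 are updated when the template image creation module 102 outputs updated template images, but the landmark detection module 113 may instead read the latest template images stored in the template image creation module 102 as needed. The read timing may be, for example, every time an observer viewpoint image I is input, or at predetermined time intervals. In this case, the template image creation module 102 holds the template images it creates in its own storage medium and sends the latest template images to the landmark detection module 113 in response to a request from that module.
[0039]
In step S404 above, the entire observer viewpoint image I is scanned to detect landmark P_i, but various well-known techniques for making template matching more efficient can be applied. One example follows.
[0040]
FIG. 5 illustrates a method of limiting the search region during landmark detection. Using information such as the position and orientation of the observer camera in the previous (or an earlier) frame of the observer viewpoint image I, or the detected landmark positions in the previous (or an earlier) frame, the approximate position of each landmark in the current frame of the observer viewpoint image I is estimated, and a search region is set around it. The most recent position data from the viewpoint position estimation module 114 may of course be used. Then, only for landmarks P_i whose search regions fall within the current frame of the observer viewpoint image I, the search is performed within that search region. In the example of FIG. 5, suppose that the search regions of landmarks P₁ to P₇ shown in (a) are obtained for the observer viewpoint image I as shown in (b). In this case, step S404 searches for the corresponding landmarks in the whole of the search regions of P₃ to P₅ and in the part of the search region of P₂ that lies inside the observer viewpoint image I. Narrowing the search range in this way speeds up processing.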
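The search-region clipping of FIG. 5 might be sketched as follows. The square window of half-width `margin` and the function name `search_region` are assumptions; the specification only says that a region is set around the predicted position. Regions that fall completely outside the image yield `None`, and regions that straddle the border are clipped, as for P₂ in the figure.

```python
def search_region(predicted_u, predicted_v, margin, width, height):
    """Clip a square search window to the observer viewpoint image.

    Returns (left, top, right, bottom) in inclusive pixel coordinates, or
    None when the window does not intersect the image at all.
    """
    left = max(0, predicted_u - margin)
    top = max(0, predicted_v - margin)
    right = min(width - 1, predicted_u + margin)
    bottom = min(height - 1, predicted_v + margin)
    if left > right or top > bottom:
        return None  # landmark predicted outside the field of view: skip it
    return left, top, right, bottom
```

The predicted position itself would come from earlier frames, for example by projecting the landmark with the previous viewpoint estimate; that step is outside this sketch.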
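[0041]
As described above, according to the first embodiment, template images are updated using images captured by the fixed camera 101, so template images that follow, and correspond to, changes in the environment can be obtained. Landmarks can therefore be detected reliably in the observer viewpoint image I regardless of environmental changes, and the position and orientation of the observer's viewpoint in an outdoor environment can be obtained accurately. The approach is therefore particularly suitable for aligning the real space and the virtual space when MR images are displayed on the display 112 of the HMD 110.
[0042]
In this embodiment, the position of each landmark in the fixed viewpoint image 201 is known; it is, for example, held in a memory (not shown) of the template image creation module, retrieved as needed, and supplied to the template image creation module 102. The landmark positions may also be supplied in other ways. An operator may directly specify landmark positions on the fixed viewpoint image 201 using input means (not shown). Alternatively, the position of each landmark in three-dimensional space, measured by some method, and the camera parameters of the fixed camera 101 (including at least its position and orientation) may be held in memory, and landmark position calculation means (not shown; corresponding to the specific point position calculation means of the present invention) may calculate the position of each landmark on the fixed viewpoint image 201 from this information. Furthermore, in applications where the landmarks to be detected are not predetermined and it suffices to track some feature points in the observer image 202, feature extraction means (not shown) may automatically extract, at an initial time, feature points with salient image features (for example, edges or strongly textured areas) from the fixed viewpoint image 201, and their positions may be used as landmark positions.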
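[0043]
<Second Embodiment>
In the first embodiment, template images are updated with a single fixed camera, so the range over which template images can be acquired is limited, which restricts the range over which the observer can move and/or look around. In the second embodiment, multiple fixed cameras are installed so that the observer can move and/or look around over a wide area. Because multiple fixed cameras are used, however, there are two cases: one in which multiple template images exist for a single landmark (hereinafter, the case with overlap), and one in which each landmark is assigned to a single fixed camera so that only one template image exists for it (the case without overlap). The second embodiment describes the case without overlap; the case with overlap is described in the third embodiment.
[0044]
Without overlap, an MR system with multiple fixed cameras can be realized with a configuration similar to that of the first embodiment. FIG. 6 is a block diagram showing the configuration of the MR system according to the second embodiment. The template image creation module 602 extracts the data of the predetermined regions R_i from each of the fixed viewpoint images obtained from the fixed cameras 601 and outputs them as template images T_i.
[0045]
As in the first embodiment, the landmark detection module 613 updates the template images it uses with those sent from the template image creation module 602 and uses them to detect landmarks in the observer viewpoint image I. The camera selection module 616 selects a predetermined number of fixed cameras near the viewpoint position obtained from the viewpoint position estimation module 614 and notifies the landmark detection module 613 of the selection. As described later, in the second embodiment, to improve processing efficiency, the camera selection module 616 decides which fixed cameras' template images to use based on the viewpoint position output by the viewpoint position estimation module 614. The landmark detection module 613 then performs template matching for landmark detection using the template images from the fixed cameras so decided.
[0046]
The virtual image generation module 115 and the HMD 110 are as described in the first embodiment.
[0047]
FIG. 7 outlines the landmark detection process according to the second embodiment. The observed positions of landmarks P₁ to P₁₃ on the fixed viewpoint images I_S1 to I_S5 obtained by the fixed cameras 601 (A to E) are defined, and the corresponding template images T₁ to T₁₃ are generated by extracting the surrounding rectangular regions R₁ to R₁₃. These template images are then used to detect landmarks in the observer viewpoint image I. The processing is essentially the same as with one fixed camera and can be thought of as one camera with a wider angle of view; landmarks can be detected by the procedures described with FIGS. 3 and 4.
[0048]
As described above, in the second embodiment with multiple fixed cameras, the position and orientation of the observer's viewpoint can be detected by the same processing as in the first embodiment (that is, even in a configuration without the camera selection module 616 in FIG. 6). However, because there are more landmarks, running detection on every landmark each time reduces processing efficiency. In the second embodiment, therefore, processing efficiency is improved by limiting in advance the landmarks that the landmark detection module 613 tries to detect. That is, the landmarks to be detected are narrowed down to those observed by the fixed cameras selected by the camera selection module 616.
[0049]
This can be achieved, for example, by adding step S801 before step S404 in the processing of FIG. 4, as shown in FIG. 8. When an observer viewpoint image I is input, the process advances from step S403 to step S801, which determines whether landmark P_i is observed by a fixed camera selected by the camera selection module 616. If it is not, the detection processing for that landmark (steps S404 and S405) is skipped and the process advances to step S406 to move on to the next landmark. If it is, the process advances to step S404 to detect that landmark.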
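Camera selection and the S801 filter could be sketched as below. Ranking cameras by Euclidean distance to the viewpoint is an assumption (the text only says cameras "near" the viewpoint are selected), and the names `select_cameras`, `landmarks_to_detect`, and the dictionary layouts are hypothetical.

```python
import math


def select_cameras(viewpoint, camera_positions, count):
    """Pick the `count` fixed cameras nearest to the estimated viewpoint.

    camera_positions maps a camera name to its (x, y, z) position.
    """
    ranked = sorted(camera_positions,
                    key=lambda name: math.dist(viewpoint, camera_positions[name]))
    return set(ranked[:count])


def landmarks_to_detect(selected_cameras, landmark_camera):
    """Keep only landmarks observed by a selected camera (step S801).

    landmark_camera maps each landmark to the single fixed camera that
    observes it (the no-overlap case of the second embodiment).
    """
    return [lm for lm, cam in landmark_camera.items() if cam in selected_cameras]
```

Because the viewpoint estimate comes from the previous frame, the selection lags the observer slightly; choosing `count` a little larger than strictly necessary is one way to tolerate that.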
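[0050]
In the second embodiment as well, various well-known techniques for making template matching more efficient can be applied. For example, the search-region limiting technique described in the first embodiment is also effective. In particular, determining the search regions after limiting the template images to be used, as described above, avoids computing the positions of unnecessary search regions and is therefore effective.
[0051]
FIG. 9 illustrates a method of limiting the template image search regions during landmark detection in the second embodiment. Suppose, for example, that the camera selection module 616 has selected fixed cameras A, B, and C of FIG. 7 based on the detected viewpoint position. The detection targets are then landmarks P₁ to P₈; the other landmarks P₉ to P₁₃ are not considered. In step S404, landmark detection is performed, by template matching with the corresponding template images T₂ to T₆, only for those of landmarks P₁ to P₈ whose search regions fall within the observer viewpoint image (P₂ to P₆ in the figure).
[0052]
As described above, according to the second embodiment, template images are updated using multiple fixed cameras, which allows the observer to move over a wider area.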
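[0053]
<Third Embodiment>
Next, the case is described in which, because multiple fixed cameras are provided, multiple template images exist for one landmark at a given time, that is, the case with overlap.
[0054]
FIG. 10 outlines the landmark detection process with overlap according to the third embodiment. Fixed camera F observes landmarks P₁ and P₂, and template images T₁^F and T₂^F are generated from the rectangular regions R₁^F and R₂^F defined around them. Fixed camera G observes landmarks P₁ to P₃, and template images T₁^G to T₃^G are generated from the rectangular regions R₁^G to R₃^G defined around them. Similarly, template images T₁^H to T₃^H are obtained from fixed camera H. Here, for example, T₁^F, T₁^G, and T₁^H are template images corresponding to the same landmark P₁ in space.
[0055]
When multiple template images are obtained for one landmark from different fixed cameras in this way, it is necessary to decide which template image to use to detect the landmark. Two approaches are described below: (1) using the template image that gives the best template matching result, and (2) using the template image obtained from a fixed camera selected according to the observer's position. In the third embodiment, the template images obtained from the images captured by cameras F, G, and H are assumed to be stored as shown in FIG. 16. For example, template images T₁^F to T₆^F of landmarks P₁ to P₆ are obtained from the image captured by camera F, template images T₃^G to T₈^G of landmarks P₃ to P₈ from the image captured by camera G, and template images T₇^H to T₁₂^H of landmarks P₇ to P₁₂ from the image captured by camera H, and all are stored. Landmarks with the same subscript number are the same landmark. For example, template images of landmark P₆ are obtained from the images captured by both camera F and camera G.
[0056]
(1) Using the template image that gives the best template matching result
FIG. 11 is a flowchart illustrating the procedure for detecting a landmark with the template image that gives the best matching result when multiple template images exist for the same landmark. FIG. 11 shows the processing that replaces step S404 of FIG. 4.
[0057]
When an observer viewpoint image I is input in step S403, step S1100 detects landmark P_i in the observer viewpoint image I using template image T_i^j of landmark P_i obtained from fixed camera j. Step S1101 then determines whether this landmark P_i has multiple template images and coordinates have already been calculated with another template image. If coordinates have not been calculated with another template image, or if the landmark does not have multiple template images, step S1104 stores in memory the coordinates obtained with the current template image and the matching score.
[0058]
If coordinates have already been output with another template image, the process advances to step S1102, which compares the matching result stored in memory for the other template image with the result for the current template image. If the current template image gives the better result (a higher matching score), the process advances to step S1103, which replaces the coordinates and matching score stored in memory for that landmark with those obtained using the current template image. For example, if matching with T₆^F has already been performed and its matching score stored when matching is performed with T₆^G, the matching scores for T₆^G and T₆^F are compared and the one with the higher score is adopted.
[0059]
Next, in step S1105, if not all template images T_i^j corresponding to landmark P_i have been processed, the process advances to step S1106, takes an unprocessed template image T_i^j as the processing target, and repeats step S404 onward. If all template images T_i^j corresponding to landmark P_i have been processed, the process advances to step S405, and the coordinates stored in memory are output to the landmark detection module as the detected position of landmark P_i. By processing all template images in this way, when one landmark has multiple template images, the coordinates from the template image with the best matching score are adopted.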
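The keep-the-best loop of FIG. 11 reduces to a running maximum over the candidate templates. In this sketch each candidate is assumed to be a tuple `(camera, (u, v), score)` in which a higher score means a better match; that tuple layout and the name `best_detection` are illustrative only.

```python
def best_detection(candidates):
    """Return the candidate with the highest matching score, or None.

    candidates: iterable of (camera, (u, v), score) tuples, one per
    template image T_i^j available for the landmark.
    """
    best = None
    for camera, position, score in candidates:
        # S1101/S1104: the first result is stored unconditionally.
        # S1102/S1103: later results replace it only if they score higher.
        if best is None or score > best[2]:
            best = (camera, position, score)
    return best
```

If a dissimilarity such as the sum of absolute differences is used instead of a matching score, the comparison would be reversed so that the smallest value wins.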
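[0060]
(2) Using the template image obtained from a fixed camera selected according to the observer's position
FIG. 12 is a flowchart illustrating the procedure for detecting a landmark with the template image obtained from a fixed camera selected according to the observer's position when multiple template images exist for the same landmark. FIG. 12 shows the processing added before step S404 of FIG. 4.
[0061]
When an observer viewpoint image I is input in step S403, step S1201 determines whether multiple template images exist for the landmark P_i about to be detected. If not, only one template image exists for that landmark, so the process advances to step S404 and performs landmark detection by template matching.
[0062]
If multiple template images exist, step S1202 selects from among them the template image obtained from the fixed camera nearest to the observer's position, sets it as the template image T_i to be used for detection, and advances to step S404. For example, in FIG. 16, if the observer's position is nearer to camera G than to camera F, the template images T₃^G to T₆^G obtained from the image captured by camera G are adopted for landmarks P₃ to P₆.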
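Steps S1201 and S1202 can be sketched as a nearest-camera lookup. Euclidean distance between the observer position and the camera positions is assumed as the nearness measure, and the name `choose_template` and the dictionary layouts are hypothetical.

```python
import math


def choose_template(templates_by_camera, camera_positions, observer_position):
    """Pick the template from the fixed camera nearest to the observer.

    templates_by_camera maps camera name -> template for one landmark;
    camera_positions maps camera name -> (x, y, z).
    """
    # S1201: with a single template there is nothing to choose.
    if len(templates_by_camera) == 1:
        return next(iter(templates_by_camera.values()))
    # S1202: otherwise take the template of the nearest camera.
    nearest = min(templates_by_camera,
                  key=lambda cam: math.dist(observer_position, camera_positions[cam]))
    return templates_by_camera[nearest]
```

Compared with approach (1), this runs template matching only once per landmark, at the cost of relying on the accuracy of the observer position estimate.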
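[0063]
By processing all template images in this way, when one landmark has multiple template images, the template image from the fixed camera nearest to the observer's position is adopted for landmark detection.
[0064]
As described above, according to the third embodiment, an appropriate template image can be selected when multiple template images obtained from multiple fixed cameras exist for one landmark. In particular, as shown in FIG. 10, template images obtained from fixed viewpoint images of one landmark captured from different directions can be used appropriately, so template matching can be performed properly even when the appearance of the landmark differs greatly with viewing direction (for example, when it has a three-dimensional shape or nearly specular reflectance).
[0065]
Combined use with the camera selection module 616 described in the second embodiment is also possible. In that case, the processing described with FIGS. 11 and 12 is applied only to landmarks obtained from the fixed cameras selected by the camera selection module 616.
[0066]
Needless to say, various well-known techniques for making template matching more efficient can also be applied in the third embodiment.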
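[0067]
<Fourth Embodiment>
In the first to third embodiments, the template images used for template matching in the landmark detection module 113 are updated by creating template images as needed from fixed viewpoint images obtained with fixed cameras. With this approach, template images are generated from images captured at each point in time, so the appearance of the landmarks at that moment is reflected in the template images and good template matching is possible. However, one or more fixed cameras must be provided, which increases the scale of the apparatus. In the fourth embodiment, therefore, multiple kinds of template images are registered in advance for each landmark and used to update the template images.
[0068]
FIG. 13A is a block diagram showing the configuration of the MR system according to the fourth embodiment. Reference numeral 1301 denotes a template image storage unit, in which multiple kinds of template images 1310 are registered for each of a plurality of landmarks. Reference numeral 1302 denotes a template image selection module, which selects one template image for each landmark from those stored in the template image storage unit 1301. In this example, the average luminance calculation module 1303 calculates the average luminance of the image currently captured by the observer viewpoint camera 111 mounted on the HMD 110, and the template images to be used are selected based on that value (described in detail later). Accordingly, the template image storage unit 1301 stores the template images classified by luminance range, as shown in FIG. 13B. Because the luminance value at which the template image should change differs from landmark to landmark, the same template image may be used for different luminance ranges, as FIG. 13B shows. For example, landmark #1 uses the same template image T_1B in both luminance range B and luminance range C.
[0069]
The landmark detection module 1313 performs template matching on the observer viewpoint image I using the template images obtained by the template image selection module 1302 and detects landmarks. The viewpoint position estimation module 114, the virtual image generation module 115, and the HMD 110 are as described in the first embodiment (FIG. 1).
[0070]
The average luminance calculation module 1303 obtains the average luminance from the image captured by the observer viewpoint camera 111 attached to the HMD 110 and provides the result to the template image selection module 1302. Based on this average luminance, the template image selection module 1302 selects a template image for each landmark from the template image storage unit 1301 and outputs it to the landmark detection module 1313.
[0071]
FIG. 14 is a flowchart illustrating the processing procedure of the template image selection module according to the fourth embodiment. First, in step S1401, the average luminance value is acquired from the average luminance calculation module 1303. Step S1402 then determines whether the luminance range has changed. For example, if the template images currently in use belong to luminance range A, it determines whether the average luminance acquired in step S1401 belongs to another luminance range (B or C). If the luminance range has changed, the process advances to step S1403 and reads the group of template images corresponding to the luminance range to which the new average luminance belongs. In step S1404, those template images are output to the landmark detection module 1313.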
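The range-based selection of FIG. 14 might be sketched as a table lookup. The boundary values 85 and 170, the template identifiers for landmark 2, and the function names are invented for illustration; only the idea of ranges A, B, and C and the sharing of T_1B between ranges B and C for landmark #1 comes from the description of FIG. 13B.

```python
from bisect import bisect_right

# Hypothetical range boundaries: A < 85 <= B < 170 <= C (8-bit luminance).
RANGE_UPPER_BOUNDS = [85, 170]
RANGE_NAMES = ["A", "B", "C"]

# Illustrative stand-in for the table of FIG. 13B (strings instead of images).
TEMPLATE_TABLE = {
    1: {"A": "T_1A", "B": "T_1B", "C": "T_1B"},  # landmark #1 shares T_1B
    2: {"A": "T_2A", "B": "T_2B", "C": "T_2C"},
}


def luminance_range(mean_luminance):
    """Map an average luminance value to its range name."""
    return RANGE_NAMES[bisect_right(RANGE_UPPER_BOUNDS, mean_luminance)]


def select_templates(mean_luminance, current_range):
    """Steps S1401 to S1404: reload templates only when the range changes.

    Returns (range_name, templates); templates is None when the range is
    unchanged and nothing needs to be sent to the detection module.
    """
    new_range = luminance_range(mean_luminance)
    if new_range == current_range:
        return current_range, None
    templates = {lm: row[new_range] for lm, row in TEMPLATE_TABLE.items()}
    return new_range, templates
```

A real system would probably add some hysteresis around the boundaries so that a luminance value hovering near a threshold does not cause the templates to switch back and forth every frame.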
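[0072]
As described above, according to the fourth embodiment, appropriate template images are selected from multiple kinds prepared in advance and used for template matching, so accurate template matching can be achieved without separately providing fixed cameras.
[0073]
Template images may be switched not only by average luminance but also by time of day (morning, daytime, night). Alternatively, the observer may manually enter the weather (sunny, cloudy, rainy, and so on), and the template image selection module 1302 may switch template images accordingly.
[0074]
In the above example, template images are selected from a single template image group, but multiple template image groups may be prepared for landmarks observed from multiple positions; the group to be used is selected from among them, and template images are obtained from the selected group according to the average luminance. In this case, the multiple template image groups can be regarded as corresponding to the multiple fixed cameras of the second and third embodiments. It is therefore possible to select the template image group from the observer's position.
[0075]
Needless to say, the search range for template matching can also be narrowed (for example, by the method described with FIG. 5 in the first embodiment).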
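[0076]
<Fifth Embodiment>
In the first to third embodiments, template images are defined as the detection parameters and template matching is used for landmark detection, but landmark detection need not use template matching. For example, when markers characterized by color (color markers) are used as landmarks, a color parameter representing the color features of the marker can be defined as the detection parameter, and landmarks can be detected by extracting regions of the specified color.
[0077]
FIG. 15 is a block diagram illustrating the configuration of the MR system according to this embodiment. In FIG. 15, the fixed camera 101, HMD 110, observer camera 111, display 112, viewpoint position estimation module 114, and virtual image generation module 115 are the same as in the first embodiment.
[0078]
Reference numeral 1502 denotes a color parameter extraction module, which generates from the fixed viewpoint image I_S a color parameter C_i for detecting each landmark P_i. For example, based on the distribution in RGB color space of the pixels within the observed region R_i of landmark P_i on the fixed viewpoint image I_S (assumed in this embodiment to be known and supplied by supply means not shown), the range occupied by the landmark in RGB color space (minimum red Rmin, maximum red Rmax, minimum green Gmin, maximum green Gmax, minimum blue Bmin, maximum blue Bmax) is obtained and used as the color parameter C_i representing the landmark's color features. This color parameter C_i is output to the landmark detection module described below at predetermined timing.
[0079]
Reference numeral 1513 denotes a landmark detection module, which detects landmark P_i by extracting from the observer viewpoint image I the pixels that fall within the color region defined by the color parameter C_i provided by the color parameter extraction module 1502. Because the color parameter C_i can thus be defined from a fixed camera image I_S captured at almost the same time as the observer viewpoint image I (that is, under almost the same lighting), stable color marker detection is possible at all times even where the lighting changes dynamically, as outdoors, and landmark positions can be detected accurately. Although this embodiment uses the range occupied by the landmark in RGB color space as the color parameter C_i, any color space or color feature extraction method commonly used for color feature extraction may of course be used, and luminance information for a grayscale image may be used as the parameter. Moreover, the kind of detection parameter is not limited to template images or color features; any detection parameter for detecting landmarks in an image may be used.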
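The color-range parameter C_i and its use for detection can be sketched as follows. Pixels are assumed to be (R, G, B) tuples, and reporting the centroid of the matching pixels as the landmark position is an assumption of this example; the text only states that pixels inside the color region are extracted. The function names are hypothetical.

```python
def color_parameter(region_pixels):
    """Build C_i = (Rmin, Rmax, Gmin, Gmax, Bmin, Bmax) from the pixels
    of the observed region R_i in the fixed viewpoint image."""
    reds, greens, blues = zip(*region_pixels)
    return (min(reds), max(reds), min(greens), max(greens), min(blues), max(blues))


def detect_by_color(image, param):
    """Return the centroid (u, v) of pixels inside the RGB box, or None.

    image is a list of rows of (R, G, B) tuples, indexed image[y][x].
    """
    rmin, rmax, gmin, gmax, bmin, bmax = param
    hits = [(x, y)
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)
            if rmin <= r <= rmax and gmin <= g <= gmax and bmin <= b <= bmax]
    if not hits:
        return None  # no pixel falls inside the color region
    u = sum(x for x, _ in hits) / len(hits)
    v = sum(y for _, y in hits) / len(hits)
    return u, v
```

Recomputing `color_parameter` from each new fixed camera frame is what lets the color box track lighting changes, in the same way the template images are refreshed in the first embodiment.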
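[0080]
<Sixth Embodiment>
In the first to fifth embodiments, there is a single observer viewpoint camera in whose captured images landmark positions are to be detected, but there need not be only one. For example, when observer viewpoint cameras 111A to 111D exist for a plurality of observers (four here, A to D) and landmark positions are to be detected in the observer viewpoint images I_A to I_D that they capture, corresponding landmark detection modules 113A to 113D are provided, and a template image creation module 102 configured as in the first to fourth embodiments updates the template images for each of the landmark detection modules 113A to 113D.
[0081]
As described above, according to the above embodiments, landmarks can be detected accurately in captured images even when the appearance of a specific point changes because the capture environment changes. Moreover, because accurate landmark detection is ensured despite environmental changes, MR technology can achieve both highly accurate alignment between the virtual and the real and free movement outdoors.
[0082]
Embodiments 1 to 6 describe application to a video see-through MR system, but the invention can of course be applied to other uses that require viewpoint position measurement, such as optical see-through MR, and to uses other than MR, provided that the coordinates of a specific location on a stationary object are to be detected in an image captured by a camera.
[0083]
[Effects of the Invention]
As described above, according to the present invention, a specific point can be reliably detected in a captured image even when its appearance changes because the capture environment changes.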
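[Brief Description of the Drawings]
[FIG. 1] A block diagram illustrating the configuration of the MR system according to the first embodiment.
[FIG. 2] A diagram outlining the landmark detection process according to the first embodiment.
[FIG. 3] A flowchart illustrating the template image creation procedure performed by the template image creation module 102.
[FIG. 4] A flowchart illustrating the landmark detection procedure performed by the landmark detection module.
[FIG. 5] A diagram illustrating a method of limiting the search region during landmark detection.
[FIG. 6] A block diagram showing the configuration of the MR system according to the second embodiment.
[FIG. 7] A diagram outlining the landmark detection process according to the second embodiment.
[FIG. 8] A flowchart illustrating the processing for restricting the landmarks to be detected in the second embodiment.
[FIG. 9] A diagram illustrating a method of limiting the search region during landmark detection in the second embodiment.
[FIG. 10] A diagram outlining the landmark detection process with overlap according to the third embodiment.
[FIG. 11] A flowchart illustrating the procedure for detecting a landmark with the template image that gives the best matching result when multiple template images exist for the same landmark.
[FIG. 12] A flowchart illustrating the procedure for detecting a landmark with the template image obtained from a fixed camera selected according to the observer's position when multiple template images exist for the same landmark.
[FIG. 13A] A block diagram showing the configuration of the MR system according to the fourth embodiment.
[FIG. 13B] A diagram showing an example of the data structure of the template images.
[FIG. 14] A flowchart illustrating the processing procedure of the template image selection module according to the fourth embodiment.
[FIG. 15] A block diagram illustrating the configuration of the MR system according to the fifth embodiment.
[FIG. 16] A diagram illustrating how template images are stored in the third embodiment.
[0001]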
BACKGROUND OF THE INVENTION
The present invention relates to a specific point detection method and apparatus for detecting a specific point of a stationary object such as a landmark from an image.
[0002]
[Prior art]
In recent years, mixed reality (hereinafter, MR technology), which aims to superimpose additional information and virtual objects (hereinafter collectively called virtual images) on real space, has been actively studied. Of particular interest are systems in which an observer wears a video see-through head-mounted display (hereinafter, HMD), a virtual image is superimposed on the real image captured by a camera built into or attached to the HMD with the real space and the virtual space aligned three-dimensionally, and the resulting mixed reality image (hereinafter, MR image) is displayed on the HMD in real time (in this specification, such an apparatus is called an MR system).
[0003]
Aligning the virtual image with the real image is the greatest technical challenge in an MR system, and achieving it requires accurate measurement of the position and orientation of the camera viewpoint. In general, if the positions on the captured image of a plurality of points whose three-dimensional positions are known (theoretically three or more, and six or more for a stable solution) can be obtained, the position and orientation of the camera viewpoint can be determined from those correspondences (in this specification, such points are called landmarks). The alignment problem therefore reduces to how accurately landmarks can be tracked or detected in images captured by a moving camera and their positions obtained.
[0004]
The present inventors have so far developed an application device of MR technology in the field of games and the like. These devices were intended for indoor use.
[0005]
For indoor use as described above, distinctive markers (often markers with distinctive colors such as red or green, arranged singly or in combination, or distinctive patterns such as checkerboards or concentric circles) are placed in the target space and used as landmarks. This makes landmark detection by image processing easy and stable, and enables highly accurate alignment.
[0006]
One known method of detecting color-based markers is to capture a marker under a certain illumination environment, extract and store the representative color of the marker region in that image, and then detect the marker in a captured image as a region having the same color as the stored representative color (or a color close to it). For pattern-based markers, each marker is captured under a certain illumination environment and the neighborhood of the marker in that image is stored as a template image, so that the marker can be detected by template matching. That is, a similarity is computed between the template image and partial regions of the captured image, and the position of the partial region most similar to the template image is detected as the marker position. In this specification, image features used as cues for detecting a marker, such as the representative color of the marker region or the template image above, are collectively called "detection parameters".
[0007]
Meanwhile, there is growing demand for MR systems intended for outdoor use, for example systems that display a virtual image of a guide on an HMD to guide visitors around a university campus or a tourist site.
[0008]
Outdoors, it is often difficult to attach artificial markers to the environment. A known technique for measuring the position and orientation of the observer's viewpoint in such situations uses, as landmarks, points in the captured image that have features detectable by image processing (for example, corners of structures, highly textured points on structures, and points where the color changes locally). Template matching can be applied to detect such landmarks in the captured image.
[0009]
[Problems to be solved by the invention]
In an outdoor environment, however, the appearance (brightness and color) of a landmark changes as ambient light changes with the weather (sunny/cloudy/rainy) and the time of day (morning/daytime/night). Consequently, when landmarks are detected by template matching, even if template images are prepared in advance as detection parameters, changes in ambient light prevent correct matching and the landmarks can no longer be detected. As a result, the correct position and orientation of the viewpoint cannot be obtained, and the real image and the virtual image cannot be aligned correctly. The same problem arises even when artificial markers are used indoors if the lighting environment changes.
[0010]
The present invention has been made in view of the above problems, and its object is to enable reliable detection of a specific point in a captured image even when the appearance of a landmark or the like used as the specific point changes because the environment at the time of capture changes.
[0011]
[Means for Solving the Problems]
An image processing method according to the present invention for achieving the above object comprises, for example, the following steps. Namely, it is
an image processing method for calculating the orientation of an imaging unit using a plurality of specific points arranged in real space, the method comprising:
a holding step of holding a plurality of detection parameters for detecting each of the plurality of specific points from a captured image;
an input step of inputting a captured image captured by the imaging unit;
an average luminance calculation step of calculating the average luminance of the captured image;
a selection step of selecting, from the plurality of held detection parameters, a detection parameter corresponding to the average luminance for each of the plurality of specific points;
a detection step of detecting the specific points from the captured image input in the input step, using the detection parameters selected in the selection step; and
a calculation step of calculating the orientation of the imaging unit using the positions of the detected specific points in the captured image.
[0012]
An image processing apparatus according to the present invention for achieving the above object has, for example, the following configuration. Namely, it is
an image processing apparatus for calculating the orientation of an imaging unit using a plurality of specific points arranged in real space, the apparatus comprising:
a holding unit that holds a plurality of detection parameters for detecting each of the plurality of specific points from a captured image;
an input unit that inputs a captured image captured by the imaging unit;
an average luminance calculation unit that calculates the average luminance of the captured image;
a selection unit that selects, from the plurality of held detection parameters, a detection parameter corresponding to the average luminance for each of the plurality of specific points;
a detection unit that detects the specific points from the captured image input by the input unit, using the detection parameters selected by the selection unit; and
a calculation unit that calculates the orientation of the imaging unit using the positions of the detected specific points in the captured image.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
[0014]
<First Embodiment>
In the embodiment described below, a template image used for template matching is used as a detection parameter, and this template image is dynamically updated to improve landmark detection accuracy.
[0015]
FIG. 1 is a block diagram illustrating the configuration of the MR system according to the first embodiment. In FIG. 1, reference numeral 101 denotes a fixed camera, which corresponds to the second imaging means of the present invention; its installation position, viewing orientation, focal length, and so on are fixed so that the same location in the scene is always observed. That is, in the captured image obtained from the fixed camera 101 (hereinafter, fixed viewpoint image I_S), each landmark P_i to be detected (i = 1 to the number of landmarks) is always captured at the same coordinates (x_i, y_i).
[0016]
Reference numeral 102 denotes a template image creation module, which generates a template image T_i corresponding to each landmark P_i from the fixed viewpoint image I_S. Template images can be generated in various ways, as described later; in this embodiment, the observed coordinates (x_i, y_i) of landmark P_i are assumed to be known, and template image T_i is generated by extracting from I_S a rectangular region R_i of fixed extent centered on (x_i, y_i). This template image T_i is used in the template matching process for landmark detection, as described later. The template image T_i is updated at predetermined timing, for example every frame of the fixed camera 101.
[0017]
Reference numeral 110 denotes an HMD worn by an observer, and includes an observer viewpoint camera 111 and a display 112. The observer viewpoint camera 111 is fixed to the HMD 110, and the captured image is an image corresponding to the viewpoint position and direction of the observer (hereinafter referred to as an observer viewpoint image I). Here, the observer camera 111 corresponds to one aspect of the first photographing unit, and this observer viewpoint image corresponds to a target image that is a target for detecting a specific point (landmark).
[0018]
A landmark detection module 113 is a template image T provided from the template image creation module 102._iBy performing a search process by template matching using the image, a landmark P is obtained from the observer viewpoint image I provided by the observer viewpoint camera 111._iIs detected. As described above, since the template image creation module 102 updates the template image at a predetermined timing, the landmark detection module was photographed at almost the same time as the observer viewpoint image I (that is, the observer viewpoint image I). Template matching can be performed using a template image (taken under almost the same light source environment). Accordingly, stable template matching can always be performed even in a situation where the light source environment changes dynamically as in the outdoor environment, and accurate detection of the landmark position can be realized.
[0019]
The landmark detection module 113 further sends the coordinate values (u_i, v_i) of each detected landmark P_i on the observer viewpoint image I to the viewpoint position estimation module 114. Here, (u_i, v_i) is the center position of the region that matches the template image.
[0020]
The viewpoint position estimation module 114 calculates the viewpoint position and orientation of the observer by a known method, based on the image coordinate values of the plurality of landmarks provided by the landmark detection module 113 and the positions of those landmarks in real space, which are measured in advance and stored as known information. Theoretically, the viewpoint position and orientation from which the observer viewpoint image I was captured can be calculated if the coordinate values of three landmarks on the observer viewpoint image I are available.
[0021]
The viewpoint position and orientation calculated as described above are provided to the virtual image generation module 115. The virtual image generation module 115 draws the virtual image as it would be observed from the viewpoint position and orientation provided by the viewpoint position estimation module 114, superimposes it on the observer viewpoint image I, and displays the result on the display 112 of the HMD 110. As a result, an MR image in which the real space and the virtual space are merged with accurate alignment is displayed on the display 112 and viewed by the observer.
[0022]
Assuming that the observer moves around outdoors, the unit including the fixed camera 101 and the template image creation module 102 (the fixed portion) and the unit including the HMD 110 and the landmark detection module 113 (the portion worn by the observer) are preferably separate bodies. In this case, the template image is transmitted from the template image creation module 102 to the landmark detection module 113 by wired or wireless communication.
[0023]
FIG. 2 is a diagram for explaining the outline of the landmark detection process according to the first embodiment. Reference numeral 201 denotes the fixed viewpoint image I_S captured by the fixed camera 101; in this example, seven landmarks (P_1 to P_7) are set. As described above, the landmark positions (x_i, y_i) are known. The template image creation module 102 can therefore generate the template images T_1 to T_7 by extracting the predetermined regions R_1 to R_7 surrounding the respective landmark positions (x_i, y_i). In this way, the template image creation module 102 generates the template images T_i from the latest fixed viewpoint image I_S at a predetermined timing.
[0024]
The landmark detection module 113 uses the latest template images T_i generated as described above to perform template matching on the observer viewpoint image I (202) obtained from the observer viewpoint camera 111 provided in the HMD 110, and thereby detects the landmarks.
[0025]
FIG. 3 is a flowchart for explaining the procedure of the template image creation process performed by the template image creation module 102. First, in step S301, it is determined whether it is time to update the template image. In the present embodiment, the update timing of the template image coincides with the frame period of the fixed camera 101, but it is of course not limited to this. Various modifications are possible: for example, the template image may be updated each time a predetermined period elapses, each time the fixed camera 101 finishes capturing a predetermined number of frames, or when the difference between the average luminance value of the fixed viewpoint image at the previous template image update and that of the current fixed viewpoint image exceeds a predetermined value. These timings may also be combined.
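The update-timing variations listed for step S301 can be sketched as a single predicate. This is only an illustrative sketch: the function names, the parameter names (frame_interval, time_interval, luminance_delta), and the representation of a grayscale image as a list of rows are assumptions made here and do not appear in the specification.

```python
def mean_luminance(image):
    """Average pixel value of a grayscale image given as a list of rows."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def should_update(frames_since_update, seconds_since_update,
                  previous_image, current_image,
                  frame_interval=1, time_interval=None, luminance_delta=None):
    """Return True when any enabled update condition is met (step S301).

    All three thresholds are hypothetical parameters; a condition whose
    threshold is None is disabled. The default updates at every frame.
    """
    # Update after a predetermined number of captured frames.
    if frame_interval is not None and frames_since_update >= frame_interval:
        return True
    # Update after a predetermined period has elapsed.
    if time_interval is not None and seconds_since_update >= time_interval:
        return True
    # Update when the average luminance has changed by more than a set value.
    if luminance_delta is not None:
        change = abs(mean_luminance(current_image) - mean_luminance(previous_image))
        if change > luminance_delta:
            return True
    return False
```

Combining the conditions with a logical OR is one possible reading of "a combination of these timings"; other combinations are equally conceivable.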
[0026]
If it is determined in step S301 that it is time to update the template image, the process proceeds to step S302, and the fixed viewpoint image I_S is input from the fixed camera 101. In step S303, a predetermined rectangular region R_i corresponding to landmark P_i (for example, the set of pixels (x, y) satisfying x_i - n < x < x_i + n and y_i - n < y < y_i + n, where n is a constant) is extracted from the image I_S and used as the template image T_i. In step S304, the template image T_i obtained in step S303 is output to the landmark detection module 113.
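As a minimal sketch of the extraction in step S303, the following function crops the region defined by the strict inequalities x_i - n < x < x_i + n and y_i - n < y < y_i + n. The function name, the image representation (a list of rows indexed as image[y][x]), and the choice to raise an error when the region leaves the image are assumptions of this sketch.

```python
def extract_template(image, x_i, y_i, n):
    """Return the pixels with x_i-n < x < x_i+n and y_i-n < y < y_i+n.

    The result is a (2n-1) x (2n-1) patch centred on (x_i, y_i).
    Raising ValueError for a region that leaves the image is a choice made
    for this sketch; the specification does not say how that case is handled.
    """
    height, width = len(image), len(image[0])
    x0, x1 = x_i - n + 1, x_i + n   # half-open column range [x0, x1)
    y0, y1 = y_i - n + 1, y_i + n   # half-open row range [y0, y1)
    if x0 < 0 or y0 < 0 or x1 > width or y1 > height:
        raise ValueError("region R_i extends outside the image")
    return [row[x0:x1] for row in image[y0:y1]]
```

For example, with n = 2 the patch covers the three columns x_i - 1 to x_i + 1 and the three rows y_i - 1 to y_i + 1.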
[0027]
In step S305, it is determined whether template image generation has been completed for all landmarks P_i. If an unprocessed landmark remains, the processing target is moved to that landmark in step S306, and the process returns to step S303 to repeat the above processing. When the generation and output of template images for all landmarks are completed, the process returns from step S305 to step S301 and waits for the next update timing.
[0028]
Through the above processing, the template image updated at a predetermined timing (in this embodiment, in units of frames) is provided to the landmark detection module 113.
[0029]
In the above embodiment, the rectangular region R_i extracted from the image I_S in step S303 is used as the template image T_i as it is, but the method for generating the template image is not limited to this. For example, an average image or a weighted average image may be created from the rectangular regions R_i extracted from the fixed viewpoint images I_S of a plurality of past frames and used as the template image T_i. In this case, an effect of removing the noise component contained in the fixed viewpoint image I_S can be expected.
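The averaging variant can be illustrated as follows. This is a sketch under stated assumptions: the function name and the weights argument are hypothetical, and patches are taken to be equally sized grayscale regions R_i stored as lists of rows.

```python
def average_template(patches, weights=None):
    """Pixel-wise (weighted) average of equally sized patches.

    patches holds the regions R_i cut from several past frames; weights is
    an optional, hypothetical list of per-frame weights (for example, larger
    values for more recent frames). Without weights a plain mean is used.
    """
    if weights is None:
        weights = [1.0] * len(patches)
    total = float(sum(weights))
    height, width = len(patches[0]), len(patches[0][0])
    return [[sum(w * p[y][x] for p, w in zip(patches, weights)) / total
             for x in range(width)]
            for y in range(height)]
```

How many past frames to keep and how to weight them is left open by the text; the choice trades noise suppression against how quickly the template follows a change in lighting.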
[0030]
In the above embodiment, all the template images generated in step S303 are output in step S304. However, the template image output method is not limited to this. For example, the degree of difference e between the last output template image T_i' and the template image T_i generated in step S303 may be calculated, and the template image may be output only when e is equal to or greater than a certain value (e ≧ TH_1), on the judgment that the light source environment has changed. In this case, network traffic can be reduced by omitting unnecessary data transmission. In addition, when an obstacle comes between the landmark and the fixed camera 101 so that the landmark is not observed in the fixed viewpoint image I_S, the template image would be updated to an incorrect image showing the obstacle. To prevent this, it may be judged that the landmark is occluded when the degree of difference is equal to or greater than a certain value (e ≧ TH_2), and the template image may then not be output. The degree of difference between template images can be calculated by a known image processing method such as cross-correlation or the sum of absolute differences of pixel values.
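The two output conditions can be combined into one decision, as in the sketch below. The combination itself is an assumption: the text presents the two thresholds as separate options, and this sketch additionally assumes TH_1 < TH_2 and uses the sum of absolute differences as the degree of difference e. The function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def should_output(last_sent, new_template, th1, th2):
    """Decide whether the new template image should be transmitted.

    e <  th1        : lighting judged unchanged, transmission skipped
    th1 <= e < th2  : lighting judged to have changed, template is sent
    e >= th2        : landmark judged occluded, template is withheld
    The first template (last_sent is None) is always sent.
    """
    if last_sent is None:
        return True
    e = sad(last_sent, new_template)
    return th1 <= e < th2
```

Suitable values for th1 and th2 depend on the patch size and the sensor noise and would have to be tuned for a given installation.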
[0031]
Next, processing by the landmark detection module 113 will be described. FIG. 4 is a flowchart for explaining a landmark detection procedure by the landmark detection module.
[0032]
In steps S401 and S402, when a template image T_i has been received from the template image creation module 102 described above, the template image held in memory for use in template matching is updated with it. In this embodiment, the template image creation process of FIG. 3 outputs each template image as soon as it is obtained (steps S303 and S304), so the update in steps S401 and S402 is performed one template image at a time. However, the template image update procedure is not limited to this. For example, if the template image creation module 102 generates the template images for all the landmarks included in the fixed viewpoint image I_S and then outputs them in a batch, the landmark detection module 113 updates all the template images in a batch.
[0033]
If no template image has been received in step S401, or after step S402 is completed, the process proceeds to step S403, where it is determined whether an observer viewpoint image I has been input. As described above, the observer viewpoint image I is image data output from the observer viewpoint camera 111, and a landmark is detected from the observer viewpoint image I by the processing in steps S404 to S407. Therefore, in this embodiment, each time an observer viewpoint image is input from the observer viewpoint camera 111 (that is, every frame), the landmark is detected.
[0034]
In step S404, landmark P_i is detected from the observer viewpoint image I using the template image T_i. Any known template matching method may be used for this detection process. For example, for each pixel (u_j, v_j) in the observer viewpoint image I, a region of the same size as the template image T_i centered on that pixel is extracted as a partial image Q_j, and the degree of difference e_j between the partial image Q_j and the template image T_i is calculated. The degree of difference may be calculated from the cross-correlation between the two images, as the sum of the absolute differences between the luminance values of corresponding pixels, or, when the input image is a color image, as the sum of the distances in RGB space between corresponding pixels. The degree of difference e_j is calculated for all pixels (u_j, v_j), and the center coordinates (u_j, v_j) of the partial image Q_j that minimizes e_j (that is, the partial image that best matches the template image T_i) are taken as the detected position (u_i, v_i) of landmark P_i in the observer viewpoint image I.
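The exhaustive search of step S404 might look like the following sketch, which uses the sum of absolute differences as the degree of difference e_j. The function names, the optional max_difference rejection threshold, and the restriction to windows lying fully inside the image are assumptions of this sketch rather than details taken from the specification.

```python
def sad(patch, template):
    """Sum of absolute differences of corresponding pixel values."""
    return sum(abs(p - t)
               for prow, trow in zip(patch, template)
               for p, t in zip(prow, trow))


def match_template(image, template, max_difference=None):
    """Scan the whole image and return ((u, v), e) for the best match.

    (u, v) is the centre of the best-matching partial image Q_j and e its
    degree of difference. Returns None when no window fits in the image or
    when the best e exceeds the hypothetical threshold max_difference.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            patch = [row[left:left + tw] for row in image[top:top + th]]
            e = sad(patch, template)
            if best is None or e < best[1]:
                best = ((left + tw // 2, top + th // 2), e)
    if best is None:
        return None
    if max_difference is not None and best[1] > max_difference:
        return None
    return best
```

This brute-force version costs on the order of (image area) x (template area) operations per landmark, which is why restricting the search range matters in practice.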
[0035]
In step S405, the coordinates (u_i, v_i) are output to the viewpoint position estimation module 114 as the detected position of landmark P_i in the observer viewpoint image I. If it is determined in step S404 that no part of the observer viewpoint image I matches the template image T_i (for example, when every degree of difference e_j exceeds a set threshold), either an indication that landmark P_i is not present in the observer viewpoint image I is output, or this step is skipped. In step S406, it is determined whether the detection process has been completed for all landmarks P_i. If an unprocessed landmark remains, the process proceeds to step S407, the unprocessed landmark P_i becomes the processing target, and the processing from step S404 is repeated. When the processing has been completed for all landmarks P_i, the process returns to step S401.
[0036]
Note that the effect of the present invention is further enhanced by operating the template image creation module 102 and the landmark detection module 113 in synchronization. That is, after a template image is received in step S401, the observer viewpoint image I input in step S403 is the one captured at the same time as the fixed viewpoint image I_S from which the received template image was generated. This enables template matching using a template image captured under the same light source environment as the observer viewpoint image I. Needless to say, to realize this processing strictly, the image capture of the fixed camera 101 and that of the observer viewpoint camera 111 should be electrically synchronized.
[0037]
In the above embodiment, the detection process is performed for all the landmarks. However, the process may be terminated when a predetermined number of landmarks that enable calculation of the observer viewpoint position are detected.
[0038]
In the above processing, the template images in the landmark detection module 113 are updated when the template image creation module 102 outputs updated template images. Alternatively, the landmark detection module 113 may read the latest template images held by the template image creation module 102 as necessary. The reading may take place, for example, each time an observer viewpoint image I is input or at predetermined time intervals. In this case, the template image creation module 102 holds the created template images in its own storage medium and transmits the latest template images to the landmark detection module 113 in response to a request from the landmark detection module 113.
[0039]
In step S404, landmark P_i is detected by scanning the entire observer viewpoint image I. However, various known techniques for improving the efficiency of the template matching process can be applied. An example is as follows.
[0040]
FIG. 5 is a diagram for explaining a method for limiting the search area during the landmark detection process. For each landmark, the approximate position in the observer viewpoint image I of the current frame is estimated using information such as the position and orientation of the observer viewpoint camera in the previous frame (or a past frame) of the observer viewpoint image I and the detected position of the landmark in the previous frame (or a past frame), and a search area is set around that position. Of course, the position data most recently obtained by the viewpoint position estimation module 114 may be used. The search is then performed, within the search area, only for those landmarks P_i whose search area is included in the observer viewpoint image I of the current frame. In the example of FIG. 5, suppose that the search areas of the landmarks P_1 to P_7 shown in (a) are obtained as shown in (b) with respect to the observer viewpoint image I. In this case, in step S404, the corresponding landmarks are searched for in the entire search areas of P_3 to P_5 and in the portion of the search area of P_2 that is included in the observer viewpoint image I. Narrowing the search range in this way increases the processing speed.
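The search-area restriction can be sketched as a clipping step applied to each predicted landmark position. The square search area, the margin parameter, and the function names are assumptions of this sketch; how the predicted position itself is obtained is outside its scope.

```python
def clip_search_area(predicted, margin, image_size):
    """Search rectangle around a predicted position, clipped to the image.

    Returns (left, top, right, bottom) with right and bottom exclusive, or
    None when the search area lies entirely outside the image. margin is a
    hypothetical half-width of the square search area.
    """
    u, v = predicted
    width, height = image_size
    left, top = max(u - margin, 0), max(v - margin, 0)
    right, bottom = min(u + margin + 1, width), min(v + margin + 1, height)
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


def landmarks_to_search(predictions, margin, image_size):
    """Keep only landmarks whose search area overlaps the current frame."""
    areas = {}
    for name, predicted in predictions.items():
        area = clip_search_area(predicted, margin, image_size)
        if area is not None:
            areas[name] = area
    return areas
```

A landmark whose search area straddles the image border, like P_2 in the example of FIG. 5, keeps only the overlapping part, and landmarks whose search areas fall completely outside are dropped from the detection pass.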
[0041]
As described above, according to the first embodiment, the template image is updated using the image captured by the fixed camera 101, so a template image that follows changes in the environment can be obtained. The landmarks can therefore be detected reliably from the observer viewpoint image I regardless of changes in the environment, and the position and orientation of the observer's viewpoint can be determined accurately in an outdoor environment. The embodiment is thus suitable for alignment between the real space and the virtual space, particularly when an MR image is displayed on the display 112 provided in the HMD 110.
[0042]
In this embodiment, the position of each landmark in the fixed viewpoint image 201 is known. For example, the position of each landmark is stored in a memory (not shown), acquired as necessary, and supplied to the template image creation module 102. The following methods can also be used to supply the landmark positions. The operator may directly specify the positions of the landmarks on the fixed viewpoint image 201 through an input unit (not shown). Alternatively, the position of each landmark in three-dimensional space, measured by some method, and the camera parameters of the fixed camera 101 (including at least its position and orientation) may be stored in a memory, and a landmark position calculation unit (not shown; corresponding to the specific point position calculation unit of the present invention) may calculate the position of each landmark on the fixed viewpoint image 201 from this information. Furthermore, when the landmarks to be detected are not determined in advance and it suffices to track some feature points in the observer viewpoint image 202, a feature extraction unit (not shown) may automatically extract, from the fixed viewpoint image 201 at the initial time, feature points having distinctive image features (for example, edge portions or strongly textured portions), and their positions may be used as the landmark positions.
[0043]
<Second Embodiment>
In the first embodiment, the template image is updated using a single fixed camera, so the range over which template images can be acquired is limited, and the range in which the observer can move and/or look around is limited accordingly. In the second embodiment, therefore, a plurality of fixed cameras are installed so that the observer can move and/or look around over a wide range. When a plurality of fixed cameras are used, there are two cases: a case where a plurality of template images exist for one landmark (hereinafter referred to as “with overlap”), and a case where one fixed camera is assigned to each landmark so that only one template image exists for it (hereinafter referred to as “no overlap”). The second embodiment describes the case with no overlap, and the third embodiment describes the case with overlap.
[0044]
When there is no overlap, an MR system provided with a plurality of fixed cameras can be realized with a configuration similar to that of the first embodiment. FIG. 6 is a block diagram showing the configuration of the MR system according to the second embodiment. The template image creation module 602 extracts the predetermined regions R_i from each of the fixed viewpoint images obtained from the plurality of fixed cameras 601 and outputs them as the template images T_i.
[0045]
As in the first embodiment, the landmark detection module 613 updates the template images it uses with the template images transmitted from the template image creation module 602, and uses them to detect landmarks from the observer viewpoint image I. The camera selection module 616 selects a predetermined number of fixed cameras near the viewpoint position obtained from the viewpoint position estimation module 614 and notifies the landmark detection module 613 of the selection result. That is, as described later, in the second embodiment the camera selection module 616 decides which fixed cameras to use, based on the viewpoint position output from the viewpoint position estimation module 614, in order to improve processing efficiency. The landmark detection module 613 performs template matching for landmark detection using the template images from the fixed cameras thus selected.
[0046]
The virtual image generation module 115 and the HMD 110 are as described in the first embodiment.
[0047]
FIG. 7 is a diagram for explaining the outline of the landmark detection process according to the second embodiment. The observation positions of the landmarks P_1 to P_13 on the fixed viewpoint images I_S1 to I_S5 obtained by the plurality of fixed cameras 601 (A to E) are determined, and the corresponding template images T_1 to T_13 are generated by extracting the surrounding rectangular regions R_1 to R_13. The landmarks can then be detected from the observer viewpoint image I using these template images. The processing in this case is essentially the same as with one fixed camera and can be regarded as a widening of the angle of view of a single camera. The landmarks can be detected using the processing procedures described with reference to FIGS. 3 and 4.
[0048]
As described above, in the second embodiment, in which a plurality of fixed cameras are provided, the position and orientation of the observer's viewpoint can be detected by the same processing as in the first embodiment (that is, even with a configuration that lacks the camera selection module 616 in FIG. 6). However, because the number of landmarks increases, processing efficiency decreases if the detection process is performed for all landmarks every time. In the second embodiment, therefore, processing efficiency is improved by limiting in advance the landmarks to be detected by the landmark detection module 613: the detection targets are narrowed down to the landmarks observed by the fixed cameras selected by the camera selection module 616.
[0049]
This can be realized, for example, by adding step S801 before step S404 of the landmark detection process, as shown in FIG. 8. When the observer viewpoint image I is input, the process proceeds from step S403 to step S801, where it is determined whether landmark P_i is observed by a fixed camera selected by the camera selection module 616. If landmark P_i is not observed by a selected fixed camera, the landmark detection process (steps S404 and S405) is skipped, and the process proceeds to step S406 to handle the next landmark. If landmark P_i is observed by a selected fixed camera, the process proceeds to step S404 and the landmark is detected.
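The camera selection and the check of step S801 can be sketched together as below. The use of Euclidean distance as the measure of "near the viewpoint position", the dictionary layouts, and the function names are all assumptions of this sketch; the no-overlap case is assumed, so each landmark is assigned to exactly one camera.

```python
import math


def select_cameras(viewpoint, camera_positions, count):
    """Pick the `count` fixed cameras nearest to the estimated viewpoint.

    camera_positions maps a camera name to its position; Euclidean distance
    is an assumed criterion for nearness.
    """
    ranked = sorted(camera_positions,
                    key=lambda name: math.dist(viewpoint, camera_positions[name]))
    return ranked[:count]


def landmarks_for_cameras(observed_by, selected_cameras):
    """Step S801 as a filter: keep landmarks seen by a selected camera.

    observed_by maps each landmark to the single camera that observes it.
    """
    selected = set(selected_cameras)
    return [landmark for landmark, camera in observed_by.items()
            if camera in selected]
```

Only the landmarks returned by the filter go on to the detection of steps S404 and S405; all others are skipped for the current frame.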
[0050]
Also in the second embodiment, various well-known methods for improving the efficiency of the template matching process can be applied. For example, the technique for limiting the search area described in the first embodiment is also effective. In particular, by specifying the search area after limiting the template image to be used as described above, unnecessary position calculation of the search area can be eliminated, which is effective.
[0051]
FIG. 9 is a diagram for explaining a method for limiting the search areas for the template images during the landmark detection process in the second embodiment. For example, assume that the camera selection module 616 selects the fixed cameras A, B, and C shown in FIG. 7 based on the detected viewpoint position. In this case, the detection targets are the landmarks P_1 to P_8, and the other landmarks P_9 to P_13 are not considered. In step S404, the landmark detection process by template matching is performed, using the corresponding template images T_2 to T_6, only for those of the landmarks P_1 to P_8 whose search areas are included in the observer viewpoint image (P_2 to P_6 in the figure).
[0052]
As described above, according to the second embodiment, since the template image is updated using a plurality of fixed cameras, a wider range of movement of the observer is allowed.
[0053]
<Third Embodiment>
Next, a case where a plurality of template images exist at one landmark at one time point due to the provision of a plurality of fixed cameras, that is, a case where there is an overlap will be described.
[0054]
FIG. 10 is a diagram for explaining the outline of the landmark detection process when there is an overlap according to the third embodiment. The fixed camera F observes the landmarks P_1 and P_2, and the template images T_1^F and T_2^F are generated from the rectangular regions R_1^F and R_2^F defined around them. The fixed camera G observes the landmarks P_1 to P_3, and the template images T_1^G to T_3^G are generated from the rectangular regions R_1^G to R_3^G defined around them. Similarly, the template images T_1^H to T_3^H are obtained from the fixed camera H. Here, for example, T_1^F, T_1^G, and T_1^H are template images corresponding to the same landmark P_1 in space.
[0055]
As described above, when a plurality of template images are obtained for one landmark by different fixed cameras, it is necessary to determine which template image is used to detect the landmark. Two cases are described below: (1) a case where the best template matching result is used, and (2) a case where a template image obtained by a fixed camera selected based on the observer position is used. In the third embodiment, it is assumed that the template images acquired from the images captured by the cameras F, G, and H are stored as shown, for example, in FIG. 16. For example, the template images T_1^F to T_6^F of the landmarks P_1 to P_6 are acquired from the image captured by camera F, the template images T_3^G to T_8^G of the landmarks P_3 to P_8 from the image captured by camera G, and the template images T_7^H to T_12^H of the landmarks P_7 to P_12 from the image captured by camera H, and these are stored. Template images with the same subscript number correspond to the same landmark. For example, template images of landmark P_6 are acquired from the images captured by both camera F and camera G.
[0056]
(1) When using the best template matching result
FIG. 11 is a flowchart for explaining the procedure for performing landmark detection using the best matching result when a plurality of template images exist for the same landmark. FIG. 11 shows the processing that replaces step S404 of the landmark detection procedure.
[0057]
When the observer viewpoint image I is input in step S403, landmark P_i is detected from the observer viewpoint image I in step S1100 using the template image T_i^j of landmark P_i obtained by the fixed camera j. In step S1101, it is determined whether landmark P_i has a plurality of template images and its coordinates have already been calculated using another template image. If the coordinates have not been calculated using another template image, or if landmark P_i does not have a plurality of template images, the coordinate values and the matching degree obtained with the current template image are stored in the memory in step S1104.
[0058]
On the other hand, if the coordinates have already been calculated using another template image, the process proceeds to step S1102, where the matching result for the other template image stored in the memory is compared with the matching result for the current template image. If the current template image gives the better result (that is, a higher matching degree), the process proceeds to step S1103, and the coordinate values and matching degree stored in the memory for the landmark are replaced with those obtained with the current template image. For example, when matching is performed using T_6^G after matching using T_6^F has already been performed and its matching degree stored, the matching degree obtained with T_6^G is compared with that obtained with T_6^F, and the result with the higher matching degree is adopted.
[0059]
Next, in step S1105, it is determined whether all the template images T_i^j corresponding to landmark P_i have been processed. If not, the process proceeds to step S1106, and the above processing is repeated with an unprocessed template image T_i^j as the processing target. If all the template images T_i^j corresponding to landmark P_i have been processed, the process proceeds to step S405, and the coordinates stored in the memory are output as the detected position of landmark P_i. Processing all the template images in this way means that, when a plurality of template images exist for one landmark, the coordinate values obtained with the template image giving the best matching degree are adopted.
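The store-and-replace logic of steps S1101 to S1105 reduces to keeping the result with the highest matching degree, as in this minimal sketch. The function name and the tuple layout (camera, coordinates, matching degree) are assumptions; the matching itself is taken to have been done already for each template image T_i^j.

```python
def best_detection(results):
    """Keep the matching result with the highest matching degree.

    results is an iterable of (camera, (u, v), matching_degree) tuples, one
    per template image of the same landmark. Returns the stored tuple, or
    None when there are no results.
    """
    stored = None
    for camera, coords, degree in results:
        # First result, or a better one than the stored result: store it
        # (corresponding to steps S1104 and S1103 respectively).
        if stored is None or degree > stored[2]:
            stored = (camera, coords, degree)
    return stored
```

If a degree of difference is used instead of a matching degree, the comparison is simply reversed so that the smallest value wins.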
[0060]
(2) When using a template image obtained by a fixed camera selected based on the observer position
FIG. 12 is a flowchart for explaining the procedure for performing landmark detection using a template image obtained by a fixed camera selected based on the observer position when a plurality of template images exist for the same landmark. FIG. 12 shows the processing added before step S404 of the landmark detection procedure.
[0061]
When the observer viewpoint image I is input in step S403, it is determined in step S1201 whether a plurality of template images exist for the landmark P_i to be detected next. If not, only one template image exists for the landmark, so the process proceeds to step S404 and landmark detection is performed by template matching.
[0062]
On the other hand, if a plurality of template images exist, in step S1202 the template image obtained from the fixed camera closest to the observer position is selected from among them and set as the template image T_i to be used in the detection process, and the process proceeds to step S404. For example, in FIG. 16, if the observer position is closer to camera G than to camera F, the template images T_3^G to T_6^G obtained from the image captured by camera G are adopted for the landmarks P_3 to P_6.
[0063]
By performing the above processing for all the template images, when a plurality of template images exist for one landmark, the template image from the fixed camera closest to the observer position is adopted and landmark detection is performed.
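The selection of steps S1201 and S1202 can be sketched as follows. Euclidean distance between the observer position and each camera position is an assumed measure of closeness, and the function name and dictionary layouts are hypothetical.

```python
import math


def select_template(templates_by_camera, camera_positions, observer_position):
    """Choose the template image from the fixed camera nearest the observer.

    templates_by_camera maps a camera name to that camera's template image
    of one landmark. With a single entry it is returned directly (step
    S1201); otherwise the nearest camera's template is chosen (step S1202).
    """
    if len(templates_by_camera) == 1:
        return next(iter(templates_by_camera.values()))
    nearest = min(templates_by_camera,
                  key=lambda cam: math.dist(observer_position, camera_positions[cam]))
    return templates_by_camera[nearest]
```

Compared with trying every template image and keeping the best match, this approach performs only one matching pass per landmark, at the cost of relying on the previously estimated observer position.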
[0064]
As described above, according to the third embodiment, an appropriate template image can be selected when a plurality of template images obtained from a plurality of fixed cameras exist for one landmark. In particular, as shown in FIG. 10, template images obtained from a plurality of fixed viewpoint images that capture one landmark from different directions can be used selectively, so template matching can be performed appropriately even when the appearance of the landmark differs greatly depending on the viewing direction (for example, when the landmark has a three-dimensional shape or a reflection characteristic close to that of a mirror surface).
[0065]
Note that the camera selection module 616 as described in the second embodiment can be used together. In this case, the landmarks to be processed in FIGS. 11 and 12 are only the landmarks obtained from the fixed camera selected by the camera selection module 616.
[0066]
In the third embodiment, it is needless to say that various known techniques for improving the efficiency of the template matching process can be applied.
[0067]
<Fourth Embodiment>
In the first to third embodiments, the template image used for the template matching performed by the landmark detection module 113 is updated by creating a template image as needed from a fixed viewpoint image obtained with a fixed camera. With this method, the template image is generated from the image captured at each point in time, so the appearance of the landmark at that time is reflected in the template image, and good template matching can be performed. However, one or more fixed cameras must be prepared, which increases the scale of the apparatus. In the fourth embodiment, therefore, a plurality of types of template images are registered in advance for each landmark, and the template image is updated using them.
[0068]
FIG. 13A is a block diagram showing the configuration of the MR system according to the fourth embodiment. A template image storage unit 1301 holds a plurality of types of template images 1310 registered for each of a plurality of landmarks. A template image selection module 1302 selects one template image for each landmark from among the template images stored in the template image storage unit 1301. In this example, the template image to be used is selected based on the average luminance value that the average luminance value calculation module 1303 calculates from the image captured at that time by the observer viewpoint camera 111 mounted on the HMD 110 (described in detail later). For this purpose, as shown in FIG. 13B, the template image storage unit 1301 stores the template images classified by luminance value range. Because the luminance value at which the template image should be changed differs from landmark to landmark, the same template image may be used across different luminance value ranges, as shown in FIG. 13B. For example, landmark #1 uses the same template image T_1B in both luminance value ranges B and C.
[0069]
The landmark detection module 1313 performs template matching on the observer viewpoint image I using the template image acquired by the template image selection module 1302, and detects a landmark. The viewpoint position estimation module 114, the virtual image generation module 115, and the HMD 110 are as described in the first embodiment (FIG. 1).
[0070]
The average luminance value calculation module 1303 obtains an average luminance value from the captured image from the observer viewpoint camera 111 attached to the HMD 110 and provides the calculation result to the template image selection module 1302. The template image selection module 1302 selects a template image of each landmark from the template image storage unit 1301 based on this average luminance value, and outputs it to the landmark detection module 1313.
[0071]
FIG. 14 is a flowchart for explaining the processing procedure of the template image selection module according to the fourth embodiment. First, in step S1401, the average luminance value is acquired from the average luminance value calculation module 1303. In step S1402, it is determined whether the luminance value range has changed. For example, when the template images currently in use belong to luminance value range A, it is determined whether the average luminance value acquired in step S1401 belongs to another luminance value range (B or C). If the luminance value range has changed, the process proceeds to step S1403, where the template image group corresponding to the luminance value range to which the new average luminance value belongs is read. In step S1404, that template image group is output to the landmark detection module 1313.
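The procedure of steps S1401 to S1404 can be sketched as a small stateful selector. The class name, the boundary values, and the dictionary layout of a template image group are assumptions made for illustration; the specification only states that template images are stored by luminance value range.

```python
import bisect


class TemplateSelector:
    """Switch template image groups when the luminance value range changes."""

    def __init__(self, boundaries, groups):
        # boundaries: sorted luminance values separating the ranges, e.g.
        # [85, 170] for three ranges A, B, C (hypothetical values).
        # groups: one template image group per range, in the same order.
        self.boundaries = boundaries
        self.groups = groups
        self.current = None  # index of the range currently in use

    def range_index(self, mean_luminance):
        """Index of the luminance value range containing mean_luminance."""
        return bisect.bisect_right(self.boundaries, mean_luminance)

    def update(self, mean_luminance):
        """Return the new group if the range changed (S1402), else None."""
        index = self.range_index(mean_luminance)
        if index == self.current:
            return None
        self.current = index
        return self.groups[index]   # read (S1403) and output (S1404)
```

A group returned by update would be passed to the landmark detection module; a return value of None means the template images already in use remain valid. In practice some hysteresis around the boundaries might be desirable to avoid rapid switching, though the text does not discuss this.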
[0072]
As described above, according to the fourth embodiment, an appropriate template image is selected from a plurality of types of template images prepared in advance and used for template matching, so that accurate template matching can be realized without using a fixed camera.
[0073]
Note that switching of template images is not limited to the average luminance value; it may instead be performed according to the time of day (morning, daytime, or night). Alternatively, the observer may manually input the weather conditions, such as clear, cloudy, or rainy, and the template image selection module 1302 may switch the template image accordingly.
[0074]
In the above example, a template image is selected from a single template image group. However, a plurality of template image groups may be prepared, each corresponding to the landmarks as observed from a different position; one template image group is then selected, and a template image is acquired from the selected group according to the average luminance value. In this case, the plurality of template image groups can be associated with the plurality of fixed cameras of the second and third embodiments described above, so that the template image group can be selected according to the position of the observer.
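The two-stage selection described here can be sketched as follows for a single landmark: first a template image group is chosen from the observer position, then a template is chosen within that group by luminance value range. Choosing the group whose fixed camera is nearest to the observer is an assumed criterion, and all positions and identifiers are hypothetical.

```python
import math

# Hypothetical data: one template image group per fixed camera position.
TEMPLATE_GROUPS = [
    {"camera_pos": (0.0, 0.0), "templates": {"A": "T_cam1_A", "B": "T_cam1_B"}},
    {"camera_pos": (10.0, 0.0), "templates": {"A": "T_cam2_A", "B": "T_cam2_B"}},
]


def select_group(observer_pos, groups):
    """Return the group whose fixed camera is closest to the observer."""
    return min(groups, key=lambda g: math.dist(observer_pos, g["camera_pos"]))


def select_template(observer_pos, range_label, groups=TEMPLATE_GROUPS):
    """Select a group by observer position, then a template by luminance range."""
    return select_group(observer_pos, groups)["templates"][range_label]
```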
[0075]
Furthermore, it goes without saying that the search range in template matching can be narrowed down (for example, by the method described with reference to FIG. 5 in the first embodiment).
[0076]
<Fifth Embodiment>
In the first to third embodiments, the template image is defined as the detection parameter and template matching is used for landmark detection. However, template matching need not necessarily be used. For example, when a marker having a color feature (color marker) is used as a landmark, the landmark can be detected by defining a color parameter representing the color feature of the marker as the detection parameter and extracting the region of that specific color.
[0077]
FIG. 15 is a block diagram illustrating the configuration of the MR system according to the present embodiment. In FIG. 15, the fixed camera 101, the HMD 110, the observer viewpoint camera 111, the display 112, the viewpoint position estimation module 114, and the virtual image generation module 115 are the same as those in the first embodiment.
[0078]
Reference numeral 1502 denotes a color parameter extraction module, which generates, from the fixed viewpoint image I_S, a parameter C_i for detecting each landmark P_i. For example, based on the distribution in the RGB color space of the pixels in the observation region R_i of the landmark P_i on the fixed viewpoint image I_S (the region is assumed to be supplied by supply means not shown in the present embodiment), the module determines the existence range of the landmark in the RGB color space (minimum red value Rmin, maximum red value Rmax, minimum green value Gmin, maximum green value Gmax, minimum blue value Bmin, maximum blue value Bmax) and uses this range as the color parameter C_i representing the color feature of the landmark. The color parameter C_i is output to a landmark detection module, described later, at a predetermined timing.
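The extraction of the color parameter C_i can be sketched as taking the per-channel minimum and maximum over the pixels of the observation region R_i. The function name and the tuple layout are assumptions made for this example.

```python
def extract_color_parameter(region_pixels):
    """Return (Rmin, Rmax, Gmin, Gmax, Bmin, Bmax) for an iterable of (R, G, B) pixels."""
    pixels = list(region_pixels)
    if not pixels:
        raise ValueError("observation region contains no pixels")
    reds, greens, blues = zip(*pixels)
    return (min(reds), max(reds), min(greens), max(greens), min(blues), max(blues))
```

A plain minimum and maximum is sensitive to outlier pixels; a practical system might trim the distribution first, but the specification does not describe such a step.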
[0079]
The landmark detection module 1513 detects the landmark P_i from the observer viewpoint image I by extracting the pixels included in the color region defined by the color parameter C_i provided from the color parameter extraction module 1502. As described above, the color parameter C_i is determined based on the fixed viewpoint image I_S photographed at almost the same time as the observer viewpoint image I (that is, photographed in almost the same light source environment as the observer viewpoint image I). Therefore, even in a situation where the light source environment changes dynamically, as in an outdoor environment, the color marker can always be detected stably and the landmark position can be detected accurately. In this embodiment, the landmark existence range in the RGB color space is used as the color parameter C_i as described above, but it goes without saying that a parameter based on any color space or color feature extraction method in general use may be used instead. Also, the types of detection parameters are not limited to template images and color features, and any detection parameter for detecting a landmark from an image may be used.
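Detection with the color parameter C_i can be sketched as collecting the pixels of the observer viewpoint image that fall inside the RGB range and reporting a position for them. Using the centroid of the extracted pixels as the landmark position is an assumption of this sketch, as are the function name and the image representation.

```python
def detect_landmark(image, color_parameter):
    """Return the (x, y) centroid of pixels inside the RGB range, or None if none match.

    image is a list of rows of (R, G, B) tuples;
    color_parameter is (Rmin, Rmax, Gmin, Gmax, Bmin, Bmax).
    """
    rmin, rmax, gmin, gmax, bmin, bmax = color_parameter
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if rmin <= r <= rmax and gmin <= g <= gmax and bmin <= b <= bmax:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```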
[0080]
<Sixth Embodiment>
In the first to fifth embodiments, there is a single observer viewpoint camera for which landmark positions on the captured image are to be detected. However, the number of observer viewpoint cameras is not limited to one. For example, when there are observer viewpoint cameras 111A to 111D corresponding to a plurality of observers (here, four observers A to D) and landmark positions are to be detected on the observer viewpoint images I_A to I_D captured by them, corresponding landmark detection modules 113A to 113D may be provided, and a template image creation module 102 having the same configuration as in the first to fourth embodiments may be used to update the template images for each of the landmark detection modules 113A to 113D.
[0081]
As described above, according to each of the above-described embodiments, it is possible to accurately detect a landmark from a captured image even if the environment at the time of shooting changes and the appearance of a specific point changes. In addition, according to each embodiment, accurate landmark detection is ensured despite environmental changes, so that MR technology can achieve both high-precision alignment of the virtual and the real and free movement outdoors.
[0082]
In the first to sixth embodiments, application to a video see-through type MR system has been described. However, the invention can of course also be applied to other uses that require measurement of the viewpoint position, for example an optical see-through type MR system. It can also be applied to uses other than MR, as long as the application detects the coordinates of a specific part of a stationary object from an image captured by a camera.
[0083]
[Effect of the invention]
As described above, according to the present invention, it is possible to reliably detect a specific point from a captured image even if the environment at the time of shooting changes and the appearance of the specific point changes.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an MR system according to a first embodiment.
FIG. 2 is a diagram illustrating an outline of landmark detection processing according to the first embodiment.
FIG. 3 is a flowchart for explaining a procedure of template image creation processing by a template image creation module 102.
FIG. 4 is a flowchart illustrating a landmark detection procedure performed by a landmark detection module.
FIG. 5 is a diagram illustrating a method of limiting a search area during landmark detection processing.
FIG. 6 is a block diagram showing a configuration of an MR system according to a second embodiment.
FIG. 7 is a diagram illustrating an outline of landmark detection processing according to a second embodiment.
FIG. 8 is a flowchart for explaining processing in a case where a landmark to be detected is limited in the second embodiment.
FIG. 9 is a diagram illustrating a method of limiting a search area during landmark detection processing in the second embodiment.
FIG. 10 is a diagram for explaining an overview of landmark detection processing when there is an overlap according to the third embodiment.
FIG. 11 is a flowchart for explaining a procedure when landmark detection is performed using the best matching result when there are a plurality of template images for the same landmark.
FIG. 12 is a flowchart illustrating a procedure for performing landmark detection using a template image obtained by a fixed camera selected based on the observer position when a plurality of template images exist for the same landmark.
FIG. 13A is a block diagram showing a configuration of an MR system according to a fourth embodiment.
FIG. 13B is a diagram illustrating a data configuration example of a template image.
FIG. 14 is a flowchart illustrating a processing procedure of a template image selection module according to the fourth embodiment.
FIG. 15 is a block diagram illustrating a configuration of an MR system according to a fifth embodiment.
FIG. 16 is a diagram illustrating a storage state of a template image according to the third embodiment.

Claims

An image processing method for calculating a posture of a photographing unit using a plurality of specific points arranged in a real space,
A holding step for holding a plurality of detection parameters for detecting each of a plurality of specific points from the captured image;
An input step for inputting a photographed image photographed by the photographing unit;
An average luminance calculating step for calculating an average luminance of the captured image;
A selection step of selecting a detection parameter corresponding to the average luminance for each of the plurality of specific points from the plurality of detection parameters held;
A detection step of detecting a specific point from the captured image input by the input step using the detection parameter selected by the selection step;
And a calculation step of calculating an attitude of the photographing unit using a position of the detected specific point in the photographed image.

The image processing method according to claim 1, wherein the detection parameter is a template image.

The image processing method according to claim 1 or 2, wherein execution of the selection of the detection parameter by the selection step is controlled in accordance with the average luminance.

An image processing apparatus that calculates an attitude of a photographing unit using a plurality of specific points arranged in a real space,
A holding unit that holds a plurality of detection parameters for detecting each of a plurality of specific points from a captured image;
An input unit for inputting a photographed image taken by the photographing unit;
An average luminance calculation unit for calculating an average luminance of the captured image;
A selection unit that selects a detection parameter corresponding to the average luminance for each of the plurality of specific points from the plurality of detection parameters held;
A detection unit that detects a specific point from a captured image input by the input unit using the detection parameter selected by the selection unit;
An image processing apparatus comprising: a calculation unit that calculates an attitude of the photographing unit using a position of the detected specific point in a photographed image.

A storage medium storing a control program for causing a computer to execute an image processing method for calculating a posture of a photographing unit using a plurality of specific points arranged in a real space, the image processing method comprising:
A holding step for holding a plurality of detection parameters for detecting each of a plurality of specific points from the captured image;
An input step for inputting a photographed image photographed by the photographing unit;
An average luminance calculating step for calculating an average luminance of the captured image;
A selection step of selecting a detection parameter corresponding to the average luminance for each of the plurality of specific points from the plurality of detection parameters held;
A detection step of detecting a specific point from the captured image input by the input step using the detection parameter selected by the selection step;
And a calculation step of calculating an attitude of the photographing unit using a position of the detected specific point in the photographed image.