JP3860550B2

JP3860550B2 - Interface method, apparatus, and program

Info

Publication number: JP3860550B2
Application number: JP2003061227A
Authority: JP
Inventors: 秀則佐藤; 英一細谷; 美紀北端; 育生原田; 久雄野島; 晃小野澤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-03-07
Filing date: 2003-03-07
Publication date: 2006-12-20
Anticipated expiration: 2023-03-07
Also published as: JP2004272515A

Description

【０００１】
【発明の属する技術分野】
本発明は、ユーザーが体の部位（指や腕等）を用いて指し示した撮影空間内の物体を検出するインタフェース装置に関する。
【０００２】
【従来の技術】
これまで、コンピュータと人間とのインタフェースに関し、人間の動作に基づくインタフェース方法や装置としては、以下に挙げるような手法がある。
【０００３】
第１の従来手法として、体に手や指の動作計測可能なセンサを装着し、センサ情報からユーザーの動きを検出する装置がある。例えば、磁気センサを用いた装置（ＡＳＣＥＮＳＩＯＮ社の「ＭｏｔｉｏｎＳｔａｒ」等）や、機械式センサを用いた装置（スパイス社の「Ｇｙｐｓｙ」、Ｉｍｍｅｒｓｉｏｎ社の「ＣｙｂｅｒＧｌｏｖｅ」等）等の市販製品がある。また、非特許文献１に記載された方法がある。これは加速度センサ等を取り付けたグローブを手に装着して、ユーザーの動作を認識するものである。
【０００４】
第２の従来手法として、非特許文献２に記載された手法がある。本手法は、腕の制約条件とユーザーの実際の腕の長さを利用して、１台のカメラの入力画像から、腕領域を抽出し、その長さの時間的変化から腕の動きの３次元的変化を推定するものである。
【０００５】
第３の従来手法として、非特許文献３に記載された手法がある。本手法は、体の中に求めた座標位置（「仮想投射中心」）と、検出した指先の座標位置を結び、延長した指示軸線がスクリーン（入力画像を表示している表示装置の画面）と交差する点をカーソル位置（指示位置）とする方法である。仮想投射中心位置は、スクリーンの４つの角位置から指先へ延長した直線の交点から求めているので、指示できる位置はスクリーン上のみである。指先の位置の抽出は、２台のカメラを用い、１台を上部から撮影する位置に設置することにより、スクリーンに最も近い物体を検出することで実現している。
【０００６】
第４の従来手法として、非特許文献４に記載された手法がある。本手法は、多眼ステレオカメラを用いて生成した距離情報を用いて、スクリーンに最も近い物体を指先として検出し、また色情報と距離情報を用いて眉間（目）の位置を検出し、これらを結んだ延長線がスクリーンと交差する点をカーソル位置（指示位置）とする方法である。
【０００７】
【非特許文献１】
塚田ら，“Ｕｂｉ-Ｆｉｎｇｅｒ：モバイル指向ジェスチャ入力デバイスの試作”，インタラクティブとソフトウェアに関するワークショップ（ＷＩＳＳ２００１），ｐｐ．１１９−１２４，２００１
【非特許文献２】
安部ら，“オプティカルフローと色情報を用いた腕の動作の３次元追跡”，画像の認識・理解シンポジウム（ＭＩＲＵ２００２），ｐｐ．Ｉ２６７−Ｉ２７２，２００２
【非特許文献３】
福本ら，“動画像処理による非接触ハンドリーダ”，第７回ヒューマン・インタフェース・シンポジウム論文集，ｐｐ．４２７−４３２，１９９１
【非特許文献４】
金次ら，“指さしポインターにおけるカーソル位置の特定法”，電子情報通信学会画像工学研究会，２００２．１
【０００８】
【発明が解決しようとする課題】
しかしながら、上述した従来の手法では、以下に示す問題があった。
【０００９】
第１の従来手法では、手または指の動作を認識できるが、認識したい部位に常に何らかのセンサを装着する必要があるため、実用的なインタフェース装置としての利便性に欠ける。
【００１０】
第２の従来手法では、体に何も装着せずに腕の動作を認識できるが、カメラ１台のみの情報を使っているので奥行き方向の情報が直接得られないため、３次元的なユーザーの腕の動きを精度良く抽出できない。
【００１１】
第３、第４の従来手法では、ユーザーが非装着かつ非接触に、３次元的な動作を認識して、スクリーン画面上の位置を指示することができるポインティング手法であるが、指し示せるのはスクリーン画面内に限定されており、それ以外の方向にある実物体や位置を指し示すことはできない。
【００１２】
本発明の目的は、これら従来手法の、装着型のため利便性に欠ける問題、カメラ１台利用の方法では奥行き精度が悪い問題、を解決し、かつ実空間との位置の対応付けが簡単であり、ユーザーが視覚的に実物体の位置、情報を登録することもできるし、基準カメラから直接見えない位置にある実物体情報を登録することもできる、インタフェース方法、装置、およびプログラムを提供することにある。
【００１３】
【課題を解決するための手段】
上記目的を達成するために、本発明のインタフェース装置は、
基準カメラを含む複数のビデオカメラを用いて、実空間を撮影した画像を入力する手段と、
基準カメラから撮影物体までの距離情報を前記撮影画像を用いたステレオ法により生成する手段と、
基準カメラで撮影した画像である基準画像からの距離情報を用いて初期ボクセル空間を生成し、該初期ボクセル空間をあらかじめ与えられた大きさのボクセルが得られるまで、またはあらかじめ決められた分割数に達するまで階層的に分割し、その際前記距離情報が交差するボクセルのみを再分割することでボクセル空間を生成する手段と、
基準カメラからの撮影画像上で、ユーザーの体の予め定めた部位２箇所を始点と終点として検出し、該画像上でのそれらの２次元座標を算出する手段と、
距離情報と、始点・終点の画像上での２次元座標とから、実空間上における始点・終点の３次元座標を算出し、ユーザーが指し示す実空間上での方向情報を算出する手段と、
生成されたボクセル空間上の当該３次元位置に、実空間上に存在する実物体の情報、すなわち３次元位置情報およびその付加情報を登録する手段と、
ユーザーの指し示す実空間上での方向情報と、登録されている物体の３次元位置情報とを照合し、操作者が指し示す３次元指示位置情報である、ユーザーが指し示す方向の延長線と、登録された物体との交点に関する情報を検出する手段と、
生成されたボクセル空間、３次元指示方向情報、および撮影画像とを重畳表示する手段と
を有している。
【００１４】
ここで、主な用語について説明する。
１．ボクセル空間
立方体形状を持つ単位３次元空間である“ボクセル”の集合である。図１５に、ボクセルとボクセル空間との関係、それの階層分割のイメージを示す。
２．ステレオ法
ステレオ視（三角測量）の原理を用いて、基準カメラからの画像を含む２枚以上の入力画像から、基準カメラから見た撮影物体までの距離を測定する手法である。
３．基準カメラ
ステレオ法実行時に、距離情報生成のための基準となるカメラである。基準カメラで撮影した画像を基準画像と呼ぶ。
【００１５】
本発明によれば、非装着であり、奥行き情報を精度良く得られ、実空間への３次元的なポインティングを実現できる。また、基準カメラから見たボクセル空間を生成し、それを実画像と重畳して視覚的に表示することにより、実物体情報の登録も簡単にできるし、基準画像から見えない位置にある実物体の情報も登録できる。
【００１６】
本発明の実施態様では、インタフェース装置は、基準画像を左右反転した画像を生成する手段を有し、ボクセル空間も前記反転画像に合わせて、左右反転させた状態で生成し、重畳表示手段における各種表示情報も、該反転画像上に重畳表示する。
【００１７】
そのため、前記の利点に加え、自己画像が写っている鏡を見ながらポインティング操作を行っているような、より直接性、直感性が高まったインタフェースとすることができる。
【００１８】
本発明の他の実施態様では、インタフェース装置は、前記距離情報を生成する手段の代りに、ビデオカメラと投光装置を用いた能動的なステレオ法により距離情報を生成する手段を有している。
【００１９】
そのため、同様に、前記問題を解決でき、かつ物体情報登録に関する利便性、簡便性も高いインタフェースとすることができる。
【００２０】
【発明の実施の形態】
次に、本発明の実施の形態について図面を参照して説明する。
【００２１】
［第１の実施形態］
図１は本発明の第１の実施形態のインタフェース装置の構成図、図２はその処理の流れを示すフローチャートである。
【００２２】
本実施形態のインタフェース装置は、複数のカメラで撮影された画像を入力画像とし、ユーザーが体の部位（指や腕等）を用いて指し示した実空間内の位置（もしくは物体）を検出するインタフェース装置において、ユーザーの直接的で直感的な３次元的指示動作に基づき、３次元空間上での指示位置を認識することができる装置で、かつその操作時にユーザーが自己画像を見ながらインタフェース動作を行える装置である。
【００２３】
本インタフェース装置は画像入力部１１₁，１１₂と距離情報生成部１２とボクセル空間生成部１３と始終点２次元座標算出部１４と始終点３次元座標算出部１５と３次元指示位置検出部１６と空間情報登録部１７と反転情報生成部１８と情報表示部１９と空間情報データ２０から構成される。
【００２４】
画像入力部１１₁，１１₂としては、図１のように２台（もしくは３台以上）のビデオカメラを用いる。カメラは一般に用いられるビデオカメラやＣＣＤカメラでよく、白黒でもカラーでもよい。ただし、後述する色情報を使用した方法を用いる場合はカラーカメラが必要である。
【００２５】
距離情報生成部１２は、基準画像（基準画像のイメージを図３に示す）を含む２枚以上の入力画像から、ステレオ法による画像処理を用いて基準カメラから撮影物体までの距離情報を生成する（ステップ２１）。距離情報を生成する具体的な画像処理方法の例としては、市販の製品、ＰｏｉｎｔＧｒｅｙＲｅｓｅａｒｃｈ社のＤｉｇｉｃｌｏｐｓ（３眼カメラ式）やＢｕｍｂｌｅｂｅｅ（２眼カメラ式）等を用いる方法がある。これらは各々、３台もしくは２台のカメラが内蔵された画像入力機器であり、出力として距離情報を生成できるものである。また、ステレオ法を用いる手法は、画像処理分野において一般的である（発表文献多数）ので、任意の２台以上のカメラを用いて自作することも可能である。図４に、図３の基準画像上の一本の画素のライン上ｘ軸方向に沿って得られる距離情報のイメージを表す。図で、距離軸方向が基準カメラからの距離を表しており、離れていればより大きな値をとり、すなわち基準カメラからより遠い位置にあることを意味している。
【００２６】
ボクセル空間生成部１３は、距離情報生成部１２において求められた距離情報に合わせて、ボクセル空間を、階層分割しながら作成する（ステップ２２）。例えば、図５では、得られた距離情報に対し、４×３×３個のボクセルからなる初期ボクセル空間を生成している。ここでは、初期ボクセル空間を指定階層まで階層的に分割し、各階層のボクセルに、距離情報をもとに、実物体が存在し得るかどうかのフラグを立てていく。以下、説明を簡単にするため、図４の距離情報を表すグラフ（矩形グラフ）を用いて、ボクセルへのフラグの立て方、および２次元的な階層分割法を説明する。図６に分割例を示す。各ボクセルには、白、灰色、黒、の三種類のうち、いずれかひとつのフラグが与えられる。すなわち、図６に示したように、距離情報を表すグラフが着目ボクセルと交差する場合は灰色、完全に上側を通る場合は黒、完全に下側を通る場合は白、とそのボクセルのフラグとする。すなわち、距離グラフと各ボクセルとが交差するか否かを判定し、その結果に従い、フラグを決定する。このうち、灰色と判定されたボクセルだけを、図６に示すようにさらに４等分に再分割し（３次元のボクセルの場合は、図１５に示すように、８等分に再分割することとなる）、再びフラグを与えていく。以上をあらかじめ与えたボクセルの大きさになるか、あらかじめ決められた分割数に到達するまで繰り返し行う。このようにして再帰分割を行うと、距離グラフと交差するボクセルが再帰的に分割されていき、それ以外の距離的に意味をなさない領域が白または黒のフラグをとる分割されないボクセルで表されることとなる。このようにして生成されたボクセル空間は、図７に示したようなオクトリー構造をとることにより、効率的、高速に、親ボクセルや子ボクセルとの関係を引き出すことができる。なお、ボクセルのフラグについては第４のものとして、登録フラグも存在する。これについては後述する。
【００２７】
始終点２次元座標算出部１４は、距離情報生成部１２で用いた入力画像のうち、基準画像を用いて、ユーザーの体の予め定める部位２箇所（始点と終点）の画面上での２次元座標を算出する（ステップ２３）。始点と終点は、例えば、ユーザーの肩の位置を始点とし、手の位置を終点とすることが考えられる。これにより、この場合は腕を伸ばした手の先の方向が後述する３次元指示方向となる。右肩と右手の位置を始点・終点とした場合の画面上での具体的な検出方法について、以下に示す。
【００２８】
右手の位置を画像処理により検出する方法としては、例えば、入力画像をカラー画像とした場合、カラー画像中の色情報を用いて、肌色領域（肌色の取り得る範囲を任意に幅を持たせた色の値の範囲で指定する。）を抽出する方法がある。得られた複数の肌色領域の中から、大きさや位置等の制約情報（例えば、手の大きさから推測される可能性のある肌色領域の面積範囲を指定したり、画面上の天井付近や床付近等手が存在する可能性の低いところを除外したりする等の制約。または距離情報を用いることも考えられる。）を利用して、手の肌色領域の候補を選択する。さらに、１）通常ユーザーは衣服を着ていると考えることができ、肌色領域の候補となる可能性が高い肌色領域は両手と顔と考えられる、２）最も面積の大きい領域は顔と考えられる、といった性質を用いて、２番目と３番目に面積が大きい肌色領域を手の候補とする。右手は、顔の右側にある領域であり、その重心位置を右手位置とする。左手位置を求めたい場合は、左右逆に考えればよく、両手の場合は、ふたつの場合を組合わせて考えればよい。
【００２９】
また、右肩の位置を画像処理により検出する方法としては、例えば、上記の手段で顔の位置を抽出してから肩の位置を算出する方法がある。具体的には、まず、前記の肌色抽出処理を行った結果から、最も面積が大きい肌色領域は顔の可能性が高いので、その肌色領域を顔と判断し、その重心を求める。次に、右肩の位置は顔の重心位置から、下へある程度の距離、右へある程度の距離ずらしたものと仮定することができるので、予めそのずらす距離を決めておいて（個人差があるのでユーザーによって値を変えてもよい）、顔の重心位置から右肩の位置を算出することができる。これらにより、始点・終点の画面上での２次元座標を求めることができる。また、ここでは肩の位置を始点としているが、前記により求められる顔の重心位置をそのまま始点としてもよい。その場合、顔の位置と手の位置を結ぶ延長線がユーザーの指示方向となる。
【００３０】
始終点３次元座標算出部１５は、生成された距離情報と、始点・終点の２次元座標から、始点・終点の３次元座標値を求める（ステップ２４）。具体的な方法としては、例えば、基準画像上で、始点の画像上での２次元座標に相当する距離値を始点の距離値とする。終点も同様である。３次元の実空間上において、画像の２次元座標系と３次元座標系の変換式は、一般に予め容易に算出しておけるので、それに基づいて得られた画面上での始点および終点の２次元座標値とその各距離値から、始点と終点の３次元空間上での３次元座標値を求めることができる。さらに、得られた始点・終点の２つの３次元座標値から、２点を結ぶ３次元直線を求めることにより、ユーザーの指示方向を求めることができる。
【００３１】
３次元指示位置検出部１６は、ユーザーが指し示した撮影実空間中の３次元位置を検出する（ステップ２５）。具体的な方法としては、まず始終点３次元座標算出部１５で求められた、始点と終点を結ぶ３次元直線を手（終点）方向に延長していく。このとき、ボクセル空間を用いて、該延長線が、予め登録されている空間中の物体等の３次元位置情報が登録されているボクセルと交差するか否かを検出する。そのようなボクセルには、後述するように登録フラグが立っており、容易に判別できる。ここでも、図４の距離情報の例をもとに、２次元的に交差ボクセルを求める方法を述べる。今、図８中の黒いボクセルが、物体情報が登録されたボクセル（登録ボクセル）で、ユーザーがその方向を指しているものとする。この時、該ボクセルが指示方向の延長線と交わるか否かを判定するには、該延長線が登録ボクセルを通過するか否かを上位層のボクセルから計算していく。通過する登録ボクセルが見つかった場合、それの下位層のボクセルについても、登録ボクセルを通過するか否かを計算する。この通過判定を、情報が実際に登録されたボクセルが見つかるまで繰り返し行う。該延長線がボクセルを通過するか否かの判定は、ボクセルが規則的な立方体形状をとるため極めて簡単に計算でき、さらに上位層のボクセルから下位層に向かって判定していくため、計算の早い段階で情報が登録されていない大きな領域を除外することができ、極めて高速に、登録情報が存在する３次元指示位置情報を探し出すことができる。ここで、空間中の物体等の情報については、空間情報登録部１７の説明において説明する。得られた３次元指示位置情報は空間情報データ２０として蓄積され、情報表示部１９において、ディスプレイ１０上に表示される。
【００３２】
空間情報登録部１７は、ユーザーが指示する可能性のある撮影実空間中の物体等の情報を空間情報データ２０に登録する（ステップ２６）。実空間中の物体等としては、例えば、ユーザーが部屋の中にいる場合には、部屋の中にある家電機器等（テレビ、エアコン、コンピュータ、時計、窓、棚、椅子、机、引出し、書類、オーディオ機器、照明機器等）の物体や、また部屋自体の壁、床、天井、窓等、任意のものが対象として考えられる。ここでは、情報表示部１９を使って、ボクセル空間を基準画像に重畳表示させながら、会話処理により、実物体情報を３次元的に登録していく。図９に、基準画像上に初期ボクセル空間を重畳表示したイメージ図を示す。この結果に対し、情報を登録したい位置を表わすボクセルを選択し、それに情報を登録していくことを行う。この際には、図１０に示したように、ある着目ボクセルのみを表示させたり、マウスの移動操作により、それの隣接ボクセルを表示させたり、マウスのクリック操作により下位層のボクセルを表示させたりしながら、目的ボクセルを絞り込んでいくことが考えられる。なお、この表示法やボクセルの選択、移動法は本方法に限ったものではなく、例えば、実空間ではなく、ボクセル空間を指示対象とした本発明におけるポインティング操作を行うことも考えられる。また、同じ情報を複数のボクセルに登録することも考えられる。また、本会話処理時に、白フラグや黒フラグのボクセルも強制的に分割できる操作を加えれば、実物体がない空間や、基準画像から隠れた場所にも、情報を登録することができる。最後に、情報を登録したボクセルとそれが属する最上位レベルから最下位レベルまでの全ボクセルについては、第４のフラグとして、登録フラグを立てていく。
【００３３】
これらの情報（３次元位置の座標情報やその他物体に関する情報等）は、実座標を利用して、予め空間情報データ２０に登録・保存しておき、それをボクセルに自動で割り当てることも考えられる。また、情報の登録に関しては、予め固定の３次元位置座標としておくのではなく、対象とする実物体毎に位置認識可能なセンサ（市販されている磁気センサ、超音波センサ、赤外線タグ、無線タグ等）を取り付けておくことにより、各々の物体の位置をリアルタイムに認識することができるので、それらにより得られた３次元位置情報から該物体情報を生成し、常時その物体の３次元位置座標等の情報を更新していくことも可能である。この場合、物体を移動させても３次元位置情報等をリアルタイムに更新させることができる。
【００３４】
表示情報反転部１８は、基準画像を左右反転するとともに、ボクセル空間や３次元位置情報も合わせて左右反転する（ステップ２７）。基準画像とそれを左右反転させた画像のイメージを図１１に示す。この場合は、ユーザーが、鏡を見ながらポインティング動作を行うのに近い像が得られる。基準画像の左右反転は、コンピュータ内へ取り込んだ入力画像に対し市販の汎用画像処理ソフトウェア（例：ＨＡＬＣＯＮ）により、リアルタイムに実行することができる。または、入力画像を入力し反転画像をリアルタイムに生成する市販の機器（例：（株）朋栄の画面左右反転装置ＵＰＩ−１００ＬＲＦ、またはカメラ一体型でＳＯＮＹのＥＶＩ−Ｄ１００）でも実現できる。また、本情報反転部１８が無い実施形態も考えられる。その場合には、そのままの座標で表示される。
【００３５】
情報表示部１９は、例えば、ポインティング動作中においては、１）左右反転した基準画像、２）同ボクセル空間、３）該３次元指示方向、および位置を表すＣＧをディスプレイ１０に重畳表示する（ステップ２８）。その場合の各画像の例を図１２に、重畳表示した結果を図１に示す。ディスプレイ１０は、コンピュータ用に使われる汎用のディスプレイでよく、コンピュータの画面とカメラ画像を表示できるものであればよい。なお、表示方法は本例に限るものではなく、例えばボクセル空間の表示を、一部着目しているボクセルのみに限ったり、全く行わなかったりする、ということも考えられる。
【００３６】
［第２の実施形態］
図１３は本発明の第２の実施形態のインタフェース装置の構成図、図１４はその処理の流れを示すフローチャートである。
【００３７】
本実施形態のインタフェース装置は、第１の実施形態において、２個以上のカメラからの入力画像から生成する受動的なステレオ法を用いる代わりに、カメラ１１と投光装置３１を用いた能動的なステレオ法により距離情報を生成するものである。両者を混在させることも可能である。
【００３８】
２個以上の画像から生成する受動的なステレオ法とは、例えば視線方向の近い２個のカメラの入力画像同士間で、対応する点を探し（対応点探索を行い）、その座標値のずれの大きさ（視差）からその点の距離を求める手法である。距離の計算には、三角測量の原理を用いている。この手法は、対応点探索が難しく精度良い距離情報が得られにくい問題があるが、光を照射するなどの能動的な動作や装置は必要なく、撮影環境等に影響されない利点を持っている。例えば、市販の製品で、ＰｏｉｎｔＧｒｅｙＲｅｓｅａｒｃｈ社のＤｉｇｉｃｌｏｐｓ（３眼カメラ式）やＢｕｍｂｌｅｂｅｅ（２眼カメラ式）等がある。
【００３９】
これに対し、投光装置を用いた能動的なステレオ法とは、２個のカメラのうち１台を、光を投射する光源に置き換え、対応点探索のための手がかりとなる情報を対象物に直接投射する手法である。光は、スリット光、スポット光、多種に変化するパターン光など、各種の光を用いる方法もしくは製品が提案もしくは市販されている。この手法は、光を投射する複雑な装置が必要であり、また撮影環境にも影響される問題があるが、対応点探索は安定して行えるので、精度良く距離画像を求めることができる利点を持っている。例えば、市販の製品で、ＮＥＣエンジニアリング社のＤａｎａｅ−Ｒ（非接触型３次元形状計測用レンジファインダ）等がある。
【００４０】
これら２つのステレオ法はいずれも距離情報を求めることができるので、互いに置き換えることが可能である。よって、２台以上のカメラだけを使うのではなく、１台以上のカメラと投光装置を用いた能動的なステレオ法も利用可能とすることにより、利用できる手法も市販機器も広くなり汎用性を高めることができるとともに、応用先を広げることができる。
【００４１】
なお、本発明は専用のハードウェアにより実現されるもの以外に、その機能を実現するためのプログラムを、コンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行するものであってもよい。コンピュータ読み取り可能な記録媒体とは、フロッピーディスク、光磁気ディスク、ＣＤ−ＲＯＭ等の記録媒体、コンピュータシステムに内蔵されるハードディスク装置等の記憶装置を指す。さらに、コンピュータ読み取り可能な記録媒体は、インターネットを介してプログラムを送信する場合のように、短時間の間、動的にプログラムを保持するもの（伝送媒体もしくは伝送波）、その場合のサーバとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含む。
【００４２】
【発明の効果】
以上説明したように、本発明は、下記の効果がある。
【００４３】
請求項１、４、７の発明は、非装着なインタフェースであるため、ユーザーの利便性が向上する。また、ステレオ法で得られる距離情報を利用したボクセル空間を利用するため、実物体情報の登録や、ユーザーの３次元的なポインティングの指示位置検出も効率的に行うことができる。さらに、３次元的なポインティングの指示先として、画面上だけでなく実空間上の位置もポインティング可能であり、応用先を広げることができる。
【００４４】
請求項２、５、７の発明は、請求項１の効果に加え、鏡のメタファを用いたインタフェース動作を行えるため、ユーザーの利便性をより向上させることができる。
【００４５】
請求項３、６、７の発明は、請求項１、２の効果に加え、２台以上のカメラのみでなく、１台以上のカメラと投光装置を用いた能動的なステレオ法を利用した手法もしくは市販機器も使うことができるため、汎用性を高めるとともに、応用先を広げることができる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態のインタフェース装置の構成図である。
【図２】第１の実施形態のインタフェース装置の処理の流れを示すフローチャートである。
【図３】基準画像のイメージを表す図である。
【図４】図３の基準画像上のひとつのライン上に沿って得られる距離情報のイメージを表す図である。
【図５】距離情報をもとに生成したボクセル空間のイメージを表す図である。
【図６】ボクセル空間の階層分割とフラグの立て方のイメージを表す図である。
【図７】生成したボクセル空間をオクトリー構造として保持した場合のイメージを表す図である。
【図８】指示方向から情報を登録したボクセルを探すアルゴリズムのイメージを表す図である。
【図９】基準画像上に初期ボクセル空間を重畳表示したイメージを表す図である。
【図１０】空間情報登録部において、ひとつのボクセルを強調表示した状態から、隣接ボクセルを強調表示させたり、下の階層のボクセルを強調表示させたりする場合のイメージを表す図である。
【図１１】図３の基準画像を左右反転させた画像のイメージを表す図である。
【図１２】第１の実施形態における表示結果に対する構成画像のイメージを示す図である。
【図１３】本発明の第２の実施形態のインタフェース装置の構成図である。
【図１４】第２の実施形態のインタフェース装置の処理の流れを示すフローチャートである。
【図１５】ボクセルとボクセル空間との関係、それの階層分割の概念を表す図である。
【符号の説明】
１０ ディスプレイ
１１，１１₁，１１₂ 画像入力部
１２ 距離情報生成部
１３ ボクセル空間生成部
１４ 始終点２次元座標算出部
１５ 始終点３次元座標算出部
１６ ３次元指示位置検出部
１７ 空間情報登録部
１８ 表示情報反転部
１９ 情報表示部
２０ 空間情報データ
２１〜２８ ステップ
３１ 投光部
Ｉ，Ｉ₁，Ｉ₂ 入力画像

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an interface device that detects an object in a captured space that a user points to with a part of the body (such as a finger or an arm).
[0002]
[Prior art]
Interface methods and apparatuses between computers and humans that are based on human motion include the following.
[0003]
As a first conventional technique, there are devices in which sensors capable of measuring hand or finger motion are attached to the body and the user's motion is detected from the sensor information. Commercially available examples include devices using magnetic sensors (such as “MotionStar” by ASCENSION) and devices using mechanical sensors (such as “Gypsy” by Spice and “CyberGlove” by Immersion). There is also the method described in Non-Patent Document 1, in which the user wears a glove fitted with acceleration sensors or the like and the user's motion is recognized from it.
[0004]
As a second conventional technique, there is the technique described in Non-Patent Document 2. This method uses constraints on the arm and the user's actual arm length to extract the arm region from the input image of a single camera, and estimates the three-dimensional change of the arm's motion from the temporal change of that region's length.
[0005]
As a third conventional technique, there is the technique described in Non-Patent Document 3. In this method, a coordinate position computed inside the body (the "virtual projection center") is connected to the coordinate position of the detected fingertip, and the point where this pointing axis, when extended, intersects the screen (the screen of the display device showing the input image) is taken as the cursor position (pointed position). Because the virtual projection center is obtained as the intersection of the straight lines extending from the four corners of the screen to the fingertip, only positions on the screen can be pointed to. The fingertip position is extracted by using two cameras, one of which is mounted to capture the scene from above, and detecting the object closest to the screen.
[0006]
As a fourth conventional technique, there is the technique described in Non-Patent Document 4. This method uses distance information generated with a multi-lens stereo camera to detect the object closest to the screen as the fingertip, and uses color and distance information to detect the position between the eyebrows (the eyes). The point where the extension of the line connecting these two points intersects the screen is taken as the cursor position (pointed position).
[0007]
[Non-Patent Document 1]
Tsukada et al., “Ubi-Finger: A Prototype of a Mobile-Oriented Gesture Input Device”, Workshop on Interactive Systems and Software (WISS2001), pp. 119-124, 2001
[Non-Patent Document 2]
Abe et al., “Three-dimensional tracking of arm movement using optical flow and color information”, Image Recognition and Understanding Symposium (MIRU2002), pp. I267-I272, 2002
[Non-Patent Document 3]
Fukumoto et al., “Non-contact hand reader by moving image processing”, Proceedings of 7th Human Interface Symposium, pp. 427-432, 1991
[Non-Patent Document 4]
Kaneji et al., “A method for determining the cursor position in a finger-pointing pointer”, IEICE Technical Committee on Image Engineering, January 2002
[0008]
[Problems to be solved by the invention]
However, the conventional methods described above have the following problems.
[0009]
In the first conventional method, the movement of the hand or finger can be recognized, but some sensor must always be attached to the body part to be recognized, which makes the method inconvenient for a practical interface device.
[0010]
The second conventional method can recognize the movement of the arm without the user wearing anything, but because it uses information from only one camera, depth information cannot be obtained directly, so the user's three-dimensional arm movement cannot be extracted accurately.
[0011]
The third and fourth conventional methods are pointing methods that recognize the user's three-dimensional motion without any worn device or physical contact and let the user indicate a position on the screen. However, the pointable area is limited to the screen, and real objects or positions in any other direction cannot be pointed to.
[0012]
An object of the present invention is to solve the problems of these conventional methods, namely the inconvenience of wearable devices and the poor depth accuracy of single-camera methods, and to provide an interface method, apparatus, and program that make it easy to associate positions with the real space, that allow the user to register the positions and information of real objects visually, and that also allow information to be registered for real objects at positions not directly visible from the reference camera.
[0013]
[Means for Solving the Problems]
  In order to achieve the above object, the interface device of the present invention provides:
  Means for inputting an image of a real space using a plurality of video cameras including a reference camera;
  Means for generating distance information from a reference camera to a photographed object by a stereo method using the photographed image;
  Means for generating a voxel space by generating an initial voxel space using distance information from the reference image, which is an image taken with the reference camera, and hierarchically dividing the initial voxel space until voxels of a predetermined size are obtained or a predetermined number of divisions is reached, subdividing at each step only the voxels that the distance information intersects;
  Means for detecting, on an image taken by the reference camera, two predetermined parts of the user's body as a start point and an end point, and calculating their two-dimensional coordinates on the image;
  Means for calculating the three-dimensional coordinates of the start point and end point in the real space from the distance information and the two-dimensional image coordinates of the start point and end point, and calculating the direction information in the real space pointed to by the user;
  Means for registering, at the corresponding three-dimensional position in the generated voxel space, information on real objects existing in the real space, that is, their three-dimensional position information and additional information;
  Means for collating the direction information in the real space pointed to by the user with the three-dimensional position information of the registered objects, and detecting information about the intersection between the extension of the direction pointed to by the user and a registered object, which is the three-dimensional pointed position information indicated by the operator;
  Means for superimposing and displaying the generated voxel space, the three-dimensional pointing direction information, and the captured images.
[0014]
Here, main terms will be described.
1. Voxel space
A set of “voxels”, which are unit three-dimensional spaces having a cubic shape. FIG. 15 illustrates the relationship between voxels and the voxel space, and their hierarchical division.
2. Stereo method
A technique that uses the principle of stereo vision (triangulation) to measure, from two or more input images including the image from the reference camera, the distance from the reference camera to the photographed object.
3. Reference camera
The camera that serves as the reference for generating distance information in the stereo method. An image taken with the reference camera is called a reference image.
[0015]
According to the present invention, no device needs to be worn, depth information can be obtained with high accuracy, and three-dimensional pointing into real space can be realized. In addition, by generating a voxel space as seen from the reference camera and displaying it superimposed on the real image, real object information can be registered easily, including information on real objects at positions not visible in the reference image.
[0016]
In an embodiment of the present invention, the interface device includes means for generating a left-right mirrored version of the reference image. The voxel space is also generated in a left-right mirrored state to match the mirrored image, and the various kinds of information shown by the superimposed display means are likewise superimposed on the mirrored image.
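A minimal sketch of the mirroring step, assuming images are stored as lists of pixel rows and that the overlay graphics (projected voxels, the pointing ray) are given as pixel coordinates. The function names are hypothetical; the original description performs the flip with general-purpose image processing software or dedicated hardware rather than custom code.

```python
def mirror_image(img):
    """Return a left-right mirrored copy of an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in img]


def mirror_point(u, v, width):
    """Map pixel (u, v) of the original image to its position in the mirrored image.

    Applying the same map to every overlay coordinate (projected voxel corners,
    the pointing ray, etc.) keeps the overlays aligned with the mirrored image.
    """
    return (width - 1 - u, v)


if __name__ == "__main__":
    frame = [[1, 2, 3], [4, 5, 6]]
    print(mirror_image(frame))        # [[3, 2, 1], [6, 5, 4]]
    print(mirror_point(0, 5, 640))    # (639, 5)
```

Mirroring the overlays with the same pixel map, instead of rebuilding them, is a design choice that keeps the voxel grid and the mirrored video exactly registered.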
[0017]
Therefore, in addition to the advantages above, the interface becomes more direct and intuitive, as if the user were performing the pointing operation while looking at a mirror that reflects his or her own image.
[0018]
In another embodiment of the present invention, the interface device has, instead of the means for generating the distance information, means for generating the distance information by an active stereo method using a video camera and a light projector.
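As an illustration of one kind of pattern light an active stereo method may use, the sketch below decodes a binary-reflected Gray-code stripe sequence observed at one camera pixel into a projector column and then triangulates depth. It assumes a rectified camera-projector pair with equal focal lengths and principal points, with the projector to the left of the camera. The patent does not specify the pattern or the geometry, and all names here are hypothetical.

```python
def gray_to_binary(g: int) -> int:
    """Convert a binary-reflected Gray code to an ordinary binary integer."""
    b = 0
    while g:
        b ^= g      # XOR of all right shifts of g yields the binary value
        g >>= 1
    return b


def decode_projector_column(bits):
    """bits: thresholded on/off observations of each stripe pattern, MSB first."""
    g = 0
    for bit in bits:
        g = (g << 1) | (1 if bit else 0)
    return gray_to_binary(g)


def depth_from_projector(u_cam, u_proj, focal_px, baseline_m):
    """Triangulate Z = f * B / d with d = u_proj - u_cam (rectified, projector on the left)."""
    d = u_proj - u_cam
    if d <= 0:
        return float("inf")  # no valid correspondence under this geometry
    return focal_px * baseline_m / d


if __name__ == "__main__":
    col = decode_projector_column([1, 1, 1])   # Gray 0b111 -> column 5
    print(col)
    print(depth_from_projector(290, 300, 500.0, 0.2))
```

Gray codes are a common choice because adjacent stripe indices differ in a single bit, so a thresholding error at a stripe boundary shifts the decoded column by at most one.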
[0019]
This embodiment likewise solves the problems described above and provides an interface that is convenient and simple to use for registering object information.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described with reference to the drawings.
[0021]
[First Embodiment]
FIG. 1 is a configuration diagram of an interface apparatus according to the first embodiment of the present invention, and FIG. 2 is a flowchart showing a flow of the processing.
[0022]
The interface apparatus of this embodiment takes images captured by a plurality of cameras as input images and detects the position (or object) in real space that the user points to with a part of the body (such as a finger or an arm). It can recognize the pointed position in three-dimensional space from the user's direct and intuitive three-dimensional pointing motion, and it lets the user operate the interface while watching his or her own image.
[0023]
This interface apparatus consists of image input units 11₁ and 11₂, a distance information generation unit 12, a voxel space generation unit 13, a start/end point two-dimensional coordinate calculation unit 14, a start/end point three-dimensional coordinate calculation unit 15, a three-dimensional pointed position detection unit 16, a spatial information registration unit 17, a display information inversion unit 18, an information display unit 19, and spatial information data 20.
[0024]
As the image input units 11₁ and 11₂, two (or three or more) video cameras are used, as shown in FIG. 1. The cameras may be ordinary video cameras or CCD cameras, and may be monochrome or color. However, color cameras are required when the method using color information, described later, is used.
[0025]
The distance information generation unit 12 generates distance information from the reference camera to the photographed object by stereo image processing of two or more input images that include the reference image (FIG. 3 shows an example of the reference image) (step 21). One concrete way to generate the distance information is to use commercially available products such as Digiclops (three-camera type) or Bumblebee (two-camera type) from Point Grey Research. Each of these is an image input device with three or two built-in cameras that can output distance information. Because stereo methods are common in the image processing field (with many publications), such a system can also be built from any two or more cameras. FIG. 4 illustrates the distance information obtained along the x-axis on a single pixel row of the reference image of FIG. 3. In the figure, the distance axis represents the distance from the reference camera: a larger value means the point is farther from the reference camera.
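The patent leaves the stereo computation to commercial devices or any published method. As one hedged illustration of passive stereo on a single scanline, the sketch below finds the disparity by minimizing the sum of absolute differences (SAD) over a small window and converts it to distance with Z = fB/d. It assumes a rectified pair in which a point at column x of the reference image appears at column x - d in the other image; function names and parameters are hypothetical.

```python
def disparity_sad(ref_row, other_row, x, half_window, max_disp):
    """Disparity at column x of the reference row, by minimizing SAD over a window.

    Assumes a rectified pair: a point at column x in the reference row
    appears at column x - d in the other row.
    """
    if x - half_window < 0 or x + half_window >= len(ref_row):
        raise ValueError("window does not fit inside the row")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half_window < 0:
            break  # candidate window would leave the other row
        cost = sum(
            abs(ref_row[x + k] - other_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulation for rectified cameras: Z = f * B / d (infinite when d == 0)."""
    if d <= 0:
        return float("inf")
    return focal_px * baseline_m / d


if __name__ == "__main__":
    ref = [0, 0, 0, 10, 20, 10, 0, 0, 0, 0]
    other = ref[2:] + [0, 0]   # same scene shifted 2 pixels to the left
    d = disparity_sad(ref, other, x=4, half_window=1, max_disp=3)
    print(d, depth_from_disparity(d, focal_px=500.0, baseline_m=0.1))
```

Real systems add sub-pixel refinement and consistency checks; this sketch only shows why correspondence search is the hard part of passive stereo.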
[0026]
The voxel space generation unit 13 creates the voxel space by hierarchical division according to the distance information obtained by the distance information generation unit 12 (step 22). For example, in FIG. 5, an initial voxel space consisting of 4 × 3 × 3 voxels is generated for the obtained distance information. The initial voxel space is divided hierarchically down to a specified level, and each voxel at each level is given a flag, based on the distance information, indicating whether a real object can exist in it. To simplify the explanation, the following describes how the flags are set and how the hierarchical division works in two dimensions, using the graph (a rectangular, step-shaped graph) of the distance information in FIG. 4. FIG. 6 shows an example of the division. Each voxel is given exactly one of three flags: white, gray, or black. As shown in FIG. 6, a voxel is flagged gray when the distance graph intersects it, black when the graph passes entirely above it, and white when the graph passes entirely below it. In other words, whether the distance graph intersects each voxel is determined, and the flag is set according to the result. Only the voxels judged gray are then subdivided into four equal parts as shown in FIG. 6 (for three-dimensional voxels, into eight equal parts as shown in FIG. 15), and flags are assigned again. This is repeated until the voxels reach a predetermined size or a predetermined number of divisions is reached. With this recursive division, voxels that intersect the distance graph are divided recursively, while the remaining regions, which carry no meaningful distance information, are represented by undivided voxels flagged white or black. Holding the voxel space generated in this way in an octree structure, as shown in FIG. 7, allows the relationships between parent and child voxels to be retrieved efficiently and quickly. There is also a fourth voxel flag, the registration flag, which is described later.
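Following the two-dimensional explanation above, the sketch below builds the quadtree over the (x, distance) plane from a step-shaped distance profile with one value per pixel column. It assumes that "above" in the graph means a larger distance value, treats a cell whose columns lie both above and below it as intersected (the vertical edges of the step graph cross it), and stops at a minimum cell width or a maximum level. In three dimensions each gray voxel would instead be split into eight children (an octree). All names are hypothetical.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Cell:
    """A square cell [x0, x1) x [z0, z1) in the (column, distance) plane."""

    def __init__(self, x0, x1, z0, z1, level=0):
        self.x0, self.x1, self.z0, self.z1 = x0, x1, z0, z1
        self.level = level
        self.flag = None
        self.children = []


def classify(cell, profile):
    """Flag a cell against the distance values of the columns it spans."""
    cols = profile[cell.x0:cell.x1]
    inside = any(cell.z0 <= d < cell.z1 for d in cols)
    above = any(d >= cell.z1 for d in cols)   # graph passes above (farther)
    below = any(d < cell.z0 for d in cols)    # graph passes below (nearer)
    if inside or (above and below):
        return GRAY
    return BLACK if above else WHITE


def subdivide(cell, profile, min_width=1, max_level=8):
    """Flag the cell and recursively split it into four only while it is gray."""
    cell.flag = classify(cell, profile)
    if cell.flag != GRAY or cell.level >= max_level or cell.x1 - cell.x0 <= min_width:
        return cell
    xm = (cell.x0 + cell.x1) // 2
    zm = (cell.z0 + cell.z1) / 2
    for xa, xb in ((cell.x0, xm), (xm, cell.x1)):
        for za, zb in ((cell.z0, zm), (zm, cell.z1)):
            child = Cell(xa, xb, za, zb, cell.level + 1)
            cell.children.append(subdivide(child, profile, min_width, max_level))
    return cell


def leaves(cell):
    """Yield the undivided cells of the tree."""
    if not cell.children:
        yield cell
    else:
        for child in cell.children:
            yield from leaves(child)


if __name__ == "__main__":
    profile = [3.0] * 8                      # a flat wall at distance 3
    root = subdivide(Cell(0, 8, 0.0, 8.0), profile)
    print([(c.x0, c.x1, c.z0, c.z1, c.flag) for c in leaves(root)][:4])
```

Only gray cells recurse, so the cell count grows with the length of the distance graph rather than with the area of the whole space, which is the efficiency the paragraph above relies on.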
[0027]
The start/end point two-dimensional coordinate calculation unit 14 uses the reference image, among the input images used by the distance information generation unit 12, to calculate the two-dimensional on-screen coordinates of two predetermined parts of the user's body (the start point and the end point) (step 23). For example, the user's shoulder may be used as the start point and the hand as the end point. In this case, the direction in which the hand at the end of the outstretched arm points becomes the three-dimensional pointing direction described later. A concrete on-screen detection method for the case where the right shoulder and the right hand are used as the start and end points is described below.
[0028]
One way to detect the position of the right hand by image processing, when the input image is a color image, is to use the color information to extract skin-color regions (the range that skin color can take is specified as a range of color values with an arbitrary width). From the extracted skin-color regions, candidates for the hand's skin-color region are selected using constraints on size, position, and so on (for example, specifying the range of areas a hand region is likely to have given the size of a hand, or excluding places where a hand is unlikely to be, such as near the ceiling or floor in the image; distance information could also be used). Furthermore, using the properties that 1) the user can normally be assumed to be wearing clothes, so the skin-color regions most likely to be candidates are the two hands and the face, and 2) the region with the largest area is likely to be the face, the skin-color regions with the second and third largest areas are taken as hand candidates. The right hand is the region on the right side of the face, and its centroid is taken as the right-hand position. To obtain the left-hand position, the same reasoning is applied with left and right reversed; for both hands, the two cases are combined.
[0029]
One way to detect the position of the right shoulder by image processing is to first extract the face position by the above means and then calculate the shoulder position from it. Specifically, since the skin-color region with the largest area in the result of the skin-color extraction is highly likely to be the face, that region is judged to be the face and its centroid is obtained. Next, the right shoulder can be assumed to lie a certain distance below and a certain distance to the right of the face centroid, so by determining these offsets in advance (they may be changed for each user, since they vary between individuals), the right-shoulder position can be calculated from the face centroid. In this way, the two-dimensional on-screen coordinates of the start and end points can be obtained. Although the shoulder position is used as the start point here, the face centroid obtained above may be used directly as the start point instead. In that case, the extension of the line connecting the face position and the hand position is the direction the user is pointing.
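The hand and shoulder heuristics of the two preceding paragraphs can be sketched as follows, assuming skin-color segmentation has already produced connected regions, each with an area and a centroid. The constraint thresholds, the shoulder offsets, and the convention that "right of the face" means a larger image column are hypothetical values and assumptions for illustration; the patent notes that the offsets may be tuned per user.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A skin-color region: pixel area and centroid (u = column, v = row, v grows downward)."""
    area: int
    u: float
    v: float


def face_and_right_hand(regions, hand_area_range, image_height, edge_margin):
    """Largest region = face; 2nd and 3rd largest plausible regions = hands.

    Returns (face, right_hand); right_hand is None if no hand candidate lies
    to the right of the face (assumed here to mean a larger column u).
    """
    if not regions:
        return None, None
    ordered = sorted(regions, key=lambda r: r.area, reverse=True)
    face = ordered[0]
    lo, hi = hand_area_range
    # Constraint filter: plausible hand size, not too close to the top or bottom edge.
    candidates = [
        r for r in ordered[1:]
        if lo <= r.area <= hi and edge_margin <= r.v <= image_height - edge_margin
    ][:2]
    right = [r for r in candidates if r.u > face.u]
    return face, (right[0] if right else None)


def right_shoulder(face, dx_px, dy_px):
    """Shift the face centroid right by dx_px and down by dy_px (per-user offsets)."""
    return (face.u + dx_px, face.v + dy_px)


if __name__ == "__main__":
    regions = [Region(900, 320, 100), Region(300, 420, 250),
               Region(280, 200, 260), Region(20, 50, 470)]
    face, hand = face_and_right_hand(regions, (100, 600), 480, 30)
    print(hand, right_shoulder(face, 60, 80))
```

Whether the user's right hand appears on the image's right or left depends on whether the image is mirrored, so the side test would need flipping for an unmirrored camera view.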
[0030]
The start/end point three-dimensional coordinate calculation unit 15 calculates the three-dimensional coordinates of the start and end points from the generated distance information and the two-dimensional coordinates of the start and end points (step 24). Concretely, for example, the distance value at the start point's two-dimensional image coordinates in the reference image is taken as the distance value of the start point; the same applies to the end point. The transformation between the two-dimensional image coordinate system and the three-dimensional real-space coordinate system can generally be computed easily in advance, so the three-dimensional coordinates of the start and end points can be obtained from their two-dimensional on-screen coordinates and their distance values. Then, by finding the three-dimensional straight line that passes through the two resulting points, the direction the user is pointing can be obtained.
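Assuming the image-to-space transformation is an ordinary pinhole camera model with intrinsics fx, fy, cx, cy, and that the distance value is depth along the optical axis (Z) rather than Euclidean range, the start and end points can be lifted to 3D and the pointing line expressed as an origin plus a unit direction. These modeling choices and the function names are assumptions; a calibrated system may also need lens-distortion correction.

```python
import math


def backproject(u, v, z, fx, fy, cx, cy):
    """Pixel (u, v) with depth z (along the optical axis) -> camera-frame (X, Y, Z)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def pointing_line(start_px, end_px, intrinsics):
    """Return (origin, unit_direction) of the line from start point to end point.

    start_px, end_px: (u, v, depth) read from the distance image at each point.
    intrinsics: (fx, fy, cx, cy). Points beyond the end point along the line
    are origin + t * direction for t greater than the start-end distance.
    """
    s = backproject(*start_px, *intrinsics)
    e = backproject(*end_px, *intrinsics)
    d = [ei - si for ei, si in zip(e, s)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("start and end points coincide")
    return s, tuple(c / norm for c in d)


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)
    origin, direction = pointing_line((320, 240, 2.0), (820, 240, 2.0), K)
    print(origin, direction)
```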
[0031]
The three-dimensional pointing position detection unit 16 detects the three-dimensional position in the actual photographing space that the user is pointing at (step 25). Specifically, the three-dimensional straight line connecting the start point and end point obtained by the start/end point three-dimensional coordinate calculation unit 15 is first extended in the direction of the hand (end point). Using the voxel space, it is then determined whether this extension line intersects a voxel in which three-dimensional position information of an object or the like in the space has been registered beforehand. Such voxels carry a registration flag, described later, and can therefore be identified easily. Here again, the search for intersecting voxels is explained in two dimensions using the distance-information example of FIG. 4. Assume that the black voxel in FIG. 8 is a voxel in which object information is registered (a registered voxel) and that the user is pointing toward it. To decide whether the extension line in the pointing direction crosses this voxel, the search starts at the upper layer and checks whether the extension line passes through a registered voxel. When such a registered voxel is found, the search then checks whether the extension line passes through the registered voxels of the next lower layer inside it. This test is repeated until a voxel in which information is actually registered is reached. Because every voxel is a regular cube, whether the extension line passes through a voxel can be computed very easily; and because the test proceeds from the upper layers toward the lower ones, large regions containing no registered information are excluded at an early stage, so the three-dimensional pointing position at which registered information exists can be found extremely quickly. The information on objects in the space is explained in the description of the spatial information registration unit 17. The obtained three-dimensional pointing position information is stored as the spatial information data 20 and shown on the display 10 by the information display unit 19.
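The coarse-to-fine search of step 25 can be sketched as a front-to-back octree traversal. This sketch assumes axis-aligned cubic voxels, uses the standard slab test for ray/cube intersection, and introduces hypothetical names (`Voxel`, `subdivide`, `ray_box`, `find_pointed`); it illustrates the pruning idea rather than reproducing the patented implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Voxel:
    origin: tuple                 # minimum corner (x, y, z)
    size: float                   # edge length of the cube
    registered: bool = False      # registration flag: this voxel or a descendant holds info
    info: Optional[str] = None    # registered object information
    children: List["Voxel"] = field(default_factory=list)  # 0 or 8 sub-voxels


def subdivide(v):
    """Split a voxel into 8 equal cubes; child index = 4*i + 2*j + k."""
    h = v.size / 2
    x, y, z = v.origin
    v.children = [Voxel((x + i * h, y + j * h, z + k * h), h)
                  for i in (0, 1) for j in (0, 1) for k in (0, 1)]


def ray_box(origin, direction, lo, size):
    """Slab test: entry distance t >= 0 if the ray meets the cube, else None."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l in zip(origin, direction, lo):
        h = l + size
        if d == 0.0:
            if o < l or o > h:
                return None      # parallel to this slab and outside it
            continue
        t1, t2 = (l - o) / d, (h - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near


def find_pointed(voxel, origin, direction):
    """Descend from the top layer, visiting only registered voxels the ray crosses."""
    if not voxel.registered or ray_box(origin, direction, voxel.origin, voxel.size) is None:
        return None              # prune this whole subtree early
    if voxel.info is not None:
        return voxel             # a voxel that actually holds information
    hits = []
    for child in voxel.children:
        if child.registered:
            t = ray_box(origin, direction, child.origin, child.size)
            if t is not None:
                hits.append((t, child))
    # Visit crossed children in order of entry distance along the ray.
    for _, child in sorted(hits, key=lambda h: h[0]):
        found = find_pointed(child, origin, direction)
        if found is not None:
            return found
    return None
```

Because sibling voxels are disjoint cubes, the ray's parameter intervals inside them do not overlap, so the first registered voxel found when children are visited in order of entry distance is also the nearest one along the pointing direction.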
[0032]
The spatial information registration unit 17 registers, in the spatial information data 20, information on objects and the like in the actual photographing space that the user may point at (step 26). For example, when the user is in a room, the objects in real space can be household appliances and furnishings in the room (TV, air conditioner, computer, clock, window, shelf, chair, desk, drawer, documents, audio equipment, lighting equipment, etc.), as well as the walls, floor, ceiling, windows, etc. of the room itself. Here, the information display unit 19 displays the voxel space superimposed on the reference image, and the real-object information is registered three-dimensionally through interactive processing. FIG. 9 shows an image in which the initial voxel space is superimposed on the reference image. Looking at this display, the operator selects a voxel representing the position at which information is to be registered and registers the information in it. For example, as shown in FIG. 10, only one voxel of interest is highlighted; moving the mouse highlights an adjacent voxel, and clicking the mouse highlights the voxels of the lower layer, so that the target voxel can be narrowed down. The display method and the methods of selecting and moving between voxels are not limited to this example. For instance, the pointing operation of the present invention itself could be used with the voxel space, instead of the real space, as the pointing target. The same information may also be registered in a plurality of voxels. In addition, by providing an operation during the interactive processing that forcibly subdivides white-flag and black-flag voxels, information can be registered in places where no real object exists or in places hidden from the reference image. Finally, a registration flag is set as a fourth flag on the voxel in which the information was registered and on every voxel containing it, from the highest layer down to the lowest.
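The final step of registration, setting the registration flag on the target voxel and on every enclosing voxel up to the top layer, might look like the following sketch. The child-index `path`, the on-demand `subdivide` (standing in for the forced division operation), and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Voxel:
    size: float
    registered: bool = False      # fourth flag: the registration flag
    info: Optional[dict] = None   # registered object information
    children: list = field(default_factory=list)


def subdivide(v):
    """Split a voxel into 8 equal sub-voxels if it has not been split yet."""
    if not v.children:
        v.children = [Voxel(v.size / 2) for _ in range(8)]


def register(root, path, info):
    """Attach info to the voxel reached by following the child indices in `path`,
    setting the registration flag on every voxel from the top layer down to it."""
    node = root
    node.registered = True
    for idx in path:
        subdivide(node)
        node = node.children[idx]
        node.registered = True
    node.info = info
    return node
```

Registering the same information in several voxels is simply a matter of calling `register` once per path.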
[0033]
Such information (three-dimensional position coordinates, other information about the object, etc.) may also be registered in advance in the spatial information data 20 using real-space coordinates and assigned to voxels automatically. Moreover, instead of registering each target real object at fixed three-dimensional coordinates in advance, a sensor capable of reporting its position (a commercially available magnetic sensor, ultrasonic sensor, infrared tag, wireless tag, etc.) can be used for each object so that its position is recognized in real time. Object information can then be generated from the three-dimensional position obtained from the sensor, and the object's three-dimensional position coordinates can be kept continuously up to date. In this case, even if an object is moved, its three-dimensional position information and related data are updated in real time.
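Automatic assignment of real-space coordinates to voxels, for example when a position sensor reports a new location, could be sketched as below. It assumes an axis-aligned cubic voxel space given by a hypothetical `root_origin` and `root_size`, a fixed subdivision `depth`, and an octant numbering of 4 for +x, 2 for +y and 1 for +z; `ObjectRegistry` and `on_sensor_update` are invented names, not any sensor vendor's API.

```python
def voxel_path(point, root_origin, root_size, depth):
    """Octant indices (x -> 4, y -> 2, z -> 1) from the top layer down to the
    voxel at `depth` that contains `point`."""
    lo = list(root_origin)
    if any(not (l <= p < l + root_size) for p, l in zip(point, lo)):
        raise ValueError("point outside voxel space")
    size = root_size
    path = []
    for _ in range(depth):
        size /= 2
        idx = 0
        for axis, bit in enumerate((4, 2, 1)):
            if point[axis] >= lo[axis] + size:   # upper half along this axis
                idx |= bit
                lo[axis] += size
        path.append(idx)
    return path


class ObjectRegistry:
    """Hypothetical store that keeps each object's current voxel path up to date."""

    def __init__(self, root_origin, root_size, depth):
        self.root_origin, self.root_size, self.depth = root_origin, root_size, depth
        self.paths = {}

    def on_sensor_update(self, name, position):
        # Called whenever a position sensor reports a new 3D coordinate.
        self.paths[name] = voxel_path(position, self.root_origin,
                                      self.root_size, self.depth)
        return self.paths[name]
```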
[0034]
The display information reversing unit 18 flips the reference image horizontally and likewise flips the voxel space and the three-dimensional position information horizontally (step 27). FIG. 11 shows a reference image and its horizontally flipped version. The flipped image resembles what a user would see when pointing while looking into a mirror. The reference image captured into the computer can be flipped in real time with commercially available general-purpose image processing software (for example, HALCON), or with a commercially available device that takes an input image and produces a flipped image in real time (for example, the UPI-100LRF horizontal screen reversing device from Sakae Co., Ltd., or the SONY EVI-D100 with an integrated camera). An embodiment without the display information reversing unit 18 is also possible; in that case the images and coordinates are displayed unflipped.
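A left-right flip of the image and of the matching coordinates can be sketched in plain Python. Mirroring 3D points across the plane x = 0 of the camera frame keeps them aligned with the flipped image only if the principal point lies at the image centre; that is an assumption of this sketch, and the function names are hypothetical.

```python
def flip_image(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def flip_pixel(u, v, width):
    """Pixel coordinate of (u, v) after the left-right flip."""
    return width - 1 - u, v


def flip_point(p):
    """Mirror a camera-frame 3D point (voxel corner, pointing position, ...)
    across the plane x = 0 so it stays aligned with the flipped image."""
    x, y, z = p
    return (-x, y, z)
```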
[0035]
During the pointing operation, for example, the information display unit 19 superimposes on the display 10: 1) the horizontally flipped reference image, 2) the voxel space, flipped in the same way, and 3) CG representing the three-dimensional pointing direction and position (step 28). FIG. 12 shows an example of each image in that case, and FIG. The display 10 may be any general-purpose computer display capable of showing the computer screen and camera images. The display method is not limited to this example; for instance, the voxel space may be displayed only for a few voxels of interest, or not displayed at all.
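To draw the CG of step 28, the 3D start point and the detected pointing position must be projected back into the displayed image. A minimal sketch, assuming the pinhole intrinsics of the reference camera are known and that mirroring is applied in pixel space when the flipped image is shown (the names `project` and `overlay_segment` are hypothetical):

```python
def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def overlay_segment(start3d, hit3d, fx, fy, cx, cy, width, mirrored=True):
    """2D endpoints of the pointing line to draw over the displayed image."""
    pts = [project(p, fx, fy, cx, cy) for p in (start3d, hit3d)]
    if mirrored:
        # Match the left-right flipped reference image.
        pts = [(width - 1 - u, v) for u, v in pts]
    return pts
```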
[0036]
[Second Embodiment]
FIG. 13 is a block diagram of the interface apparatus according to the second embodiment of the present invention, and FIG. 14 is a flowchart showing the processing flow.
[0037]
In the second embodiment, instead of the passive stereo method using input images from two or more cameras, the interface device generates distance information by an active stereo method using the camera 11 and the light projector 31. Both methods can also be used together.
[0038]
The passive stereo method using two or more images searches for corresponding points between the input images of, for example, two cameras whose lines of sight are nearly parallel (corresponding-point search), and obtains the distance to each point from the displacement between their coordinates (the parallax) by the principle of triangulation. This method has the drawback that the corresponding-point search is difficult, which makes accurate distance information hard to obtain. On the other hand, it requires no active operation or device such as light projection, and it has the advantage of being unaffected by the photographing environment. Commercially available examples include Point Grey Research's Digiclops (trinocular) and Bumblebee (binocular).
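The corresponding-point search and triangulation of the passive stereo method can be illustrated with a one-scanline sketch. It assumes two rectified, parallel cameras (so matches lie on the same row and the disparity is `u_left - u_right`), uses sum-of-absolute-differences block matching as one common choice of matching cost, and computes depth as Z = f * B / d; it does not describe how any particular commercial product works.

```python
def match_disparity(left_row, right_row, u, half=1, max_disp=16):
    """Disparity of pixel u of the left (reference) row, found by minimising the
    sum of absolute differences (SAD) over a (2*half + 1)-pixel window."""
    if u - half < 0 or u + half >= len(left_row):
        return None  # window does not fit in the row
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        if u - d - half < 0:
            break  # candidate window would leave the right image
        cost = sum(abs(left_row[u + k] - right_row[u - d + k])
                   for k in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline):
    """Triangulation for parallel, rectified cameras: Z = f * B / d."""
    if d is None or d <= 0:
        return None  # no match, or point effectively at infinity
    return focal_px * baseline / d
```

On textureless rows many disparities give similar costs, which is one concrete form of the difficulty in the corresponding-point search noted above.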
[0039]
The active stereo method using a light projector, by contrast, replaces one of the two cameras with a light source and projects light directly onto the target, where it serves as a clue for the corresponding-point search. Methods and products using various kinds of light, such as slit light, spot light, and various pattern lights, have been proposed or are commercially available. This method requires a complex light-projecting device and is affected by the photographing environment; however, because the corresponding-point search can be performed stably, it has the advantage that a highly accurate distance image can be obtained. A commercially available example is the Danae-R non-contact three-dimensional shape measurement range finder from NEC Engineering.
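For the slit-light variant of active stereo, triangulation reduces to intersecting the camera ray through an illuminated pixel with the projector's light plane. This sketch assumes that the plane, written as n . X = c in camera coordinates, is known from calibration, and that the camera follows the pinhole model; the function names are hypothetical.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Direction of the camera ray through pixel (u, v), scaled so that z = 1."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def triangulate_with_light_plane(u, v, plane_normal, plane_offset, fx, fy, cx, cy):
    """Intersect the camera ray with the light plane n . X = c.
    Returns the 3D point in camera coordinates, or None if there is no valid hit."""
    r = pixel_ray(u, v, fx, fy, cx, cy)
    denom = sum(a * b for a, b in zip(plane_normal, r))
    if abs(denom) < 1e-12:
        return None  # ray parallel to the light plane
    t = plane_offset / denom
    if t <= 0:
        return None  # intersection behind the camera
    return tuple(t * c for c in r)
```

Because the ray is scaled so that z = 1, the parameter `t` is directly the depth of the illuminated point.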
[0040]
Since both stereo methods yield distance information, either can be substituted for the other. Thus, by allowing not only two or more cameras but also an active stereo method using one or more cameras and a light projector, a wider range of methods and commercially available devices can be used, which increases versatility and broadens the range of applications.
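The interchangeability of the two stereo methods amounts to hiding them behind a common interface, so that later steps such as voxel-space generation do not depend on which device produced the distance information. A sketch using Python's `typing.Protocol`, where both source classes and `nearest_distance` are hypothetical stand-ins that return precomputed data rather than driving real hardware:

```python
from typing import List, Protocol

DepthMap = List[List[float]]  # distance value per reference-image pixel


class DepthSource(Protocol):
    """Anything that yields distance information for the reference camera."""
    def depth_map(self) -> DepthMap: ...


class PassiveStereoSource:
    """Stand-in for a multi-camera stereo device (first embodiment)."""
    def __init__(self, disparity: DepthMap, focal_px: float, baseline: float):
        self.disparity, self.focal_px, self.baseline = disparity, focal_px, baseline

    def depth_map(self) -> DepthMap:
        # Z = f * B / d; zero disparity is treated as infinitely far.
        return [[self.focal_px * self.baseline / d if d > 0 else float("inf")
                 for d in row] for row in self.disparity]


class ActiveStereoSource:
    """Stand-in for a camera plus light projector (second embodiment)."""
    def __init__(self, measured: DepthMap):
        self.measured = measured

    def depth_map(self) -> DepthMap:
        return self.measured


def nearest_distance(source: DepthSource) -> float:
    """A downstream step that only sees the DepthSource interface."""
    return min(min(row) for row in source.depth_map())
```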
[0041]
Besides implementation in dedicated hardware, the present invention may be realized by recording a program that implements its functions on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. A computer-readable recording medium here means a portable medium such as a floppy disk, magneto-optical disk, or CD-ROM, or a storage device such as a hard disk built into a computer system. It further includes media that hold a program dynamically for a short time (a transmission medium or carrier wave), as when the program is transmitted via the Internet, and media that hold a program for a certain period, such as the volatile memory inside a computer system acting as the server in that case.
[0042]
[Effects of the Invention]
As described above, the present invention has the following effects.
[0043]
The inventions of claims 1, 4, and 7 provide an interface that requires nothing to be worn, which improves user convenience. In addition, because they use a voxel space built from distance information obtained by the stereo method, real-object information can be registered and the user's three-dimensional pointing position can be detected efficiently. Furthermore, the pointing target is not limited to positions on a screen; positions in real space can also be pointed at, which broadens the range of applications.
[0044]
In addition to the effects of claim 1, the inventions of claims 2, 5, and 7 allow interface operation based on a mirror metaphor, further improving user convenience.
[0045]
In addition to the effects of claims 1 and 2, the inventions of claims 3, 6, and 7 can use not only two or more cameras but also an active stereo method using one or more cameras and a light projector. Because a wider range of methods and commercially available equipment can therefore be used, versatility is increased and the range of applications is broadened.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an interface apparatus according to a first embodiment of this invention.
FIG. 2 is a flowchart illustrating a processing flow of the interface apparatus according to the first embodiment.
FIG. 3 is a diagram illustrating an image of a reference image.
FIG. 4 is a diagram illustrating an image of the distance information obtained along one line of the reference image in FIG. 3.
FIG. 5 is a diagram illustrating an image of a voxel space generated based on distance information.
FIG. 6 is a diagram illustrating an image of how to divide a voxel space and set a flag.
FIG. 7 is a diagram illustrating an image when the generated voxel space is held as an octree structure.
FIG. 8 is a diagram illustrating an image of the algorithm that searches along the pointing direction for a voxel in which information is registered.
FIG. 9 is a diagram illustrating an image in which an initial voxel space is superimposed and displayed on a reference image.
FIG. 10 is a diagram illustrating how, in the spatial information registration unit, the highlight moves from one highlighted voxel to a neighboring voxel or to the voxels of the lower layer.
FIG. 11 is a diagram illustrating a reference image and the image obtained by horizontally reversing it.
FIG. 12 is a diagram illustrating an example of the display result in the first embodiment.
FIG. 13 is a configuration diagram of an interface apparatus according to a second embodiment of this invention.
FIG. 14 is a flowchart illustrating a processing flow of the interface apparatus according to the second embodiment.
FIG. 15 is a diagram illustrating a relationship between a voxel and a voxel space and a concept of hierarchical division thereof.
[Explanation of symbols]
10 Display
11, 11₁, 11₂    Image input unit
12 Distance information generation unit
13 Voxel space generation unit
14 Start/end point 2D coordinate calculation unit
15 Start/end point 3D coordinate calculation unit
16 3D pointing position detection unit
17 Spatial information registration unit
18 Display information reversing unit
19 Information display unit
20 Spatial information data
21 to 28 Steps
31 Light projector
I, I₁, I₂    Input image

Claims

An interface method for recognizing a position in a photographing space, or a real object, that is captured by video cameras and pointed to by a user with a part of the body, the method comprising:
inputting images of real space using a plurality of video cameras including a reference camera;
generating distance information from the reference camera to photographed objects by a stereo method using the captured images;
generating an initial voxel space, using the distance information, from a reference image that is an image captured by the reference camera, and generating a voxel space by hierarchically subdividing the initial voxel space, dividing only the voxels that the distance information intersects, until voxels of a predetermined size are obtained or a predetermined number of divisions is reached;
detecting two predetermined parts of the user's body as a start point and an end point on the image captured by the reference camera, and calculating their two-dimensional coordinates on the image;
calculating the three-dimensional coordinates of the start point and end point in real space from the distance information and the two-dimensional image coordinates of the start point and end point, and calculating direction information in the real space pointed to by the user;
registering, at an arbitrary three-dimensional position in the generated voxel space, three-dimensional position information and additional information, which are information on a real object existing in real space;
collating the direction information in the real space pointed to by the user with the three-dimensional position information of the registered objects, and detecting, as the three-dimensional pointing position information pointed to by the operator, information on the intersection between the extension line in the user's pointing direction and a registered object; and
superimposing and displaying the generated voxel space, the three-dimensional pointing direction information, and the captured image.

The interface method according to claim 1, further comprising generating a horizontally reversed image of the reference image,
wherein the step of generating the voxel space generates the voxel space in a horizontally reversed state matching the reversed image, and the step of superimposing and displaying also superimposes the various display information on the reversed image.

The interface method according to claim 1, further comprising a step of generating distance information by an active stereo method using a video camera and a light projector instead of the step of generating the distance information.

An interface device for photographing a user with video cameras and recognizing a position in the photographing space, or a real object, that the user points to with a part of the body, the device comprising:
means for inputting images of real space using a plurality of video cameras including a reference camera;
means for generating distance information from the reference camera to photographed objects by a stereo method using the captured images;
means for generating an initial voxel space, using the distance information, from a reference image that is an image captured by the reference camera, and generating a voxel space by hierarchically subdividing the initial voxel space, dividing only the voxels that the distance information intersects, until voxels of a predetermined size are obtained or a predetermined number of divisions is reached;
means for detecting two predetermined parts of the user's body as a start point and an end point on the image captured by the reference camera, and calculating their two-dimensional coordinates on the image;
means for calculating the three-dimensional coordinates of the start point and end point in real space from the distance information and the two-dimensional image coordinates of the start point and end point, and calculating direction information in the real space pointed to by the user;
means for registering, at an arbitrary three-dimensional position in the generated voxel space, three-dimensional position information and additional information, which are information on a real object existing in real space;
means for collating the direction information in the real space pointed to by the user with the three-dimensional position information of the registered objects, and detecting, as the three-dimensional pointing position information pointed to by the operator, information on the intersection between the extension line in the user's pointing direction and a registered object; and
means for superimposing and displaying the generated voxel space, the three-dimensional pointing direction information, and the captured image.

The interface device according to claim 4, further comprising means for generating a horizontally reversed image of the reference image,
wherein the voxel space generating means generates the voxel space in a horizontally reversed state matching the reversed image, and the superimposing display means also superimposes the various display information on the reversed image.

The interface device according to claim 4, further comprising means for generating distance information by an active stereo method using a video camera and a light projector instead of the means for generating the distance information.

An interface program for causing a computer to execute the interface method according to claim 1.