JP3588527B2

JP3588527B2 - User interface device and instruction input method

Info

Publication number: JP3588527B2
Application number: JP00949697A
Authority: JP
Inventors: 美和子土井; 明森下; 直子梅木; 俊一沼崎
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-01-22
Filing date: 1997-01-22
Publication date: 2004-11-10
Anticipated expiration: 2017-01-22
Also published as: JPH10207618A

Description

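[0001]
[Technical Field of the Invention]
The present invention relates to a user interface device and an input method for performing input by image processing.
[0002]
[Prior Art]
The mouse is by far the most widely used input device for computers. However, the operations a mouse supports are cursor movement, menu selection, and the like; it serves merely as a two-dimensional pointing device. A mouse handles two-dimensional information, so it is difficult to select an object that has depth, such as an object in a three-dimensional space. In addition, when creating an animation, it is difficult to give a character natural movement with an input device such as a mouse.
[0003]
To compensate for the difficulty of pointing in three-dimensional space, devices have been developed that input six-axis information when the user pushes or rotates a ball in a desired direction, as well as devices worn on the hand or body, such as so-called data gloves, data suits, and cyber gloves. In reality, however, these devices have not become as widespread as initially expected, owing to poor operability and other factors.
[0004]
In contrast, direct-instruction input devices have recently been developed with which the user can input intended information by hand or body gestures without handling any special device.
[0005]
For example, some devices irradiate light, receive the light reflected by the user's hand, form an image from it, perform feature extraction and shape recognition, and then execute control according to the shape of the hand, move the cursor by an amount corresponding to the hand's movement, or change the viewpoint in a three-dimensional model.
[0006]
Alternatively, some devices perform the same processing by capturing video of the user's hand motion and analyzing the video images.
[0007]
With such devices, the user can easily perform input by gesture without wearing any special device.
[0008]
[Problems to Be Solved by the Invention]
However, in devices of this type, each mode, such as the cursor movement mode, the selection mode, and the double-click mode, is used in a fixed manner, and changing the mode requires an explicit operation for that purpose, which places an operational burden on the user.
[0009]
The present invention has been made in view of the above circumstances, and its object is to provide, for a user interface device that performs input by image processing, a user interface device and an instruction input method that are easier to use and reduce the user's operational burden.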
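[0010]
[Means for Solving the Problems]
The present invention is a user interface device that performs input by image processing of an input image, characterized by comprising means for switching between a pointing mode and other modes based on the image processing result of the input image.
[0011]
The present invention is also a user interface device that performs input by image processing of an input image, characterized by comprising means for switching among at least a cursor movement mode, a selection mode, and a double-click mode based on the image processing result of the input image.
[0012]
Preferably, the device may further comprise means for designating, for each object selectable in the selection mode, a recognition method (recognition engine) that constrains the content of image processing, such that when the input image is processed for a selected object, the image processing is performed according to the recognition method designated for that object.
[0013]
Preferably, the device may further comprise means for designating, for each object selectable in the selection mode, a recognition method (recognition engine) that constrains the content of image processing, and means for presenting, near a displayed object indicated by the cursor, information indicating the recognition method designated for that object.
[0014]
Preferably, the device may further comprise means for presenting the image processing result of the input image in a predetermined form on the cursor.
[0015]
The present invention is also a user interface device consisting of a first device that inputs a reflected-light image and a second device that performs input by image processing of the input image, characterized in that the second device comprises means for designating to the first device a recognition method (recognition engine) that constrains the content of image processing of the input image, and the first device comprises means for performing predetermined image processing based on the designated recognition method and means for returning the input image and the image processing result to the second device.
[0016]
Preferably, the first device may further comprise means for requesting the second device to transfer the information needed for image processing suited to a required recognition method (the recognition engine) when the first device does not hold image processing means (a recognition engine) suited to that method, and the second device may further comprise means for transferring the requested information to the first device.
[0017]
Preferably, each of the first device and the second device may further comprise means for notifying the other device, when information needed for image processing suited to a given recognition method becomes active in its own device first, to deactivate that information, and means for deactivating that information upon receiving such a notification from the other device.
[0018]
The present invention is also an instruction input method using a user interface device that performs input by image processing of an input image, characterized by performing image processing on an input image of a target object and switching between a pointing mode and other modes based on the result of this image processing.
[0019]
The present invention is also an instruction input method using a user interface device consisting of a first device that inputs a reflected-light image and a second device that performs input by image processing of the input image, characterized in that the second device designates to the first device a recognition method (recognition engine) that constrains the content of image processing of the input image, and the first device performs predetermined image processing based on the designated recognition method and returns the input image and the image processing result to the second device.
[0020]
According to the present invention, the user no longer needs to perform an explicit operation to change among modes such as the cursor movement mode, the selection mode, and the double-click mode.
[0021]
In addition, because the point indicated by the user is read by recognition processing and reflected in cursor movement and the like on the screen, no calibration operation by the user is needed.
[0022]
Furthermore, using image processing means (a recognition engine) suited to the required recognition method can be expected to improve both input accuracy and user operability.
[0023]
In addition, displaying the input image semi-transparently over the cursor feeds the operation status back to the user.
[0024]
Thus, the present invention can provide an easier-to-use user interface device that reduces the user's operational burden.
[0025]
In addition, according to the present invention, part of the recognition processing is performed on the first device side (device side), which distributes the load and can speed up recognition processing overall.
[0026]
It is also possible to enhance the functionality of devices that have an image input function.
[0027]
The inventions relating to each of the above devices also hold as inventions relating to methods.
[0028]
The above inventions also hold as a machine-readable medium storing a program that causes a computer to execute the corresponding procedures or means.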
【発明の実施の形態】
以下、図面を参照しながら発明の実施の形態を説明する。
【００３０】
まず、第１の実施形態について説明する。
【００３１】
図１は、本発明の一実施形態に係るユーザインタフェース装置の構成例を示す図である。
【００３２】
本ユーザインタフェース装置は、例えば、グラフィックユーザインタフェースを持つ計算機に適用すると好適なものである。すなわち、表示画面上にカーソル、スライダーバー、スクロールバー、プルダウンメニュー、ボックス、リンク、アプリケーションなどのアイコンが表示され、ユーザが入力デバイスによりカーソルの移動、アイコンの選択、アプリケーションの起動などの指示を入力するようなシステムであって、入力デバイスがマウスのような専用器具を必要とせずユーザの手などの物体を画像処理することにより入力を受けるものである。
【００３３】
本実施形態は、概略的には、ユーザの手などの対象物体による反射光を画像として捉えることにより（あるいは背景の光の対象物体による反射光を画像として捉えることにより）、その形状、動き、距離情報などを検出し、その形状等に応じた所定の制御（例えば入出力装置に関する制御あるいはアプリケーションソフトの起動など）を行なうもので、ユーザは手のモーションなどにより意図する入力を行なうことができる機能を提供するものである。また、画像処理結果に応じて、カーソルの移動、アイコンの選択、アプリケーションの起動などの各モードを切り替えるようにしているので、ユーザは各モードを切り替えるための明示的な操作を不要としている。
【００３４】
本ユーザインタフェース装置は、画像入力部１０、画像記憶部１１、形状解釈部１２、解釈規則記憶部１３、呈示部１４、カーソル切替部１５を備えている。
【００３５】
図２に、画像入力部１０の一構成例を示す。
【００３６】
画像入力部１０は、例えば、ＬＥＤなどの発光素子により近赤外線などの光を対象物体に照射する発光部１０１と、対象物体からの反射光を２次元アレイ状に配列された受光素子で受光する反射光抽出部１０２と、発光部１０１および反射光抽出部１０２の動作タイミングを制御するタイミング制御部１０３を有する。発光部１０１が発光しているときに反射光抽出部１０２で受光した光の量と、発光部１０１が発光していないときに反射光抽出部１０２で受光した光の量の差をとることによって、バックグラウンドの補正を行ない、発光部１０１からの光の対象物体による反射光の成分だけを取り出す。なお、画像入力部１０は、発光部を持たず、ＣＣＤカメラなどの受光部のみ持つものであっても良い。
【００３７】
図３に、表示装置２０と画像入力部１０の筐体８と対象物体２２の関係を示す。例えば、画像入力部１０の前にユーザの手２２を持ってきた場合、その手からの反射光画像が得られる。このとき、反射光画像の各画素値は、物体の性質（光を鏡面反射する、散乱する、吸収する、など）、物体面の向き、物体の距離、などに影響されるが、物体全体が一様に光を散乱する物体である場合、その反射光量は物体までの距離と密接な関係を持つ。手などはこのような性質を持つため、手を差し出した場合の反射光画面は、手の距離、手の傾き（部分的に距離が異なる）、などを反映する。したがって、これらの情報を抽出することによって、様々な情報の入力・生成が可能になる。
【００３８】
画像記憶部１１は、画像入力部１０から所定時間毎（例えば１／３０秒毎、１／６０秒毎、１／１００秒毎など）に出力される画像検出対象物体の２次元画像を逐次記憶する。
【００３９】
形状解釈部１２は、画像記憶部１１に記憶された２次元画像を、Ｎ×Ｎ（例えば６４×６４）のドットマトリクスとして逐次取込む。各画素は階調（例えば８ビット＝２５６階調）を持つものとする。
【００４０】
また、形状解釈部１２は、ドットマトリクスから所定の特徴量を抽出し、解釈規則記憶部１３に記憶された解釈規則をもとに、形状解釈する。そして、適合する解釈規則に応じた指示を解釈結果として出力する。もし適合する解釈規則がなければ、必要に応じてドットマトリクスからの所定の特徴量の抽出の仕方を変更して（例えばドットマトリクスのしきい値処理を行なう場合、そのしきい値を変更する）、再度マッチング処理をやり直すようにしても良い。最終的に適合する解釈規則がなければ、入力はなかったものとする。
【００４１】
解釈規則記憶部１３は、形状解釈のための解釈規則を記憶する。例えば、ドットマトリックス中におけるユーザの手などの対象物体の特徴量、例えば形状、面積、最上点、重心などの所定のものと、これに応じた指示内容が解釈規則として記憶されている。指示内容には、アイコンの選択、アプリケーションの起動、カーソル移動等がある。カーソル移動の場合には、手の移動方向・距離に応じたカーソルの移動量も指示される。例えば、親指と人差し指を開いて立てた状態をカーソル移動に対応させ（この場合、例えば人差し指の先端の移動距離・方向がカーソルの移動距離・方向に対応づけられる）、親指と人差し指を閉じて立てた状態を、カーソルの位置するアイコンの選択に対応させ、親指と人差し指を立て、カーソル移動の場合に対して手のひらを反転させた状態を、カーソルの位置するアイコンに対応するアプリケーションの起動に対応させるなどの規則が考えられる。
【００４２】
形状解釈部１２による形状解釈におけるドットマトリクスから特徴量の抽出の代表例は、距離情報の抽出と、領域抽出である。物体が一様で均質な散乱面を持つ物体であれば、反射光画像は距離画像とみなすことができる。したがって、受光部から見込んだ物体の立体形状を抽出することができる。物体が手であれば、手のひらの傾きなどが検出できる。手のひらの傾きは部分的な距離の違いとして現れる。また、手を移動させたときに画素値が変われば、距離が移動したと見ることができる。また、背景のように遠い物体からの反射光はほとんどないため、反射光画像からあるしきい値以上の領域を切り出すという処理で、物体の形状を簡単に切り出すことができる。例えば、物体が手であれば、そのシルエット像を切り出すのは極めて容易である。距離画像を用いる場合でも、一度しきい値によって領域抽出をしておいてから、その領域内の距離情報を用いる、という場合が多い。
【００４３】
ドットマトリクスから抽出した特徴量と解釈規則とのマッチングの手法には種々のものがある。例えば、画像からベクトルを抽出するベクトル化、形状モデルにもとづいた形状の変形状態の抽出、走査線上の距離値にもとづいたスペクトル解析などである。
【００４４】
もし適合する形状がなければ、例えばしきい値を変更するなどして、再度マッチング処理をやり直すようにしても良い。最終的に適合する形状がなければ、入力はなかったものと見なす。
【００４５】
なお、形状解釈部１２がアプリケーションやＯＳの機能などを起動する指示と認識した場合は、これらソフトウェアを起動させるようにする。
【００４６】
呈示部１４は、表示装置にて形状解釈部１２による解釈結果を反映した呈示を行なう。例えば、カーソルの移動、必要に応じてメッセージの呈示などを行う。
【００４７】
カーソル切替部１５は、形状解釈部１２による解釈結果に基づいてカーソルの切替えを制御する。
【００４８】
図４〜図６に、本実施形態のユーザインタフェース装置の動作手順例を示す。
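[0049]
First, the cursor control state C is initialized (C ← 0), the selection state S is initialized (S ← 0), the cursor information I is initialized (I ← 0), and the recognition engine flag R is initialized (R ← 0) (step S1).
[0050]
Next, a reflected-light image is read and written into the image storage unit 11 (step S2).
[0051]
Next, the dot matrix is read into the shape interpretation unit 12 (step S3).
[0052]
Next, the shape interpretation unit 12 determines the mode indicated by the gesture from the feature quantities extracted from the dot matrix and the interpretation rules (step S4).
[0053]
Processing then branches according to this determination result and the parameter values.
[0054]
If the gesture is a cursor control gesture and C = 0, S = 0, and R = 0 (step S5), processing enters cursor control: C ← 1 is set (step S11), and processing returns to step S2.
[0055]
The cursor control gesture is, for example, the case where the hand shape shown in FIG. 7 is recognized.
[0056]
If the gesture is a cursor control gesture and C = 1, S = 0, and R = 0 (step S6), the cursor is being moved. In this case, the coordinates (x, y) of the nearest point are first calculated from the dot matrix (step S12), and the cursor is moved to the calculated coordinates (x, y) (step S13). The calculated coordinates are then held as Cp = (x, y) (step S14). If there is an object at Cp (step S15), that object's state is set (I ← object state) (step S16); if there is no object (step S15), I ← 0 is set (step S17). Processing then returns to step S2.
[0057]
If the gesture is a cursor control gesture and C = 1 and S = 1 (step S7), processing returns to cursor control: S ← 0, R ← 0, and I ← 0 are set (step S18), and processing returns to step S2.
[0058]
If the gesture is a selection gesture and C = 1, S = 0, and R = 0 (step S8), an object is selected. In this case, S ← 1 is set first (step S19), the object closest to Cp is searched for (step S20), and the found object is selected (step S21). If the selected object has a designated recognition engine (step S22), R ← 1 is set (step S23) and processing returns to step S2. If there is no designated recognition engine and the object is a link object (step S24), the link destination is followed (step S25), C ← 0, S ← 0, and I ← 0 are set (step S26), and processing returns to step S2. If it is not a link object (step S24), processing returns directly to step S2.
[0059]
The selection gesture is, for example, the case where the hand shape shown in FIG. 11 is recognized.
[0060]
As described in detail later, a recognition engine extracts predetermined feature quantities from the dot matrix; there are various kinds, such as a topmost-point vertical engine that extracts the vertical movement of the topmost point of the target object's shape in the dot matrix.
[0061]
If the gesture is a selection gesture and R = 1 (step S9), the selected object is moved: recognition is performed in accordance with the designated recognition engine (step S27), and processing returns to step S2.
[0062]
If the gesture is a double-click gesture and C = 1 (step S10), double-click processing is performed: the object closest to Cp is opened (step S28), C ← 0, S ← 0, I ← 0, and R ← 0 are set (step S29), and processing returns to step S2.
[0063]
The double-click gesture is, for example, the case where the hand shape shown in FIG. 14 is recognized.
[0064]
In all other cases, other recognition processing is performed (step S30), and processing returns to step S2.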
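The branching in steps S1 to S30 can be read as a small state machine over the flags C, S, I, and R. The Python sketch below is one hypothetical rendering, not the patent's implementation: gestures arrive as already-classified strings ("cursor", "select", "double_click"), each object is reduced to a name, a position, an optional recognition engine name, and a link flag, "an object at Cp" is approximated by an assumed hit radius, and actions are returned as descriptive strings instead of being executed. All class, field, and gesture names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]


@dataclass
class UiObject:
    """Hypothetical stand-in for an on-screen object (slider bar, link node, icon)."""
    name: str
    pos: Point
    engine: Optional[str] = None  # designated recognition engine, if any
    is_link: bool = False


@dataclass
class ModeController:
    objects: List[UiObject]          # assumed non-empty
    C: int = 0                       # cursor control state
    S: int = 0                       # selection state
    I: Union[int, str] = 0           # cursor information (0 = no object under cursor)
    R: int = 0                       # recognition engine flag
    Cp: Point = (0.0, 0.0)           # last cursor position
    selected: Optional[UiObject] = None
    hit_radius: float = 1.0          # assumption: how close counts as "at Cp"

    def _dist2(self, o: UiObject) -> float:
        return (o.pos[0] - self.Cp[0]) ** 2 + (o.pos[1] - self.Cp[1]) ** 2

    def _closest(self) -> UiObject:
        return min(self.objects, key=self._dist2)

    def step(self, gesture: str, point: Point = (0.0, 0.0)) -> str:
        """Handle one classified gesture (steps S5 to S30); the caller loops back to S2."""
        c, s, r = self.C, self.S, self.R
        if gesture == "cursor" and (c, s, r) == (0, 0, 0):   # S5 -> S11
            self.C = 1
            return "enter cursor control"
        if gesture == "cursor" and (c, s, r) == (1, 0, 0):   # S6 -> S12..S17
            self.Cp = point
            obj = self._closest()
            self.I = obj.name if self._dist2(obj) <= self.hit_radius ** 2 else 0
            return f"move cursor to {point}"
        if gesture == "cursor" and (c, s) == (1, 1):         # S7 -> S18
            self.S, self.R, self.I = 0, 0, 0
            return "back to cursor control"
        if gesture == "select" and (c, s, r) == (1, 0, 0):   # S8 -> S19..S26
            self.S = 1
            obj = self.selected = self._closest()
            if obj.engine is not None:                        # S22 -> S23
                self.R = 1
                return f"select {obj.name} (engine {obj.engine})"
            if obj.is_link:                                    # S24 -> S25, S26
                self.C, self.S, self.I = 0, 0, 0
                return f"follow link {obj.name}"
            return f"select {obj.name}"
        if gesture == "select" and r == 1:                    # S9 -> S27
            return f"manipulate {self.selected.name} via {self.selected.engine}"
        if gesture == "double_click" and c == 1:              # S10 -> S28, S29
            obj = self._closest()
            self.C, self.S, self.I, self.R = 0, 0, 0, 0
            return f"open {obj.name}"
        return "other recognition"                            # S30
```

Checking the flags in the same order as steps S5 to S10 matters: for example, a cursor gesture with S = 1 must fall through S5 and S6 to reach S7.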
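[0065]
This embodiment is described below using concrete examples.
[0066]
First, assume that cursor movement is instructed when a hand shape with the thumb and index finger extended, as in FIG. 7, is recognized.
[0067]
For example, in the state of FIG. 8(a), when the hand shape of FIG. 7 is recognized, the device enters a state in which it accepts cursor movement, and when the user moves the hand while keeping the shape of FIG. 7, the cursor 201 moves on the display screen accordingly. In this case, the amount and direction of movement are extracted at a fixed point of the hand shape in the dot matrix, for example the topmost point, whose vertical position in the image is highest (corresponding, for example, to the tip of the index finger), or the nearest point, which is closest to the light receiving unit in the image (for example, the point with the highest gray level).
[0068]
As shown in FIG. 8(b), the input image 202 may also be displayed semi-transparently on the cursor 201 to feed the recognition status back to the user.
[0069]
When the cursor is over an object such as a slider bar or a link node, the shape of the cursor is changed to indicate that object's function.
[0070]
For example, as in FIG. 9(a), if there is a slider bar where the cursor has moved, the cursor changes shape as shown at 203. As in FIG. 9(b), the input image may also be displayed semi-transparently (205) on the cursor.
[0071]
Here, the arrow 204 in FIG. 9(a) indicates that operation of the slider bar being operated is constrained to the vertical direction. That is, a vertical-direction engine is designated for the slider bar. In this case, however the user moves the hand, only vertical movement is recognized.
[0072]
As another example, as in FIG. 10(a), if there is a link node where the cursor has moved, the cursor changes shape as shown at 206. As in FIG. 10(b), the input image 207 may also be displayed semi-transparently on the cursor.
[0073]
Next, assume that after the cursor has been moved to a desired position with the hand shape of FIG. 7, recognizing a hand shape with the index finger extended and the thumb closed, as in FIG. 11, instructs selection of the object indicated by the cursor. This corresponds to a single click of a mouse.
[0074]
For example, in the state of FIG. 12(a), changing to the hand shape of FIG. 11 selects the slider bar. The arrow 208 in the figure indicates that operation of the slider bar is constrained to the vertical direction. As in FIG. 12(b), the input image 209 may also be displayed semi-transparently on the cursor.
[0075]
As another example, in the state of FIG. 10(a), changing to the hand shape of FIG. 11 selects the link node "Company Profile," and the display changes as in FIG. 13(a). As in FIG. 13(b), the input image 210 may also be displayed semi-transparently on the cursor.
[0076]
Next, assume that after the cursor has been moved to a desired position with the hand shape of FIG. 7, recognizing a state in which the wrist is rotated 180 degrees without changing the hand shape itself, as in FIG. 14, instructs a double click on the object indicated by the cursor.
[0077]
For example, in FIG. 15(a), the user moves the cursor onto the "Internet" icon with the hand shape of FIG. 7 and then rotates the hand as in FIG. 14. A double click on "Internet" is then accepted. FIG. 15(b) shows the state just before the selected icon opens. As in FIG. 16, the input image 212 may also be displayed semi-transparently on the cursor.
[0078]
As another example, the cursor is first moved to "File" as in FIG. 17(a). At this time, as in FIG. 17(b), the input image 213 may be displayed semi-transparently on the cursor.
[0079]
Next, when "File" is selected, a pull-down menu is displayed as shown in FIG. 18(a). The arrow 214 in the figure indicates that operation of the pull-down menu is constrained to the vertical direction.
[0080]
Next, positioning the cursor over "Save" as in FIG. 18(a) and making the hand shape of FIG. 14 causes a double click on "Save" to be accepted. Here too, as in FIG. 18(b), the input image 213 may be displayed semi-transparently on the cursor.
[0081]
When "Save" is double-clicked, the cursor changes shape, for example as shown at 216 in FIG. 19(a), indicating that the document is being saved as a result of selecting Save. Here too, as in FIG. 19(b), the input image 217 may be displayed semi-transparently on the cursor.
[0082]
As another example, the cursor is first moved to "File" as in FIG. 20(a), "File" is selected, the cursor is then moved to "Print," and "Print" is selected. The menu corresponding to "Print" is then displayed. At this time, as in FIG. 20(b), the input image 219 may be displayed semi-transparently on the cursor.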
次に、認識エンジンについて説明する。
【００８４】
図２１に、認識エンジンに関する処理のフローチャートを示す。
【００８５】
本実施形態では、必要に応じオブジェクトごとに認識エンジンを指定するものとする。
【００８６】
認識エンジンは、ドットマトリックスから所定の特徴量を抽出するものである。すなわち、形状解釈部１２は、選択されたオブジェクトについて指定されている認識エンジンがあるならば（ステップＳ３１）、指定された認識エンジンに従ってドットマトリックスから特徴量を抽出する。認識エンジンの指定がなければ（ステップＳ３１）、通常の認識を実行する（ステップＳ３２）。
【００８７】
認識エンジンには、例えば、
ドットマトリックス中の対象物体形状の最近点の垂直方向の移動量を抽出する最近点垂直方向エンジン１２１、最近点の水平方向の移動量を抽出する最近点水平方向エンジン１２２、最近点の斜め方向の移動量を抽出する最近点斜め方向エンジン１２３、重心点の垂直方向の移動量を抽出する重心点垂直方向エンジン１２４、重心点の水平方向の移動量を抽出する重心点水平方向エンジン１２５、重心点の斜め方向の移動量を抽出する重心点斜め方向エンジン１２６、重心点の垂直方向の移動量を抽出する重心点垂直方向エンジン１２７、重心点の水平方向の移動量を抽出する重心点水平方向エンジン１２８、重心点の斜め方向の移動量を抽出する重心点斜め方向エンジン１２９、ドットマトリックス中の対象物体形状のエッジを切り出すエッジ切り出しエンジン１３０、ドットマトリックス中の対象物体形状の面積を算出する面積算出エンジン１３１、ドットマトリックス中の対象物体形状の最近点のｘ軸回り回転角を抽出する最近点ｘ軸回り回転角エンジン１３２、最近点のｙ軸回り回転角を抽出する最近点ｙ軸回り回転角エンジン１３３、最近点のｚ軸回り回転角を抽出する最近点ｚ軸回り回転角エンジン１３４、重心点のｘ軸回り回転角を抽出する重心点ｘ軸回り回転角エンジン１３５、重心点のｙ軸回り回転角を抽出する重心点ｙ軸回り回転角エンジン１３６、重心点のｚ軸回り回転角を抽出する重心点ｚ軸回り回転角エンジン１３７、重心点のｘ軸回り回転角を抽出する重心点ｘ軸回り回転角エンジン１３８、重心点のｙ軸回り回転角を抽出する重心点ｙ軸回り回転角エンジン１３９、重心点のｚ軸回り回転角を抽出する重心点ｚ軸回り回転角エンジン１４０、その他、所定のエンジンの重み付け組合わせによる認識エンジン１４１など、種々のものが考えられる。
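As a hedged illustration of how an object-specific engine constrains recognition (steps S31 and S32), the sketch below models an engine as a function from the previous and current tracked point to a displacement. The engine names, the `OBJECT_ENGINES` table (standing in for the per-object description of FIG. 22), and the equal weights in the usage example are assumptions; the patent lists the engines but does not define their interfaces.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]
Engine = Callable[[Point, Point], Point]  # (previous point, current point) -> displacement


def vertical_engine(prev: Point, cur: Point) -> Point:
    """Keep only vertical movement, as for a vertically constrained slider bar."""
    return (0.0, cur[1] - prev[1])


def horizontal_engine(prev: Point, cur: Point) -> Point:
    """Keep only horizontal movement."""
    return (cur[0] - prev[0], 0.0)


def weighted_engine(parts: Sequence[Tuple[Engine, float]]) -> Engine:
    """Combine engines by a weighted sum of their outputs (cf. engine 141)."""
    def engine(prev: Point, cur: Point) -> Point:
        outputs = [(w, e(prev, cur)) for e, w in parts]
        return (sum(w * d[0] for w, d in outputs), sum(w * d[1] for w, d in outputs))
    return engine


# Hypothetical per-object designation, in the spirit of FIG. 22.
OBJECT_ENGINES: Dict[str, Engine] = {
    "vertical_slider": vertical_engine,
    "horizontal_scrollbar": horizontal_engine,
}


def recognize(obj_name: str, prev: Point, cur: Point) -> Point:
    engine = OBJECT_ENGINES.get(obj_name)
    if engine is None:                               # S32: ordinary recognition
        return (cur[0] - prev[0], cur[1] - prev[1])
    return engine(prev, cur)                         # S31: designated engine
```

With the vertical slider, a hand moving from (3, 4) to (7, 10) yields only the vertical component (0, 6), which matches the statement that "only vertical movement is recognized" however the hand moves.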
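[0088]
FIG. 22 shows an example of how recognition engines are described for each object.
[0089]
As described above, according to this embodiment, the user no longer needs to perform an explicit operation to change among modes such as the cursor movement mode, the selection mode, and the double-click mode.
[0090]
In addition, because the point indicated by the user is read by recognition processing and reflected in cursor movement and the like on the screen, no calibration operation by the user is needed.
[0091]
Furthermore, using recognition engines can be expected to improve both input accuracy and user operability.
[0092]
In addition, displaying the input image semi-transparently over the cursor feeds the operation status back to the user.
[0093]
Thus, this embodiment can provide an easier-to-use user interface device that reduces the user's operational burden.
[0094]
Next, a second embodiment is described.
[0095]
The second embodiment is basically the same as the first, except that part of the recognition processing is performed inside the housing of the image input unit (hereinafter, the device side), and the image input unit 10 side passes the dot matrix of the input image and predetermined recognition results to the main body side. The recognition processing performed on the device side is preferably lightweight.
[0096]
FIG. 23 shows an example configuration of the interface device according to this embodiment.
[0097]
In FIG. 23, the main body contains a main body control unit 32, the presentation unit 14, and the cursor switching unit 15, while the device side contains the image input unit 10, the image storage unit 11, a recognition engine control unit 30, an active list 31, and several predetermined recognition engines 121, 122, 142, 143, and 144.
[0098]
The main body control unit 32 corresponds, for example, to the shape interpretation unit 12 and the interpretation rule storage unit 13 (including the recognition engines) of FIG. 1, but it is not limited to this; it may have another configuration or perform other recognition processing, as long as it uses recognition engines.
[0099]
FIG. 24 shows an example of the contents of the active list storage unit when a vertical slider bar is selected. In this case, the cursor engine 142 and the nearest-point vertical engine 121 are designated.
[0100]
In this configuration, the main body side sends the device side a list of recognition engines to activate or, conversely, a list of recognition engines to deactivate. The device side stores this list in the active list storage unit 31, uses the recognition engine control unit 30 to extract predetermined feature quantities from the input image as recognition results according to the designated engines, and returns the input image and the recognition results to the main body side.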
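The exchange described in [0100], together with the transfer request of [0016], can be sketched as a device-side object that holds some engines, stores the active list received from the main body, runs only the active engines on each frame, and returns the frame together with the results. This is a minimal sketch under stated assumptions: the engine set, the threshold of 128 in `area_engine`, and all names are hypothetical, and engines the device does not hold are merely reported so that the main body could transfer them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

Frame = List[List[int]]
Engine = Callable[[Frame], object]


def nearest_point_engine(frame: Frame) -> Tuple[int, int]:
    """(x, y) of the brightest pixel, treated as the point nearest the sensor."""
    return max(((x, y) for y, row in enumerate(frame) for x in range(len(row))),
               key=lambda p: frame[p[1]][p[0]])


def area_engine(frame: Frame) -> int:
    """Pixel count of the region at or above an assumed threshold of 128."""
    return sum(1 for row in frame for v in row if v >= 128)


@dataclass
class DeviceSide:
    engines: Dict[str, Engine]                       # engines held on the device
    active: Set[str] = field(default_factory=set)    # contents of the active list storage unit

    def set_active_list(self, names: Iterable[str]) -> None:
        """Store the active list sent by the main body."""
        self.active = set(names)

    def missing_engines(self) -> Set[str]:
        """Active engines the device does not hold (to request from the main body)."""
        return self.active - set(self.engines)

    def install(self, name: str, engine: Engine) -> None:
        """Accept an engine transferred from the main body."""
        self.engines[name] = engine

    def process(self, frame: Frame) -> Tuple[Frame, Dict[str, object]]:
        """Run the active engines and return the input image with the results."""
        results = {n: self.engines[n](frame) for n in sorted(self.active) if n in self.engines}
        return frame, results
```

Returning the frame alongside the results mirrors the text: the main body still receives the dot matrix, so it can run further recognition of its own.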
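[0101]
According to this embodiment, part of the recognition processing is performed on the device side, which distributes the load and can speed up recognition processing overall.
[0102]
It is also possible to enhance the functionality of devices that have an image input function.
[0103]
FIG. 27 shows yet another example configuration of the interface device according to this embodiment.
[0104]
The configuration in FIG. 27 is largely the same as that in FIG. 25; in this example, however, when a recognition engine exists in multiple places, such as on both the main body side and the device side, the side whose engine became active first notifies the other side holding the same engine to deactivate it.
[0105]
FIG. 28 shows an example of the operating procedure for active notification, and FIG. 29 shows an example of the operating procedure for receiving an active notification.
[0106]
First, an image is input on the device side (step S39), and recognition is executed on the main body side and/or the device side (step S40). The side that executed recognition then sends the recognition result, the image matrix, and the active list (or inactive list) to the party to be notified (step S41).
[0107]
Next, the receiving side accepts the recognition result, the image matrix, and the active list (or inactive list) (step S42), and, in the active list it holds, rewrites to OFF each recognition engine that is ON in the received active list (or OFF in the received inactive list) (step S43). It then executes other processing as necessary (step S44).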
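One way to read the first-come rule of [0104] to [0107] is sketched below: each side keeps an active list mapping engine names to ON/OFF, the side that activates an engine first notifies its peer, and the receiver switches that engine OFF in its own list (step S43). This is a hypothetical sketch: only the active-list form is modeled (not the inactive-list form), the notification is a direct method call rather than a communication channel, the recognition result and image matrix are omitted from the message, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Side:
    name: str
    active: Dict[str, bool] = field(default_factory=dict)  # engine name -> ON/OFF
    # repr/compare disabled because the two peers reference each other.
    peer: Optional["Side"] = field(default=None, repr=False, compare=False)

    def activate(self, engine: str) -> bool:
        """Turn an engine ON unless the peer already activated it first."""
        if self.peer is not None and self.peer.active.get(engine, False):
            return False
        self.active[engine] = True
        if self.peer is not None:
            self.peer.receive_notice({engine: True})     # S41: send the active list
        return True

    def receive_notice(self, active_list: Dict[str, bool]) -> None:
        """S42-S43: engines ON in the received list are switched OFF locally."""
        for engine, on in active_list.items():
            if on and engine in self.active:
                self.active[engine] = False


def link(a: Side, b: Side) -> None:
    a.peer, b.peer = b, a
```

For example, if the device activates the nearest-point vertical engine first, the main body's copy is switched OFF and a later attempt by the main body to activate it is refused, so the engine runs in exactly one place.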
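[0108]
Each of the above functions can also be implemented as software, and can also be embodied as a machine-readable medium storing a program that causes a computer to execute each of the above procedures or means.
[0109]
The present invention is not limited to the embodiments described above and can be modified in various ways within its technical scope.
[0110]
[Effects of the Invention]
According to the present invention, modes such as the cursor movement mode, the selection mode, and the double-click mode are switched based on the image processing result, so the user no longer needs to perform an explicit operation to change modes.
[0111]
In addition, because the point indicated by the user is read by recognition processing and reflected in cursor movement and the like on the screen, no calibration operation by the user is needed.
[0112]
Thus, the present invention can provide an easier-to-use user interface device that reduces the user's operational burden.
[0113]
In addition, according to the present invention, part of the recognition processing is performed on the first device side (device side), which distributes the load and can speed up recognition processing overall.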
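[Brief Description of the Drawings]
[FIG. 1] A diagram showing an example configuration of the interface device according to the first embodiment of the present invention
[FIG. 2] A diagram showing an example configuration of the image input unit
[FIG. 3] A diagram explaining the relationship among the display device, the housing of the image input unit, and the target object
[FIG. 4] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[FIG. 5] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[FIG. 6] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[FIG. 7] A diagram showing an example of an input image representing the cursor control gesture
[FIG. 8] A diagram showing an example screen display
[FIG. 9] A diagram showing an example screen display
[FIG. 10] A diagram showing an example screen display
[FIG. 11] A diagram showing an example of an input image representing the selection gesture
[FIG. 12] A diagram showing an example screen display
[FIG. 13] A diagram showing an example screen display
[FIG. 14] A diagram showing an example of an input image representing the double-click gesture
[FIG. 15] A diagram showing an example screen display
[FIG. 16] A diagram showing an example screen display
[FIG. 17] A diagram showing an example screen display
[FIG. 18] A diagram showing an example screen display
[FIG. 19] A diagram showing an example screen display
[FIG. 20] A diagram showing an example screen display
[FIG. 21] A diagram explaining processing according to a designated recognition engine
[FIG. 22] A diagram showing an example of the description of recognition engines for each object
[FIG. 23] A diagram showing an example configuration of the interface device according to the second embodiment of the present invention
[FIG. 24] A diagram showing an example of the contents of the active list storage unit when a vertical slider bar is selected
[FIG. 25] A diagram showing another example configuration of the interface device according to the second embodiment of the present invention
[FIG. 26] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[FIG. 27] A diagram showing yet another example configuration of the interface device according to the second embodiment of the present invention
[FIG. 28] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[FIG. 29] A flowchart showing an example of the operating procedure of the user interface device of the same embodiment
[Explanation of Reference Numerals]
10: image input unit
11: image storage unit
12: shape interpretation unit
13: interpretation rule storage unit
14: presentation unit
15: cursor switching unit
101: light emitting unit
102: reflected light extraction unit
103: timing control unit
20: display device
8: housing
30: recognition engine control unit
31, 34: active list storage units
32: main body control unit
33: recognition engine storage unit
121: nearest-point vertical engine
122: nearest-point horizontal engine
142: cursor engine
143: selection engine
144: double-click engine
151 to 15N: recognition engines
[0001]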
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a user interface device and an input method for performing input by image processing.
[0002]
[Prior art]
Mice are overwhelmingly used as input devices for computers. However, what a mouse can operate is limited to moving the cursor, selecting menus, and the like; it serves merely as a two-dimensional pointing device. Because a mouse handles only two-dimensional information, it is difficult to select an object that has depth, such as an object in a three-dimensional space. In addition, when creating an animation, it is difficult to give a character natural movement with an input device such as a mouse.
[0003]
To compensate for the difficulty of pointing in three-dimensional space, devices have been developed that input six-axis information when the user pushes or rotates a ball in a desired direction, as well as devices worn on the hand or body, such as so-called data gloves, data suits, and cyber gloves. However, these devices are not as widely used as initially expected, owing to poor operability and other factors.
[0004]
On the other hand, direct-instruction input devices have recently been developed with which the user can input intended information by hand or body gestures without handling any special device.
[0005]
For example, some devices irradiate light, receive the light reflected by the user's hand, form an image from it, perform feature extraction and shape recognition, and then execute control according to the shape of the hand, move the cursor by an amount corresponding to the hand's movement, or change the viewpoint in a three-dimensional model.
[0006]
Alternatively, there is a method in which the same processing as described above is performed by capturing a video of a user's hand motion and analyzing the video image.
[0007]
With such a device, the user can easily perform input by gesture without wearing a special device.
[0008]
[Problems to be solved by the invention]
However, in this type of device, each mode, such as the cursor movement mode, the selection mode, and the double-click mode, is used in a fixed manner, and changing the mode requires an explicit operation for that purpose. This places an operational burden on the user.
[0009]
The present invention has been made in view of the above circumstances, and its object is to provide, for a user interface device that performs input by image processing, a user interface device and an instruction input method that are easier to use and reduce the burden on the user.
[0010]
[Means for Solving the Problems]
The present invention is a user interface device that performs input by image processing of an input image, characterized by comprising means for switching between a pointing mode and other modes based on the image processing result of the input image.
[0011]
Further, the present invention is a user interface device that performs input by image processing of an input image, characterized by comprising means for switching among at least a cursor movement mode, a selection mode, and a double-click mode based on the image processing result of the input image.
[0012]
Preferably, the device may further comprise means for designating, for each object selectable in the selection mode, a recognition method (recognition engine) that constrains the image processing content; when the input image is processed for a selected object, the image processing is then performed according to the recognition method designated for that object.
[0013]
Preferably, the device may further comprise means for designating, for each object selectable in the selection mode, a recognition method (recognition engine) that constrains the image processing content, and means for presenting, near the displayed object indicated by the cursor, information indicating the recognition method designated for that object.
[0014]
Preferably, the apparatus may further include means for presenting an image processing result of the input image in a predetermined shape on a cursor.
[0015]
The present invention is also a user interface device consisting of a first device that inputs a reflected-light image and a second device that performs input by image processing of the input image, wherein the second device comprises means for designating to the first device a recognition method (recognition engine) that constrains the image processing content of the input image, and the first device comprises means for performing predetermined image processing based on the designated recognition method and means for returning the input image and the image processing result to the second device.
[0016]
Preferably, the first device may further comprise means for requesting the second device to transfer the information required for image processing suited to a required recognition method (the recognition engine) when the first device does not hold image processing means (a recognition engine) suited to that method, and the second device may further comprise means for transferring the requested information to the first device.
[0017]
Preferably, each of the first device and the second device may further comprise means for notifying the other device, when information required for image processing suited to a predetermined recognition method becomes active in its own device first, to deactivate that information, and means for deactivating that information when notified by the other device to deactivate the information required for image processing suited to the predetermined recognition method.
[0018]
The present invention is also directed to an instruction input method using a user interface device that performs input by image processing of an input image, wherein an input image of a target object is subjected to image processing, and a pointing mode and other modes are switched based on the result of the image processing.
[0019]
The present invention also provides an instruction input method using a user interface device consisting of a first device that inputs a reflected-light image and a second device that performs input by image processing of the input image, wherein the second device designates to the first device a recognition method (recognition engine) that constrains the image processing content of the input image, and the first device performs predetermined image processing based on the designated recognition method and returns the input image and the image processing result to the second device.
[0020]
According to the present invention, an explicit operation by the user for changing each mode such as a cursor movement mode, a selection mode, and a double-click mode becomes unnecessary.
[0021]
In addition, since the point designated by the user is read by the recognition process and reflected on the movement of the cursor on the screen, the calibration by the user's operation is unnecessary.
[0022]
Furthermore, if an image processing means (recognition engine) suitable for a necessary recognition method is used, improvement in input accuracy and operability for the user can be expected.
[0023]
Further, if the input image is made translucent and displayed over the cursor, the operation status can be fed back to the user.
[0024]
As described above, according to the present invention, it is possible to provide an easier-to-use user interface device in which the operational burden on the user is reduced.
[0025]
Further, according to the present invention, since a certain degree of recognition processing is performed on the first device side (device side), the load can be distributed, and the overall speed of the recognition processing can be improved.
[0026]
Further, it is possible to enhance the functionality of a device having an image input function.
[0027]
Note that the inventions relating to each of the devices described above also hold as inventions relating to methods.
[0028]
Further, the above-described invention is also realized as a machine-readable medium storing a program for causing a computer to execute a corresponding procedure or means.
[0029]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the invention will be described with reference to the drawings.
[0030]
First, a first embodiment will be described.
[0031]
FIG. 1 is a diagram illustrating a configuration example of a user interface device according to an embodiment of the present invention.
[0032]
The present user interface device is suitable for application to, for example, a computer having a graphical user interface. That is, in such a system a cursor and icons such as slider bars, scroll bars, pull-down menus, boxes, links, and applications are displayed on a screen, and the user inputs instructions such as moving the cursor, selecting an icon, or starting an application through an input device; here the input device receives input by image processing of an object such as the user's hand, without requiring a dedicated device such as a mouse.
[0033]
In outline, the present embodiment captures as an image the light reflected by a target object such as the user's hand (or captures as an image the background light reflected by the target object), detects the object's shape, motion, distance information, and the like, and performs predetermined control according to the shape and so on (for example, control of an input/output device or activation of application software). This provides a function with which the user can make intended inputs by hand motions and the like. In addition, since modes such as cursor movement, icon selection, and application startup are switched according to the image processing result, the user does not need to perform an explicit operation to switch modes.
[0034]
The user interface device includes an image input unit 10, an image storage unit 11, a shape interpretation unit 12, an interpretation rule storage unit 13, a presentation unit 14, and a cursor switching unit 15.
[0035]
FIG. 2 shows a configuration example of the image input unit 10.
[0036]
The image input unit 10 has, for example, a light emitting unit 101 that irradiates the target object with light such as near-infrared light from a light emitting element such as an LED, a reflected light extraction unit 102 that receives light reflected from the target object with light receiving elements arranged in a two-dimensional array, and a timing control unit 103 that controls the operation timing of the light emitting unit 101 and the reflected light extraction unit 102. The background is corrected by taking the difference between the amount of light received by the reflected light extraction unit 102 while the light emitting unit 101 is emitting light and the amount received while it is not, so that only the component of the light from the light emitting unit 101 that is reflected by the target object is extracted. Note that the image input unit 10 may instead have only a light receiving unit, such as a CCD camera, without a light emitting unit.
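The background correction described above amounts to a per-pixel difference between a frame captured with the light emitting unit 101 on and one captured with it off. The sketch below illustrates that idea under stated assumptions: frames are lists of rows of integer gray levels, and negative differences (for example, from sensor noise) are clamped to zero. The function name `extract_reflected_component` is hypothetical; the patent does not specify how the subtraction is implemented.

```python
from typing import List

Frame = List[List[int]]


def extract_reflected_component(lit: Frame, unlit: Frame) -> Frame:
    """Subtract the ambient-only frame from the illuminated frame, pixel by pixel.

    Ambient light appears in both frames and cancels out; what remains is the
    emitter's light reflected by the target object. Negative values are clamped to 0.
    """
    if len(lit) != len(unlit) or any(len(a) != len(b) for a, b in zip(lit, unlit)):
        raise ValueError("frames must have the same shape")
    return [
        [max(0, a - b) for a, b in zip(row_lit, row_unlit)]
        for row_lit, row_unlit in zip(lit, unlit)
    ]
```

For instance, a pixel reading 120 with the emitter on and 20 with it off keeps 100 units of reflected light, while a background pixel reading 35 on and 40 off becomes 0.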
[0037]
FIG. 3 shows the relationship among the display device 20, the housing 8 of the image input unit 10, and the target object 22. For example, when the user brings a hand 22 in front of the image input unit 10, a reflected-light image of the hand is obtained. Each pixel value of this image is affected by the properties of the object (whether it reflects light specularly, scatters it, absorbs it, etc.), the orientation of the object surface, the distance to the object, and so on. When the whole object scatters light uniformly, however, the amount of reflected light is closely related to the distance to the object. Since a hand has this property, the reflected-light image of an outstretched hand reflects the distance of the hand, its inclination (different parts are at different distances), and so on. Extracting this information therefore makes it possible to input and generate a variety of information.
[0038]
The image storage unit 11 sequentially stores the two-dimensional images of the detection target object output by the image input unit 10 at predetermined time intervals (e.g., every 1/30 second, 1/60 second, or 1/100 second).
[0039]
The shape interpretation unit 12 sequentially takes in the two-dimensional images stored in the image storage unit 11 as an N × N (for example, 64 × 64) dot matrix. Each pixel has a gradation (for example, 8 bits = 256 gradations).
[0040]
Further, the shape interpretation unit 12 extracts predetermined feature amounts from the dot matrix and interprets the shape based on the interpretation rules stored in the interpretation rule storage unit 13. It then outputs the instruction corresponding to the matching interpretation rule as the interpretation result. If no interpretation rule matches, the way feature amounts are extracted from the dot matrix may be changed as necessary (for example, if the dot matrix is thresholded, the threshold is changed) and the matching process performed again. If no interpretation rule finally matches, it is assumed that there was no input.
[0041]
The interpretation rule storage unit 13 stores interpretation rules for shape interpretation. For example, predetermined feature amounts of a target object such as the user's hand in the dot matrix, such as its shape, area, topmost point, and center of gravity, are stored as interpretation rules together with the corresponding instructions. Instructions include icon selection, application activation, cursor movement, and the like. For cursor movement, the amount of cursor movement corresponding to the direction and distance of the hand's movement is also specified. Conceivable rules include, for example: the thumb and index finger raised and spread apart correspond to cursor movement (in this case, for example, the distance and direction moved by the tip of the index finger correspond to the distance and direction of cursor movement); the thumb and index finger raised and closed together correspond to selecting the icon at the cursor position; and the thumb and index finger raised with the palm turned over relative to the cursor-movement posture correspond to launching the application associated with the icon at the cursor position.
[0042]
Representative examples of feature extraction from the dot matrix in the shape interpretation by the shape interpretation unit 12 are the extraction of distance information and region extraction. If an object has a uniform, homogeneous scattering surface, its reflected-light image can be regarded as a distance image, so the three-dimensional shape of the object as seen from the light receiving unit can be extracted. If the object is a hand, the inclination of the palm can be detected, because the tilt appears as local differences in distance. If pixel values change when the hand moves, the change can be regarded as a change in distance. Further, since almost no light is reflected from distant objects such as the background, the shape of the object can easily be cut out by extracting the region at or above a certain threshold from the reflected-light image. For example, if the object is a hand, cutting out its silhouette is extremely easy. Even when a distance image is used, it is common to first extract the region with a threshold and then use the distance information within that region.
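A minimal sketch of the threshold-based region extraction above, plus a few of the feature amounts the text mentions (area, center of gravity, topmost point, and a nearest point). It assumes the reflected-light image is a list of rows of gray levels with row 0 at the top, and it treats the brightest pixel in the region as the point nearest the light receiving unit, which follows from regarding the reflected-light image as a distance image. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[List[int]]
Point = Tuple[int, int]  # (x, y): x is the column, y is the row (row 0 = top of image)


def extract_region(frame: Frame, threshold: int) -> List[Point]:
    """Pixels at or above the threshold; distant background reflects almost no light."""
    return [(x, y) for y, row in enumerate(frame) for x, v in enumerate(row) if v >= threshold]


def region_features(frame: Frame, threshold: int) -> Optional[Dict[str, object]]:
    region = extract_region(frame, threshold)
    if not region:
        return None                                        # nothing in front of the sensor
    area = len(region)
    centroid = (sum(x for x, _ in region) / area, sum(y for _, y in region) / area)
    topmost = min(region, key=lambda p: (p[1], p[0]))      # smallest row index = highest point
    nearest = max(region, key=lambda p: frame[p[1]][p[0]])  # brightest = closest (assumption)
    return {"area": area, "centroid": centroid, "topmost": topmost, "nearest": nearest}
```

Thresholding first and then reading values only inside the region follows the order the paragraph describes for distance images.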
[0043]
There are various methods for matching the feature amounts extracted from the dot matrix against the interpretation rules, for example vectorization, which extracts vectors from the image; extraction of the deformation state of a shape based on a shape model; and spectrum analysis based on distance values along a scanning line.
[0044]
If there is no suitable shape, the matching process may be performed again, for example, by changing the threshold value. If there is no final conforming shape, it is assumed that there was no input.
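The retry behavior described here and in [0040] can be sketched as a loop over candidate thresholds: extract features at each threshold, try the rules in order, and report no input if nothing matches. For brevity this hypothetical sketch uses region area as its only feature and represents each interpretation rule as an (instruction, predicate) pair; the actual rules would use shape, topmost point, center of gravity, and so on, and the threshold sequence is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]
# Hypothetical rule form: (instruction, predicate on the region's area in pixels).
Rule = Tuple[str, Callable[[int], bool]]


def region_area(frame: Frame, threshold: int) -> int:
    """Number of pixels whose reflected-light level is at or above the threshold."""
    return sum(1 for row in frame for v in row if v >= threshold)


def interpret(frame: Frame, rules: Sequence[Rule], thresholds: Sequence[int]) -> Optional[str]:
    """Return the instruction of the first matching rule, retrying with each threshold.

    Returns None when no rule matches at any threshold, i.e. "no input".
    """
    for threshold in thresholds:          # e.g. strictest threshold first
        area = region_area(frame, threshold)
        for instruction, matches in rules:
            if matches(area):
                return instruction
    return None
```

Returning `None` rather than raising keeps "no input" an ordinary outcome, which matches the text's treatment of an unmatched shape.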
[0045]
When the shape interpreting unit 12 recognizes that the instruction is to activate an application or a function of an OS, the software is activated.
[0046]
The presentation unit 14 performs presentation on the display device reflecting the interpretation result by the shape interpretation unit 12. For example, the cursor is moved, and a message is presented as necessary.
[0047]
The cursor switching unit 15 controls cursor switching based on the result of interpretation by the shape interpretation unit 12.
[0048]
FIGS. 4 to 6 show an example of the operating procedure of the user interface device according to the present embodiment.
[0049]
First, the control state C of the cursor is initialized (C ← 0), the selection state S is initialized (S ← 0), the cursor information I is initialized (I ← 0), and the recognition engine flag R is initialized. (R ← 0) is performed (step S1).
[0050]
Next, the reflection image is read and written into the image storage unit 11 (step S2).
[0051]
Next, a dot matrix is read into the shape interpretation unit 12 (step S3).
[0052]
Next, the shape interpretation unit 12 determines the mode indicated by the gesture from the feature amount extracted from the dot matrix and the interpretation rules (step S4).
[0053]
Thereafter, processing branches according to the determination result and the parameter values.
[0054]
If the gesture is the cursor-control gesture and C = 0, S = 0, and R = 0 (step S5), cursor control begins: C is set to 1 (step S11), and the process returns to step S2.
[0055]
The gesture of the cursor control is, for example, a case where the hand shape illustrated in FIG. 7 is recognized.
[0056]
If the gesture is the cursor-control gesture and C = 1, S = 0, and R = 0 (step S6), the cursor is being moved. In this case, the coordinates (x, y) of the nearest point are first calculated from the dot matrix (step S12), and the cursor is moved to the calculated coordinates (step S13). The calculated coordinates are then held as Cp = (x, y) (step S14). If there is an object at Cp (step S15), the state of that object is set in the cursor information (I ← object state) (step S16); if there is no object (step S15), I ← 0 is set (step S17). The process then returns to step S2.
[0057]
If the gesture is the cursor-control gesture and C = 1 and S = 1 (step S7), the process returns to cursor control: S ← 0, R ← 0, and I ← 0 are set (step S18), and the process returns to step S2.
[0058]
If the gesture is the selection gesture and C = 1, S = 0, and R = 0 (step S8), an object is selected. In this case, S ← 1 is first set (step S19), the object closest to Cp is searched for (step S20), and that object is selected (step S21). If a recognition engine is specified for the selected object (step S22), R ← 1 is set (step S23), and the process returns to step S2. If no recognition engine is specified and the object is a link object (step S24), the display jumps to the link destination (step S25), C ← 0, S ← 0, and I ← 0 are set (step S26), and the process returns to step S2. If the object is not a link object (step S24), the process returns to step S2.
[0059]
A selection gesture is recognized, for example, when the hand shape shown in FIG. 11 is detected.
[0060]
Here, a recognition engine extracts a predetermined feature amount from the dot matrix, as described in detail later. Various recognition engines are possible; for example, a highest point vertical engine extracts the vertical movement amount of the highest point of the target object shape in the dot matrix.
[0061]
If the gesture is the selection gesture and R = 1 (step S9), the selected object is being moved: recognition is performed according to the recognition engine specified for the object (step S27), and the process returns to step S2.
[0062]
If the gesture is the double-click gesture and C = 1 (step S10), a double click is performed. In this case, the object closest to Cp is opened (step S28), C ← 0, S ← 0, I ← 0, and R ← 0 are set (step S29), and the process returns to step S2.
[0063]
A double-click gesture is recognized, for example, when the hand shape shown in FIG. 14 is detected.
[0064]
Otherwise, another recognition process is performed (step S30), and the process returns to step S2.
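The branching of steps S5 to S30 can be summarised as a small state machine. The sketch below is hypothetical: it assumes the gesture has already been classified in step S4 as "cursor", "select" or "double", that the nearest point from step S12 is passed in as `point` on every frame, that an object "is at Cp" only when its position equals Cp exactly, and that Cp starts at (0, 0). Jumps, engine-driven recognition and double clicks are recorded in a log instead of acting on a real GUI.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Point = Tuple[int, int]

@dataclass
class Obj:
    name: str
    pos: Point
    engine: Optional[str] = None  # recognition engine specified for this object
    link: Optional[str] = None    # link destination, if it is a link object

@dataclass
class State:
    C: int = 0                    # cursor control state
    S: int = 0                    # selection state
    I: Union[int, str] = 0        # cursor information (object under the cursor)
    R: int = 0                    # recognition engine flag
    Cp: Point = (0, 0)            # held cursor position (assumed initial value)
    selected: Optional[Obj] = None
    log: List[str] = field(default_factory=list)

def closest(objects: List[Obj], p: Point) -> Obj:
    return min(objects, key=lambda o: (o.pos[0] - p[0]) ** 2 + (o.pos[1] - p[1]) ** 2)

def step(st: State, gesture: str, point: Point, objects: List[Obj]) -> State:
    """One pass through steps S5-S30 for a gesture classified in step S4."""
    if gesture == "cursor" and st.C == 0 and st.S == 0 and st.R == 0:    # S5
        st.C = 1                                                         # S11
    elif gesture == "cursor" and st.C == 1 and st.S == 0 and st.R == 0:  # S6
        st.Cp = point                                    # S14: hold the nearest point as Cp
        hit = [o for o in objects if o.pos == st.Cp]                     # S15
        st.I = hit[0].name if hit else 0                                 # S16 / S17
    elif gesture == "cursor" and st.C == 1 and st.S == 1:                # S7
        st.S, st.R, st.I = 0, 0, 0                                       # S18
    elif gesture == "select" and st.C == 1 and st.S == 0 and st.R == 0:  # S8
        st.S = 1                                                         # S19
        st.selected = closest(objects, st.Cp)                            # S20-S21
        if st.selected.engine is not None:                               # S22
            st.R = 1                                                     # S23
        elif st.selected.link is not None:                               # S24
            st.log.append("jump:" + st.selected.link)                    # S25
            st.C, st.S, st.I = 0, 0, 0                                   # S26
    elif gesture == "select" and st.R == 1 and st.selected is not None:  # S9
        st.log.append("engine:" + str(st.selected.engine))               # S27
    elif gesture == "double" and st.C == 1:                              # S10
        st.log.append("open:" + closest(objects, st.Cp).name)            # S28
        st.C, st.S, st.I, st.R = 0, 0, 0, 0                              # S29
    else:
        st.log.append("other")                                           # S30
    return st

objs = [Obj("slider", (10, 10), engine="nearest_point_vertical"),
        Obj("link", (50, 50), link="company profile"),
        Obj("icon", (90, 90))]
st = State()
for g, p in [("cursor", (0, 0)), ("cursor", (10, 10)), ("select", (10, 10)),
             ("select", (10, 12)), ("cursor", (10, 12)), ("cursor", (90, 90)),
             ("double", (90, 90))]:
    step(st, g, p, objs)
print(st.log)  # ['engine:nearest_point_vertical', 'open:icon']
```

The run above selects the slider (which has an engine, so R becomes 1), drives it once through its engine, returns to cursor control, and finally double-clicks the icon, which resets all four state variables.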
[0065]
Hereinafter, the present embodiment will be described using a specific example.
[0066]
First, it is assumed that the cursor movement is instructed when the shape of the hand with the thumb and forefinger extended as shown in FIG. 7 is recognized.
[0067]
For example, in the state of FIG. 8A, when the hand shape of FIG. 7 is recognized, cursor movement is accepted. When the user moves the hand while keeping the shape of FIG. 7, the cursor 201 moves around the screen accordingly. In this case, the movement amount and direction are extracted for a specific point in the dot matrix of the user's hand shape, for example, the highest point, that is, the point with the highest vertical position in the image (corresponding, for example, to the tip of the index finger), or the nearest point, that is, the point closest to the light receiving unit (for example, the point with the highest gradation value).
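Extracting the two tracking points mentioned above can be sketched as follows, assuming the dot matrix is a list of rows of gradation values with row 0 at the top, and that a higher gradation value means a point nearer the light receiving unit. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # gradation values; row 0 is the top of the image
Point = Tuple[int, int]  # (x, y)

def highest_point(image: Image, threshold: int) -> Optional[Point]:
    """Topmost object pixel (e.g. the index fingertip); None if no object."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                return (x, y)
    return None

def nearest_point(image: Image) -> Optional[Point]:
    """Pixel with the highest gradation value, taken as closest to the light receiver."""
    best: Optional[Point] = None
    best_value = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > best_value:
                best, best_value = (x, y), value
    return best

def movement(prev: Point, cur: Point) -> Tuple[int, int]:
    """Movement amount and direction of the tracked point between two frames."""
    return (cur[0] - prev[0], cur[1] - prev[1])

f1 = [[0, 0, 0], [0, 90, 0], [0, 200, 0]]
f2 = [[0, 80, 0], [0, 210, 0], [0, 0, 0]]
print(highest_point(f1, 50), nearest_point(f1))        # (1, 1) (1, 2)
print(movement(nearest_point(f1), nearest_point(f2)))  # (0, -1)
```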
[0068]
As shown in FIG. 8B, the input image 202 may be displayed translucently on the cursor 201, and the recognition status may be fed back to the user.
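Displaying the input image translucently over the cursor amounts to alpha blending. A minimal sketch, assuming both are grayscale patches of the same size and an arbitrary blend factor; the patent does not specify how the translucent display is produced.

```python
from typing import List

Image = List[List[int]]

def overlay_translucent(base: Image, overlay: Image, alpha: float = 0.5) -> Image:
    """Standard alpha blending per pixel: alpha * overlay + (1 - alpha) * base."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [[round(alpha * o + (1 - alpha) * b) for b, o in zip(brow, orow)]
            for brow, orow in zip(base, overlay)]

cursor_patch = [[0, 100], [255, 255]]
hand_image = [[200, 100], [0, 255]]
print(overlay_translucent(cursor_patch, hand_image, alpha=0.25))  # [[50, 100], [191, 255]]
```

A small alpha keeps the cursor dominant while still showing the user how the hand is being seen.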
[0069]
When the cursor is over an object such as a slider bar or a link node, the shape of the cursor changes to indicate the function of that object.
[0070]
For example, as shown in FIG. 9A, when the cursor has moved onto a slider bar, the cursor changes shape as indicated by 203 in the figure. As shown in FIG. 9B, the input image 205 may be displayed translucently on the cursor.
[0071]
Here, the arrow 204 in FIG. 9A indicates that operation of the slider bar is restricted to the vertical direction; that is, a vertical engine is specified for the slider bar. In this case, however the user moves the hand, only vertical movement is recognized.
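The effect of a vertical engine can be sketched as a projection of the measured hand movement onto the permitted axis. This is an assumed formulation: the text names vertical, horizontal and oblique engines but gives no formulas, and the slider mapping below (hand up raises the value, fixed range 0 to 100) is hypothetical.

```python
from typing import Tuple

Delta = Tuple[float, float]  # (dx, dy); screen y grows downward

def constrain(delta: Delta, direction: str) -> Delta:
    """Keep only the component of the hand movement that the engine recognizes."""
    dx, dy = delta
    if direction == "vertical":
        return (0.0, dy)
    if direction == "horizontal":
        return (dx, 0.0)
    if direction == "diagonal":
        m = (dx + dy) / 2  # projection onto the 45-degree line x = y
        return (m, m)
    raise ValueError("unknown direction: " + direction)

def move_slider(value: float, delta: Delta, lo: float = 0.0, hi: float = 100.0) -> float:
    """Vertical slider: moving the hand up (dy < 0) raises the value, clamped to [lo, hi]."""
    _, dy = constrain(delta, "vertical")
    return max(lo, min(hi, value - dy))

print(constrain((3.0, -5.0), "vertical"))  # (0.0, -5.0)
print(move_slider(50.0, (30.0, -10.0)))    # 60.0, the horizontal 30 is ignored
```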
[0072]
Likewise, when the cursor has moved onto a link node as shown in FIG. 10A, the cursor changes shape as indicated by 206 in the figure. As shown in FIG. 10B, the input image 207 may be displayed translucently on the cursor.
[0073]
Next, when the cursor has been moved to a desired position with the hand shape of FIG. 7 and the hand shape of FIG. 11 (index finger extended, thumb folded) is then recognized, selection of the object indicated by the cursor is instructed. This is equivalent to a single mouse click.
[0074]
For example, in the state shown in FIG. 12A, forming the hand shape of FIG. 11 selects the slider bar. The arrow 208 in the figure indicates that operation of the slider bar is restricted to the vertical direction. As shown in FIG. 12B, the input image 209 may be displayed translucently on the cursor.
[0075]
For example, in the state of FIG. 10A, if the hand is shaped as shown in FIG. 11, the link node “company profile” is selected, and the display content changes as shown in FIG. 13A. As shown in FIG. 13B, the input image 210 may be displayed translucently on the cursor.
[0076]
Next, when the cursor has been moved to a desired position with the hand shape of FIG. 7 and the state shown in FIG. 14, in which the wrist is rotated by 180 degrees without changing the hand shape itself, is then recognized, a double click is instructed for the object indicated by the cursor.
[0077]
For example, in FIG. 15A, after the cursor is moved onto the “Internet” icon with the hand shape of FIG. 7, the hand is rotated as shown in FIG. 14, and a double click on “Internet” is accepted. FIG. 15B shows the state immediately after the icon is selected and just before it is opened. As shown in FIG. 16, the input image 212 may be displayed translucently on the cursor.
[0078]
As another example, the cursor is first moved to "file" as shown in FIG. 17A. At this time, the input image 213 may be displayed translucently on the cursor, as shown in FIG. 17B.
[0079]
Next, when "File" is selected, a pull-down menu is displayed as shown in FIG. Here, the arrow indicated by 214 in the figure indicates that the operation of the pull-down menu to be operated is restricted in the vertical direction.
[0080]
Next, as shown in FIG. 18A, the cursor is positioned on “save” and the hand is shaped as shown in FIG. 14, so that a double-click of “save” is accepted. Here, as shown in FIG. 18B, the input image 213 may be displayed translucently on the cursor.
[0081]
When "save" is double-clicked, the cursor changes shape, for example to 216 in FIG. 19A, indicating that the document is being saved. As shown in FIG. 19B, the input image 217 may be displayed translucently on the cursor.
[0082]
As a further example, as shown in FIG. 20A, the cursor is moved to "file", "file" is selected, the cursor is moved further to "print", and "print" is selected; a menu corresponding to "print" is then displayed. At this time, as shown in FIG. 20B, the input image 219 may be displayed translucently on the cursor.
[0083]
Next, the recognition engine will be described.
[0084]
FIG. 21 shows a flowchart of processing relating to the recognition engine.
[0085]
In the present embodiment, a recognition engine is specified for each object as needed.
[0086]
A recognition engine extracts a predetermined feature amount from the dot matrix. That is, if a recognition engine is specified for the selected object (step S31), the shape interpretation unit 12 extracts the feature amount from the dot matrix according to that engine; if no recognition engine is specified (step S31), normal recognition is executed (step S32).
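The branch of FIG. 21 amounts to looking up the engine specified for the selected object and falling back to normal recognition. A minimal sketch, assuming engines compare the nearest point (brightest pixel) of two successive frames; the per-object table and all names are hypothetical and do not reproduce the actual description format of FIG. 22.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Point = Tuple[int, int]
Engine = Callable[[Image, Image], Tuple[int, int]]

def nearest(img: Image) -> Point:
    """Brightest pixel, taken as the nearest point (assumes a non-empty image)."""
    return max(((x, y) for y, row in enumerate(img) for x in range(len(row))),
               key=lambda p: img[p[1]][p[0]])

def normal_recognition(prev: Image, cur: Image) -> Tuple[int, int]:
    (x0, y0), (x1, y1) = nearest(prev), nearest(cur)
    return (x1 - x0, y1 - y0)

def nearest_point_vertical(prev: Image, cur: Image) -> Tuple[int, int]:
    return (0, normal_recognition(prev, cur)[1])

def nearest_point_horizontal(prev: Image, cur: Image) -> Tuple[int, int]:
    return (normal_recognition(prev, cur)[0], 0)

ENGINES: Dict[str, Engine] = {
    "nearest_point_vertical": nearest_point_vertical,
    "nearest_point_horizontal": nearest_point_horizontal,
}
# Hypothetical per-object table in the spirit of FIG. 22.
OBJECT_ENGINES: Dict[str, str] = {"slider_bar": "nearest_point_vertical",
                                  "pulldown_menu": "nearest_point_vertical"}

def recognize(selected: str, prev: Image, cur: Image) -> Tuple[int, int]:
    engine_name = OBJECT_ENGINES.get(selected)
    if engine_name is None:                   # step S31: no engine specified
        return normal_recognition(prev, cur)  # step S32
    return ENGINES[engine_name](prev, cur)    # extraction by the specified engine

prev = [[0, 0, 0], [0, 0, 0], [0, 0, 200]]
cur = [[0, 200, 0], [0, 0, 0], [0, 0, 0]]
print(recognize("slider_bar", prev, cur))  # (0, -2)
print(recognize("icon", prev, cur))        # (-1, -2)
```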
[0087]
Recognition engines include, for example, the following:
a nearest point vertical engine 121, which extracts the vertical movement amount of the nearest point of the target object shape in the dot matrix;
a nearest point horizontal engine 122, which extracts the horizontal movement amount of the nearest point;
a nearest point diagonal engine 123, which extracts the oblique movement amount of the nearest point;
a centroid vertical engine 124, which extracts the vertical movement amount of the center of gravity;
a centroid horizontal engine 125, which extracts the horizontal movement amount of the center of gravity;
a centroid diagonal engine 126, which extracts the oblique movement amount of the center of gravity;
a center-of-gravity point vertical engine 127, a center-of-gravity point horizontal engine 128, and a center-of-gravity point oblique direction engine 129, which extract the vertical, horizontal, and oblique movement amounts of the center-of-gravity point;
an extraction engine 130;
an area calculation engine 131, which calculates the area of the target object shape in the dot matrix;
a nearest point x-axis rotation angle engine 132, a nearest point y-axis rotation angle engine 133, and a nearest point z-axis rotation angle engine 134, which extract the x-, y-, and z-axis rotation angles of the nearest point of the target object shape in the dot matrix;
center-of-gravity point x-axis, y-axis, and z-axis rotation angle engines 135, 136, and 137, which extract the corresponding rotation angles of the center-of-gravity point;
centroid point x-axis, y-axis, and z-axis rotation angle engines 138, 139, and 140, which extract the corresponding rotation angles of the centroid point; and
a recognition engine 141 that combines predetermined engines with weighting.
Various other engines are also conceivable.
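Several of the listed engines reduce to simple measurements on the extracted shape. The sketch below shows an area engine, a center-of-gravity computation, and one possible reading of the weighted combination engine 141 as a weighted average of other engines' outputs; the averaging rule is an assumption, since the text does not define how the weighting is done.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[bool]]
Vec = Tuple[float, float]

def area(mask: Mask) -> int:
    """Area engine: number of pixels in the target object shape."""
    return sum(sum(row) for row in mask)

def centroid(mask: Mask) -> Optional[Vec]:
    """Center of gravity of the target object shape, or None if it is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, inside in enumerate(row) if inside]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def weighted_combination(outputs: Sequence[Vec], weights: Sequence[float]) -> Vec:
    """Combine movement amounts from several engines by a weighted average."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return (sum(w * o[0] for o, w in zip(outputs, weights)) / total,
            sum(w * o[1] for o, w in zip(outputs, weights)) / total)

mask = [[True, True], [False, False]]
print(area(mask), centroid(mask))                         # 2 (0.5, 0.0)
print(weighted_combination([(0, -2), (-1, -2)], [3, 1]))  # (-0.25, -2.0)
```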
[0088]
FIG. 22 shows an example of a description of a recognition engine for each object.
[0089]
As described above, according to the present embodiment, an explicit operation by the user for changing each mode such as the cursor movement mode, the selection mode, and the double-click mode becomes unnecessary.
[0090]
In addition, since the point designated by the user is obtained by the recognition process and reflected directly in the cursor movement on the screen, no calibration operation by the user is required.
[0091]
Furthermore, if a recognition engine is used, an improvement in input accuracy and an improvement in user operability can be expected.
[0092]
Further, if the input image is made translucent and displayed over the cursor, the operation status can be fed back to the user.
[0093]
As described above, according to the present embodiment, it is possible to provide an easier-to-use user interface device in which the operational burden on the user is reduced.
[0094]
Next, a second embodiment will be described.
[0095]
The second embodiment is basically the same as the first embodiment, except that part of the recognition processing is performed in the housing on the image input unit side (hereinafter referred to as the device side). The image input unit 10 transfers the dot matrix of the input image and a predetermined recognition result to the main body. The recognition processing performed on the device side is preferably lightweight.
[0096]
FIG. 23 shows a configuration example of the interface device according to the present embodiment.
[0097]
In FIG. 23, the main body control unit 32, the presentation unit 14, and the cursor switching unit 15 are arranged in the main body, while the image input unit 10, the image storage unit 11, the recognition engine control unit 30, the active list storage unit 31, and the predetermined recognition engines 121, 122, 142, 143, and 144 are arranged on the device side.
[0098]
The main body control unit 32 corresponds, for example, to the shape interpretation unit 12 and the interpretation rule storage unit 13 (including the recognition engines) of FIG. 1, but is not limited thereto; any unit that performs recognition processing using recognition engines may be used.
[0099]
FIG. 24 shows an example of the description of the active list storage unit when the vertical slider bar is selected. In this case, it is indicated that the cursor engine 142 and the closest point vertical direction engine 121 are designated.
[0100]
In this configuration, a list of recognition engines to be activated, or a list of recognition engines to be deactivated, is transmitted from the main body to the device side. The device side stores this list in the active list storage unit 31, the recognition engine control unit 30 extracts a predetermined feature amount from the input image as the recognition result using the specified recognition engines, and the input image and the recognition result are returned to the main body.
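The division of labour in this configuration can be sketched as follows. This is a hypothetical model: engines are plain Python functions of the previous and current frames, the active list is a set of engine names, and transport between the main body and the device side is reduced to method calls.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Image = List[List[int]]
Point = Tuple[int, int]
Engine = Callable[[Optional[Image], Image], object]

def nearest(img: Image) -> Point:
    """Brightest pixel, taken as the nearest point (assumes a non-empty image)."""
    return max(((x, y) for y, row in enumerate(img) for x in range(len(row))),
               key=lambda p: img[p[1]][p[0]])

def cursor_engine(prev: Optional[Image], cur: Image) -> Point:
    return nearest(cur)

def nearest_point_vertical(prev: Optional[Image], cur: Image) -> int:
    return 0 if prev is None else nearest(cur)[1] - nearest(prev)[1]

class DeviceSide:
    """Image-input side: runs only the engines named in its active list."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self.engines = engines
        self.active_list: Set[str] = set()  # role of active list storage unit 31
        self.prev: Optional[Image] = None

    def receive_active_list(self, names: Iterable[str]) -> None:
        self.active_list = set(names)       # list sent from the main body

    def process(self, image: Image) -> Tuple[Image, Dict[str, object]]:
        results = {name: self.engines[name](self.prev, image)
                   for name in sorted(self.active_list) if name in self.engines}
        self.prev = image
        return image, results               # both are returned to the main body

device = DeviceSide({"cursor": cursor_engine,
                     "nearest_point_vertical": nearest_point_vertical,
                     "selection": lambda prev, cur: False})
device.receive_active_list(["cursor", "nearest_point_vertical"])  # cf. FIG. 24
device.process([[0, 0], [0, 9]])
_, result = device.process([[0, 9], [0, 0]])
print(result)  # {'cursor': (1, 0), 'nearest_point_vertical': -1}
```

Only the engines named in the active list run on the device, so the selection engine, although installed, contributes nothing to the returned result.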
[0101]
According to the present embodiment, since a certain degree of recognition processing is performed on the device side, the load can be distributed, and the overall speed of the recognition processing can be improved.
[0102]
Further, it is possible to enhance the functionality of a device having an image input function.
[0103]
FIG. 27 shows still another configuration example of the interface device according to the present embodiment.
[0104]
In FIG. 27, the configuration in the main body is almost the same as that in FIG. 25. In this configuration example, however, when the same recognition engine exists in more than one place, such as in the main body and on the device side, the side on which the engine is activated first notifies the other side holding the same engine so that the other side deactivates it.
[0105]
FIG. 28 shows an operation procedure example of the active notification in this case, and FIG. 29 shows an operation procedure example of the active notification reception.
[0106]
First, an image is input on the device side (step S39), and recognition is performed on the main body side and/or the device side (step S40). The side that performed the recognition then sends the recognition result, the image matrix, and the active list (or inactive list) to the notification partner (step S41).
[0107]
Next, the side receiving the notification receives the recognition result, the image matrix, and the active list (or inactive list) (step S42), and, for each recognition engine that is ON in the received active list (or OFF in the received inactive list), rewrites that engine to OFF in its own stored active list (step S43). Other processing is then executed as needed (step S44).
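The active notification of FIGS. 28 and 29 can be sketched as below, assuming each side keeps its active list as a name-to-flag dictionary and that sending a notification is a direct method call; the recognition result and image matrix are carried along but not otherwise used. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

class Side:
    """Main body or device side holding copies of the same recognition engines."""

    def __init__(self, name: str, engines: Iterable[str]) -> None:
        self.name = name
        self.active: Dict[str, bool] = {e: False for e in engines}  # stored active list
        self.peer: Optional["Side"] = None
        self.last_received: Optional[dict] = None

    def active_list(self) -> List[str]:
        return [e for e, on in self.active.items() if on]

    def activate(self, engine: str, result=None, image=None) -> None:
        self.active[engine] = True
        if self.peer is not None:                      # step S41: notify the partner
            self.peer.receive(result, image, self.active_list())

    def receive(self, result, image, active_list: List[str]) -> None:
        self.last_received = {"result": result, "image": image}  # step S42
        for engine in active_list:                                # step S43
            if engine in self.active:
                self.active[engine] = False

body = Side("main body", ["cursor", "nearest_point_vertical"])
device = Side("device", ["cursor", "nearest_point_vertical"])
body.peer, device.peer = device, body
device.activate("nearest_point_vertical")
body.activate("cursor")
print(body.active_list(), device.active_list())  # ['cursor'] ['nearest_point_vertical']
```

Each engine ends up active on exactly one side, which is the point of the notification: the same recognition work is not duplicated in the main body and on the device side.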
[0108]
Note that each of the above functions can be implemented as software. Further, the present invention can be embodied as a machine-readable medium storing a program for causing a computer to execute the above-described procedures or means.
[0109]
The present invention is not limited to the above-described embodiment, and can be implemented with various modifications within the technical scope thereof.
[0110]
[Effect of the Invention]
According to the present invention, since each mode such as a cursor movement mode, a selection mode, and a double-click mode is switched based on the image processing result, an explicit operation by the user to change the mode is not required.
[0111]
In addition, since the point designated by the user is obtained by the recognition process and reflected directly in the cursor movement on the screen, no calibration operation by the user is required.
[0112]
As described above, according to the present invention, it is possible to provide an easier-to-use user interface device in which the operational burden on the user is reduced.
[0113]
Further, according to the present invention, since a certain degree of recognition processing is performed on the first device side (device side), the load can be distributed, and the overall speed of the recognition processing can be improved.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of an interface device according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating a configuration example of the image input unit.
FIG. 3 is a diagram illustrating a relationship between a display device, a housing of an image input unit, and a target object.
FIG. 4 is an exemplary flowchart illustrating an example of an operation procedure of the user interface device according to the embodiment.
FIG. 5 is an exemplary flowchart illustrating an example of an operation procedure of the user interface device of the embodiment.
FIG. 6 is an exemplary flowchart illustrating an example of an operation procedure of the user interface device of the embodiment.
FIG. 7 is a diagram showing an example of an input image showing a cursor-control gesture.
FIG. 8 is a diagram showing a screen display example.
FIG. 9 is a diagram showing a screen display example.
FIG. 10 is a diagram showing a screen display example.
FIG. 11 is a diagram showing an example of an input image showing a selection gesture.
FIG. 12 is a diagram showing a screen display example.
FIG. 13 is a diagram showing a screen display example.
FIG. 14 is a diagram showing an example of an input image showing a double-click gesture.
FIG. 15 is a diagram showing a screen display example.
FIG. 16 is a diagram showing a screen display example.
FIG. 17 is a diagram showing a screen display example.
FIG. 18 is a diagram showing a screen display example.
FIG. 19 is a diagram showing a screen display example.
FIG. 20 is a diagram showing a screen display example.
FIG. 21 is a diagram for explaining processing according to a specified recognition engine.
FIG. 22 is a diagram illustrating an example of a description of a recognition engine for each object.
FIG. 23 is a diagram illustrating a configuration example of an interface device according to a second embodiment of the present invention.
FIG. 24 is a diagram showing an example of a description in an active list storage unit when a vertical slider bar is selected.
FIG. 25 is a diagram showing another configuration example of the interface device according to the second embodiment of the present invention.
FIG. 26 is an exemplary flowchart illustrating an example of the operation procedure of the user interface device of the embodiment.
FIG. 27 is a diagram showing still another configuration example of the interface device according to the second embodiment of the present invention.
FIG. 28 is an exemplary flowchart illustrating an example of the operation procedure of the user interface device of the embodiment.
FIG. 29 is an exemplary flowchart illustrating an example of the operation procedure of the user interface device of the embodiment.
[Explanation of symbols]
10: Image input unit
11: Image storage unit
12: Shape interpretation unit
13: Interpretation rule storage unit
14: Presentation unit
15: Cursor switching unit
101: Light emitting unit
102: Reflected light extraction unit
103: Timing control unit
20: Display device
8: Housing
30: Recognition engine control unit
31, 34: Active list storage unit
32: Main body control unit
33: Recognition engine storage unit
121: Nearest point vertical engine
122: Nearest point horizontal engine
142: Cursor engine
143: Selection engine
144: Double-click engine
151 to 15N: Recognition engines

Claims

A user interface device including a first device that inputs a reflection image as an input image and a second device that performs input for operating an operation target on a display screen based on a recognition processing result of the input image,
The second device comprises:
Means for designating, for the first device, a predetermined recognition method for the operation target;
The first device comprises:
Means for performing a recognition process corresponding to the recognition method specified by the second device on the input image, and returning a recognition result obtained as a result and the input image to the second device;
A user interface device comprising:

A user interface device including a first device that inputs a reflection image as an input image and a second device that performs input for operating a plurality of operation targets on a display screen based on a recognition processing result of the input image, wherein
The second device comprises:
Means for selecting one of a plurality of operation targets displayed on the display screen based on the input image;
Means for specifying, to the first device, a predetermined recognition method for the selected operation target;
The first device comprises:
Means for performing, on an input image input after the operation target is selected, a recognition process corresponding to the recognition method specified by the second device, and returning the resulting recognition result and the input image to the second device;
A user interface device comprising:

The user interface device according to claim 1, wherein the first device further includes means for requesting the second device to transfer a recognition engine when the first device does not hold the recognition engine required for performing the recognition processing, and
the second device further includes means for transferring the recognition engine requested by the first device to the first device.

Each of the first device and the second device holds a plurality of recognition engines that perform different recognition processes, and the same recognition engine is held by both the first device and the second device;
Each of the first device and the second device comprises:
Means for notifying the other device to deactivate the same recognition engine when the same recognition engine is activated in one of the first device and the second device;
Means for deactivating the same recognition engine when notified from the one device to deactivate the same recognition engine;
The user interface device according to claim 1 or 2, further comprising:

An instruction input method in a user interface device including a first device that inputs a reflection image as an input image and a second device that performs input for operating an operation target on a display screen based on a recognition processing result of the input image, the method comprising:
when the second device specifies a predetermined recognition method for the operation target to the first device, the first device performing, on the input image, a recognition process corresponding to the recognition method specified by the second device, and returning the obtained recognition result and the input image to the second device.

An instruction input method in a user interface device including a first device that inputs a reflection image as an input image and a second device that performs input for operating a plurality of operation targets on a display screen based on a recognition processing result of the input image, the method comprising:
the second device selecting one of the plurality of operation targets displayed on the display screen based on the input image and specifying, to the first device, a recognition method predetermined for the selected operation target; and the first device performing, on an input image input after the operation target is selected, a recognition process corresponding to the recognition method specified by the second device, and returning the resulting recognition result and the input image to the second device.