JP2004301869A

JP2004301869A - Voice output device and pointing device

Info

Publication number: JP2004301869A
Application number: JP2003091122A
Authority: JP
Inventors: Takuya Shinkawa; 拓也新川
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To provide a device with which severely physically disabled persons who can still move their eyes can easily convey their intentions to the people around them.
SOLUTION: A voice output device is provided with an eye camera, a visual field camera, a line-of-sight analysis part, an operation analysis part, a display part, and a voice output part. The eye camera detects the line of sight of the user, the visual field camera images at least one of the user and the display part, and the line-of-sight analysis part receives the output information of the eye camera and the visual field camera and detects the area of the display part at which the user is gazing. When the operation analysis part detects the user's operation for confirming voice output, the voice output part outputs the voice corresponding to the gazed area, so that severely disabled persons can easily inform the people around them of their intentions.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、肢体不自由者などのような使用者が視線によりその意思を表現するための音声出力装置等に関する。
【０００２】
【従来の技術】
第一の従来技術は、重度の肢体不自由者がアイカメラを用いて、表示画面上のキーボードのキーを選択して、文字入力するワープロやパソコンが知られている。
【０００３】
また、第二の従来技術は、音声が役立たないような騒音下で、プラントの運転指令を行うポインティングデバイス装置および入力方法が知られている（例えば、特許文献１参照）。
【０００４】
【特許文献１】
特開平７−２５３８４３号公報（第２−３頁、第１図）
【０００５】
【発明が解決しようとする課題】
しかしながら、第一の従来技術では、使用者の意思を伝えるまで、時間のかかる入力操作が必要であった。
【０００６】
また、第二の従来技術では、使用者の意思を伝えるのに、やはり複雑な手順を必要とし、かつ、音声による伝達ができない。
【０００７】
本発明は、重度の肢体不自由者で、目を動かすことが出来る人が、気軽に使用でき、周囲の人に意思を伝え、ヒューマン・コミュニケーションがとりやすいポインティングデバイスおよび入力方法を提供することを目的とする。
【０００８】
【課題を解決するための手段】
上記の課題を解決するために、本発明のポインティングデバイスおよび入力方法および音声出力装置は、以下のような手段を採用する。
【０００９】
（１）あらかじめ設定された注視され得る１つまたは複数の領域を設け、使用者の視線が注視した領域を検出し、当該領域に関連する情報に基づいて音声出力する、ポインティングデバイスおよび入力方法および音声出力装置とする。
【００１０】
（２）アイカメラ、視野カメラ、視線解析部、操作解析部、表示部、音声出力部を備え、アイカメラは、使用者の視線を検出し、視野カメラは、使用者と表示部の少なくとも一方を撮像し、視線解析部は、アイカメラと視野カメラの出力情報を受けて、使用者が注視する表示部の注視領域を検出し、操作解析部が使用者の音声出力決定の操作を検出すると、前記注視領域に対応して音声出力するようにした、ポインティングデバイスおよび入力方法および音声出力装置である。音声出力する情報は文章でもよい。
【００１１】
（３）音声出力決定のための領域を設け、上記音声出力決定のための領域に対する注視を検出することにより音声出力したり、使用者の瞬きを検出し、音声出力決定を行ったりする。
【００１２】
（４）使用者が装着する筐体に、アイカメラと視野カメラを設置してもよく、また、使用者以外の位置に、アイカメラと視野カメラを設置してもよい。
【００１３】
（５）注視した領域の色、形態を注視決定に伴い変更するようにしてもよい。使用者は、自分の操作が意図どおりかどうか知ることができる。
【００１４】
（６）視線解析部は、アイカメラおよび視野カメラからの出力情報を受けて座標変換により視線が領域内にあるかどうかを判定して、注視状態かどうかを判定するようにする。受ける出力情報は、座標変換手法により変わるが、位置基準情報、視線情報などがある。
【００１５】
（７）表示部は、表示記憶部から注視されうる領域の表示用情報の供給を受けて領域を表示し、視線解析部は、領域の座標情報を受けて解析を行うようにしてもよい。
【００１６】
【発明の実施の形態】
以下、本発明の音声出力装置等の実施形態について図面を参照して説明する。なお、実施の形態において同じ符号を付した構成要素は同様の動作を行うので、再度の説明を省略する場合がある。
【００１７】
（実施の形態１）
【００１８】
図１は、本発明の音声出力装置用のポインティングデバイスおよび入力方法を説明する外観図を示す。図１において、表示部１０はテレビやパソコンの表示装置のようなもので、使用者が意思を伝えたい内容として３つの項目が、依頼、返事、挨拶の３領域分表示されている。その下には、Ａ、Ｂ、Ｃの領域が表示されており、主領域である３領域のそれぞれについて、さらに使用者の意思伝達の選択肢領域になっている。使用者１１は、眼鏡１２を装着しており、表示部１０に向かい合わせの位置にいる。眼鏡１２には、後述するアイカメラ２１と視野カメラ２２が組み込まれており、表示部１０上の各領域の内、使用者が注視している領域を検知できる。いずれかの領域を注視している状態で、後述する所定の動作を行うと、注視している対象を選択できる。
【００１９】
図２は、本発明のポインティングデバイスを備えた音声出力装置のシステムのブロック図である。図２において、表示部１０は、表示記憶部２５に格納された表示内容を表す表示用情報と表示位置を示す座標情報に基づき、図１に示したような画面を表示する。アイカメラ２１と視野カメラ２２は、図１の眼鏡１２に組み込まれている。アイカメラ２１と視野カメラ２２の出力情報は、視線解析部２３に送られる。視野カメラ２２は、表示部１０を撮像し、表示部１０からの光情報をもとに、眼鏡１２と表示部１０の枠や表示内容との位置関係を表す位置基準情報を視線解析部２３に送る。アイカメラ２１は、使用者１１の視線方向を表す視線情報を視線解析部２３に送る。視線解析部２３は、位置基準情報と視線情報とを解析し、表示部１０上のどの領域を注視しているかを表す注視領域情報を生成し、出力選択部２６に送る。また、アイカメラ２１は、出力情報として、使用者１１の目の動作情報を操作解析部２４に送っている。操作解析部２４は、目の動作情報を解析しており、使用者１１が瞬きをすると、瞬きの動作を検出し、選択指示情報を出力選択部２６に送る。出力選択部２６は、選択指示情報を受けると、注視領域情報と表示情報から出力選択情報を生成する。出力記憶部２７には、表示情報に対応した音声情報が格納されている。出力選択部２６は、生成した出力選択情報に対応した音声情報を出力記憶部２７から読み出して出力させ、音声信号としてスピーカから出力させる。
【００２０】
位置基準情報と視線情報は、いくつかの方法での表現が可能である。その一例について説明する。使用者１１が表示部１０上のどの領域を見ているかを検知するためには、表示部１０と眼鏡１２の位置関係、および、眼鏡１２を基準にした使用者１１の視線の方向を知る必要がある。視野カメラ２２は、使用者１１が、頭を動かして、表示部１０と眼鏡１２の位置関係がずれた場合、そのずれを補正することを第１の目的として設けられる。視野カメラ２２は、眼鏡１２の正面方向を中心方向、すなわち中心軸として前方を撮像する。撮像画面エリアの中心が、正面方向に当たる。視野カメラ２２の撮像範囲をあらわす視野角は、あらかじめ分っている。視野角と、撮像した物体の撮像画面エリア内での座標から、正面方向、すなわち中心軸を基準としたその物体の上下左右の方向角度が算出できる。よって、撮像画面エリア内に写った、表示部１０上の各領域の画像を解析すれば、各領域の方向角度が算出できる。図１に示したような長方形の１つの領域の場合、その４つの頂点の方向角度で決まる角錐領域内に、視線があれば、その領域を見ていることになる。この場合、各領域の頂点の４つの方向角度がその領域の位置基準情報となる。
【００２１】
アイカメラ２１は、眼鏡１２の中心軸を基準として、視線の方向角度を視線情報として出力する。視線解析部２３は、受け取った各領域の頂点の方向角度と視線の方向角度とを比較し、各領域の方向角度により形成される角錐領域の中に視線の方向角度を含むかどうかを調べ、視線の方向角度を含む領域を注視領域として選択し、その領域記号を出力する。
【００２２】
つぎに、表示部１０の各領域と音声出力について説明する。「挨拶」を選択すると、「Ａ」により「こんにちは」を、「Ｂ」により「さようなら」を、「Ｃ」により「今度は何時お会いできますか」を、それぞれ選択できる。「返事」を選択すると、「Ａ」により「はい」を、「Ｂ」により「いいえ」を、「Ｃ」により「わかりません」を選択できる。「依頼」を選択すると、「Ａ」により「お茶を入れてください」を、「Ｂ」により「本を読んでください」を、「Ｃ」により「トイレをお願いします」を選択できる。領域記号を「挨拶」＝Ｘ、「返事」＝Ｙ、「依頼」＝Ｚとする。出力記憶部２７には、上記の各文章が音声信号情報として記憶される。ＸＡに対応するアドレスには、「こんにちは」、ＸＢに対応するアドレスには、「さようなら」、ＸＣに対応するアドレスには、「今度は何時お会いできますか」というように記憶される。
【００２３】
視線解析部２３が、領域記号Ｘ、Ｙ、Ｚのいずれか、たとえば、Ｘを出力している状態で、操作解析部２４が、瞬きの動作を検出し、選択指示情報を出力選択部２６に送ると、出力選択部２６は、Ｘを記憶する。つぎに、視線解析部２３が、領域記号Ａ、Ｂ、Ｃのいずれか、たとえば、Ａを出力している状態で、操作解析部２４が瞬きの動作を検出し、選択指示情報を出力選択部２６に送ると、出力選択部２６は、Ｘに続いてＡを記憶する。ＸＡが決まったので、出力選択部２６は、ＸＡに対応するアドレスを出力記憶部２７に与える。出力記憶部２７は、ＸＡ、すなわち、「こんにちは」の音声信号情報を出力する。音声出力部２８は、音声信号情報を音声信号に変換し、スピーカより音声を発生する。出力記憶部２７が、ＺＢの順に領域記号を受け取ると、「本を読んでください」の音声信号が発せられることになる。出力選択部２６において、Ｘ，Ｙ、Ｚのいずれかの後にＡ、Ｂ、Ｃのいずれかが記憶されない場合、例えば、ＡＢやＣＺなどとなった場合は、エラー信号として「ポン」というような音を出力音声として発生する。
【００２４】
上記、Ｘ，Ｙ、Ｚのいずれか、および、Ａ、Ｂ、Ｃのいずれかを選択した段階で、その記号を表示記憶部２５に送り、表示部１０の対応する領域の外観（色や形など）を変えて選択されたことを使用者１１に知らせるようにしてもよい。
【００２５】
なお、上記説明では、視野カメラ２２は、表示部１０を撮像し、表示部１０からの光情報をもとに、眼鏡１２と表示部１０の枠や表示内容との位置関係を表す位置基準情報を視線解析部２３に送り、アイカメラ２１は、使用者１１の視線方向を表す視線情報を視線解析部２３に送り、視線解析部２３は、位置基準情報と視線情報とを解析し、表示部１０上のどの領域を注視しているかを表す注視領域情報を生成し、出力選択部２６に送り、アイカメラ２１は、使用者１１の目の動作情報を操作解析部２４に送るようにしたが、上記各種情報の抽出は、視線解析部２３や操作解析部２４において行ってもよい。すなわち、視野カメラ２２やアイカメラ２１は、撮像した画像情報から位置基準情報や視線情報の解析や抽出を行わない形の情報を視線解析部２３に送り、解析や抽出を視線解析部２３において行うようにしてもよい。アイカメラ２１は、撮像した画面情報を操作解析部２４に送り、操作解析部２４が目の動きを解析するようにしてもよい。
【００２６】
さらに、本実施の形態における視線解析などの処理は、ソフトウェアで実現しても良い。そして、このソフトウェアをソフトウェアダウンロード等により配布しても良い。また、このソフトウェアをＣＤ−ＲＯＭなどの記録媒体に記録して流布しても良い。なお、このことは、本明細書における他の実施の形態においても該当する。なお、本実施の形態における情報処理装置を実現するソフトウェアは、以下のようなプログラムである。つまり、このプログラムは、コンピュータに、アイカメラと視野カメラの出力情報を受け付けるステップと、使用者が注視する注視領域を検出するステップと、使用者の音声出力決定の操作を検出するステップと、使用者の音声出力決定の操作を検出した場合に注視領域に対応する音声を出力するステップを実行させるためのプログラム、である。
【００２７】
（実施の形態２）
【００２８】
上記実施の形態１では、基準軸に対する上下左右の方向角度により、各領域や視線を表現したが、以下、別の表現方法とそれに基づく処理について、図３を用いて説明する。
【００２９】
図３において、アイカメラ２１、視野カメラ２２、表示記憶部２５の出力情報は、視線解析部２３に入力される。視野カメラ２２は、視野内の画像情報を視線解析部２３に送る。視線解析部２３は、受け取った画像情報を解析し、表示部１０の枠の部分を画像解析により検出し、枠を画像情報上に配置する。表示記憶部２５は、表示している各領域の表示枠内での座標情報を視線解析部２３に送る。視線解析部２３は、表示記憶部２５から受け取った各領域の座標情報に基づき、各領域を上記画像情報上の枠の中に配置する。この配置とは、表示部１０の座標体系上の座標を視野の画像情報上の座標体系に座標変換する処理であり、この座標変換により画像情報中での各領域の座標値が得られる。つぎに、アイカメラ２１から受け取った視線情報に基づき、画像情報上に視線位置を配置する。この配置とは、アイカメラ２１が視線検出したときの座標体系、たとえば、円錐座標体系から、視野の画像情報上の座標体系に座標変換する処理であり、この処理により、画像情報上での視線の位置の座標（視線座標）が得られる。つぎに、視線座標がどの領域座標エリアに含まれるか、を判定する。視線座標が含まれる領域が注視領域である。視線解析部２３は、注視領域の領域記号（注視領域を識別する情報）を出力選択部２６に出力する。操作部３４が操作選択情報を出力選択部２６に送ると、そのときの領域記号が選択され、出力選択部２６は、対応するアドレスを出力記憶部２７に与える。出力記憶部２７は上記アドレスに格納された音声信号情報を読み出して出力する。音声出力部２８は上記音声信号情報音声信号に変換し、スピーカより音声を発生する。
【００３０】
なお、アイカメラ２１からの視線情報を視野の画像情報上に配置するには、視野カメラ２２から、視野内の画像情報とともに視野角の情報が、視線解析部２３に与えられていなければならない。
【００３１】
（実施の形態３）
【００３２】
上記説明では、領域の選択決定は、瞬きの動作で行うようにした。瞬きの検出は、目を瞑ったときには、アイカメラ２１において、眼球や瞳孔の検出がない状態になり、通常の視線情報が途切れる。目が自然に行う瞬きを選択決定の動作と区別する必要がある。このためには、操作解析部２４は、受け取る視線情報の時間長を計数しておき、自然な瞬きの時間よりも十分長い時間だけ目を瞑ったとき、あるいは、目を瞑る時間を短、長、短などの順序により、使用者１１が符号化し、この符号パターンを受信したときに、選択決定と判定するようにすればよい。
【００３３】
表示部１０の上に、読点の「。」印や、決定ボタンを表示しておき、視線がこれらを選択した場合に、音声出力を出すようにしてもよい。瞬きができない肢体不自由者には、このような入力方法が有効になる。ただし、意図せずに視線が決定ボタンを横切る場合もあるので、視線が決定ボタンを注視している時間が一定以上の場合や、視線が決定ボタン上を時間的に、あるいは、位置的に一定の軌跡を描いた場合、などに選択決定と判定するような対策を行ってもよい。
【００３４】
また、各領域を広めにしておき、領域内で視線が円を描いた場合、選択決定としてもよい。このためには、各領域の表示画面上に円の図形を表示しておき、使用者１１が円状の視線移動をしやすくしてもよい。
【００３５】
また、選択決定以前の状態でも、視線が表示の上記領域上にある場合は、その領域情報を、表示記憶部２５を介して、または、表示部１０への図示しない経路を介して、視線解析部２３から表示部１０に送り、表示部１０は、視線が当てられた領域の色などを変更して、使用者１１が今どの領域を見ているかが分るようにしてもよい。
【００３６】
また、視線解析部２３が、視線があたっている位置の表示部１０上の座標情報を表示部１０に与えるようにして、マウスのアイコンのような視線アイコンを表示部１０上に表示するようにしてもよい。
【００３７】
（実施の形態４）
【００３８】
上記各実施の形態では、アイカメラ２１、視野カメラ２２を眼鏡１２に組み込んだが、肢体不自由者のベッドなどに設置してもよい。近赤外ＬＥＤを眼球に当ててＣＣＤカメラによって眼球の動きを計測し，あらかじめ分っている表示部１０の室内での配置位置情報と、表示部１０上での各領域の座標情報とにより、注視領域の検出を、視線解析部２３が行ってもよい。眼鏡を装着していなくともよいので、使用者は苦痛なく長時間使用できる。このためには、視野カメラ２２は、アイカメラ２１、使用者１１、表示部１０などの位置関係を測定する必要がある。アイカメラ２１は、使用者１１の眼球から視線の方向情報を得られるが、その視線が、アイカメラ２１からどのくらい離れた位置にあるかは分らない。アイカメラ２１と視野カメラ２２の位置関係は分っているものとして、使用者１１の頭部の寸法情報、視野カメラ２２自身の視野角、画面上の頭部の大きさ、及び目の位置から、使用者１１の目とアイカメラ２１との位置関係を計算することにより、視線の空間座標上の位置を決定する。なお、空間座標とは、視野カメラ２２を基準として周囲の物体の位置を表現する３次元座標体系である。アイカメラ２１と表示部１０との位置関係情報と、表示部１０上の各領域の位置情報とにより、各領域の空間座標上の位置を計算し、視線の空間座標上の位置と各領域の空間座標上の位置とを比較することにより、注視領域の有無とその領域符号を割り出すようにすればよい。
【００３９】
視野カメラ２２を立体カメラとして、使用者１１の頭部、眼球、表示部１０などの対象物の空間座標内の位置を測定できるようにしてもよい。
【００４０】
アイカメラ２１や視野カメラ２２を表示部１０と一体構造としてもよい。この場合は、それらの互いの位置関係は一定であり、あらかじめ分っているので、その位置関係の情報を視線解析部２３に格納しておき、空間座標の計算に使用すればよい。
【００４１】
（その他の実施の形態と補足）
【００４２】
視線解析部２３が行う各領域に対する注視の判定は、上記実施の形態において説明した手法に限らず、３次元空間座標、円錐座標、角錐座標、それらの座標体系を歪ませた座標空間や、それらを投影した２次元空間などに、視線や各領域を配置する演算により行うことができる。
【００４３】
使用者が、指などを動かせる場合は、図２の操作解析部２４の代わりに、図３の操作部３４、たとえば、操作ボタンを押して領域の選択を行ってもよい。
【００４４】
アイカメラ２１、視野カメラ２２と視線解析部２３、操作解析部２４の結合を、無線通信、赤外通信などの有線でない通信方法によって行えば、使用者に与える苦痛がより少なくなる。
【００４５】
アイカメラ２１、視野カメラ２２は、通常は、別々のカメラでよいが、両方の機能を一つのカメラで達成するようにしてもよい。撮像した画面情報の眼球画像の細部から視線情報を解析し、表示部１０の画像、または、使用者の頭部の画像から視野角内の表示部１０の位置や各領域の位置、または、頭部の位置を解析するようにすればよい。
【００４６】
表示部１０に表示する各領域の画像は、図１に示したような文字以外に、絵、アイコンなどでもよい。表示部１０は、電子的表示装置として説明したが、紙などの上に印刷されたものでもよい。この場合は、領域の位置や寸法は固定であるので、その情報をあらかじめ視線解析部２３に格納しておけば、上記説明した位置や座標の計算を同様にして行うことができる。また、表示部１０は、平面以外に、立体物など種々の形態を適用してもよい。部屋にある家具や照明器具などを対象の領域としてもよい。この場合は、家具や照明器具それぞれの位置を視線解析部２３に事前に入力する必要がある。
【００４７】
図１では、主領域と選択肢領域とを表示し、最終的な意思を２つのステップで選択するようにしたが、選択肢領域には、Ａ、Ｂ、Ｃなどの記号の代わりに、発生する音声の文章や音声の内容を表す単語を表示してもよい。また、主領域と選択肢領域の２段階以外の表示による選択方法でもよい。
【００４８】
音声出力は、使用者１１の近傍で出力する場合もあるが、離れた場所、たとえば、介護者の居室、看護婦の詰め所、家族のいる台所などで出力するようにしてもよい。
【発明の効果】
以上のように、本発明によれば、
【００４９】
（１）肢体不自由者を介護する人が肢体不自由者から離れていれも、肢体不自由者の能動的な要求が分かる。
【００５０】
（２）肢体不自由者にとって、急を要することが多い、例えば、トイレや痛みを訴えるなどの場合、簡単な指示で、知らせることができる。
【図面の簡単な説明】
【図１】本発明の音声出力装置用のポインティングデバイスおよび入力方法を説明する使用外観図
【図２】本発明の一実施形態による音声出力装置のブロック図
【図３】本発明の他の実施形態による音声出力装置のブロック図
【符号の説明】
１０表示部
２１アイカメラ
２２視野カメラ
２３視線解析部
２４操作解析部
２５表示記憶部
２６出力選択部
２７出力記憶部
３４操作部

[0001]
[Technical Field of the Invention]
The present invention relates to an audio output device and the like with which a user, such as a physically disabled person, expresses his or her intentions by means of the line of sight.
[0002]
[Prior art]
As a first prior art, word processors and personal computers are known in which a severely physically disabled person uses an eye camera to select keys of a keyboard shown on a display screen and thereby enters characters.
[0003]
As a second prior art, a pointing device apparatus and an input method are known for issuing plant operation commands in noisy environments where voice is of no use (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP-A-7-253843 (pages 2-3, FIG. 1)
[0005]
[Problems to be solved by the invention]
However, in the first prior art, a time-consuming input operation was required until the user's intention was communicated.
[0006]
Further, in the second conventional technique, a complicated procedure is still required to communicate a user's intention, and communication by voice cannot be performed.
[0007]
An object of the present invention is to provide a pointing device and an input method that a severely physically disabled person who can move his or her eyes can use with ease to convey intentions to the people around them, thereby facilitating human communication.
[0008]
[Means for Solving the Problems]
In order to solve the above-described problems, a pointing device, an input method, and an audio output device of the present invention employ the following means.
[0009]
(1) A pointing device, an input method, and an audio output device in which one or more regions that can be gazed at are set in advance, the region at which the user's line of sight is directed is detected, and voice is output based on information related to that region.
[0010]
(2) A pointing device, an input method, and an audio output device comprising an eye camera, a visual field camera, a line-of-sight analysis unit, an operation analysis unit, a display unit, and an audio output unit, wherein the eye camera detects the line of sight of the user, the visual field camera images at least one of the user and the display unit, the line-of-sight analysis unit receives the output information of the eye camera and the visual field camera and detects the gaze area of the display unit at which the user is gazing, and, when the operation analysis unit detects the user's operation for confirming voice output, voice corresponding to the gaze area is output. The information output as voice may be a sentence.
[0011]
(3) A region for confirming voice output may be provided, with voice being output when gaze at that region is detected; alternatively, a blink of the user may be detected and treated as the confirmation of voice output.
[0012]
(4) The eye camera and the visual field camera may be installed in a housing worn by the user, or they may be installed at a position apart from the user.
[0013]
(5) The color or form of the gazed area may be changed when the gaze selection is confirmed. This lets the user know whether the operation was performed as intended.
[0014]
(6) The line-of-sight analysis unit receives output information from the eye camera and the field-of-view camera, determines whether or not the line of sight is within the area by coordinate conversion, and determines whether or not the user is in a gaze state. The received output information varies depending on the coordinate transformation method, and includes position reference information, line-of-sight information, and the like.
[0015]
(7) The display unit may receive the display information of the area that can be watched from the display storage unit and display the area, and the line-of-sight analysis unit may perform the analysis by receiving the coordinate information of the area.
[0016]
[Embodiments of the Invention]
Hereinafter, embodiments of the audio output device and the like of the present invention will be described with reference to the drawings. Components denoted by the same reference numerals in the embodiments operate in the same way, so their description may not be repeated.
[0017]
(Embodiment 1)
[0018]
FIG. 1 is an external view illustrating the pointing device and input method for the audio output device of the present invention. In FIG. 1, the display unit 10 is something like the display of a television or personal computer, and shows three areas, request, reply, and greeting, as the items the user may want to convey. Below them, areas A, B, and C are displayed; for each of the three main areas, these serve as option areas that further specify the user's intention. The user 11 wears glasses 12 and is positioned facing the display unit 10. The glasses 12 incorporate an eye camera 21 and a visual field camera 22, described later, so that the area the user is gazing at among the areas on the display unit 10 can be detected. When the user performs a predetermined action, described later, while gazing at one of the areas, the gazed area is selected.
[0019]
FIG. 2 is a block diagram of the system of an audio output device equipped with the pointing device of the present invention. In FIG. 2, the display unit 10 displays a screen such as that shown in FIG. 1, based on display information representing the display contents and coordinate information indicating the display positions, both stored in the display storage unit 25. The eye camera 21 and the visual field camera 22 are incorporated in the glasses 12 of FIG. 1. The output information of the eye camera 21 and the visual field camera 22 is sent to the line-of-sight analysis unit 23. The visual field camera 22 images the display unit 10 and, based on the light information from the display unit 10, sends the line-of-sight analysis unit 23 position reference information representing the positional relationship between the glasses 12 and the frame or display contents of the display unit 10. The eye camera 21 sends line-of-sight information representing the gaze direction of the user 11 to the line-of-sight analysis unit 23. The line-of-sight analysis unit 23 analyzes the position reference information and the line-of-sight information, generates gaze area information indicating which area on the display unit 10 is being gazed at, and sends it to the output selection unit 26. The eye camera 21 also sends eye movement information of the user 11 to the operation analysis unit 24 as output information. The operation analysis unit 24 analyzes the eye movement information; when the user 11 blinks, it detects the blink and sends selection instruction information to the output selection unit 26. On receiving the selection instruction information, the output selection unit 26 generates output selection information from the gaze area information and the display information. The output storage unit 27 stores audio information corresponding to the display information. The output selection unit 26 causes the audio information corresponding to the generated output selection information to be read out from the output storage unit 27 and output from the speaker as an audio signal.
[0020]
The position reference information and the line-of-sight information can be expressed in several ways; one example is described here. To detect which area on the display unit 10 the user 11 is looking at, it is necessary to know the positional relationship between the display unit 10 and the glasses 12, and the direction of the line of sight of the user 11 relative to the glasses 12. The primary purpose of the visual field camera 22 is to compensate for any shift in the positional relationship between the display unit 10 and the glasses 12 that occurs when the user 11 moves his or her head. The visual field camera 22 images the scene ahead, taking the front direction of the glasses 12 as its center direction, that is, its central axis. The center of the captured image area corresponds to the front direction. The viewing angle, which represents the imaging range of the visual field camera 22, is known in advance. From the viewing angle and the coordinates of an imaged object within the captured image area, the up-down and left-right direction angles of that object relative to the front direction, that is, the central axis, can be calculated. Therefore, by analyzing the image of each area on the display unit 10 that appears in the captured image area, the direction angles of each area can be calculated. For a single rectangular area such as those shown in FIG. 1, if the line of sight lies within the pyramid-shaped region defined by the direction angles of its four vertices, the user is looking at that area. In this case, the four direction angles of the vertices of each area constitute the position reference information for that area.
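The angle calculation described above can be sketched as follows. This is only an illustration under assumptions the specification does not state: an ideal pinhole camera with no lens distortion, pixel coordinates with the origin at the top-left of the captured image, and separately known horizontal and vertical viewing angles. The function name and parameters are hypothetical.

```python
import math

def pixel_to_direction_angles(px, py, width, height, hfov_deg, vfov_deg):
    """Return (left-right, up-down) angles in degrees relative to the central axis.

    Assumes an ideal pinhole camera whose optical axis passes through the
    center of the captured image (an assumption, not stated in the text).
    """
    # Focal lengths in pixels, derived from the known viewing angles.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    dx = px - width / 2      # positive to the right of the central axis
    dy = height / 2 - py     # image y grows downward, so flip to make "up" positive
    return (math.degrees(math.atan2(dx, fx)), math.degrees(math.atan2(dy, fy)))
```

Applying this function to the four corner pixels of an area found in the captured image would give the four vertex direction angles that serve as that area's position reference information.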
[0021]
The eye camera 21 outputs, as the line-of-sight information, the direction angle of the line of sight relative to the central axis of the glasses 12. The line-of-sight analysis unit 23 compares the received direction angles of the vertices of each area with the direction angle of the line of sight, checks whether the direction angle of the line of sight falls within the pyramid-shaped region formed by the direction angles of each area, selects the area that contains it as the gaze area, and outputs that area's symbol.
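As a rough illustration of this containment check, the sketch below treats each area as a convex quadrilateral in the plane of (left-right angle, up-down angle) and tests whether the gaze angles fall inside it. Working in this two-dimensional angle plane is a simplification of the pyramid-shaped region, adopted here as an assumption; the function names and the data layout are hypothetical.

```python
def gaze_in_region(gaze, corners):
    """Return True if gaze (h, v) lies inside the convex quadrilateral.

    corners: four (h, v) vertex angles listed in order around the area.
    The point is inside when it is on the same side of every edge.
    """
    crosses = []
    n = len(corners)
    for i in range(n):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % n]
        # z-component of the cross product (edge vector) x (vertex-to-gaze vector)
        crosses.append((bx - ax) * (gaze[1] - ay) - (by - ay) * (gaze[0] - ax))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def find_gaze_region(gaze, regions):
    """Return the symbol of the first area containing the gaze, or None."""
    for symbol, corners in regions.items():
        if gaze_in_region(gaze, corners):
            return symbol
    return None
```

For example, with `regions = {"X": [(-10, 5), (-4, 5), (-4, 10), (-10, 10)]}`, a gaze direction of `(-7, 6)` degrees would be reported as area `X`.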
[0022]
Next, the areas of the display unit 10 and the audio output will be described. When "greeting" is selected, "A" selects "Hello", "B" selects "Goodbye", and "C" selects "When can I see you next?". When "reply" is selected, "A" selects "Yes", "B" selects "No", and "C" selects "I don't know". When "request" is selected, "A" selects "Please make me some tea", "B" selects "Please read me a book", and "C" selects "I need the toilet, please". The area symbols are defined as "greeting" = X, "reply" = Y, and "request" = Z. The output storage unit 27 stores each of the above sentences as audio signal information: "Hello" at the address corresponding to XA, "Goodbye" at the address corresponding to XB, "When can I see you next?" at the address corresponding to XC, and so on.
[0023]
While the line-of-sight analysis unit 23 is outputting one of the area symbols X, Y, and Z, for example X, if the operation analysis unit 24 detects a blink and sends selection instruction information to the output selection unit 26, the output selection unit 26 stores X. Next, while the line-of-sight analysis unit 23 is outputting one of the area symbols A, B, and C, for example A, if the operation analysis unit 24 detects a blink and sends selection instruction information to the output selection unit 26, the output selection unit 26 stores A following X. Since XA has now been determined, the output selection unit 26 gives the address corresponding to XA to the output storage unit 27. The output storage unit 27 outputs the audio signal information for XA, that is, "Hello". The audio output unit 28 converts the audio signal information into an audio signal and emits the sound from the speaker. When the output storage unit 27 receives the area symbols in the order ZB, the audio signal for "Please read me a book" is emitted. If, in the output selection unit 26, one of A, B, and C is not stored after one of X, Y, and Z, for example if the sequence becomes AB or CZ, a sound such as "pong" is generated as the output audio to signal an error.
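The two-step selection and lookup just described can be sketched as a small state machine. The sketch is illustrative only: text strings stand in for the stored audio signal information, the English wording of the phrases is a translation choice, and emitting the error sound as soon as an invalid two-symbol sequence is complete is an assumption, since the text does not say exactly when the error is raised. The class and constant names are hypothetical.

```python
# Text stand-ins for the audio signal information held in output storage unit 27.
PHRASES = {
    "XA": "Hello", "XB": "Goodbye", "XC": "When can I see you next?",
    "YA": "Yes", "YB": "No", "YC": "I don't know",
    "ZA": "Please make me some tea", "ZB": "Please read me a book",
    "ZC": "I need the toilet, please",
}
ERROR_SOUND = "pong"

class OutputSelector:
    """Collects two confirmed area symbols and resolves them to a phrase."""

    def __init__(self):
        self.pending = ""

    def select(self, symbol):
        """Record one confirmed symbol.

        Returns None after the first symbol, and after the second symbol
        returns either the matching phrase or the error sound.
        """
        self.pending += symbol
        if len(self.pending) < 2:
            return None
        key, self.pending = self.pending, ""  # reset for the next utterance
        return PHRASES.get(key, ERROR_SOUND)
```

In this sketch, confirming `X` and then `A` yields "Hello", while an invalid sequence such as `A` then `B` yields the error sound and clears the stored symbols.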
[0024]
At the stage when one of X, Y, and Z, or one of A, B, and C, has been selected, the symbol may be sent to the display storage unit 25 and the appearance (color, shape, etc.) of the corresponding area on the display unit 10 may be changed to inform the user 11 that it has been selected.
[0025]
In the above description, the visual field camera 22 images the display unit 10 and, based on the light information from the display unit 10, sends the line-of-sight analysis unit 23 position reference information representing the positional relationship between the glasses 12 and the frame or display contents of the display unit 10; the eye camera 21 sends line-of-sight information representing the gaze direction of the user 11 to the line-of-sight analysis unit 23; the line-of-sight analysis unit 23 analyzes the position reference information and the line-of-sight information, generates gaze area information indicating which area on the display unit 10 is being gazed at, and sends it to the output selection unit 26; and the eye camera 21 sends the eye movement information of the user 11 to the operation analysis unit 24. However, the extraction of these various kinds of information may instead be performed in the line-of-sight analysis unit 23 or the operation analysis unit 24. That is, the visual field camera 22 and the eye camera 21 may send the line-of-sight analysis unit 23 the captured image information as it is, without analyzing or extracting position reference information or line-of-sight information from it, and the analysis and extraction may be performed in the line-of-sight analysis unit 23. Likewise, the eye camera 21 may send the captured image information to the operation analysis unit 24, which then analyzes the eye movement.
[0026]
Furthermore, processing such as the line-of-sight analysis in this embodiment may be realized in software. The software may be distributed by software download or the like, or recorded on a recording medium such as a CD-ROM and circulated. The same applies to the other embodiments in this specification. The software that realizes the information processing apparatus of this embodiment is the following program: a program that causes a computer to execute a step of accepting the output information of the eye camera and the visual field camera, a step of detecting the gaze area at which the user is gazing, a step of detecting the user's operation for confirming voice output, and a step of outputting the voice corresponding to the gaze area when that operation is detected.
[0027]
(Embodiment 2)
[0028]
In Embodiment 1 above, each area and the line of sight were expressed by up-down and left-right direction angles relative to a reference axis. Another representation, and the processing based on it, is described below with reference to FIG. 3.
[0029]
In FIG. 3, the output information of the eye camera 21, the visual field camera 22, and the display storage unit 25 is input to the line-of-sight analysis unit 23. The visual field camera 22 sends the image information of its visual field to the line-of-sight analysis unit 23. The line-of-sight analysis unit 23 analyzes the received image information, detects the frame of the display unit 10 by image analysis, and places the frame on the image information. The display storage unit 25 sends the line-of-sight analysis unit 23 the coordinate information, within the display frame, of each displayed area. Based on the coordinate information of each area received from the display storage unit 25, the line-of-sight analysis unit 23 places each area inside the frame on the image information. This placement is a coordinate conversion from the coordinate system of the display unit 10 to the coordinate system of the image information of the visual field, and it yields the coordinate values of each area within the image information. Next, the line-of-sight position is placed on the image information based on the line-of-sight information received from the eye camera 21. This placement is a coordinate conversion from the coordinate system in which the eye camera 21 detected the line of sight, for example a conical coordinate system, to the coordinate system of the image information of the visual field, and it yields the coordinates of the line-of-sight position on the image information (the line-of-sight coordinates). It is then determined which area's coordinate range contains the line-of-sight coordinates. The area that contains the line-of-sight coordinates is the gaze area. The line-of-sight analysis unit 23 outputs the area symbol of the gaze area (information identifying the gaze area) to the output selection unit 26. When the operation unit 34 sends operation selection information to the output selection unit 26, the area symbol at that moment is selected, and the output selection unit 26 gives the corresponding address to the output storage unit 27. The output storage unit 27 reads out and outputs the audio signal information stored at that address. The audio output unit 28 converts the audio signal information into an audio signal and emits the sound from the speaker.
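The placement of the areas on the image information and the subsequent containment test can be sketched as below. The sketch assumes, for simplicity, that the display frame appears in the captured image as an axis-aligned rectangle, so that a linear scale and offset suffice; a display seen at an angle would need a perspective transform instead. It also assumes the line-of-sight coordinates have already been converted into image coordinates. All names are hypothetical.

```python
def display_to_image(point, display_size, frame):
    """Convert a display-coordinate point to image coordinates.

    frame: (left, top, right, bottom) of the detected display frame in the
    captured image, assumed axis-aligned for this sketch.
    """
    x, y = point
    w, h = display_size
    left, top, right, bottom = frame
    return (left + x / w * (right - left), top + y / h * (bottom - top))

def region_to_image(rect, display_size, frame):
    """Convert an area rectangle (x0, y0, x1, y1) from display to image coordinates."""
    x0, y0, x1, y1 = rect
    ix0, iy0 = display_to_image((x0, y0), display_size, frame)
    ix1, iy1 = display_to_image((x1, y1), display_size, frame)
    return (ix0, iy0, ix1, iy1)

def gazed_symbol(gaze_xy, regions, display_size, frame):
    """Return the symbol of the area containing the line-of-sight coordinates, or None."""
    gx, gy = gaze_xy
    for symbol, rect in regions.items():
        x0, y0, x1, y1 = region_to_image(rect, display_size, frame)
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            return symbol
    return None
```

Here `regions` plays the role of the coordinate information supplied by the display storage unit 25, and `frame` is the result of the frame detection performed on the visual field image.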
[0030]
Note that, in order to place the line-of-sight information from the eye camera 21 on the image information of the visual field, the visual field camera 22 must supply the line-of-sight analysis unit 23 with its viewing angle information together with the image information of the visual field.
[0031]
(Embodiment 3)
[0032]
In the above description, the selection of an area is confirmed by a blink. As for blink detection, when the eyes are closed the eye camera 21 no longer detects the eyeball or pupil, and the normal line-of-sight information is interrupted. Blinks that the eyes make naturally must be distinguished from the action of confirming a selection. To this end, the operation analysis unit 24 counts the time length of the received line-of-sight information and judges that a selection has been confirmed either when the eyes are kept closed for sufficiently longer than a natural blink, or when the user 11 encodes the eye-closure durations in an order such as short, long, short and this code pattern is received.
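One way to picture this distinction is the sketch below, which measures how long the pupil goes undetected and compares it with a threshold. The threshold value, the fixed sampling period, and the idea of representing the eye camera output as a sequence of detected/not-detected flags are all assumptions made for illustration; the specification gives no concrete values. The names are hypothetical.

```python
LONG_CLOSURE_MIN_S = 1.0  # assumed threshold, well above a natural blink

def closure_durations(samples, period_s):
    """Return the duration in seconds of each run of 'pupil not detected'.

    samples: booleans, True when the eye camera detected the pupil.
    A closure still in progress at the end of the samples is also reported.
    """
    durations, run = [], 0
    for detected in samples:
        if detected:
            if run:
                durations.append(run * period_s)
            run = 0
        else:
            run += 1
    if run:
        durations.append(run * period_s)
    return durations

def is_deliberate_closure(duration_s):
    """True when the eyes stayed closed long enough to count as a selection."""
    return duration_s >= LONG_CLOSURE_MIN_S

def matches_code(durations, pattern=("short", "long", "short")):
    """True when the most recent closures follow the given short/long pattern."""
    labels = ["long" if d >= LONG_CLOSURE_MIN_S else "short" for d in durations]
    return tuple(labels[-len(pattern):]) == tuple(pattern)
```

A real device would need the threshold tuned to the individual user, since natural blink duration varies from person to person.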
[0033]
A full-stop mark "。" or a decision button may be displayed on the display unit 10, and the voice may be output when the line of sight selects it. This input method is effective for physically disabled persons who cannot blink. However, since the line of sight may cross the decision button unintentionally, countermeasures may be taken, such as judging that a selection has been confirmed only when the line of sight has rested on the decision button for at least a certain time, or when the line of sight has traced a certain trajectory over the decision button in time or in position.
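The dwell-time countermeasure mentioned above can be sketched as follows. The required dwell time is an assumed parameter, and firing only once per continuous gaze is a design choice of this sketch rather than something stated in the text. The class name and interface are hypothetical.

```python
class DwellSelector:
    """Confirms a selection when the gaze stays on one area long enough."""

    def __init__(self, dwell_s=1.5):
        self.dwell_s = dwell_s   # assumed dwell time in seconds
        self.current = None      # area currently under the gaze
        self.since = None        # time at which the gaze entered that area
        self.fired = False       # whether this continuous gaze was already confirmed

    def update(self, symbol, t):
        """Feed the gazed area symbol (or None) observed at time t.

        Returns the symbol once, at the moment the dwell time is reached;
        otherwise returns None.
        """
        if symbol != self.current:
            # Gaze moved to a different area (or off all areas): restart timing.
            self.current, self.since, self.fired = symbol, t, False
            return None
        if symbol is not None and not self.fired and t - self.since >= self.dwell_s:
            self.fired = True
            return symbol
        return None
```

With this approach a line of sight that merely sweeps across the decision button never accumulates enough time to trigger the voice output.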
[0034]
Alternatively, each area may be made somewhat larger, and a selection may be confirmed when the line of sight traces a circle within the area. For this purpose, a circular figure may be displayed in each area on the display screen so that the user 11 can more easily move the line of sight in a circle.
[0035]
Even before a selection is confirmed, when the line of sight is on one of the displayed areas, the area information may be sent from the line-of-sight analysis unit 23 to the display unit 10, either via the display storage unit 25 or via a path to the display unit 10 that is not shown, and the display unit 10 may change the color or the like of the area on which the line of sight falls so that the user 11 can tell which area he or she is currently looking at.
[0036]
In addition, the line-of-sight analysis unit 23 may give the display unit 10 the coordinates, on the display unit 10, of the position on which the line of sight falls, so that a line-of-sight icon resembling a mouse pointer is displayed on the display unit 10.
[0037]
(Embodiment 4)
[0038]
In each of the above embodiments, the eye camera 21 and the visual field camera 22 are incorporated in the glasses 12, but they may instead be installed on the bed of the physically disabled person or the like. A near-infrared LED may illuminate the eyeball while a CCD camera measures the movement of the eyeball, and the line-of-sight analysis unit 23 may detect the gaze area from the previously known position of the display unit 10 in the room and the coordinate information of each area on the display unit 10. Since glasses need not be worn, the user can use the device for long periods without discomfort. For this purpose, the visual field camera 22 needs to measure the positional relationship among the eye camera 21, the user 11, the display unit 10, and so on. The eye camera 21 can obtain the direction of the line of sight from the eyeball of the user 11, but it cannot tell how far from the eye camera 21 that line of sight lies. Assuming that the positional relationship between the eye camera 21 and the visual field camera 22 is known, the positional relationship between the eyes of the user 11 and the eye camera 21 is calculated from the dimensional information of the head of the user 11, the viewing angle of the visual field camera 22 itself, the size of the head on the screen, and the position of the eyes, and the position of the line of sight in spatial coordinates is thereby determined. Here, the spatial coordinates are a three-dimensional coordinate system that expresses the positions of surrounding objects relative to the visual field camera 22. The position of each area in spatial coordinates is calculated from the positional relationship information between the eye camera 21 and the display unit 10 and the position information of each area on the display unit 10, and the presence or absence of a gaze area, and its area symbol, may be determined by comparing the position of the line of sight in spatial coordinates with the position of each area in spatial coordinates.
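The distance estimation from the head size can be illustrated with a pinhole camera model. This is a sketch under assumptions not given in the specification: square pixels, an optical axis through the image center, a head that faces the camera so that its known width appears undistorted, and coordinates expressed in the visual field camera's own frame. The function name and parameters are hypothetical.

```python
import math

def estimate_eye_position(head_width_m, head_width_px, eye_px, image_size, hfov_deg):
    """Estimate the eye position (x, y, z) in meters relative to the visual field camera.

    head_width_m:  known physical width of the user's head
    head_width_px: width of the head as it appears in the image, in pixels
    eye_px:        (x, y) pixel position of the eye in the image
    """
    width, height = image_size
    # Focal length in pixels from the known viewing angle (square pixels assumed).
    f = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Similar triangles: an object of known size appears smaller the farther away it is.
    z = head_width_m * f / head_width_px
    x = (eye_px[0] - width / 2) * z / f
    y = (height / 2 - eye_px[1]) * z / f   # flip so that "up" is positive
    return (x, y, z)
```

Combining this eye position with the gaze direction from the eye camera 21 gives a ray in spatial coordinates, which can then be compared with the positions of the areas on the display unit 10.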
[0039]
The visual field camera 22 may be configured as a stereoscopic camera so that the positions of objects such as the head and eyeballs of the user 11 and the display unit 10 can be measured in the spatial coordinates.
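As an illustration of how a stereoscopic camera can yield a position in the spatial coordinates, the following sketch uses the standard disparity relation for a rectified pair of identical pinhole cameras. The rectified setup, the parameter names, and the choice of the left camera as the coordinate origin are assumptions made for this example, not details given in the specification.

```python
def stereo_point(x_left_px, x_right_px, y_px, focal_px, baseline_m, cx_px, cy_px):
    """Triangulate a 3D point from a rectified stereo pair.

    x_left_px, x_right_px: horizontal pixel position of the same feature
    (for example, an eyeball) in the left and right images.
    Returns (x, y, z) in metres relative to the left camera.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity      # depth from disparity
    x = (x_left_px - cx_px) * z / focal_px     # back-project the pixel column
    y = (y_px - cy_px) * z / focal_px          # back-project the pixel row
    return (x, y, z)
```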
[0040]
The eye camera 21 and the visual field camera 22 may be integrated with the display unit 10. In this case, since their mutual positional relationship is constant and known in advance, information on the positional relationship may be stored in the line-of-sight analysis unit 23 and used for calculating the spatial coordinates.
[0041]
(Other embodiments and supplements)
[0042]
The gaze determination that the line-of-sight analysis unit 23 performs for each region is not limited to the method described in the above embodiments. It may also be performed by placing the line of sight and each region in three-dimensional spatial coordinates, cone coordinates, pyramid coordinates, a coordinate space obtained by distorting any of these coordinate systems, a two-dimensional space onto which they are projected, or the like.
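One of the alternatives named here, comparison in a projected two-dimensional space, can be sketched as follows. The sketch assumes that the eye is the centre of projection, that each region is a convex polygon given by its corner positions in the spatial coordinates, and that all corners lie in front of the eye. The function names and the region layout are hypothetical.

```python
def project(point, eye):
    """Perspective-project a 3D point onto the plane at unit depth from the eye."""
    x, y, z = (p - e for p, e in zip(point, eye))
    if z <= 0:
        raise ValueError("point must be in front of the eye")
    return (x / z, y / z)


def inside_convex(pt, polygon):
    """Return True if the 2D point lies inside (or on the edge of) a convex polygon."""
    sign = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cross = (bx - ax) * (pt[1] - ay) - (by - ay) * (pt[0] - ax)
        if cross != 0:
            s = 1 if cross > 0 else -1
            if sign == 0:
                sign = s
            elif s != sign:
                return False  # point is on different sides of two edges
    return True


def gazed_region(gaze_dir, eye, regions):
    """Return the code of the region whose projection contains the gaze direction."""
    gaze_2d = project(tuple(e + d for e, d in zip(eye, gaze_dir)), eye)
    for code, corners in regions.items():
        if inside_convex(gaze_2d, [project(c, eye) for c in corners]):
            return code
    return None
```

Because both the line of sight and the region corners are projected from the same centre, the distance to the display does not have to be known for the comparison itself.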
[0043]
When the user can move a finger or the like, an area may be selected by pressing the operation unit 34 in FIG. 3, for example, an operation button, instead of the operation analysis unit 24 in FIG.
[0044]
If the eye camera 21 and the visual field camera 22 are connected to the line-of-sight analysis unit 23 and the operation analysis unit 24 by a wireless method such as radio or infrared communication, the burden on the user is further reduced.
[0045]
Normally, the eye camera 21 and the visual field camera 22 are separate cameras, but both functions may be achieved by a single camera. In that case, the line-of-sight information is analyzed from the eyeball portion of the captured image, and the position of the display unit 10, the position of each region within the viewing angle, or the position of the head or a part of the head is analyzed from the image of the display unit 10 or the image of the user's head.
[0046]
The image of each area displayed on the display unit 10 may be a picture, an icon, or the like in addition to the characters shown in FIG. The display unit 10 has been described as an electronic display device, but the areas may instead be printed on paper or the like. In this case, since the position and size of each area are fixed, the position and coordinate calculations described above can be performed in the same way if that information is stored in advance in the line-of-sight analysis unit 23. Further, the display unit 10 is not limited to a flat surface and may take various forms, such as a three-dimensional object. Furniture, lighting equipment, and the like in the room may also serve as target areas. In this case, the positions of the furniture and lighting equipment need to be input to the line-of-sight analysis unit 23 in advance.
[0047]
In FIG. 1, the main area and the option area are displayed, and the final intention is selected in two steps. However, instead of symbols such as A, B, and C, the content of the voice to be generated may be displayed in the option area. Alternatively, a selection method that uses a display other than the two stages of the main area and the option area may be used.
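The two-step selection through the main area and the option area can be pictured as a small state machine. The sketch below is only an illustration: the class name, the area codes, and the phrases are hypothetical, and the specification does not prescribe this structure.

```python
class TwoStageSelector:
    """Track a main-area choice, then map an option-area choice to a phrase."""

    def __init__(self, phrases):
        # phrases maps (main_code, option_code) to the text to be spoken.
        self.phrases = phrases
        self.main = None

    def confirm(self, area_kind, code):
        """Register a confirmed gaze area; return a phrase once both steps are done."""
        if area_kind == "main":
            self.main = code
            return None
        if area_kind == "option" and self.main is not None:
            phrase = self.phrases.get((self.main, code))
            if phrase is not None:
                self.main = None  # ready for the next utterance
            return phrase
        return None


# Hypothetical phrase table, for illustration only.
selector = TwoStageSelector({("meal", "A"): "I am hungry.", ("meal", "B"): "I am thirsty."})
selector.confirm("main", "meal")
print(selector.confirm("option", "B"))
```

Displaying the phrase text itself in the option area, as suggested above, would only change what is drawn for each option code and not the selection logic.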
[0048]
The voice may be output near the user 11, or it may be output at a distant location, for example, a caregiver's room, a nurse's station, or a kitchen where family members are present.
[Effect of the invention]
As described above, the present invention provides the following effects.
[0049]
(1) Even when a caregiver is away from the physically handicapped person, the caregiver can understand requests that the person actively makes.
[0050]
(2) Needs that physically handicapped persons often have and that are urgent, for example, needing the toilet or reporting pain, can be communicated with a simple instruction.
[Brief description of the drawings]
FIG. 1 is a perspective view illustrating a pointing device and an input method for an audio output device according to the present invention.
FIG. 2 is a block diagram of an audio output device according to an embodiment of the present invention.
FIG. Block diagram of audio output device by form
[Description of reference numerals]
10 display unit
21 eye camera
22 visual field camera
23 line-of-sight analysis unit
24 operation analysis unit
25 display storage unit
26 output selection unit
27 output storage unit
34 operation unit

Claims

An audio output device in which one or more areas are provided on a display, the area at which a user gazes is detected, and audio is output based on information associated with the gazed area.

An audio output device comprising an eye camera, a visual field camera, a line-of-sight analysis unit, an operation analysis unit, a display unit, and an audio output unit, wherein
the eye camera detects a user's line of sight,
the visual field camera images at least one of the user and the display unit,
the line-of-sight analysis unit receives the output information of the eye camera and the visual field camera and detects the gaze area of the display unit at which the user is gazing, and
the audio output unit outputs audio corresponding to the gaze area when the operation analysis unit detects an operation by the user to confirm audio output.

The audio output device according to claim 1, wherein the information to be output as audio is information that constitutes a sentence.

The audio output device according to any one of claims 1 to 3, wherein an area for determining an audio output is provided, and an audio output is performed by detecting gaze at the area for determining the audio output.

The audio output device according to claim 1, wherein the audio output is confirmed by detecting a blink of the user.

The audio output device according to any one of claims 2 to 5, wherein the eye camera and the visual field camera are installed in a housing worn by a user.

The audio output device according to any one of claims 2 to 5, wherein the eye camera and the visual field camera are installed at a position distant from a user.

The audio output device according to any one of claims 1 to 7, wherein the appearance of the gazed area is changed when it is determined that the area is being gazed at.

The audio output device according to any one of Items 8, wherein the line-of-sight analysis unit receives information from the eye camera and the visual field camera, determines by coordinate transformation whether or not the line of sight is within an area, and determines whether or not a gaze state is present.

The audio output device according to claim 2, wherein the display unit receives, from the display storage unit, display information of the areas that can be gazed at and displays the areas, and the line-of-sight analysis unit receives coordinate information of the areas from the display storage unit and performs the analysis.

The audio output device according to any one of claims 1 to 10, wherein the information to be output as audio is information for a disabled person to communicate.

An audio output method in which one or more areas are provided on a display, the area at which a user gazes is detected, and audio is output based on information associated with the gazed area.

A pointing device comprising an eye camera and a visual field camera constituting the audio output device according to claim 2.

A program for causing a computer to execute the steps of:
receiving output information of an eye camera and a visual field camera;
detecting the gaze area at which a user is gazing;
detecting an operation by the user to confirm audio output; and
outputting audio corresponding to the gaze area when the operation by the user to confirm audio output is detected.