JP2004321621A

JP2004321621A - Method and program for puzzlement detection

Info

Publication number: JP2004321621A
Application number: JP2003122991A
Authority: JP
Inventors: Minoru Sasaki; 佐々木　　実; Kiyouko Sai; 京虎崔; Noriyuki Aoyama; 憲之青山; Yuko Taniguchi; 悠子谷口; Tadahiko Fukuda; 忠彦福田; Seitetsu Tanaka; 靖哲田中; Masashi Nakamura; 正士中村; Koji Yoshikawa; 浩司吉川
Original assignee: DENSAN SYSTEM CO Ltd; Softopia Japan Foundation
Current assignee: DENSAN SYSTEM CO Ltd; Softopia Japan Foundation
Priority date: 2003-04-25
Filing date: 2003-04-25
Publication date: 2004-11-18
Anticipated expiration: 2023-04-25
Also published as: JP4054713B2

Abstract

PROBLEM TO BE SOLVED: To automatically detect whether an examinee is puzzled merely by observing his/her line of sight.
SOLUTION: This puzzlement detection system 1 presents questions, including unanswerable traps, to an examinee on a display 8 and has him/her input the answers with a keyboard 9 or a mouse 10. Light sources P1 and P2 showing prescribed figure patterns are placed in the examinee's visual field. The images of these light sources formed on the corneal surface are captured by an eye detection camera 12 and analyzed by a computer 2 to detect the gaze position at prescribed time intervals. The gaze speed is obtained from the changes in gaze position, and a series of such values forms the gaze speed history data. Puzzlement is detected by pattern matching between the gaze speed history data and puzzlement patterns learned beforehand by a neural network.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

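[0001]
[Technical Field of the Invention]
The present invention relates to a puzzlement detection method and program. More particularly, it relates to a puzzlement detection method that automatically detects a state of puzzlement solely by observing a person's gaze.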
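[0002]
[Prior Art]
Attempts have long been made to detect human emotions and similar states through external observation. One use of such detection is improving a man-machine interface, by picking out the operations that users find confusing and identifying the problems behind them. Conventional methods detect various data, such as face position, arm movement, voice, muscle changes, perspiration, heart rate, blood pressure, or facial expression, and estimate the subject's emotions from the detected data. A single type of detected data did not give sufficiently accurate estimates, however, so many methods estimated emotions from several types of detected data.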
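[0003]
A gaze information analysis device such as the one shown in Patent Document 1 has therefore been proposed. In this device:
- an eye movement detection device detects eye movements;
- an analysis device analyzes the detected time-series changes of the eyeball in the frequency domain;
- a display content analysis device analyzes the content of the displayed image input from an image input unit;
- an integrated analysis unit combines the two results.

The device thereby obtains highly reliable data on the subject's psychological observation state and an objective evaluation of the image (see the abstract). It estimates the decision-making process of a subject evaluating an image from two sources: changes in eye-movement behavior along the time axis, and correlations between the presented image content and the eye movements. From these it obtains an objective evaluation of how a person assesses an image. Patent Document 2 discloses an example of a method of detecting the gaze position.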
[0004]
[Patent Document 1]
JP-A-6-162, Fig. 1
[Patent Document 2]
JP-A-6-319701
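[0005]
[Problems to Be Solved by the Invention]
This gaze information analysis device has a limitation. When the image content is known and the eye behavior is unknown, or when the eye behavior is known and the image content is unknown, it can estimate the unknown quantity with a predefined prediction function. It cannot, however, detect actual puzzlement.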
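[0006]
To solve this problem, an object of the present invention is to provide a puzzlement detection method that automatically detects a subject's puzzlement merely by observing the subject's gaze.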
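[0007]
[Means for Solving the Problems]
In the puzzlement detection method according to claim 1, a computer provided with input means and display means executes the following steps:
- a gaze position detection step of detecting the position of the subject's gaze with gaze detection means at predetermined time intervals;
- a gaze speed data generation step of generating the change in the detected gaze position as gaze speed data;
- a gaze speed history data storage step of storing a predetermined number of consecutively generated gaze speed data as gaze speed history data;
- a puzzlement determination step of determining whether the subject is in a puzzled state by comparing the gaze speed history data with a predetermined pattern.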
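[0008]
In the puzzlement detection method of this configuration, the gaze position detected in the gaze position detection step is converted into speed data in the gaze speed data generation step. It is then stored as gaze speed history data in the gaze speed history data storage step. Comparing this gaze speed history data with a predetermined pattern in the puzzlement determination step determines whether the subject is in a puzzled state.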
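[0009]
The puzzlement detection method according to claim 2 has the configuration of the method according to claim 1. In addition, the predetermined time interval in the gaze position detection step is not less than 1/200 second and not more than 1/15 second.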
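[0010]
This configuration has the effect of the method according to claim 1 and one more. With an interval of not less than 1/200 second and not more than 1/15 second, the subject's puzzlement is detected without a practically significant delay and without an excessive processing load on the computer. The most favorable results were obtained at an interval of 1/30 second.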
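[0011]
The puzzlement detection method according to claim 3 has the configuration of the method according to claim 1 or 2. In addition, in the gaze speed history data storage step, the gaze speed history data consists of at least 30 consecutively detected gaze speed data.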
[0012]
This configuration has the effects of the method according to claim 1 or 2. In addition, using at least 30 consecutively detected gaze speed data per history secures enough information for comparison with the predetermined pattern.
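[0013]
The puzzlement detection method according to claim 4 has the configuration of the method according to any one of claims 1 to 3. In addition, in the gaze speed history data storage step, the consecutively detected gaze speed data forming the gaze speed history data cover a total time of not less than 0.5 second and not more than 5 seconds.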
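[0014]
This configuration has the effects of the method according to any one of claims 1 to 3, plus the following:
- A total time of at least 0.5 second ensures reliable accuracy.
- A total time of at most 5 seconds raises processing speed and lowers the processing load.

Together these bounds give a time span that is both necessary and sufficient for judging a state of puzzlement.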
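[0015]
The puzzlement detection method according to claim 5 has the configuration of the method according to any one of claims 1 to 4. In addition, in the gaze speed history data storage step, the gaze speed history data is generated and stored every time gaze speed data is generated in the gaze speed data generation step.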
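[0016]
This configuration has the effects of the method according to any one of claims 1 to 4. In addition, the gaze speed history data is generated and stored every time new gaze speed data is generated. This lets the state of puzzlement be judged accurately without delaying detection.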
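[0017]
The puzzlement detection method according to claim 6 has the configuration of the method according to any one of claims 1 to 5. In addition, the puzzlement determination step uses pattern matching by a neural network.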
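[0018]
This configuration has the effects of the method according to any one of claims 1 to 5. In addition, neural-network pattern matching detects puzzled states quickly and accurately, even when the gaze speed history follows a complex, nonlinear pattern.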
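[0019]
The puzzlement detection method according to claim 7 has the configuration of the method according to claim 6. In addition, the neural network used in the puzzlement determination step is a feedforward hierarchical (layered) neural network whose node function is a sigmoid function.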
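[0020]
This configuration has the effect of the method according to claim 6. In addition, using a sigmoid function as the node function increases the accuracy with which the puzzled state is detected.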
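[0021]
The puzzlement detection method according to claim 8 has the configuration of the method according to claim 7 and further executes a learning step. This step generates the predetermined pattern used in the puzzlement determination step. It does so from gaze speed history data recorded while the subject answers a trap: an on-screen question with either no correct answer or several correct answers.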
[0022]
This configuration has the effect of the method according to claim 7. In addition, having the subject answer such a trap induces a state of puzzlement immediately and reliably. Learning this state increases the accuracy of the predetermined pattern used for comparison.
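[0023]
The puzzlement detection method according to claim 9 has the configuration of the method according to claim 8. In addition, the learning step learns only the gaze speed history data from a specific time within the trap display period. The gaze speed history data from other times within that period is not learned.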
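[0024]
This configuration has the effect of the method according to claim 8. In addition, the learning step uses only the gaze speed history data from a specific time within the trap display period, for example the first such data, and ignores the rest of that period. Training therefore uses only data in which puzzlement is very likely to be occurring. Excluding the less certain data improves the learning effect.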
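[0025]
The puzzlement detection method according to claim 10 has the configuration of the method according to any one of claims 1 to 9. In addition, the puzzlement determination step compares the gaze speed history data with the predetermined pattern against a predetermined threshold. The result is one of two values: match or mismatch.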
[0026]
This configuration has the effects of the method according to any one of claims 1 to 9. In addition, outputting the pattern-matching result as a binary match/mismatch value against a predetermined threshold reduces the amount of processing and speeds it up.
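[0027]
The puzzlement detection program according to claim 11 causes a computer provided with input means and display means to execute the following steps:
- a gaze position detection step of detecting the position of the subject's gaze with gaze detection means at predetermined time intervals;
- a gaze speed data generation step of generating the change in the detected gaze position as gaze speed data;
- a gaze speed history data storage step of storing a predetermined number of consecutively generated gaze speed data as gaze speed history data;
- a puzzlement determination step of determining whether the subject is in a puzzled state by comparing the gaze speed history data with a predetermined pattern.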
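[0028]
The puzzlement detection program of this configuration has the computer convert the gaze position detected in the gaze position detection step into gaze speed data in the gaze speed data generation step. The computer then stores that data as gaze speed history data in the gaze speed history data storage step. Finally, in the puzzlement determination step, the computer compares the gaze speed history data with a predetermined pattern to determine whether the subject is in a puzzled state.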
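[0029]
[Best Mode for Carrying Out the Invention]
This section describes, with reference to Figs. 1 to 11, a method of detecting a person's puzzlement executed by a puzzlement detection system 1 according to one embodiment of the present invention. In this embodiment, "puzzlement" means a state of bewilderment in which a person operating a computer terminal or similar device no longer knows how to proceed. Fig. 1 is a block diagram showing the configuration of the puzzlement detection system 1.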
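[0030]
(Hardware configuration)
The puzzlement detection system 1 includes a computer 2, which is a known personal computer with a CPU 3, a RAM 4, and a ROM 5. An HDD (Hard Disk Drive) 7 serving as external storage is connected to the computer 2 via an interface 6 that controls input and output. A display 8, a CRT (Cathode-Ray Tube) or LC (Liquid Crystal) panel, serves as output means and display means. Although not shown, a monitor for managing detection may be provided, and a printer or the like may be added as further output means. A keyboard 9 and a mouse 10 are provided as input means; other input means capable of serial input may also be used. The system further includes:
- a monitor camera 11 that captures the whole subject;
- a gaze detection camera 12 that captures a close-up of the subject's eyeball;
- light sources P1 and P2 whose figure patterns have geometric features and are imaged on the cornea of the eyeball E whose gaze is to be recognized.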
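[0031]
In Fig. 1, the computer 2 of the puzzlement detection system 1 is a single computer with one CPU 3. It may instead have several CPUs or consist of several computers. Neural-network processing in particular requires a large amount of computation, so ample processing capacity is desirable. In the inventors' experiments, for example, operation was verified with parallel processing on four personal computers running at about 3 GHz. The processing may also be split by function across several computers.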
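[0032]
Fig. 2 schematically shows the contents of the HDD 7. The HDD 7 stores programs such as an OS (operating system) 7a, the puzzlement detection program 7b of the present invention, and a gaze position detection program 7c. It also stores the following files:
- a neural network 7d that has learned the gaze speed history patterns indicating puzzlement;
- a temporary file 7e of detected gaze speed data;
- a file 7f of gaze speed history data generated from that data.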
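[0033]
The gaze detection camera 12 and the light sources P1 and P2 recognize the gaze so that gaze speed can be detected. They form part of the gaze detection means of the present invention. Various feature points are used for gaze recognition, such as the pupil, the iris, the corneal reflection image, and the lens reflection image. Of these, the corneal reflection image is very widely used because it is bright and easy to extract. The eye mark recorder is one gaze recognition technique that uses this feature point. It relies on the fact that the corneal reflection image moves as the eyeball rotates, provided three things stay fixed relative to one another: the eyeball's center of rotation, the observation system that views the eyeball, and the point light source. In practice, the observation system and the device carrying the light source are fixed to the head. If the corneal epithelium is modeled as part of a sphere, the center of curvature of the cornea can be found from the positions of several corneal reflection images.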
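[0034]
Gaze recognition devices such as the one disclosed in Patent Document 2 can recognize the gaze without any detection device worn on the head. This embodiment uses that method as its example of gaze recognition; a detailed description is omitted. The method works in three stages:
1. Light sources P1 and P2, placed at both sides of the subject's field of view, illuminate the eyeball. Their figure patterns have geometric features suited to imaging on the cornea of the eyeball E.
2. An imaging device captures the figure patterns imaged on the cornea, and the center of corneal curvature of the eyeball E is calculated geometrically from the features and positions of those patterns.
3. The gaze is recognized from the resulting corneal-curvature-center information.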
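[0035]
This paragraph outlines the learning used to detect the subject's puzzlement in this embodiment. Fig. 3 shows the method of detecting the gaze position. First, the subject sits facing the display 8 and waits, with the keyboard 9 and mouse 10 ready to use. As shown in Fig. 3, the light sources P1 and P2 are placed in front of the subject on both sides, at roughly the height of the eyeball E, within the subject's field of view. Light source P1 has a cross-shaped pattern and light source P2 a circular one. The gaze detection camera 12 records video of these patterns as imaged on the subject's eyeball. The camera is, for example, a CCD video camera controlled by the computer 2, and its video data is recorded on the HDD 7 together with time information.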
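[0036]
The computer 2 recognizes the features and positions of the figure patterns in the captured video and calculates the center of corneal curvature of the eyeball E geometrically. The gaze is then recognized from this center together with other eye information, such as the position of the pupil center. In this embodiment, the gaze is recognized in this way at predetermined intervals, for example every 1/30 second. This procedure corresponds to the gaze position detection step of the present invention, in which gaze detection means detect the position of the subject's gaze at predetermined time intervals.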
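[0037]
Through trial and error, the inventors found 1/30 second to be the most preferable interval for this gaze position detection step. Any interval from 1/200 second to 1/15 second inclusive is acceptable.
- Longer than 1/15 second: gaze speed can still be measured, but the gaze may move back and forth within a single interval. Such movement lowers the accuracy of the detected speed, which is undesirable.
- Shorter than 1/200 second: the processing load is heavy for present-day computers. As computer performance improves, shorter intervals will of course become feasible.

Eye movements include both large movements that shift the gaze and vibration-like movements that keep the gaze direction. A shorter interval is therefore not necessarily better for detecting puzzlement.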
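[0038]
The gaze speed is measured as the distance between the current position of the center of corneal curvature and the position recognized 1/30 second earlier. Alternatively, the angle between the current gaze direction and the direction recognized 1/30 second earlier is measured as the angular velocity of the gaze. The result is one frame of gaze speed data. This procedure corresponds to the gaze speed data generation step of the present invention, which generates the change in the detected gaze position as gaze speed data. Because the measurement interval is fixed at 1/30 second, both the measured angle and the measured distance are proportional to the gaze speed.
In the present invention, "gaze speed" may be either of two quantities: the angular velocity of the center of corneal curvature about the eyeball center per unit time, or the distance that center moves. Changes in gaze speed are judged as patterns, so it is enough to capture them as a time series. The eyeball is roughly spherical, so both quantities follow almost the same trend and comparable accuracy can be expected. Observed values may also deviate from the theoretical true values for technical reasons, such as the difficulty of determining the exact shape of the eyeball. Even so, the present invention judges by pattern matching of speed changes, and every pattern being compared contains the same kind of error. The raw, unprocessed data can therefore be used as is, with no correction. As long as the observation conditions stay the same, learning and testing with subjects can be kept extremely simple.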
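The speed calculation in [0038] can be sketched in Python. This is a minimal sketch, assuming the gaze tracker already delivers one value per frame: either the 3-D position of the center of corneal curvature or a gaze direction vector. The function names `distance_speeds` and `angular_speeds` are hypothetical and do not come from the patent.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FRAME_INTERVAL_S = 1.0 / 30.0  # preferred sampling interval in the text


def distance_speeds(centers: Sequence[Vec3], dt: float = FRAME_INTERVAL_S) -> List[float]:
    """Gaze speed from how far the corneal curvature center moves per frame."""
    return [math.dist(a, b) / dt for a, b in zip(centers, centers[1:])]


def angular_speeds(directions: Sequence[Vec3], dt: float = FRAME_INTERVAL_S) -> List[float]:
    """Angular gaze speed (rad/s) from the angle between successive gaze directions."""
    speeds = []
    for u, v in zip(directions, directions[1:]):
        norm = math.hypot(*u) * math.hypot(*v)
        if norm == 0.0:
            raise ValueError("gaze direction vectors must be non-zero")
        cos_theta = sum(x * y for x, y in zip(u, v)) / norm
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        speeds.append(math.acos(cos_theta) / dt)
    return speeds
```

The interval is constant, so dividing by `dt` multiplies every value by the same factor. As the text notes, the raw distances or angles could therefore go to pattern matching unchanged.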
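[0039]
The gaze speed data detected in this way is then stored on the HDD 7 as gaze speed history data. "Gaze speed history data" is a "pattern": a block of 50 consecutive frames of gaze speed data. Frames arrive every 1/30 second, so one pattern spans 50/30, or about 1.7 seconds.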
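[0040]
The pattern is built from the 50 most recent consecutive frames. The window shifts by one frame every 1/30 second as new gaze speed data arrives. This procedure corresponds to the gaze speed history data storage step of the present invention, which stores a predetermined number of consecutively generated gaze speed data as gaze speed history data. Preferably, the gaze speed history data is stored each time the gaze speed data generation step produces a new value. To reduce the processing load, it may instead be generated every second frame or at longer intervals.
Two terms are used below:
- "Comparison pattern": a pattern obtained from the subject whose puzzlement is to be detected.
- "Puzzlement pattern": a pattern learned from subjects in whom a puzzled state has been induced. The puzzlement pattern is a conceptual entity stored by the neural network; it does not exist explicitly as gaze speed history data.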
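The shifting 50-frame window of [0039] and [0040] can be sketched as a ring buffer. The sketch assumes per-frame speeds arrive as an iterable of floats. The generator `history_windows` and its `stride` parameter are illustrative names, not part of the patent: `stride=1` yields a new pattern every frame, as claim 5 prefers, and `stride=2` yields one every other frame.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

WINDOW_FRAMES = 50  # 50 frames x 1/30 s ≈ 1.7 s, as in the embodiment


def history_windows(speeds: Iterable[float], size: int = WINDOW_FRAMES,
                    stride: int = 1) -> Iterator[List[float]]:
    """Yield gaze speed history patterns made of the `size` most recent frames.

    stride=1 emits a new pattern every frame; larger strides skip frames
    to lighten the processing load.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    buf: Deque[float] = deque(maxlen=size)  # the oldest frame drops out automatically
    frames_since_full = 0
    for speed in speeds:
        buf.append(speed)
        if len(buf) < size:
            continue  # not enough frames yet for a full pattern
        if frames_since_full % stride == 0:
            yield list(buf)
        frames_since_full += 1
```

A `deque` with `maxlen` discards the oldest frame on each append, so every yielded pattern is exactly the latest `size` frames. No explicit index arithmetic is needed.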
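[0041]
This embodiment uses 50 frames as the preferred number of consecutive gaze speed data in the gaze speed history data. Sufficient accuracy is obtained with as few as 30. Likewise, the preferred length of one pattern in this embodiment is about 1.7 seconds. In the gaze speed history data storage step, any length from 0.5 second to 5 seconds is preferable:
- Shorter than 0.5 second: depending on the sampling interval, the pattern tends to contain too little data.
- Longer than 5 seconds: the state of puzzlement may resolve within the window, which can actually lower accuracy.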
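The preferred ranges in claims 2 to 4 interact, and a small check makes this visible. The helper `config_problems` below is hypothetical. It uses `fractions.Fraction` so that boundary values such as 1/200 s compare exactly instead of suffering floating-point rounding.

```python
from fractions import Fraction
from typing import List

MIN_INTERVAL, MAX_INTERVAL = Fraction(1, 200), Fraction(1, 15)  # claim 2 (s)
MIN_FRAMES = 30                                                  # claim 3
MIN_SPAN, MAX_SPAN = Fraction(1, 2), Fraction(5)                 # claim 4 (s)


def config_problems(interval_s: Fraction, frames: int) -> List[str]:
    """List which of the preferred ranges in claims 2-4 a setup falls outside."""
    problems = []
    if not MIN_INTERVAL <= interval_s <= MAX_INTERVAL:
        problems.append("sampling interval outside 1/200 s to 1/15 s")
    if frames < MIN_FRAMES:
        problems.append("fewer than 30 frames per pattern")
    span = interval_s * frames  # time covered by one gaze speed history pattern
    if not MIN_SPAN <= span <= MAX_SPAN:
        problems.append(f"pattern spans {float(span):.2f} s, outside 0.5 s to 5 s")
    return problems
```

At the fastest permitted interval of 1/200 s, 30 frames cover only 0.15 s, so at least 100 frames are needed to reach 0.5 s. At the slowest interval of 1/15 s, at most 75 frames fit within 5 s. The embodiment's 50 frames at 1/30 s satisfies all three ranges.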
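[0042]
(Displaying traps) This paragraph describes learning with "traps". Traps are used to obtain the "puzzlement patterns", that is, the gaze speed history patterns of puzzled subjects that the neural network memorizes. In the puzzlement determination step, the puzzlement pattern is the predetermined pattern that serves as the criterion for puzzlement. It is compared with the comparison pattern by pattern matching, and its features are acquired inside the neural network through learning.
During learning, the subject either performs operations that invite errors or answers traps. A trap asks the subject to select, on the screen, an answer to a "question with no correct answer" or a "question with several correct answers". The gaze speed history data recorded in the puzzled state is then input to the neural network as the "teacher" signal. This procedure corresponds to the learning step of the present invention. Throughout, the subject watches the computer monitor and enters keystrokes and mouse input with the keyboard 9 and mouse 10, following instructions that a puzzlement-inducing program shows on the display 8. In this embodiment, the puzzlement-inducing screens shown by this program are called "traps", and the program is called the "trap program".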
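[0043]
(Trap T0) The trap program shows, for example, screens that imitate application installation screens. The screens follow one another and resemble the installation screens of the OS the subject normally uses. They use the OS's common layout; for example, the "Next", "Back", and "Cancel" buttons appear in the standard arrangement for that OS. In trap T0, however, the "Next" button T0a and the "Back" button T0b are swapped, as shown in Fig. 4. Several message boxes with the correct layout appear first. Trap T0 then looks the same as those message boxes overall, except that the operation buttons are arranged unusually. The subject therefore becomes puzzled, either on nearly making an operating error or on actually making one. A puzzled state (puzzlement present) is thus reliably induced in a subject who was in the normal state (no puzzlement).
(Trap T1) Trap T1, shown in Fig. 5, displays a "target value" T1a and a "current value" T1b. Clicking button T1c or button T1d adds the selected number to the current value T1b, and the task is to reach the target value. However, neither "+2" nor "-2" turns the current value 9 into the target value 10, so the subject becomes puzzled. The setup is very easy to understand, and the subject quickly sees that the task cannot be solved under the given conditions. Puzzlement therefore sets in promptly, and its onset time is easy to control.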
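Why trap T1 cannot be solved can be checked mechanically. The hypothetical helper `reachable` below finds every value reachable from the current value within `max_clicks` button presses. Each press changes the value by an even amount, so an odd current value can never reach an even target, however many presses are allowed.

```python
from typing import Iterable


def reachable(current: int, target: int, steps: Iterable[int] = (2, -2),
              max_clicks: int = 20) -> bool:
    """True if `target` can be reached from `current` in at most `max_clicks` presses."""
    steps = tuple(steps)
    seen = {current}
    frontier = {current}
    for _ in range(max_clicks):
        # Values first reached with one more press.
        frontier = {v + s for v in frontier for s in steps} - seen
        seen |= frontier
    return target in seen
```

`reachable(9, 10)` returns `False` for any `max_clicks`, while `reachable(8, 10)` returns `True`. This is the parity argument that makes T1 unanswerable.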
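[0044]
(Trap T2) In trap T2, shown in Fig. 6, a series of calculation problems that each have a correct choice is displayed, followed by trap T2 itself. Problem T2a asks "2 + 2 = ?", which can be solved instantly. The choices offered are "5" (T2b), "6" (T2c), and "7" (T2d); none corresponds to the correct answer "4". After a certain time, the correct answer appears among the choices. This reliably induces a puzzled state in the subject.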
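[0045]
(Trap T3) In trap T3, shown in Fig. 7, instruction T3a reads "Select the following number: 28". Answers T3b, T3c, and T3d are all "28", so every answer is correct. The subject has no basis for choosing one over the others, and a puzzled state is induced within a very short time.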
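[0046]
(Trap T4) In trap T4, shown in Fig. 8, questions are displayed in sequence, as in trap T1 of Fig. 5. The question should show buttons for adding to or subtracting from the current value T4b to reach the target value T4a. Those buttons are missing, so the question cannot be answered. This too induces a puzzled state in the subject immediately.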
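[0047]
With traps T0 to T4 as exemplified above, showing a trap at a set time immediately induces a puzzled state in the subject. The gaze can therefore be sampled reliably in the puzzled state. Gaze speed history data collected while the subject is certainly puzzled are input to the network and learned as an appropriate "teacher".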
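[0048]
(Puzzlement other than traps) The traps above reliably induce a puzzled state at a predetermined time, which improves learning. Learning can also do without traps. It can use reproductions of operations that empirically cause puzzlement even when performed correctly, that is, operations that clearly produce puzzlement. Examples include a display whose meaning is hard to grasp, an operation with conspicuously frequent errors, or a point where subjects stop and ponder.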
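[0049]
Traps T0 to T4 reliably puzzle the subject within a short time, but the puzzled state does not necessarily last for the whole time a trap is displayed. After trap T0 to T4 or the like appears, there is some lead time before the subject grasps it and becomes puzzled; this lead time varies by subject and by trap type. There is a further delay before the subject understands the situation and the puzzlement resolves.
In the learning step, therefore, one specific time within the trap display period is sampled as the puzzled state, for example 3 seconds after the trap appears. Data from other times are not sampled as training data for either state, because it is hard to tell which state they represent. Only appropriate "teacher" data are learned, which improves learning accuracy.
Fig. 9 is a schematic diagram of neural-network learning in the learning step:
- The portion EM labeled "History of eye movement speed" shows the gaze speed.
- The shaded portion is the time during which trap t is displayed.
- The row labeled "000...1" below it shows the sampling timing.
- Samples S0, taken before trap t appears, are labeled "0" for the normal state.
- The first sample S1 after trap t appears is learned as a "1" sample for the puzzled state.
- The following samples S2 are labeled neither "0" nor "1".
- After trap t disappears, samples S3 are labeled "0" for the normal state.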
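The sampling rule in [0049] can be written as a labeling function that turns a recording into teacher data. This sketch rests on three assumptions: each sample time marks the end of one gaze speed history pattern, sample times are sorted in ascending order, and trap display periods are known as (start, end) pairs in seconds. The 3-second default follows the example in the text. Fig. 9 instead labels the first sample after onset, which corresponds to `delay_s=0`. The name `label_samples` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def label_samples(sample_times: Sequence[float],
                  traps: Sequence[Tuple[float, float]],
                  delay_s: float = 3.0) -> List[Optional[int]]:
    """Teacher labels: 1 = puzzled, 0 = normal, None = excluded from training.

    For each trap shown over [start, end), only the first sample taken at or
    after start + delay_s is labeled 1; every other sample during that trap is
    excluded. Samples outside all traps are labeled 0.
    """
    labels: List[Optional[int]] = []
    used = set()  # indices of traps whose single "1" sample was already taken
    for t in sample_times:
        label: Optional[int] = 0
        for i, (start, end) in enumerate(traps):
            if start <= t < end:
                if i not in used and t >= start + delay_s:
                    label = 1
                    used.add(i)
                else:
                    label = None
                break
        labels.append(label)
    return labels
```

Samples labeled `None` are dropped before training, so the network never sees the ambiguous stretches at the start and end of each trap.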
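[0050]
(Puzzlement detection) This paragraph describes how the neural network trained as above detects a subject's puzzlement. As described earlier, the subject performs the task of interest while the gaze is being detected. Typical tasks involve a man-machine interface, such as operating application software on a personal computer, using a bank ATM (Automated Teller Machine), or using another machine's operating console. Such tasks are typical for two reasons: freedom from puzzlement during operation bears directly on the quality of the interface, and their operation histories are easy to replay with time information. The tasks are not limited to these; reactions to printed documents, photographs, or video can also be examined. Because this embodiment recognizes the gaze, the object the subject is viewing can also be shown on screen with the eye point overlaid on it. The monitor camera 11 can additionally record the subject's facial expressions and movements as video.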
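[0051]
In this embodiment, the monitor camera 11 records digital video of the subject while the gaze speed history data is being generated. The video is recorded with time information, which is used to superimpose the time on the screen and to synchronize the video with the gaze speed history data. The video shows the subject's attitude, facial expressions, and so on, so their relation to the onset of puzzlement can be observed.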
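[0052]
A multi-screen layout is also preferable. It shows on one screen the display the subject is viewing, the subject's facial expression, the subject's eyeball, the observed values and patterns, and the indication of puzzlement onset.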
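[0053]
While the task is being performed, the neural network compares the gaze speed history data by pattern matching to determine whether the subject is in a puzzled state. This procedure corresponds to the puzzlement determination step of the present invention. If the network judges that the patterns match, it outputs "1" and a puzzled state is judged to have appeared. If it judges that they do not match, it outputs "0". In other words, in this embodiment the comparison between the gaze speed history and the predetermined pattern yields one of two values relative to a predetermined threshold: "1" for a match or "0" for a mismatch.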
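[0054]
In this embodiment, the puzzlement determination step judges puzzlement with a neural network. Pattern matching is therefore more accurate than with other methods, and the onset of puzzlement is captured reliably. The network is a hierarchical (layered) neural network trained by a supervised learning algorithm. Input signals are processed in a feedforward manner, flowing only forward through the network, and the node function is a sigmoid function (see Fig. 11).
The inventors ran experiments with traps T0 to T4 and used the resulting data to build a neural-network model for detecting puzzled states. Training on the experimental data produced a model that converged. Fig. 10 shows the recognition results of the trained network: within the full gaze speed history WH, the portions PT where the learned traps were shown are detected as puzzled states, as indicated by the detection results DR. The model was also tested on the experimental data with 5% noise added, and it identified every puzzlement point.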
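The decision stage of [0053] and [0054] can be sketched in plain Python: a feedforward layered network with sigmoid nodes whose output is thresholded to 1 or 0. The patent gives neither layer sizes nor weights. The single hidden layer, its size of 8, the seeded random initialization, and the 0.5 threshold below are all assumptions. The weights are untrained and only illustrate how a 50-frame pattern flows through the network.

```python
import math
import random
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic node function 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class FeedForwardNet:
    """Input -> hidden -> output layers; signals flow forward only."""

    def __init__(self, n_in: int = 50, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # untrained, illustrative weights
        self.n_in = n_in
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def output(self, window: Sequence[float]) -> float:
        """Network output in (0, 1) for one gaze speed history pattern."""
        if len(window) != self.n_in:
            raise ValueError(f"expected {self.n_in} speed values, got {len(window)}")
        hidden = [sigmoid(sum(w * x for w, x in zip(row, window)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)

    def decide(self, window: Sequence[float], threshold: float = 0.5) -> int:
        """Binary result as in claim 10: 1 = matches puzzlement pattern, 0 = no match."""
        return 1 if self.output(window) >= threshold else 0
```

Thresholding turns the continuous sigmoid output into the two-valued match/mismatch result. This matches the text's rationale of keeping downstream processing light.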
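[0055]
To validate the resulting neural-network model, the inventors ran a validation experiment with 10 subjects as unknown data, using the basic pattern screen of trap T2. The validation data were input to the model to detect puzzled states. Fig. 12 shows the results:
- The horizontal axis is time.
- The frames in the upper row mark when traps t1 to t10, the puzzlement-pattern screens, were displayed.
- The lines extending downward in the lower row mark where the model judged a puzzled state.

As Fig. 12 shows, the model detected a puzzled state for every unknown trap except t10, a detection rate of 90% (9 of 10).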
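[0056]
The puzzlement detection method of the above embodiment has the following features.
・The method detects a subject's puzzlement by observing the gaze alone, with no other factors. It therefore detects puzzlement very easily and without burdening the subject. Accurate detection of this kind objectively and reliably reveals puzzled states that ordinary observation cannot pick up, and so identifies problem areas in a man-machine interface.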
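[0057]
・A neural network judges the puzzled state, so complex gaze speed history data is judged appropriately and detection is highly accurate. In particular, the network of this embodiment is a hierarchical neural network with a supervised learning algorithm suited to pattern matching. Its input signals flow only forward through the network (feedforward), and its node function is a sigmoid function. Highly accurate judgments can therefore be made on a personal computer or similar hardware.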
[0058]
・Traps T0 to T4 are used to train the neural network, which makes highly accurate learning possible.
The above embodiment may be modified as follows.
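[0059]
○ In this embodiment, the puzzlement determination step uses pattern matching with a feedforward neural network. It may also use the backpropagation (error back-propagation) learning algorithm. Various other techniques may also be adopted or adapted, such as:
- a Boltzmann machine, or its simplified form, the mean-field approximation learning machine;
- an RBF (radial basis function) network;
- learning vector quantization;
- fuzzy ARTMAP;
- the cognitron.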
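[0060]
○ In this embodiment, the neural network in the puzzlement determination step uses a sigmoid node function to process input signals. A threshold function or similar may be used instead. A "fluctuation" (random perturbation) may also be added to the sigmoid or threshold function. A linear function that outputs the weighted input sum unchanged is also possible. For a Boltzmann machine, the slope of the function is made gradually steeper.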
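[0059] mentions error backpropagation as a learning algorithm. Below is a minimal pure-Python sketch of backpropagation for a one-hidden-layer sigmoid network. It trains on (window, label) pairs with a cross-entropy loss and per-sample gradient descent. The loss, learning rate, hidden size, and epoch count are assumptions, not values from the patent. Samples excluded in the learning step should simply be left out of `data`.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic node function, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_backprop(data: Sequence[Tuple[Sequence[float], int]], n_hidden: int = 8,
                   epochs: int = 200, lr: float = 0.5,
                   seed: int = 0) -> Callable[[Sequence[float]], float]:
    """Train an input -> hidden -> output sigmoid net by error backpropagation.

    `data` holds (gaze speed history window, teacher label 1 or 0) pairs.
    Returns a function mapping a window to an output in (0, 1).
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for k in order:
            x, t = data[k]
            h, y = forward(x)
            d_out = y - t  # output error for sigmoid + cross-entropy loss
            for j in range(n_hidden):
                # Propagate the error back through w2[j] before updating it.
                d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid * x[i]
            b2 -= lr * d_out

    return lambda x: forward(x)[1]
```

The hidden-layer update relies on the sigmoid derivative h(1 - h). Replacing the node function with a hard threshold, one alternative listed in [0060], would therefore need a different learning rule.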
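[0061]
[Effects of the Invention]
As described in detail above, the present invention automatically and appropriately detects a subject's puzzlement merely by observing the subject's gaze.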
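[Brief Description of the Drawings]
[Fig. 1] Block diagram showing the configuration of the puzzlement detection system 1.
[Fig. 2] Diagram schematically showing the contents of the HDD 7.
[Fig. 3] Diagram showing the method of detecting the gaze position.
[Fig. 4] Diagram showing trap T0.
[Fig. 5] Diagram showing trap T1.
[Fig. 6] Diagram showing trap T2.
[Fig. 7] Diagram showing trap T3.
[Fig. 8] Diagram showing trap T4.
[Fig. 9] Schematic diagram showing neural-network learning.
[Fig. 10] Diagram showing, after learning, the relationship among the full gaze speed history, the trap positions, and puzzlement detection.
[Fig. 11] Diagram showing an example of a sigmoid function.
[Fig. 12] Diagram showing the results of detecting unknown puzzled states by inputting validation data into the model.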
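[Description of Reference Numerals]
1: puzzlement detection device, 2: computer, 3: CPU, 4: RAM, 5: ROM, 6: interface, 7: HDD, 8: display, 9: keyboard, 10: mouse, 11: monitor camera, 12: gaze detection camera, P1, P2: light sources, T0 to T4: traps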
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a confused detection method and program, and more particularly, to a confused detection method for automatically detecting a confused state only by observing a line of sight.
[0002]
[Prior art]
Conventionally, attempts have been made to detect human emotions and the like by external observation. It is used, for example, to pick up an operation in which the user feels confused in operation and to extract a problem, for example, to improve a man-machine interface. As the method, various data such as face position, arm movement, voice, muscle change, sweating, heart rate, blood pressure or facial expression are detected, and the subject's emotion is estimated based on the detected data. Was. However, the accuracy of estimation was not improved with only one type of detection data, and in many cases, emotions were estimated based on a plurality of detection data.
[0003]
Therefore, a line-of-sight information analyzing device as shown in Patent Document 1 has been proposed. In this gaze information analysis device, the eye movement is detected by the eye movement detection device, and the time series change of the eyeball detected by the analysis device is analyzed in the frequency domain, and the input image is displayed from the image input unit. The contents of the image are analyzed by the display content analysis device, and the two are integrated by the integrated analysis unit to obtain highly reliable data on the psychological observation state of the subject and the objective evaluation of the image. (See summary). With this gaze information analysis device, the decision-making process of the subject who wants to evaluate the image is correlated with the change in the behavior seen on the time axis of the eye movement and the presented image content and the eye movement. And an objective evaluation result for a human image can be obtained. Note that Patent Document 2 discloses an example of a method of detecting the position of a line of sight.
[0004]
[Patent Document 1]
JP-A-6-162 FIG.
[Patent Document 2]
JP-A-6-319701
[0005]
[Problems to be solved by the invention]
However, in this gaze information analysis device, when the image content is known and the behavior of the eyeball is unknown, or when the behavior of the eyeball is known and the image content is unknown, each unknown amount is calculated using a prediction function described in advance. There was a problem that even if it could be estimated, it could not be detected even in actual puzzle.
[0006]
In order to solve the above-mentioned problems, an object of the present invention is to provide a confused detection method that can automatically detect confused subjects only by observing the line of sight.
[0007]
[Means for Solving the Problems]
In the embarrassment detection method according to claim 1, a computer including an input unit and a display unit detects a gaze position of the subject by using a gaze detection unit at predetermined time intervals, and a position of the detected gaze. Gaze speed data generation step of generating a change in eye gaze speed data, a gaze speed history data storage step of storing a predetermined number of continuously generated gaze speed data as gaze speed history data, and the gaze speed history The gist of the present invention is to execute a confused judgment step of judging whether or not the subject is confused by comparing the data with a predetermined pattern.
[0008]
In the embarrassment detection method according to this configuration, the gaze position detected in the gaze position detection step is generated as speed data in the gaze speed data generation step, and stored as gaze speed history data in the gaze speed history data storage step. . By comparing the line-of-sight speed history data with a predetermined pattern in the embarrassment determination step, it is possible to determine whether or not the subject is confused.
[0009]
In the embarrassment detection method according to claim 2, in addition to the configuration of the embarrassment detection method according to claim 1, in the step of detecting the line-of-sight position, the predetermined time interval is not less than 1/200 second and not more than 1/15. The gist is that there is.
[0010]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to claim 1, by setting the predetermined time interval in the line-of-sight position detection step to be at least 1/200 second and not more than 15 times, There is an effect that the detection of the embarrassment of the subject can be processed without delay of a practical time and without increasing the processing amount of the computer. Particularly at 1/30 second, the most favorable result could be obtained.
[0011]
In the embarrassment detection method according to claim 3, in addition to the configuration of the embarrassment detection method according to claim 1 or 2, in the step of storing the line-of-sight speed history data, the line of sight continuously detected as the line-of-sight speed history data The gist is that the predetermined number of speed data is at least 30 or more.
[0012]
In the embarrassment detection method according to this configuration, in addition to the effects of the embarrassment detection method according to claim 1 or 2, the predetermined number of line-of-sight velocity data continuously detected as the line-of-sight velocity data is at least 30 or more. Thus, there is an effect that an amount of information necessary for comparison with a predetermined pattern can be secured.
[0013]
In the embarrassment detection method according to claim 4, in addition to the configuration of the embarrassment detection method according to any one of claims 1 to 3, in the step of storing the line-of-sight speed history data, the line-of-sight speed The gist is that the total time length of the detected line-of-sight speed data is not less than 0.5 seconds and not more than 5 seconds.
[0014]
In the embarrassment detection method according to this configuration, in addition to the effects of the embarrassment detection method according to any one of claims 1 to 3, detection is continuously performed as the gaze speed history data in the step of storing the gaze speed history data. The reliability of accuracy is secured by setting the total time length of the line-of-sight speed data to be 0.5 seconds or more, and the processing speed is improved and the processing load is reduced by setting the total time length to 5 seconds or less. There is an effect that it can be necessary and sufficient time to judge the state of.
[0015]
In the embarrassment detection method according to claim 5, in addition to the configuration of the embarrassment detection method according to any one of claims 1 to 4, in the step of storing the gaze velocity history data, the step of generating the gaze velocity data is performed. The gist is that the line-of-sight speed history data is generated and stored each time the line-of-sight speed data is generated.
[0016]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to any one of claims 1 to 4, the timing of generating and storing the line-of-sight speed history data in the step of storing the line-of-sight speed history data Is performed every time the line-of-sight speed data is generated in the line-of-sight speed data generation step, there is an effect that the state of embarrassment can be accurately determined without delay of the detection time.
[0017]
In the embarrassment detection method according to claim 6, in addition to the configuration of the embarrassment detection method according to any one of claims 1 to 5, the embarrassment determination step is based on pattern matching using a neural network. Make a summary.
[0018]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to any one of claims 1 to 5, the embarrassment determination step is performed by pattern matching using a neural network, thereby achieving a nonlinear There is an effect that high-speed and accurate detection can be performed even in a confused state having a line-of-sight speed history of a complicated pattern.
[0019]
In the embarrassment detection method according to claim 7, in addition to the configuration of the embarrassment detection method according to claim 6, in the embarrassment determination step, the neural network is a feed-forward hierarchical neural network, and the node function is The point is that a sigmoid function is used.
[0020]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to claim 6, there is an effect that the node function is a sigmoid function, so that the detection accuracy of the embarrassed state can be increased.
[0021]
In the embarrassment detection method according to claim 8, in addition to the configuration of the embarrassment detection method according to claim 7, in the embarrassment determination step, the predetermined pattern to be compared is such that the subject has no correct answer or the correct answer is duplicated. The gist of the present invention is to further execute a learning step of generating based on the line-of-sight speed history data in a state in which a trap is selected to select an answer on the screen.
[0022]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to claim 7, in the embarrassment determination step, the predetermined pattern to be compared has no correct answer or the correct answer overlaps with the subject. The effect of allowing the subject to immediately and surely derive a state of embarrassment by letting the subject answer the trap that allows the user to select the answer on the screen by the question, and learning this state can improve the accuracy of the predetermined pattern to be compared. There is.
[0023]
In the embarrassment detection method according to claim 9, in addition to the configuration of the embarrassment detection method according to claim 8, in the learning step, only the eye-gaze speed history data at a specific time during the time when the trap is displayed is used. The gist is that learning is not performed and the line-of-sight speed history data at other times during the time when the trap is displayed is not learned.
[0024]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to claim 8, in the learning step, a specific time during the time when the trap is displayed, for example, only the first line-of-sight speed history data is used. Learning by learning other eye-gaze speed history data during the time the trap is displayed, learning using only high-accuracy data that actually causes confusion, and learning by eliminating low-accuracy data There is an effect that the effect can be enhanced.
[0025]
In the embarrassment detection method according to claim 10, in addition to the configuration of the embarrassment detection method according to any one of claims 1 to 9, in the embarrassment determination step, it is determined whether or not the subject is in a confused state. The gist is that the comparison between the line-of-sight speed history data and the predetermined pattern is performed using two values of coincidence or non-coincidence based on a predetermined threshold value.
[0026]
In the embarrassment detection method according to this configuration, in addition to the effect of the embarrassment detection method according to any one of claims 1 to 9, the pattern of the line-of-sight speed history data pattern and the predetermined pattern in the embarrassment determination step is provided. By outputting the comparison by matching as two values of matching or non-matching based on a predetermined threshold value, there is an effect that the processing amount can be reduced and the processing speed can be increased.
[0027]
The embarrassment detection program according to claim 11, wherein a computer having input means and display means has a gaze position detection step of detecting the gaze position of the subject by the gaze detection means at predetermined time intervals; Gaze speed data generation step of generating a change in eye gaze speed data, a gaze speed history data storage step of storing a predetermined number of continuously generated gaze speed data as gaze speed history data, and the gaze speed history The gist of the present invention is to compare the data with a predetermined pattern to execute a confused determination step of determining whether the subject is in a confused state.
[0028]
In the embarrassment detection program according to this configuration, the computer causes the computer to generate the gaze position detected in the gaze position detection step as gaze speed data in the gaze speed data generation step, and stores the gaze speed history in the gaze speed history data storage step. Store as data. By comparing the line-of-sight speed history data with a predetermined pattern in the confused determination step, it is possible to execute a process of determining whether or not the subject is in a confused state.
[0029]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a method for detecting embarrassment of a person performed by the embarrassment detection system 1 according to an embodiment of the present invention will be described with reference to FIGS. 1 to 11. Here, in the present embodiment, “embarrassed” refers to a puzzled state in which a person cannot understand how to operate the computer terminal or the like when operating the computer terminal or the like. FIG. 1 is a block diagram illustrating a configuration of the embarrassment detection system 1.
[0030]
(Hardware configuration)
The embarrassment detection system 1 includes a computer 2 including a known personal computer including a CPU 3, a RAM 4, and a ROM 5. An HDD (Hard Disk Drive) 7 as an external storage device is connected to the computer 2 via an interface 6 for controlling input and output. In addition, a display 8 is provided as an output unit including a CRT (Cathode-Ray Tube) or an LC (Liquid Crystal) and a display unit. Although not shown, a monitor for managing detection may be provided, and a printer or the like may be provided as output means. Further, a keyboard 9 and a mouse 10 as input means are provided. Note that other input means capable of serial input may be used. Also, a monitor camera 11 for capturing the entire subject, a gaze detection camera 12 for projecting an eyeball of the subject up, and a graphic pattern having a geometric feature for forming an image on the cornea of the eyeball E to be recognized. Light sources P1 and P2.
[0031]
Although the computer 2 constituting the puzzlement detection system 1 of this embodiment is shown in FIG. 1 as a single computer 2 with one CPU 3, it may instead comprise a plurality of CPUs or a plurality of computers. In particular, neural network processing is computationally heavy, so sufficient processing capacity is desirable. For example, in an experiment conducted by the inventors, operation was confirmed using parallel processing on four personal computers with clock speeds of about 3 GHz. The processing may also be distributed among several computers by function.
[0032]
FIG. 2 schematically shows the contents of the HDD 7. The HDD 7 stores an OS (operating system) 7a and programs including the puzzlement detection program 7b and the gaze position detection program 7c of the present invention. It also stores files including a neural network 7d that has learned patterns of gaze speed history data indicating puzzlement, a temporary storage file 7e for detected gaze speed data, and a file 7f of gaze speed history data generated from the neural network 7d.
[0033]
The gaze detection camera 12 and the light sources P1 and P2 recognize the line of sight so that the gaze speed can be detected; they constitute part of the gaze detection means of the present invention. Various feature points are used for gaze recognition, such as the pupil, the iris, the corneal reflection image, and the crystalline lens reflection image. Among these, the corneal reflection image is very commonly used because its high luminance makes it easy to extract. One gaze recognition technique using such feature points is the eye mark recorder. Under the condition that the eyeball's center of rotation, the observation system that images the eyeball, and the point light source are fixed relative to one another (specifically, the observation system and the light source are mounted on the head), it exploits the property that the corneal reflection image moves as the eyeball rotates. If the corneal surface is modeled as part of a sphere, the center of corneal curvature can be obtained from the positions of a plurality of corneal reflection images.
[0034]
Further, the gaze recognition device disclosed in Patent Document 2 mentioned above, for example, can recognize the line of sight without a detection device mounted on the head. In this embodiment, gaze recognition is described using this method as an example, and a detailed description is omitted. In this method, light sources P1 and P2, each presenting a graphic pattern with geometric features that forms an image on the cornea of the eyeball E to be recognized, are placed on both sides of the subject's field of view. The pattern images formed on the cornea are captured by an imaging device, and the center of corneal curvature of the eyeball E is calculated geometrically from the features and positions of those images. The line of sight is then recognized using this corneal curvature center information.
[0035]
Here, an outline of the learning used to detect the subject's puzzlement in this embodiment is described. FIG. 3 illustrates the method of detecting the gaze position. First, the subject sits facing the display 8 with the keyboard 9 and the mouse 10 ready to operate. As shown in FIG. 3, the light sources P1 and P2 are placed on both sides in front of the subject, at roughly the same height as the eyeball E, so that they fall within the subject's field of view. The light source P1 presents a cross-shaped pattern and the light source P2 a circular pattern. The gaze detection camera 12 records the images of these light source patterns formed on the subject's eyeball. The gaze detection camera 12 is, for example, a CCD video camera, and the video data is recorded on the HDD 7 together with time information under the control of the computer 2.
[0036]
The computer 2 recognizes the features and positions of the graphic patterns in the captured video and calculates the center of corneal curvature of the eyeball E geometrically. The line of sight is recognized using this corneal curvature center information together with other eyeball information, such as the position of the pupil center. In this embodiment, the line of sight is recognized at a predetermined interval, for example every 1/30 second. This procedure corresponds to the gaze position detection step of the present invention, in which the gaze detection means detects the subject's gaze position at predetermined time intervals.
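As an illustration only, the sketch below assumes the common simplification that the eye's optical axis passes through both the center of corneal curvature and the pupil center, so one gaze direction per frame is the unit vector from the first point to the second. The function name `gaze_direction` and the coordinates are hypothetical; the specification defers the actual geometry to Patent Document 2.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def gaze_direction(cornea_center: Vec3, pupil_center: Vec3) -> Vec3:
    """Unit vector from the corneal curvature center through the pupil center.

    Assumption: the optical axis passes through both points (a common model,
    not a formula stated in this document).
    """
    d = tuple(p - c for p, c in zip(pupil_center, cornea_center))
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("corneal curvature center and pupil center coincide")
    return tuple(x / norm for x in d)


# Hypothetical camera-frame coordinates (millimetres) for one 1/30 s frame.
print(gaze_direction((0.0, 0.0, 0.0), (0.0, 0.0, 4.0)))
```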
[0037]
Through trial and error, the inventors found that the predetermined time interval in this gaze position detection step is most preferably 1/30 second, and that it may lie anywhere from 1/200 second to 1/15 second. Gaze speed can still be estimated with intervals longer than 1/15 second, but the longer the interval, the more the result is affected by the gaze moving back and forth within a single interval, which lowers the accuracy of the detected speed and is therefore undesirable. Conversely, intervals shorter than 1/200 second place a heavy load on the processing capacity of current computers. Of course, if computer processing capability improves in the future, intervals shorter than 1/200 second become feasible. Moreover, eye movement consists of movements that substantially change the gaze direction and small vibrating movements that keep the gaze direction unchanged, so a very short interval is not necessarily required to detect puzzlement.
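To show why overly long intervals are undesirable, the following synthetic sketch samples a hypothetical gaze coordinate oscillating at 5 Hz and estimates speed by finite differences. At 1/30 second the oscillation is visible; at 1/5 second (longer than the 1/15 second limit) every sample lands at the same phase and the apparent speed collapses to zero. The signal, frequency, and function name are assumptions chosen for illustration, not measured data.

```python
import math
from typing import List


def apparent_speeds(freq_hz: float, amplitude: float, dt: float, duration: float) -> List[float]:
    """Finite-difference speeds of x(t) = amplitude * sin(2*pi*f*t) sampled every dt seconds."""
    n = int(round(duration / dt)) + 1
    xs = [amplitude * math.sin(2 * math.pi * freq_hz * k * dt) for k in range(n)]
    return [abs(b - a) / dt for a, b in zip(xs, xs[1:])]


fast = apparent_speeds(5.0, 1.0, 1 / 30, 1.0)  # 30 samples per second
slow = apparent_speeds(5.0, 1.0, 1 / 5, 1.0)   # 5 samples per second: one sample per cycle
print(max(fast), max(slow))
```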
[0038]
Next, the distance between the position of the corneal curvature center recognized in the current frame and its position recognized 1/30 second earlier is measured as the gaze speed. Alternatively, the angle between the gaze direction recognized in the current frame and the gaze direction recognized 1/30 second earlier is measured as the angular velocity of the gaze. This value constitutes the gaze speed data for one frame. This procedure corresponds to the gaze speed data generation step of the present invention, in which changes in the detected gaze position are generated as gaze speed data. Because the measurement interval is fixed at 1/30 second, the measured angle or distance is proportional to the gaze speed.
The "gaze speed" in the present invention may be either the angular velocity at which the corneal curvature center moves about the center of the eyeball per predetermined time, or the distance moved by the corneal curvature center. Either works because puzzlement is judged from the pattern of change in gaze speed, that is, from how the speed changes over time. Since the eyeball as a whole is nearly spherical, the trend of speed change is almost the same whichever quantity is used, so comparable accuracy can be expected. Observed values may also contain errors relative to the theoretical true values owing to technical difficulties, such as obtaining the exact shape of the eyeball. Even so, because the present invention judges by pattern matching of speed changes, the raw data can be used as is, without any correction or processing, since every compared pattern contains the same kind of error. Consequently, as long as the observation conditions are the same, learning and testing with subjects can be very simple.
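A minimal sketch of the per-frame gaze speed data, assuming 3-D positions of the corneal curvature center and unit gaze-direction vectors are already available for each frame. It computes both variants described above: the distance moved between consecutive frames and the angle between consecutive directions. The function names and the `FRAME_DT` constant are illustrative; because the interval is constant, converting to per-second units only rescales the pattern.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FRAME_DT = 1 / 30  # sampling interval used in the embodiment (seconds)


def distance_speeds(centers: Sequence[Vec3]) -> List[float]:
    """Distance moved by the corneal curvature center between consecutive frames."""
    return [math.dist(a, b) for a, b in zip(centers, centers[1:])]


def angular_speeds(directions: Sequence[Vec3]) -> List[float]:
    """Angle in radians between consecutive unit gaze-direction vectors."""
    out = []
    for a, b in zip(directions, directions[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        out.append(math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding error
    return out


def per_second(per_frame: Sequence[float]) -> List[float]:
    """Optional rescaling; with a fixed interval this does not change the pattern's shape."""
    return [v / FRAME_DT for v in per_frame]
```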
[0039]
Next, the gaze speed data detected in this way is stored in the HDD 7 as gaze speed history data. "Gaze speed history data" is a "pattern" formed by treating the gaze speed data of 50 consecutive frames as one block. Since one frame is taken every 1/30 second, one pattern of gaze speed history data spans about 1.7 seconds (50 × 1/30 s ≈ 1.67 s).
[0040]
A new pattern is generated every 1/30 second from the latest 50 consecutive frames, shifting the window of gaze speed data by one frame each time. This procedure corresponds to the gaze speed history data storage step of the present invention, in which a predetermined number of consecutively generated gaze speed data are stored as gaze speed history data. In this step, it is preferable to store gaze speed history data every time gaze speed data is generated in the gaze speed data generation step. However, to reduce the processing load, the history data may instead be generated every other time or at longer intervals. A pattern obtained from a subject whose puzzlement is to be detected is called a "comparison pattern", and a pattern learned from a subject in whom a puzzled state was elicited is called a "puzzlement pattern". The "puzzlement pattern", however, is a conceptual entity held by the neural network and does not exist explicitly as gaze speed history data.
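The sliding window described above can be sketched with a bounded deque: each new gaze speed value pushes out the oldest one, and once the window is full a pattern is emitted. `stride=1` corresponds to the preferred mode (a pattern on every frame); a larger stride corresponds to generating patterns every other frame or less often. The generator name and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def sliding_patterns(speeds: Iterable[float], window: int = 50, stride: int = 1) -> Iterator[List[float]]:
    """Yield the latest `window` consecutive gaze speed values as one pattern.

    stride=1 emits a pattern on every new frame; stride=2 emits every other frame.
    """
    buf: Deque[float] = deque(maxlen=window)
    emitted_slots = 0  # counts frames since the window first became full
    for v in speeds:
        buf.append(v)
        if len(buf) < window:
            continue  # not enough history yet
        if emitted_slots % stride == 0:
            yield list(buf)
        emitted_slots += 1
```

With 50 frames at 1/30 second, each emitted pattern covers about 1.67 seconds of gaze speed history.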
[0041]
In this embodiment, 50 frames is used as a preferred number of consecutive gaze speed data per history, but at least 30 consecutive data points already give sufficient accuracy. Likewise, about 1.7 seconds is used as a preferred time length for one pattern, but the time length covered by the consecutive speed data in one history is preferably between 0.5 seconds and 5 seconds. If it is shorter than 0.5 seconds, the pattern tends to contain too little data, depending on the sampling interval. If it exceeds 5 seconds, the puzzled state may be resolved within the window, which can reduce accuracy.
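The preferred ranges stated so far (sampling interval from 1/200 to 1/15 second, at least 30 frames, total span from 0.5 to 5 seconds) can be collected into one check. This helper is a hypothetical convenience for configuring an implementation, not part of the described method; the span is taken as frames × interval, matching the 50 × 1/30 s ≈ 1.7 s figure above.

```python
def window_is_acceptable(frames: int, frame_interval_s: float) -> bool:
    """True if a gaze speed history window falls within the preferred ranges."""
    interval_ok = 1 / 200 <= frame_interval_s <= 1 / 15
    span_s = frames * frame_interval_s
    return interval_ok and frames >= 30 and 0.5 <= span_s <= 5.0


print(window_is_acceptable(50, 1 / 30))   # embodiment: 50 frames, about 1.67 s
print(window_is_acceptable(200, 1 / 30))  # about 6.7 s, longer than 5 s
```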
[0042]
(Display of Trap) Next, learning with "traps" is described. Traps are used to obtain the "puzzlement pattern", that is, to have the neural network store the pattern of gaze speed history data of a subject in a puzzled state. This embodiment is characterized in that the "puzzlement pattern", the predetermined pattern that serves as the criterion for deciding whether the subject is puzzled and against which the "comparison pattern" is matched in the puzzlement determination step, is stored in the neural network through learning. In this learning, the subject is made to perform operations that are likely to cause errors, or is presented with traps that ask the subject to select an answer on the screen for "a question with no correct answer" or "a question with duplicated correct answers". The gaze speed history data recorded in the resulting puzzled state is then input to the neural network as the "teacher". This procedure corresponds to the learning step of the present invention. During learning, the subject watches the computer monitor and makes key and mouse inputs with the keyboard 9 and the mouse 10, following the instructions shown on the display 8 by a program designed to elicit puzzlement. In this embodiment, a screen displayed by this program to elicit puzzlement is called a "trap", and the program is called the "trap program".
[0043]
(Trap T0) This trap program displays, for example, a series of screens imitating the installation screens of application software, similar to those of the OS the subject normally uses. The screens follow this OS's conventions; for example, the "Next", "Back", and "Cancel" buttons required for operation are laid out as usual in this OS. In trap T0, however, as shown in FIG. 4, the positions of the "Next" button T0a and the "Back" button T0b are swapped relative to the normal arrangement. After several message boxes with the correct arrangement appear in succession, trap T0 is displayed; as a whole it looks the same as the preceding message boxes, but only its operation buttons are arranged differently from normal. The subject is therefore likely to hesitate before nearly making an operating error, or to actually make the error, and to become puzzled in either case. Thus a "puzzled state (a state in which puzzlement is occurring)" can be reliably elicited from a subject in a "normal state (a state in which no puzzlement is occurring)".
(Trap T1) In trap T1 shown in FIG. 5, a "target value" T1a and a "current value" T1b are displayed, and the subject clicks button T1c or button T1d to add its amount to the "current value" T1b so as to reach the target value. However, adding "+2" or "-2" to "current value 9" can never produce "target value 10", so the subject becomes puzzled. The setting is extremely easy to understand, and the subject quickly realizes that the task cannot be solved under the given conditions, so the puzzled state arises promptly and its onset time is easy to control.
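Why trap T1 cannot be solved can be checked mechanically. Assuming the two buttons add +2 and -2 and may be pressed any number of times, a breadth-first search from 9 reaches only odd values, so the even target 10 is never reachable. The function and its search bounds are illustrative assumptions.

```python
from collections import deque
from typing import Set, Tuple


def reachable(start: int, steps: Tuple[int, ...] = (2, -2), lo: int = -100, hi: int = 100) -> Set[int]:
    """All values reachable from `start` by repeated button presses, within [lo, hi]."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for s in steps:
            w = v + s
            if lo <= w <= hi and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


values = reachable(9)
print(10 in values)  # every reachable value is odd, so the even target is unreachable
```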
[0044]
(Trap T2) Trap T2, shown in FIG. 6, is displayed after a sequence of arithmetic questions whose choices include the correct answer. In trap T2, for the question "2 + 2 = ?" shown in T2a, the choices are answer T2b "5", answer T2c "6", and answer T2d "7", so the correct answer "4" is not among them. After a certain time, the correct answer appears among the candidates. This reliably elicits a puzzled state in the subject.
[0045]
(Trap T3) In trap T3 shown in FIG. 7, in response to the instruction "Please select the next number: 28" shown in instruction item T3a, answers T3b, T3c, and T3d are all "28", so every choice is correct. The subject therefore finds no basis for choosing any one of them, and a puzzled state can be elicited in a very short time.
[0046]
(Trap T4) In trap T4 shown in FIG. 8, questions are displayed in sequence as in trap T1 of FIG. 5, asking the subject to bring the current value T4b to the target value T4a by addition or subtraction. However, the buttons needed to answer are not displayed, so the question cannot be answered. In this case too, the subject can immediately be put into a puzzled state.
[0047]
With traps T0 to T4 as exemplified above, displaying a trap at a set time immediately elicits a puzzled state in the subject. This makes it possible to sample the gaze reliably in the puzzled state, to collect and input the gaze speed history data recorded in that state, and to learn it as an appropriate "teacher".
[0048]
(Elicitation without traps) Because the above traps reliably elicit a puzzled state at a predetermined time, they make learning more effective. Even without such traps, however, learning is possible by reproducing operations that evidently cause puzzlement even when performed correctly, for example when the meaning of a display is hard to understand, when erroneous operations are extremely frequent, or when the subject evidently stops to think.
[0049]
Traps T0 to T4 can reliably put the subject into a puzzled state within a short time. However, the subject is not necessarily puzzled during the whole time a trap is displayed. After a trap such as T0 to T4 appears, there is a time lag, which varies slightly with the subject and the type of trap, before the subject notices the trap and becomes puzzled, and a further lag before the subject grasps the situation and the puzzlement is resolved. Therefore, in the learning step, a sample is taken as the puzzled state at a specific time during the trap display, for example 3 seconds after the trap appears. At other times during the display it is difficult to tell whether the subject is puzzled or in the normal state, so no learning data is sampled for either state then. Learning only appropriate "teacher" data in this way improves learning accuracy. FIG. 9 schematically shows the learning of the neural network in the learning step. The portion EM labeled "History of eye movement speed" shows the gaze speed, and the shaded portion shows the time during which trap t is displayed. The "0" and "1" marks in the lower part indicate the sampling timings. As shown, in sampling S0 before trap t is displayed, samples are taken as "0", indicating the normal state. In the first sampling S1 after trap t is displayed, a sample is learned as "1", indicating the puzzled state, but in the subsequent sampling S2 nothing is sampled as either "0" or "1". After the display of trap t ends, samples are again taken as "0" (normal state) in sampling S3.
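The labeling rule of FIG. 9 can be sketched as a function of time: before the trap and after it ends the label is 0, at the specific sampling time (3 seconds after the trap appears, as in the example above) the label is 1, and at all other times during the trap the sample is excluded (`None`). The half-frame tolerance and the function name are assumptions for illustration.

```python
from typing import Optional


def teacher_label(t: float, trap_on: float, trap_off: float,
                  sample_delay: float = 3.0, tol: float = 1 / 60) -> Optional[int]:
    """Teacher signal for a pattern ending at time t (seconds).

    0: normal state, 1: puzzled state, None: not used for learning.
    """
    if t < trap_on or t >= trap_off:
        return 0  # samples S0 (before the trap) and S3 (after the trap)
    if abs(t - (trap_on + sample_delay)) <= tol:
        return 1  # sample S1 at the specific time during the trap
    return None   # sample S2: ambiguous, excluded from learning
```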
[0050]
(Detection of Puzzlement) Next, a method of detecting a subject's puzzlement using the neural network trained as described above is explained. As described above, the subject performs the task for which puzzlement is to be detected while the gaze can be detected. The task typically involves a man-machine interface, such as operating application software on a personal computer, operating a bank ATM (automated teller machine), or using a console to operate other machines. This is because the absence of puzzlement during operation bears directly on improving the man-machine interface, and because the operation history is easy to reproduce together with time information. The task is of course not limited to these; reactions to a printed document, photograph, video, or the like can also be examined. In this embodiment, as part of gaze recognition, the gaze point can be superimposed on the image while the object the subject is looking at is displayed on the screen. The subject's own facial expressions and movements can also be recorded on video by the monitor camera 11.
[0051]
In this embodiment, digital video of the subject is recorded by the monitor camera 11 at the same time as the gaze speed history data is generated. The video is recorded together with time information, which is used to superimpose the time on the screen and to synchronize the video with the gaze speed history data. From this digital video, the subject's posture, facial expression, and so on can be displayed, and their relation to the onset of the puzzled state can be observed.
[0052]
Of course, it is also preferable to use a multi-screen display that shows, on the same screen, the display the subject is looking at, the subject's facial expression, the subject's eyeball, the observed values and patterns, and the onset of puzzlement.
[0053]
While the subject performs such tasks, whether the subject is in a puzzled state is determined by comparing the gaze speed history data through pattern matching with a neural network. This procedure corresponds to the puzzlement determination step of the present invention. If the pattern matching finds that the patterns match, that is, if the neural network outputs "1", a puzzled state is judged to have occurred; if the patterns are judged not to match, "0" is output. In other words, in the puzzlement determination step of this embodiment, the result of comparing the gaze speed history with the predetermined pattern is expressed as a binary value based on a predetermined threshold: "1" for a match or "0" for a mismatch.
[0054]
In the puzzlement determination step of this embodiment, puzzlement is judged by a neural network. Compared with other methods, this gives high pattern-matching accuracy and allows the onset of puzzlement to be captured reliably. The neural network of this embodiment is a hierarchical (layered) neural network with a supervised learning algorithm; input signals are processed in a feedforward manner, flowing only forward through the network, and the sigmoid function is used as the node function (see FIG. 11). The inventors conducted an experiment using traps T0 to T4 and built a puzzled-state detection model with a neural network from the resulting data. Using the experimental data as training data, a model whose training converged was obtained. FIG. 10 shows the recognition results of the trained neural network. As shown, within the overall gaze speed history WH, the portions PT corresponding to the learned traps are detected as puzzled states, as indicated by the detection result DR. In addition, the experimental data with 5% noise added was given to the puzzled-state detection model in a detection test, and all puzzled-state points were identified.
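The structure described above (layered, feedforward, sigmoid node function, binary output against a threshold) can be sketched as follows. This is a generic one-hidden-layer forward pass, not the inventors' trained model; the layer sizes, the weights in the usage example, and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[List[float]], List[float], List[float], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layer(inputs: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]) -> List[float]:
    """Fully connected layer: each node applies the sigmoid to its weighted input sum."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b) for row, b in zip(weights, biases)]


def forward(pattern: Sequence[float], hidden_w, hidden_b, out_w, out_b) -> float:
    """Input -> hidden -> single output node; signals flow forward only."""
    h = layer(pattern, hidden_w, hidden_b)
    return layer(h, [out_w], [out_b])[0]


def is_puzzled(pattern: Sequence[float], params: Params, threshold: float = 0.5) -> int:
    """1 (match with the learned puzzlement pattern) or 0 (mismatch)."""
    return 1 if forward(pattern, *params) >= threshold else 0


# Hypothetical weights for a 2-value pattern and one hidden node.
demo: Params = ([[10.0, 10.0]], [-5.0], [10.0], -5.0)
print(is_puzzled([1.0, 1.0], demo), is_puzzled([0.0, 0.0], demo))
```

In practice the input would be the 50-value gaze speed history pattern and the weights would come from training.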
[0055]
Next, to validate the resulting puzzled-state detection neural network model, a validation experiment was conducted with 10 subjects, using the basic pattern screen of trap T2 as unknown data. The validation data was input to the model to detect puzzled states. FIG. 12 shows the result of inputting the validation data into the model and detecting unknown puzzled states. In the figure, the horizontal axis is time, and the frames in the upper part mark the periods during which traps t1 to t10, the screens that elicit puzzlement, were displayed. The lines extending downward in the lower part mark where the model judged the subject to be puzzled. As shown in FIG. 12, the model detected the puzzled state for all of the ten unknown traps t1 to t10 except t10, a detection rate of 90% (9 of 10).
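One way to compute a detection rate like the one reported above is sketched below, assuming a trap counts as detected when at least one positive ("1") output falls within its display interval. That criterion, the function name, and the synthetic timings in the test are assumptions; the document does not state its exact scoring rule.

```python
from typing import List, Tuple


def detection_rate(trap_intervals: List[Tuple[float, float]], positive_times: List[float]) -> float:
    """Fraction of traps whose display interval contains at least one positive output."""
    hits = sum(
        any(on <= t <= off for t in positive_times)
        for on, off in trap_intervals
    )
    return hits / len(trap_intervals)
```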
[0056]
The puzzlement detection method of the above embodiment provides the following advantages.
- The puzzlement detection method can detect the subject's puzzlement solely by observing the subject's gaze, without measuring anything else. Puzzlement can therefore be detected very easily without burdening the subject. Detecting puzzlement accurately in this way captures, objectively and reliably, occurrences of puzzlement that ordinary observation cannot pick up, so problem areas in a man-machine interface can be identified.
[0057]
- Because a neural network is used to judge the puzzled state, complex gaze speed history data can be evaluated accurately and detection is highly accurate. In particular, the neural network of this embodiment is a hierarchical neural network with a supervised learning algorithm suited to pattern matching; input signals are processed in a feedforward manner, flowing only forward internally, and the sigmoid function is used as the node function. Highly accurate judgment is therefore possible on a personal computer or the like.
[0058]
- Because traps T0 to T4 are used to train the neural network, learning can be highly accurate.
The above embodiment may be modified as follows.
[0059]
In this embodiment, the puzzlement determination step is performed by pattern matching with a feedforward neural network, but it may also use the backpropagation (error back-propagation) learning algorithm. Various other techniques may also be adopted or applied, such as the Boltzmann machine, simplified mean-field approximation learning machines, RBF (radial basis function) networks, learning vector quantization, fuzzy ARTMAP, and the cognitron.
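For reference, a textbook backpropagation loop for a small sigmoid network trained on squared error is sketched below. It only illustrates the algorithm named above; the toy four-value "patterns", the hidden-layer size, the learning rate, and the epoch count are hypothetical and do not reproduce the inventors' experiment.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train(data: List[Tuple[List[float], int]], n_hidden: int = 4, lr: float = 0.5,
          epochs: int = 2000, seed: int = 0):
    """Online backpropagation for one hidden layer; returns (params, per-epoch losses)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
            out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            total += 0.5 * (out - y) ** 2
            # Backward pass: error terms use the sigmoid derivative s * (1 - s).
            d_out = (out - y) * out * (1.0 - out)
            d_h = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates (d_h was computed with the old w2).
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
                b1[j] -= lr * d_h[j]
            b2 -= lr * d_out
        losses.append(total)
    return (w1, b1, w2, b2), losses


# Hypothetical toy patterns: label 1 for strongly fluctuating speeds, 0 otherwise.
toy = [([0.1, 0.1, 0.1, 0.1], 0), ([0.9, 0.1, 0.9, 0.1], 1),
       ([0.2, 0.1, 0.2, 0.1], 0), ([0.8, 0.2, 0.9, 0.1], 1)]
params, losses = train(toy)
print(round(losses[0], 4), round(losses[-1], 4))
```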
[0060]
In this embodiment, the neural network used in the puzzlement determination step uses the sigmoid function as the node function for processing input signals, but a threshold function or the like may be used instead. The sigmoid function or the threshold function may also be processed with added "fluctuation" (noise), and a linear function that outputs the input sum unchanged may also be considered. In the case of a Boltzmann machine, the slope is gradually increased.
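The node functions mentioned above can be compared in a short sketch. Here the sigmoid has a hypothetical `gain` parameter (the reciprocal of a temperature); increasing it step by step, as in annealing, steepens the slope until the sigmoid approaches the threshold (step) function. In a real Boltzmann machine the units are stochastic and use such a sigmoid as a firing probability; this sketch only shows the deterministic curve shapes.

```python
import math


def sigmoid_with_gain(x: float, gain: float = 1.0) -> float:
    """Sigmoid whose slope grows with `gain` (gain = 1 / temperature)."""
    z = gain * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def threshold(x: float) -> float:
    """Step (threshold) node function."""
    return 1.0 if x >= 0 else 0.0


def linear(x: float) -> float:
    """Outputs the input sum unchanged."""
    return x


# Raising the gain gradually makes the sigmoid approach the threshold function.
for g in (1, 10, 100):
    print(g, round(sigmoid_with_gain(0.1, g), 4), threshold(0.1))
```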
[0061]
【Effect of the Invention】
As described in detail above, the present invention makes it possible to detect a subject's puzzlement automatically and appropriately solely by observing the line of sight.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of the puzzlement detection system 1.
FIG. 2 is a diagram schematically showing the storage contents of the HDD 7.
FIG. 3 is a diagram showing a method of detecting the position of a line of sight.
FIG. 4 is a diagram showing a trap T0.
FIG. 5 is a diagram showing a trap T1.
FIG. 6 is a diagram showing a trap T2.
FIG. 7 is a diagram showing a trap T3.
FIG. 8 is a diagram showing a trap T4.
FIG. 9 is a schematic diagram showing learning of a neural network.
FIG. 10 is a diagram showing the relationship among the overall gaze speed history after learning, the positions of the traps, and the detection of puzzlement.
FIG. 11 is a diagram showing an example of a sigmoid function.
FIG. 12 is a diagram showing the result of inputting validation experiment data into the model and detecting unknown puzzled states.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... Puzzlement detection system, 2 ... Computer, 3 ... CPU, 4 ... RAM, 5 ... ROM, 6 ... Interface, 7 ... HDD, 8 ... Display, 9 ... Keyboard, 10 ... Mouse, 11 ... Monitor camera, 12 ... Gaze detection camera, P1, P2 ... Light sources, T0 to T4 ... Traps

Claims

1. A puzzlement detection method in which a computer comprising input means and display means executes:
a gaze position detection step of detecting the position of a subject's gaze with gaze detection means at predetermined time intervals;
a gaze speed data generation step of generating changes in the detected gaze position as gaze speed data;
a gaze speed history data storage step of storing a predetermined number of consecutively generated gaze speed data as gaze speed history data; and
a puzzlement determination step of determining whether the subject is in a puzzled state by comparing the gaze speed history data with a predetermined pattern.

2. The puzzlement detection method according to claim 1, wherein the predetermined time interval in the gaze position detection step is not less than 1/200 second and not more than 1/15 second.

3. The puzzlement detection method according to claim 1 or 2, wherein, in the gaze speed history data storage step, the predetermined number of consecutively detected gaze speed data constituting the gaze speed history data is at least 30.

4. The puzzlement detection method according to any one of claims 1 to 3, wherein, in the gaze speed history data storage step, the total time length of the consecutively detected gaze speed data constituting the gaze speed history data is not less than 0.5 second and not more than 5 seconds.

5. The puzzlement detection method according to any one of claims 1 to 4, wherein, in the gaze speed history data storage step, the gaze speed history data is generated and stored each time gaze speed data is generated in the gaze speed data generation step.

6. The puzzlement detection method according to any one of claims 1 to 5, wherein the puzzlement determination step is performed by pattern matching using a neural network.

7. The puzzlement detection method according to claim 6, wherein, in the puzzlement determination step, the neural network is a feedforward hierarchical neural network that uses a sigmoid function as its node function.

8. The puzzlement detection method according to claim 7, further comprising a learning step of generating the predetermined pattern to be compared in the puzzlement determination step from gaze speed history data recorded while the subject is presented with a trap that asks the subject to select an answer on the screen for a question that has no correct answer or has duplicated correct answers.

9. The puzzlement detection method according to claim 8, wherein, in the learning step, only the gaze speed history data at a specific time within the period during which the trap is displayed is learned, and the gaze speed history data at other times within that period is not learned.

10. The puzzlement detection method according to any one of claims 1 to 9, wherein, in the puzzlement determination step, the result of comparing the gaze speed history data with the predetermined pattern to determine whether the subject is in a puzzled state is expressed as a binary value, match or mismatch, based on a predetermined threshold.

11. A puzzlement detection program that causes a computer having input means and display means to execute:
a gaze position detection step of detecting the position of a subject's gaze with gaze detection means at predetermined time intervals;
a gaze speed data generation step of generating changes in the detected gaze position as gaze speed data;
a gaze speed history data storage step of storing a predetermined number of consecutively generated gaze speed data as gaze speed history data; and
a puzzlement determination step of comparing the gaze speed history data with a predetermined pattern to determine whether the subject is in a puzzled state.