JP2004062393A

JP2004062393A - Method and device for determining attention

Info

Publication number: JP2004062393A
Application number: JP2002218086A
Authority: JP
Inventors: Hitoshi Hongo; 本郷　仁志
Original assignee: Japan Science and Technology Corp; Softopia Japan Foundation
Current assignee: Japan Science and Technology Agency; Softopia Japan Foundation
Priority date: 2002-07-26
Filing date: 2002-07-26
Publication date: 2004-02-26
Anticipated expiration: 2022-07-26
Also published as: JP4203279B2

Abstract

PROBLEM TO BE SOLVED: To measure the degree of attention to presentation information.
SOLUTION: Video cameras 11A, 11B pick up the images of a person H to be detected. Personal computers (PCs) 12A, 12B for the cameras capture the images from the cameras 11A, 11B, detect eye areas from the picture data, detect pupil areas from the eye areas, and calculate the circularity of the pupil areas. Then the PCs 12A, 12B transmit the calculated circularity data to a main PC 13, which compares the circularity data with each other to detect the line of sight. When the line of sight is turned to the screen of a picture display device 14, the main PC 13 moves presentation information Q displayed on the screen of the display device 14. When the moving direction of the detected line of sight coincides with the moving direction of the presented information Q, the main PC 13 decides that the line of sight follows the movement of the presented information Q.
COPYRIGHT: (C)2004,JPO

Description

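[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an attention determination method and an attention determination device.
[0002]
[Prior art]
It has been proposed to infer a person's needs from two sources: information obtained by sensing the person, such as gaze and movements, and a model of the surrounding environment built by sensing objects. Services suited to that person's intentions would then be provided. To achieve this, it is important to sense the person and the surroundings and to know what the person is looking at and what the person is doing. Gaze information is one of the indispensable cues for estimating what a person is attending to, or the person's intentions and situation.
[0003]
[Problems to be solved by the invention]
However, no method or device has yet been proposed that goes beyond merely detecting the gaze. None can tell whether a person is simply looking at information presented in the gaze direction (presentation information) or is actually paying attention to it.
[0004]
If such a method and device existed, they would make it possible to analyze the degree of attention people pay to the presentation information. From that, their degree of interest in it, and even its marketability, could also be analyzed.
[0005]
The present invention has been made to solve the above problem. Its object is to provide an attention determination method and device capable of measuring the degree of attention to presentation information.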
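[0006]
[Means for Solving the Problems]
To solve the above problem, the invention of claim 1 is an attention determination method comprising three steps:
・a moving step of moving presentation information in a predetermined direction;
・a gaze movement direction estimating step of detecting a person's gaze during the moving step and detecting the direction in which the gaze moves; and
・a determination step of determining whether the movement of the presentation information in the predetermined direction matches the estimated movement direction of the person's gaze.
[0007]
The invention of claim 2 is an attention determination device comprising:
・moving means for moving presentation information in a predetermined direction;
・gaze movement direction estimating means for detecting a person's gaze while the presentation information is moving and estimating the movement direction of the gaze; and
・determination means for determining whether the movement of the presentation information in the predetermined direction matches the estimated movement direction of the person's gaze.
[0008]
In the invention of claim 3, according to claim 2, the moving means is an image display device that displays the presentation information as an image. The presentation information is moved within the screen of the image display device.
[0009]
In the invention of claim 4, according to claim 2 or 3, the gaze movement direction estimating means comprises:
・imaging means for capturing images of a detection target person;
・pupil region detecting means for detecting a pupil region within an eye region of the face contained in the image data captured by the imaging means;
・circularity calculating means for calculating the circularity of the shape of the detected pupil region;
・gaze detecting means for detecting the gaze based on the calculated circularity; and
・specifying means for specifying the movement direction of the gaze detected by the gaze detecting means.
[0010]
In the invention of claim 5, according to claim 4, a plurality of the imaging means are provided at mutually different positions. The gaze detecting means detects the gaze by comparing the circularities of the pupil regions contained in the image data obtained from the respective imaging means.
[0011]
In the invention of claim 6, according to claim 4 or 5, the circularity calculating means applies a Hough transform to the image data containing the pupil region. It calculates the circularity from the vote counts, in the Hough voting space, of two regions:
・a first region containing the point with the largest number of votes; and
・a second region that also contains that point and is larger than the first region.
[0012]
In the invention of claim 7, according to claim 4 or 5, the circularity calculating means calculates the circularity from the area of the pupil region and the length of its contour.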
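[0013]
(Action)
According to the inventions of claims 1 and 2, a person's gaze is detected while the presentation information moves in the predetermined direction. The movement direction of the gaze is estimated from this detection. It is then determined whether the estimated movement direction matches the movement of the presentation information in the predetermined direction. This determination makes it possible to tell whether the person is merely looking at presentation information lying in the gaze direction or is paying attention to it.
[0014]
According to claim 3, the presentation information is moved within the screen of the image display device. Because the presentation information is displayed on a screen, different items of presentation information can be swapped in and displayed as needed.
[0015]
According to claim 4, processing runs in the following order:
・The pupil region detecting means detects the pupil region within the eye region of the face in the image data captured by the imaging means.
・The circularity calculating means calculates the circularity of the detected pupil region's shape.
・The gaze detecting means detects the gaze from the calculated circularity.
・The specifying means specifies the movement direction of the detected gaze.
・The determination means determines whether that movement direction matches the movement direction of the presentation information.
Gaze detection uses the pupil region, which is easily distinguished from the white of the eye within the eye region. Unlike in the prior art, there is therefore no need to capture an image of the eye region magnified enough to detect the pupil aperture. Gaze detection is consequently less affected by the head position of the detection target person than methods that use the pupil aperture. The gaze can thus be detected in a way well suited to judging whether the person is paying attention to the presentation information.
[0016]
According to claim 5, image data are acquired by a plurality of imaging means at mutually different positions. The gaze detecting means compares the circularities of the pupil regions in the respective image data and detects the gaze from the comparison result. In the prior art, the gaze is detected by comparison with reference data stored beforehand in a storage device. Here the gaze is instead detected by comparing several calculated circularities with each other. The detected pupil region therefore does not need to be calibrated against reference data, and gaze detection is simple.
[0017]
According to claim 6, the image data containing the pupil region undergo a Hough transform. The circularity calculating means then calculates the circularity from the vote counts, in the Hough voting space, of two regions: the first region containing the maximum-vote point, and the second region that contains that point and is larger than the first region. Circularity is thus computed conveniently from the votes cast in the Hough voting space.
[0018]
According to claim 7, the circularity calculating means calculates the circularity from the area of the detected pupil region and the length of its contour. Circularity is thus computed conveniently from the area and the contour length.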
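[0019]
BEST MODE FOR CARRYING OUT THE INVENTION
A first embodiment of the attention determination device of the present invention is described below with reference to FIGS. 1 to 10.
[0020]
As shown in FIG. 1(a), the attention determination device 10 comprises:
・a plurality of video cameras 11A and 11B as imaging means (two in this embodiment);
・camera PCs 12A and 12B;
・a main PC 13; and
・an image display device 14.
The video cameras 11A and 11B are CCD cameras. In this embodiment, the video cameras 11A, 11B and the image display device 14 are all placed at the same predetermined height (for example, 1.15 m). The image display device 14 is placed midway between the two video cameras.
[0021]
The camera PCs 12A and 12B are connected to the video cameras 11A and 11B. Each frame (image data) captured by the video cameras 11A, 11B is input to the camera PCs 12A, 12B as a video-rate color image (640 × 480).
[0022]
The camera PCs 12A and 12B are connected to the main PC 13. The main PC 13 communicates with each of them by socket communication over Ethernet (registered trademark). A network time server system is also used: the main PC 13 is configured as the time server, and the clocks of the camera PCs 12A and 12B are synchronized to it.
The main PC 13 is connected to the image display device 14 by wire. It controls the display and movement of the presentation information on the image display device 14 according to the gaze detection results of the attention determination device 10. Alternatively, the image display on the image display device 14 may be controlled wirelessly instead of over a wired connection.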
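[0023]
(Attention judgment method)
The attention determination method of the first embodiment is described below, beginning with an overview of the attention determination performed by the attention determination device 10.
[0024]
Each video camera 11A, 11B images the detection target person H and inputs the images to its camera PC 12A, 12B. Each camera PC captures the image from its video camera and detects the eye region 16 in the image data. It then detects the pupil region 17 within the eye region 16, normalizes the size of the pupil region 17, and calculates the circularity of its shape. The camera PCs 12A, 12B send the calculated circularities to the main PC 13, which detects the gaze by comparing them.
Here, "circularity" expresses how closely a shape approximates a perfect circle. The closer the shape is to a perfect circle, the higher the circularity; the more the circle is deformed, the lower it is. For example, the larger the difference between the major and minor axes of an ellipse, the lower its circularity. In the first embodiment, circularity is proportional to the evaluation value E calculated by Equation (6) described later: a larger E means higher circularity and a smaller E lower circularity. In the first embodiment, calculating the circularity therefore means calculating the evaluation value E.
[0025]
When the gaze is directed at the screen of the image display device 14, the main PC 13 moves the presentation information Q displayed on the screen. As Q moves, the main PC 13 estimates the movement direction of the detected gaze. If the movement direction of the detected gaze matches that of Q, the main PC 13 determines that the gaze followed the movement of the presentation information Q.
[0026]
FIGS. 2 and 3 are flowcharts of the attention determination program executed by the attention determination device 10. The attention determination is described below with reference to these flowcharts.
[0027]
The flowchart starts when the main PC 13 sends a start request signal to the camera PCs 12A, 12B. The processing is repeated from step S1 until the main PC 13 sends them an end request signal.
[0028]
In step S1, the camera PCs 12A, 12B first determine whether to capture an image from the video cameras 11A, 11B. In this embodiment, images are captured at a fixed interval (for example, every 0.1 s), and each camera PC determines whether that time has come.
・If it is time to capture (YES in step S1), each camera PC captures an image from its video camera (step S2).
・If not (NO in step S1), the determination is repeated.
Because the clocks of the camera PCs 12A, 12B are synchronized to the main PC 13, both capture their images at the same moment.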
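[0029]
(Face region detection)
After capturing a frame from its video camera 11A or 11B [image data; see, for example, FIG. 4(a)], each camera PC 12A, 12B detects the face region. Face region detection uses a known method based on a skin-color reference value derived from color information. This embodiment uses the CIE L*u*v* color system, one of the perceptually uniform color spaces.
[0030]
First, a two-dimensional color histogram of the u and v coordinates is computed over the whole input image. The peak (most frequent value) within a predetermined valid skin-color range is taken as the skin-color reference value. A threshold on the color difference from this reference is then determined with the known discriminant analysis method. Using that threshold, the image is binarized into skin-color regions and other regions [see FIG. 4(b)].
This embodiment assumes a single detection target person H. When several skin-color regions are detected, each camera PC therefore takes the largest one as the face region 15 (step S3). That is, the number of pixels (area) of each extracted skin-color region is counted, and the region with the maximum area Smax becomes the face region 15. In the following description, the u and v coordinate values are sometimes called UV values, or U and V values, for convenience.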
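The skin-color step of [0030] can be sketched as below. This is a minimal, hypothetical illustration built on several assumptions:
・each pixel is already an integer (u, v) pair;
・Otsu's method stands in for the "known discriminant analysis method";
・the color difference is the Euclidean distance in the u-v plane, clipped to 0-255;
・the largest region uses 4-connectivity.
The names `otsu_threshold`, `skin_mask` and `largest_region` are illustrative, not from the patent.

```python
from collections import Counter, deque


def otsu_threshold(values, levels=256):
    """Discriminant-analysis (Otsu) threshold for integers in [0, levels).

    Returns t such that values <= t form one class and values > t the other,
    chosen to maximise the between-class variance.
    """
    hist = Counter(values)
    total = len(values)
    sum_all = sum(v * c for v, c in hist.items())
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist.get(t, 0)
        sum0 += t * hist.get(t, 0)
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def skin_mask(uv_image, valid):
    """Binarise a 2-D list of (u, v) pairs into skin (True) and other (False)."""
    # Skin-color reference: most frequent (u, v) pair inside the valid range.
    hist = Counter(p for row in uv_image for p in row if valid(*p))
    ref_u, ref_v = hist.most_common(1)[0][0]
    # Color difference from the reference, as integer levels 0..255.
    diff = [[min(255, round(((u - ref_u) ** 2 + (v - ref_v) ** 2) ** 0.5))
             for (u, v) in row] for row in uv_image]
    t = otsu_threshold([d for row in diff for d in row])
    return [[d <= t for d in row] for row in diff]


def largest_region(mask):
    """Pixels of the largest 4-connected True region (taken as face region 15)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best
```

The `valid` predicate plays the role of the "predetermined valid skin-color range". Without it, a large non-skin background could become the histogram peak.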
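[0031]
In step S4, the main PC 13 instructs the image display device 14 to display the presentation information. The image display device 14 then displays the presentation information Q [shown in FIG. 1(b)] on the screen.
[0032]
(Gaze detection)
Steps S5 to S9 are outlined as follows:
・The camera PCs 12A, 12B detect the eye region 16 within the face region 15.
・They then detect the pupil region 17 within the eye region 16 and normalize its size.
・They calculate the circularity of the pupil region 17 and send the result to the main PC 13.
・The main PC 13 detects the gaze by comparing the circularities received from the two camera PCs.
[0033]
(Eye region detection)
In step S5, the camera PCs 12A, 12B first recompute the skin-color reference value for the image data and extract skin-color regions, taking the largest as the face region 15.
[0034]
Within the face region 15, the camera PCs 12A, 12B detect the eye regions 16 and the mouth region. They use template matching with four-directional features and color-difference features.
[0035]
Suppose that the eye regions 16 and the mouth region were detected in step S5 for the image data processed by this flowchart just before the current image data. In that case, based on the previous detection result, a predetermined portion of the currently obtained face region 15 is removed. The face region 15, narrowed by that portion, is set as the search range. For the current image data, the eye regions 16 and the mouth region are detected by template matching within this search range. If template matching does not find them in the search range, detection is performed again over the whole face region 15.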
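[0036]
The template matching method is described here.
From the image data, four-directional features (directional planes) are extracted by the four-directional feature extraction mentioned above. Color-difference features are extracted from the U and V values. The similarity to each of the right-eye, left-eye and mouth templates is then computed. This is done over either the skin-color region obtained by skin-color extraction (the face region 15) or the search range.
[0037]
The color-difference features are the differences of the U and V values from the skin-color reference value.
The templates are prepared in advance from several images each of right eyes, left eyes and mouths, as follows:
・The four-directional and color-difference features are extracted from each image.
・Each image is reduced at a predetermined ratio, and its width is normalized to a predetermined number of pixels (for example, 32).
・For the four-directional features, edge direction information is decomposed into four directions.
・Both the four-directional and the color-difference features are smoothed with a Gaussian filter.
・Each image is converted to a resolution of 8 × 8.
The templates are stored in a storage device (not shown).
[0038]
The similarity a of the four-directional features between a template T and the image data (input image) I is calculated by Equation (1) below. The similarity b of the color-difference features is calculated by Equation (2).
[0039]
[Equation 1] (equation image not reproduced in the source)

[0040]
[Equation 2] (equation image not reproduced in the source)

In Equations (1) and (2), I denotes the input image and T the template. i is an integer from 1 to m and j an integer from 1 to n; together they index the m × n pixels of the template and the input image. (x, y) is the upper-left coordinate of the input image.
In Equation (2), Tu and Tv are the UV values of the template, and Iu and Iv those of the image data. Umax and Vmax are the full ranges of the U and V values. This embodiment uses the CIE L*u*v* color system, in which Umax = 256 and Vmax = 256 are set to speed up processing and save storage space.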
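[0041]
Next, the final similarity k is calculated from the similarities a and b of Equations (1) and (2) by Equation (3):
k = Wa × a + Wb × b   ... (3)
In Equation (3), Wa and Wb are predetermined weighting constants applied to a and b, with Wa + Wb = 1. In this embodiment, Wa = Wb = 0.5.
[0042]
Locations where the similarity k is at or above a preset threshold become eye candidate regions. Each input image (image data) carries an upper-left coordinate, from which the positional relationship of the eyes and mouth can be determined. Two conditions select the eye regions 16 and the mouth region from these candidates:
・the combination satisfies the rough layout of eyes and mouth (for example, the eyes lie above the mouth, and the right and left eyes are arranged accordingly); and
・among such combinations, it has the highest similarity k.
In this way the eye regions 16 are detected within the face region 15.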
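The combination step of [0041]-[0042] can be sketched as follows. This is a hypothetical sketch, and several details are assumptions not given in the text:
・Equations (1) and (2) are not reproduced in the source, so the similarities a and b are simply taken as inputs.
・The "highest similarity k" of a combination is interpreted as the sum of the three candidates' k values.
・The layout check uses image coordinates, with y growing downwards and the subject's right eye appearing on the image's left side.
The function names are illustrative only.

```python
from itertools import product

W_A = W_B = 0.5  # weights of Eq. (3); W_A + W_B = 1 in this embodiment


def combined_similarity(a, b, wa=W_A, wb=W_B):
    """Eq. (3): k = Wa * a + Wb * b."""
    return wa * a + wb * b


def pick_eyes_and_mouth(right_eyes, left_eyes, mouths, threshold):
    """Choose the (right eye, left eye, mouth) triple with the best similarity.

    Each candidate is (x, y, a, b): the upper-left coordinate of the matched
    window and the similarities a, b. Returns ((x, y, k), ...) or None.
    """
    def keep(cands):
        # Candidate regions: similarity k at or above the preset threshold.
        scored = [(x, y, combined_similarity(a, b)) for x, y, a, b in cands]
        return [c for c in scored if c[2] >= threshold]

    best, best_score = None, float("-inf")
    for r, l, m in product(keep(right_eyes), keep(left_eyes), keep(mouths)):
        # Rough layout check: both eyes above the mouth, right eye left of left eye.
        if r[1] < m[1] and l[1] < m[1] and r[0] < l[0]:
            score = r[2] + l[2] + m[2]  # assumption: total k of the triple
            if score > best_score:
                best, best_score = (r, l, m), score
    return best
```

An exhaustive product over candidates is only practical because the candidate lists are short after thresholding. A real implementation would likely prune earlier.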
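[0043]
(Pupil region detection)
Next, in step S6, the camera PCs 12A, 12B detect the pupil region 17 from the detected eye region 16. In this embodiment, the pupil region detection described below is applied to only one of the eye regions 16 detected in step S5 (for example, the right eye).
[0044]
First, a saturation histogram of the eye region image is built, and the known discriminant analysis method is applied. This separates the eye region 16 from the skin region (the part of the face region other than the eye region 16). The separation exploits the fact that skin generally has high saturation and the eye region 16 low saturation.
Next, a luminance histogram of the eye region image is built, and discriminant analysis is applied again. This divides the separated eye region 16 into the pupil region 17 and the white region 18. The pupil region 17 and the white region 18 differ clearly, so this division is easy even without a magnified image. It is much easier than, for example, dividing the pupil region 17 into the pupil aperture and the iris.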
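The two-stage split of [0044] can be sketched as follows. This minimal sketch assumes that each eye-region pixel is given as an integer (saturation, luminance) pair in 0-255, and it again uses Otsu's method as the discriminant analysis. `split_eye_region` is a hypothetical name. Low saturation is taken as eye and high saturation as skin; within the eye, dark pixels are taken as pupil region and bright pixels as white of the eye.

```python
from collections import Counter


def otsu_threshold(values, levels=256):
    """Otsu threshold t: values <= t form one class, values > t the other."""
    hist = Counter(values)
    total = len(values)
    sum_all = sum(v * c for v, c in hist.items())
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist.get(t, 0)
        sum0 += t * hist.get(t, 0)
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def split_eye_region(pixels):
    """Split eye-image pixels into skin, pupil region 17 and white region 18.

    pixels: list of (saturation, luminance) integer pairs.
    Returns three lists of pixel indices: (skin, pupil, white).
    """
    # Stage 1: saturation histogram -> eye region (low) vs skin (high).
    t_sat = otsu_threshold([s for s, _ in pixels])
    eye = [i for i, (s, _) in enumerate(pixels) if s <= t_sat]
    skin = [i for i, (s, _) in enumerate(pixels) if s > t_sat]
    # Stage 2: luminance histogram of the eye pixels -> dark pupil vs bright white.
    t_lum = otsu_threshold([pixels[i][1] for i in eye])
    pupil = [i for i in eye if pixels[i][1] <= t_lum]
    white = [i for i in eye if pixels[i][1] > t_lum]
    return skin, pupil, white
```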
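[0045]
Next, the pupil region 17 is reduced or enlarged according to the detection result and normalized to a predetermined size. Its circular shape is then completed.
The pupil region 17 obtained by applying discriminant analysis to the saturation and luminance histograms may contain a shadow 171 cast by the eyelid, as shown in FIG. 5(a). When image gray levels are expressed in 8 bits, gray level 0 is black and gray level 255 is white. A horizontal projection histogram, as shown in FIG. 5(b), is therefore built for the regions with gray level 0 (black) in the segmentation result. Parts with extreme peaks are then removed using a preset threshold, such as the peak at the top along the vertical axis [the up-down direction in FIG. 5(b)]. The eyelid shadow 171 appears as such a peak in the histogram, and removing it leaves only the pupil region 17, as shown in FIG. 5(c).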
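The eyelid-shadow removal of [0045] can be sketched as follows. This minimal sketch assumes a binary mask where True marks gray level 0. It takes the horizontal projection histogram as the black-pixel count per row, and clears every row whose count exceeds the preset threshold. The function name and the `max_row_count` parameter are hypothetical. A practical version might restrict the search to the upper rows, where FIG. 5(b) shows the shadow peak.

```python
def remove_eyelid_shadow(black, max_row_count):
    """Suppress the eyelid shadow 171 with a horizontal projection histogram.

    black: 2-D list of bools, True where the segmented eye image is black.
    Rows whose black-pixel count exceeds max_row_count are treated as the
    shadow peak and cleared; all other rows are copied unchanged.
    """
    profile = [sum(row) for row in black]  # horizontal projection histogram
    return [[False] * len(row) if count > max_row_count else list(row)
            for row, count in zip(black, profile)]
```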
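[0046]
(Circularity calculation)
Next, in step S7, the camera PCs 12A, 12B apply a circular Hough transform to the detected pupil region 17. They calculate the circularity from the vote counts in the parameter space used by the transform.
[0047]
First, a Prewitt operator is applied to the eye region 16, exploiting the contrast between the white region 18 and the detected pupil region 17. This generates an edge image of the pupil region 17, as shown in FIG. 5(d), and extracts the edge (contour) 172. A known circular Hough transform is then applied to the set of points forming the edge 172 (hereinafter "edge points").
[0048]
The circular Hough transform is briefly explained with reference to FIGS. 6(a) and 6(b).
In an x-y plane such as that of FIG. 6(a), a circle is expressed by Equation (4):
[0049]
(x - a)^2 + (y - b)^2 = r^2   ... (4)
A circle in the x-y plane is represented by a single point in the (a, b, r) parameter space of its center (a, b) and radius r (hereinafter simply "parameter space"). The circular Hough transform maps the x-y plane into this three-dimensional parameter space, which characterizes circles. It uses Equation (5), obtained by rearranging Equation (4):
[0050]
b = y ± √{r^2 - (x - a)^2}   ... (5)
Transforming an arbitrary point (x0, y0) on a circle in the x-y plane with Equation (5) casts votes for every cell [point (a, b, r)] lying on b = y0 ± √{r^2 - (x0 - a)^2}. The voted points then represent the set of all circles passing through (x0, y0).
For example, FIG. 6(a) shows three points e (x1, y1), f (x2, y2) and g (x3, y3) in the x-y plane. When they are projected into the parameter space with Equation (5), each becomes one circle in the a-b plane for each radius value r, as shown in FIG. 6(b). FIG. 6(b) shows the a-b plane at r = r0. A circle obtained by this Hough transform is called a Hough curve.
[0051]
The Hough curves intersect one another in the parameter space at many points. Finding the intersection crossed by the most Hough curves (that is, the one with the most votes) gives the center coordinates and radius of the circle in the x-y plane. In FIG. 6(b), three Hough curves cross at the intersection (a0, b0, r0), which therefore has three votes. Every other point has two votes or fewer. The circle in the x-y plane therefore has center (a0, b0) and radius r0.
[0052]
The edge points in the edge image of the pupil region 17 are given coordinates (x, y) relative to the upper-left corner. The circular Hough transform then converts the x-y plane carrying these edge points, shown in FIG. 5(d), into the three-dimensional parameter space of FIG. 7. That space represents the center coordinates (a1, b1) and radius r1 of the pupil region 17. Each edge point in the x-y plane votes for the cells [points (a, b, r)] in the parameter space that satisfy Equation (5). After the whole edge 172 in the image space has been scanned, the point with the most votes in the parameter space is the pupil center point C.
The pupil center point C corresponds to the maximum-vote point, and the parameter space to the voting space. In FIG. 7, the a and b axes give the center coordinates and the r axis the radius. The radius range of the pupil region 17 (rmin to rmax) is stored in advance in a storage device (not shown) of the camera PCs 12A, 12B. During the Hough transform, votes are cast only for cells [points (a, b, r)] within that range.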
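The voting of [0049]-[0052] can be sketched as below. This minimal sketch is built on Equation (5) and uses the following assumptions:
・centers and radii are restricted to integer cells;
・each b is rounded to the nearest integer;
・a point votes at most once per cell.
`hough_circles` and `pupil_center` are hypothetical names, and the parameters `r_min`/`r_max` mirror the stored radius range.

```python
import math
from collections import Counter


def hough_circles(edge_points, r_min, r_max, width, height):
    """Circular Hough transform: vote in (a, b, r) space with Eq. (5).

    For every edge point (x0, y0), every radius r in [r_min, r_max] and every
    integer centre column a with |x0 - a| <= r, vote for
    b = y0 +/- sqrt(r^2 - (x0 - a)^2), rounded to the nearest integer.
    """
    acc = Counter()
    for x0, y0 in edge_points:
        for r in range(r_min, r_max + 1):
            for a in range(max(0, x0 - r), min(width - 1, x0 + r) + 1):
                root = math.sqrt(r * r - (x0 - a) ** 2)
                # A set avoids a double vote when root == 0 (both signs agree).
                for b in {round(y0 + root), round(y0 - root)}:
                    if 0 <= b < height:
                        acc[(a, b, r)] += 1
    return acc


def pupil_center(acc):
    """Return ((a1, b1, r1), votes) for the cell with the most votes (point C)."""
    return max(acc.items(), key=lambda item: item[1])
```

As a check, the 12 integer points lying exactly on a circle of radius 5 about (10, 10) all vote for the cell (10, 10, 5), which becomes the peak.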
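[0053]
As shown in FIG. 8, suppose the detection target person H looks straight at the video camera 11A. The pupil (pupil region 17) in the image captured by camera 11A is then a perfect circle, as in FIG. 9(a). Seen from the direction of the nearby video camera 11B, however, H is looking elsewhere. The pupil (pupil region 17) in the image captured by camera 11B is therefore a deformed circle such as an ellipse, as in FIG. 9(b).
[0054]
In other words, when the gaze of H is directed at a camera, the captured pupil (pupil region 17) is a perfect or nearly perfect circle. The farther the gaze deviates, the larger the difference between the minor and major axes of the elliptical pupil. In this embodiment the video cameras 11A, 11B are all at the same height, so the pupil (pupil region 17) with an averted gaze, shown in FIG. 9(b), is a vertically elongated ellipse. Consequently, when the pupil shapes captured by the video cameras 11A, 11B are compared, the camera whose pupil image is closer to a perfect circle (here 11A) is the one H is looking at.
[0055]
The vote counts in the parameter space of the Hough transform have a related property:
・The closer the pupil region 17 (edge 172) in the edge image (the x-y plane) is to a perfect circle, the more the votes concentrate at the point (a1, b1, r1) taken as the pupil center point C. Its margin over the surrounding vote counts then grows, as in FIG. 10(a).
・When the pupil region 17 (edge 172) in the edge image is elliptical, the votes scatter more the more the major and minor axes differ. The margin of the votes at (a1, b1, r1) over the surrounding votes then stays small, as in FIG. 10(b).
The three-dimensional spaces in FIGS. 10(a) and 10(b) consist of the a-b plane of the center coordinates (a1, b1) and an orthogonal v axis giving the vote count; they differ from the parameter space of FIG. 7. Both figures show vote counts in the a-b plane at radius r = r1.
[0056]
From these results, the evaluation value E expressing circularity is calculated by Equation (6). It uses the vote count Vc of the pupil center point C determined in the parameter space and the vote count Vr within the comparison region P centered on C [shown in FIGS. 9(a) and 9(b)]. Hereinafter Vc is called the "center vote count" and Vr the "region vote count".
[0057]
E = k × Vc / Vr   ... (6)
In Equation (6), k is a predetermined weighting constant. In this embodiment, the comparison region P is a 5 × 5 pixel range centered on the center coordinates (a1, b1) of the pupil center point C. The region vote count Vr includes the votes at r = r1 shown in FIGS. 10(a) and 10(b). It also includes the votes at every other r within the comparison region P. The pupil center point C corresponds to the first region and the comparison region P to the second region.
[0058]
As described above, the closer the pupil region 17 is to a perfect circle, the more the votes concentrate at the pupil center point C. The center vote count Vc in the numerator of Equation (6) then grows, and E grows with it. The further the pupil region 17 departs from a perfect circle, the more the votes scatter. Vc in the numerator then falls while the region vote count Vr in the denominator rises, and E falls. Hence a larger E means higher circularity and a smaller E lower circularity.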
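Equation (6) can be computed from a vote accumulator as sketched below. This is a minimal sketch assuming the accumulator is a mapping from (a, b, r) cells to vote counts, as in a circular Hough transform. It also assumes that "every r within the comparison region P" means every stored radius. The function name, and the default k = 1.0 with `half` = 2 (giving the 5 × 5 region), are illustrative choices.

```python
def evaluation_value(acc, center, k=1.0, half=2):
    """Eq. (6): E = k * Vc / Vr.

    acc:    mapping (a, b, r) -> vote count.
    center: (a1, b1, r1), the pupil centre point C (maximum-vote cell).
    Vc is the vote count of C. Vr sums the votes of every cell whose (a, b)
    lies in the (2*half + 1) x (2*half + 1) comparison region P around
    (a1, b1), over all radii, so Vr includes Vc itself.
    """
    a1, b1, _ = center
    vc = acc[center]
    vr = sum(v for (a, b, _r), v in acc.items()
             if abs(a - a1) <= half and abs(b - b1) <= half)
    return k * vc / vr
```

With the same peak of 12 votes, an accumulator whose nearby votes total 5 gives E = 12/17. One whose nearby votes total 18 gives E = 12/30 = 0.4. Scattered votes therefore lower E, as [0058] describes.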
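[0059]
The camera PCs 12A, 12B send the circularity (evaluation value E) calculated as above to the main PC 13. Their clocks are synchronized to the main PC 13, so the circularities sent by camera PCs 12A and 12B are computed from image data captured at the same moment.
[0060]
(Gaze detection)
In step S8, the main PC 13 computes the difference between two evaluation values: Ea, received from camera PC 12A, and Eb, received from camera PC 12B.
・If |Ea - Eb| is smaller than a preset threshold Δ, the main PC 13 determines that the gaze is directed at the presentation information Q on the screen of the image display device 14 (YES in step S9).
・If |Ea - Eb| is larger than Δ, it determines that the gaze is not directed at Q (NO in step S9).
[0061]
If the gaze is determined not to be directed at Q, the main PC 13 stores this result (step S20).
[0062]
If the gaze is determined to be directed at Q, the main PC 13 instructs the image display device 14 to move Q in one of two directions:
・to the right, from the video camera 11A side toward the video camera 11B side; or
・to the left, from the 11B side toward the 11A side.
The image display device 14 moves Q accordingly (step S10).
[0063]
After Q starts moving, the steps of the detection pipeline are repeated:
・image capture (step S12) and face region detection (step S13), as in steps S2 and S3;
・eye region detection (step S14), pupil region detection (step S15) and circularity calculation (step S16), as in steps S5 to S7.
After step S16, the main PC 13 compares the magnitudes of the evaluation values Ea (from camera PC 12A) and Eb (from camera PC 12B) and stores the comparison result (step S17).
The movement period t0 of Q is preset, for example to 3 s. The main PC 13 checks whether the time elapsed since Q started moving has reached t0 (step S18). Images are captured from the video cameras 11A, 11B at the fixed interval of 0.1 s. The comparison in step S17 is therefore performed every 0.1 s until the elapsed time reaches t0. The comparison result [the difference (Ea - Eb)] is stored every 0.1 s, 3 s / 0.1 s = 30 times in all.
[0064]
When the elapsed time reaches t0, the main PC 13 determines from the stored comparison results whether the gaze followed the movement of Q (step S19).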
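[0065]
Suppose the main PC 13 instructed Q to move to the left, that is, from the video camera 11B side toward the 11A side. The criterion for whether the gaze followed Q is then, for example:
[0066]
<1> The difference (Ea - Eb) is larger than the previous difference (Ea' - Eb') at least z times (z being a plural number, for example 10).
[0067]
The previous difference (Ea' - Eb') is the difference obtained 0.1 s before (Ea - Eb).
Suppose instead the main PC 13 instructed Q to move to the right, that is, from the 11A side toward the 11B side. The criterion is then, for example:
[0068]
<2> The difference (Ea - Eb) is smaller than the previous difference (Ea' - Eb') at least z times (z being a plural number, for example 10).
[0069]
If criterion <1> (for a leftward move) or criterion <2> (for a rightward move) is satisfied, the main PC 13 specifies the movement direction of the gaze. It then determines that the gaze followed the movement of Q. If neither criterion is satisfied, it determines that the gaze did not follow Q.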
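Steps S8/S9 and S19 can be sketched as follows. In this hypothetical sketch, `diffs` holds the stored differences Ea - Eb, one per 0.1 s capture (30 values for t0 = 3 s). The direction labels "left"/"right" and the default z = 10 follow the examples in [0062]-[0068]; the function names are illustrative.

```python
def gaze_on_display(ea, eb, delta):
    """Steps S8/S9: the gaze is judged to be on Q when |Ea - Eb| < delta."""
    return abs(ea - eb) < delta


def gaze_followed(diffs, direction, z=10):
    """Step S19: did the gaze follow Q during the movement period t0?

    direction "left"  (camera 11B side -> 11A side), criterion <1>:
        count samples larger than the previous one;
    direction "right" (camera 11A side -> 11B side), criterion <2>:
        count samples smaller than the previous one.
    """
    pairs = list(zip(diffs, diffs[1:]))  # (previous, current) pairs
    if direction == "left":
        hits = sum(1 for prev, cur in pairs if cur > prev)
    elif direction == "right":
        hits = sum(1 for prev, cur in pairs if cur < prev)
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return hits >= z
```

A steadily rising Ea - Eb over the 3 s period satisfies criterion <1>, so it counts as following a leftward move but not a rightward one.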
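[0070]
The main PC 13 stores the determination result (step S20).
In the first embodiment, the means of the claims map onto the hardware as follows:
・The camera PCs 12A, 12B correspond to the pupil region detecting means, which detects the pupil region within the eye region of the face in the image data captured by the imaging means.
・The camera PCs 12A, 12B also correspond to the circularity calculating means, which calculates the circularity of the detected pupil region's shape.
・The main PC 13 corresponds to the gaze detecting means, which detects the gaze from the calculated circularity.
・The main PC 13 also corresponds to the specifying means, which specifies the movement direction of the detected gaze.
・Finally, the main PC 13 corresponds to the determination means, which determines whether the movement of the presentation information in the predetermined direction matches the estimated movement direction of the person's gaze.
[0071]
The first embodiment provides the following effects.
(1) A person's gaze is detected while the presentation information Q moves in a predetermined direction (right or left). The movement direction of the gaze is estimated from this detection. It is then determined whether that direction matches the movement direction of Q. In this embodiment, the two are judged to match when criterion <1> or <2> is satisfied.
When a person's gaze rests on stationary presentation information, it is unclear whether the person is merely looking at it or paying attention to it. When the gaze movement is judged to match the movement of Q, however, the person can be identified as paying attention to Q.
[0072]
Determinations that the gaze did or did not follow the movement of Q make it possible to analyze the degree of attention people pay to Q. From that, their degree of interest in Q and its marketability can also be analyzed.
[0073]
(2) Consider first a following determination based on the evaluation value E from only one video camera. While Q moves, it would evaluate the difference between that camera's current and previous evaluation values.
In this embodiment, the determination instead evaluates the difference between the evaluation values Ea and Eb from the two video cameras 11A, 11B while Q moves. That is, it compares the current difference between Ea and Eb with the previous one. If Q moves to the left, Ea increases as the gaze follows Q, while Eb decreases.
[0074]
The chain-line pupil region 17a in FIG. 9(a) was captured by video camera 11A while the person looked at Q on the screen of the image display device 14 before Q moved. The chain-line pupil region 17b in FIG. 9(b) was captured by video camera 11B at the same time. Take the chain-line pupil regions 17a, 17b in FIGS. 9(a) and 9(b) as those obtained previously by cameras 11A and 11B. Take the solid-line pupil regions 17 in FIGS. 9(a) and 9(b) as those obtained now.
Let Δ1 (> 0) be the change in the evaluation value E of camera 11A alone: from the previous value [corresponding to the chain-line pupil region 17a in FIG. 9(a)] to the current one [corresponding to the solid-line pupil region 17 in FIG. 9(a)]. Let Δ2 (> 0) be the corresponding change for camera 11B alone: from the previous value [chain-line pupil region 17b in FIG. 9(b)] to the current one [solid-line pupil region 17 in FIG. 9(b)].
[0075]
The previous difference between Ea and Eb from the two cameras is then zero, while the current difference is (Δ1 + Δ2). The change from the previous to the current difference is therefore (Δ1 + Δ2). This is larger than the change in the evaluation value of either camera alone. Attention determination with the two video cameras 11A, 11B is therefore more accurate than with a single video camera.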
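The arithmetic of [0074]-[0075] can be written out for a leftward move. Here E_a' and E_b' are the previous values, which are equal because the gaze was on Q midway between the cameras. E_a and E_b are the current values, after camera 11A's value has risen by Δ1 and camera 11B's has fallen by Δ2:

```latex
\begin{aligned}
E_a' - E_b' &= 0, \qquad E_a = E_a' + \Delta_1, \qquad E_b = E_b' - \Delta_2, \qquad \Delta_1, \Delta_2 > 0,\\
(E_a - E_b) - (E_a' - E_b') &= (E_a' + \Delta_1) - (E_b' - \Delta_2) - 0 = \Delta_1 + \Delta_2 > \max(\Delta_1, \Delta_2).
\end{aligned}
```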
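[0076]
(3) Q is moved within the screen of the image display device 14. Because the presentation information is shown on a screen, different items can easily be swapped in as needed. This makes it easy to analyze people's attention to many items of presentation information.
[0077]
(4) In this embodiment, the camera PCs 12A, 12B detect the pupil region 17 within the eye region 16 of the image data captured by the video cameras 11A, 11B. They calculate the circularity of its shape and send the result to the main PC 13, which detects the gaze from the circularity.
Unlike the prior art, the gaze is detected not from the pupil aperture but from the circularity of the pupil region 17. The pupil region is larger than the pupil aperture and easily distinguished from the white region 18 within the eye region 16. The eye region 16 therefore need not be magnified enough to detect the pupil aperture. This widens the range over which the video cameras 11A, 11B can capture the detection target person H. The gaze can then be detected without being affected by H's position, in a way well suited to determining whether the person is paying attention to Q.
[0078]
(5) In this embodiment, the video cameras 11A, 11B are placed at different positions in the gaze detection space. The main PC 13 detects the gaze by comparing the circularities of the pupil regions 17 detected simultaneously by the two cameras. Unlike the prior art, the circularity (evaluation value E) calculated by the camera PCs 12A, 12B is not compared with prestored reference data. No calibration of the detected pupil region 17 is therefore needed, and gaze detection is simple.
[0079]
(6) Only two video cameras 11A, 11B are used in this embodiment. More can easily be added by connecting each one to the main PC 13 through a camera PC. This makes it easy to improve gaze detection accuracy and enlarge the detection area.
[0080]
(7) Capturing circularity accurately is important for determining a person's degree of attention to Q. In this embodiment, the edge image of the pupil region 17 is Hough-transformed. The evaluation value E expressing circularity is then calculated from the vote counts, in the resulting parameter space, of the maximum-vote pupil center point C and of the comparison region P containing C. Circularity is thus computed conveniently from the votes in the parameter space.
[0081]
(8) In this embodiment, the main PC 13 may judge that the evaluation values E received from the two camera PCs 12A, 12B are equal within a predetermined tolerance (at most the threshold Δ). It then judges that the person is looking at approximately the midpoint between the two video cameras 11A, 11B. The person's gaze is then on Q, so the following determination can be made whether Q is subsequently moved to the right or to the left.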
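[0082]
As shown in FIGS. 11(a) and 11(b), a second embodiment using only one video camera 11 is also possible. Components identical to those of the first embodiment carry the same reference numerals.
[0083]
Viewed from above, the image display device 14 is placed to the left of the optical axis of the video camera 11. The camera PC 12 plays the same role as the camera PCs 12A, 12B of the first embodiment, and sends the calculated circularity (evaluation value E) to the main PC 13C.
From the evaluation value E received from camera PC 12, the main PC 13C determines whether the gaze is directed at the presentation information on the screen of the image display device 14. It judges that the gaze is on the presentation information when E is at its maximum. The presentation information is then moved to the right. If the gaze moves to the right with it, the main PC 13C judges that the gaze followed the movement.
While the presentation information is moving, the main PC 13C compares the current evaluation value from the single video camera 11 with the previous one:
・if the current value is larger, it identifies a rightward gaze movement;
・if the current value is smaller, it identifies a leftward gaze movement.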
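The single-camera direction rule of [0083] can be sketched as follows. This is a hypothetical sketch: the text gives no count threshold for the second embodiment, so the `z` parameter of `followed_rightward` is an assumption borrowed from criterion <1>/<2> of the first embodiment. Both function names are illustrative.

```python
def step_directions(values):
    """Label each step of single-camera evaluation values E.

    A current E larger than the previous one is read as a rightward gaze
    movement, a smaller one as leftward, and an equal one as no movement (None).
    """
    labels = []
    for prev, cur in zip(values, values[1:]):
        labels.append("right" if cur > prev else "left" if cur < prev else None)
    return labels


def followed_rightward(values, z=10):
    """Hypothetical follow test for a rightward move: at least z rightward steps."""
    return sum(1 for d in step_directions(values) if d == "right") >= z
```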
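[0084]
The second embodiment can also identify whether a person is paying attention to the presentation information.
The present invention also allows the following embodiments.
[0085]
・Use a gaze detection device as part of the gaze movement direction estimating means of the attention determination device of the invention. Such a device comprises a video camera, a light source for gaze detection, a light-receiving sensor, a storage device, a determination control device, and the like. The storage device stores in advance reference data for gaze detection.
[0086]
In this gaze detection device, the video camera captures a magnified image of the detection target person, and the pupil aperture is detected from the magnified eye region image. The detected pupil aperture is calibrated so that it can be compared with the reference data. The gaze detection light source directs infrared light at the person's eye. The light-receiving sensor receives the light reflected at the eye (pupil and cornea). The determination control device compares the positional relationship between the pupil aperture and the reflected light with the reference data to detect the gaze.
[0087]
In this case too, it is possible to determine whether a person is paying attention to the presentation information.
・In the first embodiment, move the presentation information to the left (or right) and stop it, then move it to the right (or left). Detect the movement direction of the gaze during these movements. The gaze may then be judged to have followed the presentation information Q only when it follows both the leftward and the rightward movements. This makes the determination of whether a person is paying attention to the presentation information even more reliable.
[0088]
・In the first embodiment, omit face region detection and perform gaze detection on a magnified image of the eye region.
・In the first embodiment, the camera PCs 12A, 12B calculated the circularity (evaluation value E) from the vote counts in the parameter space of the Hough transform of the edge image of the pupil region 17. Circularity may instead be calculated as follows:
・The area S is obtained from the total pixel count of the pupil region 17 detected and circle-completed in steps S6 and S15.
・The length L of the edge 172 (that is, the contour) is obtained by counting the pixels of the edge 172 in the edge image of the pupil region 17.
・Each camera PC 12A, 12B then calculates the evaluation value Rate expressing circularity by Equation (7).
[0089]
Rate = 4πS / L^2   ... (7)
Circularity is then determined by the evaluation value Rate of Equation (7). The closer Rate is to 1, the higher the circularity; the farther from 1, the lower. In this variant, calculating the circularity therefore means calculating Rate. Circularity can thus be computed conveniently from the area S of the pupil region 17 and the length L of the edge 172.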
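Equation (7) can be checked on ideal shapes. The sketch below applies it to a continuous ellipse, with the perimeter computed by Ramanujan's approximation; that approximation is swapped in for illustration only, since the patent measures L by counting edge pixels. By the isoperimetric inequality, 4πS ≤ L^2 for any closed curve, so Rate is at most 1 and equals 1 only for a circle. `rate` and `ellipse_rate` are hypothetical names.

```python
import math


def rate(area, contour_length):
    """Eq. (7): Rate = 4*pi*S / L**2 (1 for a perfect circle)."""
    return 4 * math.pi * area / contour_length ** 2


def ellipse_rate(a, b):
    """Rate of an ideal ellipse with semi-axes a and b.

    The perimeter uses Ramanujan's approximation, which is exact when a == b.
    """
    area = math.pi * a * b
    perimeter = math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
    return rate(area, perimeter)
```

On a real pupil mask, S and L come from pixel counts, so the value for a round pupil only approximates this ideal behavior.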
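[0090]
・In the first embodiment, the center vote count Vc in the numerator of Equation (6) was the vote count of the maximum-vote pupil center point C alone. Instead, the vote count of any region smaller than the comparison region P may serve as Vc, for example the total votes of C and its neighboring points. In that case, the region consisting of C and its neighbors corresponds to the first region, and the comparison region P to the second region.
[0091]
・In the first embodiment, pupil region detection in steps S6 and S15 was applied to only one of the eye regions 16 detected in steps S5 and S14. It may instead be applied to both the right and left eye regions 16. In that case, the mean of the circularities (evaluation values E) of the two pupil regions 17 is used as the circularity of each image and compared. Compared with using one eye, this detects the gaze more accurately and further improves attention determination.
[0092]
・In the first embodiment, the video cameras 11A, 11B were installed in the gaze detection space. The main PC 13 detected the gaze by comparing the circularities (evaluation values E) calculated by their camera PCs 12A, 12B. The gaze may instead be detected without comparing the circularities with each other: each evaluation value E sent from the camera PCs 12A, 12B is compared with a threshold stored beforehand in the main PC 13. Since no comparison between circularities is then needed, a single video camera may be used.
[0093]
・In the first embodiment, the main PC 13 and the camera PCs 12A, 12B communicated by socket communication over Ethernet, but they may communicate by radio instead.
[0094]
・In the first embodiment, the video cameras 11A, 11B and the image display device 14 were all placed at the same height, but they may be placed at different heights.
・Move a signboard bearing the presentation information by mechanical moving means.
[0095]
[Effects of the invention]
As described in detail above, the present invention has the excellent effect of making it possible to measure the degree of attention to presentation information.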
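[Brief description of the drawings]
[FIG. 1] First embodiment: (a) is a block diagram of the attention determination device; (b) is a screen view of the presentation information displayed on the screen of the image display device.
[FIG. 2] Flowchart of the attention determination method.
[FIG. 3] Flowchart of the attention determination method.
[FIG. 4] (a) illustrates image data captured by a video camera; (b) illustrates image data extracted using the skin-color reference.
[FIG. 5] (a) and (c) illustrate pupil region detection; (b) shows a horizontal projection histogram; (d) shows an edge image of the pupil region.
[FIG. 6] (a) and (b) illustrate the circular Hough transform.
[FIG. 7] Illustration of the parameter space.
[FIG. 8] Conceptual view of a detection target person looking straight at video camera 11A.
[FIG. 9] (a) shows a pupil region with high circularity; (b) shows a pupil region with low circularity.
[FIG. 10] (a) and (b) show the relationship between vote counts and center coordinates.
[FIG. 11] (a) is a block diagram of the attention determination device of the second embodiment; (b) is a screen view of the presentation information.
[Explanation of reference numerals]
・H: detection target person.
・Q: presentation information.
・P: comparison region (second region).
・C: pupil center point (first region).
・11A, 11B: video cameras (imaging means).
・12: camera PC (pupil region detecting means and circularity calculating means of the gaze movement direction estimating means).
・13: main PC (determination means; gaze detecting means and specifying means of the gaze movement direction estimating means).
・14: image display device (moving means).
・15: face region.
・16: eye region.
・17, 17a, 17b: pupil regions.
[0001]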
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an attention determination method and an attention determination device.
[0002]
[Prior art]
It has been proposed to infer a person's needs from two sources: information obtained by sensing the person, such as the person's gaze and movements, and the surrounding environment built up by sensing objects. Services suited to the person's intentions would then be provided. To realize this, it is important to sense the person and the surrounding environment and to know what the person is looking at and what the person is doing. Line-of-sight information is one of the indispensable cues for estimating what the person is attending to, or the person's intention or situation.
[0003]
[Problems to be solved by the invention]
However, no method or apparatus has yet been proposed that goes beyond simply detecting the gaze. None can tell whether a person is merely looking at the information presented in the gaze direction (presentation information) or is paying attention to it.
[0004]
If there were a method and device able to show that a person is paying attention to the presentation information, they would make it possible to analyze the degree of attention people pay to it. The degree of interest in the presentation information, and even its marketability, could then also be analyzed.
[0005]
The present invention has been made to solve the above problem. Its object is to provide an attention determination method and apparatus capable of measuring the degree of attention to presentation information.
[0006]
[Means for Solving the Problems]
In order to solve the above-mentioned problems, the invention according to claim 1 is an attention determination method including three steps:
・a moving step of moving presentation information in a predetermined direction;
・a gaze movement direction estimating step of detecting a person's line of sight during the moving step and detecting the moving direction of the line of sight; and
・a judging step of judging whether the movement of the presentation information in the predetermined direction matches the estimated moving direction of the person's line of sight.
[0007]
In the invention of claim 2, an attention determining device comprises:
・moving means for moving the presentation information in a predetermined direction;
・gaze moving direction estimating means for detecting a person's gaze during the movement of the presentation information and estimating the moving direction of the gaze; and
・determining means for determining whether the movement of the presentation information in the predetermined direction matches the estimated moving direction of the person's line of sight.
[0008]
According to a third aspect of the present invention, in the second aspect, the moving means is an image display device that displays the presentation information as an image. The presentation information is moved within the screen of the image display device.
[0009]
According to a fourth aspect of the present invention, in the second or third aspect, the gaze movement direction estimating means includes:
・imaging means for capturing an image of the detection target person;
・pupil region detecting means for detecting a pupil region within the eye region of the face included in the image data captured by the imaging means;
・circularity calculating means for calculating the circularity of the shape of the pupil region detected by the pupil region detecting means;
・line-of-sight detecting means for detecting the line of sight based on the circularity calculated by the circularity calculating means; and
・specifying means for specifying the moving direction of the line of sight detected by the line-of-sight detecting means.
[0010]
According to a fifth aspect of the present invention, in the fourth aspect, a plurality of the imaging means are provided at mutually different positions. The line-of-sight detecting means detects the line of sight by comparing the circularities of the pupil regions included in the image data obtained from the respective imaging means.
[0011]
According to a sixth aspect of the present invention, in the fourth or fifth aspect, the circularity calculating means performs a Hough transform on the image data including the pupil region. It calculates the circularity from the numbers of votes, in the voting space of the Hough transform, in two regions: a first region containing the point with the maximum number of votes, and a second region that contains that point and is larger than the first region.
[0012]
According to a seventh aspect of the present invention, in the fourth or fifth aspect, the circularity calculating means calculates the circularity based on the area of the pupil region and the length of the contour of the pupil region.
[0013]
(Action)
According to the first and second aspects of the present invention, a line of sight of a person is detected when the presentation information moves in a predetermined direction. The direction of movement of the line of sight is estimated based on the detection of the line of sight, and it is determined whether the estimated direction of movement of the line of sight matches the movement of the presentation information in the predetermined direction. Based on this determination, it is possible to specify whether the person is simply looking at the presentation information in the line of sight or is paying attention to the presentation information.
[0014]
According to the invention of claim 3, the presentation information is moved within the screen of the image display device. In the configuration in which the presentation information is displayed on the screen, various pieces of presentation information can be replaced and displayed as needed.
[0015]
According to the invention of claim 4, processing runs in the following order:
・The pupil region detecting means detects a pupil region within the eye region of the face included in the image data captured by the imaging means.
・The circularity calculating means calculates the circularity of the shape of the detected pupil region.
・The line-of-sight detecting means detects the line of sight based on the calculated circularity.
・The specifying means specifies the moving direction of the detected line of sight.
・The determining means determines whether that moving direction matches the moving direction of the presentation information.
Gaze detection uses the pupil region, which is easily distinguished from the white of the eye within the eye region. Unlike in the related art, it is therefore unnecessary to capture an image of the eye region magnified enough to detect the pupil aperture. Compared with gaze detection that uses the pupil aperture, detection is less affected by the head position of the detection target person. The line of sight can thus be detected in a way well suited to determining whether the person is paying attention to the presentation information.
[0016]
According to the fifth aspect of the present invention, the image data are acquired by a plurality of imaging means provided at mutually different positions. The line-of-sight detecting means compares the circularities of the pupil regions included in the respective image data, and the line of sight is detected from the comparison result. In the related art, the line of sight is detected by comparison with reference data stored in advance in a storage device. Here it is instead detected by comparing several calculated circularities with each other. The detected pupil region therefore does not need to be calibrated against the reference data, and the line of sight is detected easily.
[0017]
According to the invention of claim 6, the image data including the pupil region are subjected to the Hough transform. The circularity calculating means calculates the circularity from the vote counts, in the voting space of the Hough transform, in two regions: the first region containing the maximum-vote point, and the second region that contains that point and is larger than the first region. The circularity is thus conveniently calculated from the votes cast in the voting space of the Hough transform.
[0018]
According to the seventh aspect of the invention, the circularity is calculated by the circularity calculating means based on the detected area of the pupil region and the length of the contour of the pupil region. Accordingly, the calculation of the degree of circularity can be suitably realized by using the area and the length of the contour line.
[0019]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a first embodiment of the attention determination device of the present invention will be described with reference to FIGS. 1 to 10.
[0020]
As shown in FIG. 1A, the attention judging device 10 includes:
・a plurality of video cameras 11A and 11B as imaging means (two in this embodiment);
・camera personal computers 12A and 12B;
・a main personal computer 13; and
・an image display device 14.
The video cameras 11A and 11B are CCD cameras. In the present embodiment, the video cameras 11A and 11B and the image display device 14 are all arranged at a predetermined height (for example, 1.15 m). The image display device 14 is arranged at a position midway between the video cameras 11A and 11B.
[0021]
Camera personal computers 12A and 12B are connected to the video cameras 11A and 11B. Individual frames (image data) photographed by the video cameras 11A and 11B are input to the camera personal computers 12A and 12B as video-rate color images (640 × 480).
[0022]
The camera personal computers 12A and 12B are connected to the main personal computer 13, which communicates with each of them by socket communication over Ethernet (registered trademark). A network time server system is also used: the main personal computer 13 acts as the time server, and the clock of each camera personal computer 12A, 12B is synchronized to it. The main personal computer 13 is connected to the image display device 14 by wire and, according to the gaze detection result of the attention determination device 10, controls both the display of the presentation information on the image display device 14 and its movement. Alternatively, the main personal computer 13 may control the image display on the image display device 14 wirelessly instead of through a wired connection.
[0023]
(Attention judgment method)
Hereinafter, the attention determination method of the first embodiment will be described. First, an outline of the attention determination performed by the attention determination device 10 will be described.
[0024]
Each of the video cameras 11A and 11B captures an image of the detection target person H and inputs it to the corresponding camera personal computer 12A or 12B. Each camera personal computer 12A, 12B captures the image from its video camera, detects the eye area 16 from the image data, detects the pupil region 17 within the eye area 16, normalizes the size of the pupil region 17, and calculates the circularity of its shape. The camera personal computers 12A and 12B send the calculated circularities to the main personal computer 13, which detects the line of sight by comparing them. "Circularity" expresses how closely a shape approximates a perfect circle: the closer the shape is to a perfect circle, the higher the circularity, and the more deformed the circle, the lower the circularity. For example, the larger the difference between the lengths of the major and minor axes, the lower the circularity. In the first embodiment, the circularity is proportional to the evaluation value E calculated by Expression (6) described later; a large E means high circularity and a small E means low circularity. Calculating the circularity in the first embodiment therefore means calculating the evaluation value E.
[0025]
When the line of sight is directed at the screen of the image display device 14, the main personal computer 13 moves the presentation information Q displayed on that screen. While the presentation information Q moves, the main personal computer 13 estimates the moving direction of the detected line of sight. When this direction matches the moving direction of the presentation information Q, the main personal computer 13 determines that the line of sight has followed the movement of the presentation information Q.
[0026]
FIG. 2 and FIG. 3 are flowcharts illustrating an attention determination program executed by the attention determination device 10. Hereinafter, the attention determination by the attention determination device 10 will be described with reference to this flowchart.
[0027]
This flowchart starts when the main personal computer 13 sends a start request signal to the camera personal computers 12A and 12B, and the processing is repeated from step S1 until the main personal computer 13 sends an end request signal to the camera personal computers 12A and 12B.
[0028]
In step S1, the camera personal computers 12A and 12B first determine whether it is time to capture images from the video cameras 11A and 11B. In the present embodiment, images are captured from the video cameras 11A and 11B at predetermined intervals (for example, every 0.1 seconds), and each camera personal computer 12A, 12B determines whether that interval has elapsed. If it is time to capture an image (YES in step S1), each camera personal computer 12A, 12B captures an image from its video camera 11A or 11B (step S2). If it is not yet time (NO in step S1), the determination is repeated. Because the camera personal computers 12A and 12B are synchronized with the main personal computer 13, they capture their images at the same time.
[0029]
(Face area detection)
After capturing a frame (image data; see, for example, FIG. 4A) from its video camera 11A or 11B, each camera personal computer 12A, 12B performs face area detection. Face area detection uses a known skin color reference value method based on color information. The present embodiment uses the CIE L*u*v* color system, one of the perceptually uniform color spaces.
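For reference, the following sketch converts an 8-bit sRGB pixel to CIE L*u*v* with the standard D65 formulas. The source does not say which RGB space the cameras deliver or how the U and V values are quantized to the 0 to 256 range mentioned with Equation (2), so the sRGB assumption and the function names are hypothetical.

```python
from typing import Tuple

# D65 reference white (assumed; matches sRGB).
XN, YN, ZN = 0.95047, 1.0, 1.08883


def _linearize(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _uv_prime(x: float, y: float, z: float) -> Tuple[float, float]:
    """CIE 1976 chromaticity coordinates u', v'."""
    d = x + 15.0 * y + 3.0 * z
    return (4.0 * x / d, 9.0 * y / d) if d > 0 else (0.0, 0.0)


def srgb_to_luv(r8: int, g8: int, b8: int) -> Tuple[float, float, float]:
    """Return (L*, u*, v*) for an 8-bit sRGB pixel."""
    r, g, b = _linearize(r8), _linearize(g8), _linearize(b8)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    yr = y / YN
    if yr > (6.0 / 29.0) ** 3:
        l_star = 116.0 * yr ** (1.0 / 3.0) - 16.0
    else:
        l_star = (29.0 / 3.0) ** 3 * yr
    if l_star == 0.0:
        return 0.0, 0.0, 0.0  # black: chromaticity undefined, u* = v* = 0
    up, vp = _uv_prime(x, y, z)
    upn, vpn = _uv_prime(XN, YN, ZN)
    return l_star, 13.0 * l_star * (up - upn), 13.0 * l_star * (vp - vpn)
```

Skin detection then works only on the (u*, v*) pair, which makes it less sensitive to brightness changes than thresholding raw RGB.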
[0030]
First, a two-dimensional color histogram of the U and V coordinate values is computed over the entire input image, and the peak (the value with the maximum frequency) within a predetermined effective skin color range is taken as the skin color reference value. A threshold is then determined by applying a known discriminant analysis method to the color difference from this reference value, and the image is binarized into a skin color region and other regions (see FIG. 4B). The present embodiment assumes a single detection target person H, so when several skin color regions are detected, the camera personal computers 12A and 12B take the largest one as the face area 15 (step S3). That is, the number of pixels (area) of each extracted skin color region is determined, and the region with the maximum area Smax is set as the face area 15. For convenience, the U and V coordinate values are sometimes called UV values or U and V values below.
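The face area steps above can be sketched as follows. Several details are assumptions, not taken from the source: the effective skin range is passed in as bounds, the color difference is taken as Euclidean distance in (U, V), the binarization threshold is supplied by the caller (in the text it comes from discriminant analysis), and regions use 4-connectivity. All function names are hypothetical.

```python
from collections import Counter, deque
from math import hypot
from typing import List, Tuple

UV = Tuple[int, int]


def skin_reference(uv_image: List[List[UV]], u_range: Tuple[int, int],
                   v_range: Tuple[int, int]) -> UV:
    """Peak of the 2-D (U, V) histogram inside the effective skin range."""
    hist = Counter(
        uv for row in uv_image for uv in row
        if u_range[0] <= uv[0] <= u_range[1] and v_range[0] <= uv[1] <= v_range[1]
    )
    if not hist:
        raise ValueError("no pixels inside the effective skin-color range")
    return hist.most_common(1)[0][0]


def skin_mask(uv_image: List[List[UV]], ref: UV, threshold: float) -> List[List[bool]]:
    """Binarize by (assumed Euclidean) color difference from the reference."""
    return [[hypot(u - ref[0], v - ref[1]) <= threshold for (u, v) in row]
            for row in uv_image]


def largest_region(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """(row, col) pixels of the largest 4-connected skin region (face area 15)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    best: List[Tuple[int, int]] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region: List[Tuple[int, int]] = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):  # keep the maximum area Smax
                best = region
    return best
```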
[0031]
In step S4, the main personal computer 13 instructs the image display device 14 to display the presentation information, and the image display device 14 displays the presentation information Q (shown in FIG. 1B) on the screen.
[0032]
(Gaze detection)
In outline, steps S5 to S9 proceed as follows. The camera personal computers 12A and 12B detect the eye area 16 from the face area 15, detect the pupil region 17 within the eye area 16, and normalize the size of the pupil region 17. They then calculate the circularity of the shape of the pupil region 17 and send the result to the main personal computer 13, which detects the line of sight by comparing the circularities received from the camera personal computers 12A and 12B.
[0033]
(Eye area detection)
In step S5, the camera personal computers 12A and 12B first recalculate the skin color reference value for the image data and extract the skin color regions. The largest of the extracted skin color regions is taken as the face area 15.
[0034]
Within the face area 15, the camera personal computers 12A and 12B then detect the eye area 16 and the mouth area by a template matching method that uses four-directional plane features and color difference plane features.
[0035]
Suppose the eye area 16 and the mouth area were detected in step S5 for the image data processed by this flowchart immediately before the current image data. In that case, the face area 15 obtained this time is narrowed by a predetermined amount on the basis of the previous detection result, and the narrowed area is used as the search range. The eye area 16 and the mouth area are detected in the current image data within this search range by template matching. If template matching does not find the eye area 16 and the mouth area within the search range, they are searched for again over the whole face area 15.
[0036]
Here, the template matching method will be described.
In this method, four-directional plane features (directional planes) and color difference plane features based on the U and V coordinate values are extracted from the obtained image data by the four-directional plane feature extraction described above, and the similarity to the right-eye, left-eye, and mouth templates is calculated over the skin color region (face area 15) or over the search range.
[0037]
The color difference plane features represent the differences of the U and V values from the skin color reference value. The templates are prepared in advance from a plurality of images of right eyes, left eyes, and mouths: the image data from which the four-directional plane features and color difference plane features have been extracted is reduced at a predetermined ratio so that its width becomes a predetermined number of pixels (for example, 32 pixels), which normalizes its size. For the four-directional plane features, the edge direction information is decomposed into four directions; the four-directional plane features and the color difference plane features are then smoothed with a Gaussian filter, and each image is converted to an 8 × 8 resolution. The templates are stored in a storage device (not shown).
[0038]
The similarity a of the four-directional plane features between a template T and the image data (input image) I is calculated by the following Equation (1), and the similarity b of the color difference plane features is calculated by the following Equation (2).
[0039]
(Equation 1)

[0040]
(Equation 2)

In Equations (1) and (2), I denotes the input image and T a template. i is an integer from 1 to m and j an integer from 1 to n, so (i, j) indexes a template of m × n pixels and the corresponding part of the input image. (x, y) denotes the upper-left coordinates of the input image. In Equation (2), Tu and Tv denote the UV values of the template, Iu and Iv the UV values of the image data, and Umax and Vmax the maximum ranges of the UV values. The present embodiment uses the CIE L*u*v* color system, in which Umax = 256 and Vmax = 256 are used to speed up processing and save storage space.
[0041]
Next, based on the similarities a and b calculated by the equations (1) and (2), a final similarity k is calculated by the following equation (3).
k = Wa × a + Wb × b (3)
In Expression (3), Wa and Wb are predetermined constants to be multiplied by the similarities a and b as weights, and satisfy Wa + Wb = 1. In the present embodiment, it is assumed that Wa = Wb = 0.5.
[0042]
Based on the calculated similarity k, portions where k is equal to or greater than a preset threshold are taken as eye candidate regions. Each input image (image data) is given upper-left coordinates in advance, so the positional relationship between the eyes and the mouth can be determined from the coordinates. Using the coordinates, the combination that satisfies the rough positional relationship (coordinate positions) of the eyes and mouth, for example the right eye and the left eye side by side above the mouth, and that has the highest similarity k is determined to be the eye area 16 and the mouth area. In this way the eye area 16 is detected from the face area 15.
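Because Equations (1) and (2) are not reproduced in this text, the sketch below takes the similarities a and b as given inputs and shows only Equation (3) and the candidate-combination step, with Wa = Wb = 0.5 as stated. The threshold value 0.6, the layout rule, and scoring a combination by the sum of its k values are hypothetical choices made for illustration.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    x: int     # upper-left x of the matched window
    y: int     # upper-left y (image y grows downward)
    a: float   # four-directional plane similarity, Equation (1)
    b: float   # color difference plane similarity, Equation (2)


WA, WB = 0.5, 0.5  # weights from the text, Wa + Wb = 1

Triple = Tuple[Candidate, Candidate, Candidate]


def final_similarity(c: Candidate) -> float:
    """Equation (3): k = Wa * a + Wb * b."""
    return WA * c.a + WB * c.b


def plausible_layout(right: Candidate, left: Candidate, mouth: Candidate) -> bool:
    """Hypothetical rough layout: right eye on the image left, eyes above the mouth."""
    return (right.x < left.x
            and max(right.y, left.y) < mouth.y
            and right.x <= mouth.x <= left.x)


def select_eyes_and_mouth(rights: List[Candidate], lefts: List[Candidate],
                          mouths: List[Candidate],
                          threshold: float = 0.6) -> Optional[Triple]:
    """Best (right eye, left eye, mouth) combination, or None if none fits."""
    def keep(cs: List[Candidate]) -> List[Candidate]:
        return [c for c in cs if final_similarity(c) >= threshold]

    best: Optional[Triple] = None
    best_score = float("-inf")
    for right, left, mouth in product(keep(rights), keep(lefts), keep(mouths)):
        if not plausible_layout(right, left, mouth):
            continue
        score = final_similarity(right) + final_similarity(left) + final_similarity(mouth)
        if score > best_score:
            best, best_score = (right, left, mouth), score
    return best
```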
[0043]
(Pupil area detection)
Next, in step S6, the camera personal computers 12A and 12B detect the pupil region 17 within the detected eye area 16. In the present embodiment, the pupil region detection described below is performed on one of the eye areas 16 detected in step S5 (for example, the right eye).
[0044]
First, a saturation histogram of the eye area image is created, and a known discriminant analysis method is applied to separate the eye area 16 from the skin area (the part of the face area other than the eye area 16). In general, the skin area has high saturation and the eye area 16 has low saturation, and this separation exploits that property. Next, a luminance histogram of the eye area image is created, and a known discriminant analysis method is applied to divide the separated eye area 16 into the pupil region 17 and the white-of-the-eye region 18. Because the white-eye region 18 and the pupil region 17 differ clearly, this division can be performed easily without enlarging the image data, unlike, for example, dividing the pupil region 17 into the pupil and the iris.
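The "known discriminant analysis method" used on the saturation and luminance histograms is commonly identified with Otsu's method, which chooses the threshold that maximizes the between-class variance of the two resulting classes. A minimal standard-library sketch, assuming 8-bit values; the function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def histogram(values: Iterable[int], levels: int = 256) -> List[int]:
    """Frequency of each gray (or saturation) level 0..levels-1."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    return hist


def otsu_threshold(hist: Sequence[int]) -> int:
    """Return t maximizing between-class variance for classes [0, t] and [t+1, L-1]."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0          # pixel count of the lower class
    sum0 = 0.0      # sum of levels in the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For the luminance step, pixels at or below the returned threshold would form the dark pupil region 17 and the rest the white-eye region 18.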
[0045]
The pupil region 17 is then reduced or enlarged to a predetermined size on the basis of its detection result, and circular interpolation is performed on it. As noted above, the pupil region 17 obtained by applying the discriminant analysis method to the saturation histogram and the luminance histogram and dividing the region as shown in FIG. may still contain a shadow 171 cast by the eyelid. When image gray levels are represented with 8 bits, gray level 0 is black and gray level 255 is white. A horizontal projection histogram, as shown in FIG. 5B, is therefore created for the pixels with gray level 0 (black) in the region division result. In the upper part of this histogram along its vertical axis (the vertical direction in FIG. 5B), any portion forming an extreme peak is deleted on the basis of a preset threshold. The eyelid shadow 171 appears as such a peak in the histogram, and deleting it leaves only the pupil region 17, as shown in FIG. 5C.
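The projection step can be sketched as below. The source does not define the extent of the "upper part" or how a peak is recognized, so this sketch assumes the upper half of the rows and a fixed per-row count threshold; removed rows are simply set to white (255). The function names are hypothetical.

```python
from typing import List


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Number of black (gray level 0) pixels in each row."""
    return [row.count(0) for row in binary]


def remove_eyelid_shadow(binary: List[List[int]], threshold: int) -> List[List[int]]:
    """Whiten upper-half rows whose black count exceeds threshold (assumed rule)."""
    proj = horizontal_projection(binary)
    cleaned = [row[:] for row in binary]  # leave the input untouched
    for y in range(len(binary) // 2):     # "upper part" assumed = upper half
        if proj[y] > threshold:
            cleaned[y] = [255] * len(binary[y])
    return cleaned
```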
[0046]
(Calculation of circularity)
Next, in step S7, the camera personal computers 12A and 12B apply a circle Hough transform to the detected pupil region 17 and calculate the circularity from the vote counts in the parameter space used by the Hough transform.
[0047]
First, the Prewitt operator is applied to the eye area 16, using the difference in gray level between the white-eye region 18 and the detected pupil region 17, to generate an edge image of the pupil region 17 as shown in FIG. 5D; this extracts the edge (contour) 172. A well-known circle Hough transform is then applied to the group of points forming the edge 172 (hereinafter "edge points").
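A minimal Prewitt sketch that returns edge points ready for the Hough transform described next. It uses the two standard 3 × 3 Prewitt kernels; approximating the gradient magnitude by |gx| + |gy| and the fixed threshold are assumptions, since the source names only the operator. The function name is hypothetical.

```python
from typing import List, Tuple


def prewitt_edge_points(img: List[List[int]], threshold: int) -> List[Tuple[int, int]]:
    """(x, y) of interior pixels whose Prewitt gradient |gx| + |gy| >= threshold."""
    h, w = len(img), len(img[0])
    points: List[Tuple[int, int]] = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]: horizontal change.
            gx = sum(img[y + dy][x + 1] - img[y + dy][x - 1] for dy in (-1, 0, 1))
            # Kernel [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]: vertical change.
            gy = sum(img[y + 1][x + dx] - img[y - 1][x + dx] for dx in (-1, 0, 1))
            if abs(gx) + abs(gy) >= threshold:
                points.append((x, y))
    return points
```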
[0048]
Here, the Hough transform of a circle will be briefly described with reference to FIGS. 6A and 6B.
On an xy plane such as the one shown in FIG. 6A, a circle is represented by the following Equation (4).
[0049]
$(x - a)^2 + (y - b)^2 = r^2$ ... (4)
A single circle on the xy plane corresponds to a single point in the (a, b, r) parameter space (hereinafter simply "parameter space") defined by the circle's center (a, b) and radius r. The circle Hough transform maps the xy plane into this three-dimensional parameter space describing circles, and is realized by the following Equation (5), obtained by rearranging Equation (4).
[0050]
$b = \pm\sqrt{r^2 - (x - a)^2} + y$ ... (5)
Using Equation (5), the Hough transform of an arbitrary point (x0, y0) on a circle in the xy plane casts votes for the cells [points (a, b, r)] in the parameter space that satisfy $b = \pm\sqrt{r^2 - (x_0 - a)^2} + y_0$. The resulting set of voted points in the parameter space represents the set of all circles passing through (x0, y0). For example, when the three points e (x1, y1), f (x2, y2), and g (x3, y3) on the xy plane shown in FIG. 6A are projected into the parameter space by Equation (5), each of the points e, f, and g becomes one circle on the ab plane for each radius value r; FIG. 6B shows the ab plane at r = r0. A circle obtained by this Hough transform is called a Hough curve.
[0051]
Since the Hough curves intersect with each other in the parameter space, a plurality of intersections occur. Then, by finding the intersection having the largest number of intersecting Hough curves (that is, the number of votes), the center coordinates and the radius of the circle on the xy plane can be detected. In FIG. 6B, three Hough curves intersect at the intersection (a0, b0, r0), and the number of votes for the intersection is three. On the other hand, since the number of votes is 2 or less at other points, the center coordinates of the circle on the xy plane are (a0, b0), and the radius is determined to be r0.
[0052]
Now, upper-left coordinates (x, y) are assigned to the edge point group in the edge image of the pupil region 17, and the circle Hough transform is performed: the xy plane containing the edge point group, as shown in FIG. 5D, is converted into a three-dimensional parameter space representing the center coordinates (a1, b1) and the radius r1 of the pupil region 17. Each edge point on the xy plane votes for the cells [points (a, b, r)] in the parameter space that satisfy Equation (5). After the edge 172 in image space has been scanned completely, the point in the parameter space that has received the maximum number of votes is taken as the pupil center point C. The pupil center point C corresponds to the maximum vote count, and the parameter space corresponds to the voting space. In FIG. 7, the a-axis and b-axis values indicate the center coordinates and the r-axis value indicates the radius. The range of the radius r of the pupil region 17 (rmin to rmax) is stored in advance in a storage device (not shown) of the camera personal computers 12A and 12B, and the Hough transform casts votes only for cells [points (a, b, r)] whose radius lies within this range.
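The voting procedure above can be sketched with a sparse accumulator. For every edge point, every radius in [rmin, rmax], and every center column a within reach, Equation (5) gives up to two b values, each rounded to the nearest cell. Using a set so that a point votes only once per cell when the square root is zero is an implementation choice, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int, int]  # (a, b, r)


def hough_circle(edge_points: Iterable[Tuple[int, int]], width: int, height: int,
                 r_min: int, r_max: int) -> Counter:
    """Accumulate votes in the (a, b, r) parameter space via Equation (5)."""
    votes: Counter = Counter()
    for x, y in edge_points:
        for r in range(r_min, r_max + 1):
            for a in range(max(0, x - r), min(width - 1, x + r) + 1):
                s = math.sqrt(r * r - (x - a) ** 2)  # |x - a| <= r, never negative
                # b = y +/- sqrt(r^2 - (x - a)^2); the set avoids a double vote when s == 0
                for b in {round(y + s), round(y - s)}:
                    if 0 <= b < height:
                        votes[(a, b, r)] += 1
    return votes


def pupil_center(votes: Counter) -> Tuple[Cell, int]:
    """Cell with the maximum vote count (the pupil center point C) and its votes."""
    return votes.most_common(1)[0]
```

With 12 integer points lying exactly on a circle of center (10, 10) and radius 5, the true cell (10, 10, 5) collects one vote from every point.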
[0053]
Suppose, as shown in FIG. 8, that the detection target person H is looking straight at the video camera 11A from the front. The pupil (pupil region 17) in the image captured by the video camera 11A is then a perfect circle, as shown in FIG. 9A. Seen from the direction of the video camera 11B, located near the video camera 11A, the detection target person H is looking in a different direction, so the pupil (pupil region 17) in the image captured by the video camera 11B is a deformed, ellipse-like circle, as shown in FIG. 9B.
[0054]
In other words, when the detection target person H looks toward a camera, the captured pupil (pupil region 17) is a perfect circle or close to one. As the line of sight turns away, the captured pupil (pupil region 17) becomes an ellipse whose minor and major axis lengths differ more and more. In the present embodiment, the video cameras 11A and 11B are placed at the same height, so the pupil (pupil region 17) seen with the line of sight turned away, as in FIG. 9B, is a vertically elongated ellipse. Therefore, when the pupils (pupil regions 17) captured by the video cameras 11A and 11B are compared, the camera whose pupil image is closer to a perfect circle is the one the detection target person H is looking toward; in this example, the person H is looking at the video camera 11A.
[0055]
The vote counts in the parameter space after the Hough transform behave as follows. The closer the shape of the pupil region 17 (edge 172) in the edge image on the xy plane is to a perfect circle, the more the votes concentrate on the point (a1, b1, r1) taken as the pupil center point C, and the larger the gap between its vote count and the counts around it. When the shape of the pupil region 17 (edge 172) is elliptical, the votes scatter more as the major and minor axis lengths diverge, so the gap between the vote count at the pupil center point C (a1, b1, r1) and the counts around it remains small. FIGS. 10A and 10B show a three-dimensional space in which a v-axis indicating the number of votes is orthogonal to the ab plane of the center coordinates (a1, b1); this differs from the (a, b, r) parameter space described above. Both FIGS. 10A and 10B show the vote counts on the ab plane for radius r = r1.
[0056]
Based on these characteristics, the evaluation value E indicating the circularity is calculated by the following Equation (6) from the number of votes Vc at the pupil center point C determined in the parameter space and the number of votes Vr in the comparison area P centered on the pupil center point C (shown in FIGS. 9A and 9B). Hereinafter, Vc is called the "center vote count" and Vr the "area vote count".
[0057]
E = k × Vc / Vr (6)
In Equation (6), k is a predetermined constant applied as a weight (distinct from the similarity k of Equation (3)). In the present embodiment, the comparison area P is a range of 5 × 5 pixels centered on the center coordinates (a1, b1) of the pupil center point C. The area vote count Vr includes not only the votes at r = r1, as shown in FIGS. 10A and 10B, but also the votes at every r within the comparison area P. The pupil center point C corresponds to the first region, and the comparison area P corresponds to the second region.
[0058]
As described above, the closer the pupil region 17 is to a perfect circle, the more the votes concentrate on the pupil center point C, so the center vote count Vc in the numerator of Equation (6) increases and the evaluation value E increases. Conversely, the farther the pupil region 17 departs from a perfect circle, the more the votes scatter, so the numerator Vc decreases while the denominator Vr increases, and the evaluation value E decreases. Consequently, the larger the evaluation value E, the higher the circularity, and the smaller E, the lower the circularity.
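Equation (6) can be sketched on top of a sparse vote accumulator keyed by (a, b, r). The 5 × 5 comparison area (half-width 2) and summing over every radius follow the text; including the pupil center point C itself in Vr and the default weight k = 1.0 are assumptions. The function name is hypothetical.

```python
from collections import Counter
from typing import Tuple

Cell = Tuple[int, int, int]  # (a, b, r)


def evaluation_value(votes: Counter, center: Cell, r_min: int, r_max: int,
                     k: float = 1.0, half: int = 2) -> float:
    """Equation (6): E = k * Vc / Vr.

    Vc: votes at the pupil center point C.
    Vr: votes in the (2*half+1) x (2*half+1) comparison area P around (a1, b1),
        summed over every radius in [r_min, r_max]; C itself is included.
    """
    a1, b1, _ = center
    vc = votes[center]  # Counter returns 0 for missing cells without inserting them
    vr = sum(votes[(a, b, r)]
             for a in range(a1 - half, a1 + half + 1)
             for b in range(b1 - half, b1 + half + 1)
             for r in range(r_min, r_max + 1))
    return k * vc / vr if vr else 0.0
```

Votes concentrated on C give E near 1; the same edge length spread over neighboring cells pushes E toward 0, which is the behavior described above.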
[0059]
The camera personal computers 12A and 12B send the circularity (evaluation value E) calculated in this way to the main personal computer 13. Because their clocks are synchronized with the main personal computer 13, the circularities (evaluation values E) sent from the camera personal computers 12A and 12B are calculated from image data captured at the same time.
[0060]
(Gaze detection)
In step S8, the main personal computer 13 calculates the difference between the evaluation value Ea received from the camera personal computer 12A and the evaluation value Eb received from the camera personal computer 12B. If the difference |Ea − Eb| is smaller than a preset threshold Δ, the main personal computer 13 determines that the line of sight is directed at the presentation information Q on the screen of the image display device 14 (YES in step S9). If |Ea − Eb| is larger than the threshold Δ, the main personal computer 13 determines that the line of sight is not directed at the presentation information Q on the screen of the image display device 14 (NO in step S9).
[0061]
When it is determined that the line of sight is not directed to the presentation information Q on the screen of the image display device 14, the main personal computer 13 stores the result (step S20).
[0062]
When it is determined that the line of sight is directed at the presentation information Q on the screen of the image display device 14, the main personal computer 13 instructs the image display device 14 to move the presentation information Q to the right (from the video camera 11A side toward the video camera 11B side) or to the left (from the video camera 11B side toward the video camera 11A side). The image display device 14 moves the presentation information Q rightward or leftward according to the instruction (step S10).
[0063]
After the presentation information Q starts moving, image capture (step S12) and face area detection (step S13) are performed as in steps S2 and S3, followed by eye area detection (step S14), pupil region detection (step S15), and circularity calculation (step S16) as in steps S5 to S7. After step S16, the main personal computer 13 compares the evaluation value Ea received from the camera personal computer 12A with the evaluation value Eb received from the camera personal computer 12B and stores the comparison result (step S17). The movement period t₀ of the presentation information Q is set in advance, for example, to 3 seconds. The main personal computer 13 determines whether the time elapsed since the presentation information Q started moving has reached the movement period t₀ (step S18). Images are captured from the video cameras 11A and 11B at the predetermined interval (0.1 seconds), so the comparison in step S17 is also performed every 0.1 seconds until the elapsed time reaches t₀. The comparison result [difference (Ea − Eb)] between the evaluation values Ea and Eb is therefore stored 3 s / 0.1 s = 30 times, once every 0.1 seconds.
[0064]
When the time elapsed since the presentation information Q started moving reaches the movement period t₀, the main personal computer 13 determines from the stored comparison results whether the line of sight has followed the movement of the presentation information Q (step S19).
[0065]
It is assumed that the main personal computer 13 instructs the movement of the presentation information Q to the left. That is, it is assumed that the presentation information Q has moved from the video camera 11B to the video camera 11A. In this case, the criteria for determining whether or not the line of sight has followed the movement of the presentation information Q are, for example, as follows.
[0066]
<1> The difference (Ea − Eb) between the evaluation values Ea and Eb is larger than the previous difference (Ea′ − Eb′) at least z times (z is, for example, 10).
[0067]
The previous difference (Ea′ − Eb′) is the difference obtained 0.1 seconds before the difference (Ea − Eb).
It is assumed that the main personal computer 13 instructs the movement of the presentation information Q to the right. That is, it is assumed that the presentation information Q has moved from the video camera 11A to the video camera 11B. In this case, the criteria for determining whether or not the line of sight has followed the movement of the presentation information Q are, for example, as follows.
[0068]
<2> The difference (Ea − Eb) between the evaluation values Ea and Eb is smaller than the previous difference (Ea′ − Eb′) at least z times (z is, for example, 10).
[0069]
When criterion <1> (for leftward movement) or criterion <2> (for rightward movement) is satisfied, the main personal computer 13 identifies the moving direction of the line of sight and determines that the line of sight has followed the movement of the presentation information Q. When neither criterion <1> nor criterion <2> is satisfied, the main personal computer 13 determines that the line of sight did not follow the movement of the presentation information Q.
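Steps S8, S9, and S19 can be summarized in a short sketch. It assumes the stored sequence holds one difference Ea − Eb per 0.1 s sample (30 values for t₀ = 3 s) and counts every increase or decrease over the period, since the text does not say whether the z occurrences must be consecutive. The direction strings and function names are hypothetical; z = 10 is the example value from the text.

```python
from typing import Sequence


def gaze_on_screen(ea: float, eb: float, delta: float) -> bool:
    """Steps S8-S9: the display sits between the cameras, so |Ea - Eb| < delta."""
    return abs(ea - eb) < delta


def gaze_follows(diffs: Sequence[float], direction: str, z: int = 10) -> bool:
    """Step S19. diffs[i] is Ea - Eb stored at the i-th 0.1 s sample of t0.

    'left'  (camera 11B side -> 11A side): criterion <1>, difference grows.
    'right' (camera 11A side -> 11B side): criterion <2>, difference shrinks.
    """
    pairs = list(zip(diffs, diffs[1:]))  # (previous, current)
    if direction == "left":
        return sum(cur > prev for prev, cur in pairs) >= z
    if direction == "right":
        return sum(cur < prev for prev, cur in pairs) >= z
    raise ValueError(f"unknown direction: {direction!r}")
```

Because the decision uses the change in Ea − Eb rather than either value alone, a gaze that tracks Q leftward raises Ea and lowers Eb at the same time, which enlarges the signal compared with watching one camera.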
[0070]
The main personal computer 13 stores the determination result (Step S20).
In the first embodiment, the camera personal computers 12A and 12B correspond to pupil region detecting means for detecting the pupil region from the eye area of a face contained in the image data captured by the imaging means, and to circularity calculating means for calculating the circularity of the shape of the pupil region detected by the pupil region detecting means. The main personal computer 13 corresponds to line-of-sight detecting means for detecting the line of sight based on the circularity calculated by the circularity calculating means. The main personal computer 13 also corresponds to specifying means for specifying the moving direction of the line of sight detected by the line-of-sight detecting means, and to determination means for determining whether the moving direction of the presentation information moved in a predetermined direction matches the specified moving direction of the person's line of sight.
[0071]
According to the first embodiment, the following effects can be obtained.
(1) A person's line of sight is detected while the presentation information Q moves in a predetermined direction (right or left), and the moving direction of the line of sight is estimated from this detection. It is then determined whether the estimated moving direction of the line of sight matches the moving direction of the presentation information Q; in this embodiment, they are judged to match when criterion <1> or criterion <2> is satisfied. When a person's line of sight rests on stationary presentation information, it is unclear whether the person is merely looking in that direction or is actually paying attention to the information. When the moving direction of the line of sight is determined to match that of the presentation information Q, however, it can be concluded that the person is paying attention to the presentation information Q.
[0072]
Determining whether or not the line of sight follows the movement of the presentation information Q makes it possible to analyze the degree of a person's attention to the presentation information Q, and hence the degree of interest in, and the marketability of, that information.
[0073]
(2) If the follow determination were made from the evaluation value E of a single video camera, it would have to evaluate the difference between that camera's current and previous evaluation values while the presentation information Q moves. In the present embodiment, the follow determination instead uses the evaluation values Ea and Eb obtained by both video cameras 11A and 11B while the presentation information Q moves: it compares the current difference between Ea and Eb with the previous difference between Ea and Eb. When the presentation information Q moves to the left and the line of sight follows it, the evaluation value Ea increases while the evaluation value Eb decreases.
[0074]
A pupil region 17a indicated by a chain line in FIG. 9A is the image captured by the video camera 11A while the person views the presentation information Q on the screen of the image display device 14 before it moves. A pupil region 17b indicated by a chain line in FIG. 9B is the image captured by the video camera 11B under the same condition. Assume, therefore, that the pupil regions 17a and 17b indicated by chain lines in FIGS. 9A and 9B were obtained by the video cameras 11A and 11B last time. Assume also that the pupil regions 17, 17 indicated by solid lines in FIGS. 9A and 9B are obtained by the video cameras 11A and 11B this time.

The previous evaluation value E obtained by the video camera 11A alone corresponds to the size of the pupil region 17 shown by the chain line in FIG. 9A. Its current evaluation value E corresponds to the size of the pupil region 17 shown by the solid line in FIG. 9A. The difference between these two values is Δ1 (> 0). Likewise, the previous evaluation value E obtained by the video camera 11B alone corresponds to the size of the pupil region 17 shown by the chain line in FIG. 9B. Its current evaluation value E corresponds to the size of the pupil region 17 shown by the solid line in FIG. 9B. The difference between these two values is Δ2 (> 0).
[0075]
The difference between the previous evaluation values Ea and Eb obtained by the two video cameras 11A and 11B is therefore zero, whereas the difference between the current evaluation values Ea and Eb is (Δ1 + Δ2). The change from the previous difference to the current difference is thus (Δ1 + Δ2). This is larger than the change, Δ1 or Δ2, between the current and previous evaluation values obtained by either video camera alone. Accordingly, attention determination using the two video cameras 11A and 11B is more accurate than attention determination using a single video camera.
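The comparison in (2) can be illustrated with a short Python sketch. It assumes, as stated above, that a leftward movement of Q raises Ea and lowers Eb. The function names, the (Ea, Eb) tuple layout and the numeric values are hypothetical. They serve only to show that the two-camera difference changes by Δ1 + Δ2, while either camera alone changes by only Δ1 or Δ2.

```python
def gaze_direction_two_cameras(prev, curr):
    """Estimate gaze movement from the change in (Ea - Eb) between two frames.

    prev, curr: (Ea, Eb) pairs from cameras 11A and 11B (hypothetical layout).
    Assumes, per the text, that a leftward gaze shift raises Ea and lowers Eb.
    """
    change = (curr[0] - curr[1]) - (prev[0] - prev[1])
    if change > 0:
        return "left", change
    if change < 0:
        return "right", change
    return "none", change


def follows(presentation_direction, gaze_direction):
    """Tracking determination: does the gaze move with the presentation information Q?"""
    return presentation_direction == gaze_direction


d1, d2 = 0.05, 0.03  # hypothetical single-camera changes (delta1, delta2)
direction, change = gaze_direction_two_cameras((0.60, 0.60), (0.60 + d1, 0.60 - d2))
print(direction, round(change, 2), follows("left", direction))  # prints: left 0.08 True
```

The two-camera change (0.08) exceeds either single-camera change (0.05 or 0.03), which is the margin the text credits for the improved accuracy.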
[0076]
(3) The presentation information Q is moved within the screen of the image display device 14. Because the presentation information Q is displayed on a screen, many different items of presentation information can easily be swapped in and displayed as needed. This makes it easy to analyze the degree of attention paid to a large number of such items.
[0077]
(4) In the present embodiment, the camera personal computers 12A and 12B detect the pupil region 17 from the eye region 16 included in the image data captured by the video cameras 11A and 11B. They then calculate the circularity of the shape of the pupil region 17 and transmit the results to the main personal computer 13, which detects the line of sight based on the circularity. Unlike the conventional method, gaze detection does not use the pupil. Instead, it is based on the circularity of the pupil region 17, which is larger than the pupil and is easily distinguished from the white-of-the-eye region 18 within the eye region 16. The eye region 16 therefore need not be imaged at high magnification, and the range over which the video cameras 11A and 11B can capture the detection target person H is widened. As a result, the line of sight can be detected reliably regardless of the position of the detection target person H. It can therefore also be determined, regardless of that position, whether the person is paying attention to the presentation information Q.
[0078]
(5) In the present embodiment, a plurality of video cameras 11A and 11B are arranged at different positions in the line-of-sight detection space. The main personal computer 13 detects the line of sight by comparing the circularities of the pupil regions 17 detected simultaneously through the respective video cameras 11A and 11B. Unlike the related art, the circularity (evaluation value E) calculated by the camera personal computers 12A and 12B is not compared with reference data stored in advance, so the detected pupil region 17 does not need to be calibrated. Gaze detection can therefore be performed easily.
[0079]
(6) In the present embodiment only the two video cameras 11A and 11B are used. However, additional video cameras can easily be installed simply by connecting each one to the main personal computer 13 via a camera personal computer. This makes it easy to improve the accuracy of gaze detection and to expand the detection area.
[0080]
(7) Accurately grasping the circularity is important in determining a person's degree of attention to the presentation information Q. In the present embodiment, the edge image of the pupil region 17 is subjected to the Hough transform. An evaluation value E indicating the circularity is then calculated from two vote counts in the resulting parameter space. The first is the number of votes at the pupil center point C, which has the largest number of votes. The second is the numbers of votes in the comparison region P containing the pupil center point C. Using the numbers of votes in the parameter space in this way allows the circularity to be calculated reliably.
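Expression (6) is not reproduced in this part of the text. The Python sketch below therefore assumes, hypothetically, that E is the ratio of the votes at the peak C to the total votes in a square comparison window P centred on C. It also uses a basic fixed-radius circular Hough transform. The function names, the window size and the fixed radius are all illustrative assumptions, not the patented procedure.

```python
import math
from collections import Counter


def hough_circle_votes(edge_points, radius, n_angles=360):
    """Accumulate votes for candidate circle centres of a fixed radius.

    Each edge pixel (x, y) votes once for every integer cell lying on the circle
    of the given radius around it, so the true centre collects votes from all
    edge pixels of an ideal circle.
    """
    acc = Counter()
    for x, y in edge_points:
        cells = set()
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(t)), round(y - radius * math.sin(t))))
        for cell in cells:  # at most one vote per edge pixel per cell
            acc[cell] += 1
    return acc


def circularity_from_votes(acc, half_width=2):
    """Assumed form of E: votes at the peak C divided by the votes in window P around C."""
    centre, vc = max(acc.items(), key=lambda item: item[1])
    cx, cy = centre
    vp = sum(
        acc.get((cx + dx, cy + dy), 0)
        for dx in range(-half_width, half_width + 1)
        for dy in range(-half_width, half_width + 1)
    )
    return centre, vc / vp


# Twelve integer points lying exactly on a circle of radius 5 centred at the origin.
circle = [(5, 0), (-5, 0), (0, 5), (0, -5), (3, 4), (-3, 4), (3, -4), (-3, -4),
          (4, 3), (-4, 3), (4, -3), (-4, -3)]
centre, e = circularity_from_votes(hough_circle_votes(circle, radius=5))
print(centre, round(e, 3))
```

A sharply peaked accumulator, as produced by a round pupil region, concentrates votes at C. A flattened region spreads them across P, which is the intuition behind using vote counts as a circularity measure.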
[0081]
(8) In the present embodiment, the main personal computer 13 may determine that the evaluation values E received from the two camera personal computers 12A and 12B are equal within a predetermined allowable range, that is, that they differ by no more than the threshold value Δ. In that case it judges that the line of sight is directed between the two video cameras 11A and 11B. In this state the person's line of sight is directed at the presentation information Q, so the tracking determination can be performed whether the presentation information Q is then moved to the right or to the left.
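The entry condition in (8) amounts to a tolerance check on Ea and Eb. The following is a minimal sketch, assuming the evaluation values arrive as a sequence of (Ea, Eb) pairs. The function name and the value of Δ in the example are hypothetical, since the text gives no numerical threshold.

```python
def first_centred_frame(frames, delta):
    """Return the index of the first (Ea, Eb) pair with |Ea - Eb| <= delta, or None.

    A frame that passes this check is treated as the state in which the gaze
    rests on the presentation information Q, from which the tracking
    determination can start in either direction.
    """
    for i, (ea, eb) in enumerate(frames):
        if abs(ea - eb) <= delta:
            return i
    return None


# Hypothetical evaluation values; the second frame is within delta = 0.05.
print(first_centred_frame([(0.9, 0.5), (0.62, 0.60), (0.7, 0.5)], delta=0.05))
```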
[0082]
In the present invention, as shown in FIGS. 11A and 11B, a second embodiment using only one video camera 11 is also possible. The same reference numerals are used for the same components as those in the first embodiment.
[0083]
The image display device 14 is installed on the left side of the optical axis of the video camera 11 when viewed from above. The camera personal computer 12 plays the same role as the camera personal computers 12A and 12B of the first embodiment. It transmits the calculated circularity (evaluation value E) to the main personal computer 13C. Based on the evaluation value E received from the camera personal computer 12, the main personal computer 13C determines whether the line of sight is directed at the presentation information on the screen of the image display device 14. When the evaluation value E is at its maximum, it determines that the line of sight is directed at the presentation information. After this determination, the presentation information is moved to the right. If the line of sight moves to the right accordingly, the line of sight is determined to follow the movement of the presentation information. During the movement of the presentation information, the main personal computer 13C compares the current evaluation value from the single video camera 11 with the previous one. It identifies a rightward movement of the line of sight when the current value is larger, and a leftward movement when the current value is smaller.
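The single-camera rule of the second embodiment can be sketched as follows. The function names are hypothetical. Requiring E to rise at every sampled step while Q moves right is an assumption added for illustration; the text states only the per-comparison rule.

```python
def single_camera_direction(prev_e, curr_e):
    """Second embodiment: a rising E means the gaze moved right, a falling E means left."""
    if curr_e > prev_e:
        return "right"
    if curr_e < prev_e:
        return "left"
    return "none"


def follows_rightward(e_values):
    """Assumed aggregation: the gaze follows Q only if E rises at every sampled step."""
    return all(
        single_camera_direction(a, b) == "right" for a, b in zip(e_values, e_values[1:])
    )


# Hypothetical evaluation values sampled while Q moves to the right.
print(follows_rightward([0.40, 0.48, 0.55]))
```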
[0084]
Also in the second embodiment, it is possible to specify whether or not a person is paying attention to the presentation information.
In the present invention, the following embodiments are also possible.
[0085]
A gaze detection device including a video camera, a gaze detection light source, a light receiving sensor, a storage device, a determination control device, and the like is used as a part of the gaze movement direction estimation means of the attention determination device of the present invention. Reference data for making a gaze detection determination is stored in the storage device in advance.
[0086]
In the gaze detection device used for attention determination, the video camera images the detection target person at high magnification, and the pupil is detected from the image data of the magnified eye region. The detected pupil is calibrated so that it can be compared with the reference data. In addition, the gaze detection light source irradiates the eye of the detection target person with infrared light, and the light receiving sensor receives the light reflected by the eye (pupil and cornea). The determination control device compares the positional relationship between the pupil and the reflected light with the reference data to detect the line of sight.
[0087]
Also in this case, it can be grasped whether or not the person is paying attention to the presentation information.
In the first embodiment, the presentation information may be moved to the left (or right) and stopped, then moved to the right (or left). The moving direction of the line of sight is detected during each movement. In this case, the line of sight is determined to have followed the movement of the presentation information Q only when it follows both the leftward and the rightward movement. This makes it possible to grasp more reliably whether the person is paying attention to the presentation information.
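The two-phase check above can be sketched as a simple conjunction. The sketch assumes, hypothetically, that each phase yields a list of estimated gaze directions sampled while Q moves, and that every sample in a phase must agree with Q's direction. The text does not specify how samples within a phase are aggregated.

```python
def attention_confirmed(left_phase, right_phase):
    """True only if the gaze followed Q both while it moved left and while it moved right.

    Each phase is a list of estimated gaze directions ("left", "right" or "none")
    sampled during that movement of the presentation information (hypothetical format).
    """
    followed_left = bool(left_phase) and all(d == "left" for d in left_phase)
    followed_right = bool(right_phase) and all(d == "right" for d in right_phase)
    return followed_left and followed_right


print(attention_confirmed(["left", "left"], ["right", "right"]))
```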
[0088]
In the first embodiment, the detection of the face area may be omitted, and gaze detection may be performed by imaging the eye area at high magnification.
In the first embodiment, the camera personal computers 12A and 12B calculate the circularity (that is, the evaluation value E) from the numbers of votes in the parameter space. This parameter space is obtained by applying the Hough transform to the edge image of the pupil region 17. The circularity may instead be calculated as follows. The area S is obtained from the total number of pixels of the pupil region 17 detected in steps S6 and S15 and completed to a circular shape. The length L of the edge 172 (that is, of the contour line) is obtained by counting the pixels of the edge 172 in the edge image of the pupil region 17. Each of the camera personal computers 12A and 12B then calculates an evaluation value Rate indicating the circularity from the following Expression (7).
[0089]
$$\mathrm{Rate} = \frac{4\pi S}{L^{2}} \qquad (7)$$
According to this, the circularity is determined based on the evaluation value Rate calculated by Expression (7), and the circularity increases as the evaluation value Rate approaches 1, and the circularity decreases as the evaluation value Rate moves away from 1. Therefore, in this alternative example, calculating the circularity means calculating the evaluation value Rate. Even in this case, it is possible to suitably calculate the degree of circularity using the area S of the pupil region 17 and the length L of the edge 172.
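Expression (7) can be evaluated directly from pixel counts, as described above. The Python sketch below is illustrative only. The binary-mask input, the 4-neighbour rule for deciding which pixels form the edge 172, and the function names are assumptions not given in the text. Because counting edge pixels only approximates the true contour length, Rate computed from pixels is itself an approximation.

```python
import math


def rate(area, contour_length):
    """Expression (7): Rate = 4*pi*S / L**2, which equals 1 for an ideal circle."""
    return 4 * math.pi * area / contour_length ** 2


def area_and_contour(mask):
    """S = number of foreground pixels; L = foreground pixels with a 4-neighbour outside.

    mask is a list of rows of 0/1 values (hypothetical input format).
    """
    h, w = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < h and 0 <= c < w and mask[r][c]

    area = sum(1 for r in range(h) for c in range(w) if mask[r][c])
    contour = sum(
        1
        for r in range(h)
        for c in range(w)
        if mask[r][c]
        and not all(inside(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    )
    return area, contour


# Ideal shapes: a circle of radius 5 gives 1, a square of side 10 gives pi/4.
print(rate(math.pi * 25, 2 * math.pi * 5), rate(100, 40))
```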
[0090]
In the first embodiment, the center vote number Vc in the numerator of Expression (6) is the number of votes at the pupil center point C alone, which has the largest number of votes. Alternatively, the total number of votes in a region consisting of the pupil center point C and its adjacent points may be used as the center vote number Vc. More generally, the number of votes in any region smaller than the comparison region P may be used as Vc. In this case, the region consisting of the pupil center point C and its adjacent points corresponds to the first region, and the comparison region P corresponds to the second region.
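The variant above replaces the single-point count Vc with a sum over a small region around C. The following is a minimal sketch under stated assumptions. The accumulator is a dict mapping integer (x, y) cells to vote counts, "adjacent points" is taken as the 8-neighbourhood, both regions are squares centred on C, and E is taken as the ratio of the two sums. All names are hypothetical, and Expression (6) itself is not reproduced here.

```python
def window_votes(acc, centre, half_width):
    """Sum the votes in the (2*half_width + 1)^2 square centred on centre.

    half_width=0 gives Vc at C alone; half_width=1 gives C plus its eight
    adjacent points (the assumed form of the first region in this variant).
    """
    cx, cy = centre
    return sum(
        acc.get((cx + dx, cy + dy), 0)
        for dx in range(-half_width, half_width + 1)
        for dy in range(-half_width, half_width + 1)
    )


def circularity_variant(acc, centre, inner=1, outer=2):
    """Assumed ratio of first-region votes to comparison-region (P) votes."""
    if inner >= outer:
        raise ValueError("the first region must be smaller than the comparison region P")
    return window_votes(acc, centre, inner) / window_votes(acc, centre, outer)


# Hypothetical accumulator with its peak at C = (0, 0).
acc = {(0, 0): 10, (1, 0): 4, (0, 1): 3, (2, 2): 1}
print(window_votes(acc, (0, 0), 0), window_votes(acc, (0, 0), 1), circularity_variant(acc, (0, 0)))
```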
[0091]
In the first embodiment, the pupil region detection in steps S8 and S15 is performed on only one of the eye regions 16 detected in steps S5 and S14. It may instead be performed on the eye regions 16 of both the right and left eyes. In this case, the average of the circularities (evaluation values E) calculated from the two pupil regions 17 is used as the circularity of each set of image data for the comparison. Gaze detection is then more accurate than when the circularity is calculated for one eye only, which further improves the accuracy of attention determination.
[0092]
In the first embodiment, a plurality of video cameras 11A and 11B are provided in the gaze detection space. The camera personal computers 12A and 12B calculate a circularity (evaluation value E) for each video camera 11A and 11B, and the main personal computer 13 detects the line of sight by comparing these circularities. Instead of comparing the circularities obtained from the respective image data with one another, the line of sight may be detected by comparing each circularity with a threshold value stored in advance in the main personal computer 13. That is, the evaluation value E transmitted from each of the camera personal computers 12A and 12B is compared with the threshold value. In this case, since a plurality of circularities need not be compared, only one of the video cameras 11A and 11B need be installed.
[0093]
In the first embodiment, the main personal computer 13 communicates with each of the camera personal computers 12A and 12B by socket communication over Ethernet. This communication may instead be performed wirelessly.
[0094]
In the first embodiment, the video cameras 11A and 11B and the image display device 14 are all arranged at the same height, but they may be arranged at different heights.
- The presentation information may be drawn on a signboard that is moved by mechanical moving means.
[0095]
【The invention's effect】
As described in detail above, the present invention has an excellent effect that the degree of attention to the presentation information can be measured.
[Brief description of the drawings]
FIG. 1A is a block diagram illustrating the configuration of an attention determination device according to a first embodiment, and FIG. 1B is a screen diagram showing presentation information displayed on the screen of the image display device.
FIG. 2 is a flowchart illustrating an attention determination method.
FIG. 3 is a flowchart illustrating an attention determination method.
FIG. 4A is an explanatory diagram of image data captured by a video camera, and FIG. 4B is an explanatory diagram of image data extracted based on a skin color standard.
FIGS. 5A and 5C are explanatory diagrams showing pupil region detection, FIG. 5B is an explanatory view showing a horizontal projection histogram, and FIG. 5D is an explanatory view showing an edge image of a pupil region.
FIGS. 6A and 6B are explanatory diagrams of the Hough transform of a circle.
FIG. 7 is an explanatory diagram showing a parameter space.
FIG. 8 is a conceptual diagram showing a detection target person who looks directly at a video camera 11A.
FIG. 9A is an explanatory diagram showing a pupil region having a high circularity, and FIG. 9B is an explanatory view showing a pupil region having a low circularity.
FIGS. 10A and 10B are explanatory diagrams showing the relationship between the number of votes and center coordinates.
FIG. 11A is a block diagram illustrating the configuration of an attention determination device according to a second embodiment, and FIG. 11B is a screen diagram showing presentation information.
[Explanation of symbols]
H: Person to be detected. Q: Presentation information. P: Comparison area as the second area. C: Pupil center point as the first area. 11A, 11B: Video cameras (imaging means). 12: Camera personal computer (pupil area detecting means and circularity calculating means constituting the gaze movement direction estimating means). 13: Main personal computer (gaze detecting means and specifying means constituting the gaze movement direction estimating means). 14: Image display device (moving means). 15: Face area. 16: Eye area. 17, 17a, 17b: Pupil area.

Claims

1. An attention determination method comprising:
a moving step of moving presentation information in a predetermined direction;
a gaze movement direction estimation step of detecting the line of sight of a person during the moving step and estimating the moving direction of the line of sight; and
a determination step of determining whether or not the movement of the presentation information in the predetermined direction matches the estimated moving direction of the line of sight of the person.

2. An attention determination device comprising:
moving means for moving presentation information in a predetermined direction;
gaze movement direction estimating means for detecting the line of sight of a person during the movement of the presentation information and estimating the moving direction of the line of sight; and
determining means for determining whether or not the movement of the presentation information in the predetermined direction matches the estimated moving direction of the line of sight of the person.

3. The attention determination device according to claim 2, wherein the moving means is an image display device that displays the presentation information as an image, and the presentation information is moved within the screen of the image display device.

4. The attention determination device according to claim 2, wherein the gaze movement direction estimating means comprises: imaging means for imaging a detection target person; pupil region detecting means for detecting a pupil region from an eye region of a face included in image data captured by the imaging means; circularity calculating means for calculating the circularity of the shape of the pupil region detected by the pupil region detecting means; line-of-sight detecting means for detecting the line of sight based on the circularity calculated by the circularity calculating means; and specifying means for specifying the moving direction of the line of sight.

5. The attention determination device according to claim 4, wherein a plurality of the imaging means are provided at different positions, and the line-of-sight detecting means detects the line of sight by comparing the circularities of the pupil regions included in the image data obtained from the respective imaging means.

6. The attention determination device according to claim 4, wherein the circularity calculating means performs a Hough transform on the image data including the pupil region. It calculates the circularity from the numbers of votes in two regions of the voting space of the Hough transform: a first region containing the maximum vote point, which has the largest number of votes, and a second region that contains the maximum vote point and is larger than the first region.

7. The attention determination device according to claim 4, wherein the circularity calculating means calculates the circularity based on the area of the pupil region and the length of the contour of the pupil region.