JP2004280673A - Information providing device

Info

Publication number: JP2004280673A
Application number: JP2003073638A
Authority: JP
Inventors: Atsuo Ishikawa (石川 敦雄); Takumi Fujii (藤井 卓美)
Original assignee: Takenaka Komuten Co Ltd
Current assignee: Takenaka Komuten Co Ltd
Priority date: 2003-03-18
Filing date: 2003-03-18
Publication date: 2004-10-07

Abstract

PROBLEM TO BE SOLVED: To provide information on a subject's attention in a target space that includes an arbitrary structure which is the target of the subject's behavior.

SOLUTION: An evaluation device 70 receives, as data, the characteristics of the subject obtained by a recognition device 10. A space evaluation part 92 evaluates the space the subject attended to, taking into account the structure of the objects present in that space, and an attention state evaluation part 94 evaluates the attention state (emotion) for that space. An attention part evaluation part 96 derives the part to which the subject paid attention and outputs it as target part data 98. The part to which the subject paid attention can thus be presented.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、情報提供装置にかかり、特に、３次元空間内において対象者が関心を寄せる情報を提供する情報提供装置に関する。
【０００２】
【従来の技術】
催事場や展覧会などの場で、視聴者が注目するものを把握することは、主催者側や、その場の動向を把握する上で重要である。
【０００３】
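Accordingly, a system is known that, based on image data captured by a television camera, determines whether attention was paid to a given product or the like and measures the degree of that attention (see, for example, Patent Document 1). This technique detects the position of a person, the orientation of the face, and so on from image data of the person captured by the television camera, and obtains the degree of attention from the time spent near a given object, the number of persons staying there, and the like.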
【０００４】
【特許文献１】
特開平１０−４８００８号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、人物のみの把握では、実際に人物が注目することを判定するには不十分である。すなわち、人物が３次元的にどの位置を注目しているのかを正確に検出しなければならず、視線方向のみからの検出では困難である。
【０００６】
本発明は、上記事実を考慮して、対象者の行動対象となる任意の構造物を含んだ対象空間において、対象者の注目についての情報を提供する情報提供装置の提供を目的とする。
【０００７】
【課題を解決するための手段】
上記目的を達成するために、本発明の情報提供装置は、対象者の行動について特徴を表す特徴情報を検出する検出手段と、前記対象者の行動の対象となる対象空間における構造物の位置及び構造を表す構造情報を記憶する記憶手段と、前記対象空間を単位空間で分割しかつ前記構造情報による構造物が分割した単位空間の何れに該当するかの対応関係を求め、前記特徴情報に基づいて、対象者が注目する位置に対応する単位空間を導出すると共に該単位空間について前記対象者が注目した度合いを表す注目度を導出する導出手段と、前記導出手段で導出した単位空間と注目度とを対応づけて蓄積する蓄積手段と、前記蓄積手段で蓄積した単位空間と注目度とに基づいて、前記対象者が注目した注目部位を特定する特定手段と、前記特定手段により特定した注目部位を提示する提示手段と、を備えたことを特徴とする。
【０００８】
本発明では、催事場や展覧会などの場で、視聴者や参加者などの対象者が注目する部位を把握するため、検出手段により、対象者の行動について特徴を表す特徴情報を検出する。この検出は、対象者の所作や感情表現の動作による行動を、映像や音声などによる第三者の観察結果に相当する。特徴情報は、対象者の行動の特徴を表すものであり、対象者の行動の特徴として、対象者の肢体各部の変動や対象者が発する音などで統合的に分類することができる。
【０００９】
例えば、視線の挙動、身ぶりや手ぶりなどで代表される、眼球や瞼、頭部や腕などの肢体の変動についてその位置、速度や方向などに、対象者の心理状態が現れる。また、息づかいや声づかいのタイミング、速度や大きさなどに、対象者の心理状態が現れる。そこで、これら対象者の肢体各部の変動や対象者が発する音を、予め分類し、対象者の行動の特徴を表す特徴情報として導出することができる。
【００１０】
また、本発明の情報提供装置は、記憶手段を備えており、記憶手段には、対象者の行動の対象となる対象空間における構造物の位置及び構造を表す構造情報が記憶されている。構造情報には、建造物の構造そのものをＣＡＤデータなどのデジタルデータで表現したものや、建造物の位置について色や濃度のデジタルデータで表現したもの、絵画などの画像のデジタルデータがある。従って、構造情報によって、対象空間内における構造物の任意の部位を特定することができる。
【００１１】
前記検出手段で検出した特徴情報は、行動の特徴であるので、その時点における対象者が注目した状態に寄与していると考えられる。そこで、導出手段は、特徴情報から、対象者が注目した度合いを表す注目度を導出する。まず、導出手段では、対象空間を単位空間で分割しかつ構造情報による構造物が分割した単位空間の何れに該当するかの対応関係を求める。これにより、構造物を単位空間毎に把握することができる。そして、導出手段は、特徴情報に基づいて対象者が注目する位置に対応する単位空間を導出する。これと共に導出手段は、導出された単位空間について対象者が注目した度合いを表す注目度を導出する。すなわち、行動の特徴である特徴情報は、建造物に向かう対象者の方向や指示位置などを含んでおり、単位空間で分割された対象空間内の対象となる単位空間を導出できる。このとき、特徴情報による建造物に向かう対象者の方向や指示位置などについて単位空間に対してなされた回数や時間から、単位空間に対する対象者の注目した度合いを表す注目度を導出できる。
【００１２】
蓄積手段は、導出手段で導出した単位空間と注目度とを対応づけて蓄積する。従って、蓄積手段には、単位空間毎に注目度が蓄積され、対象者による注目度が対象空間内において蓄積される。特定手段は、蓄積手段で蓄積した単位空間と注目度とに基づいて、対象者が注目した注目部位を特定する。すなわち、蓄積手段には、対象空間内における対象者による注目度が蓄積されるので、その蓄積量が大きいものほど注目の度合いが大きいと推定できる。この場合、蓄積量が最も大きい単位空間が対象者が注目した空間であると推定でき、この単位空間に対応する構造物を特定すれば、対象者が注目した注目部位を特定することができる。そして、提示手段によって、特定手段により特定した注目部位を提示することで、対象者の注目した部位を、第三者が把握するための情報として提示することができる。
【００１３】
前記検出手段は、対象者を撮影した撮影画像情報を特徴情報として検出する映像検出手段と、複数の対象者が発する音声情報を特徴情報として検出する音声検出手段と、から構成されることを特徴とする。
【００１４】
対象者の行動に表れる特徴情報を検出するには、対象者を捉えた映像や音声から把握することが好適である。そこで、前記検出手段は、対象者を撮影した撮影画像情報を特徴情報として検出する映像検出手段と、対象者が発する音声情報を特徴情報として検出する音声検出手段と、から構成することが好ましい。
【００１５】
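The video detection means can capture the subject's behavior as video, and the audio detection means can capture the sounds the subject makes as audio information. This makes it easy to turn the features that appear in the subject's behavior into information.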
【００１６】
前記検出手段は、対象者の視線に関係する視線情報を検出し、前記導出手段は、前記対象者の視線情報から前記単位空間及び注目度を導出することを特徴とする。
【００１７】
対象者が物を注目する場合、対象者が手振りによって指し示す場合もあるが、対象者が目視によって注視する場合が多くある。そこで、対象者の視線に関係する視線情報を検出し、導出手段は、対象者の視線情報から単位空間及び注目度を導出する。すなわち、視線情報が検出できれば、その視線と構造情報による構造物との交点が検出できるので、単位空間を特定することも容易となる。これによって、対象者が注目した注目部位を容易に特定することができる。
【００１８】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態の一例を詳細に説明する。本実施の形態では、展覧会等の催事場で観覧する観覧者に対して、その行動（視線運動、音声変化、身体運動など）を把握し分析し、その催事場の動向把握を支援するために、観覧者が注目する部位についての情報を提供する場合に、本発明を適用したものである。
【００１９】
〔構成〕
図１には、本発明が適用可能な催事場などの動向把握を支援するための支援装置９０の概略構成が示されている。図１に示したように、支援装置９０は、認識装置１０と評価装置７０とから構成されている。
【００２０】
認識装置１０は、ＣＰＵ１２，ＲＯＭ１４，ＲＡＭ１６が入出力ポート（以下、Ｉ／Ｏという）１８に接続され、コマンドやデータが授受可能なコンピュータ構成とされている。このＩ／Ｏ１８には、各種データやプログラムが格納されたメモリ２０が接続されている。
【００２１】
また、Ｉ／Ｏ１８には、データ入力のためのキーボード２８、コマンドやデータを表示するためのディスプレイ（以下、ＣＲＴという）３０、及びコマンドやデータを印刷するためのプリンタ２６が接続されている。さらに、Ｉ／Ｏ１８には、観覧者を撮影するカメラなどの撮影装置２２、及び観覧者が発する音声を入力するマイクロフォンなどの音声装置２４が接続されている。また、Ｉ／Ｏ１８には、評価装置７０との間で各種データを授受するための接続装置６８が接続されている。
【００２２】
評価装置７０は、ＣＰＵ７２，ＲＯＭ７４，ＲＡＭ７６が入出力ポート（以下、Ｉ／Ｏという）７８に接続され、コマンドやデータが授受可能なコンピュータ構成とされている。このＩ／Ｏ７８には、各種データやプログラムが格納されたメモリ８０が接続されている。
【００２３】
また、Ｉ／Ｏ７８には、データ出力のためのプリンタ８２、データ入力のためのキーボード８４、及びコマンドやデータを表示するためのＣＲＴ８６が接続されている。さらに、Ｉ／Ｏ７８には、認識装置１０との間で各種データを授受するための接続装置８８が接続されている。
【００２４】
〔機能ブロック〕
本実施の形態では、催事場の動向把握を支援するために、認識装置１０は、複数の認識処理を実施する。評価装置７０は、認識装置１０により得られたデータにより催事場の動向把握のための評価処理を実施する。
【００２５】
認識装置１０における認識処理の概要は、センサ（撮影装置２２、音声装置２４）からのデータを用いて全ての対象者の行動を把握し、次に、指定した対象者について、行動の特徴を把握し、対象者の感情要素を推定し、対象者の心理状態を推定する。この場合、心理状態を推定した対象者に関係する他の対象者との関わりも要素となる。なお、上記推定を、対象者全てについて実行することにより、相互関係を含めた各対象者の心理状態を推定できるので、より精密な催事場の動向把握のために必要なデータ提供が可能となる。
【００２６】
評価装置７０における評価処理の概要は、認識装置１０からのデータを用いて対象者の行動の特徴を把握し、対象者が注目する部位を導出して、その注目部位に該当する構造物の部位を注目部位と推定する。なお、対象者に関係する他の対象者との関わりを要素とすることで、相互関係を含めた注目部位を推定できるので、より精密な催事場の動向把握が可能となる。
【００２７】
図２には、催事場の動向把握を支援するための各種データを得るための認識装置１０における、認識処理に関する機能的なブロック図を示した。認識装置１０は、その機能部分として、行動把握部３２と、心理状態認識部３４とに分類される。なお、以下に説明する各機能部は、認識装置１０のメモリ２０に格納された各機能部の各処理として機能する処理プログラムが、ＣＰＵ１２、ＲＯＭ１４、ＲＡＭ１６、Ｉ／Ｏ１８などの各資源を利用して処理することで構成される。
【００２８】
行動把握部３２は、撮影装置２２で撮影された対象者の画像データや音声装置２４で収集した対象者の音声データから、対象者の瞬時行動を把握するものであり、心理状態認識部３４は、行動把握部３２で把握された対象者の瞬時行動から対象者の心理状態を推定するものである。
【００２９】
行動把握部３２では、対象者の瞬時行動把握として、視線、瞬目、身体動作、音声、の４種類の瞬時行動に分類して把握する。これらの瞬時行動を把握するため、行動把握部３２は、視線把握部３６、瞬目把握部３８、身体動作把握部４０、音声把握部４２から構成されており、各々の把握部からデータを得ることができる。
【００３０】
まず、撮影装置２２で撮影された対象者の画像データを認識処理（画像認識）する把握部は、視線把握部３６、瞬目把握部３８及び身体動作把握部４０が対応する。また、音声装置２４で収集した対象者の音声データを認識処理（音声認識）する把握部は、音声把握部４２が対応する。各々の把握部で把握する項目は、次の表１に示した。
【００３１】
【表１】

【００３２】
視線把握部３６は、撮影画像を画像処理することで対象者の眼球の位置及び瞳の位置から視線方向、及び所定時間内における動きを把握データとして得る。瞬目把握部３８は、撮影画像（撮影装置２２で得た画像データ）を画像処理することで対象者の眼球の位置及び瞼の挙動から瞬目頻度、及び瞬目の分布（群発性）を把握データとして得る。
【００３３】
また、身体動作把握部４０は、撮影画像を画像処理することで対象者の肢体として頭部、顔、手腕、上体、及び姿勢を認識すると共に、各認識について、肢体の位置、方向、及び所定時間内における動きを把握データとして得る。なお、この身体動作把握部４０では、その場における対象者の位置、他の対象者との相対関係をさらに把握し、これを把握データに含めている。
【００３４】
音声把握部４２は、取得音声（音声装置２４で得た音声データ）を音声処理することで対象者の会話を抽出し、声量、トーン、速度、連続性の把握データを得る。また、取得音声には、周囲環境音が含まれる場合があるので、これも環境音として把握データとすることができる。
【００３５】
心理状態認識部３４では、行動把握部３２で得られた把握データを用いて、対象者の心理状態を推定する。心理状態認識部３４は、行動認識部４４、行動状態導出部４６、状態変化推定部４８から構成される。
【００３６】
行動認識部４４は、対象者の行動についてその特徴を把握する処理部であり、行動把握部３２からの把握データが入力されるように構成される。この行動認識部４４で把握された行動の特徴は、特徴データ５０として心理状態認識部３４から出力されるように構成されると共に、特徴データ５０は行動状態導出部４６へ出力されるように構成される。
【００３７】
この行動認識部４４では、入力された把握データから対象者の特徴を把握処理するものであり、把握データに含まれる、視線、瞬目、身体動作、及び音声を用いて把握する。把握する特徴項目は、次の表２に示した。
【００３８】
【表２】

【００３９】
行動認識部４４で把握する特徴項目は、対象者について単独で抽出することが可能な項目と、他の対象者との相互関係から抽出可能な項目とに分類できる。単独で抽出可能な項目は、視線については、視線の向きや動き、視線と対象者の顔の向きのずれがある。瞬目については、瞬目頻度や瞬目群発性がある。身体動作については、頭部の上下運動や左右運動、顔と体の向きのずれ、手腕の動き、上体の向き、上体の動き、前傾、後傾などがある。音声については、声量、トーン、速度などがある。
【００４０】
他の対象者との相互関係から抽出可能な項目には、視線については視線の交差があり、身体動作については姿勢の正対、姿勢の一致、注視対象の一致、身振り・手振りの同期・同調がある。また、音声については、会話の連続性がある。これらの特徴は、特徴データ５０として出力されるが、項目毎の大きさや頻度を係数として出力することもできる。
【００４１】
なお、複数の対象者の内の対象者を特定するときには、複数の把握データを必要とする場合がある。例えば、身体動作把握部４０において、その場における対象者の位置、他の対象者との相対関係をさらに把握した場合、音声が対象者に該当するか否かは、音声と同期した口や手腕などの肢体の動きの有無から判別でき、対象者の他の対象者に対する向きは、視線方向や身体の向きなどから判別できる。すなわち、１つの把握データから判別が困難な場合には、他の把握データを用いて判別することができる。
【００４２】
本実施の形態では、対象者の注目部位を得るため、行動の特徴として視線を採用した一例を説明する。
【００４３】
認識装置１０では、対象者の顔の特徴として、両眼の中央位置の座標（ｘ、ｙ、ｚ）を求める。この３次元の各方向Ｘ，Ｙ，Ｚによる座標（ｘ、ｙ、ｚ）は、撮影装置２２を基準として数値化することができる。この場合、撮影装置２２は、２台のカメラを所定間隔でかつ同一撮影方向に設定することが好ましい。方向Ｘは撮影方向に向かって右方向を正符号として撮影装置２２の配置についての左右方向に設定する。方向Ｙは、撮影方向に向かって上方向を正符号として撮影装置２２の配置についての上下方向に設定する。方向Ｚは、撮影方向に向かって撮影方向を負符号として撮影装置２２の配置についての前後方向に設定する。従って、撮影装置２２の２台のカメラの中央位置が原点（０，０，０）になる。
【００４４】
また、他の方向として、各方向Ｘ，Ｙ，Ｚの各軸回りの回転を、回転方向φ、θ、ψと定める。上記規定により、対象者の視線方向は、眼球の黒目の位置から顔の向きに対して、Ｘ軸回りの回転方向φの値をφｅ、Ｙ軸回りの回転方向θの値をθｅ、Ｚ軸回りの回転方向ψの値をψｅと設定する。
【００４５】
以上の設定により、視線を、ベクトル始点（ｘ、ｙ、ｚ）、ベクトル方向（φ＋φｅ、θ＋θｅ、ψ＋ψｅ）というベクトルで表現できる。
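The gaze representation above (start point plus summed face and eye angles) can be illustrated with a minimal sketch. The source does not state the rotation order or the zero-angle direction, so the convention below is an assumption: with all angles zero the subject looks along +Z (back toward the cameras), φ pitches about X, and θ yaws about Y. The function name `gaze_direction` is hypothetical.

```python
import math

def gaze_direction(phi, theta, phi_e=0.0, theta_e=0.0):
    """Unit gaze direction from face angles plus eye offsets (radians).

    Assumed convention (not given in the source): the zero direction is +Z,
    phi rotates about X, theta rotates about Y, applied as R_y(theta) R_x(phi).
    Roll about Z (psi) does not change the direction of the ray itself.
    """
    pitch = phi + phi_e      # X-axis rotation: face angle + eye offset
    yaw = theta + theta_e    # Y-axis rotation: face angle + eye offset
    x = math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)
```

Together with the start point (x, y, z) at the midpoint between the eyes, this direction defines the gaze ray used later to find the unit space the subject is looking at.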
【００４６】
行動状態導出部４６は、対象者の感情要素すなわち内部状態を導出する処理部であり、特徴データ５０に基づいて導出された対象者の内部状態である行動状態は、状態データ５２として心理状態認識部３４から出力されるように構成されると共に、状態データ５２は状態変化推定部４８へ出力されるように構成される。
【００４７】
この行動状態導出部４６では、入力された特徴データ５０から対象者の内部状態を導出処理するものであり、予め定めた行動の特徴と感情要素との関係から求めることができる。この行動の特徴と感情要素の関係は、次の表３乃至表５に示した。本実施の形態では、Ｗｕｎｄｔの感情３次元説に基づき、感情要素として、快適さ（快適から不快）、興奮さ（興奮から沈静）、緊張さ（緊張から弛緩）の３種類を採用する。
【００４８】
【表３】

【００４９】
【表４】

【００５０】
【表５】

【００５１】
行動状態導出部４６は、快適さの状態データ５２を得るために、特徴データ５０のうちの視線及び身体動作の該当データを用いて快適さを数値化する。
【００５２】
表３に示す快適と不快の例では、視線の向きについては、他の対象者（例えば会話の対象者）に向き合う方向になるに従って快適となるデータ、避ける方向になるに従って不快となるデータとする。同様に、視線の顔方向のずれについて、他の対象者に向き合う方向になるに従って快適となるデータ、避ける方向になるに従って不快となるデータとする。
【００５３】
また、身体動作のうちの上体の向きについては、他の対象者（例えば会話の対象者）に向き合う方向になるに従って快適となるデータ、さける方向になるに従って不快となるデータとする。同様に、前後方向の傾きについては、前傾方向になるに従って快適となるデータ、後傾方向になるに従って不快となるデータとし、上体の顔方向のずれについては、一致するに従って快適となるデータ、ずれる方向になるに従って不快となるデータとする。
【００５４】
さらに、身体動作のうちの頭部の上下方向については、動きが大きくなるに従って快適となるデータとする。なお、頭部の左右方向については、動きが大きくなるに従って不快となるデータとすることもできる。
【００５５】
表４に示す興奮と沈静の例では、視線の動きについては、動かなくなるに従って興奮となるデータ、揺れるに従って沈静となるデータとする。同様に、瞬目の頻度について、低下するに従って興奮となるデータ、群発するに従って沈静となるデータとする。
【００５６】
また、身体動作のうちの上体の動きについては、揺れるに従って興奮となるデータとし、手腕の動きについては、動きが大きくなるに従って興奮となるデータ、動きが小さくなるに従って沈静となるデータとする。
【００５７】
さらに、音声のうちの音量については、大きくなるに従って興奮となるデータ、小さくなるに従って沈静となるデータとし、速度については、速くなるに従って興奮となるデータ、遅くなるに従って沈静となるデータとし、トーンについては、高くなるに従って興奮となるデータ、低くなるに従って沈静となるデータとする。
【００５８】
表５に示す緊張と弛緩の例では、視線の動きについては、不安定になるに従って緊張となるデータとする。同様に、瞬目の頻度について、増加するに従って緊張となるデータ、減少するに従って弛緩となるデータとする。また、身体動作のうちの手腕の動きについては、動かなくなるに従って弛緩となるデータとする。
【００５９】
上記状態データ５２としてデータを得るには、言語値を数値化するファジイ推論、ニューラルネットワークシステムや遺伝的アルゴリズムを用いて数値化することが可能である。
【００６０】
状態変化推定部４８は、対象者の心理状態を推定する処理部であり、状態データ５２に基づいて推定された対象者の状態変化は、心理データ５４として心理状態認識部３４から出力されるように構成される。状態変化推定部４８は、心理データ５４を得るために、３種類の状態データ５２を用いて心理状態を数値化する。この数値化には３次元ルックアップテーブルが用いられる。
【００６１】
図３には、本実施の形態で採用した３種類の感情要素である、快適さ（快適から不快）、興奮さ（興奮から沈静）、緊張さ（緊張から弛緩）の各々を軸とした心理状態空間６０を示した。心理状態空間６０には、予め定めた心理状態の領域が定められている。本実施の形態では、心理状態は、共感、嫌悪、興味、無関心、活性、抑圧の６種類を採用している。これらの共感、嫌悪、興味、無関心、活性、抑圧の６種類からなる心理状態を表すデータを心理データ５４とする。
【００６２】
すなわち、心理状態は、感情要素の分布で包括することができ、この包括された領域内のいずれに位置するかによって、心理状態を推定することが可能となる。そこで、多数の対象者に対して予め図３に示す心理状態空間６０内に状態データ５２の感情要素をプロットし、そのプロットの相関関係により領域を分類する。その分類した領域について心理状態が対応されるので、分類した領域を心理状態を表す領域として、その領域内において心理状態の大きさを定める。これによって、状態データ５２の感情要素に対応する心理データ５４の心理状態を求めることができ、対象者の心理状態を推定できる。
【００６３】
例えば、快適さ、興奮さ、緊張さの状態データ５２と、共感、嫌悪、興味、無関心、活性、抑圧の心理データ５４との対応関係を図４及び図５に示した。
【００６４】
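The example of FIG. 4 shows the correspondence between the comfort and excitement state data 52 and the empathy, disgust, interest, and indifference psychological data 54. In the two-dimensional comfort-excitement region, each of the four regions separated by the comfort axis and the excitement axis is assigned one of the psychological states empathy, disgust, interest, and indifference. That is, the region of high comfort and high excitement corresponds to empathy; the region of low comfort (high discomfort) and high excitement corresponds to disgust; the region of high comfort and low excitement (high calmness) corresponds to interest; and the region of low comfort (high discomfort) and low excitement (high calmness) corresponds to indifference.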
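The four-quadrant mapping of FIG. 4 can be sketched as a small lookup. This is only an illustration: the function name `classify_state` is hypothetical, and assigning points that lie exactly on an axis to the positive side is an assumption, since the source does not say how boundary cases are handled.

```python
def classify_state(comfort, excitement):
    """Map (comfort, excitement), each in [-1, 1], to a FIG. 4 quadrant label.

    Points exactly on an axis are treated as positive (an assumption).
    """
    if comfort >= 0:
        # Comfortable side: excited -> empathy, calm -> interest
        return "empathy" if excitement >= 0 else "interest"
    # Uncomfortable side: excited -> disgust, calm -> indifference
    return "disgust" if excitement >= 0 else "indifference"
```

For instance, the FIG. 6 point (comfort +0.4, excitement +0.3) falls in the empathy quadrant and the FIG. 7 point (comfort −0.4, excitement +0.3) falls in the disgust quadrant.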
【００６５】
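As a result, if the point P1 obtained by plotting the comfort and excitement state data 52 lies within a psychological-state region (here, the empathy region), the subject can be estimated to be in that psychological state, and the state can be quantified by a weight that depends on the position within the region. In the example of FIG. 4, the separated region for the psychological state (empathy) is approximately circular, and the weight can be set so that empathy is highest at the center of the region and weakens with distance from the center in the radial direction. The weight may also be given by a table or function determined from correlations obtained in advance from a large amount of data.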
【００６６】
また、図５の例では、快適さ及び緊張さの状態データ５２と、活性、抑圧の心理データ５４との対応関係を示した。快適さ及び緊張さの２次元領域について、快適さ及び緊張さの軸で分離された分離領域のうち、快適さが高くかつ緊張さが高い分離領域に心理状態（活性）が対応され、快適さが低く（不快性が高く）かつ緊張さが高い分離領域に心理状態（抑圧）が対応される。
【００６７】
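As a result, if the point P2 obtained by plotting the comfort and tension state data 52 lies within a psychological-state region (here, the activation region), the subject can be estimated to be in that psychological state, and the state can be quantified by a weight that depends on the position within the region. As in the example of FIG. 4, the separated region for the psychological state (activation) is approximately circular, and the weight can be set so that activation is highest at the center of the region and weakens with distance from the center in the radial direction. The weight may also be given by a table or function determined from correlations obtained in advance from a large amount of data.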
【００６８】
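This quantification of the psychological state is described in detail below. In the present embodiment, the quantification uses fuzzy inference, which converts linguistic values into numbers, a neural network system, or a genetic algorithm; the resulting values are described here as normalized to the range ±1.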
【００６９】
また、上記では、心理状態の領域としてほぼ円形領域で設定したが、数値化の場合には、これに限定されない。図６には、共感状態の定量化（数値化）の説明図を示し、図７には、嫌悪状態の定量化（数値化）の説明図を示した。
【００７０】
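Whether the subject is empathizing is judged by whether the psychological state is both "comfortable" and "excited", and is expressed as a number indicating the degree of the "empathy" state. That is, on the plane whose axes are the "comfort-discomfort" axis and the "excitement-calm" axis, the degree of the "empathy" state is expressed by the distance from the origin (hereinafter, the empathy state value), and the degree of the "disgust" state is likewise expressed by the distance from the origin (hereinafter, the disgust state value).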
【００７１】
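In the example of FIG. 6, comfort is +0.4 from the origin and excitement is +0.3 from the origin. The empathy state value is therefore 0.5 (= √(0.4^2 + 0.3^2)). Similarly, in the example of FIG. 7, comfort is −0.4 from the origin and excitement is +0.3 from the origin. The disgust state value is therefore 0.5 (= √((−0.4)^2 + 0.3^2)).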
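The distance-from-origin calculation in the FIG. 6 and FIG. 7 examples can be checked with a short sketch. The function name `state_value` is hypothetical; the inputs are assumed to be the comfort and excitement values normalized to ±1 as described above.

```python
import math

def state_value(comfort, excitement):
    """Distance from the origin on the comfort/excitement plane.

    Read as the empathy state value when the point is in the empathy
    quadrant, and as the disgust state value in the disgust quadrant.
    """
    return math.hypot(comfort, excitement)
```

Because each input is limited to ±1, the largest possible result is reached at a corner such as (1, 1), which gives √2.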
【００７２】
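Consequently, because each emotion element is normalized to ±1, the empathy state value and the disgust state value in the present embodiment are quantified within the range from a minimum of 0 to a maximum of √2 (see FIG. 8).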
【００７３】
以上説明した特徴データ５０、状態データ５２、及び心理データ５４は、プリンタ２６やＣＲＴ３０で参照可能に提示されると共に、接続装置６８が接続された評価装置７０の接続装置８８により、評価装置７０へ出力される。
【００７４】
上記構成による、催事場の動向把握のための対象者の行動評価用の認識データを得る認識装置１０の作動について説明する。認識装置１０では、図９に示す処理ルーチンが実行され、撮影装置２２及び音声装置２４により、複数の対象者に対する画像データ及び音声データが収集され、これをセンサデータとして検出する（ステップ１００）。
【００７５】
次に、上記検出したセンサデータにより、画像認識処理及び音声認識処理が実行される（ステップ１０２）。これらの画像認識処理及び音声認識処理は、上述の機能部である行動把握部３２で処理される処理に相当し、この処理で把握データが得られる。次に、把握データにより特定できる何れかの対象者を、自動的に一人またはオペレータのキーボード２８による入力値によって設定し（ステップ１０４）、設定した対象者について、行動の特徴を把握する（ステップ１０６）。この処理は、上述の機能部である行動認識部４４で処理される処理に相当し、対象者についての行動の特徴を表す特徴データ５０を求める。
【００７６】
次に、特徴データ５０から、感情要素を定量化する（ステップ１０８）。この定量化は、上述の機能部である行動状態導出部４６で処理される処理に相当し、対象者のその時点における行動状態に起因する対象者の内部状態を表す感情要素を数値化した状態データ５２を求める。次に、状態データ５２から、心理状態を定量化する（ステップ１１０）。この定量化は、上述の機能部である状態変化推定部４８で処理される処理に相当し、対象者のその時点における心理状態を分類しかつ分類した心理状態を数値化した心理データ５４を求める。
【００７７】
そして、全ての対象者について上述の処理が終了（ステップ１１２で肯定判断）するまで、上記処理を繰り返し、全ての対象者について心理状態の推定処理が終了すると、得られた特徴データ５０、状態データ５２、心理データ５４を接続装置６８を介して評価装置７０へ出力（ステップ１１４）したのちに本ルーチンを終了する。なお、得られた特徴データ５０、状態データ５２、心理データ５４をプリンタ２６やＣＲＴ３０へ出力することもできる。
【００７８】
このようにして、対象者の行動について映像や音声により収集したデータを用いて認識処理をした結果について、対象者の行動の特徴を抽出し、その行動の特徴から感情要素を求め、求めた感情要素から心理状態を推定しているので、対象者がどのような心理状態にあるのか、どのような心理状態でその場に立ち会っているのかなどのように、通常知り得ない心理状態を含むデータを提示できる。
【００７９】
図１０には、催事場の動向把握を支援するための各種データを得るための評価装置７０における、評価処理に関する機能的なブロック図を示した。評価装置７０は、認識装置１０からの特徴データ５０、状態データ５２、心理データ５４を用いて対象者の注目状態を評価するものである。
【００８０】
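The evaluation device 70 is divided, by function, into a space evaluation part 92, an attention state evaluation part 94, an attention part evaluation part 96, and a structure storage part 97. These functional parts evaluate the subject's behavior and yield target part data 98, which represents the part of a structure in the space to which the subject paid attention. The target part data 98 is presented for reference on the printer 82 or the CRT 86.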
【００８１】
なお、以下に説明する各機能部は、評価装置７０のメモリ８０に格納された各機能部の各処理として機能する処理プログラムが、ＣＰＵ７２、ＲＯＭ７４、ＲＡＭ７６、入出力ポート７８などの各資源を利用して処理することで構成される。
【００８２】
認識装置１０からのデータは、空間評価部９２に入力されるように構成され、この空間評価部９２には構造物記憶部９７からの構造データも入力されるように構成される。空間評価部９２の評価結果は、注目状態評価部９４へ出力されるように構成され、この注目状態評価部９４において単位空間と注目度（後述）が蓄積される。この蓄積結果は、注目部位評価部９６へ出力されるように構成され、注目部位評価部９６は評価結果の対象部位データを出力する。この注目部位評価部９６には構造物記憶部９７からの構造データも入力されるように構成される。
【００８３】
空間評価部９２は、認識装置１０で把握された対象者の特徴や内部状態及び心理状態から、対象者が注目した空間を評価する機能部である。この空間評価部９２では、空間を単位空間に分割して単位空間で評価すると共に、構造物記憶部９７に記憶されている空間内の構造物の構造データを用いて空間内にある構造物を考慮して評価される。この空間評価部９２では、その評価によって、対象者による注目位置に対応する単位空間とその単位空間の注目度が導出される。
【００８４】
注目状態評価部９４は、空間評価部９２の評価結果を用いて、対象者の注目状態を評価する機能部である。この注目状態評価部９４は、空間評価部９２で導出された単位空間及び注目度を対応づけて記憶することを繰り返す蓄積を行うことで注目状態を保持する。
【００８５】
注目部位評価部９６は、注目状態評価部９４の評価結果を用いて対象者の注目部位を評価する機能部である。注目部位評価部９６では、注目状態評価部９４で保持されている単位空間及び注目度、すなわち、対象者が注目した単位空間と注目度の履歴により、最も注目したと予測される単位空間から注目部位が導出される。この注目部位評価部９６では、構造物記憶部９７に記憶されている空間内の構造物の構造データを用いて空間内にある構造物を考慮して注目部位が導出される。
【００８６】
なお、認識装置１０及び評価装置７０からなるシステムは本発明の情報提供装置に相当する。また、認識装置１０は本発明の検出手段に相当し、空間評価部９２は本発明の導出手段に相当し、構造物記憶部９７は本発明の記憶手段に相当し、注目状態評価部９４は本発明の蓄積手段に相当し、注目部位評価部９６は本発明の特定手段に相当する。また、本発明の提示手段は、プリンタ８２やＣＲＴ８６が対応する。
【００８７】
〔実施形態の作用〕
次に、本実施形態の作用を説明する。本実施の形態では、認識装置１０において検出された対象者の特徴から、評価装置７０において対象者の注目部位を求める。そこで、評価装置７０の作動を中心に説明する。評価装置７０では、図１１に示す処理ルーチンが実行される。
【００８８】
図１１のステップ２０２では、認識装置１０の認識結果である認識データ（特徴データ５０、状態データ５２、心理データ５４）を検出する。すなわち、認識装置１０において検出された視線方向ベクトル（始点とベクトル方向）を検出すると共に、共感状態値や嫌悪状態値を認識データＪとして検出する。
【００８９】
次のステップ２０４では、上記検出した認識データＪにより、感情度Ｋを導出する。この感情度Ｋは、感情の度合いを表すもので、上述の共感の度合いを表す共感度や嫌悪の度合いを表す嫌悪度に相当する。すなわち、認識装置１０から出力される共感状態値や嫌悪状態値（認識データＪ）は、最大値が√２であるので、これを最大値「１」に規格化する処理に相当し、規格化後の数値を感情度Ｋ（共感度や嫌悪度）とする。具体的には、認識装置１０から出力される共感状態値や嫌悪状態値を所定数（√２）で除算する処理（Ｋ＝Ｊ／√２）を実行する。
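Step 204 above is a single normalization, K = J / √2, which maps the state value range [0, √2] onto [0, 1]. A minimal sketch follows; the function name `emotion_degree` is hypothetical.

```python
import math

def emotion_degree(state_value):
    """Normalize an empathy or disgust state value J to the emotion degree K.

    J is assumed to lie in [0, sqrt(2)], so K = J / sqrt(2) lies in [0, 1].
    """
    return state_value / math.sqrt(2.0)
```

With the FIG. 6 state value of 0.5, this gives K of roughly 0.35.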
【００９０】
次のステップ２０６では、対象者が注目した空間位置を導出する。この処理は、視線方向ベクトルと構造物の３次元空間データから、対象者の視線の先に存在する空間ブロック（単位空間）を導出するものである。
【００９１】
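As shown in FIG. 13, the three-dimensional space of a venue such as an exhibition, that is, the attention space 230 to which the subject can pay attention, is divided into cubes 232 (hereinafter, unit spaces) whose edge length is a predetermined value (for example, 10 cm). The extent of the attention space 230 can easily be determined from the structural drawings or design drawings of the venue. Exhibits and structural fixtures are installed in the attention space 230, and their positional relationships are stored.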
【００９２】
すなわち、本実施の形態では、注目空間２３０内に存在する全ての物品に対して、設置位置、寸法などの３次元空間データを、メモリ８０に保持しており、各物品が分割された単位空間に対応づけられる。
【００９３】
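For example, as shown in FIG. 13, the attention space 230 is divided into unit spaces 232, and exhibits and the like are placed within it. FIG. 13 shows an example in which an automobile 234 is placed as the article. The structural data of the automobile 234 is stored in the memory 80 together with its positional data, so that it is possible to know where in the attention space 230 each part of the automobile 234 is located. Accordingly, by determining which unit spaces 232 contain each part or structural position of the automobile 234, the structural positions of the automobile 234 can be specified in terms of the positions of unit spaces 232.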
【００９４】
従って、認識装置１０からの視線ベクトル情報（ベクトル始点、方向）と、上記３次元データから対象者が目視している単位空間２３２を特定できる。そして、その単位空間２３２内の物品（例えば、自動車２３４の部位）を特定することができる。
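One way to picture the lookup described above is to walk along the gaze ray and return the first unit space that contains part of an article. The sketch below is an illustration under stated assumptions, not the method of the source: the occupancy table `occupied`, the function names, the 20 m search limit, and the fixed-step sampling (which can clip past a voxel corner) are all hypothetical choices. Only the 10 cm cell size comes from the text.

```python
import math

CELL = 0.10  # unit-space edge length in metres (the 10 cm example in the text)

def voxel_of(point, cell=CELL):
    """Integer (i, j, k) index of the unit space containing a 3-D point."""
    return tuple(math.floor(c / cell) for c in point)

def first_hit(origin, direction, occupied, max_dist=20.0, step=CELL / 4):
    """Return (voxel index, part name) of the first occupied unit space.

    `occupied` maps voxel index -> part name (hypothetical structure data).
    Returns None when nothing is hit within max_dist.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    t = 0.0
    while t <= max_dist:
        point = [o + t * u for o, u in zip(origin, unit)]
        idx = voxel_of(point)
        if idx in occupied:
            return idx, occupied[idx]
        t += step
    return None
```

A call such as `first_hit((0.05, 0.55, 0.05), (1, 0, 0), {(10, 5, 0): "door mirror"})` walks along +X until it enters the occupied cell. An exact grid traversal would avoid the sampling error at the cost of more code.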
【００９５】
次のステップ２０８では、対象者が目視したときの視線の移動量を導出する。この処理は、任意の時刻（ｔ）における対象者が目視している単位空間２３２と、次に目視した単位空間２３２（所定時間｛ｔ＋△ｔ｝を経過したときに目視したとされる単位空間２３２）の距離を求めるものである。すなわち、任意の時刻（ｔ）における対象者が目視している単位空間２３２の中心位置座標を（ｘ０，ｙ０，ｚ０）として、次に目視した単位空間２３２（所定時間｛ｔ＋△ｔ｝を経過したときに目視したとされる単位空間２３２）の中心位置座標を（ｘ１，ｙ１，ｚ１）とすると、視線移動量Ｌは、次の式で表すことができる。
【００９６】
Ｌ＝√｛（ｘ１−ｘ０）^２＋（ｙ１−ｙ０）^２＋（ｚ１−ｚ０）^２｝
【００９７】
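In the next step 210, the degree to which the gaze dwelt on the unit space 232 viewed by the subject is obtained. As shown in FIG. 14, this gaze dwell degree M can be expressed as a function whose value decreases as the gaze shift L increases. For example, the gaze dwell degree M can be obtained by the following expression.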
【００９８】

但し、Ｌｍａｘは、予め設定した所定値である。
【００９９】
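The predetermined value Lmax can be obtained from the maximum gaze movement speed at which a relationship can still be assured between the psychological state ("empathy" or "disgust") derived at an arbitrary time (t) and the article in the line of sight at that moment.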
【０１００】
例えば、１秒間隔で視線をサンプリングした場合、視線が所定速度（例えば、３ｍ／ｓ）以下で移動しているとき、心理状態と視線方向の物品との間に関連性を確保できることが想定される。すなわち、対象者が注視するという状態を視線の移動速度で考えると、統計的に所定値を導出でき、この移動速度を設定すれば、上記の関連性の判別に用いることができる。一方、対象者が目視対象の物品を変更する場合、速い移動速度で視線を変更する。
【０１０１】
Accordingly, if the predetermined speed is, for example, 3 m/s, the predetermined value Lmax is 3 m (= 3 [m/s] × 1 [s]).
【０１０２】
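Using the above expression, when the gaze shift L = 0 (the same unit space 232 is viewed throughout the predetermined time), the gaze dwell degree M is 1; M decreases as L approaches the predetermined value Lmax, and is 0 whenever L exceeds Lmax.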
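The gaze shift L and the gaze dwell degree M can be sketched as follows. The expression for M is not reproduced in this text, so the linear fall-off used here is an assumption chosen only because it matches the stated behavior (M = 1 at L = 0, decreasing toward Lmax, 0 beyond Lmax); the actual function in FIG. 14 may differ. The function names are hypothetical, and the default Lmax of 3 m is the worked example above.

```python
import math

def gaze_shift(c0, c1):
    """Euclidean distance L between two unit-space centres (x, y, z)."""
    return math.dist(c0, c1)

def dwell_degree(shift, l_max=3.0):
    """Gaze dwell degree M for a gaze shift L.

    Assumed form: M = 1 - L / Lmax for L < Lmax, else 0 (linear fall-off).
    """
    if shift >= l_max:
        return 0.0
    return 1.0 - shift / l_max
```

Under this assumed form, a shift of 1.5 m within one sampling interval gives M = 0.5.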
【０１０３】
次に、ステップ２１２では、感情状態の瞬時値を導出する。この処理は、共感や嫌悪の度合いについて瞬間的な深さを求めるものである。感情状態は、推移する。また、その感情推移の間、視線は滞留したり大きく変更したりする。そこで、本実施の形態では、感情度Ｋと視線滞留度Ｍとの関係から感情状態の瞬時値Ｎを求めている。瞬時値Ｎは、次の式によって求めることができる。
【０１０４】
Ｎ＝Ｋ×Ｍ
但し、Ｋは共感度または嫌悪度であり、Ｋが共感度の場合には瞬時値Ｎは、共感の深さを表す瞬時値Ｎとなり、Ｋが嫌悪度の場合には瞬時値Ｎは、嫌悪の深さを表す瞬時値Ｎとなる。
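As a small illustration of N = K × M, the sketch below multiplies the emotion degree by the gaze dwell degree. The function name `instantaneous_value` and the range check are additions for illustration; the source only gives the product.

```python
def instantaneous_value(k, m):
    """Instantaneous emotional-state value N = K x M.

    k: emotion degree K (empathy or disgust degree), expected in [0, 1].
    m: gaze dwell degree M, expected in [0, 1].
    """
    if not (0.0 <= k <= 1.0 and 0.0 <= m <= 1.0):
        raise ValueError("K and M are expected to lie in [0, 1]")
    return k * m
```

A strong emotion with a fast-moving gaze (M near 0) therefore contributes little, while the same emotion with a steady gaze (M near 1) contributes almost its full degree.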
【０１０５】
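In the next step 214, the instantaneous value N derived in step 212 is accumulated for the corresponding unit space 232. Because the value N obtained in step 212 is a momentary value at an arbitrary time, accumulating it makes it possible to grasp, as a cumulative value, how much the emotional state has varied with respect to a unit space 232, that is, how much attention that unit space has received. The instantaneous value N is therefore accumulated as data of the corresponding unit space 232.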
【０１０６】
図１５には、説明を簡単にするため、注目空間２３０を２次元平面に投影した図を示した。図１５（Ａ）に示すように、自動車２３４を目視する対象者の視線移動による滞留範囲２３６が、ドアミラー周辺であるとき、図１５（Ｂ）に示すように、所定時間（ｔ）までの累積値は、中心が９．２であり、その周辺が上部から時計回りに８．５，７．７，７．２，８．９となっている。所定時間経過後（ｔ＋△ｔ）の瞬時値Ｎは、図１５（Ｃ）に示すように、中心が０．８であり、その周辺が上部から時計回りに０．２，０．５，０．６，０．７となっている。これにより、累積値は、中心が１０．０であり、その周辺が上部から時計回りに８．７，８．２，７．８，９．６となる。
【０１０７】
次のステップ２１６では、累積した瞬時値Ｎから注目部位を導出する。すなわち、メモリ８０には、単位空間２３２毎に、瞬時値Ｎが累積されている。これは、所定時間内に対象者が注目空間２３０を注目したことが想定される単位空間２３２毎の分布である。従って、その累積値が最大のものが、対象者が注目した部位を含む単位空間２３２である確度が高い。この単位空間２３２には、自動車２３４など注目空間２３０内に設置された物品の部位が対応されている。このため、累積した瞬時値Ｎが最大である単位空間２３２を求め、求めた単位空間２３２に対応する物品の部位を注目部位として特定する。図１５の例では、累積値が１０．０である単位空間２３２であり、その単位空間２３２に対応するドアミラーに注目されていると特定する。
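The accumulation of step 214 and the selection of step 216 can be sketched with the FIG. 15 numbers. The cell labels ("center", "top", and so on) and the function names are hypothetical stand-ins for unit-space indices; the assignment of the four peripheral values to top, right, bottom, and left follows the "clockwise from the top" order in the text.

```python
from collections import defaultdict

def accumulate(totals, samples):
    """Add each instantaneous value N to the running total of its unit space."""
    for cell, n in samples.items():
        totals[cell] += n
    return totals

def most_attended(totals):
    """Return the unit space with the largest cumulative value."""
    return max(totals, key=totals.get)

# Cumulative values up to time t (FIG. 15(B)) and the new sample (FIG. 15(C)).
totals = defaultdict(float, {"center": 9.2, "top": 8.5, "right": 7.7,
                             "bottom": 7.2, "left": 8.9})
sample = {"center": 0.8, "top": 0.2, "right": 0.5, "bottom": 0.6, "left": 0.7}
accumulate(totals, sample)
target = most_attended(totals)  # the cell mapped to the door mirror in FIG. 15
```

The part of the article stored for the selected unit space is then reported as the attention part.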
【０１０８】
この特定された注目部位は、その感情状態を含んでいる。すなわち、例えば、共感の深さが高い場合には、関心を有しつつ注目していると特定できる。一方、嫌悪の深さが高い場合には、いやがる気持ちを有しつつ注目していると特定できる。
【０１０９】
次のステップ２１８では、リアルタイム評価か否かを判断し、肯定されると、ステップ２０２へ戻り上記処理を繰り返し実行する。リアルタイム評価とは、実時間継続処理であり、予め定めた条件になるまで肯定判断される。このリアルタイム評価で否定判断される条件の一例は、所定時間を経過した、累積値が所定値を超えた、などの評価を中断する条件がある。
【０１１０】
ステップ２１８で否定されると、ステップ２２０へ進み、特定した注目部位をプリンタ８２やＣＲＴ８６へ出力して本ルーチンを終了する。
【０１１１】
このようにして、対象者の行動について映像や音声により収集したデータを用いて認識処理をした結果について、対象者の行動の特徴、すなわち行動の特徴と心理状態から、対象者の注目部位を推定しているので、対象者がどのような心理状態で物品を注目しているのか、どの部位をどのような心理状態で注目しているのかなど、通常知り得ない３次元空間内における心理状態を含む注目部位を提示できる。
【０１１２】
本実施の形態の認識装置１０及び評価装置７０によるシステムは、モデルルームにおける顧客の関心や注目に対する評価、店舗における顧客の商品に対する関心評価、ショールームにおける関心度が高い物品の評価、博物館や美術館などの場における関心度が高い物品の評価に好適に用いることができる。
【０１１３】
なお、上記実施の形態では、自動車２３４などの物品の構造として外形などの構造を目視する場合に好適であるが、物品の外形目視に限定されるものではない。例えば、ショウウィンドウのようにガラスなどの透明媒体を介して目視する場合にも適用可能である。この場合、透明媒体の構造データとして、透明度を表す属性を付与することで、目視可能なことを表す状態を数値化して表現でき、透明媒体の先に設置された物品の評価すなわち注目状態を評価することができる。
【０１１４】
また、上記実施の形態では、自動車２３４などの固定的な物品を目視する場合に好適であるが、固定的に設置された物品の目視に限定されるものではない。例えば、アミューズメントなどのように物品が移動したり、移動する演者などの人物を目視する場合にも適用可能である。この場合、移動する物品や人物について時刻歴を構造データに属性として付与することで、移動を伴う注目対象を数値化して表現でき、移動される物品や人物の評価すなわち注目状態を評価することができる。
【０１１５】
また、上記認識装置１０では、複数の対象者から対象者を設定し、順次設定を変更することで全ての対象者についての特徴を得る場合を説明したが、全ての対象者について特徴を得ることに限定されない。例えば、予め定めた対象者のみのデータを出力するようにしてもよい。
【０１１６】
また、上記では単一の場におけるデータ出力を説明したが、複数の議題を複数のコミュニティで行っている場などへも、容易に適用することができる。例えば、対象者の視線方向で到達する対象者の存在を図形的に求めるなどのように、議題のグループをコミュニティとして場を分類し、分類されたコミュニティ毎に処理を実行することによって達成することができる。
【０１１７】
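In the above, the psychological state is estimated from data captured or recorded by the photographing device 22 and the audio device 24. Alternatively, data captured or recorded in advance by the photographing device 22 and the audio device 24 may be stored on a storage medium (for example, magnetic tape or an optical disc), and the psychological state may be estimated using the data stored on that medium.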
【０１１８】
本実施の形態の処理ルーチン及び各機能部の処理は、記録媒体としての磁気ディスクメディアに格納して流通させることが可能である。この場合、図示しない磁気ディスクメディアリードライト装置を備えることによって、処理ルーチン等は、図示しない磁気ディスクメディアリードライト装置を用いて磁気ディスクメディアに対して読み書き可能である。従って、予め磁気ディスクメディアに処理ルーチン等を記録しておき、図示しない磁気ディスクメディアリードライト装置を介して磁気ディスクメディアに記録された処理プログラムを実行してもよい。また、コンピュータにハードディスク装置等の大容量記憶装置（図示省略）を接続し、磁気ディスクメディアに記録された処理プログラムを大容量記憶装置（図示省略）へ格納（インストール）して実行するようにしてもよい。また、記録媒体としては、ＣＤ−ＲＯＭ，ＭＤ，ＭＯ，ＤＶＤ等のディスクやＤＡＴ等の磁気テープがあり、これらを用いるときには、上記図示しない磁気ディスクメディアリードライト装置に代えてまたはさらにＣＤ−ＲＯＭ装置、ＭＤ装置、ＭＯ装置、ＤＶＤ装置、ＤＡＴ装置等を用いればよい。
【０１１９】
【発明の効果】
以上説明したように本発明によれば、催事場や展覧会などの場で、検出された対象者の行動について特徴を表す特徴情報と記憶手段に記憶された構造物の構造情報とから、導出手段により対象者が注目した度合いを表す注目度を導出して蓄積して、特定手段により対象者が注目した注目部位を特定するので、容易に対象者が注目した注目部位を特定することができる、という効果がある。
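[Brief description of the drawings]
[FIG. 1] A block diagram showing the schematic configuration of the support device according to an embodiment of the present invention.
[FIG. 2] A functional conceptual diagram for explaining the functions of the recognition device according to the embodiment of the present invention.
[FIG. 3] A conceptual image of the psychological state space whose axes are the emotion elements used in the recognition device.
[FIG. 4] An explanatory diagram of the correspondence between the comfort and excitement state data and the empathy, disgust, interest, and indifference psychological data.
[FIG. 5] An explanatory diagram of the correspondence between the comfort and tension state data and the activation and suppression psychological data.
[FIG. 6] An explanatory diagram of the quantification of empathy.
[FIG. 7] An explanatory diagram of the quantification of disgust.
[FIG. 8] An explanatory diagram of the maximum of the empathy state value.
[FIG. 9] A flowchart showing the flow of processing executed in the recognition device.
[FIG. 10] A functional conceptual diagram for explaining the functions of the evaluation device according to the embodiment of the present invention.
[FIG. 11] A flowchart showing the flow of processing executed in the evaluation device.
[FIG. 12] A conceptual image of the attention space evaluated by the evaluation device.
[FIG. 13] A conceptual image of the attention space including an article.
[FIG. 14] An explanatory diagram of the gaze dwell degree.
[FIG. 15] An explanatory diagram of the accumulation of instantaneous values, in which (A) shows the dwell range, (B) shows the cumulative values, and (C) shows the instantaneous values.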
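[Description of reference numerals]
10 … recognition device
50 … feature data
52 … state data
54 … psychological data
70 … evaluation device
90 … support device
92 … space evaluation part
94 … attention state evaluation part
96 … attention part evaluation part
97 … structure storage part
98 … target part data
230 … attention space
232 … unit space
236 … dwell range

[0001]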
[Technical field of the invention]
The present invention relates to an information providing device, and more particularly to an information providing device that provides information on what a subject takes an interest in within a three-dimensional space.
[0002]
[Prior art]
At venues such as event halls and exhibitions, knowing what visitors pay attention to is important both for the organizers and for understanding trends at the venue.
[0003]
Therefore, there is known a system that determines whether or not attention has been paid to an arbitrary product or the like based on image data captured by a television camera and measures the degree of attention (for example, see Patent Document 1). In this technique, the position of a person, the direction of a face, and the like are detected from image data obtained by capturing a person with a television camera, and a degree of attention is obtained based on a staying time around an arbitrary object, the number of persons staying, and the like.
[0004]
[Patent Document 1]
JP-A-10-48008
[0005]
[Problems to be solved by the invention]
However, observing only the person is not sufficient to determine what the person is actually paying attention to. That is, the position to which the person is paying attention must be detected accurately in three dimensions, and this is difficult to do from the line-of-sight direction alone.
[0006]
The present invention has been made in view of the above circumstances, and its object is to provide an information providing device that provides information on a subject's attention in a target space that includes an arbitrary structure which is the target of the subject's behavior.
[0007]
[Means for Solving the Problems]
To achieve the above object, the information providing device of the present invention comprises: detection means for detecting feature information representing features of a subject's behavior; storage means for storing structure information representing the position and structure of a structure in a target space that is the target of the subject's behavior; derivation means for dividing the target space into unit spaces, obtaining the correspondence between the structure given by the structure information and the divided unit spaces, deriving, based on the feature information, the unit space corresponding to the position to which the subject pays attention, and deriving, for that unit space, a degree of attention representing how much attention the subject paid; accumulation means for accumulating the unit space and the degree of attention derived by the derivation means in association with each other; identification means for identifying, based on the unit spaces and degrees of attention accumulated by the accumulation means, the attention part to which the subject paid attention; and presentation means for presenting the attention part identified by the identification means.
[0008]
In the present invention, in order to grasp the part to which a subject such as a viewer or participant pays attention at a venue such as an event hall or exhibition, the detection means detects feature information representing features of the subject's behavior. This detection corresponds to a third party's observation, through video, audio, or the like, of the subject's gestures and emotionally expressive movements. The feature information represents features of the subject's behavior, which can be classified comprehensively in terms of the movements of the parts of the subject's body, the sounds the subject makes, and so on.
[0009]
For example, the subject's psychological state shows in the position, speed, and direction of movements of body parts such as the eyes, eyelids, head, and arms, as typified by gaze behavior and gestures. It also shows in the timing, speed, and loudness of breathing and speech. These movements of the subject's body parts and the sounds the subject makes can therefore be classified in advance and derived as feature information representing features of the subject's behavior.
[0010]
The information providing device of the present invention also includes storage means, which stores structure information representing the position and structure of a structure in the target space that is the target of the subject's behavior. The structure information may be digital data such as CAD data representing the structure itself, digital data representing the position of the structure in terms of color or density, or digital data of an image such as a painting. The structure information therefore makes it possible to identify any part of a structure in the target space.
[0011]
Because the feature information detected by the detection means describes features of behavior, it is considered to contribute to the subject's state of attention at that moment. The derivation means therefore derives, from the feature information, a degree of attention representing how much attention the subject paid. First, the derivation means divides the target space into unit spaces and obtains the correspondence between the structure given by the structure information and the divided unit spaces, so that the structure can be handled unit space by unit space. The derivation means then derives, based on the feature information, the unit space corresponding to the position to which the subject pays attention, and at the same time derives the degree of attention for that unit space. That is, the feature information includes the direction in which the subject faces the structure, the position the subject points to, and the like, from which the relevant unit space within the divided target space can be derived. The degree of attention for a unit space can then be derived from the number of times, or the length of time, that the subject's direction or pointed position fell on that unit space.
[0012]
The accumulation means accumulates the unit space and the degree of attention derived by the derivation means in association with each other. The degree of attention is thus accumulated for each unit space, building up the subject's attention across the target space. The identification means identifies the attention part to which the subject paid attention, based on the unit spaces and degrees of attention accumulated by the accumulation means. Because the accumulation means holds the subject's degree of attention within the target space, a larger accumulated amount can be taken to indicate greater attention. The unit space with the largest accumulated amount can be estimated to be the space the subject attended to, and identifying the structure corresponding to that unit space identifies the attention part. The presentation means then presents the attention part identified by the identification means, as information from which a third party can grasp the part to which the subject paid attention.
[0013]
The detection means is characterized by comprising video detection means for detecting, as feature information, captured image information of the subject, and audio detection means for detecting, as feature information, audio information produced by a plurality of subjects.
[0014]
To detect the feature information that appears in a subject's behavior, it is suitable to work from video and audio capturing the subject. The detection means therefore preferably comprises video detection means for detecting, as feature information, captured image information of the subject, and audio detection means for detecting, as feature information, audio information produced by the subject.
[0015]
The video detection means can capture the subject's behavior as video, and the audio detection means can capture the sounds the subject makes as audio information. This makes it easy to turn the features that appear in the subject's behavior into information.
[0016]
The detection means detects gaze information relating to the subject's line of sight, and the derivation means derives the unit space and the degree of attention from the subject's gaze information.
[0017]
When a subject pays attention to something, the subject may point to it with a hand gesture, but in many cases the subject simply gazes at it. Gaze information relating to the subject's line of sight is therefore detected, and the derivation means derives the unit space and the degree of attention from that gaze information. That is, once the gaze information is detected, the intersection of the line of sight with the structure given by the structure information can be found, which makes it easy to identify the unit space. The attention part to which the subject paid attention can thus be identified easily.
[0018]
[Embodiments of the invention]
An example of an embodiment of the present invention is described in detail below with reference to the drawings. In this embodiment, the present invention is applied to a case in which the behavior (eye movement, voice changes, body movement, and so on) of viewers at a venue such as an exhibition is grasped and analyzed, and information on the parts to which the viewers pay attention is provided in order to support an understanding of trends at the venue.
[0019]
[Configuration]
FIG. 1 shows a schematic configuration of a support device 90 for supporting the understanding of a trend at a venue or the like to which the present invention can be applied. As shown in FIG. 1, the support device 90 includes a recognition device 10 and an evaluation device 70.
[0020]
The recognition device 10 has a computer configuration in which a CPU 12, a ROM 14, and a RAM 16 are connected to an input / output port (hereinafter, referred to as I / O) 18 so that commands and data can be transmitted and received. The I / O 18 is connected to a memory 20 storing various data and programs.
[0021]
Also, a keyboard 28 for inputting data, a display (hereinafter referred to as CRT) 30 for displaying commands and data, and a printer 26 for printing commands and data are connected to the I / O 18. Further, the I / O 18 is connected to a photographing device 22 such as a camera for photographing the viewer, and an audio device 24 such as a microphone for inputting a voice uttered by the viewer. Further, a connection device 68 for transmitting and receiving various data to and from the evaluation device 70 is connected to the I / O 18.
[0022]
The evaluation device 70 has a computer configuration in which a CPU 72, a ROM 74, and a RAM 76 are connected to an input / output port (hereinafter, referred to as I / O) 78 and can transmit and receive commands and data. A memory 80 storing various data and programs is connected to the I / O 78.
[0023]
The I / O 78 is connected to a printer 82 for data output, a keyboard 84 for data input, and a CRT 86 for displaying commands and data. Further, a connection device 88 for exchanging various data with the recognition device 10 is connected to the I / O 78.
[0024]
[Function block]
In the present embodiment, the recognition device 10 performs a plurality of recognition processes in order to support the understanding of the trend of the event hall. The evaluation device 70 performs an evaluation process for grasping the trend of the event hall based on the data obtained by the recognition device 10.
[0025]
The outline of the recognition processing in the recognition device 10 is as follows. The behaviors of all subjects are grasped using data from the sensors (the photographing device 22 and the audio device 24), and the characteristics of the behavior are then grasped for a designated subject. Next, the emotional elements of that subject are estimated, and from them the psychological state of the subject is estimated. The relationship with other subjects related to the subject whose psychological state is estimated is also taken into account. By performing this estimation for all subjects, the psychological state of each subject, including mutual relationships, can be estimated, which provides the data needed to grasp the trend of the event hall more accurately.
[0026]
The outline of the evaluation processing in the evaluation device 70 is as follows. The characteristics of the behavior of the target person are grasped using the data from the recognition device 10, the part to which the target person pays attention is derived, and that part is estimated to be the attention site. By also using the relationship between the subject and other subjects as an element, the attention site can be estimated with mutual relationships included, so that the trend of the event hall can be grasped more precisely.
[0027]
FIG. 2 shows a functional block diagram related to recognition processing in the recognition device 10 for obtaining various data for supporting the understanding of the trend of the event hall. The recognition device 10 is divided into a behavior grasping unit 32 and a psychological state recognizing unit 34 as its functional parts. Each of the functional units described below is implemented by a processing program, stored in the memory 20 of the recognition device 10, that executes the processing of that unit using resources such as the CPU 12, the ROM 14, the RAM 16, and the I / O 18.
[0028]
The behavior grasping unit 32 grasps the instantaneous behavior of the subject from the image data of the subject photographed by the photographing device 22 and the voice data of the subject collected by the audio device 24. The psychological state recognizing unit 34 estimates the psychological state of the target person from the instantaneous behavior grasped by the behavior grasping unit 32.
[0029]
The behavior grasping unit 32 grasps the instantaneous behavior of the target person by classifying it into four types: line of sight, blinking, body movement, and voice. To this end, the behavior grasping unit 32 includes a gaze grasping unit 36, a blink grasping unit 38, a body movement grasping unit 40, and a voice grasping unit 42, and can obtain data from each of them.
[0030]
The gaze grasping unit 36, the blink grasping unit 38, and the body movement grasping unit 40 are grasping units that perform recognition processing (image recognition) on the image data of the target person photographed by the photographing device 22. The voice grasping unit 42 is a grasping unit that performs recognition processing (voice recognition) on the voice data of the target person collected by the audio device 24. The items grasped by each grasping unit are shown in Table 1 below.
[0031]
[Table 1]

[0032]
The gaze grasping unit 36 performs image processing on the captured image and obtains, from the position of the eyeball and the position of the pupil of the subject, the gaze direction and its movement within a predetermined time as grasp data. The blink grasping unit 38 performs image processing on the captured image (image data obtained by the photographing device 22) and obtains, from the position of the eyeball and the behavior of the eyelids of the subject, the blink frequency and the blink distribution (clustering) as grasp data.
[0033]
In addition, the body movement grasping unit 40 recognizes the head, face, hands, arms, upper body, and posture of the target person by performing image processing on the captured image, and obtains, for each recognized item, the position, the direction, and the movement within a predetermined time as grasp data. The body movement grasping unit 40 further grasps the position of the subject at that place and the relative relationship with other subjects, and includes this in the grasp data.
[0034]
The voice grasping unit 42 extracts the conversation of the target person by performing voice processing on the acquired voice (voice data obtained by the audio device 24), and obtains the volume, tone, speed, and continuity of the voice as grasp data. Since the acquired sound may include ambient environmental sound, this may also be obtained as grasp data in the form of environmental sound.
[0035]
The psychological state recognizing unit 34 estimates the psychological state of the subject using the grasp data obtained by the behavior grasping unit 32. The psychological state recognizing unit 34 includes a behavior recognition unit 44, a behavior state deriving unit 46, and a state change estimating unit 48.
[0036]
The behavior recognition unit 44 is a processing unit that grasps the characteristics of the behavior of the target person, and is configured to receive grasp data from the behavior grasping unit 32. The features of the behavior grasped by the behavior recognition unit 44 are output from the psychological state recognizing unit 34 as feature data 50, and the feature data 50 is also output to the behavior state deriving unit 46.
[0037]
The behavior recognition unit 44 grasps the characteristics of the target person from the input grasp data, using the line of sight, blinks, body movement, and voice included in that data. The characteristic items to be grasped are shown in Table 2 below.
[0038]
[Table 2]

[0039]
The characteristic items grasped by the behavior recognition unit 44 can be classified into items that can be extracted for the target person alone and items that can be extracted from the mutual relationship with other target persons. Items that can be extracted for the target person alone include, for the line of sight, the direction and movement of the line of sight and the difference between the line of sight and the direction of the face. For blinks, they include blink frequency and blink clustering. For body movement, they include vertical and left-right movement of the head, displacement between the face and the body, movement of the hands and arms, direction of the upper body, movement of the upper body, forward lean, backward lean, and the like. For voice, they include volume, tone, speed, and the like.
[0040]
Items that can be extracted from the mutual relationship with other subjects include, for the line of sight, intersection of gazes; for body movement, facing postures, matching postures, matching gaze targets, and synchronization of gestures and hand movements; and for voice, continuity of conversation. These features are output as feature data 50, and the magnitude and frequency of each item can also be output as coefficients.
[0041]
When identifying a target person among a plurality of target persons, a plurality of pieces of grasp data may be required. For example, when the body movement grasping unit 40 grasps the position of the subject at the place and the relative relationship with other subjects, whether a voice belongs to the subject can be determined from mouth or hand movement synchronized with the voice, and the orientation of the subject relative to another subject can be determined from the gaze direction, the body direction, and the like. That is, when a determination is difficult from one piece of grasp data, it can be made using other grasp data.
[0042]
In the present embodiment, an example will be described in which the line of sight is adopted as the behavioral feature used to obtain the attention site of a subject.
[0043]
The recognition device 10 obtains the coordinates (x, y, z) of the center position of both eyes as the feature of the face of the target person. The coordinates (x, y, z) in the three-dimensional directions X, Y, and Z can be quantified with reference to the photographing device 22. In this case, it is preferable that the photographing device 22 consists of two cameras set at a predetermined interval and facing the same photographing direction. The X direction is the left-right direction with respect to the arrangement of the photographing device 22, with rightward, as seen facing the photographing direction, being positive. The Y direction is the up-down direction with respect to the arrangement of the photographing device 22, with upward being positive. The Z direction is the front-back direction with respect to the arrangement of the photographing device 22, with the photographing direction being negative. Accordingly, the center position between the two cameras of the photographing device 22 is the origin (0, 0, 0).
[0044]
Further, rotations about the X, Y, and Z axes are defined as rotation directions φ, θ, and ψ, respectively. Under this definition, the direction of the subject's line of sight relative to the direction of the face, obtained from the position of the iris of the eyeball, is expressed as φe for the rotation direction φ about the X axis, θe for the rotation direction θ about the Y axis, and ψe for the rotation direction ψ about the Z axis.
[0045]
With the above setting, the line of sight can be represented by a vector whose start point is (x, y, z) and whose direction is (φ + φe, θ + θe, ψ + ψe), that is, the direction of the face (φ, θ, ψ) plus the offset of the eyes relative to the face.
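The gaze vector described above can be sketched as follows. This is only an illustration: the names `combine` and `direction` are hypothetical, angles are taken in radians, and the conversion from angles to a direction vector rests on assumptions the text does not state (zero angles point along +Z, toward the cameras; rotation about X is applied before rotation about Y; the roll ψ is applied first and therefore leaves the pointing direction unchanged).

```python
import math

def combine(face_angles, eye_offsets):
    """Add the eye offsets (phi_e, theta_e, psi_e) to the face angles (phi, theta, psi)."""
    return tuple(f + e for f, e in zip(face_angles, eye_offsets))

def direction(angles):
    """Unit direction of the gaze ray.

    ASSUMPTION (not in the source): zero angles mean looking along +Z,
    i.e. toward the cameras, and the rotation about X precedes the
    rotation about Y. Roll about Z does not move the ray itself.
    """
    phi, theta, _psi = angles
    return (math.cos(phi) * math.sin(theta),
            -math.sin(phi),
            math.cos(phi) * math.cos(theta))

# Start point (x, y, z): midpoint between the eyes, in camera coordinates.
start = (0.2, 1.5, -3.0)
gaze_angles = combine((0.0, 0.1, 0.0), (0.0, 0.05, 0.0))
gaze = (start, direction(gaze_angles))
```

Under a different rotation convention only `direction` would change; the start point and the summed angles follow the text directly.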
[0046]
The behavior state deriving unit 46 is a processing unit that derives the emotional elements of the subject, that is, the internal state. The behavior state, which is the internal state of the subject derived based on the feature data 50, is output from the psychological state recognizing unit 34 as state data 52, and the state data 52 is also output to the state change estimating unit 48.
[0047]
The behavior state deriving unit 46 derives the internal state of the subject from the input feature data 50, and can obtain the internal state of the subject from the relationship between predetermined behavior features and emotion elements. The relationship between the behavior features and the emotion elements is shown in Tables 3 to 5 below. In the present embodiment, based on Wundt's three-dimensional theory of emotion, three types of emotion elements are adopted: comfort (comfort to discomfort), excitement (excitement to calm), and tension (tension to relaxation).
[0048]
[Table 3]

[0049]
[Table 4]

[0050]
[Table 5]

[0051]
The behavior state deriving unit 46 quantifies the comfort using the pertinent data of the line of sight and the body movement in the characteristic data 50 in order to obtain the state data 52 of the comfort.
[0052]
In the example of comfort and discomfort shown in Table 3, the direction of the line of sight is treated as data indicating more comfort the more it is directed toward another subject (for example, the conversation partner) and more discomfort the more it avoids that subject. Similarly, the shift between the line of sight and the direction of the face indicates more comfort the more it is directed toward another subject and more discomfort the more it avoids that subject.
[0053]
In addition, the orientation of the upper body in the body movement is treated as data indicating more comfort the more it is directed toward another subject (for example, the conversation partner) and more discomfort the more it is turned away. Similarly, the inclination in the front-rear direction indicates more comfort the more the subject leans forward and more discomfort the more the subject leans backward. A displacement is likewise treated as data indicating more discomfort the larger the shift.
[0054]
Further, the vertical movement of the head in the body movement is treated as data indicating more comfort as the movement increases. The left-right movement of the head may be treated as data indicating more discomfort as the movement increases.
[0055]
In the example of excitement and calm shown in Table 4, the movement of the line of sight is treated as data indicating more excitement the more the gaze stops moving and more calm the more it moves about. Similarly, the frequency of blinks indicates more excitement as blinks decrease and more calm as blinks cluster.
[0056]
In addition, the movement of the upper body in the body movement indicates more excitement the more the body sways, and the movement of the hands and arms indicates more excitement as the movement becomes larger and more calm as it becomes smaller.
[0057]
Further, the volume of the voice indicates more excitement as the volume increases and more calm as it decreases. The speed indicates more excitement as the speed increases and more calm as it decreases. The tone indicates more excitement as the pitch rises and more calm as it falls.
[0058]
In the example of tension and relaxation shown in Table 5, the movement of the line of sight indicates more tension as it becomes unstable. Similarly, the frequency of blinks indicates more tension as blinks increase and more relaxation as they decrease. In addition, the movement of the hands and arms among the body movements indicates more relaxation as the movement stops.
[0059]
To obtain numerical values as the state data 52, fuzzy inference for converting a linguistic value into a numerical value, or quantification using a neural network system or a genetic algorithm, can be used.
[0060]
The state change estimating unit 48 is a processing unit that estimates the psychological state of the target person, and the psychological state estimated based on the state data 52 is output from the psychological state recognizing unit 34 as the psychological data 54. To obtain the psychological data 54, the state change estimating unit 48 quantifies the psychological state using the three types of state data 52. A three-dimensional lookup table is used for this quantification.
[0061]
FIG. 3 shows a psychological state space 60 whose axes are the three types of emotion elements employed in the present embodiment: comfort (comfort to discomfort), excitement (excitement to calm), and tension (tension to relaxation). In the psychological state space 60, regions of predetermined psychological states are defined. In this embodiment, six types of psychological state are employed: empathy, disgust, interest, indifference, activity, and suppression. Data representing the psychological state consisting of these six types is referred to as psychological data 54.
[0062]
That is, each psychological state can be associated with a region in the distribution of the emotion elements, and the psychological state can be estimated from where in that region a point lies. Therefore, the psychological state space 60 shown in FIG. 3 is classified into regions in advance, and the emotion elements of the state data 52 are plotted in it. Since each psychological state corresponds to a classified region, that region is defined as the region representing the psychological state, and the magnitude of the psychological state is determined within the region. In this way, the psychological state of the psychological data 54 corresponding to the emotion elements of the state data 52 can be obtained, and the psychological state of the subject can be estimated.
[0063]
For example, FIGS. 4 and 5 show the correspondence between the state data 52 of comfort, excitement, and tension and the psychological data 54 of empathy, disgust, interest, indifference, activity, and suppression.
[0064]
FIG. 4 shows the correspondence between the state data 52 of comfort and excitement and the psychological data 54 of empathy, disgust, interest, and indifference. In the two-dimensional area of comfort and excitement, each of the four regions separated by the comfort axis and the excitement axis corresponds to one of the psychological states of empathy, disgust, interest, and indifference. That is, empathy corresponds to the region where comfort is high and excitement is high; disgust corresponds to the region where comfort is low (discomfort is high) and excitement is high; interest corresponds to the region where comfort is high and excitement is low (calm is high); and indifference corresponds to the region where comfort is low (discomfort is high) and excitement is low (calm is high).
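The four-region correspondence of FIG. 4 can be sketched as a simple sign test on the two axes. The function name `classify_fig4` is hypothetical, the inputs are assumed to be signed values with 0 at the origin, and the treatment of points lying exactly on an axis is an assumption, since the text does not say how boundaries are handled.

```python
def classify_fig4(comfort, excitement):
    """Map a (comfort, excitement) point to one of the four states of FIG. 4.

    ASSUMPTION: positive values mean comfort / excitement, negative values
    mean discomfort / calm, and points on an axis are counted as positive.
    """
    if excitement >= 0:
        return "empathy" if comfort >= 0 else "disgust"
    return "interest" if comfort >= 0 else "indifference"

state = classify_fig4(0.4, 0.3)   # high comfort, high excitement
```

A lookup table over quantized axis values would serve equally well; the sign test is used here only because the regions in the text are bounded by the axes.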
[0065]
Accordingly, when the point P1 at which the state data 52 of comfort and excitement is plotted lies within the region of a psychological state (here, the region of empathy), that psychological state can be estimated, and it can be quantified by the weight of the position within the region. In the example of FIG. 4, the region of the psychological state (empathy) is a substantially circular region, and the weight can be set so that empathy is highest at the center and decreases with radial distance from the center. A table or a function determined in advance from correlations obtained from a large amount of data can be used as the weight.
[0066]
FIG. 5 shows the correspondence between the state data 52 of comfort and tension and the psychological data 54 of activity and suppression. In the two-dimensional area of comfort and tension, among the regions separated by the comfort axis and the tension axis, the psychological state of activity corresponds to the region where comfort is high and tension is high, and the psychological state of suppression corresponds to the region where comfort is low (discomfort is high) and tension is high.
[0067]
Accordingly, when the point P2 at which the state data 52 of comfort and tension is plotted lies within the region of a psychological state (here, the region of activity), that psychological state can be estimated, and it can be quantified by the weight of the position within the region. As in the example of FIG. 4, the region of the psychological state can be set as a substantially circular region, with the weight set so that the degree of that state is highest at the center and decreases with radial distance from the center. A table or a function determined in advance from correlations obtained from a large amount of data can be used as the weight.
[0068]
The quantification of the psychological state will now be described in detail. In this embodiment, quantification uses fuzzy inference for converting a linguistic value into a numerical value, a neural network system, or a genetic algorithm; in the following description, the obtained values are assumed to be normalized to the range ±1.
[0069]
In the above description, the region of each psychological state is set as a substantially circular region, but the quantification is not limited to this. FIG. 6 is an explanatory diagram of the quantification of the empathy state, and FIG. 7 is an explanatory diagram of the quantification of the disgust state.
[0070]
Whether or not the subject is in a state of empathy is determined by whether the psychological state is both "comfortable" and "excited", and the degree of the state of "empathy" is expressed as a numerical value. That is, on a plane whose vertical and horizontal axes are the "comfort-discomfort" axis and the "excitement-calm" axis, the distance from the origin gives a numerical value indicating the degree of the "empathy" state (hereinafter referred to as an empathy state value) or a numerical value indicating the degree of the "disgust" state (hereinafter referred to as a disgust state value).
[0071]
In the example of FIG. 6, the comfort is +0.4 from the origin and the excitement is +0.3 from the origin. Therefore, the empathy state value is 0.5 (= √(0.4² + 0.3²)). Similarly, in the example of FIG. 7, the comfort is −0.4 from the origin and the excitement is +0.3 from the origin. Therefore, the disgust state value is 0.5 (= √((−0.4)² + 0.3²)).
[0072]
As described above, in the present embodiment, the empathy state value and the disgust state value are quantified within a range of minimum 0 and maximum √2 (see FIG. 8).
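The distance-from-origin quantification of FIGS. 6 and 7 can be reproduced in a few lines. The function name `state_value` is hypothetical; both inputs are assumed to be normalized to the range ±1, which is what gives the stated maximum of √2.

```python
import math

def state_value(comfort, excitement):
    """Distance from the origin on the comfort / excitement plane."""
    return math.hypot(comfort, excitement)

empathy_value = state_value(0.4, 0.3)    # example of FIG. 6
disgust_value = state_value(-0.4, 0.3)   # example of FIG. 7
upper_bound = state_value(1.0, 1.0)      # both axes at their limit
```

Both example points lie at the same distance from the origin, so they yield the same magnitude; which state the value belongs to is decided by the quadrant, not by the magnitude.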
[0073]
The feature data 50, the state data 52, and the psychological data 54 described above are presented for reference on the printer 26 and the CRT 30, and are output to the evaluation device 70 via the connection device 68, which is connected to the connection device 88 of the evaluation device 70.
[0074]
The operation of the recognition device 10 that obtains recognition data for evaluating the behavior of the target person for grasping the trend of the event hall with the above configuration will be described. In the recognition device 10, the processing routine shown in FIG. 9 is executed, and the imaging device 22 and the voice device 24 collect image data and voice data for a plurality of subjects, and detect them as sensor data (step 100).
[0075]
Next, image recognition processing and voice recognition processing are executed based on the detected sensor data (step 102). These correspond to the processing performed by the behavior grasping unit 32, which is the above-described functional unit, and the grasp data is obtained by this processing. Next, one of the subjects who can be identified from the grasp data is set, either automatically or by an operator's input on the keyboard 28 (step 104), and the characteristics of the behavior of the set subject are grasped (step 106). This processing corresponds to the processing performed by the behavior recognition unit 44, which is the above-described functional unit, and yields feature data 50 representing the features of the subject's behavior.
[0076]
Next, the emotion elements are quantified from the feature data 50 (step 108). This quantification corresponds to the processing performed by the behavior state deriving unit 46, which is the above-described functional unit, and yields state data 52 in which the emotion elements representing the internal state of the subject, as reflected in the subject's behavior at that time, are quantified. Next, the psychological state is quantified from the state data 52 (step 110). This quantification corresponds to the processing performed by the state change estimating unit 48, which is the above-described functional unit; it classifies the psychological state of the subject at that time and yields psychological data 54 in which the classified psychological state is quantified.
[0077]
The above processing is repeated until it has been completed for all subjects (until a positive determination is made in step 112). When the psychological state estimation processing has been completed for all subjects, the obtained feature data 50, state data 52, and psychological data 54 are output to the evaluation device 70 via the connection device 68 (step 114), and the routine ends. The obtained feature data 50, state data 52, and psychological data 54 can also be output to the printer 26 or the CRT 30.
[0078]
In this way, the features of the subject's behavior are extracted from the results of recognition processing on video and audio data capturing that behavior, emotion elements are obtained from the behavior features, and the psychological state is estimated from the obtained emotion elements. Data including psychological states that are usually unknown, such as what psychological state a subject is in and what psychological states are present at the venue, can therefore be presented.
[0079]
FIG. 10 shows a functional block diagram relating to the evaluation processing in the evaluation device 70 for obtaining various data for supporting the understanding of the trend of the event hall. The evaluation device 70 evaluates the attention state of the target person using the feature data 50, the state data 52, and the psychological data 54 from the recognition device 10.
[0080]
The evaluation device 70 is divided into a space evaluation unit 92, an attention state evaluation unit 94, an attention site evaluation unit 96, and a structure storage unit 97 as its functional parts. Data representing the site of the structure in the space to which the subject pays attention, based on the behavior of the subject evaluated by these functional units, is obtained as the target site data 98. The target site data 98 is presented for reference on the printer 82 and the CRT 86.
[0081]
Each of the functional units described below is implemented by a processing program, stored in the memory 80 of the evaluation device 70, that executes the processing of that unit using resources such as the CPU 72, the ROM 74, the RAM 76, and the I / O 78.
[0082]
The data from the recognition device 10 is configured to be input to the space evaluation unit 92, and the space evaluation unit 92 is configured to also receive the structure data from the structure storage unit 97. The evaluation result of the space evaluation unit 92 is configured to be output to the attention state evaluation unit 94, and the unit space and the attention degree (described later) are accumulated in the attention state evaluation unit 94. This accumulation result is configured to be output to the attention site evaluation unit 96, and the attention site evaluation unit 96 outputs target site data of the evaluation result. The attention site evaluation unit 96 is also configured to receive the structure data from the structure storage unit 97.
[0083]
The space evaluation unit 92 is a functional unit that evaluates the space to which the target person has paid attention, based on the characteristics, internal state, and psychological state of the target person grasped by the recognition device 10. The space evaluation unit 92 divides the space into unit spaces and evaluates it per unit space, and performs the evaluation in consideration of the structures in the space by using the structure data stored in the structure storage unit 97. Based on the evaluation, the space evaluation unit 92 derives the unit space corresponding to the position to which the subject pays attention and the attention degree for that unit space.
[0084]
The attention state evaluation unit 94 is a functional unit that evaluates the attention state of the subject using the evaluation result of the space evaluation unit 92. The attention state evaluation unit 94 holds the attention state by repeatedly storing the unit space derived by the space evaluation unit 92 together with the attention degree associated with it.
[0085]
The attention site evaluation unit 96 is a functional unit that evaluates the attention site of the subject using the evaluation result of the attention state evaluation unit 94. The attention site evaluation unit 96 derives the attention site from the unit space predicted to have received the most attention, based on the unit spaces and attention degrees held by the attention state evaluation unit 94, that is, the subject's history of unit spaces and attention degrees. In doing so, it takes the structures in the space into account by using the structure data stored in the structure storage unit 97.
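The accumulation performed by the attention state evaluation unit 94 and the selection performed by the attention site evaluation unit 96 can be sketched as below. The class and method names are hypothetical, and two choices are assumptions rather than something the text specifies: attention degrees for the same unit space are combined by summation, and "most noticed" is taken to mean the largest accumulated total.

```python
from collections import defaultdict

class AttentionHistory:
    """Accumulates (unit space, attention degree) pairs for one subject."""

    def __init__(self):
        self._total = defaultdict(float)

    def add(self, unit_space, degree):
        # ASSUMPTION: repeated observations of a unit space are summed.
        self._total[unit_space] += degree

    def most_attended(self):
        """Return (unit_space, accumulated degree), or None if empty."""
        if not self._total:
            return None
        return max(self._total.items(), key=lambda kv: kv[1])

history = AttentionHistory()
history.add((5, 0, 0), 0.5)
history.add((1, 2, 3), 0.6)
history.add((5, 0, 0), 0.3)
best = history.most_attended()
```

With the structure data available, the selected unit space would then be looked up to name the part of the structure it contains.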
[0086]
Note that a system including the recognition device 10 and the evaluation device 70 corresponds to the information providing device of the present invention. In addition, the recognition device 10 corresponds to the detection means of the present invention, the space evaluation unit 92 corresponds to the derivation means, the structure storage unit 97 corresponds to the storage means, the attention state evaluation unit 94 corresponds to the accumulation means, and the attention site evaluation unit 96 corresponds to the specifying means. The presenting means of the present invention corresponds to the printer 82 or the CRT 86.
[0087]
[Operation of Embodiment]
Next, the operation of the present embodiment will be described. In the present embodiment, the evaluation device 70 obtains the attention site of the target person from the characteristics of the target person detected by the recognition device 10. Therefore, the operation of the evaluation device 70 will be mainly described. In the evaluation device 70, a processing routine shown in FIG. 11 is executed.
[0088]
In step 202 in FIG. 11, the recognition data (feature data 50, state data 52, and psychological data 54) produced as the recognition result of the recognition device 10 is acquired. That is, the gaze direction vector (start point and vector direction) detected by the recognition device 10 is acquired, and the empathy state value and the disgust state value are acquired as the recognition data J.
[0089]
In the next step 204, the emotion degree K is derived from the acquired recognition data J. The emotion degree K represents the degree of emotion, and corresponds to the above-mentioned sympathy, which indicates the degree of empathy, or to the degree of disgust. Since the maximum value of the empathy state value and the disgust state value (recognition data J) output from the recognition device 10 is √2, this step normalizes the value so that its maximum becomes "1", and the normalized value is defined as the emotion degree K (sympathy or degree of disgust). Specifically, the empathy state value or the disgust state value output from the recognition device 10 is divided by a predetermined number (√2), that is, K = J / √2.
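The normalization of step 204 can be sketched as follows. This is a minimal illustration only: the function name `emotion_degree`, the constant `J_MAX`, and the clamping to the range 0 to 1 are assumptions introduced here, and the value √2 is taken as the maximum state value.

```python
import math

# Assumed maximum of the empathy / disgust state value J (the normalizing constant).
J_MAX = math.sqrt(2.0)


def emotion_degree(j: float, j_max: float = J_MAX) -> float:
    """Return the emotion degree K = J / j_max.

    The result is clamped to [0, 1]; the clamping is an assumption of this
    sketch and is not stated in the text.
    """
    if j_max <= 0:
        raise ValueError("j_max must be positive")
    return min(max(j / j_max, 0.0), 1.0)


if __name__ == "__main__":
    print(emotion_degree(J_MAX))  # a state value at the maximum maps to 1.0
```

A state value equal to the assumed maximum gives K = 1, and smaller values scale linearly.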
[0090]
In the next step 206, the spatial position of interest of the subject is derived. This processing is to derive a spatial block (unit space) existing ahead of the subject's line of sight from the line-of-sight direction vector and the three-dimensional space data of the structure.
[0091]
As shown in FIG. 13, the three-dimensional space of an event hall such as an exhibition, that is, the attention space 230 to which the target person can pay attention, is divided into cubes (hereinafter referred to as unit spaces) 232 whose sides have a predetermined length (for example, 10 cm). For this division, the attention space 230 can be easily specified from the structural drawings and design drawings of the event hall. Exhibits and structural articles are installed in the attention space 230, and their positional relationships are stored.
[0092]
That is, in the present embodiment, three-dimensional space data such as installation positions and dimensions are stored in the memory 80 for all the articles existing in the attention space 230, and each article is associated with the unit spaces 232 that it occupies.
[0093]
For example, as shown in FIG. 13, the attention space 230 is divided into the unit spaces 232. For this division, the attention space 230 can be easily specified from the structural drawings and design drawings of the event hall. Exhibits and the like are arranged in the attention space 230. FIG. 12 shows an example in which an automobile 234 is arranged as the article. The structural data of the automobile 234 is stored in the memory 80 together with its positional relationship data, so that the location of each part of the automobile 234 within the attention space 230 is known. Therefore, by determining which unit space 232 contains each part or structural position of the automobile 234 located in the attention space 230, the structural positions of the automobile 234 can be converted into positions of unit spaces 232.
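The association between the parts of an article and the unit spaces 232 can be sketched as below. The sketch assumes, purely for illustration, that each part is approximated by an axis-aligned box given by two corner points in metres; the names `voxels_for_box` and `build_index`, and the rule that the first registered part wins where boxes overlap, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner), in metres

UNIT = 0.10  # edge length of a unit space (the 10 cm example)


def voxels_for_box(box: Box, unit: float = UNIT) -> List[Voxel]:
    """List the indices of every unit space overlapped by an axis-aligned box."""
    eps = 1e-9  # tolerance so that a face lying on a grid plane is not double counted
    lo_corner, hi_corner = box
    axes = []
    for lo, hi in zip(lo_corner, hi_corner):
        first = math.floor(lo / unit + eps)
        last = math.ceil(hi / unit - eps) - 1
        axes.append(range(first, max(first, last) + 1))
    return [(i, j, k) for i in axes[0] for j in axes[1] for k in axes[2]]


def build_index(parts: Dict[str, Box], unit: float = UNIT) -> Dict[Voxel, str]:
    """Map each occupied unit space to the name of the part it contains."""
    index: Dict[Voxel, str] = {}
    for name, box in parts.items():
        for voxel in voxels_for_box(box, unit):
            index.setdefault(voxel, name)  # assumption: first part registered wins
    return index
```

With such an index, looking up a unit space immediately yields the part of the article located there.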
[0094]
Therefore, the unit space 232 that the target person is looking at can be specified from the line-of-sight vector information (vector start point, direction) from the recognition device 10 and the three-dimensional data. Then, an article (for example, a part of the automobile 234) in the unit space 232 can be specified.
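One way to find the unit space ahead of the line of sight is sketched below. It samples points along the gaze ray at a fixed step and returns the first occupied unit space. The fixed-step sampling, the function name `first_hit`, and the parameters `occupied` and `max_range` are assumptions of this sketch; the text does not specify the search method, and an exact grid traversal could be used instead.

```python
import math
from typing import Collection, Optional, Sequence, Tuple

Voxel = Tuple[int, int, int]


def first_hit(
    origin: Sequence[float],
    direction: Sequence[float],
    occupied: Collection[Voxel],
    unit: float = 0.10,
    max_range: float = 20.0,
) -> Optional[Voxel]:
    """Return the first occupied unit space along the gaze ray, or None.

    origin and direction are the start point and direction of the gaze vector.
    occupied holds the indices of unit spaces that contain part of an article.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        return None
    d = [c / norm for c in direction]
    step = unit / 4.0  # sample four times per unit space edge (approximation)
    n_steps = int(max_range / step)
    for i in range(n_steps + 1):
        point = [o + c * step * i for o, c in zip(origin, d)]
        voxel = tuple(math.floor(c / unit) for c in point)
        if voxel in occupied:
            return voxel
    return None
```

Because the ray is sampled rather than traversed exactly, a very thin corner of a unit space could in principle be skipped; a smaller step reduces this risk at the cost of more samples.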
[0095]
In the next step 208, the amount of gaze movement while the target person is watching is derived. This processing obtains the distance between the unit space 232 that the subject is viewing at an arbitrary time (t) and the unit space 232 viewed next (the unit space 232 assumed to be viewed when a predetermined time has elapsed, at t + Δt). That is, if the center coordinates of the unit space 232 viewed at time (t) are (x0, y0, z0) and the center coordinates of the unit space 232 viewed at t + Δt are (x1, y1, z1), the gaze movement amount L can be expressed by the following equation.
[0096]
L = √{(x1 - x0)² + (y1 - y0)² + (z1 - z0)²}
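The gaze movement amount is the Euclidean distance between the centers of the two unit spaces. A minimal sketch follows; the helper `voxel_center`, which converts a unit space index into center coordinates, and the 10 cm default edge length are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def voxel_center(index: Sequence[int], unit: float = 0.10) -> Tuple[float, ...]:
    """Center coordinates of the unit space with the given integer index."""
    return tuple((i + 0.5) * unit for i in index)


def gaze_movement(center_t: Sequence[float], center_t_dt: Sequence[float]) -> float:
    """Gaze movement amount L between the centers viewed at t and at t + Δt."""
    return math.dist(center_t, center_t_dt)
```

When the same unit space is viewed at both times, the two centers coincide and L is 0.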
[0097]
In the next step 210, the degree to which the gaze stays on the unit space 232 viewed by the subject (the gaze retention degree M) is determined. As shown in FIG. 14, the gaze retention degree M can be expressed by a function whose value decreases as the gaze movement amount L increases. For example, the gaze retention degree M can be obtained by the following equation.
[0098]

Here, Lmax is a predetermined value set in advance.
[0099]
The predetermined value Lmax is the maximum gaze movement amount for which the relevance between the psychological state of "sympathy" or "disgust" derived at an arbitrary time (t) and the article ahead of the gaze at that time can be ensured. It can be obtained from the maximum value of the gaze movement speed.
[0100]
For example, when the line of sight is sampled at one-second intervals, it is assumed that the relevance between the psychological state and the article ahead of the gaze can be ensured when the line of sight is moving at or below a predetermined speed (for example, 3 m/s). That is, when gazing is considered in terms of the moving speed of the line of sight, a threshold speed can be derived statistically, and once this speed is set it can be used for the above-described determination of relevance. Conversely, when the target person changes the article being viewed, the line of sight moves at a high speed.
[0101]
Therefore, for example, if the predetermined speed is 3 m/s, the predetermined value Lmax is 3 m (= 3 [m/s] × 1 [s]).
[0102]
According to the above equation, when the gaze movement amount L = 0 (the same unit space 232 is viewed throughout the predetermined time), the gaze retention degree M is "1"; M becomes smaller as the gaze movement amount L approaches the predetermined value Lmax, and M is "0" whenever L exceeds Lmax.
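The behavior described for the gaze retention degree M can be sketched as follows. The linear form M = 1 - L / Lmax used here is an assumption: it is only one function that equals 1 at L = 0, decreases toward Lmax, and is 0 beyond Lmax, and the curve of FIG. 14 may differ. The function names are hypothetical.

```python
def l_max(max_speed_m_per_s: float, sampling_interval_s: float) -> float:
    """Predetermined value Lmax = maximum gaze speed x sampling interval."""
    return max_speed_m_per_s * sampling_interval_s


def gaze_retention(l: float, lmax: float) -> float:
    """Gaze retention degree M for a gaze movement amount l.

    Assumed linear form: M = 1 - l / lmax for l < lmax, and M = 0 otherwise.
    """
    if lmax <= 0:
        raise ValueError("lmax must be positive")
    if l < 0:
        raise ValueError("l must be non-negative")
    if l >= lmax:
        return 0.0
    return 1.0 - l / lmax
```

With the example figures of 3 m/s and one-second sampling, `l_max(3.0, 1.0)` gives 3 m, and a gaze that moves 1.5 m between samples yields M = 0.5 under the assumed linear form.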
[0103]
Next, in step 212, an instantaneous value of the emotional state is derived. This processing is for obtaining the instantaneous depth of the degree of empathy or disgust. The emotional state changes. In addition, during the emotional transition, the gaze stays or changes significantly. Therefore, in the present embodiment, the instantaneous value N of the emotional state is determined from the relationship between the degree of emotion K and the degree of gaze retention M. The instantaneous value N can be obtained by the following equation.
[0104]
N = K × M
Here, K is the sympathy or the degree of disgust. When K is the sympathy, the instantaneous value N represents the depth of empathy; when K is the degree of disgust, the instantaneous value N represents the depth of disgust.
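The product N = K × M can be sketched as below, keeping a label that records whether the value expresses the depth of empathy or of disgust. The class name `Instantaneous`, the string labels, and the range checks are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instantaneous:
    kind: str     # "empathy" or "disgust" (labels assumed for this sketch)
    value: float  # instantaneous value N


def instantaneous_value(kind: str, k: float, m: float) -> Instantaneous:
    """Return N = K x M, tagged with the emotion that K represents."""
    if kind not in ("empathy", "disgust"):
        raise ValueError("kind must be 'empathy' or 'disgust'")
    if not (0.0 <= k <= 1.0 and 0.0 <= m <= 1.0):
        raise ValueError("k and m are expected in the range 0 to 1")
    return Instantaneous(kind, k * m)
```

A strong emotion (K near 1) contributes little when the gaze is moving quickly (M near 0), which matches the intent of weighting the emotion by how long the gaze stays.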
[0105]
In the next step 214, the instantaneous value N derived in step 212 is accumulated for the corresponding unit space 232. That is, since the instantaneous value N obtained in step 212 is a value at an arbitrary time, accumulating it makes it possible to grasp, as a cumulative value, how the emotional state toward the unit space 232 has developed. Therefore, the instantaneous value N is accumulated as data of the corresponding unit space 232.
[0106]
FIG. 15 shows the attention space 230 projected onto a two-dimensional plane for simplicity. As shown in FIG. 15(A), when the stay range 236 of the gaze of a subject looking at the automobile 234 is around the door mirror, the accumulated values are, as shown in FIG. 15(B), 9.2 at the center and 8.5, 7.7, 7.2, and 8.9 for the surrounding unit spaces, clockwise from the top. As shown in FIG. 15(C), the instantaneous values N after the predetermined time has elapsed (t + Δt) are 0.8 at the center and 0.2, 0.5, 0.6, and 0.7 for the surrounding unit spaces, clockwise from the top. As a result, the cumulative values become 10.0 at the center and 8.7, 8.2, 7.8, and 9.6 for the surrounding unit spaces, clockwise from the top.
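The accumulation in this example can be checked with a short sketch. The position labels ("center", "top", "right", "bottom", "left") are assumed stand-ins for the five unit spaces, with "clockwise from the top" read as top, right, bottom, left; the function name `accumulate` is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


def accumulate(store: DefaultDict[str, float], instantaneous: Dict[str, float]) -> None:
    """Add each instantaneous value N to the running total of its unit space."""
    for unit_space, n in instantaneous.items():
        store[unit_space] += n


store: DefaultDict[str, float] = defaultdict(float)

# Accumulated values up to time t.
accumulate(store, {"center": 9.2, "top": 8.5, "right": 7.7, "bottom": 7.2, "left": 8.9})
# Instantaneous values N at t + Δt.
accumulate(store, {"center": 0.8, "top": 0.2, "right": 0.5, "bottom": 0.6, "left": 0.7})

# Round only for display, to hide floating-point noise.
totals = {name: round(value, 1) for name, value in store.items()}
print(totals)
```

The rounded totals agree with the cumulative values given in the example: 10.0 at the center and 8.7, 8.2, 7.8, and 9.6 around it.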
[0107]
In the next step 216, the attention site is derived from the accumulated instantaneous values N. That is, the instantaneous values N are accumulated in the memory 80 for each unit space 232, which gives a distribution, over the unit spaces 232, of where the subject is assumed to have paid attention in the attention space 230 within a predetermined time. It is therefore highly probable that the unit space 232 with the largest accumulated value includes the part that the subject noticed. Each unit space 232 corresponds to a part of an article installed in the attention space 230, such as the automobile 234. For this reason, the unit space 232 whose accumulated instantaneous value N is the maximum is determined, and the part of the article corresponding to that unit space 232 is specified as the attention site. In the example of FIG. 15, the unit space 232 with the accumulated value of 10.0 is determined, and it is specified that attention was paid to the door mirror corresponding to that unit space 232.
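Selecting the attention site amounts to taking the unit space with the largest accumulated value and looking up the part it contains. The sketch below assumes two dictionaries, `accumulated` (unit space index to accumulated value) and `voxel_to_part` (unit space index to part name); both names and the index values in the usage example are hypothetical.

```python
from typing import Dict, Optional, Tuple

Voxel = Tuple[int, int, int]


def attention_site(
    accumulated: Dict[Voxel, float],
    voxel_to_part: Dict[Voxel, str],
) -> Optional[Tuple[Voxel, Optional[str], float]]:
    """Return (unit space, part name, accumulated value) for the maximum, or None.

    If several unit spaces share the maximum, the first one encountered is
    returned; the text does not say how ties are handled.
    """
    if not accumulated:
        return None
    best = max(accumulated, key=accumulated.get)
    return best, voxel_to_part.get(best), accumulated[best]


if __name__ == "__main__":
    values = {(5, 3, 2): 10.0, (5, 4, 2): 8.7, (6, 3, 2): 8.2, (5, 2, 2): 7.8, (4, 3, 2): 9.6}
    print(attention_site(values, {(5, 3, 2): "door mirror"}))
```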
[0108]
The specified region of interest includes the emotional state. That is, for example, when the depth of empathy is high, it can be specified that the user is paying attention while having interest. On the other hand, when the depth of disgust is high, it can be specified that the user is paying attention while having a disgusting feeling.
[0109]
In the next step 218, it is determined whether or not the evaluation is a real-time evaluation. If the determination is affirmative, the process returns to step 202 to repeatedly execute the above processing. The real-time evaluation is a real-time continuation process, and a positive determination is made until a predetermined condition is satisfied. An example of a condition that is determined to be negative in the real-time evaluation is a condition that interrupts the evaluation, such as when a predetermined time has elapsed or when the accumulated value has exceeded a predetermined value.
[0110]
If a negative determination is made in step 218, the process proceeds to step 220, where the specified part of interest is output to the printer 82 or the CRT 86, and the routine ends.
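The overall flow of steps 204 to 218 can be summarized in one loop, sketched below under several assumptions: each sample already carries the viewed unit space, its center coordinates, and the state value J; the gaze retention degree uses an assumed linear form; the first sample is treated as having no gaze movement; and the only stop condition modeled is an accumulated value reaching a threshold. The function name `evaluate` and its parameters are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple

Voxel = Tuple[int, int, int]
Sample = Tuple[Voxel, Sequence[float], float]  # (unit space, center, state value J)


def evaluate(
    samples: Iterable[Sample],
    lmax: float = 3.0,
    j_max: float = math.sqrt(2.0),
    stop_total: Optional[float] = None,
) -> Tuple[Optional[Voxel], Dict[Voxel, float]]:
    """Accumulate N = K x M per unit space and return the maximum unit space."""
    totals: Dict[Voxel, float] = {}
    prev_center: Optional[Sequence[float]] = None
    for voxel, center, j in samples:
        k = j / j_max                                   # step 204: emotion degree K
        l = 0.0 if prev_center is None else math.dist(prev_center, center)  # step 208
        m = max(0.0, 1.0 - l / lmax)                    # step 210: assumed linear M
        totals[voxel] = totals.get(voxel, 0.0) + k * m  # steps 212 and 214
        prev_center = center
        if stop_total is not None and totals[voxel] >= stop_total:
            break                                       # step 218: end of real-time evaluation
    best = max(totals, key=totals.get) if totals else None  # step 216
    return best, totals
```

Outputting the result to a printer or display (step 220) is left to the caller.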
[0111]
In this way, the attention site of the target person is estimated from the characteristics of the target person's behavior, that is, the behavioral characteristics and psychological state, based on the results of recognition processing on video and audio data collected on the target person's behavior. Therefore, an attention site can be presented together with information on the psychological state in the three-dimensional space that could not otherwise be known, such as in what psychological state the subject is looking at an article and which part is being noticed in what state.
[0112]
The system using the recognition device 10 and the evaluation device 70 according to the present embodiment can be suitably used for evaluating customer interest and attention in a model room, evaluating customer interest in a store, and evaluating articles that attract a high degree of interest in places such as showrooms and museums.
[0113]
The above-described embodiment is suitable when the structure of an article such as the automobile 234, for example its outer shape, is viewed directly, but it is not limited to the outer shape of the article. For example, the present invention can be applied to a case where an article is viewed through a transparent medium such as glass, as in a show window. In this case, by giving the structural data of the transparent medium an attribute representing its degree of transparency, the state of being visible through the medium can be expressed numerically, and an article placed beyond the transparent medium can be evaluated, that is, the attention state can be evaluated.
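How a transparency attribute might enter the evaluation is sketched below. The weighting scheme is entirely an assumption of this sketch, since the text states only that an attribute representing the degree of transparency is given: each object met along the gaze ray receives the share of visibility it blocks, and the remainder passes on to objects behind it. The function name `visible_targets` is hypothetical.

```python
from typing import List, Sequence, Tuple


def visible_targets(hits: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Distribute gaze weight over objects along the ray, nearest first.

    hits holds (name, transparency) pairs, where transparency is 0.0 for an
    opaque object and 1.0 for a fully transparent one.
    """
    visibility = 1.0  # fraction of the gaze still unobstructed
    weights: List[Tuple[str, float]] = []
    for name, transparency in hits:
        if not 0.0 <= transparency <= 1.0:
            raise ValueError("transparency must be between 0 and 1")
        weights.append((name, visibility * (1.0 - transparency)))
        visibility *= transparency
        if visibility == 0.0:
            break  # nothing behind an opaque object is visible
    return weights


if __name__ == "__main__":
    print(visible_targets([("glass", 0.75), ("watch", 0.0), ("wall", 0.0)]))
```

Under this assumed scheme, the weights could be multiplied into the instantaneous value before accumulation, so that an article behind clear glass still collects most of the attention.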
[0114]
Further, the above embodiment is suitable for viewing fixed articles such as the automobile 234, but is not limited to fixedly installed articles. For example, the present invention can be applied to a case where an article moves, such as at an amusement attraction, or a case where a moving person such as a performer is viewed. In this case, by assigning a time history as an attribute to the position data of the moving article or person, the attention target can be expressed numerically as it moves, and the moving article or person can be evaluated, that is, the attention state can be evaluated.
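A time history for a moving article or person can be sketched as a list of time-stamped unit space positions, queried for the position in effect at the time of each gaze sample. The representation, the function name `position_at`, and the rule of using the most recent record at or before the query time are assumptions for illustration.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Voxel = Tuple[int, int, int]


def position_at(history: Sequence[Tuple[float, Voxel]], t: float) -> Optional[Voxel]:
    """Return the unit space occupied at time t, or None before the first record.

    history must be sorted by time; the latest record at or before t is used.
    """
    times = [timestamp for timestamp, _ in history]
    i = bisect_right(times, t) - 1
    if i < 0:
        return None
    return history[i][1]


if __name__ == "__main__":
    performer = [(0.0, (0, 0, 0)), (1.0, (1, 0, 0)), (2.0, (2, 0, 0))]
    print(position_at(performer, 1.5))
```

Comparing the unit space ahead of the gaze with the position returned for the same time indicates whether the moving target was being viewed at that moment.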
[0115]
For the above-described recognition device 10, a case has been described in which a target person is selected from a plurality of target persons and the characteristics of all the target persons are obtained by sequentially changing the selection, but the present invention is not limited to this. For example, data of only a predetermined target person may be output.
[0116]
In the above description, data is output for a single place. However, the present invention can also be easily applied to a place where a plurality of agendas are held in a plurality of communities. This can be achieved by classifying the venue into communities, each with its own group of agendas, and performing processing for each classified community, for example by graphically determining whether a target exists in the direction of the target person's line of sight.
[0117]
In the above description, the case where the psychological state is estimated using data captured by the photographing device 22 or picked up by the sound device 24 has been described. However, the captured data may instead be stored in a storage medium (for example, a magnetic tape or an optical disk), and the psychological state may be estimated using the data stored in the storage medium.
[0118]
The processing routine of the present embodiment and the processing of each functional unit can be stored on a magnetic disk medium as a recording medium and distributed. In this case, by providing a magnetic disk medium read/write device (not shown), the processing routine and the like can be read from and written to the magnetic disk medium. Therefore, the processing routine and the like may be recorded in advance on a magnetic disk medium, and the processing program recorded on the magnetic disk medium may be executed via the magnetic disk medium read/write device. Alternatively, a large-capacity storage device (not shown) such as a hard disk drive may be connected to the computer, and the processing program recorded on the magnetic disk medium may be stored (installed) in the large-capacity storage device and executed. Other examples of the recording medium include disks such as CD-ROM, MD, MO, and DVD, and magnetic tapes such as DAT. When these are used, a CD-ROM device, an MD device, an MO device, a DVD device, a DAT device, or the like may be used in place of, or in addition to, the magnetic disk medium read/write device.
[0119]
[Effects of the Invention]
As described above, according to the present invention, in a place such as an event hall or an exhibition, the derivation means derives the degree of attention of the target person from the feature information representing the features of the detected behavior of the target person and from the structure information of the structures stored in the storage means, the accumulation means accumulates it, and the specifying means specifies the attention site noticed by the target person. The invention therefore has the effect that the attention site noticed by the target person can be easily specified.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a support device according to an embodiment of the present invention.
FIG. 2 is a conceptual configuration diagram for each function for functionally explaining the recognition device according to the embodiment of the present invention.
FIG. 3 is an image diagram showing a psychological state space around an emotion element used in the recognition device.
FIG. 4 is an explanatory diagram for explaining a correspondence between state data of comfort and excitement and psychological data of empathy, disgust, interest, and indifference.
FIG. 5 is an explanatory diagram for explaining a correspondence between state data of comfort and tension and psychological data of activity and suppression.
FIG. 6 is an explanatory diagram for explaining digitization of empathy.
FIG. 7 is an explanatory diagram for explaining digitization of disgust.
FIG. 8 is an explanatory diagram for explaining the maximum value of the empathy state value.
FIG. 9 is a flowchart illustrating a flow of processing executed in the recognition device.
FIG. 10 is a functional conceptual configuration diagram for functionally explaining the evaluation device according to the embodiment of the present invention.
FIG. 11 is a flowchart illustrating a flow of processing executed in the evaluation device.
FIG. 12 is an image diagram showing a space of interest to be evaluated by the evaluation device.
FIG. 13 is an image diagram showing an attention space including an article.
FIG. 14 is an explanatory diagram for explaining the gaze retention degree.
FIGS. 15(A) to 15(C) are explanatory diagrams for explaining accumulation of instantaneous values. FIG. 15(A) shows a stay range, FIG. 15(B) shows accumulated values, and FIG. 15(C) shows instantaneous values.
[Explanation of symbols]
10 ... Recognition device
50 ... Feature data
52 ... State data
54 ... Psychological data
70 ... Evaluation device
90 ... Support device
92 ... Space evaluation unit
94 ... Attention state evaluation unit
96 ... Attention site evaluation unit
97 ... Structure storage unit
98 ... Attention site data
230 ... Attention space
232 ... Unit space
236 ... Stay range

Claims

Detecting means for detecting characteristic information representing characteristics of the behavior of the target person;
Storage means for storing structure information indicating a position and a structure of a structure in a target space to be acted on by the target person;
Deriving means for dividing the target space into unit spaces, obtaining a correspondence indicating which of the divided unit spaces the structure represented by the structure information falls in, deriving, based on the characteristic information, the unit space corresponding to the position to which the target person pays attention, and deriving a degree of attention representing the degree to which the target person paid attention to that unit space;
Accumulation means for accumulating the unit space and the degree of attention derived by the derivation means in association with each other;
Specifying means for specifying a portion of interest to which the target person has paid attention, based on the unit space and the degree of attention accumulated by the accumulation means;
Presenting means for presenting the attention site specified by the specifying means;
An information providing device comprising:

The information providing device according to claim 1, wherein the detecting means includes video detection means for detecting photographed image information of the target person as the characteristic information, and sound detection means for detecting sound information emitted by a plurality of target persons as the characteristic information.

The information providing device according to claim 1 or 2, wherein the detecting means detects line-of-sight information related to the line of sight of the target person, and the deriving means derives the unit space and the degree of attention from the line-of-sight information of the target person.