JP2004062883A - Information processor

Info

Publication number: JP2004062883A
Application number: JP2003173076A
Authority: JP
Inventors: Hisashi Aoki; 青木　恒
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-06-18
Filing date: 2003-06-18
Publication date: 2004-02-26
Anticipated expiration: 2015-09-12
Also published as: JP3751608B2

Abstract

PROBLEM TO BE SOLVED: To provide a device that, when multimedia information is handled in an information processor, efficiently acquires through an easy operation the parts that are important to a user and their degree of importance, and that sorts, processes and presents information based on them.
SOLUTION: The user's lines of sight during information viewing are recorded, and the degree of importance of the original information is estimated from the recorded lines of sight. In estimating the importance, the type of eye movement is first determined, and the time during which the visual system is considered to acquire information at high efficiency is identified. Whether the visual system is active or inactive is quantified based on the information collection efficiency, and the importance of the information is estimated by indexing the activity per cut or scene. Thus, the parts of the original information that are important to the user, and their degree of importance, are estimated without any special operation by the user.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【産業上の利用分野】
本発明は映像・音声・文書などを統合的に表示・記録・再生・編集するマルチメディア情報処理装置に関する。
【０００２】
【従来の技術】
パーソナルコンピュータなどに搭載されるマイクロプロセッサ性能向上はめざましく、普及型のパーソナルコンピュータで大量の情報を高速に処理できる環境が整いつつあることは周知であろう。近年の光通信技術、通信制御技術や光磁気ディスク、高密度集積回路によるメモリ素子などの発展もあり、一方で情報伝送路が、一方で情報蓄積装置が大容量化していることも作用し、家庭やオフィスでも安価な装置で大量の情報を電子的に入手することができる世界が実現しつつある。また、同様に簡単な操作で情報を発信できるため、これまで放送局や新聞社などに限られていた情報発信の特権が一般の利用者へと広がり、さらに大量の情報が行き交うようになりつつある。このことは現在のパソコン通信や電子メール、電子ニュースなどの普及からうかがい知ることができる。また、パーソナルコンピュータに限らずこのように情報をやり取りできる端末は、電子手帳、携帯電話、ファクシミリなど、あらゆる装置の形で浸透している。そして、それらが扱うことのできる情報の種類も、従来の文字情報（テキストデータ）だけから、音声・音楽情報、静止画像情報、動画像情報へと広がり、その品質もそれぞれ飛躍的に向上している。
【０００３】
しかし、情報分類の手法に関しては旧来の方法がとられている場面が多い。電子ニュースの世界を例にとれば、ニュースグループ（＝話題ごとの「くくり」）が何百と存在し、各ニュースグループには１日数十もの投稿がなされる。ユーザは、その中から自分が必要とする／関心のあるニュースを探すために少なからず時間を費やさなければならない。これは各ニュースが、「ニュースグループ→ニュース」という、ごく単純な、そしてユーザが通常自分では定義できない層構造でとりあつかわれているためである。音楽テープやビデオテープにおいても、楽曲／番組ごとの開始位置は、無音部分あるいは録画開始のインデックス信号を検出することで判定できるが、その楽曲／番組の内容、あるいは特に関心のある場面や部分を指し示す情報は記録できないため、ユーザはそれらのラベルなどに自ら書き込むより他に方法がない。
【０００４】
これら情報の分類、あるいは情報の中から特徴となる部分を抽出する作業を自動化しようとする開発は、現在も盛んに行われている。しかし、それらの中の多くは人間の判断機構を代用することを目標としており、高度な人工知能を必要とし、開発の時間の面からも、費用の面からも、現時点ではコストがかかるといわざるを得ない。また、そのような自動分類システムには、ユーザは自分が望むものを指示してやる必要があり、何が見たい、どんな風な情報を得たい、というビジョンのはっきりしないユーザには扱いづらい。このような不案内なユーザは情報があふれる時代にこそ、激増するものと想定できる。
【０００５】
むしろ必要とされるのは、「重要だと感じる場所がここである」としめす情報であり、必ずしもその内容を記述することを必要としているわけではない。たとえば映画の中で１場面だけ魅力的な俳優が出演していたとき、その俳優が誰なのか、男なのか女なのか、さらにそのオブジェクト（被写体）が人間なのか、という情報は常には必要ではなく、ただその場面を指し示してくれる装置であれば十分用件は満たされる場合が多い。それなのに現在の情報分類の流れでは、「情報を装置が精査する→特徴となる部分を候補として抽出する→ユーザが入力した要求情報と照合する→分類・提示を行う」という手順であるため、いったん情報内容の解析を入念に行わなくてはならなくなる。
【０００６】
一方、一般の利用者が情報発信源となりつつある現在、情報加工ツールの要求も高まっている。これまでの加工ツールとしては文書情報のためのワードプロセッサ、図形情報のためのＣＡＤ（Ｃｏｍｐｕｔｅｒ　Ａｉｄｅｄ　Ｄｅｓｉｇｎ　）や描画ソフトウェアなどがある。しかし、とりわけビデオや音声に関しては操作性のよい一般向けのツールはきわめて少ないといわざるをえない。利用者は２台のデッキを接続して、一方のデッキでは目的となる場所を探して再生し、他方のデッキでそれを録画するという作業で「切り張り」編集を行っている。「切り張り」編集を行うためには最低でも２台のデッキが必要で、また操作も繁雑であるために、一般の利用者は編集作業を敬遠しがちである。現在、提案されている編集方法の中には、コンピュータのメモリやハードディスクなどの記憶媒体にビデオ／音声情報をいったん蓄積して、コンピュータの編集環境で情報加工するものもあるが、このためには大容量の記憶媒体を必要とするうえコストが高く、根本的には「切り張り」の編集であるために操作性の特段の向上は期待できない。
【０００７】
以上のように簡便で効率的な情報自動分類機構や情報加工手段がないことは、今後のマルチメディア統合環境が普及することへの大きな妨げになる恐れがある。
【０００８】
【発明が解決しようとする課題】
以上のように、従来の情報処理装置では、多種で多様な情報を、情報の種類を越えて効率よく、かつ利用者の意図を反映して分類・整理などを行う手段がなかった。このために利用者はその情報の処理作業に時間と労力を割かなければならず、これを装置として自動で行う場合にも、必ずしも個々の利用者に適応して処理できないという欠点があった。また従来は情報の加工方法も繁雑で、その実現のためには高コストとなりやすいという欠点もあった。
【０００９】
本発明は、上記のような課題を解決するためになされたもので、多種多量の情報から簡便な操作で効率よく利用者が必要とする情報を加工・提示できる情報処理装置を提供する。
【００１０】
【課題を解決するための手段】
前記課題のうち、画像中から被写体境界を判別せずに重要度水準の時間変化を推定するという課題を解決するために、本発明に係る情報処理装置は、画像、および画像と音声の情報を利用者に提示する装置において、該情報を視察または視聴している利用者の眼球運動を観測する手段と、この手段で観測した利用者の眼球運動の速度成分を逐次生成する手段と、この手段で生成した速度成分から眼球運動の活発さの度合いを求める手段と、この求める手段で求めた眼球運動の活発さの度合いをもとに該情報の重要度を推定する手段ととを含む。
【００１１】
【作用】
この発明における情報処理装置では、視線を利用して利用者が提示情報をどれだけ重要と認識しているのかを数値で表現できる。利用者の意思表示の一部を視線の自然な動きから得るので、利用者は操作への習熟の手間が軽くなり、利用者にとっても計測されているという意識が少ないままに意思の入力が行われる。このようにして、原情報中で利用者が重要と認識している部分とその度合が平易な入力手法で得られるため、「重要部分」に偏った提示、並べ替えなどの処理を行うことが可能になり、情報アクセスの効率向上を望むことができる。
【００１２】
【実施例】
以下、本発明の第１の実施例を図面に基づいて説明する。図１は本発明の一実施例に係わる情報処理装置の構成を示すブロック図である。また、図１２〜図１４は以下の処理の流れを説明するフローチャートである。
【００１３】
記録媒体１１６に保存されているマルチメディア情報は、読み出し制御部１１４で読み出され、提示形態作成部１１３を経由して画像情報提示部１１０、音声情報提示部１１１、音楽情報提示部１１２などによって利用者１０１に提示される。提示形態作成部１１３とは、具体的にはビデオＲＡＭ、ビデオプロセッサ、あるいはサウンドマネージャ、スピーカードライバやＭＩＤＩインタフェースなどのことを指す。以上の過程においては、テレビジョンとビデオテープレコーダー（ＶＴＲ）を組みあわせたものや、ＣＤ−ＲＯＭドライブを搭載したパーソナルコンピュータなど、従来のマルチメディア機器の情報提示の方法と違いはない。
【００１４】
一方、このようにして提示された情報を視聴している利用者１０１の視線は視線検出器１０２にで測定され、視点演算部１０３に送られる。視線検出器にはどのような形態のものを用いてもよいが、以下では視点の２次元座標上の位置に対応して２チャンネルの電圧値を出力するような視線検出器を想定して説明する。視線検出器１０２から視点演算部１０３に送られるデータは例えば図４ａのようなものである（図１２においてはａの場所で図４のａのようなデータが流れている。以下同様である）。視点演算部１０３では、この電圧値を画面内の位置に変換し一時記憶部１０８に蓄積する（図１２のＳ１２０１）。視点演算部１０３が出力するデータは図４ｂのようなものである。この座標データを元に、眼球運動種別判定部１０４が時刻時刻の眼球運動の種類を判定し、再び一時記憶部１０８に蓄積する（図１２のＳ１２０３〜Ｓ１２０９）。
【００１５】
人間の眼球運動は大きく分けて図２に示すように固視微動と追跡運動によって成っており、速度成分をみることで跳躍運動は固視微動や随従運動から区別することが可能である。図３には動画像視聴時の実際の視線の様子を模式化して示した。画面内には人物３０１とテレビ台３０２という主に２つのオブジェクト（被写体）が写っている。図３中の丸印（○）は６０分の１秒ごとに記録した視点である。ここに示したように、オブジェクトを見ている間は低速の運動が続き（３０３）、オブジェクト間を移動する際には高速の運動を行っている（３０４）ことがわかる。
【００１６】
図５には、同様に実際に記録された視点のデータを横軸に時間経過、縦軸に座標値をとって示した。図中実線と破線はそれぞれＸ座標とＹ座標を表している。図では３秒分を示しているが、極めて短時間のうちに急激に座標が変化する様子が５回みられる（５０１）。これが跳躍運動に相当する。一方、跳躍運動よりもはるかに大きいパルス状の運動も３回みられる（５０２）が、これはまばたきである。視線検出器の特性によってこのような動きになるが、パルス状の動きの前後で座標値がほとんど変化していないことから跳躍運動と弁別できる。
【００１７】
以上のようにして、眼球運動を「まばたき」、「跳躍」、およびそれ以外の「注視」の３種類に区別するのが図１の眼球運動種別判定部１０４である。眼球運動種別判定部１０４によって判定されたデータは図４ｃのような形をとる。なお、まばたきの最中に視点を移動させたものは、オブジェクトの移動を伴っていることから「跳躍」とみなすべきであり、眼球運動種別判定部１０４でもそのように判定されるものとする。眼球運動種別判定部１０４によって時刻時刻の眼球運動種別は再び一時記憶部１０８に蓄積される。
【００１８】
さて人間の視覚特性については、跳躍運動のあと２００ミリ秒程度の間に見えている画像の解像度が低下している、ということが知られている。このことについてはたとえば「眼球運動の実験心理学」（名古屋大学出版会、１９９３年）などに記述されている。したがって、実際に通常の解像度でオブジェクトを注視できているのは跳躍運動後２００ミリ秒後からといってよいだろう。したがってひとつひとつのオブジェクトを見ていた時間は図５の５０３のようになる。５０４は跳躍運動のために解像度低下が起こっている時間である。過去の単位時間あたりに「ひとつひとつのオブジェクトを見ていた時間」すなわち跳躍運動とその後２００ミリ秒をのぞいた時間を「情報獲得率」と定義する（図１３のＳ１３０２〜Ｓ１３０５）。ここでいう跳躍運動とは、まばたきの間にオブジェクトを移動した場合を含んでいる。逆にまばたきの前後での視点が大きく変化していないとき（図５の５０２）は、オブジェクトの移動をともなっていないので、以下の過程では注視と同様に取り扱われる。情報獲得率計算部１０５ではこのようにして眼球運動種別の時間推移から情報獲得率を計算し、再び一時記憶部１０８に保存する。このデータは例えば図４ｄのような形態をとる。
【００１９】
情報獲得率が低下するときは、視覚系は跳躍運動による解像度低下という代償を払ってでも多くのオブジェクトをみようとしているということができる。逆に画面内にとくに見るべきオブジェクトが少ない場合、情報獲得率が高い時間が長いものと考えられる。そこで、情報獲得率が低下している時間を視覚系が情報収拾に活発であったとみなし、その活発さを評価する時間関数を「視覚活性水準」として定義する。視覚活性水準とは、過去単位時間内に情報獲得率が低下に向かっていた時間とする。「情報獲得率が低下に向かう」時間は、情報獲得率の時間による２階微分が０になる瞬間のうち連続した２回に囲まれる時間区間で、情報獲得率の極小値を含むものとする。すなわち図６の６０１の時間区間などである。図６で６０２は情報獲得率、６０３は情報獲得率６０２の２階微分が０になる時間、６０４は情報獲得率６０２の極小値である。実際のコンピュータ内部の計算では時間方向に離散的な値をとるため、上記の「微分」は「差分」と置き換えて計算を行ってもよい。このような計算を行うのが視覚活性水準計算部１０６である。視覚活性水準計算部１０６ではこれまでに計算された情報獲得率から上記のような計算過程を経て時刻時刻に眼球運動が「活発」「不活発」のどちらであったかを判定して出力する。この出力は再び一時記憶１０８に保存される（Ｓ１３０６〜Ｓ１３０８）。出力データは例えば図４ｅのような形態である。
【００２０】
以上の過程の時間解像度は本発明の処理システムの能力によって規定されるものであるが、これを利用者にとって理解しやすい時間単位に分割する。システムによって規定される時間解像度は例えば３０分の１秒、６０分の１秒、１００分の１秒などであり、利用者にとって理解しやすい時間単位とは、例えば「カット」や「シーン」あるいは３０秒、３分、といったものである。重要度情報生成部１０７は、こうした利用者に理解しやすい時間単位で情報獲得率を演算し、その演算結果とその時間単位を重要度情報として出力する。つまり「カットナンバー１４は３分２３秒１５フィールドから３分２９秒２７フィールドまで。その重要度は０．７３」などといったデータである。ここでいうカットのような時間単位があらかじめ原情報とともに記録されている場合には、開始時刻や終了時刻に関する情報は省くことが可能である。また、どのような時間単位で重要度情報を付与するかは上記の「カット」、「シーン」、何秒、といった中から利用者の選択を許容してもよい。
【００２１】
この時間単位は情報単位検出部１０９によって判定され、重要度情報生成部１０７に送られる。原情報に情報の切れ目が記録されていないような場合、情報単位検出部１０９は、画像情報の中で色調や輝度が急激に変化する瞬間などを検出することでシーンチェンジなどを検出する。これら情報単位の定義方法は、どのようなものでもよい。また、情報単位検出部１０９が自動検出した情報の区切りを利用者が特に訂正したい場合には、利用者が新たに定義した情報の区切りを重要度情報のための時間単位としてもよい。さらに、原情報にこのような時間単位があらかじめ記録されている場合には、情報単位検出部１０９は記録されているその時間単位を利用してもよい。
【００２２】
時間単位内の重要度情報のうち、重要度の算出方法は例えば以下のようなものである。上述のように情報獲得率の「活発」「不活発」が時刻ごとにわかっているので、その時間単位に対して「活発」であった時間の割合を重要度として採用する（図１４）。長さが１０秒のカットに対して「活発」であった時間が合計４秒ならば重要度は０．４０、長さが３分のシーンに対して「活発」であった時間が合計２分６秒ならば０．７０、長さが２時間の映画に対して「活発」であった時間が合計４０分ならば０．３３などとなる。重要度情報は再び一時記憶部１０８に蓄積され、適切な時機に原情報と関連づけて記録媒体に記録される。適切な時機とは、システムの動作環境によって異なる。記録媒体１１６およびその周辺の機構が、情報再生と同時に記録が可能なものである時には、以上の過程を情報提示と同時に進行させ、再生・提示と平行して重要度情報を記録していってもよい。また情報提示が終了するまで一時記憶部１０８に蓄積したままにしておき、提示終了後に一括して重要度情報の記録にとりかかってもよい。一方、図１では記録媒体１１６として光ディスクのようなものを想定して描いているが、記録媒体１１６の形態は特にこれに限定されず、また、原情報と異なる媒体に重要度情報を記録してもよい。原情報や重要度情報記録のための記録媒体１１６としては、光ディスクや光磁気ディスク（ＭＯ）、ビデオテープなどの磁気テープ、フロッピー（Ｒ）ディスクやハードディスクなどの磁気ディスク、ＩＣメモリー、あるいは通信ケーブルを介して遠隔のコンピュータなどを用いてもよい。
【００２３】
このようにして記録された重要度は、同じ方法で時間単位を設定した他の重要度情報との比較に有効である。例えば家庭用ビデオカメラで撮影した家族旅行のビデオに上記の手段によって重要度が記録されているとき、再生装置は重要度０．３５が付与されたカットと重要度０．６４が付与されたカットでは後者の方が時間経過後に検索される可能性が高いものとシステムは推定して、あらかじめメモリー素子にそのカットの先頭数秒の画像音声情報を格納しておく。あらかじめメモリー素子に蓄積された情報はアクセスの高速化がはかれるため、利用者のより円滑に目的の画像に到達できることが期待できる。あるいは重要度０．２０の映画と重要度０．７７の映画の光ディスクがあったとき、ディスクチェンジャ（複数のディスクを格納できる再生装置）は後者を取り出し易い位置に格納しておくことで、同様の効果が期待できる。
【００２４】
次に上記の方法で重要度を付与し、利用者が実際に考えていた重要度との合致を比較した実験結果を図７に示す。図７は５つのシーンからなる一連の動画像を被験者に提示し、そのシーンを被験者が「面白い／重要だと思った」順に並べ替えた際、その１位、２位、３位…の画像に本発明の方法を用いたシステムがどのような重要度を付与したかを示すグラフである。なお、画像は映画、ＣＭ、風景、音楽ビデオ、サッカーをそれぞれ３０秒ずつ、１５秒の無地画面をはさんで連続させたものである。本発明の方法が妥当であればグラフは単調減少の傾向を示すはずである。図７を見ると、被験者ＫＮＴにおいては被験者が申告した重要順と本システムが付与した重要度順が合致していることがわかる。また、被験者ＮＺＳにおいても３位の画像以外では順序が合致している。さらに被験者ＫＴＳのようなデータでも、１、２位と３〜５位の分別は可能である。
【００２５】
ＣＧ（コンピュータグラフィックス）などにより生成された画像では、画面内のどの位置はどのオブジェクトに属すべきものかが既知である場合がある。このような場合には視点位置とオブジェクトとの関係を利用して、以上の方法の精度をあげられる可能性があるので、その方法について説明する。
【００２６】
図８ａのような画像があったとき、そのオブジェクト領域は図８ｂの人物８０１、紙８０２などのように既知で、それぞれの境界線が定義されていたとする。その際、視点が図８ｂの「○」あるいは「×」のようであったとすると、それら視点のうち「×」であるものは２つのオブジェクトのどちらにも属さない場所にある。図８ｂでいくつか見られる跳躍運動のうち、これまでに説明した重要度推定の手順では定義済みオブジェクト内の移動（８０３）、あるいは定義済みオブジェクト間の移動（８０４）と、非オブジェクト領域への移動（８０５）、非オブジェクト領域内での移動（８０６）は、まったく同等に扱われていた。このために、非オブジェクト領域に関する移動でありながら跳躍運動が多ければ「活発」とみなされ、高い重要度が付与される恐れがある。オブジェクト領域があらかじめ定義されているような場合には、その動きの終端が定義済みオブジェクトである場合に限って跳躍運動（オブジェクト間移動）とみなし、以下の情報獲得率、視覚活性水準、重要度の計算を行えばよい。この場合には記録媒体１１６に記録されているオブジェクト定義情報を読み出し制御部１１４で読みだし、オブジェクト情報として眼球運動種別判定部１０４に渡すなどして、対応する。また、オブジェクト領域があらかじめ定義されていない場合でも、若干の画像解析機能を備えたシステムでは、システム自身が画像の解析を行って、オブジェクト領域を定義して同様に用いてもよい。この解析に関しては、画像からオブジェクトの切り出しが可能であればどのような方法を用いてもよい。
【００２７】
一方、オブジェクト情報に、さらにその動きに関する情報も含まれている場合、オブジェクトの動きと視点の動きの類似度により「活発」「不活発」を判定してもよい。図９ａのような動画像があり、サッカーボール９０１は選手９０２に向かって飛んできているものとする。このとき、サッカーボールの動きベクトルが９０３のようであれば、ボールへの追随によって跳躍運動９０４が生じることが有り得る。ベクトル９０３とベクトル９０４の内積を計算すると、両ベクトルが「似ている」時には大きな数値となる。一方、視点がベクトル９０５のように動いたとき、これはオブジェクトの動き９０３とは無関係で、ベクトル９０５とベクトル９０３の内積は小さな（あるいは大きな負の）数値になる。画面内の全オブジェクトに対して領域情報に加えて動きが定義されている場合、視点の近くに視点ベクトルとの内積が大きな値をとるオブジェクトが存在しない場合、これは「オブジェクトをふまえない視点の動き」と考えることができる。このような場合には上述の場合と同様に重要度を計算するための跳躍運動とみなさない。
次に本発明の第２の実施例を図面に基づいて説明する。図１０は本発明の一実施例に係わる情報処理装置の構成を示すブロック図である。また、図１５は以下の処理の流れを説明するフローチャートである。
【００２８】
記録媒体１０１３に記録された画像が画像情報提示部１００９によって提示されるまでの過程については第１の実施例の冒頭で述べたのと同様であるから、ここでは省略する。画像を試聴している利用者１００１の視線はやはり視線検出器１００２によって検出され、第１の実施例と同様に視点演算部１００３で画面上の位置として一時記憶部１００６に送られる（図１５のＳ１５０１）。一方、動き検出部１００８は記録媒体１０１３に記録された画像情報から、輝度、色などを手がかりにしてオブジェクト領域の推定とその動き方向を検出する（Ｓ１５０２）。この動き検出の手段は現在用いられている方法を含め、どのようなものでもよい。動き検出部１００８は各時刻でのオブジェクトの領域とその動きをベクトル比較部１００４に送る。ベクトル比較部１００４はこの動きと一時記憶部１００６に蓄積されている視線の動きとを比較する（Ｓ１５０３）。図１１にはベクトル比較部１００４で行われる演算の様子を模式的に示す。画像が時間経過に伴って図１１のａ、ｂ、ｃのように変化したとき、動き検出部はオブジェクト領域１１０１およびそのオブジェクトの動きベクトル１１０２を出力する（図ではａとｄ、ｂとｅ、ｃとｆが対応している）。一方、そのときの視線の様子が図１１ｄ〜ｆの「○」のようであったとすると、ｄ〜ｆそれぞれの瞬間から過去数データの平均をみることで視線の動きの傾向１１０３が得られる。さてオブジェクトの動きベクトル１１０２と視線の動きベクトル１１０３との内積をとると、同じ方向に移動しているときに大きな数値となる。視点がオブジェクトの上あるいはその近傍にあり、かつ視点とオブジェクトの動きベクトルの内積が大きな数字（それら２ベクトルの長さの積の７０〜８０％程度以上）であるとき、視線はそのオブジェクトに追従しており、利用者にとってそのオブジェクトが関心のあるものであったといえるだろう。こうして一つのオブジェクトに対して、視点がその上または近傍にある場合には、視点動きとオブジェクト動きベクトルの内積（あるいはその内積を両ベクトルの長さで除算したものでもよい。このとき除算した結果の数値は両ベクトルのなす角の余弦である）をもって重要度の指針とすることが可能になる。ベクトル比較部１００４はこの内積の結果を一時記憶部１００６に蓄積する。重要度情報生成部１００５では、こうして蓄積された内積、および動き検出部１００８が定義したオブジェクト領域の位置と動きを、情報単位検出部１００７によって定義された情報の区切り（利用者が操作して決定してもよい。これについては第１の実施例で説明した）の単位で演算した結果とあわせて記録媒体に書き戻す形式に整えて再び一時記憶部１００６に送る（Ｓ１５０４）。この重要度情報は適切な時機に書き込み制御部１０１２を経由して記録媒体１０１３に記録される。「適切な時機」については第１の実施例で説明した。また、原情報と重要度情報が別の記録媒体でもよく、これについても第１の実施例で説明したのでここでは省略する。さらに、原情報にオブジェクト領域やその動きに関する情報があらかじめ記録されている場合には、動き検出部１００８は機能しなくてもよい。
【００２９】
以上のようにして、動きをてがかりにしてオブジェクトの重要度が記録されている場合、次のような活用法が見込まれる。たとえば図１１において、人物１１０４以外のオブジェクトが存在しており、その重要度は人物１１０４よりも低かったとする。するとこのシーン（あるいはカット）でもっとも主要なオブジェクトは人物１１０４であったと判断されるので、シーン検索のためのキー映像として主要オブジェクトが中心にある画像、すなわち図１１ｂが選択される。
【００３０】
【発明の効果】
本発明により、マルチメディア情報に対しての重要度という付加情報を生成することが可能になり、その重要度情報を用いることによって、利用者は自分にとって必要な情報に容易に、手早く到達できることが期待できる。また、視線を用いることによって、重要度情報の入力作業を、特に操作の意識なく行うことができる。これは大量のマルチメディア情報が誰にでも手に入るような環境において、操作に不慣れな利用者にとっても使いやすい情報検索・アクセス環境を提供する。
【図面の簡単な説明】
【図１】本発明の一実施例に係わる情報処理装置の構成を示すブロック図である。
【図２】本発明の一実施例に係わる眼球運動の速度による分類を示す図である。
【図３】本発明の一実施例に係わる人間の実際の視点の動きの例を示す図である。
【図４】本発明の一実施例に係わる情報処理装置の計算経過を示す図である。
【図５】本発明の一実施例に係わる人間の実際の眼球運動の例を示す図である。
【図６】本発明の一実施例に係わる情報処理装置の処理方法を示すグラフの図である。
【図７】本発明の一実施例で示した方法で行った実験結果を示すグラフの図である。
【図８】本発明の一実施例に係わる人間の視点の動きを示す概念図である。
【図９】本発明の一実施例に係わる人間の視点の動きを示す概念図である。
【図１０】本発明の一実施例に係わる情報処理装置の構成を示すブロック図である。
【図１１】本発明の一実施例に係わる情報処理装置の処理方法を示す概念図である。
【図１２】本発明の一実施例に係わる情報処理装置の処理方法を示すフローチャートである。
【図１３】本発明の一実施例に係わる情報処理装置の処理方法を示すフローチャートである。
【図１４】本発明の一実施例に係わる情報処理装置の処理方法を示すフローチャートである。
【図１５】本発明の一実施例に係わる情報処理装置の処理方法を示すフローチャートである。
【符号の説明】
３０３…注視しているときの視点
３０４…眼球が跳躍運動するときの視点の動き
５０１…眼球が跳躍運動しているときの視点座標の変化
５０２…まばたきを行ったときのデータの乱れ
５０４…視覚系の解像度が低下している時間
６０１…眼球運動が活発である時間
６０３…情報獲得率の２階差分が０になる時間
６０４…情報獲得率の極小点
８０３…オブジェクト内での跳躍運動
８０４…オブジェクト間の跳躍運動
８０５…オブジェクト外への跳躍運動
８０６…オブジェクト外での跳躍運動
９０３…ボールの動きベクトル
９０４…ボールに追随した眼球運動の動きベクトル
９０５…ボールの動きと無関係な眼球運動の動きベクトル
１１０２…オブジェクトの動きベクトル
１１０３…視点の動きベクトル
[0001]
[Industrial applications]
The present invention relates to a multimedia information processing apparatus that integrally displays, records, reproduces, and edits video, audio, documents, and the like.
[0002]
[Prior art]
It is well known that the performance of microprocessors mounted in personal computers and the like has improved remarkably, and that an environment in which a large amount of information can be processed at high speed on a mass-market personal computer is taking shape. With recent advances in optical communication technology, communication control technology, magneto-optical disks, and memory elements based on high-density integrated circuits, the capacity of both information transmission paths and information storage devices has grown, and a world in which large amounts of information can be obtained electronically with inexpensive equipment in homes and offices is being realized. In addition, since information can likewise be transmitted with simple operations, the privilege of transmitting information, previously limited to broadcast stations, newspaper companies, and the like, is spreading to general users, and ever larger amounts of information are being exchanged. This can be seen from the current spread of personal computer communications, e-mail, and electronic news. Terminals capable of exchanging information in this manner are not limited to personal computers, but have spread in the form of all kinds of devices such as electronic organizers, mobile phones, and facsimile machines. The types of information they can handle have also expanded from conventional character information (text data) alone to voice and music information, still image information, and moving image information, and the quality of each has improved dramatically.
[0003]
However, in many situations the old methods are still used for classifying information. Taking electronic news as an example, there are hundreds of newsgroups (that is, groupings by topic), and dozens of articles are posted to each newsgroup per day. Users must spend a considerable amount of time searching among them for the news they need or are interested in. This is because each news item is handled in a very simple layer structure, "newsgroup → news", that users usually cannot define themselves. On music tapes and video tapes as well, the start position of each song or program can be determined by detecting a silent portion or an index signal marking the start of recording, but information indicating the content of the song or program, or the scenes and portions of particular interest, cannot be recorded, so the user has no choice but to write it on the label or the like by hand.
[0004]
Development aimed at automating the work of classifying such information, or of extracting characteristic portions from it, is still actively under way. However, much of it aims to substitute for the human judgment mechanism and requires advanced artificial intelligence, so at present it must be said to be costly in terms of both development time and expense. Moreover, such an automatic classification system requires the user to specify what he or she wants, and is difficult to use for a user without a clear vision of what to see or what kind of information to obtain. Such unfamiliar users can be expected to increase sharply precisely in an era overflowing with information.
[0005]
Rather, what is needed is information indicating that "this is the place felt to be important", and it is not necessarily required to describe the content of that place. For example, when an attractive actor appears in only one scene of a movie, information on who the actor is, whether the actor is a man or a woman, or even whether the object (subject) is a human, is not always necessary; in many cases a device that simply points to the scene suffices. Nevertheless, the current approach to information classification follows the procedure "the device scrutinizes the information → characteristic portions are extracted as candidates → they are collated with the request information entered by the user → classification and presentation are performed", so the information content must first be analyzed in detail.
[0006]
On the other hand, now that general users are becoming sources of information, demand for information processing tools is also increasing. Conventional processing tools include word processors for document information, and CAD (Computer Aided Design) and drawing software for graphic information. However, it must be said that there are very few easy-to-operate tools for general users, particularly for video and audio. The user connects two decks, searches for and plays back the target location on one deck, and records it on the other deck, thereby performing "cut-and-paste" editing. At least two decks are required for "cut-and-paste" editing and the operation is complicated, so ordinary users tend to avoid editing work. Some currently proposed editing methods temporarily store the video/audio information in a storage medium such as computer memory or a hard disk and process it in the computer's editing environment, but this requires a large-capacity storage medium and is costly, and since the editing is fundamentally still "cut-and-paste", no particular improvement in operability can be expected.
[0007]
The lack of a simple and efficient information automatic classification mechanism and information processing means as described above may greatly hinder the spread of a multimedia integrated environment in the future.
[0008]
[Problems to be solved by the invention]
As described above, conventional information processing apparatuses have had no means for classifying and organizing many kinds of diverse information efficiently, across types of information, and in a way that reflects the user's intention. For this reason, the user must spend time and effort on processing the information, and even when the processing is performed automatically by an apparatus, it cannot always be adapted to the individual user. Conventionally, the methods of processing information are also complicated, and realizing them tends to be costly.
[0009]
The present invention has been made to solve the above-described problem, and provides an information processing apparatus capable of efficiently processing and presenting information required by a user from a large amount of various information with a simple operation.
[0010]
[Means for Solving the Problems]
Among the above problems, in order to solve the problem of estimating the temporal change of the importance level without determining subject boundaries in the image, the information processing apparatus according to the present invention is an apparatus that presents image information, or image and audio information, to a user, and includes: means for observing the eye movement of the user who is viewing the information; means for successively generating a velocity component of the eye movement observed by that means; means for obtaining the degree of activity of the eye movement from the velocity component thus generated; and means for estimating the importance of the information based on the degree of activity thus obtained.
[0011]
[Action]
In the information processing apparatus according to the present invention, how important the user considers the presented information can be expressed numerically by using the line of sight. Since part of the user's expression of intention is obtained from the natural movement of the gaze, the effort of learning the operation is reduced, and the intention is input while the user has little awareness of being measured. In this way, the parts of the original information that the user recognizes as important, and the degree of that importance, are obtained by a simple input method, so that processing such as presentation and sorting weighted toward the "important parts" becomes possible, and an improvement in the efficiency of information access can be expected.
[0012]
[Embodiments]
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an information processing apparatus according to one embodiment of the present invention. FIGS. 12 to 14 are flowcharts illustrating the flow of the processing described below.
[0013]
The multimedia information stored in the recording medium 116 is read out by the readout control unit 114 and presented to the user 101 by the image information presenting unit 110, the audio information presenting unit 111, the music information presenting unit 112, and the like, via the presentation form creating unit 113. The presentation form creating unit 113 specifically refers to a video RAM, a video processor, a sound manager, a speaker driver, a MIDI interface, and the like. Up to this point, the process does not differ from the information presentation method of conventional multimedia devices, such as a combination of a television and a video tape recorder (VTR), or a personal computer equipped with a CD-ROM drive.
[0014]
Meanwhile, the line of sight of the user 101 viewing the information presented in this way is measured by the eye-gaze detector 102 and sent to the viewpoint calculation unit 103. Any type of eye-gaze detector may be used, but the following description assumes a detector that outputs two channels of voltage values corresponding to the position of the viewpoint in two-dimensional coordinates. The data sent from the eye-gaze detector 102 to the viewpoint calculation unit 103 is, for example, as shown in FIG. 4a (data like that of FIG. 4a flows at the location marked a in FIG. 12; the same applies hereinafter). The viewpoint calculation unit 103 converts these voltage values into a position on the screen and stores it in the temporary storage unit 108 (S1201 in FIG. 12). The data output from the viewpoint calculation unit 103 is as shown in FIG. 4b. Based on this coordinate data, the eye movement type determination unit 104 determines the type of eye movement at each point in time and stores it again in the temporary storage unit 108 (S1203 to S1209 in FIG. 12).
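The conversion from two voltage channels to a screen position could be sketched as follows. This is only an illustration: the text does not say how the viewpoint calculation unit 103 performs the conversion, so the linear calibration, the `Calibration` class, and all voltage and screen values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical linear calibration for one detector channel."""
    v_min: float   # voltage measured at the left/top screen edge (assumed)
    v_max: float   # voltage measured at the right/bottom screen edge (assumed)
    size_px: int   # screen extent along this axis, in pixels

    def to_pixels(self, volts: float) -> float:
        # Linear interpolation between the two calibrated edge voltages.
        ratio = (volts - self.v_min) / (self.v_max - self.v_min)
        return ratio * (self.size_px - 1)


def voltages_to_viewpoint(vx, vy, cal_x, cal_y):
    """Map one two-channel voltage sample to an (x, y) screen position."""
    return cal_x.to_pixels(vx), cal_y.to_pixels(vy)


# Illustrative values only: a +/-5 V detector and a 641 x 481 pixel screen.
cal_x = Calibration(v_min=-5.0, v_max=5.0, size_px=641)
cal_y = Calibration(v_min=-5.0, v_max=5.0, size_px=481)
point = voltages_to_viewpoint(0.0, 5.0, cal_x, cal_y)
```

A real detector would probably need a per-user calibration and possibly a non-linear mapping; the linear form is chosen here only because it is the simplest one consistent with the description.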
[0015]
As shown in FIG. 2, human eye movements consist broadly of fixational micro-movements and tracking movements, and by examining the velocity component, jumping movements (saccades) can be distinguished from fixational micro-movements and smooth following movements. FIG. 3 schematically shows actual gaze behavior while viewing a moving image. Two main objects (subjects), a person 301 and a television stand 302, appear on the screen. The circles (○) in FIG. 3 are viewpoints recorded every 1/60 second. As shown, low-speed movement continues while an object is being viewed (303), and high-speed movement occurs when moving between objects (304).
[0016]
In FIG. 5, similarly, the data of the viewpoint actually recorded are shown with the lapse of time on the horizontal axis and coordinate values on the vertical axis. In the figure, a solid line and a broken line represent an X coordinate and a Y coordinate, respectively. Although three seconds are shown in the figure, the state in which the coordinates rapidly change in an extremely short time is seen five times (501). This corresponds to a jumping movement. On the other hand, a pulse-like motion much larger than the jumping motion is also seen three times (502), which is a blink. Such movement is caused by the characteristics of the eye-gaze detector. However, since the coordinate value hardly changes before and after the pulse-like movement, it can be distinguished from the jumping movement.
[0017]
As described above, the eye movement type determination unit 104 in FIG. 1 classifies eye movement into three types: "blink", "jump", and "gaze" for everything else. The data determined by the eye movement type determination unit 104 takes a form as shown in FIG. 4c. Note that a blink during which the viewpoint moved should be regarded as a "jump" because it involves a move to another object, and the eye movement type determination unit 104 is assumed to determine it as such. The eye movement type at each point in time is again stored in the temporary storage unit 108 by the eye movement type determination unit 104.
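One way to read the three-way classification is sketched below. The speed thresholds, the return tolerance, and the function name are assumptions made for illustration; the text only states that jumps are fast, that blinks appear as much larger pulses, and that a blink leaves the coordinates almost unchanged.

```python
import math


def classify_eye_movement(points, rate_hz=60.0, jump_speed=600.0,
                          blink_speed=30000.0, return_tol=20.0):
    """Label each viewpoint sample as 'gaze', 'jump' or 'blink'.

    points: list of (x, y) screen positions sampled at rate_hz.
    Speeds are in pixels per second; all thresholds are illustrative.
    """
    n = len(points)
    labels = ["gaze"] * n
    # Speed of the movement that led into each sample (0 for the first one).
    speeds = [0.0] + [math.dist(points[i], points[i - 1]) * rate_hz
                      for i in range(1, n)]
    i = 1
    while i < n:
        if speeds[i] <= jump_speed:
            i += 1
            continue
        start = i
        while i < n and speeds[i] > jump_speed:
            i += 1
        # The fast run covers samples start .. i-1.
        before, after = points[start - 1], points[i - 1]
        is_pulse = max(speeds[start:i]) > blink_speed
        returned = math.dist(before, after) <= return_tol
        # A pulse that ends where it began is a blink; a pulse that ends
        # elsewhere is a viewpoint move during a blink, i.e. a jump.
        label = "blink" if is_pulse and returned else "jump"
        for k in range(start, i):
            labels[k] = label
    return labels


samples = [(100, 100), (101, 100), (300, 100), (301, 100),
           (301, 1000), (302, 100), (302, 101)]
labels = classify_eye_movement(samples)
```

In this sample the 199-pixel move is labeled a jump, while the excursion to y = 1000 that returns to nearly the same point is labeled a blink.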
[0018]
As for human visual characteristics, it is known that the resolution of the perceived image is reduced for about 200 milliseconds after a jumping movement. This is described in, for example, "Experimental Psychology of Eye Movement" (Nagoya University Press, 1993). It can therefore be said that the user is actually able to gaze at an object at normal resolution only from 200 milliseconds after the jumping movement. Accordingly, the time spent viewing each individual object is as shown by 503 in FIG. 5, and 504 denotes the time during which the resolution is reduced because of the jumping movement. The "time spent viewing individual objects" per past unit time, that is, the time excluding jumping movements and the 200 milliseconds following each of them, is defined as the "information acquisition rate" (S1302 to S1305 in FIG. 13). The jumping movement here includes the case where the viewpoint moved to another object during a blink. Conversely, when the viewpoint does not change significantly before and after a blink (502 in FIG. 5), no move between objects is involved, so the blink is treated in the same way as gaze in the subsequent process. The information acquisition rate calculation unit 105 thus calculates the information acquisition rate from the time course of the eye movement type and stores it again in the temporary storage unit 108. This data takes a form as shown in FIG. 4d, for example.
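The information acquisition rate defined above might be computed as in the following sketch. It assumes per-sample labels of "gaze", "jump", or "blink", a one-second trailing window as the "unit time", and normalization to a fraction between 0 and 1; none of these choices is fixed by the text.

```python
def information_acquisition_rate(labels, rate_hz=60.0, window_s=1.0,
                                 recovery_s=0.2):
    """Fraction of the trailing window spent viewing objects at full resolution.

    A sample counts as usable unless it is a jump or falls within
    recovery_s after a jump. Blinks without a viewpoint move are already
    labeled 'blink' and are treated like gaze.
    """
    recovery = round(recovery_s * rate_hz)   # samples lost after each jump
    window = round(window_s * rate_hz)       # samples per unit time
    usable = []
    since_jump = recovery                    # start in the recovered state
    for label in labels:
        if label == "jump":
            since_jump = 0
            usable.append(0)
        else:
            since_jump += 1
            usable.append(1 if since_jump > recovery else 0)
    rates = []
    for i in range(len(usable)):
        past = usable[max(0, i - window + 1): i + 1]
        rates.append(sum(past) / len(past))
    return rates


# 10 Hz sampling for readability: one jump costs 1 + 2 samples of a 10-sample window.
example_labels = ["gaze"] * 5 + ["jump"] + ["gaze"] * 4
rates = information_acquisition_rate(example_labels, rate_hz=10.0)
```

Near the start of a recording the window is shorter than the unit time; the sketch simply averages over the samples available so far.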
[0019]
When the information acquisition rate decreases, the visual system can be said to be trying to see many objects even at the cost of the resolution loss caused by jumping movements. Conversely, when there are few objects worth looking at on the screen, the information acquisition rate is expected to remain high for long periods. Therefore, the time during which the information acquisition rate is decreasing is regarded as time in which the visual system was active in collecting information, and a time function that evaluates this activity is defined as the "visual activity level". The visual activity level is the time, within the past unit time, during which the information acquisition rate was heading downward. The time during which "the information acquisition rate is heading downward" is a time interval bounded by two consecutive instants at which the second derivative of the information acquisition rate with respect to time becomes 0, and containing a local minimum of the information acquisition rate; an example is the time interval 601 in FIG. 6. In FIG. 6, 602 is the information acquisition rate, 603 is a time at which the second derivative of the information acquisition rate 602 becomes 0, and 604 is a local minimum of the information acquisition rate 602. Since actual computation inside a computer uses values that are discrete in the time direction, the "derivative" above may be replaced with a "difference". This calculation is performed by the visual activity level calculation unit 106. From the information acquisition rate calculated so far, the visual activity level calculation unit 106 determines through the above calculation whether the eye movement was "active" or "inactive" at each point in time, and outputs the result. This output is again stored in the temporary storage unit 108 (S1306 to S1308). The output data takes a form as shown in FIG. 4e, for example.
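The interval rule can be sketched with second differences. The sketch relies on one reading of the definition: between two consecutive zeros of the second derivative that enclose a local minimum, the curve is convex, so an "active" interval is taken to be a maximal run of positive second differences containing a local minimum. The function name and the tolerance `eps` are assumptions.

```python
def active_flags(rate, eps=1e-9):
    """Mark samples that lie in an 'active' interval of the acquisition rate."""
    n = len(rate)
    flags = [False] * n
    # Second difference, defined for interior samples only.
    d2 = [0.0] * n
    for i in range(1, n - 1):
        d2[i] = rate[i + 1] - 2 * rate[i] + rate[i - 1]
    i = 1
    while i < n - 1:
        if d2[i] <= eps:
            i += 1
            continue
        start = i
        while i < n - 1 and d2[i] > eps:
            i += 1
        # Convex run covers samples start .. i-1; keep it only if it
        # contains a local minimum of the rate.
        has_minimum = any(rate[k] <= rate[k - 1] and rate[k] <= rate[k + 1]
                          for k in range(start, i))
        if has_minimum:
            for k in range(start, i):
                flags[k] = True
    return flags


# A single dip in the information acquisition rate.
dip = [1.0, 1.0, 1.0, 0.8, 0.5, 0.4, 0.5, 0.8, 1.0, 1.0, 1.0]
flags = active_flags(dip)
active_indices = [i for i, is_active in enumerate(flags) if is_active]
```

For the dip above, only the samples around the minimum (indices 4 to 6) are marked active; the flat and concave parts are not.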
[0020]
The time resolution of the above process is determined by the capability of the processing system of the present invention, and the result is divided into time units that are easy for the user to understand. The time resolution determined by the system is, for example, 1/30 second, 1/60 second, or 1/100 second, whereas a time unit that is easy for the user to understand is, for example, a "cut", a "scene", 30 seconds, or 3 minutes. The importance information generation unit 107 computes the information acquisition rate over such a user-friendly time unit and outputs the result together with the time unit as importance information. An example is data such as "cut number 14 runs from 3 min 23 s, field 15 to 3 min 29 s, field 27; its importance is 0.73". When time units such as cuts are recorded in advance together with the original information, the information on the start and end times can be omitted. The user may also be allowed to select the time unit for which importance information is assigned from among the above "cut", "scene", and a number of seconds.
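The importance record quoted above could be represented by a small data structure such as the one below. The class and field names are hypothetical, and the rate of 60 fields per second is an assumption; the text specifies only the content of the record, not its format.

```python
from dataclasses import dataclass

FIELDS_PER_SECOND = 60  # assumption: interlaced video at 60 fields per second


@dataclass(frozen=True)
class Timecode:
    minutes: int
    seconds: int
    field: int

    def to_fields(self) -> int:
        # Absolute position counted in fields from the start of the material.
        return (self.minutes * 60 + self.seconds) * FIELDS_PER_SECOND + self.field


@dataclass(frozen=True)
class ImportanceRecord:
    unit_kind: str      # e.g. "cut" or "scene"
    unit_number: int
    start: Timecode
    end: Timecode
    importance: float   # 0.0 to 1.0

    def duration_fields(self) -> int:
        return self.end.to_fields() - self.start.to_fields()


# The example from the text: cut 14, 3 min 23 s field 15 to 3 min 29 s field 27.
record = ImportanceRecord("cut", 14, Timecode(3, 23, 15), Timecode(3, 29, 27), 0.73)
```

When the original information already carries cut boundaries, the `start` and `end` fields could be dropped, as the text notes.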
[0021]
This time unit is determined by the information unit detection unit 109 and sent to the importance information generation unit 107. In the case where a break in information is not recorded in the original information, the information unit detection unit 109 detects a scene change or the like by detecting a moment such as a sudden change in color tone or luminance in the image information. The definition method of these information units may be any method. When the user particularly wants to correct the break of the information automatically detected by the information unit detection unit 109, the break of the information newly defined by the user may be set as the time unit for the importance information. Further, when such a time unit is recorded in the original information in advance, the information unit detection unit 109 may use the recorded time unit.
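A minimal sketch of detecting information units from abrupt luminance changes follows. The text leaves the detection method open, so the use of per-frame mean luminance, the threshold value, and the function name are all assumptions.

```python
def detect_unit_starts(frame_luma_means, threshold=30.0):
    """Return frame indices at which a new cut is assumed to start.

    frame_luma_means: mean luminance of each frame (for example 0 to 255).
    A new unit starts wherever the mean changes by more than threshold
    between consecutive frames.
    """
    if not frame_luma_means:
        return []
    starts = [0]
    for i in range(1, len(frame_luma_means)):
        if abs(frame_luma_means[i] - frame_luma_means[i - 1]) > threshold:
            starts.append(i)
    return starts


luma = [100, 102, 101, 180, 182, 60, 61]
unit_starts = detect_unit_starts(luma)
```

Comparing color histograms instead of a single mean would be more robust, and boundaries corrected by the user could simply replace the returned list.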
[0022]
The importance within the importance information for a time unit is calculated, for example, as follows. As described above, since it is known for each time whether the information acquisition rate is “active” or “inactive”, the ratio of the “active” time to the time unit is adopted as the importance (FIG. 14). For example, if the “active” time within a cut 10 seconds long totals 4 seconds, the importance is 0.40; if the “active” time within a scene 3 minutes long totals 2 minutes 6 seconds, the importance is 0.70; and if the “active” time within a movie 2 hours long totals 40 minutes, the importance is 0.33. The importance information is stored again in the temporary storage unit 108 and is recorded on the recording medium in association with the original information at an appropriate time. The appropriate time depends on the operating environment of the system. When the recording medium 116 and its peripheral mechanism are capable of recording at the same time as information reproduction, the above process may be performed simultaneously with information presentation, and the importance information may be recorded in parallel with reproduction and presentation. Alternatively, the information may be held in the temporary storage unit 108 until the information presentation is completed, and the importance information may be recorded all at once after the presentation is completed. In FIG. 1, the recording medium 116 is illustrated assuming an optical disk or the like, but the form of the recording medium 116 is not limited to this, and the importance information may be recorded on a medium different from that of the original information.
As the recording medium 116 for recording the original information and the importance information, an optical disk, a magneto-optical disk (MO), a magnetic tape such as a video tape, a magnetic disk such as a floppy (R) disk or a hard disk, an IC memory, or a remote computer connected via a communication cable may be used.
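The ratio described above can be sketched in Python as follows. The representation of “active” periods as non-overlapping (start, end) intervals in seconds, and the function name importance, are assumptions introduced for this example and do not appear in the specification.

```python
from typing import List, Tuple


def importance(active_intervals: List[Tuple[float, float]],
               unit_start: float, unit_end: float) -> float:
    """Ratio of "active" time to the length of the time unit.

    active_intervals is assumed to hold non-overlapping (start, end) pairs
    in seconds; parts lying outside the time unit are ignored.
    """
    unit_length = unit_end - unit_start
    if unit_length <= 0:
        raise ValueError("time unit must have positive length")
    active = 0.0
    for start, end in active_intervals:
        overlap = min(end, unit_end) - max(start, unit_start)
        if overlap > 0:
            active += overlap
    return active / unit_length


# A 10-second cut with 4 seconds of "active" time in total.
print(round(importance([(1, 3), (6, 8)], 0, 10), 2))
# A 3-minute scene with 2 minutes 6 seconds of "active" time.
print(round(importance([(0, 126)], 0, 180), 2))
# A 2-hour movie with 40 minutes of "active" time.
print(round(importance([(0, 40 * 60)], 0, 2 * 60 * 60), 2))
```

The three calls reproduce the figures given in the text (0.40, 0.70, and 0.33 after rounding to two decimal places).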
[0023]
The importance recorded in this manner is effective for comparison with other importance information for which the time unit is set in the same manner. For example, suppose that importance has been recorded by the above-described means for a video of a family trip shot with a home video camera, and that the playback apparatus finds a cut with an importance of 0.35 and a cut with an importance of 0.64. The system then presumes that the latter is more likely to be searched for later, and stores the audio and video information of the first few seconds of that cut in a memory element in advance. Since information stored in the memory element can be accessed at high speed, the user can be expected to reach the target image more smoothly. Alternatively, when there is an optical disk of a movie with an importance of 0.20 and an optical disk of a movie with an importance of 0.77, a disc changer (a reproducing device capable of storing a plurality of discs) may store the latter at a position from which it can be easily taken out, and a similar effect can be expected.
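One way to picture the comparison described above is the following Python sketch, which ranks cuts by recorded importance and selects those whose opening seconds would be held in the memory element. The cut names, the function name cuts_to_prefetch, and the notion of a fixed number of cache slots are assumptions for illustration.

```python
from typing import Dict, List


def cuts_to_prefetch(importance_by_cut: Dict[str, float], slots: int) -> List[str]:
    """Return the cuts with the highest importance, limited to the available slots."""
    ranked = sorted(importance_by_cut,
                    key=lambda cut: importance_by_cut[cut],
                    reverse=True)
    return ranked[:max(slots, 0)]


# Hypothetical cut names paired with the importance values used in the text.
cuts = {"cut_a": 0.35, "cut_b": 0.64}
print(cuts_to_prefetch(cuts, 1))
```

With one slot available, the cut with importance 0.64 is selected in preference to the cut with importance 0.35.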
[0024]
Next, FIG. 7 shows an experimental result comparing the importance given by the above-described method with the importance actually perceived by the user. In the experiment, a series of moving images composed of five scenes was shown to each subject, and the subject was asked to rank the scenes in order of “interesting / important”. FIG. 7 is a graph showing what importance the system, using the method of the present invention, gave to the scenes ranked first, second, third, and so on. The images were a movie, a commercial, a landscape, a music video, and soccer, each shown continuously for 30 seconds and accompanied by a solid screen of 15 seconds. If the method of the invention is valid, the graph should show a monotonically decreasing trend. FIG. 7 shows that, for the subject KNT, the order of importance declared by the subject matches the order of importance assigned by the present system. For the subject NZS, the order is consistent except for the third-ranked image. Further, even for data such as that of the subject KTS, the first place, the second place, and the third to fifth places can be distinguished from one another.
[0025]
In an image generated by CG (computer graphics) or the like, it may be known which position in the screen belongs to which object. In such a case, the accuracy of the above method may be improved by using the relationship between the viewpoint position and the object. This method is described below.
[0026]
Suppose there is an image as shown in FIG. 8A, and that the object areas are known, such as the person 801 and the paper 802 in FIG. 8B, with their respective boundaries defined. If the viewpoints are as marked “O” or “X” in FIG. 8B, the viewpoints marked “X” lie in a place that belongs to neither of the two objects. In the importance estimation procedure described so far, the jumping movements seen in FIG. 8B, namely movement within a defined object (803), movement between defined objects (804), movement to the non-object area (805), and movement within the non-object area (806), were all treated in exactly the same way. For this reason, if there are many jumping movements, the state is regarded as “active” and a high importance may be given even though the movements concern the non-object area. If the object areas are defined in advance, a movement may be regarded as a jumping movement (movement between objects) only when its end point is a defined object, and the subsequent information acquisition rate, visual activity level, and importance may be calculated on that basis. In this case, the object definition information recorded on the recording medium 116 is read by the read control unit 114 and passed to the eye movement type determination unit 104 as object information. Further, even when the object areas are not defined in advance, a system having some image analysis capability may itself analyze the image, define the object areas, and use them. Any method may be used for this analysis as long as objects can be cut out from the image.
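The end-point test described above can be sketched in Python as follows. Representing each object area as an axis-aligned rectangle is a simplifying assumption (the specification only requires that the boundaries be defined), and the coordinates and names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), an assumed simplification


def inside(point: Point, rect: Rect) -> bool:
    """True when the point lies within the rectangle, boundaries included."""
    x, y = point
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def counts_as_jump(end_point: Point, objects: List[Rect]) -> bool:
    """A movement counts as a jumping movement only if it ends on a defined object."""
    return any(inside(end_point, rect) for rect in objects)


# Hypothetical stand-ins for the person 801 and the paper 802.
objects = [(10, 10, 50, 90), (60, 40, 95, 80)]
# End points of four hypothetical movements; the last two end outside both objects.
jump_ends = [(30, 50), (70, 60), (55, 5), (98, 95)]
print([counts_as_jump(p, objects) for p in jump_ends])
```

Only the first two movements would then contribute to the subsequent information acquisition rate, visual activity level, and importance.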
[0027]
On the other hand, when the object information further includes information on movement, “active” or “inactive” may be determined based on the similarity between the movement of the object and the movement of the viewpoint. Suppose there is a moving image as shown in FIG. 9A in which the soccer ball 901 is flying toward the player 902. If the motion vector of the soccer ball is 903, a jumping motion 904 may occur as the eye follows the ball. When the inner product of the vector 903 and the vector 904 is calculated, a large value is obtained when the two vectors are “similar”. When the viewpoint instead moves like the vector 905, the movement is unrelated to the movement 903 of the object, and the inner product of the vector 905 and the vector 903 is a small (or large negative) value. If motion is defined in addition to area information for all objects in the screen, and no object near the viewpoint has a large inner product with the viewpoint vector, the viewpoint movement is regarded as unrelated to any object. In such a case, as in the case described above, it is not counted as a jumping motion for calculating the importance.
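The inner-product comparison can be illustrated with the following Python sketch. The vector values standing in for 903, 904, and 905, the threshold of 20.0, and the function names are all hypothetical; the specification does not give numerical values for this comparison.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def dot(a: Vec, b: Vec) -> float:
    """Inner product of two 2-D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def follows_some_object(gaze_move: Vec, nearby_object_moves: List[Vec],
                        threshold: float) -> bool:
    """True when at least one nearby object has a large inner product with the gaze movement."""
    return any(dot(gaze_move, move) > threshold for move in nearby_object_moves)


ball = (8.0, -2.0)        # hypothetical stand-in for vector 903
following = (7.0, -1.0)   # hypothetical stand-in for vector 904
unrelated = (-3.0, 6.0)   # hypothetical stand-in for vector 905

print(dot(ball, following), dot(ball, unrelated))
print(follows_some_object(following, [ball], threshold=20.0))
print(follows_some_object(unrelated, [ball], threshold=20.0))
```

With these values the following movement yields a large positive inner product and the unrelated movement yields a negative one, so only the former would be counted as a jumping motion.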
Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 10 is a block diagram showing the configuration of the information processing apparatus according to one embodiment of the present invention. FIG. 15 is a flowchart illustrating the following processing flow.
[0028]
The process up to the presentation of the image recorded on the recording medium 1013 by the image information presentation unit 1009 is the same as that described at the beginning of the first embodiment, and a description thereof is omitted. The line of sight of the user 1001 who is viewing the image is likewise detected by the line of sight detector 1002 and, as in the first embodiment, sent to the temporary storage unit 1006 as a position on the screen by the viewpoint calculation unit 1003 (S1501 in FIG. 15). Meanwhile, from the image information recorded on the recording medium 1013, the motion detection unit 1008 estimates the object area and detects the direction of its motion using luminance, color, and the like as clues (S1502). This motion detection means may be of any type, including currently used methods. The motion detection unit 1008 sends the area of the object at each time and its motion to the vector comparison unit 1004. The vector comparison unit 1004 compares this motion with the movement of the line of sight stored in the temporary storage unit 1006 (S1503). FIG. 11 schematically shows the operation performed by the vector comparison unit 1004. When the image changes over time as shown in a, b, and c of FIG. 11, the motion detection unit outputs an object area 1101 and a motion vector 1102 of the object (a and d, b and e, and c and f correspond to each other). Assuming that the state of the line of sight at those times is as shown by “○” in FIGS. 11D to 11F, the tendency 1103 of the line-of-sight movement can be obtained by taking the average of the past several data points at each of the moments d to f. When the inner product of the motion vector 1102 of the object and the motion vector 1103 of the line of sight is calculated, a large value is obtained when the two move in the same direction.
When the viewpoint is on or near the object and the inner product of the motion vectors of the viewpoint and the object is a large value (about 70 to 80% or more of the product of the lengths of the two vectors), the line of sight can be said to be following the object, and the object can therefore be said to have been of interest to the user. Thus, when the viewpoint is on or near an object, the importance of that object is based on the inner product of the viewpoint motion vector and the object motion vector (the inner product may also be divided by the lengths of both vectors, which gives the cosine of the angle formed by the two vectors). The vector comparison unit 1004 stores the result of the inner product in the temporary storage unit 1006. The importance information generation unit 1005 aggregates the accumulated inner products, together with the position and movement of the object area defined by the motion detection unit 1008, for each information unit delimited by the information unit detection unit 1007 (the breaks may also be defined by user operation, as described in the first embodiment), and returns the result calculated for each unit to the temporary storage unit 1006 in a form to be written back to the recording medium (S1504). This importance information is recorded on the recording medium 1013 via the write control unit 1012 at an appropriate time. The “appropriate time” has been described in the first embodiment. The fact that the original information and the importance information may be on different recording media has also been described in the first embodiment and is not repeated here. Further, when information on the object area and its movement is recorded in the original information in advance, the motion detection unit 1008 need not operate.
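The comparison performed by the vector comparison unit 1004 might be sketched in Python as follows. The window of three steps for the line-of-sight tendency, the default ratio of 0.7 (the lower end of the 70 to 80% range in the text), and all function names and sample coordinates are assumptions for illustration.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def gaze_trend(points: List[Vec], window: int = 3) -> Vec:
    """Average displacement per step over the last `window` steps of the viewpoint track."""
    recent = points[-(window + 1):]
    steps = len(recent) - 1
    if steps < 1:
        return (0.0, 0.0)
    # The mean of consecutive displacements equals (last - first) / steps.
    return ((recent[-1][0] - recent[0][0]) / steps,
            (recent[-1][1] - recent[0][1]) / steps)


def cosine(a: Vec, b: Vec) -> Optional[float]:
    """Inner product divided by both lengths; None when either vector is zero."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0 or norm_b == 0:
        return None
    return (a[0] * b[0] + a[1] * b[1]) / (norm_a * norm_b)


def is_following(gaze: Vec, obj: Vec, ratio: float = 0.7) -> bool:
    """True when the inner product is at least `ratio` times the product of the lengths."""
    c = cosine(gaze, obj)
    return c is not None and c >= ratio


# Hypothetical viewpoint track drifting to the right.
track = [(0.0, 0.0), (2.0, 0.0), (4.0, 1.0), (6.0, 1.0)]
trend = gaze_trend(track)
print(is_following(trend, (3.0, 0.0)))  # object moving right
print(is_following(trend, (0.0, 3.0)))  # object moving in the perpendicular direction
```

Dividing by the vector lengths makes the test independent of speed, so only the direction of the two movements is compared; whether the raw or the normalized inner product is more suitable is left open by the text.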
[0029]
When the importance of an object is recorded based on its movement as described above, the following use can be expected. For example, in FIG. 11, suppose that an object other than the person 1104 exists and that its importance is lower than that of the person 1104. The person 1104 is then determined to be the main object in this scene (or cut), and an image in which the main object is at the center, that is, FIG. 11B, is selected as a key image for scene search.
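The key image selection described above could be sketched in Python as follows. The object names, importance values, frame labels, coordinates, and the use of Euclidean distance to the screen center are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def main_object(importance_by_object: Dict[str, float]) -> str:
    """The object with the highest recorded importance."""
    return max(importance_by_object, key=lambda name: importance_by_object[name])


def pick_key_frame(center_by_frame: Dict[str, Point], screen_center: Point) -> str:
    """The frame in which the main object's center is closest to the screen center."""
    def distance(frame: str) -> float:
        x, y = center_by_frame[frame]
        return math.hypot(x - screen_center[0], y - screen_center[1])
    return min(center_by_frame, key=distance)


# Hypothetical importance values for two objects in the scene.
objects = {"person_1104": 0.8, "other_object": 0.3}
# Hypothetical centers of the main object in frames a, b, and c of FIG. 11.
centers = {"a": (20.0, 50.0), "b": (52.0, 48.0), "c": (85.0, 50.0)}

print(main_object(objects))
print(pick_key_frame(centers, (50.0, 50.0)))
```

With these hypothetical values the person is chosen as the main object and frame b, where it is nearest the center, is chosen as the key image.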
[0030]
[Effects of the Invention]
According to the present invention, additional information on importance can be generated for multimedia information, and by using this importance information the user can be expected to reach the information he or she needs easily and quickly. Further, by using the line of sight, the importance information can be input without the user being particularly conscious of the operation. This provides an information retrieval and access environment that is easy to use, even for users unfamiliar with the operations, in an environment where a large amount of multimedia information is available to anyone.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating classification based on the speed of eye movement according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a movement of an actual viewpoint of a person according to an embodiment of the present invention.
FIG. 4 is a diagram showing a calculation progress of the information processing apparatus according to one embodiment of the present invention.
FIG. 5 is a diagram showing an example of actual eye movement of a person according to an embodiment of the present invention.
FIG. 6 is a graph showing a processing method of the information processing apparatus according to one embodiment of the present invention.
FIG. 7 is a graph showing the results of an experiment performed by the method shown in one example of the present invention.
FIG. 8 is a conceptual diagram showing movement of a human viewpoint according to an embodiment of the present invention.
FIG. 9 is a conceptual diagram showing movement of a human viewpoint according to an embodiment of the present invention.
FIG. 10 is a block diagram illustrating a configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 11 is a conceptual diagram showing a processing method of the information processing apparatus according to one embodiment of the present invention.
FIG. 12 is a flowchart illustrating a processing method of the information processing apparatus according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating a processing method of the information processing apparatus according to an embodiment of the present invention.
FIG. 14 is a flowchart illustrating a processing method of the information processing apparatus according to an embodiment of the present invention.
FIG. 15 is a flowchart illustrating a processing method of the information processing apparatus according to an embodiment of the present invention.
[Explanation of symbols]
303: Viewpoint when gazing
304: Movement of the viewpoint when the eyeball jumps
501: Changes in viewpoint coordinates when the eyeball is jumping
502: Data disorder when blinking
504: Time during which the resolution of the visual system is reduced
601: Time during which the eye movements are active
603: Time when the second order difference of the information acquisition rate becomes 0
604: Minimum point of information acquisition rate
803: Jumping motion in the object
804: Jumping motion between objects
805: Jumping movement to the non-object area
806: Jumping movement within the non-object area
903: Ball motion vector
904: Motion vector of eye movement following the ball
905: Motion vector of eye movement unrelated to ball movement
1102: Object motion vector
1103: Motion vector of viewpoint

Claims

An information processing apparatus comprising:
means for presenting image information to a user;
viewpoint detection means for sequentially detecting the viewpoint position, on the image information, of the user viewing the image information;
viewpoint motion detecting means for obtaining a motion vector of the viewpoint from the detected viewpoint positions;
object motion detecting means for detecting a motion vector of an object in the image information;
means for calculating an inner product of the motion vector of the object and the motion vector of the viewpoint; and
importance estimation means for estimating the importance of the object in the image based on the obtained inner product.