JP2016194877A

JP2016194877A - Explanation support device, explanation support method, and explanation support program

Info

Publication number: JP2016194877A
Application number: JP2015075475A
Authority: JP
Inventors: Kentaro Murase (村瀬 健太郎); Jun Takahashi (高橋 潤); Masakiyo Tanaka (田中 正清)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-04-01
Filing date: 2015-04-01
Publication date: 2016-11-17
Anticipated expiration: 2035-04-01
Also published as: JP6471589B2

Abstract

[Problem] To suppress a decrease in the accuracy of estimating the explanation location. [Solution] An explanation support apparatus 10 performs line-of-sight detection on a display unit 5, performs speech recognition, and determines, among the pages of a document displayed on the display unit 5, the explanation location corresponding to the speech recognition result. The apparatus then determines which of three explanation states applies: a first state in which the explanation location corresponding to the speech recognition matches the explanation location corresponding to the line-of-sight detection, a second state in which the two explanation locations do not match, or a third state in which the line-of-sight position is not detected within the screen of the display unit 5. Based on the determined explanation state, the apparatus estimates the explanation location to be highlighted. [Selected drawing] FIG. 1

Description

The present invention relates to an explanation support apparatus, an explanation support method, and an explanation support program.

Pointing devices such as laser pointers and mouse cursors have long been used for explanations in remote conferences and presentations. Having the explainer, such as a presenter, operate such a pointing device diverts effort from the explanation itself. For this reason, techniques have also been proposed that use speech recognition to show the presenter and the audience the location the presenter is currently explaining.

JP 2005-338173 A; JP 2004-246398 A; JP 2004-7358 A

However, to make the presentation of the explanation location more responsive, the speech recognition result may have to be associated with an explanation location even when a sufficient number of words has not yet been recognized, and speech recognition is inherently limited in accuracy. Consequently, when the speech recognition result contains errors, the explanation location may not be estimated correctly.

To suppress this decrease in the estimation accuracy of the explanation location, speech recognition and line-of-sight detection could be used together. For example, the explanation location could be highlighted when the explanation location estimated from the speech recognition result matches the explanation location estimated from the line-of-sight detection result.

However, even when speech recognition and line-of-sight detection are used together, the decrease in estimation accuracy cannot always be suppressed, because the explainer does not necessarily gaze at the location being explained while reading it aloud.

For example, the explainer may look ahead to check the location to be explained next while still reading aloud and explaining an earlier location. In that case, when the explanation location estimated from the speech recognition result and the one estimated from the line-of-sight detection result change from matching to not matching, the highlighting of the explanation location is mistakenly removed even though the explanation is still in progress.

In addition, while the two estimated explanation locations do not match, a word that also appears in the current explanation location may happen to lie where the gaze has moved ahead. In that case, the location under the advanced gaze may be mistakenly highlighted as the explanation location even though the explanation of the current location is still in progress.

In one aspect, an object of the present invention is to provide an explanation support apparatus, an explanation support method, and an explanation support program that can suppress a decrease in the estimation accuracy of the explanation location.

An explanation support apparatus according to one aspect includes: a line-of-sight detection unit that performs line-of-sight detection on a predetermined display unit; a voice recognition unit that performs speech recognition; a recognition result determination unit that determines, among the pages of a document displayed on the display unit, the explanation location corresponding to the speech recognition result; an explanation state determination unit that determines which explanation state applies among a first state in which the explanation location corresponding to the speech recognition matches the explanation location corresponding to the line-of-sight detection, a second state in which those explanation locations do not match, and a third state in which the line-of-sight position obtained by the line-of-sight detection is not detected within the screen of the display unit; and an estimation unit that estimates, based on the determined explanation state, the explanation location to be highlighted.

A decrease in the estimation accuracy of the explanation location can be suppressed.

FIG. 1 is a diagram illustrating the functional configuration of the explanation support apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an example of the explanation states.
FIG. 3 is a diagram illustrating an example of the correspondence between the continuity of the explanation state and the explanation location.
FIG. 4 is a flowchart illustrating the procedure of the explanation support process according to the first embodiment.
FIG. 5 is a diagram illustrating the functional configuration of the explanation support apparatus according to the second embodiment.
FIG. 6 is a diagram illustrating an example of the movement of stop points.
FIG. 7 is a diagram illustrating an example of the correspondence between the explanation state, the reading-aloud state, and the explanation location.
FIG. 8 is a flowchart illustrating the procedure of the explanation support process according to the second embodiment.
FIG. 9 is a diagram illustrating an example of the hardware configuration of a computer that executes the explanation support program according to the first and second embodiments.

An explanation support apparatus, an explanation support method, and an explanation support program according to the present application are described below with reference to the accompanying drawings. These embodiments do not limit the disclosed technology, and the embodiments can be combined as appropriate as long as their processing contents do not contradict each other.

FIG. 1 is a diagram illustrating the functional configuration of the explanation support apparatus according to the first embodiment. The explanation support apparatus 10 shown in FIG. 1 provides an explanation support service that combines speech recognition and line-of-sight detection to highlight the explanation location within the page currently displayed on the display unit 5, among the pages included in a document.

As shown in FIG. 1, a camera 1, a microphone 3, and a display unit 5 are connected to the explanation support apparatus 10. The following describes, as an example, a case in which a presenter and the audience share a document by viewing the display unit 5, on which slides created with presentation software are displayed. Needless to say, the technique also applies when a document is shared by having the participants' computers display the same document, for example through a teleconference system.

The camera 1 is an imaging apparatus equipped with an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor.

For example, the camera 1 is placed, together with the screen of the display unit 5, at a position directly facing the presenter. As one example, the camera 1 can be used to detect the presenter's line of sight by placing it at a position calibrated relative to the screen of the display unit 5 and the presenter, so that the presenter's face falls within the angle of view of the camera 1. With this arrangement, the camera 1 controls a light source (not shown) placed near the camera 1 to irradiate the subject with infrared light, receives the light reflected from the subject, and outputs the resulting image of the subject, converted into a digital signal, to the line-of-sight detection unit 11. When the subject includes the presenter's eyeball, the camera 1 captures the light reflected from the retina, so an image in which the pupil appears brighter than the rest of the eyeball can be output to the line-of-sight detection unit 11.

The microphone 3, sometimes called a mic, is a device that converts sound into an electrical signal. For example, the microphone 3 can be worn by the presenter giving the presentation. In this case, a headset or tie-pin microphone can be attached at a predetermined position on the presenter's body or clothing, or the presenter can carry a handheld microphone. The microphone 3 can also be installed at a predetermined position within the range where the presenter's speech can be picked up. In this case, a mounted or stationary microphone can be used. In any of these cases, a microphone with any type of directivity can be used, but the sensitivity of the microphone can also be restricted to the direction of the presenter's speech in order to suppress the pickup of sound other than the presenter's speech, such as speech by the audience and ambient noise. Any conversion method, such as dynamic, electret condenser, or condenser, can be used for the microphone 3.

The analog signal obtained when the microphone 3 picks up sound is converted into a digital signal and then input to the explanation support apparatus 10.

The display unit 5 is a device that displays various types of information. For example, the display unit 5 may be a liquid crystal display or an organic EL (electroluminescence) display, which produce the display by emitting light, or a projector, which produces the display by projection.

For example, the display unit 5 displays a presentation screen in accordance with instructions from the explanation support apparatus 10. Specifically, the display unit 5 displays the slides of a document opened by presentation software running on the explanation support apparatus 10. In this case, the display unit 5 can display any slide in the document that the presenter specifies via an input device (not shown), for example a pointing device such as a laser pointer or mouse cursor. When the slide show function of the presentation software is set to ON, the display unit 5 can also switch through and display the slides in the document file in page order.

As shown in FIG. 1, the explanation support apparatus 10 includes a line-of-sight detection unit 11, a line-of-sight determination unit 12, a voice recognition unit 13, a document acquisition unit 14, a recognition result determination unit 15, an explanation state determination unit 16, a history storage unit 16a, an explanation location estimation unit 17, and a highlight display control unit 18. In addition to the functional units shown in FIG. 1, the explanation support apparatus 10 may include various functional units found in known computers, such as input/output devices and audio output devices.

The line-of-sight detection unit 11 is a processing unit that performs line-of-sight detection.

As one embodiment, the line-of-sight detection unit 11 applies an algorithm such as the corneal reflection method to the image of the subject output from the camera 1 and detects the position of the viewpoint indicated by the gaze direction from the center of the pupil, the so-called gaze point. Methods other than the corneal reflection method can also be used to detect the line-of-sight position. For example, the screen of the display unit 5 can be divided into regions, the shapes of an eye looking at each region learned in advance, and line-of-sight detection performed by template matching against the eye shape detected in the image of the subject input from the camera 1. Alternatively, the user may wear a headset that detects the line-of-sight position, and the position detected by the headset may be acquired.

As a result of this line-of-sight detection, the coordinate data of the gaze point, which may lie inside or outside the screen of the display unit 5, is output from the line-of-sight detection unit 11 to the stop point detection unit 12a.

The line-of-sight determination unit 12 is a processing unit that determines the state of the line of sight using the result of the line-of-sight detection by the line-of-sight detection unit 11. As shown in FIG. 1, the line-of-sight determination unit 12 includes a stop point detection unit 12a and an in-screen determination unit 12b.

The stop point detection unit 12a is a processing unit that detects a stop point from the result of the line-of-sight detection by the line-of-sight detection unit 11.

As one embodiment, the stop point detection unit 12a monitors whether the gaze point detected by the line-of-sight detection unit 11 stays within a predetermined range for a predetermined period, for example 30 msec to 300 msec. When the gaze point stays within the predetermined range for the predetermined period, the stop point detection unit 12a applies predetermined statistical processing, for example averaging, to the coordinate data of the gaze points detected over that period and thereby calculates a representative value of the gaze point coordinates. The representative value obtained in this way is output to the in-screen determination unit 12b as a stop point.
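
The stop point detection described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `GazeSample` type, the function name `detect_stop_point`, the use of a square bounding box as the "predetermined range", and the default values (100 msec, 40 pixels) are all assumptions made for the example. Only the idea of averaging the gaze points that stay within a range for a period in the 30 msec to 300 msec band comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GazeSample:
    """One gaze point; the field names are hypothetical."""
    t_ms: float  # timestamp in milliseconds
    x: float     # horizontal screen coordinate
    y: float     # vertical screen coordinate


def detect_stop_point(
    samples: List[GazeSample],
    min_duration_ms: float = 100.0,  # assumed value inside the 30-300 msec band
    range_px: float = 40.0,          # assumed size of the "predetermined range"
) -> Optional[Tuple[float, float]]:
    """Return the averaged gaze position if the most recent samples stayed
    inside a range_px x range_px box for at least min_duration_ms."""
    run: List[GazeSample] = []
    # Walk backwards from the newest sample, extending the run while the
    # bounding box of the collected samples stays within the allowed range.
    for sample in reversed(samples):
        candidate = run + [sample]
        xs = [p.x for p in candidate]
        ys = [p.y for p in candidate]
        if max(xs) - min(xs) > range_px or max(ys) - min(ys) > range_px:
            break
        run = candidate
    # run[0] is the newest sample and run[-1] the oldest one in the run.
    if not run or run[0].t_ms - run[-1].t_ms < min_duration_ms:
        return None
    n = len(run)
    # Averaging is the example of statistical processing named in the text.
    return (sum(p.x for p in run) / n, sum(p.y for p in run) / n)
```

Walking backwards from the newest sample keeps the check tied to the current gaze behavior, so an old fixation does not produce a stop point after the gaze has started moving again.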

The in-screen determination unit 12b is a processing unit that determines whether a gaze point or a stop point lies inside the screen of the display unit 5.

As one embodiment, when the stop point detection unit 12a detects a stop point, the in-screen determination unit 12b determines whether the stop point lies within the coordinate range corresponding to the inside of the screen of the display unit 5. The in-screen determination unit 12b then outputs to the explanation state determination unit 16 the coordinate data of the stop point together with a determination result indicating whether the stop point lies inside or outside the screen of the display unit 5.

The voice recognition unit 13 is a processing unit that performs speech recognition.

As one embodiment, the voice recognition unit 13 starts when a presentation start instruction is received while the presentation software has a document open, and waits until an audio signal of a predetermined length is input from the microphone 3, for example an audio signal of at least one frame, such as 10 msec. Each time an audio signal of the predetermined length is input from the microphone 3, the voice recognition unit 13 refers to a recognition dictionary (not shown) and performs speech recognition, such as word spotting, on the signal. The voice recognition unit 13 then outputs the words obtained as the recognition result, together with information such as the time at which each word was recognized, to the associating unit 15b.

The document acquisition unit 14 is a processing unit that acquires the document used for the presentation.

As one embodiment, when a presentation start instruction is received, the document acquisition unit 14 acquires the document that an application program such as presentation software displays on the display unit 5. For example, the document acquisition unit 14 can acquire a document loaded into memory by any application program, including presentation software, word processing software, spreadsheet software, and image editing software.

The recognition result determination unit 15 is a processing unit that determines, within the slide currently displayed on the display unit 5, the explanation location corresponding to the speech recognition result. As shown in FIG. 1, the recognition result determination unit 15 includes an explanation unit extraction unit 15a and an associating unit 15b.

The explanation unit extraction unit 15a is a processing unit that extracts, as explanation units, the sections obtained by dividing the slides included in the document.

As one embodiment, the explanation unit extraction unit 15a divides each slide in the document acquired by the document acquisition unit 14 into units such as paragraphs, lines, or sentences, using information such as the indentation contained in the slide. For example, the explanation unit extraction unit 15a scans the character strings in the slide, detects delimiters corresponding to a space, a period, or a line break, and sets each delimiter as a boundary. It then splits the character strings at these boundaries, so that the slide is divided into multiple sections, one per delimiter. After the slide has been divided in this way, the explanation unit extraction unit 15a outputs to the associating unit 15b, as an explanation unit for each section, the coordinate range that the section occupies on the slide (for example, the coordinates of its upper-left and lower-right vertices) together with the words contained in the section. Although automatic division is described here, the slide may instead be divided manually by having the boundaries of the sections specified via an input device or the like.
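
As a rough illustration of the data produced by this step, the sketch below turns already laid-out slide lines into explanation units. It assumes that the line break is the delimiter in use and that each line arrives with its bounding box from some layout source; the `ExplanationUnit` type, the `extract_units` function, and the lowercase word tokenization are hypothetical choices for the example and are not taken from the source.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom): upper-left and lower-right vertices of a section.
Box = Tuple[float, float, float, float]


@dataclass
class ExplanationUnit:
    """A section of a slide: its coordinate range and the words it contains."""
    box: Box
    words: List[str]


def extract_units(lines: List[Tuple[str, Box]]) -> List[ExplanationUnit]:
    """Build one explanation unit per non-empty line of slide text.

    Assumption: the caller has already split the slide at line breaks and
    knows each line's bounding box on the slide.
    """
    units: List[ExplanationUnit] = []
    for text, box in lines:
        # Split the line into words on whitespace and punctuation.
        words = [w.lower() for w in re.findall(r"\w+", text)]
        if words:  # skip blank lines, which carry no words to match
            units.append(ExplanationUnit(box=box, words=words))
    return units
```

Keeping the coordinate range and the word list in one record lets later steps compare recognized words against the words and compare the gaze position against the box.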

The associating unit 15b is a processing unit that associates the speech recognition result with an explanation location in the document.

As one embodiment, the associating unit 15b compares the words recognized by the voice recognition unit 13 with the words contained in those explanation units, among the units extracted by the explanation unit extraction unit 15a, that belong to the slide currently displayed on the display unit 5. The associating unit 15b then outputs to the explanation state determination unit 16, as the explanation location corresponding to the speech recognition, the explanation unit that contains the largest number of words matching the recognized words.
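
The word-count association can be sketched as below. The function name `associate`, the dictionary layout, counting distinct words, breaking ties in favor of the first unit, and returning `None` when nothing matches are all assumptions for the example; the text only states that the unit with the most matching words is selected.

```python
from typing import Dict, List, Optional


def associate(recognized: List[str], units: Dict[str, List[str]]) -> Optional[str]:
    """Return the id of the explanation unit sharing the most words with the
    recognized words, or None when no unit shares any word.

    `units` maps a hypothetical unit id to the words in that unit, for the
    slide currently on screen.
    """
    recognized_set = set(recognized)
    best_id: Optional[str] = None
    best_count = 0
    for unit_id, words in units.items():
        # Count distinct recognized words that also occur in this unit.
        count = len(recognized_set & set(words))
        if count > best_count:  # strict '>' keeps the first unit on ties
            best_id, best_count = unit_id, count
    return best_id
```

For instance, with units `{"u1": ["sales", "grew", "q3"], "u2": ["costs", "fell", "q3"]}`, the recognized words `["costs", "q3"]` select `"u2"` because two words match there and only one matches in `"u1"`.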

The explanation state determination unit 16 is a processing unit that determines, based on the determination result of the line-of-sight determination unit 12 and the determination result of the recognition result determination unit 15, which state the explanation state falls under: a matching state in which the explanation location corresponding to the speech recognition matches the explanation location corresponding to the line-of-sight detection, a mismatching state in which these explanation locations do not match, or a non-viewing state in which the line of sight is not detected within the screen.

As one embodiment, when the in-screen determination unit 12b determines that the stop point lies outside the screen of the display unit 5, the explanation state determination unit 16 determines that the explanation state is the "non-viewing state". When the in-screen determination unit 12b determines that the stop point lies inside the screen of the display unit 5, the explanation state determination unit 16 determines whether the coordinate position of the stop point detected by the stop point detection unit 12a lies within the coordinate range of the explanation unit that the associating unit 15b associated with the speech recognition result. Here, the explanation state determination unit 16 can absorb line-of-sight error by setting an allowable coordinate range that is larger than the explanation unit and contains its coordinate range, and determining whether the stop point lies within that allowable range. The allowable coordinate range may be set by extending the coordinate range of the explanation unit in both the width and height directions, or in only one of them. The explanation state determination unit 16 determines that the explanation state is the "matching state" when the stop point lies within the allowable coordinate range, and the "mismatching state" when it lies outside. The explanation state determination unit 16 then outputs the resulting explanation state to the history storage unit 16a and the explanation location estimation unit 17. As a result, the history storage unit 16a stores a history of explanation states, adding an entry each time the explanation state is determined.
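
The three-way decision can be sketched as follows. The enum and function names, the box representation, and the default margins of 20 pixels are hypothetical; the structure (screen check first, then a test against the explanation unit's range extended by a margin) follows the description above.

```python
from enum import Enum
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


class ExplanationState(Enum):
    MATCHING = "matching"
    MISMATCHING = "mismatching"
    NON_VIEWING = "non_viewing"


def contains(box: Box, point: Point) -> bool:
    """True if the point lies inside the box, edges included."""
    left, top, right, bottom = box
    x, y = point
    return left <= x <= right and top <= y <= bottom


def determine_state(
    stop_point: Point,
    screen: Box,
    speech_unit_box: Box,
    margin_x: float = 20.0,  # assumed widening of the unit, in pixels
    margin_y: float = 20.0,  # assumed heightening of the unit, in pixels
) -> ExplanationState:
    """Classify the explanation state from one stop point."""
    # A stop point outside the screen means the presenter is not viewing it.
    if not contains(screen, stop_point):
        return ExplanationState.NON_VIEWING
    # Extend the unit's range to absorb line-of-sight measurement error.
    left, top, right, bottom = speech_unit_box
    allowed = (left - margin_x, top - margin_y, right + margin_x, bottom + margin_y)
    if contains(allowed, stop_point):
        return ExplanationState.MATCHING
    return ExplanationState.MISMATCHING
```

Setting `margin_x` or `margin_y` to zero corresponds to extending the range in only one direction, which the description also allows.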

The explanation location estimation unit 17 is a processing unit that estimates the explanation location based on the explanation state determined by the explanation state determination unit 16.

FIG. 2 is a diagram illustrating an example of the explanation states. As shown in FIG. 2, the explanation state transitions among the "matching state", the "mismatching state", and the "non-viewing state". These three states are states that a computer can estimate from the speech recognition result and the line-of-sight detection result, and they correspond to the real situation as follows.

That is, the "matching state" can be presumed to correspond to a synchronous state, in which the presenter follows the current explanation location with the eyes while explaining it aloud. The "mismatching state" can be presumed to correspond to an asynchronous state, in which the voice is explaining the current explanation location while the gaze has left it to check the next explanation location. The "non-viewing state" corresponds to a state in which the presenter explains while facing the audience without looking at the material. However, the "matching state" also includes cases that are temporarily estimated to be synchronous even though the actual state is asynchronous, namely when the location under the gaze happens to contain the same words as the current explanation location. Likewise, the "mismatching state" includes cases that are temporarily estimated to be asynchronous even though the actual state is synchronous, namely when a speech recognition error occurs.

Thus, the "matching state" corresponds in principle to the synchronous state but exceptionally includes some asynchronous states, while the "mismatching state" corresponds in principle to the asynchronous state but exceptionally includes some synchronous states.

Because each of these exceptions appears only as a temporary state, the explanation location estimation unit 17 estimates the explanation location, as shown in FIG. 3, from both the explanation state determined by the explanation state determination unit 16 and the past history of explanation states stored in the history storage unit 16a. This excludes the exceptions, so that the synchronous state can be accurately determined from the matching state and the asynchronous state from the mismatching state.

FIG. 3 is a diagram illustrating an example of the correspondence between the continuity of the explanation state and the explanation location. As shown in FIG. 3, when the explanation state determination unit 16 determines the "match state", the explanation location estimation unit 17 determines whether the match state has continued for a predetermined time or longer. If it has, the explanation location estimation unit 17 estimates, as the explanation location, the explanation unit on which the speech recognition result and the line-of-sight detection result agree. If it has not, the explanation location estimation unit 17 carries over the current explanation unit, that is, the explanation unit that was estimated as the explanation location at the previous time step.

When the explanation state determination unit 16 determines the "mismatch state", the explanation location estimation unit 17 determines whether the mismatch state has continued for a predetermined time or longer. If it has, the explanation location estimation unit 17 estimates the explanation unit corresponding to the speech recognition as the explanation location. If it has not, the explanation location estimation unit 17 carries over the current explanation unit, that is, the explanation unit that was estimated as the explanation location at the previous time step.

When the explanation state determination unit 16 determines the "non-viewing state", the explanation location estimation unit 17 estimates the explanation unit corresponding to the speech recognition as the explanation location. This is because the non-viewing state, unlike the match state and the mismatch state, can be regarded as having no exceptions: regardless of continuity, as soon as the non-viewing state is determined, the presenter can be presumed to be explaining toward the audience without looking at the material.

Although continuity is determined here by checking whether the same explanation state has continued for a predetermined time or longer, it may instead be determined by checking whether the same explanation state has continued for a predetermined number of determinations.
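The continuity rule of FIG. 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `LocationEstimator`, the state constants, and the threshold `min_run` (counted in consecutive determinations, the count-based variant mentioned above) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for the three explanation states.
MATCH, MISMATCH, NOT_VIEWING = "match", "mismatch", "not_viewing"


@dataclass
class LocationEstimator:
    min_run: int = 3  # assumed threshold: consecutive identical determinations
    history: List[str] = field(default_factory=list)  # stands in for unit 16a
    current: Optional[int] = None  # index of the unit estimated last time

    def update(self, state: str, speech_unit: Optional[int]) -> Optional[int]:
        """Record the new state and return the estimated explanation unit."""
        self.history.append(state)
        if state == NOT_VIEWING:
            # No exceptions are assumed for this state, so continuity is ignored.
            self.current = speech_unit
            return self.current
        # Length of the trailing run of the same state, including this one.
        run = 0
        for past in reversed(self.history):
            if past != state:
                break
            run += 1
        if run >= self.min_run:
            # Sustained match: speech and gaze agree, so the speech unit is the
            # agreed unit. Sustained mismatch: the speech unit is adopted.
            self.current = speech_unit
        # Otherwise the previous estimate is carried over unchanged.
        return self.current
```

With `min_run=3`, a single mismatch determination between two runs of match determinations leaves the estimate untouched, which is the behavior the text describes for temporary exceptions.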

The highlight display control unit 18 is a processing unit that highlights the explanation location estimated by the explanation location estimation unit 17. "Highlighting" here is not limited to highlighting in the narrow sense, that is, display control that brightens or inverts the background color, but means highlighting in a broad sense. For example, any form of emphasis may be applied, such as drawing a frame around the explanation location, emphasizing its fill, or emphasizing its font (font size, underline, or italics).

The functional units described above, such as the line-of-sight detection unit 11, the line-of-sight determination unit 12, the speech recognition unit 13, the document acquisition unit 14, the recognition result determination unit 15, the explanation state determination unit 16, the explanation location estimation unit 17, and the highlight display control unit 18, can be implemented as follows. For example, they can be realized by having a central processing unit (CPU) or the like load into memory and execute processes that provide the same functions as these processing units. The processing units need not be executed by a CPU and may instead be executed by a micro processing unit (MPU). Each processing unit can also be realized by hard-wired logic such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).

As one example, various semiconductor memory elements, such as random access memory (RAM) or flash memory, can be used for the history storage unit 16a. The history storage unit 16a need not be a main storage device and may be an auxiliary storage device. In that case, a hard disk drive (HDD), an optical disc, a solid state drive (SSD), or the like can be used.

FIG. 4 is a flowchart illustrating the procedure of the explanation support process according to the first embodiment. This process starts when the presentation software, with a document open, receives an instruction to start a presentation, and is executed repeatedly until an instruction to end the presentation is received.

As shown in FIG. 4, the line-of-sight detection unit 11 detects a gaze point by performing line-of-sight detection on the image output from the camera 1 (step S101). The fixation point detection unit 12a then detects a fixation point by monitoring whether the gaze point detected in step S101 stays within a predetermined range for a predetermined period, for example 30 msec to 300 msec (step S102).

The in-screen determination unit 12b then determines whether the fixation point detected in step S102 lies within the coordinate range corresponding to the inside of the screen of the display unit 5 (step S103).
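Steps S102 and S103 can be sketched as below. The source states only that a gaze point must stay within a predetermined range for a predetermined period; the dispersion measure (horizontal extent plus vertical extent), the default values, and the function names `detect_fixation` and `on_screen` are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, x, y) of one gaze point


def detect_fixation(
    samples: List[Sample],
    min_duration: float = 0.1,     # assumed; within the 30 to 300 msec range
    max_dispersion: float = 30.0,  # assumed spatial tolerance in pixels
) -> Optional[Tuple[float, float]]:
    """Return the centroid of the latest fixation, or None if there is none."""
    window: List[Sample] = []
    # Walk back from the newest sample while the gaze stays inside the tolerance.
    for t, x, y in reversed(samples):
        xs = [p[1] for p in window] + [x]
        ys = [p[2] for p in window] + [y]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            break
        window.append((t, x, y))
    # window[0] is the newest sample and window[-1] the oldest one kept.
    if not window or window[0][0] - window[-1][0] < min_duration:
        return None
    n = len(window)
    return (sum(p[1] for p in window) / n, sum(p[2] for p in window) / n)


def on_screen(point: Optional[Tuple[float, float]], width: int, height: int) -> bool:
    """Step S103: is the fixation point inside the screen coordinate range?"""
    return point is not None and 0 <= point[0] < width and 0 <= point[1] < height
```

A gaze that sweeps across the screen never accumulates enough dwell time, so `detect_fixation` returns `None` and the frame is treated as having no on-screen fixation.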

The speech recognition unit 13 refers to a recognition dictionary (not shown) and performs speech recognition, such as word spotting, on the audio signal input from the microphone 3 (step S104). The explanation unit extraction unit 15a then divides the slide currently displayed on the display unit 5 into units such as paragraphs, lines, or sentences, using for example the indent information contained in the slide, and extracts as explanation units the coordinate range of each resulting section and the words that section contains (step S105).

The association unit 15b then associates the speech recognition result of step S104 with the explanation units extracted in step S105 (step S106). That is, among the explanation units extracted in step S105, the association unit 15b outputs the explanation unit containing the largest number of words that match the words recognized in step S104 as the explanation location corresponding to the speech recognition.
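The association in step S106 amounts to picking the explanation unit with the largest word overlap. The sketch below assumes each explanation unit is already reduced to a set of words; the function name `associate` and the tie-breaking rule (the first unit wins) are assumptions, since the source does not say how ties are resolved.

```python
from typing import List, Optional, Set


def associate(recognized: Set[str], units: List[Set[str]]) -> Optional[int]:
    """Return the index of the unit sharing the most words with the speech."""
    best_index: Optional[int] = None
    best_count = 0
    for i, words in enumerate(units):
        count = len(recognized & words)  # number of matching words
        if count > best_count:           # strict comparison: first unit wins ties
            best_index, best_count = i, count
    return best_index  # None when no unit shares any recognized word
```

For example, with units `[{"gaze", "camera"}, {"speech", "microphone", "dictionary"}]` and recognized words `{"speech", "dictionary", "camera"}`, the second unit is selected because it shares two words against one.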

The explanation state determination unit 16 then determines, based on the line-of-sight determination results of steps S102 to S103 and the speech recognition association result of step S106, whether the explanation state is the "match state", the "mismatch state", or the "non-viewing state" (step S107).
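Step S107 can be illustrated with a small classifier. The sketch assumes that each explanation unit carries a rectangular coordinate range and that a fixation lying on no unit at all counts as a mismatch; neither the rectangle format nor that fallback is stated in the source, and the names `unit_at` and `determine_state` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) of a unit


def unit_at(point: Tuple[float, float], boxes: List[Box]) -> Optional[int]:
    """Index of the first explanation unit whose range contains the point."""
    for i, (left, top, right, bottom) in enumerate(boxes):
        if left <= point[0] <= right and top <= point[1] <= bottom:
            return i
    return None


def determine_state(
    fixation: Optional[Tuple[float, float]],
    screen_size: Tuple[int, int],
    boxes: List[Box],
    speech_unit: Optional[int],
) -> str:
    """Classify the explanation state as match, mismatch, or not_viewing."""
    width, height = screen_size
    if fixation is None or not (0 <= fixation[0] < width and 0 <= fixation[1] < height):
        return "not_viewing"  # no line of sight detected inside the screen
    gaze_unit = unit_at(fixation, boxes)
    if gaze_unit is not None and gaze_unit == speech_unit:
        return "match"
    return "mismatch"  # assumption: gaze on no unit is also treated as mismatch
```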

The explanation location estimation unit 17 then estimates the explanation location according to whether the explanation state has continuity, judging from the explanation state determined in step S107 and the history of explanation states stored in the history storage unit 16a (step S108). The highlight display control unit 18 then highlights the explanation location estimated in step S108 (step S109), and the process ends.

[One aspect of the effects]
As described above, the explanation support apparatus 10 according to the present embodiment highlights an explanation location based on the voice, the line of sight, or the previous estimation result, depending on which explanation state applies: the match state in which the explanation locations from speech recognition and line-of-sight detection agree, the mismatch state in which they do not agree, or the non-viewing state in which the line of sight is not within the screen. The explanation support apparatus 10 according to the present embodiment can therefore suppress a decrease in the estimation accuracy of the explanation location.

The first embodiment described above illustrated a case in which the synchronized state is determined from the match state, and the asynchronous state from the mismatch state, according to whether the explanation state has continuity. This determination can also be realized by other methods. The present embodiment therefore illustrates a case in which the synchronized state is determined from the match state, and the asynchronous state from the mismatch state, based on the movement direction and movement speed of the line of sight.

FIG. 5 is a diagram illustrating the functional configuration of the explanation support apparatus according to the second embodiment. The explanation support apparatus 20 shown in FIG. 5 differs from the explanation support apparatus 10 shown in FIG. 1 in that the line-of-sight determination unit 21 includes a movement direction detection unit 21a, a movement speed calculation unit 21b, and a reading-aloud state determination unit 21c, in that the determination logic of the explanation location estimation unit 22 is different, and in that the history storage unit 16a is unnecessary. Processing units that provide the same functions as those of the explanation support apparatus 10 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The movement direction detection unit 21a is a processing unit that detects the movement direction of the line of sight.

As one embodiment, the movement direction detection unit 21a determines whether the fixation points detected by the fixation point detection unit 12a are moving in the horizontal direction or in the vertical direction. For example, the movement direction detection unit 21a obtains an approximate straight line through the fixation points detected over a predetermined past period, for example several seconds. It then detects the movement direction as "horizontal" when the approximate straight line lies within a predetermined range of the horizontal direction, for example within 30 degrees, and as "vertical" when the approximate straight line lies within a predetermined range of the vertical direction, for example within 30 degrees.
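One way to obtain the approximate straight line and its angle is a total least-squares (principal axis) fit, sketched below. The source does not name a fitting method, so this choice is an assumption, as are the function name `movement_direction`, the "other" and "unknown" return values, and the reading that a line within about 30 degrees of an axis is assigned to that axis.

```python
import math
from typing import List, Tuple


def movement_direction(
    points: List[Tuple[float, float]], tolerance_deg: float = 30.0
) -> str:
    """Classify the trend of recent fixation points as horizontal or vertical."""
    n = len(points)
    if n < 2:
        return "unknown"
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    if sxx == 0 and syy == 0:
        return "unknown"  # all points coincide, so no direction is defined
    # Orientation of the principal axis, measured from the horizontal axis.
    angle = abs(math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)))
    if angle <= tolerance_deg:
        return "horizontal"
    if angle >= 90 - tolerance_deg:
        return "vertical"
    return "other"  # neither axis is close enough
```

Unlike an ordinary regression of y on x, the principal axis fit remains well defined when the fixation points line up vertically, which matters for vertically written slides.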

The movement speed calculation unit 21b is a processing unit that calculates the movement speed of the line of sight.

As one embodiment, the movement speed calculation unit 21b identifies the range over which the fixation point moved during a fixed time, and extracts the characters contained in that range by searching for the characters located within it. The movement speed calculation unit 21b then converts those characters into reading information, that is, a phonetic character string, by morphological analysis, and obtains the number of morae. Finally, it calculates the average movement speed (morae/sec) by dividing the number of morae by the fixed time.
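The mora count and average speed can be sketched as follows, assuming the morphological analysis has already produced a kana reading for the text in the movement range (the analyzer itself is outside this sketch). The counting rule used here is the common one in which each kana is one mora except the small kana that combine with the preceding character; the function names are hypothetical.

```python
# Small kana that merge with the preceding kana and add no mora of their own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def _is_kana(ch: str) -> bool:
    """True for hiragana, katakana, and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def count_morae(reading: str) -> int:
    """Count morae in a kana reading; non-kana characters are ignored."""
    return sum(1 for ch in reading if _is_kana(ch) and ch not in SMALL_KANA)


def movement_speed(reading: str, seconds: float) -> float:
    """Average line-of-sight movement speed in morae per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return count_morae(reading) / seconds
```

For instance, the reading "きょう" counts as two morae, so covering it in 0.5 seconds corresponds to 4 morae/sec.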

The reading-aloud state determination unit 21c is a processing unit that determines whether the presenter is in a reading-aloud state. When the presenter is not in the reading-aloud state, the unit determines a silent-reading state.

As one embodiment, the reading-aloud state determination unit 21c determines whether the presenter is in the reading-aloud state based on the findings illustrated in FIG. 6. FIG. 6 is a diagram illustrating an example of the movement of fixation points. FIG. 6 shows an example in which the slide is written horizontally; for vertical writing, the same holds if "horizontal" is read as "vertical" and "vertical" as "horizontal". As shown in FIG. 6, because reading aloud involves uttering characters continuously, the freedom of eye movement is thought to be lower than during silent reading, so that skipping ahead and regressions are less likely to occur. Moreover, in the reading-aloud state, once the gaze point appears in a sentence, the line of sight follows the text, with fewer regressions than in silent reading. In the silent-reading state, by contrast, the gaze point tends to move randomly or in the vertical direction.

For these reasons, the reading-aloud state determination unit 21c determines the reading-aloud state when the movement direction of the line of sight is horizontal and the movement speed of the line of sight is within a predetermined speed, for example within 7 or 8 morae/sec. Otherwise, it determines the silent-reading state. Although both the movement direction and the movement speed are used here to distinguish the reading-aloud state from the silent-reading state, it is not necessary to use both; either one may be used alone.
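The determination rule reduces to a two-condition test. In this sketch the threshold of 8 morae/sec is taken from the upper example value in the text, and the function names and string labels are assumptions.

```python
def is_reading_aloud(
    direction: str, morae_per_sec: float, max_speed: float = 8.0
) -> bool:
    """Reading aloud: horizontal gaze movement at or below the speed limit."""
    return direction == "horizontal" and morae_per_sec <= max_speed


def reading_state(direction: str, morae_per_sec: float) -> str:
    """Anything that is not reading aloud is treated as silent reading."""
    if is_reading_aloud(direction, morae_per_sec):
        return "reading_aloud"
    return "silent_reading"
```

For a vertically written slide, the same test would be applied with "vertical" in place of "horizontal".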

Like the explanation location estimation unit 17 shown in FIG. 1, the explanation location estimation unit 22 estimates the explanation location based on the explanation state determined by the explanation state determination unit 16. It differs in that it uses the presence or absence of the reading-aloud state to distinguish the principle from the exception for the match state and the mismatch state.

FIG. 7 is a diagram illustrating an example of the correspondence between the explanation state, the reading-aloud state, and the explanation location. As shown in FIG. 7, when the explanation state determination unit 16 determines the "match state" and the reading-aloud state determination unit 21c determines the "reading-aloud state", the explanation location estimation unit 22 presumes the synchronized state. In this case, it estimates, as the explanation location, the explanation unit on which the speech recognition result and the line-of-sight detection result agree. When the explanation state determination unit 16 determines the "match state" and the reading-aloud state determination unit 21c determines the "silent-reading state", the explanation location estimation unit 22 presumes the exceptional case, that is, an asynchronous state. In this case, it estimates the explanation unit corresponding to the speech recognition as the explanation location.

When the explanation state determination unit 16 determines the "mismatch state" and the reading-aloud state determination unit 21c determines the "reading-aloud state", the explanation location estimation unit 22 presumes the exceptional case, that is, a synchronized state. In this case, it estimates the explanation unit corresponding to the line-of-sight detection as the explanation location. When the explanation state determination unit 16 determines the "mismatch state" and the reading-aloud state determination unit 21c determines the "silent-reading state", the explanation location estimation unit 22 presumes the asynchronous state. In this case, it estimates the explanation unit corresponding to the speech recognition as the explanation location.

When the explanation state determination unit 16 determines the "non-viewing state", the explanation location estimation unit 22 estimates the explanation unit corresponding to the speech recognition as the explanation location. This is because the non-viewing state, unlike the match state and the mismatch state, can be regarded as having no exceptions: as soon as the non-viewing state is determined, the presenter can be presumed to be explaining toward the audience without looking at the material.
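The correspondence of FIG. 7, together with the non-viewing rule, can be written as a small lookup table. The state labels, the table name `SOURCE_BY_CASE`, and the function `estimate_location` are assumptions made for the sketch; the four table entries follow the cases described above.

```python
from typing import Dict, Optional, Tuple

# (explanation state, reading aloud?) -> which result supplies the location.
SOURCE_BY_CASE: Dict[Tuple[str, bool], str] = {
    ("match", True): "agreed",      # synchronized state
    ("match", False): "speech",     # exception: actually asynchronous
    ("mismatch", True): "gaze",     # exception: actually synchronized
    ("mismatch", False): "speech",  # asynchronous state
}


def estimate_location(
    state: str,
    reading_aloud: bool,
    speech_unit: Optional[int],
    gaze_unit: Optional[int],
) -> Optional[int]:
    """Pick the explanation unit from the state and the reading-aloud result."""
    if state == "not_viewing":
        return speech_unit  # no exceptions are assumed for this state
    source = SOURCE_BY_CASE[(state, reading_aloud)]
    if source == "gaze":
        return gaze_unit
    # "agreed" means both results name the same unit, so the speech unit is it.
    return speech_unit
```

Only the mismatch state combined with reading aloud selects the gaze-based unit, which reflects the text's view that this combination is a synchronized state disturbed by a speech misrecognition.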

FIG. 8 is a flowchart illustrating the procedure of the explanation support process according to the second embodiment. Like the explanation support process shown in FIG. 4, this process starts when the presentation software, with a document open, receives an instruction to start a presentation, and is executed repeatedly until an instruction to end the presentation is received.

As shown in FIG. 8, the line-of-sight detection unit 11 detects a gaze point by performing line-of-sight detection on the image output from the camera 1 (step S101). The fixation point detection unit 12a then detects a fixation point by monitoring whether the gaze point detected in step S101 stays within a predetermined range for a predetermined period, for example 30 msec to 300 msec (step S102).

If the fixation point lies within the coordinate range corresponding to the inside of the screen of the display unit 5 (Yes in step S201), the reading-aloud state determination unit 21c determines whether the reading-aloud state or the silent-reading state applies, according to whether the movement direction of the line of sight is horizontal and the movement speed of the line of sight is within the predetermined speed (step S202). If the fixation point does not lie within the coordinate range corresponding to the inside of the screen of the display unit 5 (No in step S201), step S202 is skipped and the process proceeds to step S104.

The explanation location estimation unit 22 then estimates the explanation location from the explanation state determined in step S107 and the reading-aloud determination result of step S202 (step S203). The highlight display control unit 18 then highlights the explanation location estimated in step S203 (step S109), and the process ends.

[One aspect of the effects]
As described above, the explanation support apparatus 20 according to the present embodiment determines from the movement direction and movement speed of the line of sight whether the presenter is in the reading-aloud state or the silent-reading state, and uses that result to distinguish the principle from the exception for the match state and the mismatch state. The explanation support apparatus 20 according to the present embodiment can therefore suppress a decrease in the estimation accuracy of the explanation location, as in the first embodiment.

Embodiments of the disclosed apparatus have been described so far, but the present invention may be implemented in various forms other than the embodiments described above. Other embodiments included in the present invention are therefore described below.

[Application examples of document files]
The first embodiment illustrated the use of a document created by presentation software, but a document created by another application program can also be used. That is, for any document file containing pages that are displayed one screen at a time, the processes shown in FIG. 4 and FIG. 8 can be applied in the same way by treating the pages of a word processor document file, or the sheets of a spreadsheet document file, as slides.

[Other implementation examples]
The first embodiment illustrated a case in which the explanation support apparatus 10 executes the explanation support process stand-alone, running the presentation software by itself without depending on external resources, but other implementation forms can also be adopted. For example, a client-server system can be built by providing a server that offers an explanation support service, corresponding to the explanation support process described above, to a client that runs the presentation software. In this case, the server apparatus can be implemented by installing an explanation support program that realizes the explanation support service, supplied as packaged software or online software. For example, the server apparatus may be implemented as a Web server that provides the explanation support service, or as a cloud that provides the explanation support service on an outsourced basis.

In this case, the presentation starts after the client has uploaded to the server apparatus the document to be used for the presentation and identification information of the venue where the presentation is held. Once the presentation has started, the client uploads the audio signal collected from the microphone 3 in real time, and uploads the page information of the slide each time the slide page displayed on the display unit 5 is switched. This enables the server apparatus to execute the processes shown in FIG. 4 and FIG. 8.

Furthermore, a thin client system can be built by having the client transmit operation information from an input device (not shown) to the server and display on the display unit 5 only the processing results transmitted from the server. In this case, the various resources, such as document data, are held by the server, and the presentation software is also implemented on the server as a virtual machine. Although the first embodiment assumed that the explanation support program is an add-on to the presentation software, the explanation support program may instead be plugged in when a request to reference it as a library is received from a client that holds license rights.

[Distribution and integration]
The components of each illustrated apparatus need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the line-of-sight detection unit 11, the line-of-sight determination unit 12, the speech recognition unit 13, the document acquisition unit 14, the recognition result determination unit 15, the explanation state determination unit 16, the explanation location estimation unit 17, or the highlight display control unit 18 may be connected via a network as a device external to the explanation support apparatus 10. Alternatively, separate devices may each include one of the line-of-sight detection unit 11, the line-of-sight determination unit 12, the speech recognition unit 13, the document acquisition unit 14, the recognition result determination unit 15, the explanation state determination unit 16, the explanation location estimation unit 17, and the highlight display control unit 18, and may realize the functions of the explanation support apparatus 10 by cooperating over a network connection.

[Explanation support program]
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes an explanation support program having the same functions as the above embodiments is therefore described below with reference to FIG. 9.

FIG. 9 is a diagram illustrating an example of the hardware configuration of a computer that executes the explanation support program according to the first and second embodiments. As shown in FIG. 9, the computer 100 includes an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. The computer 100 further includes a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

As shown in FIG. 9, the HDD 170 stores an explanation support program 170a that provides the same functions as the processing units described in the first and second embodiments. Like the components of the processing units shown in FIG. 1 and FIG. 5, the explanation support program 170a may be integrated or separated. That is, the HDD 170 need not store all of the data described in the first embodiment; it is sufficient that the data used for processing is stored in the HDD 170.

In this environment, the CPU 150 reads the explanation support program 170a from the HDD 170 and loads it into the RAM 180. As a result, the explanation support program 170a functions as an explanation support process 180a, as shown in FIG. 9. The explanation support process 180a loads various data read from the HDD 170 into the area of the RAM 180 allocated to the explanation support process 180a, and executes various processes using the loaded data. Examples of the processing executed by the explanation support process 180a include the processing shown in FIG. 4 and FIG. 8. Note that not all the processing units described in the first embodiment need to operate on the CPU 150; it suffices that the processing units corresponding to the processing to be executed are virtually realized.

Note that the explanation support program 170a does not necessarily have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (a so-called FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card. The computer 100 may then acquire each program from such a portable physical medium and execute it. Alternatively, each program may be stored in another computer or a server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may acquire each program from there and execute it.

Description of Symbols
1 Camera
3 Microphone
5 Display unit
10 Explanation support apparatus
11 Line-of-sight detection unit
12 Line-of-sight determination unit
12a Stop point detection unit
12b In-screen determination unit
13 Voice recognition unit
14 Document acquisition unit
15 Recognition result determination unit
15a Explanation unit extraction unit
15b Association unit
16 Explanation state determination unit
16a History storage unit
17 Explanation location estimation unit
18 Highlight display control unit

Claims

An explanation support apparatus comprising:
a line-of-sight detection unit that performs line-of-sight detection on a predetermined display unit;
a voice recognition unit that performs voice recognition;
a recognition result determination unit that determines, among the pages of a document displayed on the display unit, an explanation location corresponding to the result of the voice recognition;
an explanation state determination unit that determines which explanation state applies among a first state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection match, a second state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection do not match, and a third state in which the position of the line of sight obtained by the line-of-sight detection is not detected within the screen of the display unit; and
an estimation unit that estimates, based on the determined explanation state, an explanation location for which highlighting is executed.
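The three-way determination of the explanation state recited above can be illustrated with a minimal sketch. This is not the claimed implementation: the names `ExplanationState` and `determine_state`, the use of strings as explanation-location identifiers, and the choice to treat a missing voice recognition result as a mismatch are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class ExplanationState(Enum):
    """Hypothetical labels for the three explanation states."""
    MATCH = 1            # first state: voice and gaze locations agree
    MISMATCH = 2         # second state: voice and gaze locations differ
    GAZE_OFF_SCREEN = 3  # third state: no gaze position within the screen


def determine_state(voice_location: Optional[str],
                    gaze_location: Optional[str],
                    gaze_on_screen: bool) -> ExplanationState:
    """Classify the current explanation state.

    voice_location: explanation location matched to the voice recognition
        result, or None if nothing was matched (assumption).
    gaze_location: explanation location under the detected gaze, or None.
    gaze_on_screen: whether a gaze position was detected within the screen.
    """
    if not gaze_on_screen:
        return ExplanationState.GAZE_OFF_SCREEN
    if voice_location is not None and voice_location == gaze_location:
        return ExplanationState.MATCH
    # Assumption: a missing voice result is treated as a mismatch.
    return ExplanationState.MISMATCH
```

The off-screen check comes first in this sketch because the third state is defined only by the gaze position, independently of what the voice recognition returns.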

The explanation support apparatus according to claim 1, further comprising a history storage unit that stores a history of the explanation state,
wherein the estimation unit determines, according to whether the explanation state has continuity as judged from the explanation state determined by the explanation state determination unit and the history of the explanation state stored in the history storage unit, whether to estimate the explanation location at which the results of the voice recognition and the line-of-sight detection match as the explanation location for which the highlighting is executed, to estimate the explanation location corresponding to the voice recognition as the explanation location for which the highlighting is executed, or to continue the previously estimated explanation location as the explanation location for which the highlighting is executed.
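One way to read the continuity-based selection among the three outcomes is sketched below. The claim does not specify how continuity is judged, so the rule used here (the same state observed `min_run` times in a row, counting the current determination) is an assumption, as are the names `estimate_location`, `min_run`, and the integer state codes.

```python
from typing import Optional, Sequence

# Hypothetical state codes for the first, second, and third states.
MATCH, MISMATCH, GAZE_OFF_SCREEN = 1, 2, 3


def estimate_location(state: int,
                      history: Sequence[int],
                      voice_location: Optional[str],
                      previous_location: Optional[str],
                      min_run: int = 3) -> Optional[str]:
    """Choose the explanation location to highlight.

    history: previously determined states, oldest first, excluding `state`.
    min_run: assumed number of consecutive identical states (including the
        current one) required before the state is considered continuous.
    """
    needed = min_run - 1  # how many past states must equal the current one
    recent = list(history)[-needed:] if needed > 0 else []
    continuous = len(recent) == needed and all(s == state for s in recent)

    if continuous and state == MATCH:
        # Voice and gaze agree, so the voice location is the matched location.
        return voice_location
    if continuous and state == MISMATCH:
        # Assumed policy: follow the voice recognition result.
        return voice_location
    # No continuity, or gaze off screen: keep the previous estimate.
    return previous_location
```

For example, `estimate_location(MISMATCH, [MATCH, MISMATCH], "sec2", "sec1")` keeps `"sec1"` under these assumptions, because the mismatch has been observed only twice in a row.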

The explanation support apparatus according to claim 1, further comprising a reading state determination unit that determines whether the state is a reading state or a silent reading state based on at least one of the moving direction of the line of sight obtained from the result of the line-of-sight detection and the moving direction of the line of sight obtained from the result of the line-of-sight detection,
wherein the estimation unit determines, according to the explanation state determined by the explanation state determination unit and the reading state or silent reading state determined by the reading state determination unit, which of the explanation location at which the results of the voice recognition and the line-of-sight detection match, the explanation location corresponding to the voice recognition, and the explanation location corresponding to the line-of-sight detection is to be estimated as the explanation location for which the highlighting is executed.

An explanation support method in which a computer executes a process comprising:
executing line-of-sight detection on a predetermined display unit;
executing voice recognition;
determining, among the pages of a document displayed on the display unit, an explanation location corresponding to the result of the voice recognition;
determining which explanation state applies among a first state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection match, a second state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection do not match, and a third state in which the position of the line of sight obtained by the line-of-sight detection is not detected within the screen of the display unit; and
estimating, based on the determined explanation state, an explanation location for which highlighting is executed.

An explanation support program that causes a computer to execute a process comprising:
executing line-of-sight detection on a predetermined display unit;
executing voice recognition;
determining, among the pages of a document displayed on the display unit, an explanation location corresponding to the result of the voice recognition;
determining which explanation state applies among a first state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection match, a second state in which the explanation location corresponding to the voice recognition and the explanation location corresponding to the line-of-sight detection do not match, and a third state in which the position of the line of sight obtained by the line-of-sight detection is not detected within the screen of the display unit; and
estimating, based on the determined explanation state, an explanation location for which highlighting is executed.