JP6690442B2 - Presentation support device, presentation support system, presentation support method, and presentation support program

Info

Publication number: JP6690442B2
Application number: JP2016132824A
Authority: JP
Inventors: 高橋　潤; 田中　正清; 村瀬　健太郎
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-07-04
Filing date: 2016-07-04
Publication date: 2020-04-28
Anticipated expiration: 2036-07-04
Also published as: JP2018005011A

Description

The present invention relates to a presentation support device, a presentation support system, a presentation support method, and a presentation support program.

In settings such as presentations at meetings or lectures, or pamphlet introductions, several people may communicate by conversation while sharing a document with the same content, for example a progress agenda or slide materials.

One example of a technique that supports such conversational communication uses speech recognition to change the display state of the part of the shared document that corresponds to what is being spoken.

In addition, the following synchronized content information generation device has been proposed. Based on audio and video information in which a meeting that uses a document is recorded by a video camera device, this device measures the time-ordered distribution of appearances of keywords extracted for each of N document blocks into which the document information is divided (for example, one page or one paragraph per block), and generates synchronized content information for displaying the document information in temporal synchronization with the audio and video information.

JP 2004-7358 A; JP 2009-271814 A; JP H07-334075 A; JP 2013-83897 A

However, with the above techniques, the display state of a spoken portion may fail to be changed.

Specifically, the synchronized content information generation device described above changes the display state of the document block whose keywords are uttered most frequently. However, when a document block contains only a small absolute number of keywords, even if a keyword in that block is uttered, the display state of another document block is changed instead if the keywords of that other block are uttered more frequently. As a result, a document block with few keywords may be skipped without its display state ever being changed.

In one aspect, an object of the present invention is to provide a presentation support device, a presentation support system, a presentation support method, and a presentation support program that can suppress omissions in changing the display state of a spoken portion.

In one aspect, a presentation support device includes: a recognition unit that performs speech recognition on audio data using words extracted, for each region into which the display content of a document file is divided, from the character string contained in that region; and a display control unit that, when two recognized words recognized consecutively by the speech recognition belong to different regions, changes the display state of the region containing the later-recognized of the two words.

Omissions in changing the display state of a spoken portion can be suppressed.

FIG. 1 is a block diagram illustrating the functional configuration of the presentation support device 10 according to the first embodiment.
FIG. 2 is a diagram illustrating an example of a slide.
FIG. 3 is a flowchart illustrating the procedure of the extracted word data generation process according to the first embodiment.
FIG. 4 is a flowchart illustrating the procedure of the speech recognition process according to the first embodiment.
FIG. 5 is a flowchart illustrating the procedure of the display control process according to the first embodiment.
FIG. 6 is a block diagram illustrating the functional configuration of the presentation support device 20 according to the second embodiment.
FIG. 7 is a diagram illustrating a configuration example of the presentation support system 3 according to the third embodiment.
FIG. 8 is a diagram illustrating a configuration example of the presentation support system 4 according to the third embodiment.
FIG. 9 is a diagram illustrating an example of application to an electronic conference system.
FIG. 10 is a diagram illustrating an example of application to an electronic conference system.
FIG. 11 is a diagram illustrating an example of implementation in a presentation support system.
FIG. 12 is a diagram illustrating a hardware configuration example of a computer that executes the presentation support program according to the first to third embodiments.

The presentation support device, presentation support system, presentation support method, and presentation support program according to the present application are described below with reference to the accompanying drawings. These embodiments do not limit the disclosed technology. The embodiments can be combined as appropriate as long as their processing contents do not conflict.

[One aspect of the functions of the presentation support device]
FIG. 1 is a block diagram illustrating the functional configuration of the presentation support device according to the first embodiment. The presentation support device 10 illustrated in FIG. 1 provides a presentation support service: while several people share a document with the same content, for example a progress agenda or slide materials, it highlights, within a page screen of the document such as a slide, the portion corresponding to a word recognized from the speaker's utterance.

In the following, purely as an example, it is assumed that the highlighting function described above is implemented as an add-on to presentation software, and that the presentation proceeds by displaying, on the display device 5, one or more slides contained in a document file created with that presentation software. Text and graphics, as well as content created by other application programs, can be imported into a slide. For example, documents created with word processing software and tables and graphs created with spreadsheet software can be imported, as can images and videos captured by an imaging device or edited with image editing software.

The presentation support device 10 is a computer that executes the presentation support service described above.

In one embodiment, an information processing device such as a desktop or notebook personal computer can be used as the presentation support device 10. Besides stationary terminals such as these personal computers, various mobile terminal devices can also be used. Examples include smartphones, mobile communication terminals such as mobile phones and PHS (Personal Handyphone System) handsets, and slate terminals such as PDAs (Personal Digital Assistants).

In this embodiment, purely as an example, it is assumed that the presentation support device 10 provides the presentation support service stand-alone, executing the presentation software by itself without depending on external resources. As described in detail later, the presentation support service is not limited to a stand-alone implementation. For example, it can be built as a client-server system by providing a server device that offers the presentation support service to client terminals that execute the presentation software. It can also be built as a thin client system in which the server device executes the presentation software and transmits the execution results to the client terminals for display.

As part of the presentation support service, when two recognized words obtained by speech recognition from the utterance span multiple regions on a slide, the presentation support device 10 changes the display state of the region in which the later of the two recognized words appears. Consequently, when a word in a region with a small absolute number of words is uttered, the display state of that region can be changed, for example to a highlighted state, even if words in other regions are uttered more frequently. This keeps regions with few words from being skipped without a display state change, and thus suppresses omissions in changing the display state of the spoken portion.
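The rule stated above can be illustrated with a minimal sketch. It assumes the recognizer delivers a stream of (word, region index) pairs in recognition order; the function name and data shapes are hypothetical and are not taken from the specification, and the sketch covers only the rule as stated in this paragraph.

```python
from typing import Iterable, Iterator, Optional, Tuple


def regions_to_highlight(recognized: Iterable[Tuple[str, int]]) -> Iterator[int]:
    """Yield a region index whenever two consecutive recognized words
    belong to different regions.

    `recognized` is an assumed stream of (word, region_index) pairs in
    recognition order; the shape is illustrative only.
    """
    previous_region: Optional[int] = None
    for _word, region in recognized:
        if previous_region is not None and region != previous_region:
            # The later word of the pair decides which region changes state.
            yield region
        previous_region = region


# Region 0 is uttered often, region 3 only once; region 3 is still reported.
stream = [("budget", 0), ("cost", 0), ("budget", 0), ("deadline", 3)]
print(list(regions_to_highlight(stream)))
```

Because the decision depends only on the order of the two most recent words and not on utterance counts, a region with a single word is not outvoted by a region whose words are uttered many times.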

[Peripheral devices]
As illustrated in FIG. 1, a microphone 1, a display device 5, and an input device 7 are connected to the presentation support device 10. These peripheral devices are connected to the presentation support device 10 by wire or wirelessly.

The microphone 1 is a device that converts sound into an electrical signal. The Japanese term used for it, 「マイク」, is an abbreviation of "microphone."

For example, the microphone 1 can be worn by a speaker, such as the presenter giving the presentation. In this case, a headset or tie-clip microphone can be attached at a predetermined position on the presenter's body or clothing, or the presenter can carry a handheld microphone. The microphone 1 can also be installed at a predetermined position within range of the presenter's speech, in which case a mounted or stationary microphone can be used. In any of these cases, a microphone with any type of directivity can be used as the microphone 1; to suppress pickup of sound other than the presenter's speech, such as utterances by the audience or ambient noise, the sensitivity of the microphone can be limited to the direction of the presenter's voice. Any transduction method, such as dynamic, electret condenser, or condenser, can be used for the microphone 1. The analog signal obtained by picking up sound with the microphone 1 is converted to a digital signal and then input to the presentation support device 10.

The display device 5 is a device that displays various types of information.

For example, the display device 5 can be a liquid crystal display or an organic EL (electroluminescence) display, which displays by emitting light, or a projector, which displays by projection. The number of display devices 5 is not limited to one; there may be several. In the following, as an example, it is assumed that a projector and a screen showing the projected image are provided as a shared display device viewed by both the presenter and the audience, who are the participants in the presentation.

As an example, the display device 5 displays the presentation screen in accordance with instructions from the presentation support device 10. For example, the display device 5 displays a slide of the document file opened by the presentation software running on the processor of the presentation support device 10. The slides contained in the document file can be switched automatically or manually. For example, an arbitrary slide specified by the presenter through the input device 7 can be displayed, or, when the slide show function of the presentation software is set to ON, the slides contained in the document file can be displayed one after another in the page order in which they were created.

The input device 7 is a device that accepts instruction input for various types of information.

For example, when the display device 5 is implemented as a projector, a laser pointer that indicates a position on the slide projected on the screen can be used as the input device 7. Some laser pointers have a remote control function, with an operation unit of buttons for advancing or going back a slide page; this operation unit can also serve as the input device 7. Alternatively, a mouse or keyboard can be used as the input device 7, as can an image sensor that captures an image of the screen or of a predetermined part of the presenter in order to sense the position indicated by the laser pointer, detect the presenter's line of sight, or recognize gestures. When the display device 5 is implemented as a liquid crystal display, a touch sensor bonded to the liquid crystal display can be used as the input device 7.

As an example, the input device 7 accepts designation of the document file to be opened by the presentation software on the processor of the presentation support device 10, operations to advance a slide page, operations to go back a slide page, and so on. Operations accepted through the input device 7 are output to the presentation support device 10.

[Configuration of presentation support device 10]
Next, the functional configuration of the presentation support device 10 according to this embodiment is described. As illustrated in FIG. 1, the presentation support device 10 includes an input/output I/F (interface) unit 11, a storage unit 13, and a control unit 15. The solid lines in FIG. 1 represent data input/output relationships, but for convenience of explanation only a minimal set is shown. That is, data input and output for each processing unit is not limited to the illustrated example; data may also be exchanged in ways not illustrated, for example between processing units, between a processing unit and data, and between a processing unit and an external device.

The input/output I/F unit 11 is an interface for input and output with peripheral devices such as the microphone 1, the display device 5, and the input device 7.

In one aspect, the input/output I/F unit 11 outputs the various operations input from the input device 7 to the control unit 15. It also outputs the slide image data output from the control unit 15 to the display device 5, and outputs to the display device 5 a highlight instruction for a region in the slide or an instruction to cancel that highlight. In addition, it outputs the audio data input from the microphone 1 to the control unit 15.

The storage unit 13 is a device that stores data used by the various programs executed by the control unit 15, including the OS (Operating System), the presentation software, and other application programs.

In one embodiment, the storage unit 13 is implemented as the main storage device of the presentation support device 10. For example, various semiconductor memory elements such as RAM (Random Access Memory) and flash memory can be used for the storage unit 13. The storage unit 13 can also be implemented as an auxiliary storage device, in which case an HDD (Hard Disk Drive), an optical disc, an SSD (Solid State Drive), or the like can be used.

As examples of data used by the programs executed by the control unit 15, the storage unit 13 stores document data 13a, extracted word data 13b, and recognized word data 13c. The storage unit 13 can also store other electronic data, for example definition data for controlling display state changes. The extracted word data 13b and the recognized word data 13c, that is, the data other than the document data 13a, are described together with the processing units that register or reference them.

The document data 13a is data relating to documents.

In one embodiment, a document file in which one or more slides have been created with presentation software can be used as the document data 13a. Text and graphics, as well as content created by other application programs, can be imported into such slides. For example, documents created with word processing software and tables and graphs created with spreadsheet software can be imported, as can images and videos captured by an imaging device or edited with image editing software. So that keyword search by speech recognition can be performed on such non-text content, meta-information containing character strings such as descriptive words or sentences can be attached to the content before the presentation starts.

The control unit 15 has an internal memory that stores various programs and control data, and executes various processes using them.

In one embodiment, the control unit 15 is implemented as a central processing unit, a so-called CPU (Central Processing Unit). The control unit 15 does not have to be implemented as a CPU and may instead be implemented as an MPU (Micro Processing Unit) or a DSP (Digital Signal Processor). In other words, the control unit 15 only needs to be implemented as a processor, whether general-purpose or specialized. The control unit 15 can also be realized by hard-wired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

By executing various programs, the control unit 15 virtually implements the following processing units. For example, as illustrated in FIG. 1, the control unit 15 includes an extraction unit 15a, a recognition unit 15b, a calculation unit 15c, an estimation unit 15d, and a display control unit 15e.

The extraction unit 15a is a processing unit that extracts, from the slides contained in the document file, the words to be registered in the dictionary data used for speech recognition, as the extracted word data 13b.

In one embodiment, the extraction unit 15a can start the process of extracting the extracted word data 13b automatically or by manual setting. When started automatically, the process can be triggered when the presentation software saves the document file to the storage unit 13 and closes it, or when the document file is saved to the storage unit 13 while it is being edited through the presentation software. When started by manual setting, the process can be triggered when an instruction to execute presentation preprocessing is accepted through the input device 7. In either case, the process starts by reading, from the document files contained in the document data 13a stored in the storage unit 13, the document file corresponding to the save or to the execution instruction.

To describe the generation of the extracted word data 13b: the extraction unit 15a reads, from the document files contained in the document data 13a stored in the storage unit 13, the document file that was saved or for which an instruction to execute presentation preprocessing was accepted. Although the extraction unit 15a reads the document file from the storage unit 13 in this example, the document file may be obtained by other routes. For example, the extraction unit 15a can obtain the document file from an auxiliary storage device such as a hard disk or optical disc, or from removable media such as a memory card or USB (Universal Serial Bus) memory. The extraction unit 15a can also obtain the document file by receiving it from an external device over a network.

Next, the extraction unit 15a divides each slide in the document file it has read into multiple regions. For example, the extraction unit 15a divides the slide in units of a sentence, a line, or a paragraph. In this case, the extraction unit 15a scans the character string in the slide, detects delimiters corresponding to a space, a full stop, or a line break, and sets each delimiter as a region boundary. It then splits the character string in the slide at these boundaries, so that the slide is divided into regions at each delimiter. The extraction unit 15a then assigns to each region obtained by the division an index that identifies it. Although automatic division is described here, the slide may instead be divided manually by having region boundaries specified through the input device 7 or the like.
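The division step can be sketched as follows. This is a minimal illustration that assumes the slide text is available as a plain string; the function name `split_slide`, the `unit` parameter, and the exact delimiter patterns are assumptions made for the example, not details from the specification.

```python
import re
from typing import List, Tuple


def split_slide(text: str, unit: str = "line") -> List[Tuple[int, str]]:
    """Split slide text into regions and assign each region an index.

    The delimiter choices are illustrative assumptions:
    - "line": a line break is a boundary
    - "sentence": a full stop (Japanese or ASCII punctuation followed by
      whitespace) is a boundary
    - "paragraph": a blank line is a boundary
    Splitting on the empty match after a Japanese full stop needs Python 3.7+.
    """
    if unit == "line":
        parts = text.splitlines()
    elif unit == "sentence":
        parts = re.split(r"(?<=[。.!?])\s+|(?<=。)", text)
    elif unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # Drop empty pieces, then number the remaining regions from 0.
    regions = [part.strip() for part in parts if part.strip()]
    return list(enumerate(regions))


for idx, region in split_slide("Progress report\nBudget\n\nSchedule"):
    print(idx, region)
```

Manual division, which the text also allows, would replace the delimiter scan with boundaries supplied by the user while keeping the same index assignment.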

After dividing the slide, the extraction unit 15a selects one of the regions in the slide. It then extracts words by applying natural language processing to the character string in the selected region. For example, from the morphemes obtained by morphological analysis of the character string in the region, the extraction unit 15a extracts words whose part of speech is noun, words that form a phrase, and the like. The extraction unit 15a then attaches to each extracted word the index assigned to the region containing that word. It repeats the word extraction and index assignment until every region in the slide has been selected.

After words have been extracted from all regions in this way, the extraction unit 15a registers in the storage unit 13 the extracted word data 13b, in which each word k contained in the slide is associated with its reading and its index idx.
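A minimal sketch of the resulting record layout, assuming the regions are already available as (index, text) pairs. The morphological analysis is replaced here by a deliberately naive stand-in (`naive_nouns`, alphabetic tokens minus a small stop-word list), and the reading is approximated by lowercasing; both are placeholders for the example, and all names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ExtractedWord:
    word: str     # word k as it appears in the slide
    reading: str  # reading used for matching against speech
    idx: int      # index of the region that contains the word


_STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def naive_nouns(text: str) -> List[str]:
    """Stand-in for morphological analysis: alphabetic tokens that are
    not stop words. A real system would use a part-of-speech tagger."""
    return [t for t in re.findall(r"[A-Za-z]+", text) if t.lower() not in _STOPWORDS]


def build_extracted_words(
    regions: Iterable[Tuple[int, str]],
    extract: Callable[[str], List[str]] = naive_nouns,
    reading_of: Callable[[str], str] = str.lower,
) -> List[ExtractedWord]:
    """Visit every region once and tag each extracted word with the region index."""
    records: List[ExtractedWord] = []
    for idx, text in regions:
        for word in extract(text):
            records.append(ExtractedWord(word, reading_of(word), idx))
    return records


for record in build_extracted_words([(0, "Status of the project"), (1, "Budget")]):
    print(record)
```

Passing the extractor and the reading function as parameters keeps the record layout independent of whichever analyzer is actually used.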

The recognition unit 15b is a processing unit that performs speech recognition.

In one embodiment, the recognition unit 15b starts when a presentation start instruction is accepted while the presentation software has the document file open, and waits until an audio signal of a predetermined length is input from the microphone 1. For example, it waits for input of an audio signal at least one frame long, for example 10 msec. Each time an audio signal of the predetermined length is input from the microphone 1, the recognition unit 15b performs speech recognition such as word spotting on the audio signal of a fixed past period going back from the time of that input. Word spotting is a general term for methods in which the required words are registered in advance and the registered words are then extracted from the audio signal. Here, out of the extracted word data 13b stored in the storage unit 13, the recognition unit 15b applies to word spotting the extracted word data 13b for the slide that belongs to the document file being run by the presentation software and is currently displayed on the display device 5. The recognition unit 15b thereby recognizes whether the utterance of a speaker such as the presenter contains a word extracted from any region of the displayed slide. When the reading of a word is recognized in the audio signal, the recognition unit 15b registers in the storage unit 13 the recognized word data 13c, in which the word is associated with the time at which it was recognized. When the same word is recognized multiple times over time, the last, that is, the most recent recognition time is registered in the storage unit 13.

The recognition unit 15b then determines whether the recognized word data 13c stored in the storage unit 13 contains any word for which a predetermined period has elapsed since it was registered in the storage unit 13. For example, for each word in the recognized word data 13c, the recognition unit 15b determines whether the difference between the time registered for that word and the time at which the recognition unit 15b refers to the recognized word data 13c, that is, the current time, exceeds a predetermined threshold. The recognition unit 15b can vary this threshold according to the unit into which the slide is divided, such as a sentence, a line, or a paragraph. For example, when the slide is divided line by line, roughly 20 to 30 characters can be assumed to be read aloud per region. In that case, 3 seconds can be used as an example threshold, obtained by calculating the required reading time from the average reading speed of explanatory speech, 7 to 8 morae per second. When the slide is divided paragraph by paragraph, more time can be assumed to be spent reading than in the line-by-line case. In that case, (number of lines) × 3 seconds can be used as an example threshold.
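The threshold choice described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `unit` strings, and the simplification of counting one character as one mora are assumptions introduced here.

```python
from typing import Tuple


def expiry_threshold_seconds(unit: str, line_count: int = 1,
                             seconds_per_line: float = 3.0) -> float:
    """Return how long a recognized word is kept before it is treated as stale.

    `unit` is the granularity of slide division ("line" or "paragraph");
    both strings are hypothetical labels chosen for this sketch.
    """
    if unit == "line":
        return seconds_per_line
    if unit == "paragraph":
        if line_count < 1:
            raise ValueError("a paragraph must contain at least one line")
        # The text gives (number of lines) x 3 seconds as an example.
        return line_count * seconds_per_line
    raise ValueError(f"unsupported division unit: {unit!r}")


def reading_time_range(chars_min: int = 20, chars_max: int = 30,
                       morae_min: float = 7.0,
                       morae_max: float = 8.0) -> Tuple[float, float]:
    """Shortest and longest reading time for one line.

    Assumption: one character is treated as one mora, which is only a
    rough approximation for Japanese text containing kanji.
    """
    return chars_min / morae_max, chars_max / morae_min
```

Under that simplification, the 3-second example value falls inside the range returned by `reading_time_range()`, and a four-line paragraph would be given a 12-second threshold by `expiry_threshold_seconds("paragraph", 4)`.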

If there is a word for which the predetermined period, for example (number of lines) × 3 seconds, has elapsed since it was registered in the storage unit 13, it becomes more likely that the explanation of the slide region containing that word has already finished. Leaving such a word in place would also make it more likely that a region whose explanation has finished is highlighted. The recognition unit 15b therefore deletes the record for that word from the recognized word data 13c stored in the storage unit 13. Conversely, if no word has passed the predetermined period since registration, it is more likely that the explanation of the slide regions in which the words of the recognized word data 13c appear has not finished. The recognition unit 15b therefore leaves the words in the recognized word data 13c stored in the storage unit 13 as they are, without deleting them.

The recognition unit 15b also determines whether the page of the slide displayed on the display device 5 has changed. For example, it determines whether the slide has been switched by a slide show, or whether an operation to advance or go back a page has been received via the input device 7. If the displayed page has changed, the explanation by the speaker, such as the presenter, has most likely also moved from the slide on the previous page to the slide on the new page. In that case, the recognition unit 15b deletes the recognized word data 13c stored in the storage unit 13. If the displayed page has not changed, the page being explained by the speaker has most likely not changed either. In that case, the recognition unit 15b leaves the words in the recognized word data 13c stored in the storage unit 13 as they are, without deleting them.
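The bookkeeping described for the recognized word data 13c (keep only the most recent recognition time per word, drop words older than the threshold, and clear everything on a page change) can be sketched as below. The class name `RecognizedWordStore` and its methods are hypothetical and stand in for whatever structure the storage unit 13 actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecognizedWordStore:
    """In-memory stand-in for the recognized word data 13c (illustrative only)."""
    threshold_s: float = 3.0
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def register(self, word: str, time_s: float) -> None:
        # A repeated word overwrites its entry, so only the latest time is kept.
        self._last_seen[word] = time_s

    def expire(self, now_s: float) -> List[str]:
        """Delete words whose age exceeds the threshold and return them."""
        stale = [w for w, t in self._last_seen.items()
                 if now_s - t > self.threshold_s]
        for word in stale:
            del self._last_seen[word]
        return stale

    def clear(self) -> None:
        """Called when the displayed slide page changes."""
        self._last_seen.clear()

    def words(self) -> List[str]:
        """Recognized words ordered from oldest to most recent recognition."""
        return sorted(self._last_seen, key=self._last_seen.get)


store = RecognizedWordStore(threshold_s=3.0)
store.register("明日", 0.0)
store.register("天気", 1.0)
store.register("明日", 2.0)   # re-recognized: time updated to 2.0
removed = store.expire(now_s=4.5)
```

A dictionary keyed by word is used here because the text keeps a single, most recent time per word; a design that needed the full recognition history would use a list of records instead.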

Through this series of operations, the recognition unit 15b recognizes the words in the displayed slide that the presenter is most likely explaining. In the following, a word contained in the extracted word data 13b may be referred to as an "extracted word" and a word contained in the recognized word data 13c as a "recognized word" in order to distinguish the two labels.

The calculation unit 15c is a processing unit that calculates the position of a recognized word within its region.

In one embodiment, the calculation unit 15c reads, from the recognized word data 13c stored in the storage unit 13, the records of two recognized words that were recognized consecutively. For example, it reads the record of the recognized word whose entry has the most recent time and the record of the recognized word recognized immediately before it. In the following, the former may be referred to as the "first recognized word" and the latter as the "second recognized word". Likewise, the index associated with the first recognized word, that is, the region of the slide in which the first recognized word appears, may be referred to as the "first region", and the index associated with the second recognized word, that is, the region of the slide in which the second recognized word appears, may be referred to as the "second region".

The calculation unit 15c then determines whether the index associated with the first recognized word differs from the index associated with the second recognized word. In other words, it determines whether the first region and the second region are different. If the two regions are the same, it can be estimated that the presentation has most likely not moved from the region explained so far to a new region containing the description for the next explanation. If the two regions differ, it can be estimated that the presentation has most likely just moved from the region explained so far to such a new region. In that case, the calculation unit 15c further determines whether the first recognized word appears in more than one region of the slide displayed on the display device 5. For example, the calculation unit 15c compares the indexes associated with the extracted words in the extracted word data 13b that match the first recognized word against the indexes of the regions contained in the displayed slide, and determines whether those indexes match the indexes of more than one region of the displayed slide.

If the first recognized word does not appear in more than one region of the displayed slide, it becomes even more likely that the presentation has just moved from the region explained so far to a new region containing the description for the next explanation. In that case, the calculation unit 15c determines whether the number of extracted words associated with the index of the first region in the extracted word data 13b stored in the storage unit 13 is at least a predetermined value, for example 2. That is, it determines whether the absolute number of words contained in the first region of the slide is small. If it is small, the first region is likely to miss being highlighted unless highlighting is performed as soon as the first recognized word is obtained by voice recognition. In that case, the estimation unit 15d, described later, estimates that the first region is the utterance location. If the absolute number of words contained in the first region is not small, the calculation unit 15c calculates the position of the first recognized word within its region and the position of the second recognized word within its region, as parameters for determining more precisely whether the presentation has reached the point of moving between regions.

For example, the calculation unit 15c calculates the position t1 of the first recognized word within its region and the position t2 of the second recognized word within its region according to Equation (1) and Equation (2) below. In both equations, "N" denotes a region and "K" denotes a recognized word. "INDEX1(N, K)" in Equation (1) denotes the index number of the first character of the recognized word K at its first occurrence in the region N. "INDEX2(N, K)" in Equation (2) denotes the index number of the last character of the recognized word K at its last occurrence in the region N. The following description assumes, as an example, that index numbers start from 0.

t1 = INDEX1(N, K) / ((number of characters in N) - 1) ... Equation (1)
t2 = INDEX2(N, K) / ((number of characters in N) - 1) ... Equation (2)

FIG. 2 is a diagram showing an example of a slide. FIG. 2 shows a slide S1 containing four regions: region E1, region E2, region E3, and region E4. Suppose that, with the slide S1 of FIG. 2 displayed on the display device 5, the speaker says "明日の天気です。関東ですが・・・" ("Here is tomorrow's weather. In Kanto ..."). The recognized words are then obtained in the order "明日" (tomorrow), "天気" (weather), "関東" (Kanto). In this case, "関東" is the first recognized word and "天気" is the second recognized word. The first region E2, which contains "関東", differs from the second region E1, which contains "天気", and "関東" does not appear in any other region of the slide S1. Furthermore, because the first region E2 contains at least two extracted words, "関東" and "地方" (district), the absolute number of words in the first region E2 is judged not to be small.

With these conditions satisfied, the position t1 of the first recognized word within its region is calculated according to Equation (1) above. Specifically, the first character of the recognized word "関東" at its first occurrence in the first region E2 is "関", and this character is also the first character of the string "関東地方" contained in the first region E2, so its index number INDEX1(E2, 関東) is calculated as 0. The position t1 is therefore calculated as 0 / (4 - 1) = 0. Meanwhile, the last character of the recognized word "天気" at its last occurrence in the second region E1 is "気", and this character is the fifth, that is, the last, character of the string "明日の天気" contained in the second region E1, so counting from 0 its index number INDEX2(E1, 天気) is calculated as 4. The position t2 is therefore calculated as 4 / (5 - 1) = 1.
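The worked example above can be reproduced with a short sketch of Equations (1) and (2). The function names are chosen here for illustration, and the handling of a single-character region (where the denominator would be zero) is an assumption, since the text does not cover that case.

```python
def index1(region_text: str, word: str) -> int:
    """INDEX1(N, K): 0-based index of the first character of the first
    occurrence of word K in region N."""
    start = region_text.find(word)
    if start < 0:
        raise ValueError(f"{word!r} does not occur in {region_text!r}")
    return start


def index2(region_text: str, word: str) -> int:
    """INDEX2(N, K): 0-based index of the last character of the last
    occurrence of word K in region N."""
    start = region_text.rfind(word)
    if start < 0:
        raise ValueError(f"{word!r} does not occur in {region_text!r}")
    return start + len(word) - 1


def relative_position(index: int, region_text: str) -> float:
    """Divide a character index by (number of characters in N) - 1."""
    denominator = len(region_text) - 1
    # Assumption: a one-character region is treated as position 0.0.
    return index / denominator if denominator > 0 else 0.0


def position_t1(region_text: str, word: str) -> float:
    """Equation (1): position of the first recognized word in its region."""
    return relative_position(index1(region_text, word), region_text)


def position_t2(region_text: str, word: str) -> float:
    """Equation (2): position of the second recognized word in its region."""
    return relative_position(index2(region_text, word), region_text)


t1 = position_t1("関東地方", "関東")    # 0 / (4 - 1)
t2 = position_t2("明日の天気", "天気")  # 4 / (5 - 1)
```

Python strings are indexed by Unicode code point, so each kanji or kana character counts as one character, matching the counting used in the text.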

Calculating t1 and t2 in this way makes the following evaluations possible. The position t1 of the first recognized word, calculated as described above, takes a lower value the closer the first recognized word "関東" is to the beginning of the first region E2, and a higher value the farther it is from the beginning. The position t1 can therefore be used to evaluate how close the first recognized word is to the beginning of the first region E2. Likewise, the position t2 of the second recognized word, calculated as described above, takes a higher value the closer the second recognized word "天気" is to the end of the second region E1, and a lower value the farther it is from the end. The position t2 can therefore be used to evaluate how close the second recognized word is to the end of the second region E1.

The estimation unit 15d is a processing unit that estimates, among the regions contained in the displayed slide, the region corresponding to the utterance location.

In one embodiment, when the first region and the second region differ, the first recognized word does not appear in more than one region of the displayed slide, and the number of extracted words contained in the first region is less than the predetermined value, the estimation unit 15d estimates the first region to be the utterance location. When the first region and the second region differ, the first recognized word does not appear in more than one region of the displayed slide, and the number of extracted words contained in the first region is at least the predetermined value, the estimation unit 15d decides by the following determination whether to estimate the first region or the region with the most recognized words as the utterance location. That is, by determining whether the position of the first recognized word is within a predetermined range from the beginning of the first region and the position of the second recognized word is within a predetermined range from the end of the second region, the estimation unit 15d determines in more detail whether the presentation has reached the point of moving between regions.

Specifically, the estimation unit 15d determines whether the position t1 of the first recognized word within its region is at most a predetermined threshold Th1, for example 0.2. If t1 is at most Th1, the estimation unit 15d further determines whether the position t2 of the second recognized word within its region is at least a predetermined threshold Th2, for example 0.8. If t1 is at most Th1 and t2 is at least Th2, it can be inferred that the presentation is most likely proceeding as written on the slide and has just moved between regions. In that case, the estimation unit 15d estimates the first region to be the utterance location. If t1 is greater than Th1 or t2 is less than Th2, the possibility remains that the presentation has not just moved between regions. In that case, the region with the most recognized words is estimated to be the utterance location. For example, the estimation unit 15d counts, for each region contained in the displayed slide, the number of recognized words associated with that region's index, and estimates the region with the largest count to be the utterance location.
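The decision described for the estimation unit 15d can be summarized in a small sketch. The function and parameter names are hypothetical. The text states the outcome only for the case where the two regions differ and the first recognized word is unique to one region; falling back to the region with the most recognized words in every other case is an assumption made here.

```python
from typing import Dict, Optional


def most_recognized_region(counts: Dict[str, int]) -> Optional[str]:
    """Region with the largest number of recognized words (None if empty)."""
    if not counts:
        return None
    return max(counts, key=counts.get)


def estimate_utterance_region(first_region: str, second_region: str,
                              first_word_is_unique: bool,
                              first_region_word_count: int,
                              t1: float, t2: float,
                              recognized_counts: Dict[str, int],
                              th1: float = 0.2, th2: float = 0.8,
                              min_words: int = 2) -> Optional[str]:
    """Pick the region to highlight.

    th1, th2 and min_words default to the example values in the text
    (0.2, 0.8 and 2).
    """
    if first_region != second_region and first_word_is_unique:
        if first_region_word_count < min_words:
            # Few words in the region: highlight it right away.
            return first_region
        if t1 <= th1 and t2 >= th2:
            # First word near the start of its region and second word near
            # the end of its region: treat this as a fresh transition.
            return first_region
    # Assumed fallback for all remaining cases.
    return most_recognized_region(recognized_counts)


# FIG. 2 example: "関東" in E2 (t1 = 0), "天気" in E1 (t2 = 1).
region = estimate_utterance_region("E2", "E1", True, 2, 0.0, 1.0,
                                   {"E1": 2, "E2": 1})
```

With the FIG. 2 values, the sketch selects E2 even though E1 has more recognized words, which is the behavior the position-based check is meant to achieve.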

Estimating the region corresponding to the utterance location from the positions of the first and second recognized words in this way reduces the chance that a change to the display state of the utterance location is missed. In the example of FIG. 2, if the speaker says "明日の天気です。関東ですが・・・" while the slide S1 is displayed on the display device 5, the number of recognized words is 2 for region E1 and 1 for region E2. If the region with the most recognized words were always estimated to be the utterance location, region E1 would be highlighted even though the presentation has moved on to region E2. If the presentation then moved on to region E3 without the extracted word "地方" in region E2 being uttered, region E2 would be unlikely ever to have the most recognized words, and its highlighting could be missed. In this embodiment, by contrast, the position t1 of the first recognized word is 0 and the position t2 of the second recognized word is 1, so the conditions that t1 is at most the threshold Th1 (0.2) and t2 is at least the threshold Th2 (0.8) are both satisfied. As a result, the first region E2 is estimated to be the utterance location, which reduces missed highlighting of region E2.

Although the case where the region with the most recognized words is estimated to be the utterance location has been illustrated here, any other known method can be used instead. For example, a score can be calculated for each region by applying weights according to arbitrary parameters, such as the frequency with which a recognized word appears in the slide, and the region with the highest score can be estimated to be the utterance location.
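One possible form of such weighted scoring is sketched below. The text leaves the weighting open, so the specific weight used here (each recognized word contributes 1 divided by the number of regions in which it appears) is purely an assumption for illustration, as are the function names and the sample word-to-region mapping.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def score_regions(recognized_words: List[str],
                  word_regions: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each region; words that occur in many regions count for less."""
    scores: Dict[str, float] = defaultdict(float)
    for word in recognized_words:
        regions = word_regions.get(word, [])
        for region in regions:
            # Assumed weight: inverse of the word's region frequency.
            scores[region] += 1.0 / len(regions)
    return dict(scores)


def best_region(scores: Dict[str, float]) -> Optional[str]:
    """Region with the highest score, or None if nothing was scored."""
    return max(scores, key=scores.get) if scores else None


# Hypothetical mapping: "天気" is assumed to appear in two regions.
word_regions = {"明日": ["E1"], "天気": ["E1", "E3"], "関東": ["E2"]}
scores = score_regions(["天気", "関東"], word_regions)
```

In this made-up case, "関東" identifies E2 unambiguously, so E2 outscores E1 and E3, which each receive only half a point from the shared word "天気".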

The estimation unit 15d can also add, as a further requirement, whether the distance between the first region and the second region is within a predetermined threshold, for example α lines, and estimate the first region to be the utterance location only when the distance is within that threshold. This determination is worth adding because, when the regions are close together, the presentation is more likely to be proceeding as written on the slide.

The display control unit 15e is a processing unit that executes display control for the display device 5. Of the display control executed by the display control unit 15e, the following describes one aspect each of display control for slides and display control for highlighting.

[Slide display control]
As one aspect, when a document file is opened by the presentation software, the display control unit 15e causes the display device 5 to display a slide contained in that document file. The display control unit 15e may display the slide on the first page of the document file, or the slide on the page that was edited last. After receiving the instruction to start the presentation, the display control unit 15e causes the display device 5 to display the slide containing the region that corresponds to the utterance location estimated by the estimation unit 15d. When the display control unit 15e receives a page switching instruction via the input device 7, it changes the slide displayed on the display device 5. For example, when an operation to advance the page is received, the display control unit 15e causes the display device 5 to display the slide on the page after the one being displayed. When an operation to go back a page is received, the display control unit 15e causes the display device 5 to display the slide on the page before the one being displayed.

[Highlight display control]
As another aspect, the display control unit 15e repeatedly executes the following processing from the time it receives the instruction to start the presentation until it receives the instruction to end it. That is, the display control unit 15e highlights the region of the utterance location estimated by the estimation unit 15d. "Highlighting" here is not limited to highlighting in the narrow sense, that is, display control that brightens or inverts the background color, but means highlighting in a broad sense. For example, any form of emphasis can be used, such as enclosing the explained portion in a frame, emphasizing its fill, or emphasizing its font (font size, underline, or italics). The highlighted display may be returned to the normal display when a cancel operation is received via the input device 7. Naturally, when the estimation unit 15d outputs no region as the explained portion, for example when there are no recognized words, no highlighting is performed on the displayed slide.

[Process flow]
Next, the flow of processing of the presentation support device 10 according to this embodiment will be described. The description covers, in order, (1) the extracted word data generation processing, (2) the voice recognition processing, and (3) the display control processing executed by the presentation support device 10.

(1) Extracted word data generation processing
FIG. 3 is a flowchart showing the procedure of the extracted word data generation processing according to the first embodiment. This processing can be started automatically or by manual setting. When it is started automatically, it can be triggered when the presentation software saves a document file in the storage unit 13 and closes it, or when a document file is saved in the storage unit 13 while it is being edited via the presentation software. When it is started by manual setting, it can be triggered when an instruction to execute presentation preprocessing is received via the input device 7. In either case, the processing begins by reading, from the document files contained in the document data 13a stored in the storage unit 13, the document file that corresponds to the save operation or to the preprocessing execution instruction.

As shown in FIG. 3, the extraction unit 15a divides a slide contained in the document file into a plurality of regions in units such as a sentence, a line, or a paragraph (Step S101). The extraction unit 15a then assigns to each region obtained in Step S101 an index that identifies the region (Step S102).

The extraction unit 15a then selects one of the indexes assigned in Step S102 (Step S103). Next, the extraction unit 15a extracts the words whose part of speech is noun from the morphemes obtained by executing morphological analysis or the like on the character string in the region of the index selected in Step S103 (Step S104). The extraction unit 15a then attaches to each word extracted in Step S104 the index assigned to the region containing that word (Step S105).

The extraction unit 15a repeats the processing of Steps S103 to S105 until all the indexes assigned in Step S102 have been selected (Step S106 No).

When all the indexes assigned in Step S102 have been selected (Step S106 Yes), the extraction unit 15a registers in the storage unit 13 the extracted word data 13b, in which the reading of each word k contained in the slide is associated with its index idx (Step S107), and ends the processing.
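Steps S101 to S107 can be sketched as follows. This is a simplified illustration under several assumptions: regions are taken to be non-empty lines, the noun extractor is passed in as a function because real morphological analysis needs an external analyzer, and the word's surface form is stored in place of its reading. The names `build_extracted_words` and `toy_noun_extractor`, and the small vocabulary, are hypothetical.

```python
from typing import Callable, List, Tuple


def build_extracted_words(
        slide_text: str,
        extract_nouns: Callable[[str], List[str]]) -> List[Tuple[str, int]]:
    """Return (word, region index) records for one slide."""
    # S101: divide the slide into regions (here: one region per non-empty line).
    regions = [line for line in slide_text.splitlines() if line.strip()]
    records: List[Tuple[str, int]] = []
    # S102/S103: enumerate() assigns an index to each region and visits each once.
    for idx, region in enumerate(regions):
        # S104: extract the nouns found in the region's character string.
        for noun in extract_nouns(region):
            # S105: attach the region's index to the extracted word.
            records.append((noun, idx))
    # S106/S107: all indexes processed; the caller stores the records.
    return records


def toy_noun_extractor(text: str) -> List[str]:
    """Stand-in for morphological analysis: match against a fixed word list."""
    vocabulary = ["明日", "天気", "関東", "地方"]  # hypothetical noun list
    return [word for word in vocabulary if word in text]


records = build_extracted_words("明日の天気\n関東地方", toy_noun_extractor)
```

In practice the toy extractor would be replaced by a call into a morphological analyzer that returns part-of-speech tags and readings, with only the noun entries retained.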

(2) Voice recognition processing
FIG. 4 is a flowchart showing the procedure of the voice recognition processing according to the first embodiment. This processing is started when the presentation software, with a document file open, receives an instruction to start a presentation, and it is executed repeatedly until an instruction to end the presentation is received.

As shown in FIG. 4, the recognition unit 15b waits until an audio signal of a predetermined length is input from the microphone 1, for example an audio signal of at least one frame, for example 10 msec (Step S301).

When a voice signal of the predetermined length is input from the microphone 1 (Yes in step S301), the recognition unit 15b performs voice recognition such as word spotting on the voice signal (step S302). When word spotting is performed in step S302, the dictionary data for voice recognition is the part of the extracted word data 13b stored in the storage unit 13 that relates to the slide which belongs to the document file being run by the presentation software and which is currently displayed on the display device 5.

If a word is recognized from the voice signal (Yes in step S303), the recognition unit 15b registers in the storage unit 13 the recognized word data 13c, in which the word recognized in step S302 is associated with the time at which it was recognized (step S304), and proceeds to step S305.

If, on the other hand, no voice signal of the predetermined length has been input from the microphone 1, or no word was recognized from the voice signal (No in step S301 or No in step S303), the subsequent steps are skipped and the processing proceeds to step S305.

The recognition unit 15b then determines whether the recognized word data 13c stored in the storage unit 13 contains a word for which a predetermined period has elapsed since it was registered in the storage unit 13 (step S305). If such a word exists (Yes in step S305), the recognition unit 15b deletes the record for that word from the recognized word data 13c (step S306). If no such word exists (No in step S305), step S306 is skipped and the processing proceeds to step S307.

After that, the recognition unit 15b determines whether the slide page displayed on the display device 5 has changed (step S307). If the page has changed (Yes in step S307), the recognition unit 15b deletes the recognized word data 13c stored in the storage unit 13 (step S308), returns to step S301, and repeats the processing from step S301 onward. If the page has not changed (No in step S307), the processing returns to step S301 without executing step S308.
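The bookkeeping in steps S304 to S308 can be sketched as a small in-memory store. This is an illustrative sketch only: `RecognizedWordStore`, `step`, and the 3-second retention value are assumptions made for the example, and the word recognizer itself (steps S301 to S303) is left outside the sketch.

```python
class RecognizedWordStore:
    """Hypothetical in-memory stand-in for the recognized word data 13c."""

    def __init__(self, retention_sec):
        self.retention_sec = retention_sec  # the "predetermined period" (value assumed)
        self.records = []                   # list of (word, recognized_at) tuples
        self.page = None                    # slide page seen on the previous pass

    def step(self, now, page, recognized_word=None):
        """Run one pass of steps S304 to S308 at time `now` for the displayed `page`."""
        if recognized_word is not None:     # Yes in S303: register word and time (S304)
            self.records.append((recognized_word, now))
        # S305, S306: delete records whose retention period has elapsed.
        self.records = [(w, t) for (w, t) in self.records
                        if now - t < self.retention_sec]
        # S307, S308: a page change clears all recognized words.
        if self.page is not None and page != self.page:
            self.records.clear()
        self.page = page


store = RecognizedWordStore(retention_sec=3.0)
store.step(now=0.0, page=1, recognized_word="budget")
store.step(now=1.0, page=1, recognized_word="schedule")
store.step(now=3.5, page=1)   # "budget" has aged out, "schedule" remains
after_expiry = list(store.records)
store.step(now=4.0, page=2)   # page change clears the store
after_page_change = list(store.records)
```

Expiring old records keeps the later region estimate focused on recent speech, and clearing on a page change prevents words from the previous slide from influencing the new one.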

(3) Display control processing
FIG. 5 is a flowchart showing the procedure of the display control processing according to the first embodiment. As an example, this processing is executed in parallel with the voice recognition processing shown in FIG. 4; it starts when the presentation software, with a document file open, accepts an instruction to start a presentation, and it is executed repeatedly until an instruction to end the presentation is accepted. The cycle at which the processing is repeated may be the same as or different from that of the voice recognition processing shown in FIG. 4, and the processing may run either synchronously or asynchronously with the voice recognition processing.

As shown in FIG. 5, the calculation unit 15c reads, from the recognized word data 13c stored in the storage unit 13, the records of the first recognized word, which has the latest entry time, and the second recognized word, which was recognized immediately before the first recognized word (step S501).

Next, the calculation unit 15c determines whether the index associated with the first recognized word differs from the index associated with the second recognized word, that is, whether the first region and the second region are different (step S502).

If the index associated with the first recognized word differs from the index associated with the second recognized word (Yes in step S502), it can be estimated that the presentation has more likely than not moved from the region explained so far to a new region that contains the description for the next explanation. In this case, the calculation unit 15c further determines whether the first recognized word is a word that appears in more than one region of the slide being displayed on the display device 5 (step S503).

If the first recognized word does not appear in more than one region of the displayed slide (No in step S503), it becomes even more likely that the presentation has moved from the region explained so far to a new region that contains the description for the next explanation. In this case, the calculation unit 15c determines whether the number of extracted words associated with the index of the first region in the extracted word data 13b stored in the storage unit 13 is equal to or greater than a predetermined value, for example 2 (step S504).

If the number of extracted words in the first region is equal to or greater than the predetermined value (Yes in step S504), the calculation unit 15c calculates the position of the first recognized word within its region and the position of the second recognized word within its region (step S505). These positions serve as parameters for determining more precisely whether the presentation has reached the point of moving between regions.

The estimation unit 15d then determines whether the position t1 of the first recognized word within its region is equal to or less than a predetermined threshold Th1, for example 0.2 (step S506). If the position t1 is equal to or less than the threshold Th1 (Yes in step S506), the estimation unit 15d further determines whether the position t2 of the second recognized word within its region is equal to or greater than a predetermined threshold Th2, for example 0.8 (step S507).

If the position t1 is equal to or less than the threshold Th1 and the position t2 is equal to or greater than the threshold Th2, it can be inferred that the presentation is very likely proceeding in the order written on the slide and has just moved from one region to the next. In this case, the estimation unit 15d further determines whether the distance between the first region and the second region is within a predetermined threshold, for example within α lines (step S508). If the distance between the first region and the second region is within the threshold (Yes in step S508), the estimation unit 15d estimates the first region to be the utterance location (step S509).

If the number of extracted words in the first region is less than the predetermined value (No in step S504), the first region is likely to be missed for highlighting unless it is highlighted as soon as the first recognized word is obtained by voice recognition. In this case too, the estimation unit 15d estimates the first region to be the utterance location (step S509).

On the other hand, the estimation unit 15d estimates the region with the largest number of recognized words to be the utterance location (step S510) in any of the following cases: the index associated with the first recognized word is the same as the index associated with the second recognized word (No in step S502); the first recognized word appears in more than one region of the displayed slide (Yes in step S503); the position t1 of the first recognized word is not equal to or less than the threshold Th1 (No in step S506); the position t2 of the second recognized word is not equal to or greater than the threshold Th2 (No in step S507); or the distance between the first region and the second region is not within the threshold (No in step S508).

After that, the display control unit 15e highlights the region estimated to be the utterance location in step S509 or step S510 (step S511) and ends the processing.

The determinations in step S502, step S503, and steps S506 to S508 shown in FIG. 5 need not be executed in the illustrated order; they may be executed in any order or in parallel.
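The branch structure of steps S502 to S510 can be summarized in a short Python sketch. It is a reading of the flow under stated assumptions rather than the implementation of the calculation unit 15c or the estimation unit 15d: the names `Recognized` and `estimate_region` are hypothetical, the position of a word is assumed to be a relative value from 0.0 (start of the region) to 1.0 (end), the distance between regions is assumed to be a difference in line numbers, and the default `alpha=1` is an arbitrary stand-in for α.

```python
from dataclasses import dataclass


@dataclass
class Recognized:
    word: str
    idx: int      # index of the region that contains the word
    pos: float    # assumed: relative position in the region, 0.0 = start, 1.0 = end


def estimate_region(first, second, regions_of_word, extracted_count,
                    recognized_count, region_line,
                    min_words=2, th1=0.2, th2=0.8, alpha=1):
    """Return the index of the region estimated to be the utterance location."""
    # S510 fallback: region with the most recognized words (first one wins a tie).
    fallback = max(recognized_count, key=recognized_count.get)
    if first.idx == second.idx:                      # No in S502
        return fallback
    if len(regions_of_word[first.word]) > 1:         # Yes in S503
        return fallback
    if extracted_count[first.idx] < min_words:       # No in S504: highlight at once
        return first.idx                             # S509
    if first.pos > th1 or second.pos < th2:          # No in S506 or S507
        return fallback
    if abs(region_line[first.idx] - region_line[second.idx]) > alpha:  # No in S508
        return fallback
    return first.idx                                 # S509


regions_of_word = {"cost": {1}, "schedule": {2}}
recognized_count = {1: 4, 2: 1}
region_line = {1: 1, 2: 2}

# Region 2 holds a single extracted word, so it is highlighted immediately.
sparse = estimate_region(Recognized("schedule", 2, 0.0), Recognized("cost", 1, 0.9),
                         regions_of_word, {1: 5, 2: 1}, recognized_count, region_line)

# Region 2 holds several words; t1 <= Th1, t2 >= Th2, and the regions are adjacent.
transition = estimate_region(Recognized("schedule", 2, 0.1), Recognized("cost", 1, 0.9),
                             regions_of_word, {1: 5, 2: 4}, recognized_count, region_line)

# The first word sits mid-region, so the most frequently recognized region wins.
mid_region = estimate_region(Recognized("schedule", 2, 0.5), Recognized("cost", 1, 0.9),
                             regions_of_word, {1: 5, 2: 4}, recognized_count, region_line)
```

Because each test is an independent early return, the checks can be reordered or evaluated in parallel without changing the result, in line with the note above.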

[One aspect of the effects]
As described above, the presentation support device 10 according to the present embodiment changes the display state of the region in which the later of two recognized words appears when the two words, recognized from speech, fall in different regions of the slide. Therefore, when a word in a region with a small absolute number of words is uttered, the display state of that region can be changed to a highlight or the like even if words in other regions have been uttered more frequently, which keeps regions with few words from being skipped without a change in display state. The presentation support device 10 according to the present embodiment can therefore suppress omissions in changing the display state of the utterance location.

In addition, when the number of extracted words in the first region is equal to or greater than the predetermined value, the presentation support device 10 according to the present embodiment determines whether the position of the first recognized word is within a predetermined range from the beginning of the first region and the position of the second recognized word is within a predetermined range from the end of the second region. The presentation support device 10 according to the present embodiment can therefore switch the highlight from the first region to the second region quickly.

The first embodiment above assumes a conference, a lecture, or the like as an example of a presentation, but presentations are not limited to such settings. A setting in which the audio of a play, a film, or the like is output together with video according to a predetermined scenario also falls within the category of presentations.

The present embodiment therefore describes a form in which the document data 13a, the extracted word data 13b, and the recognized word data 13c relating to the scenario of a play or film, for example its lines of dialogue, are stored, and in which, while content such as the play or film is played back on the display device 5, the region of a slide in the document data 13a that corresponds to the utterance location, such as a line of dialogue, is picked out and displayed.

FIG. 6 is a block diagram showing the functional configuration of the presentation support device 20 according to the second embodiment. The presentation support device 20 shown in FIG. 6 differs from the presentation support device 10 shown in FIG. 1 in that video data 21a is stored in the storage unit 21 and in that the control unit 23 has a display control unit 23a whose function partly differs from that of the display control unit 15e. In the following, parts with the same function as in the presentation support device 10 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The video data 21a shown in FIG. 6 is data on video content such as a moving image. Content such as a play or film can be used as an example of this video content. In this connection, for the dialogue contained in the video content, a document such as a screenplay or script, or a document in which the dialogue has been extracted from it, is stored in the storage unit 21 as the document data 13a.

The display control unit 23a shown in FIG. 6 is the same as the display control unit 15e shown in FIG. 1 in that it controls display on the display device 5, but the content of that control differs. Specifically, instead of displaying a slide of the document data 13a, the display control unit 23a plays back the video content contained in the video data 21a on the display device 5. The display control unit 23a then superimposes on the video content the dialogue corresponding to the region of the document data 13a that the estimation unit 15d has estimated to be the utterance location. In other words, the display control unit 23a changes the display state of the region estimated to be the utterance location by excerpting and displaying the corresponding dialogue. When dialogue is superimposed on the video content in this way, the position and size at which it is displayed as a character string are arbitrary; as an example, the same display method as superimposed subtitles can be adopted.

By implementing the video data 21a and the display control unit 23a described above, the presentation support device 20 according to the present embodiment can realize a presentation in which the region corresponding to the utterance location, such as a line of dialogue in a play or film, is picked out and displayed. Like the first embodiment, this presentation support device 20 can also suppress omissions in changing the display state of the utterance location.

Embodiments of the disclosed device have been described so far, but the present invention may be implemented in various forms other than the embodiments described above. Other embodiments included in the present invention are therefore described below.

[Application examples of document files]
The first embodiment above illustrates the case where a document created by presentation software is used, but a document created by another application program can also be used. For example, the processing shown in FIGS. 3 to 5 can be applied in the same way by treating each page of a word-processor document file as a slide, or each sheet of a spreadsheet document file as a slide.

[Distribution and integration]
The components of each illustrated device need not be physically configured as shown in FIG. 1 or FIG. 6. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, or the display control unit 15e or the display control unit 23a may be connected via a network as an external device of the presentation support device 10 or the presentation support device 20. Alternatively, separate devices may each have the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, or the display control unit 15e or the display control unit 23a, and cooperate over a network to realize the functions of the presentation support device 10 or the presentation support device 20.

[Other implementation examples]
The first embodiment above illustrates the case where the presentation support device 10 or the presentation support device 20 executes the processing of FIGS. 3 to 5 standalone, running the presentation software on its own without depending on external resources, but other implementations can also be adopted. For example, a client-server system can be built by providing, for a client that runs the presentation software, a server that executes some or all of the processing of FIGS. 3 to 5. In this case, the server device can be implemented by installing, as packaged software or online software, a presentation support program that realizes the presentation support service described above. For example, the server device may be implemented as a Web server that provides the presentation support service, or as a cloud that provides the presentation support service through outsourcing. In this case, the presentation starts after the client has uploaded to the server device an instruction to start highlighting, for example at least information that specifies the document file to be used in the presentation. Once the presentation has started, the client uploads the voice signal collected by the microphone 1 or the result of the voice recognition processing, and uploads the slide page information each time the slide page displayed on the display device 5 changes. In other words, the extracted word data generation processing and the voice recognition processing may be executed on either the client side or the server side. This enables the server device to execute at least the processing shown in FIG. 5.

Furthermore, a thin client system can be built by having the client transmit operation information from an input device (not shown) to the server and display on the display device 5 only the processing results transmitted from the server. In this case, various resources, for example the document data, are also held by the server, and the presentation software is also implemented on the server as a virtual machine. For example, when the presentation software runs on the client side, the server need only transmit to the client identification information for the region to be highlighted, for example the index of the region described above. When the system is implemented as a thin client system, the server need only transmit to the client the display data of the slide with the explained location highlighted, or the difference data relative to the screen before highlighting.

The first embodiment assumes that presentation software incorporating the presentation support processing described above is executed, but when a request to reference the presentation support program as a library is received from a client with license authority, the presentation support program can also be plugged into the presentation software.
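The client-server exchange described above, in which the presentation software runs on the client and the server returns only the index of the region to highlight, can be sketched as follows. Everything in the sketch is hypothetical: the message fields (`type`, `document`, `page`, `idx`), the class name `HighlightSession`, and the file name `deck.pptx` are invented for illustration, voice recognition is assumed to run on the client, and the server applies only the "most recognized words" rule of step S510 rather than the full flow of FIG. 5.

```python
from collections import Counter


class HighlightSession:
    """Hypothetical server-side state for one client's presentation."""

    def __init__(self):
        self.document = None
        self.page = None
        self.region_hits = Counter()   # region index -> number of recognized words

    def handle(self, message):
        kind = message["type"]
        if kind == "start":            # highlight start instruction naming the document file
            self.document = message["document"]
            return {"type": "ack"}
        if kind == "page":             # slide page changed: discard recognized words
            self.page = message["page"]
            self.region_hits.clear()
            return {"type": "ack"}
        if kind == "word":             # recognition result carrying the region index of the word
            self.region_hits[message["idx"]] += 1
            idx, _count = self.region_hits.most_common(1)[0]
            return {"type": "highlight", "idx": idx}   # the server sends only the region index
        raise ValueError(f"unknown message type: {kind}")


session = HighlightSession()
replies = [
    session.handle({"type": "start", "document": "deck.pptx"}),
    session.handle({"type": "page", "page": 1}),
    session.handle({"type": "word", "idx": 1}),
    session.handle({"type": "word", "idx": 2}),
    session.handle({"type": "word", "idx": 2}),
]
```

Returning only a region index keeps the server-to-client payload small; a thin client variant would instead send rendered display data or a difference from the previous screen.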

[Application example 1 to a thin client system]
FIG. 7 is a diagram showing a configuration example of the presentation support system 3 according to the third embodiment. FIG. 7 shows an example in which the presentation support device 10 shown in FIG. 1 is implemented as a thin client system. In the presentation support system 3 shown in FIG. 7, as an example, the client terminal 30 is given only minimal functions, and the server device 300 manages resources such as applications and files. A thin client system is illustrated here as one form of the presentation support system 3, but, as described later, the presentation support service can also be applied to a general-purpose client-server system.

As shown in FIG. 7, the presentation support system 3 includes a client terminal 30 and a server device 300.

An information processing device such as a desktop or notebook personal computer can be used as the client terminal 30. Besides stationary terminals such as these personal computers, various mobile terminal devices can also be used as the client terminal 30. Examples of mobile terminal devices include smartphones, mobile communication terminals such as mobile phones and PHS handsets, and slate terminals such as PDAs.

The server device 300 is a computer that provides the presentation support service described above.

As one embodiment, the server device 300 can be implemented by installing, as packaged software or online software, a presentation support program that realizes the presentation support service. For example, the server device 300 may be implemented as a Web server that provides the presentation support service, or as a cloud that provides the presentation support service through outsourcing.

The client terminal 30 and the server device 300 are connected via a network NW so that they can communicate with each other. Any type of communication network, wired or wireless, can be used as the network NW, including the Internet, a LAN, or a VPN (Virtual Private Network).

As shown in FIG. 7, the client terminal 30 has the microphone 1, the display device 5, the input device 7, and a data transfer unit 34. In FIG. 7, functional units that perform the same functions as those shown in FIG. 1, for example the microphone, the display device, and the input device, are given the same reference numerals, and their description is omitted.

The data transfer unit 34 is a processing unit that controls the exchange of various data with the server device 300.

As one embodiment, the data transfer unit 34 is realized virtually when a processor such as a CPU of the client terminal 30 executes a client program of the thin client system.

For example, the data transfer unit 34 transmits to the server device 300 the voice data input through the microphone 1 as well as the operation information accepted by the input device 7. The data transfer unit 34 also receives the desktop screen that contains the execution result of the presentation software run on the server device 300, that is, the display data to be shown on the screen of the display device 5. For example, when the presentation software displays a document file as a slide show, the window generated by the presentation software is displayed full screen, so the desktop screen and the window screen have the same display content. The data transfer unit 34 can receive the desktop screen display data transmitted by the server device 300 at an arbitrary frame rate, or it can receive the display data only when the desktop screen has changed. The desktop screen display data transmitted from the server device 300 may be the entire desktop screen or a part of it, for example the display data of the difference between frames.
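Transmitting only the difference between frames can be illustrated with a row-level diff. This is a simplified sketch under stated assumptions: a frame is modeled as a list of rows of equal count, and `frame_diff` and `apply_diff` are hypothetical helper names; the description does not specify how the difference data is computed or encoded.

```python
def frame_diff(previous, current):
    """Return [(row_index, row)] for the rows of `current` that differ from `previous`."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same number of rows")
    return [(i, row)
            for i, (old, row) in enumerate(zip(previous, current))
            if old != row]


def apply_diff(previous, diff):
    """Rebuild the current frame on the client from the previous frame and a diff."""
    frame = list(previous)
    for i, row in diff:
        frame[i] = row
    return frame


before = ["....", "....", "...."]   # desktop screen before highlighting
after = ["....", "####", "...."]    # the second row is now highlighted
diff = frame_diff(before, after)    # only the changed row is transmitted
```

When nothing has changed the diff is empty, which corresponds to the case where the server has no display data to send.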

The various data exchanged between the client terminal 30 and the server device 300 in this way may be compression-encoded to reduce traffic, and may be encrypted by various methods for security.

As shown in FIG. 7, the server device 300 has a storage unit 320 and a control unit 340. In addition to the functional units shown in FIG. 7, the server device 300 may have various functional units found in known computers, for example a communication I/F unit that controls communication with other devices.

記憶部３２０は、制御部３４０で実行されるＯＳやプレゼンテーションソフトを始め、アプリケーションプログラムなどの各種プログラムに用いられるデータを記憶するデバイスである。 The storage unit 320 is a device that stores data used for various programs such as an application program including an OS and presentation software executed by the control unit 340.

一実施形態として、記憶部３２０は、サーバ装置３００における主記憶装置として実装される。例えば、記憶部３２０には、各種の半導体メモリ素子、例えばＲＡＭやフラッシュメモリを採用できる。また、記憶部３２０は、補助記憶装置として実装することもできる。この場合、ＨＤＤ、光ディスクやＳＳＤなどを採用できる。 As one embodiment, the storage unit 320 is implemented as a main storage device in the server device 300. For example, the storage unit 320 can employ various semiconductor memory devices, such as a RAM or a flash memory. The storage unit 320 can also be implemented as an auxiliary storage device. In this case, HDD, optical disk, SSD, etc. can be adopted.

例えば、記憶部３２０は、制御部３４０で実行されるプログラムに用いられるデータの一例として、図７に示す文書データ３２１、抽出単語データ３２２及び認識単語データ３２３を記憶する。これら文書データ３２１、抽出単語データ３２２及び認識単語データ３２３は、サーバ装置３００に接続されるクライアント端末３０のうちいずれのクライアント端末３０に関するデータであるのかがサーバ装置３００で識別できるように、文書データ３２１、抽出単語データ３２２及び認識単語データ３２３が格納される記憶領域がクライアント端末３０の識別情報ごとに区別されたり、あるいは文書データ３２１、抽出単語データ３２２及び認識単語データ３２３がクライアント端末３０の識別情報とさらに対応付けられたりする他は、図１に示した文書データ１３ａ、抽出単語データ１３ｂ及び認識単語データ１３ｃと同様のデータである。 For example, the storage unit 320 stores the document data 321, the extracted word data 322, and the recognition word data 323 illustrated in FIG. 7 as an example of the data used in the program executed by the control unit 340. The document data 321, the extracted word data 322, and the recognition word data 323 are document data so that the server device 300 can identify which of the client terminals 30 is connected to the server device 300. 321, the storage area in which the extracted word data 322 and the recognized word data 323 are stored is distinguished according to the identification information of the client terminal 30, or the document data 321, the extracted word data 322, and the recognized word data 323 identify the client terminal 30. It is the same data as the document data 13a, the extracted word data 13b, and the recognition word data 13c shown in FIG. 1 except that it is further associated with information.

制御部３４０は、各種のプログラムや制御データを格納する内部メモリを有し、これらによって種々の処理を実行するものである。 The control unit 340 has an internal memory that stores various programs and control data, and executes various processes by these.

一実施形態として、制御部３４０は、中央処理装置、いわゆるＣＰＵとして実装される。なお、制御部３４０は、必ずしも中央処理装置として実装されずともよく、ＭＰＵやＤＳＰとして実装されることとしてもよい。また、制御部３４０は、ＡＳＩＣやＦＰＧＡなどのハードワイヤードロジックによっても実現できる。 In one embodiment, the control unit 340 is implemented as a central processing unit, so-called CPU. The control unit 340 does not necessarily have to be implemented as a central processing unit, and may be implemented as an MPU or DSP. The control unit 340 can also be realized by hard-wired logic such as ASIC or FPGA.

制御部３４０は、各種のプログラムを実行することによって下記の処理部を仮想的に実現する。例えば、制御部３４０は、図７に示すように、抽出部３４１と、認識部３４２と、算出部３４３と、推定部３４４と、表示制御部３４５とを有する。 The control unit 340 virtually implements the following processing units by executing various programs. For example, the control unit 340 includes an extraction unit 341, a recognition unit 342, a calculation unit 343, an estimation unit 344, and a display control unit 345, as illustrated in FIG. 7.

The extraction unit 341, the recognition unit 342, the calculation unit 343, and the estimation unit 344 shown in FIG. 7 are processing units that execute the same processing as the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, and the estimation unit 15d shown in FIG. 1.

表示制御部３４５は、クライアント端末３０の表示装置５に対する表示制御を実行する処理部である。 The display control unit 345 is a processing unit that executes display control of the display device 5 of the client terminal 30.

ここで、表示制御部３４５は、クライアント端末３０のデスクトップ画面、すなわち表示装置５のスクリーンに表示させる表示データを所定のフレームレート、あるいはデスクトップ画面の更新を契機に送信する。このとき、表示制御部３４５は、デスクトップ画面に更新がない場合、必ずしもデスクトップ画面の表示データをクライアント端末３０へ伝送せずともかまわない。さらに、表示制御部３４５は、デスクトップ画面の全体の表示データを送信することとしてもよいし、デスクトップ画面の一部、例えばフレーム間の差分の表示データを送信することとしてもかまわない。このようなデスクトップ画面の伝送と並行して、表示制御部３４５は、図１に示した表示制御部１５ｅと同様に、クライアント端末３０から伝送される入力装置７の操作情報にしたがって上記のスライドの表示制御を実行したり、さらには、上記のハイライトの表示制御などを実行することにより、プレゼンテーションソフトにより生成されるウィンドウ画面の表示データを更新する。このようにしてデスクトップ画面の伝送時にウィンドウ画面の更新内容がサーバ装置３００からクライアント端末３０へ伝送されることになる。 Here, the display control unit 345 transmits display data to be displayed on the desktop screen of the client terminal 30, that is, the screen of the display device 5 at a predetermined frame rate or when the desktop screen is updated. At this time, the display control unit 345 may not necessarily transmit the display data of the desktop screen to the client terminal 30 when the desktop screen is not updated. Furthermore, the display control unit 345 may transmit display data of the entire desktop screen, or may transmit a part of the desktop screen, for example, display data of a difference between frames. In parallel with the transmission of such a desktop screen, the display control unit 345, similar to the display control unit 15e shown in FIG. 1, displays the above slide according to the operation information of the input device 7 transmitted from the client terminal 30. The display data of the window screen generated by the presentation software is updated by executing the display control, the display control of the highlight, and the like. In this way, when the desktop screen is transmitted, the updated contents of the window screen are transmitted from the server device 300 to the client terminal 30.
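
The transmission policy described above (send nothing when the desktop screen is unchanged, otherwise send either the whole screen or only the inter-frame difference) can be sketched as follows. The frame representation (a tuple of rows of integer pixel values) and the names `frame_diff`, `build_update`, and `apply_update` are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, Optional, Tuple

# Hypothetical frame representation: a tuple of rows of integer pixel values.
Frame = Tuple[Tuple[int, ...], ...]
Diff = Dict[Tuple[int, int], int]


def frame_diff(prev: Optional[Frame], curr: Frame) -> Diff:
    """Return {(row, col): new_value} for every pixel that changed.

    Both frames are assumed to have the same dimensions.
    """
    if prev is None:
        # No previous frame: treat every pixel as changed (full-screen transfer).
        return {(r, c): v for r, row in enumerate(curr) for c, v in enumerate(row)}
    return {
        (r, c): v
        for r, row in enumerate(curr)
        for c, v in enumerate(row)
        if prev[r][c] != v
    }


def build_update(prev: Optional[Frame], curr: Frame) -> Optional[Diff]:
    """Server side: return None when there is no update, else the diff to send."""
    diff = frame_diff(prev, curr)
    return diff or None


def apply_update(frame: Frame, diff: Diff) -> Frame:
    """Client side: apply a received diff to the locally held frame."""
    rows = [list(row) for row in frame]
    for (r, c), v in diff.items():
        rows[r][c] = v
    return tuple(tuple(row) for row in rows)
```

In this sketch a highlight drawn by the presentation software changes only a few pixels of the window, so only those pixels travel to the client terminal.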

As described above, when the presentation support system 3 according to the present embodiment is implemented as a thin client system, the extraction unit 341 of the server device 300 can execute the processing shown in FIG. 3, and the recognition unit 342 can execute the voice recognition processing shown in FIG. 4. This voice recognition processing is unchanged except that, in step S301, the voice data transmitted from the client terminal 30 to the server device 300 is acquired instead of voice data acquired directly from the microphone 1. Furthermore, the calculation unit 343, the estimation unit 344, and the display control unit 345 of the server device 300 can execute the display control processing shown in FIG. 5.

[Application example 2 to a thin client system]
FIG. 8 is a diagram illustrating a configuration example of the presentation support system 4 according to the third embodiment. FIG. 8 shows an example in which the presentation support device 20 shown in FIG. 6 is implemented as a thin client system. In the presentation support system 4 shown in FIG. 8, as an example, the client terminal 40 is given only minimal functions, and the server device 400 manages resources such as applications and files. Although a thin client system is illustrated here as one form of the presentation support system 4, the presentation support service described above can also be applied to a general-purpose client-server system, as described later.

図８に示すように、プレゼンテーション支援システム４には、クライアント端末４０と、サーバ装置４００とが含まれる。 As shown in FIG. 8, the presentation support system 4 includes a client terminal 40 and a server device 400.

クライアント端末４０には、デスクトップ型またはノート型のパーソナルコンピュータなどの情報処理装置を採用することができる。この他、クライアント端末４０には、上記のパーソナルコンピュータなどの据置き型の端末のみならず、各種の携帯端末装置を採用することもできる。例えば、携帯端末装置の一例として、スマートフォン、携帯電話機やＰＨＳなどの移動体通信端末、さらには、ＰＤＡなどのスレート端末などがその範疇に含まれる。 As the client terminal 40, an information processing device such as a desktop or notebook personal computer can be adopted. In addition, the client terminal 40 may be not only the above-mentioned stationary terminal such as a personal computer but also various mobile terminal devices. For example, examples of mobile terminal devices include smartphones, mobile communication terminals such as mobile phones and PHS, and slate terminals such as PDA in its category.

サーバ装置４００は、上記のプレゼンテーション支援サービスを提供するコンピュータである。 The server device 400 is a computer that provides the above-mentioned presentation support service.

一実施形態として、サーバ装置４００は、パッケージソフトウェアやオンラインソフトウェアとして上記のプレゼンテーション支援サービスを実現するプレゼンテーション支援プログラムをインストールさせることによってサーバ装置を実装できる。例えば、サーバ装置４００は、上記のプレゼンテーション支援サービスを提供するＷｅｂサーバとして実装することとしてもよいし、アウトソーシングによって上記のプレゼンテーション支援サービスを提供するクラウドとして実装することとしてもかまわない。 As one embodiment, the server apparatus 400 can be implemented by installing a presentation support program that realizes the above-mentioned presentation support service as package software or online software. For example, the server device 400 may be implemented as a Web server that provides the above-mentioned presentation support service, or may be implemented as a cloud that provides the above-mentioned presentation support service by outsourcing.

これらクライアント端末４０及びサーバ装置４００は、ネットワークＮＷを介して、互いが通信可能な状態で接続される。ネットワークＮＷの一例として、有線または無線を問わず、インターネットを始め、ＬＡＮやＶＰＮなどの任意の種類の通信網を採用できる。 The client terminal 40 and the server device 400 are connected to each other via the network NW so that they can communicate with each other. As an example of the network NW, any type of communication network such as the Internet, LAN, VPN, etc. can be adopted regardless of wired or wireless.

図８に示す通り、クライアント端末４０は、図７に示したマイク１、表示装置５、入力装置７及びデータ授受部３４に加え、映像入力装置８をさらに有する。この映像入力装置８には、一例として、ＣＣＤ（Charge Coupled Device）やＣＭＯＳ（Complementary Metal Oxide Semiconductor）などの撮像素子を搭載する撮像装置を採用できる。これにより、映像データ２１ａを予め保持しておかずとも、映像入力装置８に撮像された演劇などの映像コンテンツをリアルタイムで表示装置５に再生させることもできる。また、映像データ２１ａをサーバ装置４００に保持させておき、映像コンテンツを表示制御部４４５に再生させることもできる。なお、図８には、図１に示した機能部と同様の機能を発揮する機能部、例えばマイク、表示装置及び入力装置に同一の符号を付し、その説明を省略する。 As shown in FIG. 8, the client terminal 40 further includes a video input device 8 in addition to the microphone 1, the display device 5, the input device 7, and the data transfer unit 34 shown in FIG. 7. As the video input device 8, for example, an image pickup device equipped with an image pickup device such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) can be adopted. Accordingly, the video content such as a play imaged by the video input device 8 can be reproduced on the display device 5 in real time without holding the video data 21a in advance. Further, the video data 21a may be held in the server device 400 and the video content may be reproduced by the display control unit 445. Note that, in FIG. 8, the same reference numerals are given to the functional units that exhibit the same functions as the functional units illustrated in FIG. 1, for example, the microphone, the display device, and the input device, and the description thereof will be omitted.

データ授受部４４は、サーバ装置４００との間で各種のデータの授受を制御する処理部である。 The data transfer unit 44 is a processing unit that controls transfer of various data to and from the server device 400.

一実施形態として、データ授受部４４は、一例として、クライアント端末４０が有するＣＰＵなどのプロセッサにより、シンクライアントシステムのクライアント用のプログラムが実行されることで、仮想的に実現される。例えば、データ授受部４４は、マイク１により入力される音声データ、さらには、入力装置７が受け付けた操作情報などをサーバ装置４００へ送信する。また、データ授受部４４は、発話箇所に対応するセリフに関する表示データ、さらには、セリフが表示される大きさや位置などの属性情報を受信する。 As one embodiment, the data transfer unit 44 is virtually realized by, for example, a processor such as a CPU included in the client terminal 40 executing a client program of the thin client system. For example, the data transfer unit 44 transmits to the server device 400 the voice data input by the microphone 1 and the operation information received by the input device 7. Further, the data transfer unit 44 receives the display data regarding the dialogue corresponding to the uttered portion, and further the attribute information such as the size and the position where the dialogue is displayed.

The various data exchanged between the client terminal 40 and the server device 400 may be compression-encoded to reduce traffic, and may be encrypted in various ways for security.

As shown in FIG. 8, the server device 400 has a storage unit 420 and a control unit 440. In addition to the functional units shown in FIG. 8, the server device 400 may have various functional units found in known computers, for example a communication I/F unit that controls communication with other devices.

記憶部４２０は、制御部４４０で実行されるＯＳやプレゼンテーションソフトを始め、アプリケーションプログラムなどの各種プログラムに用いられるデータを記憶するデバイスである。 The storage unit 420 is a device that stores data used for various programs such as an application program including an OS and presentation software executed by the control unit 440.

一実施形態として、記憶部４２０は、サーバ装置４００における主記憶装置として実装される。例えば、記憶部４２０には、各種の半導体メモリ素子、例えばＲＡＭやフラッシュメモリを採用できる。また、記憶部４２０は、補助記憶装置として実装することもできる。この場合、ＨＤＤ、光ディスクやＳＳＤなどを採用できる。 As one embodiment, the storage unit 420 is implemented as a main storage device in the server device 400. For example, various semiconductor memory devices such as a RAM and a flash memory can be used as the storage unit 420. The storage unit 420 can also be implemented as an auxiliary storage device. In this case, HDD, optical disk, SSD, etc. can be adopted.

例えば、記憶部４２０は、制御部４４０で実行されるプログラムに用いられるデータの一例として、図８に示す文書データ４２１、抽出単語データ４２２及び認識単語データ４２３を記憶する。これら文書データ４２１、抽出単語データ４２２及び認識単語データ４２３は、サーバ装置４００に接続されるクライアント端末４０のうちいずれのクライアント端末４０に関するデータであるのかがサーバ装置４００で識別できるように、文書データ４２１、抽出単語データ４２２及び認識単語データ４２３が格納される記憶領域がクライアント端末４０の識別情報ごとに区別されたり、あるいは文書データ４２１、抽出単語データ４２２及び認識単語データ４２３がクライアント端末４０の識別情報とさらに対応付けられたりする他は、図６に示した文書データ１３ａ、抽出単語データ１３ｂ及び認識単語データ１３ｃと同様のデータである。なお、図６に示した映像データ２１ａをさらに記憶部４２０に記憶させることもできる。 For example, the storage unit 420 stores the document data 421, the extracted word data 422, and the recognition word data 423 illustrated in FIG. 8 as an example of data used in the program executed by the control unit 440. The document data 421, the extracted word data 422, and the recognition word data 423 are document data so that the server device 400 can identify which client terminal 40 among the client terminals 40 connected to the server device 400. The storage area in which 421, the extracted word data 422, and the recognized word data 423 are stored is distinguished for each identification information of the client terminal 40, or the document data 421, the extracted word data 422, and the recognized word data 423 identify the client terminal 40. The data is the same as the document data 13a, the extracted word data 13b, and the recognition word data 13c shown in FIG. 6 except that it is further associated with information. The video data 21a shown in FIG. 6 can be further stored in the storage unit 420.

制御部４４０は、各種のプログラムや制御データを格納する内部メモリを有し、これらによって種々の処理を実行するものである。 The control unit 440 has an internal memory that stores various programs and control data, and executes various processes by these.

一実施形態として、制御部４４０は、中央処理装置、いわゆるＣＰＵとして実装される。なお、制御部４４０は、必ずしも中央処理装置として実装されずともよく、ＭＰＵやＤＳＰとして実装されることとしてもよい。また、制御部４４０は、ＡＳＩＣやＦＰＧＡなどのハードワイヤードロジックによっても実現できる。 In one embodiment, the controller 440 is implemented as a central processing unit, a so-called CPU. The control unit 440 does not necessarily have to be implemented as a central processing unit, and may be implemented as an MPU or DSP. The control unit 440 can also be realized by hard-wired logic such as ASIC or FPGA.

The control unit 440 virtually realizes the following processing units by executing various programs. For example, as shown in FIG. 8, the control unit 440 has an extraction unit 441, a recognition unit 442, a calculation unit 443, an estimation unit 444, and a display control unit 445. These units are processing units that execute the same processing as the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, and the display control unit 23a shown in FIG. 6.

As described above, when the presentation support system 4 according to the present embodiment is implemented as a thin client system, the extraction unit 441 of the server device 400 can execute the processing shown in FIG. 3, and the recognition unit 442 can execute the voice recognition processing shown in FIG. 4. This voice recognition processing is unchanged except that, in step S301, the voice data transmitted from the client terminal 40 to the server device 400 is acquired instead of voice data acquired directly from the microphone 1. Furthermore, the calculation unit 443, the estimation unit 444, and the display control unit 445 of the server device 400 can execute the display control processing shown in FIG. 5.

[Application example to a general-purpose client-server system]
FIGS. 7 and 8 illustrate cases in which the presentation support system 3 or 4 is implemented as a thin client system. However, the system does not have to be implemented as a thin client system and can also be implemented as a general-purpose client-server system.

For example, the presentation support device 10 or 20 shown in FIG. 1 or FIG. 6 may serve as a client terminal, and a server device (not shown) that accommodates this client terminal may implement some of the processing units of the presentation support device 10 or 20, such as the calculation unit 15c, the estimation unit 15d, and the display control unit 15e or 23a. In this case, the presentation support device 10 or 20 acting as the client terminal executes the voice recognition processing shown in FIG. 4 and, each time a recognized word is obtained, transmits either the newly added recognized word or the entire recognized word data to the server device, so that the recognized word data is stored on the server device for each client terminal. This makes it unnecessary to transmit voice data between the client and the server.
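
The exchange described above, in which the client sends either the newly recognized word or the entire recognized word data and the server keeps the data separately for each client terminal, can be sketched as follows. The JSON message layout, the `mode` field, and the names `make_message` and `WordServer` are hypothetical; the description does not specify a message format.

```python
import json
from typing import Dict, Iterable, List


def make_message(client_id: str, words: Iterable[str], full: bool = False) -> str:
    """Client side: serialize newly recognized words or the whole word list.

    The message layout is an assumption made for this sketch.
    """
    return json.dumps(
        {"client": client_id, "mode": "full" if full else "append", "words": list(words)}
    )


class WordServer:
    """Server side: keeps recognized word data separately for each client terminal."""

    def __init__(self) -> None:
        self.words: Dict[str, List[str]] = {}

    def receive(self, raw: str) -> List[str]:
        msg = json.loads(raw)
        current = self.words.setdefault(msg["client"], [])
        if msg["mode"] == "full":
            current[:] = msg["words"]     # replace with the entire recognized word data
        else:
            current.extend(msg["words"])  # add only the newly recognized words
        return current
```

Because only short word strings cross the network in this arrangement, no voice data needs to be sent to the server.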

以上のように、汎用のクライアントサーバシステムにも上記のプレゼンテーション支援サービスを適用できる。 As described above, the above presentation support service can be applied to a general-purpose client server system.

[Application example to an electronic conference system]
For example, the first embodiment described above illustrates a situation in which the speaker and the audience share one display device 5. However, the speaker and the audience do not have to share one display device, and the presentation support service described above can also be applied to situations in which the same display content is shared among a plurality of display devices. One example is communication such as an electronic conference in which each participant takes part as a speaker, as a listener, or as both. In this case, the participants may be at remote locations as long as the computers connected to their display devices are connected to one another via a network.

FIG. 9 is a diagram showing an example of application to an electronic conference system. For example, as shown in FIG. 9, the service can be applied to a situation in which client terminals 10A and 10B, each having the same functions as the presentation support device 10 shown in FIG. 1, are connected via the network NW and a communication tool, for example an application program for screen sharing, is executed on the client terminals 10A and 10B. As a result, the same display content, for example a document file for presentation software, is shared between the display devices of the client terminals 10A and 10B. In this situation, when at least one of the client terminals 10A and 10B executes the processing shown in FIGS. 3 to 5, the utterances and line of sight of the user of the client terminal 10A or 10B can be used to highlight the area of a slide in the document file that corresponds to the portion being explained.

図１０は、電子会議システムへの適用例を示す図である。例えば、図１０に示すように、図７に示したクライアント端末３０と同様の機能を有するクライアント端末３０Ａ及び３０Ｂと、図７に示したサーバ装置３００とがネットワークＮＷを介して接続されると共に、サーバ装置３００上でコミュニケーションツール、例えば画面共有用のアプリケーションプログラムが実行される場面に適用できる。これによって、クライアント端末３０Ａ及び３０Ｂが有する各表示装置の間で同一の表示内容、例えばプレゼンテーションソフト用の文書ファイルが共有される。このような状況の下、サーバ装置３００が図３〜図５に示した処理を実行することにより、クライアント端末３０Ａまたは３０Ｂの利用者の発話を利用して、文書ファイルに含まれるスライドのうち説明箇所に対応する領域をハイライト表示することができる。 FIG. 10 is a diagram showing an application example to an electronic conference system. For example, as shown in FIG. 10, client terminals 30A and 30B having the same functions as the client terminal 30 shown in FIG. 7 and the server device 300 shown in FIG. 7 are connected via the network NW, and The present invention can be applied to a situation in which a communication tool such as a screen sharing application program is executed on the server device 300. As a result, the same display content, for example, a document file for presentation software is shared between the display devices of the client terminals 30A and 30B. Under such circumstances, the server device 300 executes the processing shown in FIGS. 3 to 5 to use the utterance of the user of the client terminal 30A or 30B to explain the slides included in the document file. The area corresponding to the location can be highlighted.

[Method of changing the display state]
The presentation support device 10 shown in FIG. 1 and the presentation support system 3 shown in FIG. 7 illustrate cases in which the slide is always displayed. Alternatively, the display state of an area may be changed, and the presentation thereby supported, by excerpting the area corresponding to the uttered portion on condition that the uttered portion has been estimated. FIG. 11 is a diagram showing an example of implementation in a presentation support system. FIG. 11 shows a case in which the presentation support system 4 shown in FIG. 8 is used for a presentation such as a conference. As shown in FIG. 11, the video input device 8 of the client terminal 40 shown in FIG. 8 is installed at a position from which it can capture the speaker, such as a presenter, and the display device 5 of the client terminal 40 is installed where the audience can view it. Although omitted from FIG. 11, the server device 400 is connected to the client terminal 40. The display device 5 may be installed at a remote location as long as it can communicate with the client terminal 40. With the configuration shown in FIG. 11, the display state of an area can be changed, and the presentation supported, by excerpting the area corresponding to the uttered portion on condition that the uttered portion has been estimated. For example, the area corresponding to the uttered portion is superimposed as a caption 5a on the video input by the video input device 8.

[Presentation support program]
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a presentation support program having the same functions as the above embodiments is therefore described below with reference to FIG. 12.

図１２は、実施例１〜実施例３に係るプレゼンテーション支援プログラムを実行するコンピュータのハードウェア構成例を示す図である。図１２に示すように、コンピュータ１００は、操作部１１０ａと、スピーカ１１０ｂと、カメラ１１０ｃと、ディスプレイ１２０と、通信部１３０とを有する。さらに、このコンピュータ１００は、ＣＰＵ１５０と、ＲＯＭ１６０と、ＨＤＤ１７０と、ＲＡＭ１８０とを有する。これら１１０〜１８０の各部はバス１４０を介して接続される。 FIG. 12 is a diagram illustrating a hardware configuration example of a computer that executes the presentation support program according to the first to third embodiments. As illustrated in FIG. 12, the computer 100 includes an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. Further, the computer 100 has a CPU 150, a ROM 160, an HDD 170, and a RAM 180. Each unit of these 110 to 180 is connected via a bus 140.

As shown in FIG. 12, the HDD 170 stores a presentation support program 170a that provides the same functions as the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, and the display control unit 15e described in the first embodiment. Alternatively, the HDD 170 may store a presentation support program 170a that provides the same functions as the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, and the display control unit 23a described in the second embodiment. Like the extraction unit 15a, the recognition unit 15b, the calculation unit 15c, the estimation unit 15d, and the display control unit 15e shown in FIG. 1, the presentation support program 170a may be integrated or separated. That is, the HDD 170 does not need to store all the data described in the first or second embodiment; it is sufficient that the data used for processing is stored in the HDD 170.

このような環境の下、ＣＰＵ１５０は、ＨＤＤ１７０からプレゼンテーション支援プログラム１７０ａを読み出した上でＲＡＭ１８０へ展開する。この結果、プレゼンテーション支援プログラム１７０ａは、図１２に示すように、プレゼンテーション支援プロセス１８０ａとして機能する。このプレゼンテーション支援プロセス１８０ａは、ＲＡＭ１８０が有する記憶領域のうちプレゼンテーション支援プロセス１８０ａに割り当てられた領域にＨＤＤ１７０から読み出した各種データを展開し、この展開した各種データを用いて各種の処理を実行する。例えば、プレゼンテーション支援プロセス１８０ａが実行する処理の一例として、図３〜図５に示す処理などが含まれる。なお、ＣＰＵ１５０では、必ずしも上記の実施例１で示した全ての処理部が動作せずともよく、実行対象とする処理に対応する処理部が仮想的に実現されればよい。 Under such an environment, the CPU 150 reads out the presentation support program 170a from the HDD 170 and expands it in the RAM 180. As a result, the presentation support program 170a functions as a presentation support process 180a, as shown in FIG. The presentation support process 180a expands various data read from the HDD 170 in the area allocated to the presentation support process 180a in the storage area of the RAM 180, and executes various processes using the expanded data. For example, the processing shown in FIGS. 3 to 5 is included as an example of the processing executed by the presentation support process 180a. Note that in the CPU 150, not all the processing units described in the above-described first embodiment may operate, and the processing unit corresponding to the processing to be executed may be virtually realized.

なお、上記のプレゼンテーション支援プログラム１７０ａは、必ずしも最初からＨＤＤ１７０やＲＯＭ１６０に記憶されておらずともかまわない。例えば、コンピュータ１００に挿入されるフレキシブルディスク、いわゆるＦＤ、ＣＤ−ＲＯＭ、ＤＶＤディスク、光磁気ディスク、ＩＣカードなどの「可搬用の物理媒体」にプレゼンテーション支援プログラム１７０ａを記憶させる。そして、コンピュータ１００がこれらの可搬用の物理媒体からプレゼンテーション支援プログラム１７０ａを取得して実行するようにしてもよい。また、公衆回線、インターネット、ＬＡＮ、ＷＡＮなどを介してコンピュータ１００に接続される他のコンピュータまたはサーバ装置などにプレゼンテーション支援プログラム１７０ａを記憶させておき、コンピュータ１００がこれらからプレゼンテーション支援プログラム１７０ａを取得して実行するようにしてもよい。 The presentation support program 170a does not have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, the presentation support program 170a is stored in a “portable physical medium” such as a flexible disk, a so-called FD, a CD-ROM, a DVD disk, a magneto-optical disk, an IC card, which is inserted into the computer 100. Then, the computer 100 may acquire and execute the presentation support program 170a from these portable physical media. Further, the presentation support program 170a is stored in another computer or a server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, etc., and the computer 100 acquires the presentation support program 170a from these. You may make it execute.

1 Microphone
5 Display device
7 Input device
10 Presentation support device
11 Input/output I/F unit
13 Storage unit
13a Document data
13b Extracted word data
13c Recognized word data
15 Control unit
15a Extraction unit
15b Recognition unit
15c Calculation unit
15d Estimation unit
15e Display control unit

Claims

A presentation support device comprising:
a recognition unit that performs voice recognition on voice data using, for each of the areas into which the display content of a document file is divided, words extracted from a character string included in that area; and
a display control unit that, when two recognized words consecutively recognized by the voice recognition belong to different areas, changes the display state of the area containing the later-recognized of the two recognized words.

The presentation support device according to claim 1, wherein the two recognized words are a first recognized word, recognized at the latest time by the voice recognition, and a second recognized word, recognized immediately before the first recognized word.

The presentation support device according to claim 2, further comprising a calculation unit that calculates the position of the first recognized word within its area and the position of the second recognized word within its area when the number of words extracted from the character string included in the area to which the first recognized word belongs is equal to or greater than a predetermined value,
wherein the display control unit changes the display state of the area containing the first recognized word when the position of the first recognized word is within a predetermined range from the beginning of its area and the position of the second recognized word is within a predetermined range from the end of its area.

The presentation support device according to claim 3, wherein, when the position of the first recognition word is not within the predetermined range from the beginning of its area, or the position of the second recognition word is not within the predetermined range from the end of its area, the display state of the area having the larger number of recognition words obtained by the voice recognition is changed.

The presentation support device according to claim 2, 3, or 4, wherein the display control unit changes the display state of the area containing the first recognition word when the distance between the area to which the first recognition word belongs and the area to which the second recognition word belongs is within a predetermined threshold.
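The device claims above do not spell out how their conditions combine. The following is a minimal sketch of one possible combination, not the patented implementation. The names `RecognizedWord`, `choose_area`, `area_sizes`, and `hit_counts`, the default thresholds, the use of the index difference as the "distance" between areas, and the fallbacks (no change when the areas are too far apart, an immediate switch when the area has few words, keeping the earlier area on a tie) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedWord:
    text: str
    area: int      # index of the area the word belongs to (hypothetical numbering)
    position: int  # 0-based index among the words extracted from that area


def choose_area(first, second, area_sizes, hit_counts,
                min_words=10, edge_range=3, max_distance=1):
    """Return the area whose display state should change, or None for no change.

    first  -- the most recently recognized word
    second -- the word recognized immediately before it
    area_sizes -- {area: number of words extracted from that area}
    hit_counts -- {area: number of recognition words obtained so far}
    All thresholds are illustrative assumptions.
    """
    # Basic rule: only consecutive words in different areas trigger a change.
    if first.area == second.area:
        return None
    # Distance rule: assumed here to be the difference of area indices.
    if abs(first.area - second.area) > max_distance:
        return None  # assumption: areas too far apart, keep the display as is
    # Position rule applies only when the area holds enough extracted words.
    if area_sizes[first.area] < min_words:
        return first.area  # assumption: fall back to the basic rule
    near_start = first.position < edge_range
    near_end = second.position >= area_sizes[second.area] - edge_range
    if near_start and near_end:
        return first.area
    # Otherwise prefer the area with more recognition words (ties keep the earlier area).
    if hit_counts.get(first.area, 0) > hit_counts.get(second.area, 0):
        return first.area
    return second.area
```

For example, when the earlier word is the last word of area 0 and the later word is the first word of area 1, the sketch selects area 1; when both words sit in the middle of their areas, it selects whichever area has accumulated more recognition words.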

A presentation support system comprising a first device and a second device, wherein
the first device includes:
a display device that performs display;
a microphone that inputs voice; and
a transmission unit that transmits voice data input through the microphone to the second device, and
the second device includes:
a recognition unit that, for each area into which the display content of a document file is divided, performs voice recognition on the voice data using words extracted from the character string included in that area; and
a display control unit that, when two recognition words recognized consecutively by the voice recognition belong to different areas, controls the display device so as to change the display state of the area containing the later-recognized of the two recognition words.

A presentation support method in which a computer executes processing comprising:
performing, for each area into which the display content of a document file is divided, voice recognition on voice data using words extracted from the character string included in that area; and
changing, when two recognition words recognized consecutively by the voice recognition belong to different areas, the display state of the area containing the later-recognized of the two recognition words.

A presentation support program that causes a computer to execute processing comprising:
performing, for each area into which the display content of a document file is divided, voice recognition on voice data using words extracted from the character string included in that area; and
changing, when two recognition words recognized consecutively by the voice recognition belong to different areas, the display state of the area containing the later-recognized of the two recognition words.
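The two processing steps recited in the method and program claims can be illustrated with a small sketch. It is an illustration under stated assumptions, not the claimed program: the speech recognizer is replaced by a ready-made sequence of recognized words, words that occur in more than one area are simply dropped from the vocabulary, and the names `word_to_area` and `track_highlight` as well as the `start_area` parameter are hypothetical.

```python
import re


def word_to_area(areas):
    """Extract words per area and map each word to the single area containing it.

    Assumption: words that appear in several areas are ambiguous and are dropped.
    """
    owners = {}
    for idx, text in enumerate(areas):
        for word in set(re.findall(r"\w+", text.lower())):
            owners.setdefault(word, []).append(idx)
    return {word: idxs[0] for word, idxs in owners.items() if len(idxs) == 1}


def track_highlight(areas, recognized_words, start_area=0):
    """Return the highlighted area after each recognized in-vocabulary word."""
    vocab = word_to_area(areas)
    highlighted = start_area
    previous_area = None
    history = []
    for word in recognized_words:
        area = vocab.get(word.lower())
        if area is None:
            continue  # not an extracted word, so it is ignored in this sketch
        # Two consecutive recognition words in different areas:
        # change the display state of the area of the later word.
        if previous_area is not None and area != previous_area:
            highlighted = area
        previous_area = area
        history.append(highlighted)
    return history


areas = [
    "Project schedule and milestones",
    "Budget and cost estimate",
    "Risks and open issues",
]
spoken = ["schedule", "milestones", "budget", "and", "cost", "risks"]
print(track_highlight(areas, spoken))  # [0, 0, 1, 1, 2]
```

The word "and" occurs in every area, so it is excluded from the vocabulary and does not influence the highlighted area.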