JP2002268683A

JP2002268683A - Method and device for information processing

Info

Publication number: JP2002268683A
Application number: JP2001067221A
Authority: JP
Inventors: Shinichi Yamazaki (山嵜信一); Kenichiro Nakagawa (中川賢一郎); Hiroki Yamamoto (山本寛樹)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-03-09
Filing date: 2001-03-09
Publication date: 2002-09-20

Abstract

PROBLEM TO BE SOLVED: To achieve stable speech recognition that is unaffected by environmental changes, by allowing the operating conditions of speech recognition processing to be set appropriately as the environment changes. SOLUTION: A speech recognition unit is provided for giving spoken instructions to a running presentation application. The current stage of progress in a schedule assumed in advance is determined from the words input through the speech recognition unit, the ambient noise, and the brightness (step S410). The operating conditions of the speech recognition unit are then set according to the determined stage of progress (steps S412 to S415).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an information processing method and apparatus that can be operated by voice, and more particularly to an information processing method and apparatus suitable for presenting materials in a talk such as a presentation.

[0002]

2. Description of the Related Art

Conventionally, presenters have given presentations by manually swapping presentation materials printed on overhead-projector (OHP) transparencies. More recently, it has become common to create presentation data with presentation software on a personal computer (PC) and to show the materials using the PC and a projector.

[0003] Presentation software of this kind is usually operated with the PC's keyboard and mouse, but several methods of operating presentation software on a PC by speech recognition have been proposed (for example, Japanese Patent No. 02924717 and Japanese Patent Application Laid-Open No. 6-318235). The advantage of voice operation is that it eliminates the mouse pointing and keyboard operations otherwise needed to operate the presentation software, for example to turn pages, play sound effects, or display animations, so the lecturer can concentrate on the talk. The presentation is also not interrupted by such operations, and because the presenter no longer has to stay in front of the PC, a wider range of presentation styles becomes possible.

[0004]

Problems to Be Solved by the Invention

However, speech recognition accuracy is easily affected by changes in the usage environment, such as a different speaker or changing ambient noise. The optimal settings for the operating conditions of speech recognition, such as the acoustic model, the acoustic analysis method, the threshold for speech-segment detection, and the search conditions, therefore differ depending on the environment.

[0005] For example, a typical presentation begins with a moderator introducing the talk, continues with the lecture itself, and ends with a question-and-answer session. Over this flow, the environment relevant to speech recognition changes with the progress of the talk. Taking the speaker as an example, the person speaking during the lecture is usually only the lecturer, whereas in the Q&A session audience members and the moderator also have more opportunities to speak. Likewise, the murmur in the hall, which is noise from the point of view of speech recognition, varies with the progress of the talk: in a typical lecture the hall is noisy before and after the talk as listeners chat among themselves, but during the talk the listeners' attention shifts to the lecture and the murmur subsides.

[0006] Consequently, speech recognition accuracy is affected by the environmental changes that arise over the course of a presentation, as described above, and stable operation may not be possible. Previously proposed methods do operate a presentation by speech recognition, but none of them proposes changing the operating conditions of the speech recognition process during the presentation to improve recognition accuracy.

[0007]

SUMMARY OF THE INVENTION

The present invention has been made in view of the above problems, and its object is to provide an information processing apparatus and method that allow the operating conditions of speech recognition processing to be set appropriately as the environment changes, thereby achieving stable speech recognition regardless of environmental changes.

[0008] Another object of the present invention is to detect the progress of a schedule assumed in advance, such as a lecture, and to make it possible to switch the operating conditions of speech recognition processing to suit the environment expected at each stage of that progress.

[0009]

Means for Solving the Problems

To achieve the above objects, an information processing apparatus according to the present invention has, for example, the following configuration: speech recognition means for inputting instructions to a running application by voice; determination means for determining the current stage of progress in a schedule assumed in advance; and setting means for setting the operating conditions of the speech recognition means based on the stage of progress determined by the determination means.

[0010] To achieve the above objects, an information processing method according to the present invention comprises: a speech recognition step of inputting instructions to a running application by voice; a determination step of determining the current stage of progress in a schedule assumed in advance; and a setting step of setting the operating conditions of the speech recognition step based on the stage of progress determined in the determination step.

[0011]

Embodiments of the Invention

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

[0012] The following embodiment describes a voice-operable presentation apparatus that determines the progress of a talk, such as a presentation, and changes the operating conditions of speech recognition according to that progress.

[0013] As described above, maintaining stable recognition accuracy despite the environmental changes that accompany a presentation requires selecting operating conditions suited to the environment as it changes. For example, when moving from a quiet environment to one with high ambient noise, common approaches are to adapt the acoustic model to the noise and to apply noise-removal processing, typified by spectral subtraction, during acoustic analysis. Therefore, when the murmur (noise) in the lecture hall grows louder as the presentation proceeds, it is desirable, as described above, to adapt the acoustic model to the noise or to switch the acoustic analysis method to one suited to noisy environments.
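Spectral subtraction, named above as a typical noise-removal step, can be sketched as follows. This is a minimal illustration, not the method of this publication: it assumes each frame has already been converted to a power spectrum (the FFT step is omitted) and that some frames are known to contain only noise. The names `alpha` (over-subtraction factor) and `floor` (spectral floor) are hypothetical parameters.

```python
from typing import List


def estimate_noise_spectrum(noise_frames: List[List[float]]) -> List[float]:
    """Average the power spectra of frames assumed to contain only noise."""
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtract(frame: List[float], noise: List[float],
                      alpha: float = 1.0, floor: float = 0.01) -> List[float]:
    """Power spectral subtraction with a spectral floor.

    Each bin loses alpha times the noise estimate, but never drops below
    floor times its original power, so the result stays non-negative.
    """
    return [max(p - alpha * n, floor * p) for p, n in zip(frame, noise)]


noise = estimate_noise_spectrum([[1.0, 2.0, 0.5], [1.0, 2.0, 1.5]])
clean = spectral_subtract([5.0, 2.5, 0.8], noise)
print(noise)  # [1.0, 2.0, 1.0]
print(clean)
```

The floor avoids negative power in bins where the noise estimate exceeds the observed power; in practice, the noise estimate would be refreshed from non-speech blocks as the hall becomes noisier.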

[0014] For changes of speaker, for example, one possible method is to use an acoustic model built from the lecturer's voice (a speaker-dependent model) during the lecture. During the Q&A session, an acoustic model built from many speakers' voices (a speaker-independent model) is used instead, so that speech from people other than the lecturer can also be recognized.

[0015] The presentation apparatus of the present embodiment is described in detail below.

[0016] FIG. 1 is a block diagram showing an example configuration of the presentation apparatus of the present invention. In FIG. 1, the presentation apparatus includes: a display device 101 such as a display or projector; a voice input device 102, composed of a microphone or the like and used to input speech; a light amount sensor 103 that detects the amount of light; an input device 104 including a pointing device such as a mouse, a keyboard, and the like; a central processing unit 105, such as a CPU, that processes electronic data; and a storage device 106 such as a magnetic storage device or memory.

[0017] FIG. 2 is a block diagram showing the functional configuration of the presentation apparatus according to the present embodiment. A control unit 201 controls the processing of the entire presentation apparatus. A speech detection unit 202 performs speech-segment detection. A speech recognition unit 203 recognizes the speech detected by the speech detection unit 202 using a language model 214 and an acoustic model 213, and outputs the recognition result. The acoustic model 213 stores acoustic models used for speech recognition, such as HMMs, and the language model 214 stores language models used for speech recognition, such as a word dictionary.

[0018] A progress determination unit 204 detects changes in the progress of the presentation based on the speech recognition result from the speech recognition unit 203 and the signal from the light amount sensor 103, and determines the current progress. To do so, it uses a progress determination table 211 that describes conditions for detecting a change in the progress of the presentation and the change in progress corresponding to each condition.

[0019] An operating condition selection unit 205 selects the speech recognition operating conditions corresponding to the progress of the presentation. An operating condition application unit 206 applies the selected operating conditions to the speech detection unit 202 and the speech recognition unit 203. The operating condition selection unit 205 and the operating condition application unit 206 select and apply operating conditions by referring to an operating condition table 212, which describes the speech recognition operating conditions corresponding to each stage of progress of the presentation.

[0020] A progress determination table creation unit 207 creates the progress determination table. An operating condition table creation unit 208 creates the operating condition table. An operating environment adaptation unit 209 adapts the acoustic model, the speech-segment detection threshold, and the like to the usage environment. A presentation control unit 210 controls presentation operations such as turning pages. Presentation data 215 includes the data displayed on the display device 101 during the presentation.

[0021] The functional units 201 to 210 are realized by the central processing unit 105 executing a control program stored in the storage device 106. The data 211 to 215 referenced by these functions are also stored in the storage device 106. In other words, the series of processes described below is basically realized by loading the programs describing each function, together with the data they reference, into the storage device 106 and having the central processing unit 105 execute the loaded programs.

[0022] The operation of the present embodiment configured as above is described with reference to flowcharts.

[0023] FIG. 3 is a flowchart outlining the operation of the presentation apparatus using speech recognition according to the present embodiment.

[0024] First, in step S300, the operating conditions for speech recognition processing are updated: the progress determination unit 204 determines the progress of the presentation, and the operating condition selection unit 205 and the operating condition application unit 206 set operating conditions matching the determined progress in the speech detection unit 202 and the speech recognition unit 203. This process is described in detail later with reference to FIGS. 4A and 4B. In step S301, the utterance of the user of the presentation apparatus is captured via the voice input device 102, such as a microphone, and the speech detection unit 202 detects speech segments. The speech detection unit 202 divides the audio data input from the voice input device 102 into blocks of a unit time (for example, 0.01 seconds) and determines, for each block, whether it belongs to a speech segment or a non-speech segment. When the speech detection unit 202 detects a speech segment, the process proceeds to step S302, and the speech recognition unit 203 recognizes the captured speech.
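The block-wise speech/non-speech decision of step S301 could look like the sketch below. The text does not say how each block is classified; this assumes a simple short-time power threshold, which is one common choice. The 0.01-second block length comes from the text, while `threshold` is a hypothetical stand-in for the speech-segment detection threshold that the operating conditions can change (for example to Th3 in FIG. 8).

```python
from typing import List, Sequence


def split_blocks(samples: Sequence[float], sample_rate: int,
                 block_sec: float = 0.01) -> List[Sequence[float]]:
    """Cut the signal into consecutive blocks of block_sec seconds.

    A trailing partial block is dropped.
    """
    block_len = int(round(sample_rate * block_sec))
    return [samples[i:i + block_len]
            for i in range(0, len(samples) - block_len + 1, block_len)]


def block_power(block: Sequence[float]) -> float:
    """Mean squared amplitude of one block."""
    return sum(x * x for x in block) / len(block)


def classify_blocks(samples: Sequence[float], sample_rate: int,
                    threshold: float, block_sec: float = 0.01) -> List[bool]:
    """True for blocks judged to be speech, False for non-speech."""
    return [block_power(b) >= threshold
            for b in split_blocks(samples, sample_rate, block_sec)]


# 1 kHz toy signal: 10-sample blocks of silence, "speech", and faint noise.
signal = [0.0] * 10 + [0.5] * 10 + [0.01] * 10
print(classify_blocks(signal, sample_rate=1000, threshold=0.01))
# [False, True, False]
```

Raising `threshold` makes the detector less likely to treat hall murmur as speech, which is why the detection threshold appears among the switchable operating conditions.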

[0025] If the recognition result is a specific keyword that gives an instruction to the presentation apparatus, such as "next page" or "last page", the process proceeds from step S303 to step S304, and the presentation control unit 210 controls the presentation according to the recognition result (steps S304 to S305). For example, when the keyword "next page" is recognized, the presentation data of the next page is displayed on the display device 101. If the recognition result indicates termination, the process ends. If, on the other hand, a word other than the specific keywords is recognized in step S303, the process returns to step S300, and the operating conditions for speech recognition are updated and speech-segment detection is performed again.
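The keyword branch of steps S303 to S305 amounts to a lookup from a recognized phrase to a presentation command. In the sketch below, `Slides` is a hypothetical stand-in for the presentation control unit 210, the English phrases stand in for the Japanese keywords, and "end presentation" is an assumed termination phrase, since the text does not name one.

```python
from typing import Callable, Dict


class Slides:
    """Hypothetical stand-in for the presentation control unit 210."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        self.page = 0

    def next_page(self) -> None:
        self.page = min(self.page + 1, self.n_pages - 1)

    def last_page(self) -> None:
        self.page = self.n_pages - 1


END_PHRASE = "end presentation"  # assumed termination phrase (not named in the text)


def handle_result(text: str, slides: Slides) -> str:
    """Dispatch one recognition result.

    Returns "control" after running a command (S304 to S305), "end" to stop,
    or "other" when the word is not a keyword (loop back to S300).
    """
    commands: Dict[str, Callable[[], None]] = {
        "next page": slides.next_page,
        "last page": slides.last_page,
    }
    if text == END_PHRASE:
        return "end"
    action = commands.get(text)
    if action is None:
        return "other"
    action()
    return "control"


deck = Slides(n_pages=5)
print(handle_result("next page", deck), deck.page)  # control 1
```

Returning "other" rather than ignoring non-keywords matters here: in this flow, those words are also fed back to step S300, where they may trigger a progress change.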

[0026] The following describes how the presentation apparatus of the present embodiment, operating as described above, detects the progress of the presentation and switches the operating conditions of speech recognition processing according to that progress.

[0027] Next, the process of changing the speech recognition operating conditions in step S300 is described. This process is performed by the progress determination unit 204, the operating condition selection unit 205, and the operating condition application unit 206, with reference to the progress determination table 211 and the operating condition table 212. The user therefore needs to set up the progress determination table 211 and the operating condition table 212 before the presentation. The table setting operation is described first.

[0028] FIG. 4A is a flowchart illustrating the table setting process according to the embodiment. When a predetermined operation for table setting is input from the input device 104 and that operation instructs creation of the progress determination table, the process proceeds from step S401 to step S402. The progress determination table creation unit 207 is started, and a screen for creating the progress determination table is displayed on the display device 101. In step S403, various operation inputs on the creation screen are accepted, and the progress determination table 211 is created and stored in the storage device 106.

[0029] The progress determination table 211 created by the progress determination table creation unit 207 describes the conditions for detecting a change in progress and the progress corresponding to each condition. Here, "progress" means the course or state of the presentation, such as "before lecture", "during lecture", "Q&A", or "lecture ended", and can be freely defined by the user. To detect a change in progress, information such as the result of the immediately preceding speech recognition, a change in the lighting level in the lecture hall, or a change in the noise level is used. Based on the progress of the presentation at that point and this information, the apparatus determines whether the progress of the presentation has changed.

[0030] FIG. 5 shows an example of the progress determination table. In this example, a change in the progress of the presentation is detected when the speech recognition unit has just recognized a specific keyword. When the progress is "before lecture" and an utterance signaling the start of the lecture, such as "I will begin" (開始します), "Let's begin" (始めます), or "Start" (スタート), is recognized, the presentation is judged to have started and the progress is set to "during lecture". When the progress is "during lecture" and "I will finish" (終了します) or "That is all" (以上です) is recognized, the presentation is judged to have entered the Q&A stage and the progress is set to "Q&A". As FIG. 5 shows, the progress determination table can associate multiple conditions for detecting a change (here, recognition results) with a single change in progress.
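A table in the style of FIG. 5 can be held as rows that map a current stage plus a set of trigger phrases to a next stage. This layout allows several phrases per transition without extra code. The row layout and function name below are hypothetical; the stage names and phrases are taken from the description above.

```python
from typing import FrozenSet, List, NamedTuple


class Row(NamedTuple):
    current: str              # stage in which the row applies
    phrases: FrozenSet[str]   # any of these recognition results triggers it
    next_stage: str           # stage after the transition


FIG5_TABLE: List[Row] = [
    Row("before lecture", frozenset({"開始します", "始めます", "スタート"}), "during lecture"),
    Row("during lecture", frozenset({"終了します", "以上です"}), "Q&A"),
]


def judge_by_keyword(stage: str, recognized: str,
                     table: List[Row] = FIG5_TABLE) -> str:
    """Return the new stage, or the current one if no row matches."""
    for row in table:
        if row.current == stage and recognized in row.phrases:
            return row.next_stage
    return stage


print(judge_by_keyword("during lecture", "以上です"))  # Q&A
```

Keying rows on the current stage means the same phrase can be harmless in one stage and a trigger in another, as with "That is all" said before the lecture starts.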

[0031] Similarly, FIG. 6 is an example of a progress determination table that judges the progress of the presentation from changes in the brightness of the lighting in the lecture hall. The brightness of the lighting in the hall, that is, the amount of light, is detected by the light amount sensor 103. This example assumes that the hall lights are dimmed "during lecture" and brightened again for "Q&A" and "lecture ended". Accordingly, the progress determination table of FIG. 6 specifies two rules. When the current progress is "before lecture" and the light amount falls below a certain value, the progress is judged to be "during lecture". When the current progress is "during lecture" and the light amount exceeds a certain value, the progress is judged to be "Q&A".
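The FIG. 6 rules can be sketched as a pair of threshold comparisons. The text only says "a certain value", so the sketch assumes two hypothetical parameters, `dark_below` and `bright_above`, which may be set equal. Setting `bright_above` higher than `dark_below` is an added assumption. It keeps small fluctuations after the lights are dimmed from immediately triggering "Q&A".

```python
def judge_by_light(stage: str, light: float,
                   dark_below: float, bright_above: float) -> str:
    """FIG. 6 style rule driven by the light amount sensor reading."""
    if stage == "before lecture" and light < dark_below:
        return "during lecture"   # hall lights dimmed: the talk has begun
    if stage == "during lecture" and light > bright_above:
        return "Q&A"              # lights back up: question time
    return stage


# Hypothetical readings in arbitrary sensor units.
print(judge_by_light("before lecture", 40.0, dark_below=100.0, bright_above=300.0))
```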

[0032] Similarly, FIG. 7 is an example of a progress determination table that judges progress from changes in the noise level. The "noise level" here means the loudness of ambient noise, such as the murmur of the hall picked up by the microphone, in non-speech segments (segments in which the user is not speaking). Measuring it requires (1) separating speech segments from non-speech segments and (2) computing the loudness of the sound in the non-speech segments. For (1), speech and non-speech are separated based on the result of the speech detection unit 202. For (2), amplitude or power is generally used as the measure of loudness. This example assumes that the hall is relatively quiet during the lecture and somewhat noisy during the Q&A session. Accordingly, the progress determination table of FIG. 7 specifies two rules. When the current progress is "before lecture" and the noise level falls below a predetermined threshold Th, the progress switches to "during lecture". When the current progress is "during lecture" and the noise level is equal to or greater than Th, the progress switches to "Q&A". The noise level is detected from the signal input through the voice input device 102.
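Steps (1) and (2) and the FIG. 7 rules might be combined as below. This sketch assumes that per-block speech/non-speech flags from the speech detection unit 202 are available. It uses mean power in decibels, relative to a full-scale amplitude of 1.0, as the loudness measure (the text allows amplitude or power), and `th_db` is a hypothetical value for Th. When no non-speech block is available, no level is reported and the stage is left unchanged.

```python
import math
from typing import Optional, Sequence


def noise_level_db(blocks: Sequence[Sequence[float]],
                   is_speech: Sequence[bool]) -> Optional[float]:
    """Mean power of the non-speech blocks in dB, or None if there are none."""
    powers = [sum(x * x for x in b) / len(b)
              for b, speech in zip(blocks, is_speech) if not speech]
    if not powers:
        return None
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(mean_power) if mean_power > 0 else -math.inf


def judge_by_noise(stage: str, level_db: Optional[float], th_db: float) -> str:
    """FIG. 7 style rule: a quiet hall starts the talk, a noisier hall means Q&A."""
    if level_db is None:
        return stage
    if stage == "before lecture" and level_db < th_db:
        return "during lecture"
    if stage == "during lecture" and level_db >= th_db:
        return "Q&A"
    return stage


blocks = [[0.1] * 4, [0.5] * 4, [0.1] * 4]
level = noise_level_db(blocks, [False, True, False])   # about -20 dB
print(judge_by_noise("during lecture", level, th_db=-30.0))  # Q&A
```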

[0033] Returning to FIG. 4A, when a predetermined operation for table setting is input from the input device 104 and that operation instructs creation of the operating condition table, the process proceeds from step S401 through step S404 to step S405. In step S405, the operating condition table creation unit 208 is started, and a screen for creating the operating condition table is displayed on the display device 101. In step S406, various operation inputs on the creation screen are accepted, and the operating condition table 212 is created and stored in the storage device 106.

[0034] The operating condition table 212 describes the speech recognition operating conditions corresponding to each stage of progress. Operating condition items include, for example: the acoustic model used for speech recognition; the grammar or language model used for speech recognition; the acoustic analysis method and the various conditions for acoustic analysis; the threshold used for speech-segment detection; the various search conditions used for speech recognition; and the various thresholds used to reject speech recognition results.

[0035] FIG. 8 shows an example of the operating condition table. For each stage of progress, the operating condition table of FIG. 8 describes the acoustic model to select, the threshold for speech-segment detection, and whether environment adaptation is performed. As FIG. 8 shows, the operating condition table can describe multiple operating conditions for a single stage of progress.
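One way to hold a FIG. 8 style table in memory is one record per stage. Only the Q&A row below is taken from the text (speaker-independent model, threshold Th3, adaptation on, per [0039]). The other rows are hypothetical placeholders; the during-lecture row follows the speaker-dependent model suggested in [0014], and the threshold names Th1 and Th2 are assumed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OperatingConditions:
    acoustic_model: str    # "speaker-dependent" or "speaker-independent"
    vad_threshold: str     # symbolic name of the speech-segment detection threshold
    env_adaptation: bool   # run environment adaptation (steps S414 to S415)?


# Only the "Q&A" row is stated in the text; the other rows are hypothetical.
OPERATING_CONDITION_TABLE: Dict[str, OperatingConditions] = {
    "before lecture": OperatingConditions("speaker-independent", "Th1", True),
    "during lecture": OperatingConditions("speaker-dependent", "Th2", False),
    "Q&A": OperatingConditions("speaker-independent", "Th3", True),
}

print(OPERATING_CONDITION_TABLE["Q&A"])
```

A frozen dataclass keeps each row immutable, so applying conditions cannot accidentally edit the stored table.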

[0036] When the end of table creation is instructed from the input device 104, the process ends at step S407.

[0037] Next, the process of changing the operating conditions of speech recognition processing during the presentation, that is, step S300, is described. Before speech segments are detected, the speech recognition operating conditions are set using the progress determination table 211 and the operating condition table 212 created by the process of FIG. 4A.

[0038] First, in step S410, the progress determination unit 204 determines the progress by referring to the progress determination table 211. In the present embodiment, the progress determination tables shown in FIGS. 5 to 7 are used. The progress is determined from the current progress, the recognition result from the speech detection unit 202 and the speech recognition unit 203, the light amount signal from the light amount sensor 103, and the noise level obtained from the signal of the voice input device 102. If this determination shows that the progress has changed, the process proceeds from step S411 to step S412. In step S412, the operating condition selection unit 205 refers to the operating condition table 212 and selects the speech recognition operating conditions corresponding to the result (progress) determined by the progress determination unit 204.

[0039] For example, suppose the progress determination table of FIG. 5 is used and "That is all" is recognized while the progress is "during lecture". The progress determination unit 204 then judges, according to that table, that the progress has changed from "during lecture" to "Q&A" (steps S410 and S411). Because the progress has changed, the operating condition selection unit 205 next refers to the operating condition table and selects the operating conditions for the progress "Q&A" determined in step S410. With the operating condition table of FIG. 8, the speaker-independent acoustic model is selected and the speech-segment detection threshold is changed to Th3. The condition to perform environment adaptation is selected at the same time.
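For the example just described, steps S410 to S412 reduce to three actions: look up the transition, stop if the stage did not change, and otherwise fetch the new stage's conditions. The condensed tables below contain only the "That is all" (以上です) transition and the Q&A row from the text. The function name and dictionary layout are hypothetical, and the light and noise rules of FIGS. 6 and 7 are omitted for brevity.

```python
from typing import Dict, Optional, Tuple

# Condensed, hypothetical copies of the FIG. 5 and FIG. 8 tables.
PROGRESS_TABLE: Dict[Tuple[str, str], str] = {
    ("during lecture", "以上です"): "Q&A",
}
CONDITION_TABLE: Dict[str, Dict[str, object]] = {
    "Q&A": {"acoustic_model": "speaker-independent",
            "vad_threshold": "Th3",
            "env_adaptation": True},
}


def update_operating_conditions(
        stage: str, recognized: str) -> Tuple[str, Optional[Dict[str, object]]]:
    """Steps S410 to S412: return (stage, selected conditions or None if unchanged)."""
    new_stage = PROGRESS_TABLE.get((stage, recognized), stage)  # S410: judge progress
    if new_stage == stage:                                       # S411: no change
        return stage, None
    return new_stage, CONDITION_TABLE.get(new_stage)             # S412: select conditions


stage, conditions = update_operating_conditions("during lecture", "以上です")
print(stage, conditions)
```

Returning None when the stage is unchanged lets the caller skip step S413 entirely, which avoids reapplying identical conditions on every pass through step S300.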

[0040]

Next, the process proceeds to step S413, where the various operating conditions of subsequent speech recognition are changed in accordance with the operating condition selected in step S412. If an operating condition calling for environment adaptation processing has been selected, the operating environment adaptation unit 209 performs environment adaptation of the acoustic model and stores the adapted acoustic model in the storage device 106 (steps S414 and S415). Environment adaptation processing here mainly refers to processing that modifies the acoustic model. Known techniques include CMS (cepstral mean subtraction), MLLR (maximum likelihood linear regression), and PMC (parallel model combination). Thereafter, in the speech recognition processing from step S301 onward, recognition is executed under the newly applied operating conditions.
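Of the techniques named above, CMS is the simplest to show. Strictly speaking, it normalizes the feature vectors rather than the model parameters. It removes a stationary channel offset by subtracting the per-dimension mean of the cepstra over an utterance. The following is a minimal pure-Python sketch with an illustrative function name. MLLR and PMC, which do transform the model, are not shown.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-dimension mean cepstrum from every frame.

    A time-invariant channel (microphone, room) multiplies the spectrum,
    which becomes an additive offset in the cepstral domain; removing the
    utterance mean cancels that offset.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


print(cepstral_mean_subtraction([[1.0, 2.0], [3.0, 4.0]]))
# [[-1.0, -1.0], [1.0, 1.0]]
```

Because a constant offset added to every frame also shifts the mean by the same amount, the output is unchanged by such an offset, which is the property that makes CMS useful when the venue's acoustics change.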

[0041]

As described above, according to the above embodiment, the progress of the presentation is detected from the content of the speech, the brightness of the venue, and the noise level. The speech recognition operating condition assigned in advance to the detected progress stage is then set. Speech recognition processing suited to each stage of the presentation can therefore be performed, enabling stable voice operation of the presentation.

[0042]

In the above embodiment, the progress determination unit 204 uses all the tables shown in FIGS. 5 to 7 to determine the progress, but only some of them may be used. For example, if the progress determination table shown in FIG. 6 is not used, the light amount sensor is unnecessary.

[0043]

In the progress determination of step S410 in the above embodiment, a change in progress may also be detected, and the progress determined, when a specific operation of a pointing device such as a mouse or a specific key operation on a keyboard is performed. FIG. 9 shows an example of the progress determination table for this case. Here, key operations for switching the progress are assigned in advance and registered in the progress determination table.
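A FIG. 9 style table can be sketched as a dictionary keyed on the pair (current stage, key). This is a minimal illustration only. The key names "F5" to "F7", the "before lecture" and "after lecture" stages, and `progress_from_key` are hypothetical and not taken from the figure.

```python
from typing import Dict, Tuple

# Hypothetical FIG. 9 style table: (current stage, key) -> next stage.
KEY_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("before lecture", "F5"): "in lecture",
    ("in lecture", "F6"): "Q&A",
    ("Q&A", "F7"): "after lecture",
}


def progress_from_key(stage: str, key: str) -> str:
    """Return the new stage; unregistered key presses leave the stage unchanged."""
    return KEY_TRANSITIONS.get((stage, key), stage)


print(progress_from_key("in lecture", "F6"))  # Q&A
print(progress_from_key("in lecture", "F5"))  # in lecture
```

Keying on the current stage as well as the key means an accidental press of a registered key outside its intended stage has no effect.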

[0044]

Furthermore, in the progress determination of step S410 in the above embodiment, the determination may be based on the execution status of the application (in this example, the presentation program). One example is to detect a change in progress when specific data among the presentation data used for the lecture is displayed. More specifically, each display page of the presentation data is given an attribute such as title part, page 1, ..., last page. The progress is then determined from the attribute of the page being displayed by referring to the progress determination table shown in FIG. 10. Only one of the progress determination tables shown in FIGS. 5 to 7, 9, and 10 may be used, or several of them may be used in combination.
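Using the displayed page's attribute, and combining several tables, can be sketched as follows. This is an illustration under assumptions, not the method of the embodiment. The attribute names, the mapping of the last page to "Q&A", and the policy of adopting the first proposal that differs from the current stage are all hypothetical. `progress_from_page` and `combine` are illustrative names.

```python
from typing import Callable, Iterable, Optional

# Hypothetical FIG. 10 style table: attribute of the displayed page -> stage.
PAGE_ATTRIBUTE_STAGE = {
    "title": "before lecture",
    "page": "in lecture",
    "last": "Q&A",
}


def progress_from_page(page_attribute: str) -> Optional[str]:
    """Stage implied by the displayed page, or None if the attribute is unregistered."""
    return PAGE_ATTRIBUTE_STAGE.get(page_attribute)


# Each determiner maps the current stage to a proposed stage, or None for "no opinion".
Determiner = Callable[[str], Optional[str]]


def combine(stage: str, determiners: Iterable[Determiner]) -> str:
    """Adopt the first proposal that differs from the current stage (one possible policy)."""
    for determine in determiners:
        proposal = determine(stage)
        if proposal is not None and proposal != stage:
            return proposal
    return stage


def keyword_rule(stage: str) -> Optional[str]:
    return None  # nothing recognized this time


def page_rule(stage: str) -> Optional[str]:
    return progress_from_page("last")  # the last page is being shown


print(combine("in lecture", [keyword_rule, page_rule]))  # Q&A
```

Other combination policies, such as requiring two tables to agree, would fit the same `Determiner` interface.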

[0045]

In the operating condition selection of step S412 in the embodiment, the operating condition is selected automatically by referring to the operation condition table 212. Alternatively, a window such as that shown in FIG. 11 may be displayed so that the speaker can select the operating condition directly with a pointing device such as a mouse, or with a keyboard. In this case, the setting screen may be displayed when triggered by a change in the progress, or in response to a predetermined operation.

[0046]

The progress determined in step S410 of the above embodiment may also be reported to the speaker, as may the operating condition selected in step S412. As a notification method, the determined progress or the selected operating condition may be displayed on the display device 101. The apparatus may also be provided with a plurality of display devices, with the notification shown on a display device that only the speaker can see. FIG. 12 shows an example of such a display.

[0047]

The progress may also be selected by operating, with a pointing device such as a mouse, the part of the display shown in FIG. 12 that indicates the progress. Similarly, the operating condition may be selected by operating the part of the display that indicates the operating condition. In this way, if keywords become difficult to recognize during the presentation, the speaker can change the progress or the operating conditions at will without stopping the presentation.

[0048]

In the above embodiment, the operating condition items that can be described in the operation condition table are not limited to those shown in FIG. 8. Needless to say, any operating condition relating to speech recognition may be described. Examples include the grammar or language model used for speech recognition and the acoustic analysis method (MFCC analysis, LPC-mel analysis, spectral subtraction, cepstral mean subtraction). They also include the conditions for acoustic analysis (sampling frequency, analysis window width, mel coefficient, spectral subtraction coefficient, parameter type). Further examples are the conditions for language search (whether a detailed search is performed, the type of language model used, the language likelihood weight, the computation-reduction method and beam width used during search, and the number of candidates to obtain), the thresholds used to reject speech recognition results, and whether speaker adaptation is performed.
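One way to hold the operating-condition items listed above is a single immutable record per progression stage. Each stage row overrides only what differs from a base row. The sketch below assumes this design. Every field name and default value is hypothetical and is chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecognizerConfig:
    """Hypothetical operating-condition record; defaults are illustrative only."""
    grammar: str = "slide-commands"       # grammar or language model
    analysis: str = "MFCC"                # acoustic analysis method, e.g. "LPC-MEL"
    spectral_subtraction: bool = False
    cepstral_mean_subtraction: bool = False
    sampling_rate_hz: int = 16000         # acoustic analysis conditions
    window_ms: float = 25.0
    detailed_search: bool = True          # language search conditions
    language_weight: float = 10.0
    beam_width: int = 200
    n_best: int = 1
    rejection_threshold: float = 0.5      # threshold for rejecting results
    speaker_adaptation: bool = False


BASE = RecognizerConfig()

# One row per progression stage; each row overrides only what differs from BASE.
CONDITION_TABLE = {
    "in lecture": BASE,
    "Q&A": replace(BASE, grammar="qa-commands", spectral_subtraction=True,
                   cepstral_mean_subtraction=True, rejection_threshold=0.7),
}

print(CONDITION_TABLE["Q&A"].sampling_rate_hz)  # 16000
```

`dataclasses.replace` returns a new frozen instance, so the shared base row cannot be modified by accident when a stage row is derived from it.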

[0049]

As described above, according to the present embodiment, the progress of the presentation is determined automatically and the speech recognition operating conditions are changed according to that progress. This yields a presentation apparatus using speech recognition that achieves high recognition accuracy regardless of the stage the presentation has reached.

[0050]

Needless to say, the object of the present invention can also be achieved by supplying a system or apparatus with a storage medium storing software program code that implements the functions of the above embodiment, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium.

[0051]

In this case, the program code read from the storage medium itself implements the functions of the above embodiment, and the storage medium storing the program code constitutes the present invention.

[0052]

Examples of storage media for supplying the program code include a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, and ROM.

[0053]

Needless to say, the invention is not limited to cases where the functions of the above embodiment are realized by the computer executing the read program code. It also covers the case where an OS (operating system) running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing realizes those functions.

[0054]

Needless to say, the invention further covers the following case. The program code read from the storage medium is written into memory on a function expansion board inserted into the computer, or in a function expansion unit connected to the computer. A CPU or the like on the board or unit then performs part or all of the actual processing based on the instructions of the program code, and that processing realizes the functions of the above embodiment.

【００５５】[0055]

[Effects of the Invention]

As described above, according to the present invention, the operating conditions of speech recognition processing can be set appropriately as the environment changes, achieving stable speech recognition that is unaffected by environmental changes. Furthermore, the present invention makes it possible to detect the progress of a schedule assumed in advance, such as a lecture, and to switch the operating conditions of speech recognition processing to suit the environment expected at that stage of progress.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a presentation apparatus according to an embodiment.

FIG. 2 is a block diagram showing the functional configuration of the presentation apparatus according to the embodiment.

FIG. 3 is a flowchart showing the procedure of a presentation using speech recognition in the presentation apparatus according to the embodiment.

FIG. 4A is a flowchart illustrating the table creation processing in the presentation apparatus according to the embodiment.

FIG. 4B is a flowchart showing the processing procedure for changing the speech recognition operating conditions according to the progress of the presentation in the presentation apparatus according to the embodiment.

FIG. 5 is a diagram illustrating an example data structure of the progress determination table used when the progress is determined upon recognition of a specific keyword.

FIG. 6 is a diagram illustrating an example data structure of the progress determination table used when the progress is determined from a change in the lighting of the lecture hall.

FIG. 7 is a diagram illustrating an example data structure of the progress determination table used when the progress is determined from a change in noise level.

FIG. 8 is a diagram illustrating an example data structure of the operation condition table.

FIG. 9 is a diagram illustrating an example data structure of the progress determination table used when the progress is determined upon a specific operation of a pointing device such as a mouse or a specific keyboard operation.

FIG. 10 is a diagram illustrating an example data structure of the progress determination table used when the progress is determined upon display of specific data among the presentation data used for the lecture.

FIG. 11 is a diagram illustrating an example of the setting screen used when the speaker directly selects an operating condition with a pointing device or keyboard.

FIG. 12 is a diagram illustrating how the progress or operating conditions are displayed on the display device.

Continued from the front page. (72) Inventor: Hiroki Yamamoto, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo. F-terms (reference): 5D015 HH23 KK01 LL02 LL03 LL10

Claims


1. An information processing apparatus comprising: a voice recognition unit for inputting, by voice, an instruction to an application being executed; a determination unit for determining a current progression stage in a previously assumed schedule; and a setting unit for setting an operating condition of the voice recognition unit on the basis of the progression stage determined by the determination unit.

2. The information processing apparatus according to claim 1, wherein the setting unit performs at least one of: setting an acoustic model; setting a grammar or a language model; setting an acoustic analysis method for analyzing speech; setting conditions for performing acoustic analysis; setting a threshold for speech section detection; setting search conditions for speech recognition; and setting a threshold used for rejection processing of the speech recognition result.

3. The information processing apparatus according to claim 1, further comprising an operation condition table describing operating conditions of the voice recognition means corresponding to the progression stages of the schedule, wherein the setting means extracts from the operation condition table the operating condition corresponding to the current progression stage determined by the determination means, and sets the operating condition of the voice recognition means according to the extracted operating condition.

4. The information processing apparatus according to claim 3, further comprising an operation condition table creating unit that provides a user interface for creating the operation condition table.

5. The information processing apparatus according to claim 1, wherein the determination means determines whether a signal taken in from outside satisfies a predetermined transition condition, and determines the progression stage based on the result of that determination.

6. The information processing apparatus according to claim 5, wherein the transition condition includes the presence of a predetermined keyword in a recognition result by the voice recognition unit.

7. The information processing apparatus according to claim 5, further comprising a detection unit configured to detect a light amount of an environment, wherein the transition condition includes a light amount detected by the detection unit.

8. The information processing apparatus according to claim 5, further comprising a measuring unit that measures a noise level of the environment, wherein the transition condition includes a noise level measured by the measuring unit.

9. The information processing apparatus according to claim 5, wherein the transition condition includes whether or not a predetermined operation signal is input by an operator.

10. The information processing apparatus according to claim 5, wherein the transition condition includes the progress status of an application being executed.

11. The information processing apparatus according to claim 10, wherein the application includes an image presentation function, and the progress of the application is an image being presented by the application.

12. The information processing apparatus according to any one of claims 5 to 11, further comprising a determination table in which the transition condition is described for each progression stage in the schedule, wherein the determination means determines the progression stage by referring to the determination table.

13. The information processing apparatus according to claim 12, wherein a plurality of transition conditions can be described for one transition in the determination table.

14. The information processing apparatus according to claim 12, further comprising table creation means for providing a user interface for creating the determination table.

15. The information processing apparatus according to claim 1, wherein the setting unit provides a user interface for setting an operating condition of the voice recognition unit when the progression stage determined by the determination unit changes.

16. The information processing apparatus according to claim 1, further comprising a notifying means for notifying a progress stage determined by said determining means and an operation condition set by said setting means.

17. The information processing apparatus according to claim 16, wherein the notification unit displays the content of the notification on a display screen, and the display also serves as a setting screen for operating conditions.

18. The information processing apparatus according to claim 1, further comprising an operation environment learning unit that adapts an operation condition of speech recognition to an environment based on the progression stage determined by the determination unit.

19. An information processing method comprising: a voice recognition step of inputting, by voice, an instruction to an application being executed; a determination step of determining a current progression stage in a previously assumed schedule; and a setting step of setting an operating condition of the voice recognition step on the basis of the progression stage determined in the determination step.

20. The information processing method according to claim 19, wherein the setting step performs at least one of: setting an acoustic model used in the voice recognition step; setting a grammar or a language model; setting an acoustic analysis method for analyzing speech; setting conditions for performing acoustic analysis; setting a threshold for speech section detection; setting search conditions for speech recognition; and setting a threshold used for rejection processing of the speech recognition result.

21. The information processing method according to claim 19 or 20, further comprising an operation condition table in which operating conditions of the voice recognition step corresponding to the progression stages of the schedule are described, wherein the setting step extracts from the operation condition table the operating condition corresponding to the current progression stage determined in the determination step, and sets the operating condition of the voice recognition step according to the extracted operating condition.

22. The information processing method according to claim 21, further comprising an operation condition table creating step of providing a user interface for creating the operation condition table.

23. The information processing method according to claim 19, wherein the determination step determines whether a signal taken in from outside satisfies a predetermined transition condition, and determines the progression stage based on the result of that determination.

24. The information processing method according to claim 23, wherein the transition condition includes the presence of a predetermined keyword in a recognition result in the voice recognition step.

25. The information processing method according to claim 23, further comprising a detecting step of detecting a light amount of an environment, wherein the transition condition includes a light amount detected in the detecting step.

26. The information processing method according to claim 23, further comprising a measurement step of measuring a noise level of the environment, wherein the transition condition includes a noise level measured in the measurement step.

27. The information processing method according to claim 23, wherein the transition condition includes whether or not a predetermined operation signal is input by an operator.

28. The information processing method according to claim 23, wherein the transition condition includes the progress status of an application being executed.

29. The information processing method according to claim 28, wherein the application includes an image presentation function, and the progress of the application is an image being presented by the application.

30. The information processing method according to any one of claims 23 to 29, further comprising a determination table describing the transition condition for each progression stage in the schedule, wherein the determination step determines the progression stage by referring to the determination table.

31. The information processing method according to claim 30, wherein a plurality of transition conditions can be described for one transition in the determination table.

32. The information processing method according to claim 30, further comprising a table creation step of providing a user interface for creating the determination table.

33. The information processing method according to claim 19, wherein the setting step provides a user interface for setting an operating condition of the voice recognition step when the progression stage determined in the determination step changes.

34. The information processing method according to claim 19, further comprising a notification step of notifying a progress stage determined in the determination step and an operation condition set in the setting step.

35. The information processing method according to claim 34, wherein in the notifying step, the content of the notification is displayed on a display screen, and the display also serves as a setting screen for operating conditions.

36. The information processing method according to claim 19, further comprising an operating environment learning step of adapting the speech recognition operating conditions to the environment based on the progression stage determined in the determination step.

37. A computer program for implementing the information processing method according to claim 19 by a computer.

38. A storage medium for storing a computer program for implementing the information processing method according to claim 19 by a computer.