JP3299418B2

JP3299418B2 - Method and apparatus for speech recognition

Info

Publication number: JP3299418B2
Application number: JP20544395A
Authority: JP
Inventors: マービン・エル・ウィリアムス
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1995-08-11
Filing date: 1995-08-11
Publication date: 2002-07-08
Anticipated expiration: 2015-08-11
Also published as: JPH0954596A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to the field of speech recognition, and in particular to the recognition of unknown phrases. More specifically, it relates to a method and apparatus for speech recognition that take background noise into account.

[0002]

[Prior Art] Algorithms, equipment, and apparatus for speech analysis and speech recognition have become widely known. Such systems are becoming more powerful and less expensive, and in recent years the use of speech recognition systems has grown explosively. These systems allow users of data processing systems to direct various programs and applications with voice-activated commands. One goal of a speech recognition system is to provide a more human interface for operating a data processing system. Speech recognition systems are typically used together with other input devices such as a mouse, keyboard, or printer, and these devices are often used to assist the input/output (I/O) processing of the speech recognition system. Known speech recognition systems typically include a set of recognizable phrases (that is, a template) from which the user can speak in order to use voice-activated commands. The memory of the speech recognition system always contains this recognizable set, which consists of digitized acoustic phrases from which a recognizable phrase is selected. For example, if the memory holds 64 trained phrases, every detected sound, whether background sound or intended sound, is compared with this recognizable set. As a result, unintended background noise can produce a confidence factor that causes it to be interpreted as a recognizable phrase in the set.

[0003] In general, because a speech recognition system monitors the acoustic environment, it also detects background noise, and this background noise is often interpreted as recognizable input from the user. In such situations, background noise can cause the speech recognition system to execute an operation or command. Attempts have been made to solve this problem with calibration techniques. These methods essentially use the speech recognition system to monitor a sample of background noise first. That sample then serves as an aggregate factor when the system actually listens for recognizable phrases. Such calibration techniques are often inefficient, and they frequently assume that the background noise sampled during the calibration phase is the same as, or similar to, the background noise present during the actual recognition phase.

[0004] Another approach allows the user to deactivate the recognition mode of the speech recognition system manually. With this approach, however, the user must manually activate and deactivate the recognition mode whenever the user thinks background noise will interfere with the operation of the system. The user must also remember which mode the system is in, and activating and deactivating speech recognition can be very cumbersome. Background noise is often caused by the user. Examples of user-generated background noise include the sounds of peripheral devices such as keyboards and printers. These noises interfere with the operation of the speech recognition system: the system may recognize the background noise as a phrase that corresponds to a command or function.

[0005] Background noise is inadvertently selected as a recognizable phrase because it closely resembles a phrase in the recognizable set held in the memory of the speech recognition system. It would therefore be advantageous to have a method and apparatus that can recognize the operation of noise-generating peripheral devices as background noise while the speech recognition system is in recognition mode.

[0006]

[Problem to Be Solved by the Invention] An object of the present invention is to provide an improved method and apparatus for speech recognition. Another object is to provide a method and apparatus for recognizing unknown phrases. A further object is to provide a speech recognition method and apparatus that take background noise into account.

[0007]

[Means for Solving the Problem] The present invention provides a method and apparatus for analyzing acoustic input events, and uses a template to analyze them. A speech input event is identified, and the identified speech input event is recorded. The recorded speech input event is processed to create a first entry in the template. A selected non-speech input event that occurs in a selected environment is identified, and the identified non-speech input event is recorded. The recorded non-speech input event is then processed to create a second entry in the template. Thereafter, speech input events and non-speech input events are distinguished from each other by comparing each acoustic input event with the template, in which non-speech input events are identified.
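
The two-entry template described above can be pictured with a minimal Python sketch. It is illustrative only: the `Entry` class, the three-number feature tuples (standing in for digitized acoustic samples), and the nearest-neighbor comparison are assumptions made for this example, not the comparison method of the patent.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Entry:
    label: str
    is_speech: bool
    features: tuple  # hypothetical stand-in for a digitized acoustic sample


def closest_entry(template, event):
    """Return the template entry whose sample is nearest to the event."""
    return min(template, key=lambda entry: dist(entry.features, event))


template = [
    Entry("PRINT DOCUMENT", True, (0.9, 0.1, 0.3)),  # first entry: recorded speech
    Entry("{keyboard}", False, (0.1, 0.8, 0.7)),     # second entry: recorded noise
]

event = (0.15, 0.75, 0.65)  # an incoming acoustic input event (made-up values)
match = closest_entry(template, event)
print(match.label, "speech" if match.is_speech else "non-speech")
```

Because the non-speech sound has its own entry, an event that resembles it is labeled as noise rather than being forced onto the nearest spoken phrase.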

[0008] The above, together with other objects, features, and advantages of the present invention, will become apparent from the following detailed description.

[0009]

[Embodiments of the Invention] FIG. 1 shows a multimedia data processing system 11 that includes a plurality of multimedia terminal devices 13 electrically connected to a computer 15. Computer 15 may be any personal computer system, for example an IBM PS/2 computer. The multimedia terminal devices 13 include all types of multimedia terminal devices that produce or consume real-time and/or asynchronous stream data, including, without limitation, video monitor 25. Each of these devices may be called by multimedia application software to produce or consume stream data.

[0010] For example, the operation of CD-ROM player 17 is controlled by multimedia application software that resides in, and is executed by, computer 15. The real-time digital data stream generated as the output of CD-ROM player 17 is received and processed by computer 15 according to the instructions of the multimedia application. For example, the real-time digital data stream may be compressed for storage on a floppy disk, or for transmission via a modem over an ordinary telephone line to a remote computer system. On receipt, the remote computer system decompresses the digital stream data and plays it on analog audio equipment. Alternatively, the real-time data stream output from CD-ROM player 17 may be received by computer 15, subjected to digital or analog filtering, amplification, and sound balancing, and then sent in analog signal form to analog stereo amplifier 29 for output on audio speakers 31 and 33.

[0011] Microphone 19 is used to receive analog input signals corresponding to ambient sound. The real-time analog data stream is sent to computer 15, converted to digital form, and manipulated by multimedia application software such as a speech recognition program. The digital data may be stored, compressed, encrypted, filtered, or transformed; output in analog form to analog stereo amplifier 29; sent as analog output to telephone 23; presented in digitized analog form as the output of a modem for transmission over a telephone line; converted to a visual image for display on video monitor 25; or subjected to various other conventional multimedia digital signal processing operations.

[0012] Similarly, the analog and digital inputs and outputs of keyboard 21, telephone 23, and video monitor 25 are subjected to conventional multimedia operations in computer 15. In particular, computer 15 is used as a speech recognition system that directs commands and functions to other applications running on it. Microphone 19 is used to receive speech input events, that is, the human voice, and these acoustic input events are processed by a multimedia application that recognizes speech by analyzing the input from microphone 19.

[0013] FIG. 2 is a block diagram of the principal hardware components used in the present invention to execute multimedia applications that control the operation of multimedia terminal devices 13. As is conventional in multimedia data processing, a central processing unit (CPU) 33 is provided in computer 15. Multimedia application software, such as a speech recognition application, typically resides in RAM 35, and CPU 33 executes the instructions contained in the multimedia application. Also as is conventional, a digital signal processor 37 is provided as an auxiliary processor dedicated to operations on real-time and/or asynchronous stream data. As is well known to those skilled in the art, a digital signal processor is a microprocessor device intended to perform operations on, or involving, real-time data, and designed to operate very fast and respond quickly so that multimedia terminal devices can be operated in real time. To speed up digital signal processor 37, a direct memory access (DMA) unit 39 is typically provided for fast retrieval and storage of data. In the present invention, an instruction memory (IM) 41 and a data memory (DM) 43 are provided to speed up digital signal processor 37 further. A bus 45 carries data between digital signal processor 37 and hardware interface 47, which includes digital-to-analog (D/A) and analog-to-digital (A/D) converters. The inputs and outputs of the various multimedia terminal devices 13 are connected through these D/A and A/D converters. In FIG. 2, telephone input/output 49, microphone input 53, and stereo outputs 55 and 57 are shown as representative examples, connected through the A/D and D/A converters of hardware interface 47. MIDI input/output is also connected to digital signal processor 37 through hardware interface 47, but it is not connected to the A/D and D/A converters.

[0014] FIG. 3 is a high-level flowchart of the process a user follows to train an application to recognize voice commands, in accordance with a preferred embodiment of the present invention. As shown in block 300, the user must train a set, that is, a template, of phrases for recognition. As shown in block 302, the user also defines a set of actions in the form of macros, each specifying a predetermined action. Then, as shown in block 304, the user associates particular phrases with actions. In other words, the user associates a spoken phrase, the ideal input event, with a macro. Then, as shown in block 306, the user loads the template for a particular application. During this phase of the process, the speech recognition system may encounter background noise that matches an entry in the template, that is, noise that meets the confidence factor of an entry in the set of phrases.

[0015] In addition, the recognizable set in the template may not cover every command the user wants for a particular application. For example, the desired speech recognition template may not currently be in the memory of the speech recognition system. In that case, the user can issue a voice command to swap the template in memory. This occurs when the user needs an additional template, as shown in block 308. In another situation, the user may need to load a new set of templates when loading a new application, as shown in block 310.

[0016] The present invention uses a method and apparatus that allow a speech recognition system to register automatically the background noise produced by peripheral devices. The invention may also activate and deactivate the speech recognition mode automatically, based on interrupts from peripheral devices. In the invention, background noise associated with an interrupt is not ignored; instead, it is dynamically added to the set of recognizable phrases. In accordance with a preferred embodiment of the present invention, a null command is registered for each background noise phrase.

[0017] Alternatively, a command may be associated with a background noise phrase. For example, the command may deactivate the speech recognition system until some other event occurs. This approach allows the speech recognition system to be trained dynamically for background noise. The system can be trained to recognize different background noises, which reduces the likelihood that background noise is mistaken for a recognizable phrase in the set. In other words, the recognizable set comes to include confidence factors for background noise.
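
The two options above, a null command or a command that deactivates recognition, can be sketched in Python as follows. The `Recognizer` class, the `deactivate` function, and the phrase labels are hypothetical names chosen for this illustration; the patent does not define such an interface.

```python
class Recognizer:
    """Toy recognizer that maps recognized phrases to commands."""

    def __init__(self):
        self.active = True
        self.commands = {}   # phrase -> callable, or None for a null command
        self.executed = []   # phrases whose command actually ran

    def register_noise(self, phrase, command=None):
        # With no command given, the noise phrase gets a null command.
        self.commands[phrase] = command

    def handle(self, phrase):
        if not self.active:
            return
        command = self.commands.get(phrase)
        if command is not None:
            command(self)
            self.executed.append(phrase)

    def resume(self):
        # Stands in for "some other event" that re-enables recognition.
        self.active = True


def deactivate(recognizer):
    recognizer.active = False


r = Recognizer()
r.register_noise("{keyboard}")             # null command: recognized, nothing runs
r.register_noise("{printer}", deactivate)  # command: suspend recognition
r.handle("{keyboard}")
state_after_keyboard = r.active
r.handle("{printer}")
state_after_printer = r.active
```

In this sketch the keyboard noise is absorbed silently, while the printer noise suspends recognition until `resume()` is called.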

[0018] FIG. 4 shows a template 400 in accordance with a preferred embodiment of the present invention. The phrase column identifies the text representation of each recognizable phrase. The command column identifies the command, such as a keyboard macro, that is executed on the speech recognition system when the acoustic phrase is recognized. For example, when the speech recognition system recognizes the phrase "PRINT DOCUMENT", function key F7 is sent to the keyboard buffer, followed by the word DOC, and then the ENTER key (^E) enters the keyboard buffer. As a result, on a speech recognition system that has recognized the phrase "PRINT DOCUMENT", the application receives the keystrokes F7 DOC ENTER. These would be the commands the application needs in order to print a document. The digitized form column shows a graphical representation of the acoustic sample for each phrase in the template. The representation is for illustration only and shows the average of how the user speaks the particular phrase, that is, the trained sample phrase.
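
The phrase-to-keystroke mapping in this example can be sketched in a few lines of Python. The dictionary name, the function name, and the use of a plain list as the keyboard buffer are assumptions for illustration; only the phrase and the F7, DOC, ENTER sequence come from the description.

```python
# Phrase column -> command column (keyboard macro), after FIG. 4.
TEMPLATE_COMMANDS = {
    "PRINT DOCUMENT": ["F7", "D", "O", "C", "ENTER"],
}


def on_recognized(phrase, keyboard_buffer):
    """Append the macro keystrokes for a recognized phrase to the buffer."""
    keyboard_buffer.extend(TEMPLATE_COMMANDS.get(phrase, []))
    return keyboard_buffer


buffer = on_recognized("PRINT DOCUMENT", [])
print(buffer)
```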

[0019] On detecting a sound, that is, an acoustic input event, the speech recognition system compares the digitized form of the sound with the digitized forms in the template trained by the user. When background noise as defined by the interrupt criteria is detected, an acoustic phrase is dynamically added to the template. The symbol {} marks phrase entries created by the invention, as can be seen in the phrase column of FIG. 4.

[0020] In accordance with a preferred embodiment of the present invention, a null command can be associated with a phrase entry created in this way. When the speech recognition system detects an acoustic input event, whether background sound or human voice, it compares the sound with every entry in the template. Background sound is also called a "non-speech input event", and the human voice is also called a "speech input event". Because the system compares the acoustic input event with the entries of the recognizable set in the template, comparing background noise with a stored sample of that background noise yields a higher confidence factor than comparing it with any spoken phrase.
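
A small Python sketch makes the effect on confidence factors concrete. The confidence formula `1 / (1 + distance)` and the numeric samples are invented for this illustration; the patent does not specify how a confidence factor is computed.

```python
from math import dist


def confidence(sample, event):
    """Hypothetical confidence factor: 1.0 for identical samples, lower otherwise."""
    return 1.0 / (1.0 + dist(sample, event))


def best_match(template, event):
    """Compare the event with every entry and return the best phrase and score."""
    phrase = max(template, key=lambda p: confidence(template[p], event))
    return phrase, confidence(template[phrase], event)


template = {"SAVE": (0.2, 0.6, 0.5), "PRINT DOCUMENT": (0.9, 0.1, 0.3)}
noise = (0.1, 0.8, 0.7)  # made-up sample of keyboard noise

before, _ = best_match(template, noise)   # noise lands on a spoken phrase
template["{keyboard}"] = (0.1, 0.8, 0.7)  # noise sample added to the template
after, score = best_match(template, noise)
print(before, "->", after)
```

Before the noise entry exists, the closest match is a spoken phrase; once the noise sample is stored, the noise entry wins the comparison.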

[0021] FIG. 5 is a flowchart of a process for registering peripheral devices that may produce noise, that is, background sound (non-speech input events), in accordance with a preferred embodiment of the present invention. The process begins by receiving user input in the form of the names of the peripheral devices, as shown in block 502. The process then receives user input identifying the interrupt for each peripheral device, as shown in block 504. Next, the process receives user input indicating the communication port associated with each peripheral device, as shown in block 506. The process then receives user input on the elapsed time allowed for device recognition, as shown in block 508.

[0022] Next, as shown in block 510, user input is received identifying any optional commands to be executed during recognition. The process then receives user input on the notification selection, as shown in block 512. The user may choose to be notified when a proper recognition has been made. The process then determines, as shown in block 514, whether the user wants to be notified when noise from a peripheral device is detected. If the user wants notification, the process receives user input specifying the output device for the notification, as shown in block 516. The user may be notified through various output devices, such as a speaker for audio notification or a video monitor for visual notification. The process then activates the notification flag, as shown in block 518.

[0023] The process then stores the information entered by the user in a device recognition table, as shown in block 520, and ends. If, at block 514, the user does not want notification, the process proceeds directly to block 520. The device recognition table may take various forms, such as a file continuation field or a relational database.
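
One row of the device recognition table built by blocks 502 to 520 could look like the Python sketch below. The `DeviceRecord` fields mirror the user inputs listed above, but the field names, the dictionary keyed by interrupt number, and the sample values (including the port name `LPT1`) are assumptions for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceRecord:
    name: str                            # block 502: peripheral name
    interrupt: int                       # block 504: interrupt number
    port: str                            # block 506: communication port
    window_seconds: float                # block 508: elapsed time for recognition
    command: Optional[str] = None        # block 510: optional command
    notify: bool = False                 # block 518: notification flag
    notify_device: Optional[str] = None  # block 516: output device


def register_device(table, name, interrupt, port, window_seconds,
                    command=None, notify_device=None):
    """Store one user-supplied record in the table (block 520)."""
    record = DeviceRecord(name, interrupt, port, window_seconds, command,
                          notify=notify_device is not None,
                          notify_device=notify_device)
    table[interrupt] = record
    return record


table = {}
register_device(table, "printer", 0x05, "LPT1", 0.5, notify_device="speaker")
register_device(table, "keyboard", 0x14, "KBD", 0.2)
```

Here the notification flag is set only when the user names an output device, which corresponds to the yes branch of block 514.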

[0024] FIG. 6 is a flowchart of a process for training and updating the speech recognition system to detect and identify sounds (also called "acoustic input events") that are speech input events or non-speech input events. The process begins by loading the device recognition table into active memory, as shown in block 600. The device recognition table holds the data entered by the user and stored as shown in FIG. 5. As shown in block 602, the process sets interrupt vectors so that interrupts from the peripheral devices listed in the device recognition table are intercepted before they reach the target application. The process then activates a monitor service, as shown in block 604. Monitor services for monitoring interrupts are well known to those skilled in the art, and various methods may be used in accordance with a preferred embodiment of the present invention.

[0025] The process waits for an interrupt from a peripheral device, as shown in block 606. Then, as shown in block 608, the process receives an interrupt from a peripheral device. Next, as shown in block 610, the process passes the interrupt on to the existing application address so that it is ultimately delivered to the target application. The process then records the time at which the interrupt was received, as shown in block 612. Next, the process starts an expiration clock, as shown in block 614. The expiration clock is essentially a timer used in the preferred embodiment to determine how much time has elapsed since the interrupt was detected.

[0026] Then, as shown in block 616, the process waits for acoustic recognition. In other words, the process waits to see whether a pattern is detected that meets the confidence threshold of an entry in the template. The process then determines, as shown in block 618, whether an acoustic interrupt has been received. An acoustic interrupt occurs when an input device such as a microphone detects an acoustic input event. If no acoustic interrupt has been received, the process next determines, as shown in block 620, whether the time allowed for recognition has expired. If it has expired, the process clears the recorded interrupt reception time, as shown in block 622, and returns to block 606 to wait for an interrupt from a peripheral device. If, at block 620, the time allowed for recognition has not expired, the process returns to block 616 and continues to wait for acoustic recognition.

[0027] If, at block 618, an acoustic interrupt has been received, the process receives the acoustic pattern (the acoustic input event), as shown in block 624. The process records the time at which the acoustic pattern was received, as shown in block 626. The process then determines whether the acoustic pattern is recognized, as shown in block 628. If the acoustic pattern is not recognized, the process clears the recorded reception time of the acoustic pattern, as shown in block 630, and then proceeds to block 622.

[0028] If, at block 628, the acoustic pattern is recognized, the process subtracts the peripheral device interrupt time from the acoustic interrupt time to determine the elapsed time, as shown in block 632. Next, the process determines whether the calculated time is within the allowed range, as shown in block 634. If it is not, the process proceeds to block 630, as described above. If it is, the process next determines, as shown in block 636, whether the user should be notified that a non-speech input event has been recognized. If the user should not be notified, the process determines whether a command should be executed, as shown in block 638.
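
The timing test of blocks 632 and 634 reduces to a subtraction and a range check, as in this Python sketch. The function name, the use of seconds, and the lower bound of zero (the sound must not precede the interrupt) are assumptions for illustration; the description only says the calculated time must be within an allowed range.

```python
def within_window(interrupt_time, acoustic_time, window_seconds):
    """Block 632: elapsed = acoustic time - interrupt time. Block 634: range check."""
    elapsed = acoustic_time - interrupt_time
    # Assumed range: the sound follows the interrupt by at most window_seconds.
    return 0.0 <= elapsed <= window_seconds


# A sound 0.4 s after a printer interrupt, with a 0.5 s window, is attributed
# to the printer; a sound 1.0 s later is not.
print(within_window(10.0, 10.4, 0.5), within_window(10.0, 11.0, 0.5))
```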

[0029] If, at block 636, the user should be notified, the process notifies the user according to the definition in the recognition table, as shown in block 640. The process then determines whether a command should be executed, as shown in block 638. If a command should be executed, the process executes it according to the recognition table, as shown in block 642. Then, as shown in block 644, the noise (the non-speech input event) is stored as a recognizable template pattern. In other words, the non-speech input event is stored as an entry in the template. The process then proceeds to block 630, as described above. If, at block 638, no command should be executed, the process proceeds directly to block 644.
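
The decision path from block 628 to block 644 can be summarized in a short Python sketch. The function signature, the dictionary used for the device record, the callbacks, and the returned status strings are all assumptions for illustration; in the flowchart both outcomes simply continue to block 630.

```python
def handle_pattern(recognized, elapsed, device, pattern, template, notify, execute):
    """Follow blocks 628-644 for one received acoustic pattern."""
    if not recognized:                                # block 628 -> 630
        return "cleared"
    if not 0.0 <= elapsed <= device["window"]:        # block 634 -> 630
        return "cleared"
    if device["notify"]:                              # block 636 -> 640
        notify(device["name"])
    if device["command"] is not None:                 # block 638 -> 642
        execute(device["command"])
    # Block 644: store the noise as a template entry, labeled with {}.
    template.append(("{" + device["name"] + "}", pattern))
    return "stored"


notices, commands, template = [], [], []
printer = {"name": "printer", "window": 0.5, "notify": True, "command": None}
result = handle_pattern(True, 0.2, printer, (0.1, 0.8, 0.7),
                        template, notices.append, commands.append)
late = handle_pattern(True, 0.9, printer, (0.1, 0.8, 0.7),
                      template, notices.append, commands.append)
```

The first call falls inside the time window, so the user is notified and the noise is stored; the second call is too late and leaves the template unchanged.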

[0030] In accordance with a preferred embodiment of the present invention, the process shown in FIG. 6 is implemented as a terminate-and-stay-resident (TSR) service that intercepts interrupts registered by the user or by another entity, such as the administrator of the application. An interrupt from a peripheral device is immediately passed on to the interrupt vector table address associated with it; that is, the interrupt is forwarded to the appropriate device. This ensures that the keyboard interrupt service receives keyboard interrupts and that the printer service receives its output. A detectable speech recognition phrase that does not match a confidence factor in the template, but that is received within a specified time of the interrupt, is a candidate for a null association according to a preferred embodiment of the present invention. Each peripheral device has an associated interrupt defined for it. For example, a personal computer may use interrupt 14H for keyboard interrupts.
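The candidate test for a null association described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the numeric confidence score, the threshold comparison, and the time window in seconds are all hypothetical, since the patent does not specify how confidence or the specified time is represented.

```python
def is_null_association_candidate(best_confidence, confidence_threshold,
                                  interrupt_time, sound_time, window_seconds):
    """Return True if a detected sound is a candidate for a null association.

    Assumed rule: the sound fails to match any template entry (its best
    confidence is below the threshold) and it arrives no later than
    window_seconds after a registered interrupt.
    """
    matched = best_confidence >= confidence_threshold
    elapsed = sound_time - interrupt_time
    return (not matched) and 0.0 <= elapsed <= window_seconds
```

A sound that already matches a template entry, or one that arrives before the interrupt or after the window closes, is not treated as a candidate in this sketch.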

[0031] The present invention can be used to intercept hardware interrupts or operating system interrupts. The registration service shown in FIG. 5 allows the user to specify the interrupts for which recording of acoustic input events should be activated. The user may also adjust the sensitivity with which an interrupt is interpreted as background noise, that is, as a non-speech input event. Predetermined default values will be set for existing devices. For example, a printer typically operates on interrupt 5H.

[0032] In accordance with a preferred embodiment of the present invention, a preprocess may be used to evaluate whether a detected acoustic input event should be compared with an existing null phrase or whether a new phrase should be created. Such a process would need to use background noise continuously in order to train the system on particular noise phrases. In addition, a null command may take the place of a user-supplied command or a system default command. For example, the command could issue a SAVE command to a word processor. When noise activity such as interfering noise increases, the user may want to save the work just completed. In such a case, one of the background noises matches an executable phrase.

[0033] The user may be notified graphically when a null association is created. Such notification may also be given by audio means. In addition, the user could change the null command when notified of its creation.

[0034] Rather than using a null association, the present invention could also switch the entire template based on the type of interrupt received. Under this option, a detected interrupt serves as a good signal that a new application is in use, one that uses a different set of speech recognition phrases and therefore requires a new set of templates.
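Switching the whole template on the type of interrupt received can be pictured as a simple lookup. The sketch below is hypothetical: `select_template`, the mapping, and the template names are illustrative only, and the interrupt numbers merely reuse the 14H and 5H examples given in the description.

```python
def select_template(active_template, interrupt_id, templates_by_interrupt):
    """Return the template to use after an interrupt is detected.

    If the interrupt type has a template registered for it, switch to that
    template; otherwise keep the currently active one.
    """
    return templates_by_interrupt.get(interrupt_id, active_template)


# Illustrative mapping; names and assignments are assumptions.
TEMPLATES_BY_INTERRUPT = {
    0x14: "keyboard-application-template",
    0x05: "printer-application-template",
}
```

For example, `select_template("default-template", 0x05, TEMPLATES_BY_INTERRUPT)` selects the printer-related template, while an unregistered interrupt leaves the active template unchanged.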

[0035] In accordance with a preferred embodiment of the present invention, a fundamental problem of speech recognition systems is addressed: distinguishing non-speech input events from speech input events. The present invention recognizes that peripheral devices will produce background noise and that the system may inherently respond to this background noise by executing commands that are inappropriate for the application. The present invention provides a method and apparatus that allow a peripheral device to add digitized speech phrases to a speech recognition set such as a template. The invention has a further advantage over the prior art in that it need not be active at all times. Once the background noise has been trained and registered in the template, the invention may be deactivated or removed, which frees computer resources for other applications.

[0036] Although the present invention has been described with reference to examples, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.

[0037]

[Effects of the Invention] According to the present invention, speech can be distinguished from non-speech such as background noise, and the accuracy of speech recognition is improved.

[Brief Description of the Drawings]

FIG. 1 shows a multimedia data processing system according to a preferred embodiment of the present invention.

FIG. 2 is a block diagram of the main hardware components used to run an application, such as a speech recognition system, according to a preferred embodiment of the present invention.

FIG. 3 is a high-level flowchart of a process used by a user to train an application to recognize speech recognition commands, according to a preferred embodiment of the present invention.

FIG. 4 shows a template according to a preferred embodiment of the present invention.

FIG. 5 is a flowchart of a process for registering peripheral devices that may produce noise, that is, background sound, according to a preferred embodiment of the present invention.

FIG. 6 is a flowchart of a process for training and updating a speech recognition system to detect and identify the sounds of speech input events and non-speech input events.

[Description of Reference Numerals]

11 Multimedia data processing system
13 Multimedia terminal device
15 Computer
17 CD-ROM player
19 Microphone
21 Keyboard
23 Telephone
25 Video monitor
29 Analog stereo amplifier
31 Acoustic speaker
33 Acoustic speaker
45 Bus

Continuation of the front page

(56) References: JP-A-49-57702 (JP, A); JP-A-59-109095 (JP, A); JP-A-60-158494 (JP, A); JP-A-3-160499 (JP, A); JP-A-1-138595 (JP, A); JP-A-5-19781 (JP, A); JP-B-2-22960 (JP, B2); JP-B-4-49952 (JP, B2); JP-B-5-56512 (JP, B2); JP-B-61-49226 (JP, B1)

(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 15/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A method for analyzing an acoustic input event using a template in a data processing system including a peripheral device that generates an interrupt and a non-speech input event that becomes background noise, the method comprising the steps of: identifying a speech input event; recording the identified speech input event; processing the recorded speech input event to create a first entry in the template; identifying a selected non-speech input event occurring in a selected environment; recording the identified non-speech input event; processing the recorded non-speech input event to create a second entry in the template; detecting an interrupt from the peripheral device; detecting an acoustic input event occurring after the interrupt; and distinguishing between a speech input event and a non-speech input event by comparing the acoustic input event with the template.

2. The method of claim 1, further comprising the steps of: determining, in response to the identification of the non-speech input event, whether a command is associated with the non-speech input event; and executing the command in response to the command being associated with the non-speech input event.

3. A method for analyzing an acoustic input event using a template in a data processing system including a peripheral device that generates an interrupt and a non-speech input event that becomes background noise, the method comprising the steps of: identifying a speech input event; recording the identified speech input event; processing the recorded speech input event to create a first entry in the template; identifying a selected non-speech input event occurring in a selected environment; recording the identified non-speech input event; processing the recorded non-speech input event and creating a second entry in the template for the processed non-speech input event; detecting an interrupt from the peripheral device; detecting an acoustic input event occurring after the interrupt; identifying a non-speech input event by comparing the acoustic input event with the template; determining, in response to the identification of a non-speech input event occurring within a predetermined time after the interrupt, whether a command is associated with the non-speech input event; and executing the command in response to the command being associated with the non-speech input event.

4. The method of claim 3, further comprising the step of processing the identified non-speech input event occurring within the predetermined time after the interrupt and replacing the second entry of the template with the processed non-speech input event.

5. An apparatus for analyzing an acoustic input event using a template in a data processing system including a peripheral device that generates an interrupt and a non-speech input event that becomes background noise, the apparatus comprising: first identification means for identifying a speech input event; first recording means for recording the identified speech input event; first processing means for processing the recorded speech input event to create a first entry in the template; second identification means for identifying a selected non-speech input event occurring in a selected environment; second recording means for recording the identified non-speech input event; second processing means for processing the recorded non-speech input event and creating a second entry in the template for the processed non-speech input event; first detection means for detecting an interrupt from the peripheral device; second detection means for detecting an acoustic input event occurring after the interrupt; and comparison means for distinguishing between a speech input event and a non-speech input event by comparing the acoustic input event with the template.

6. The apparatus of claim 5, further comprising: means for determining, in response to the identification of the non-speech input event, whether a command is associated with the non-speech input event; and means for executing the command in response to the command being associated with the non-speech input event.

7. An apparatus for analyzing an acoustic input event using a template in a data processing system including a peripheral device that generates an interrupt and a non-speech input event that becomes background noise, the apparatus comprising: first identification means for identifying a speech input event; first recording means for recording the identified speech input event; first processing means for processing the recorded speech input event to create a first entry in the template; second identification means for identifying a selected non-speech input event occurring in a selected environment; second recording means for recording the identified non-speech input event; second processing means for processing the recorded non-speech input event and creating a second entry in the template for the processed non-speech input event; first detection means for detecting an interrupt from the peripheral device; second detection means for detecting an acoustic input event occurring after the interrupt; third identification means for identifying a non-speech input event by comparing the acoustic input event with the template; determination means for determining, in response to the identification of a non-speech input event occurring within a predetermined time after the interrupt, whether a command is associated with the non-speech input event; and execution means for executing the command in response to the command being associated with the non-speech input event.

8. The apparatus of claim 7, further comprising means for processing the identified non-speech input event occurring within the predetermined time after the interrupt and replacing the second entry of the template with the processed non-speech input event.