JPH0580794A

JPH0580794A - Speech recognition device

Info

Publication number: JPH0580794A
Application number: JP3245473A
Authority: JP
Inventors: Kazuo Fujimoto; 和生藤本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-09-25
Filing date: 1991-09-25
Publication date: 1993-04-02

Abstract

PURPOSE: To provide a speech recognition device that, when a recognition error occurs during speech input, interrupts normal recognition processing and starts exception processing, and returns to normal recognition processing only when a word from the exception-processing vocabulary is recognized. CONSTITUTION: The speech recognition device comprises input means 1 for accepting a speech input, recognition means 2 for recognizing the speech input, next process determining means 3 for determining the next process according to the recognition result, exception processing recognition means 4 that performs exception processing when a recognition error occurs during speech recognition, output means 5 for outputting the recognition result, normal information storage means 7 referred to by the recognition means 2, and exception information storage means 8 referred to by the exception processing recognition means 4. When a recognition error occurs, recognition is performed with reference to the exception information storage means 8, so the user need not repeat the input from the beginning, and a speech recognition device suited to high-speed continuous speech input is obtained.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition device that, upon accepting a speech input, performs speech recognition on the input speech signal and has means for controlling equipment according to the result.

[0002]

[Prior Art] In recent years, the financial and distribution industries have introduced voice response and recognition systems that take speech input over telephone lines and the like. These systems provide users with services such as balance inquiry and order entry. More recently, speech recognition devices have also been developed for home appliances and in-vehicle products and applied to car audio, VTR timer recording, telephones, and the like.

[0003] FIG. 6 shows the configuration of a conventional speech recognition device. In FIG. 6, 51 is input means for accepting speech input, 52 is recognition means that performs speech recognition using the input speech signal, 53 is recognition information storage means referred to during recognition, 54 is next process determining means that determines the next process from the recognition result, and 55 is output means that displays the processing result.

[0004] The operation of the speech recognition device configured as above is described below. First, the user speaks in response to an input prompt from output means 55 (spoken guidance or guidance shown on a display). Input means 51 accepts this input and performs the preprocessing needed for recognition by recognition means 52. The input analog signal is first converted to a digital signal. A sampling frequency of 8 to 10 kHz is normally used, and each sample is quantized to 8 to 16 bits. High-frequency components and noise components are removed as needed. Recognition means 52 pattern-matches the input signal against the speech information in recognition information storage means 53, selects the entry that is most similar and at or above a certain threshold, and returns it as the uttered word. Next process determining means 54 examines the result and determines what input to request next and what to output from output means 55.

[0005] When this speech recognition device is applied to a telephone, entering a number including the area code requires uttering about ten digits. The user utters digits, at least 0 (pronounced "zero") through 9 (pronounced "kyu"), one after another; the device recognizes them, converts them into dial pulses or tone signals, and places the call. By providing means for presenting the recognition result as needed, on a display attached to the telephone or through a speech output device, the user can confirm what was recognized (Japanese Unexamined Patent Publication No. 63-33796, etc.). The same applies to entering a personal identification number instead of a telephone number.

[0006] To shorten the time needed to enter ten digits as far as possible, the device is configured so that, once a digit has been uttered and recognized, it displays the result and at the same time waits for the next input. When recognition succeeds, the utterance is converted into the corresponding digit information and the device waits for the next input. When recognition fails, for example because the utterance was too quiet, the device requests a repeat and then waits for the next input. Once users become accustomed to speech input, they can enter data without looking at the display. They can therefore enter a telephone number by uttering the digits in order without taking their eyes off the number in the telephone directory. The entered result need only be checked after all utterances are complete; there is no need to look at the display after every digit.

[0007]

[Problems to Be Solved by the Invention] In the conventional configuration, however, when ten digits are uttered in sequence and one of them cannot be recognized, the device still goes straight on to request the next input (in this case, a repeat). Unless the user is watching the recognition result, the input that is made differs from what the user intended.

[0008] As a concrete example, suppose the utterances are the ten digits "zero", "ichi", "ni", "san", "yon", "go", "roku", "nana", "hachi", "kyu", in that order. If "yon" cannot be recognized, for instance because it was uttered too quietly, it is skipped, and the recognition result becomes "zero", "ichi", "ni", "san", "go", "roku", "nana", "hachi", "kyu", with "yon" dropped.
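The failure mode described in [0008] can be illustrated with a minimal Python sketch. The function name and the (word, recognized) data shape are hypothetical and are not taken from the patent; the sketch only assumes that a failed utterance produces no digit and that the user, not watching the display, simply continues with the next digit.

```python
# Hypothetical simulation of the conventional flow; all names are illustrative.
DIGITS = ["zero", "ichi", "ni", "san", "yon", "go", "roku", "nana", "hachi", "kyu"]

def conventional_entry(utterances):
    """Collect recognized digits; each utterance is (word, recognized_ok).

    A failed utterance yields nothing: the device asks for a repeat, but a
    user who is not watching the display just says the next digit.
    """
    result = []
    for word, recognized_ok in utterances:
        if recognized_ok:
            result.append(word)
    return result

# "yon" is uttered too quietly and is the only digit that fails.
spoken = [(word, word != "yon") for word in DIGITS]
entered = conventional_entry(spoken)
```

Here `entered` contains nine digits with "yon" silently missing, which is the situation the invention is meant to prevent.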

[0009] There was also the problem that if the user, after completing all input, looked at the result on the display and found that "yon" was missing, the whole entry had to be redone from the beginning.

[0010] The present invention solves these conventional problems. Its object is to provide a speech recognition device which, when an error occurs in speech recognition, interrupts normal speech input and does not allow the next input until a word for exception processing is uttered, so that even when a recognition error occurs the entire input need not be redone.

[0011]

[Means for Solving the Problems] To achieve this object, the present invention comprises input means for accepting a speech input, recognition means for recognizing the speech input accepted by the input means, next process determining means for determining the next process based on the result recognized by the recognition means, exception processing recognition means that is executed when a recognition error occurs during speech recognition, output means for outputting the recognition processing result, normal information storage means referred to by the recognition means, and exception information storage means referred to by the exception processing recognition means.

[0012]

[Operation] With this configuration, when the recognition means returns a recognition error, the next process determining means interrupts normal recognition processing, activates the exception processing recognition means, and has recognition performed with reference to the exception information storage means. This makes it possible to provide a speech recognition device suited to high-speed continuous speech input, in which the user need not repeat everything from the beginning when a recognition error occurs.

[0013]

[Embodiments]

(Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of a speech recognition device according to the first embodiment. In FIG. 1, 1 is input means, which mainly accepts speech input; 2 is recognition means, which detects the utterance interval and performs speech recognition within it; 3 is next process determining means, which determines what to execute next from the recognition result and controls each means in the device; 4 is exception processing recognition means, which is executed when a recognition error occurs during speech recognition; 5 is output means, which outputs recognition results and the like; 6 is speech information storage means, which stores the speech information to be recognized; 7 is normal information storage means, located within speech information storage means 6 and referred to when recognition means 2 executes recognition processing; and 8 is exception information storage means, referred to when exception processing recognition means 4 performs exception processing.

[0014] The operation of the speech recognition device configured as above is described with reference to FIG. 2. First, the settings needed to recognize the first word are made. If the purpose is to recognize numeric information, input means 1 is activated and output means 5 outputs an input prompt to the user. The device then waits for the user to utter a word. Utterance is detected as follows. The speech input gain is assumed to be at its initial value or at a suitable value set earlier. When the user starts speaking to the device, the speech level is monitored; when a level at or above a certain threshold is detected, the start of the utterance is confirmed. After the start has been confirmed, the end of the utterance is identified when the level falls to or below a certain threshold.
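The level-based utterance detection in [0014] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the per-frame level representation, the function name, and the use of two separate thresholds (one for start, one for end) are assumptions, since the text only says "a certain threshold" in each case.

```python
def detect_utterance(levels, start_threshold, end_threshold):
    """Return (start, end) frame indices of the first utterance, or None.

    levels: per-frame speech levels (for example mean absolute amplitude).
    The utterance starts at the first frame whose level is at or above
    start_threshold and ends (exclusive) at the first later frame whose
    level is at or below end_threshold.
    """
    start = None
    for i, level in enumerate(levels):
        if start is None:
            if level >= start_threshold:
                start = i
        elif level <= end_threshold:
            return (start, i)
    if start is None:
        return None  # no utterance detected
    return (start, len(levels))  # utterance ran to the end of the buffer


frames = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1]
interval = detect_utterance(frames, start_threshold=0.5, end_threshold=0.3)
silence = detect_utterance([0.1, 0.1, 0.2], start_threshold=0.5, end_threshold=0.3)
```

Using a lower end threshold than start threshold is one common way to avoid cutting an utterance short on a brief dip in level; a single shared threshold would also be consistent with the text.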

[0015] Ambient noise during input can be handled in several ways. A highly directional microphone can be used for input means 1 so that other ambient signals are picked up as little as possible. The input gain can be changed based on speech level detection. Furthermore, the speech level detection can measure the ambient noise level and apply signal processing that subtracts that noise level, thereby obtaining the signal components needed for speech recognition.

[0016] When the user utters an appropriate word following the guidance, the word input from input means 1 is pattern-matched against the speech information stored in speech information storage means 6, and speech recognition is thereby performed. Recognition means 2 performs the recognition, converts the speech signal into the corresponding information such as digits or characters, and outputs it to next process determining means 3. Output means 5 then outputs the necessary message to the user. Guidance may be given by speech alone or by display output.

[0017] The speech information used for pattern matching (hereinafter simply "matching") is stored divided into groups according to type of use. For example, the first group holds operation command names, the second group holds the digits used for entering telephone numbers and the like, and the third group holds vocabulary for detailed commands. By grouping the vocabulary according to purpose in this way, the device searches only the designated group for the entry closest to the uttered speech. Each entry is, of course, encoded in a form that allows matching in a short time. A threshold is also applied to the matching: recognition means 2 has means for outputting a recognition result indicating that no matching word exists unless the match score is at or above a certain value.
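The group-restricted matching with a rejection threshold described in [0017] might look like the following sketch. The two-dimensional feature vectors, the vocabulary contents, the similarity measure, and the threshold value are all made up for illustration; the patent does not specify any of them.

```python
import math

# Hypothetical templates: group number -> {word: feature vector}.
VOCABULARY = {
    1: {"dial": (0.9, 0.1), "redial": (0.8, 0.3)},                   # command names
    2: {"zero": (0.1, 0.1), "ichi": (0.2, 0.8), "ni": (0.7, 0.7)},   # digits
}

def similarity(a, b):
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + math.dist(a, b))

def recognize(features, group, threshold=0.8):
    """Return the best-matching word within one group, or None.

    None means that even the best score was below the threshold, i.e. the
    result "no matching word exists".
    """
    best_word, best_score = None, 0.0
    for word, template in VOCABULARY[group].items():
        score = similarity(features, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= threshold else None
```

With these made-up templates, `recognize((0.2, 0.75), group=2)` selects "ichi", while `recognize((0.9, 0.1), group=2)` is rejected even though the same features match "dial" exactly in group 1, showing how the designated group bounds the search.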

[0018] For the input speech, the start of an utterance carrying vocabulary information is detected when the level exceeds a specific input-level threshold, and the end is confirmed when the level falls below the threshold. The input speech within this utterance interval is sampled at several kHz (about 8 kHz to 10 kHz) and quantized to about 8 to 16 bits to digitize it. For speech recognition, LPC cepstrum coefficients are computed from these digital values and used as feature parameters; they are compared with the corresponding parameters of the reference vocabulary to find the closest entry among the words of the given group number. Matching is carried out over the whole utterance interval, and the entry that is finally closest is recognized as the uttered word. Methods other than LPC cepstrum coefficients exist, but it is common to apply some form of signal processing to the sampled digital values to extract feature parameters and then perform matching.
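As an illustration of the feature extraction named in [0018], the sketch below computes LPC cepstrum coefficients for one frame using the textbook autocorrelation method, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The patent does not say how its coefficients are computed, so the choice of method, the function names, and the omission of windowing and pre-emphasis are assumptions.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return a[0..order] with a[0] = 1 for A(z) = 1 + sum_k a[k] z^-k.

    Assumes r[0] > 0 (the frame is not all zeros).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a

def lpc_cepstrum(a, n_coeffs):
    """Cepstrum c[1..n_coeffs] of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = [0.0] * (n_coeffs + 1)
    for n in range(1, n_coeffs + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

# Synthetic check signal: a decaying exponential, i.e. a one-pole response.
frame = [0.9 ** n for n in range(200)]
a = levinson_durbin(autocorrelation(frame, 2), 2)
c = lpc_cepstrum(a, 3)
```

For this synthetic one-pole signal the cepstrum should approach 0.9^n / n, which gives a simple sanity check; a real front end would compute such a vector per frame and compare it with the stored reference parameters.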

[0019] Once recognition has been performed and an appropriate word selected, control corresponding to that word is carried out: the designated dial pulses or tone signals are sent, or various control signals are generated. A message prompting the next input is then output, and the device waits for the next speech input.

[0020] Recognition means 2 cannot always recognize the uttered word correctly. Output means 5 therefore displays or speaks an error message when the word is misspoken, when the utterance breaks off partway, when the input level is too low for recognition (an utterance is detected, but the level is too low to recognize it), when the level is too high, when the utterance is too long, or when an utterance outside the recognizable vocabulary is received.

[0021] FIG. 3 shows an example of the result information when a recognition error occurs. As shown in FIG. 3, when a recognition error occurs, its cause can be determined from the error number. Accordingly, when the input is too quiet, the device outputs a message asking the user to speak a little louder and changes the speech input gain. Likewise, when the input is too loud, it outputs a message asking the user to speak a little more quietly and changes the speech input gain. When an utterance was detected but could not be recognized, it outputs a message asking the user to speak again. Processing performed when a word could not be recognized is defined as exception processing. When speech input adequate for recognition was obtained but matching finds no corresponding word, the device likewise returns the result that the word could not be recognized.
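The error-number handling in [0021] can be sketched as a small dispatch function. The text does not reproduce FIG. 3, so the error numbers, the message wording, and the multiplicative gain step below are all assumptions made for illustration.

```python
from enum import Enum

class RecognitionError(Enum):
    # Hypothetical error numbers; the actual values in FIG. 3 are not given.
    LEVEL_TOO_LOW = 1
    LEVEL_TOO_HIGH = 2
    NOT_RECOGNIZED = 3

GAIN_STEP = 1.25  # assumed multiplicative gain adjustment

def handle_error(error, gain):
    """Return (message for the user, new input gain) for a recognition error."""
    if error is RecognitionError.LEVEL_TOO_LOW:
        return "Please speak a little louder.", gain * GAIN_STEP
    if error is RecognitionError.LEVEL_TOO_HIGH:
        return "Please speak a little more quietly.", gain / GAIN_STEP
    # Utterance detected but no word matched: ask again, leave the gain alone.
    return "Please say that again.", gain
```

Adjusting the gain and prompting the user together follows the text, which pairs each level-related message with a gain change.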

[0022] FIG. 4 shows the speech recognition procedure during exception processing. When next process determining means 3 receives an error output from recognition means 2, it deactivates recognition means 2, sets exception information storage means 8, and activates exception processing recognition means 4. It then waits for the next speech input. The input speech is matched against the exception-processing speech information in exception information storage means 8, and the recognition result is returned to next process determining means 3. If the input is recognized as an exception-processing word, next process determining means 3 deactivates exception processing recognition means 4 and reactivates recognition means 2. Otherwise, exception processing continues. If a recognition error occurs during exception-processing recognition, the device keeps requesting input until a correct word is recognized. Exception processing therefore also serves to correct the input, for example by changing the input level. Because the device does not return to normal processing unless an exception-processing word is uttered, the user never loses track of how far the input has progressed (input stops at the point where the error occurred).
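The mode switching in [0022] amounts to a two-state controller. The sketch below is a hedged illustration: the class name, the prompt strings, and the romanized vocabulary sets are hypothetical, and the recognizers themselves are reduced to set membership on an already-recognized word (`None` standing for a recognition error).

```python
NORMAL_VOCAB = {"zero", "ichi", "ni", "san", "yon", "go",
                "roku", "nana", "hachi", "kyu"}
# Romanized resume words; illustrative stand-ins for the exception vocabulary.
EXCEPTION_VOCAB = {"saikai", "sainyuryoku", "mou ichido", "mou ikkai"}

class NextProcessController:
    """Sketch of next process determining means 3 switching between the
    normal recognizer (means 2) and the exception recognizer (means 4)."""

    def __init__(self):
        self.in_exception = False
        self.accepted = []

    def feed(self, word):
        """Process one recognizer output (None = nothing recognized).

        Returns the prompt to present to the user.
        """
        if self.in_exception:
            # Only the exception vocabulary is consulted in this mode.
            if word in EXCEPTION_VOCAB:
                self.in_exception = False
                return "resume input"
            return "say a resume word"
        if word in NORMAL_VOCAB:
            self.accepted.append(word)
            return "next"
        # Recognition error: stop accepting digits until a resume word.
        self.in_exception = True
        return "recognition error; say a resume word"


controller = NextProcessController()
# "yon" fails (None); the user's next digit "go" is not accepted until
# "saikai" is uttered, after which "yon" onward is repeated.
for heard in ["zero", "ichi", "ni", "san", None, "go", "saikai",
              "yon", "go", "roku", "nana", "hachi", "kyu"]:
    controller.feed(heard)
```

After this sequence `controller.accepted` holds all ten digits in order: the "go" uttered while in exception mode is ignored, so nothing is dropped or misplaced.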

[0023] Speech information storage means 6 stores two kinds of speech information: one for the normal case and one for exception processing. FIG. 5 shows this configuration. Recognition means 2 refers to normal information storage means 7, and exception processing recognition means 4 refers to exception information storage means 8. FIG. 5 shows an example in which normal information storage means 7 holds numeric information and exception information storage means 8 holds vocabulary for resuming and correcting input. As described earlier, normal information storage means 7 holds several kinds of information besides numeric information, divided into groups. Exception information storage means 8, by contrast, holds vocabulary for resuming speech input.

[0024] Exception information storage means 8 holds several words that are not used in normal input. If the exception vocabulary resembles the normal input vocabulary, users may be confused. Exception processing, which runs only when recognition fails, has the effect of informing the user that a recognition error has occurred and of making the user aware of the return to normal processing. If the exception vocabulary and the normal vocabulary are similar, it is hard for the user to tell, after a recognition error has occurred and a particular word has been uttered, whether the device has returned to normal processing or is still in exception processing. The two vocabularies must therefore be chosen with care so that they can be told apart.

[0025] The information stored in exception information storage means 8 includes several words that are similar to one another. FIG. 5 shows "saikai" (resume), "sainyuryoku" (re-enter), "mou ichido" (once more), "mou ikkai" (one more time), and so on. Preparing several mutually similar words in this way gives flexibility to the vocabulary used for returning from exception processing. It is also possible to prepare only one or a few specific words, but if there are few and the user forgets them, the device cannot return to normal processing until such a word is uttered, which may be felt as inconvenient. Moreover, adopting only standard-language words for everything the user utters gives the impression that everyday spoken language and the language for controlling equipment are different. By enlarging the vocabulary as far as feels natural, regional differences can be absorbed and equipment usable over a wide area can be supplied.

[0026] Furthermore, if exception information storage means 8 is built from rewritable memory or add-on memory, the standard vocabulary can be held in the main unit's memory and regional vocabulary supplied in add-on memory, so regional variation can be handled without collecting the vocabulary of every region in advance. Having multiple words for the same information is effective for normal information as well as for exception information. For normal information, however, the more things are controlled by speech, the more regional spoken-language entries are expected to be needed, so this is not very practical. For exception information, by contrast, the vocabulary is small and its use is well defined, so it is easy to accommodate. In addition, words such as "machigai" (mistake), "shippai" (failure), and "mou ikkai" (one more time) are ones people tend to say without thinking, so treating only such words as exception information and building speech information that reflects regional usage is highly effective.
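One way to picture the built-in plus add-on arrangement in [0026] is a function that merges a base exception vocabulary with regional words loaded from add-on memory. The function name, the romanized word lists, and the collision check against the normal vocabulary are assumptions for illustration; the regional word shown is only a hypothetical example.

```python
# Built-in (standard-language) resume words, romanized for illustration.
BASE_EXCEPTION_WORDS = {"saikai", "sainyuryoku", "mou ichido", "mou ikkai"}

def build_exception_vocabulary(base, regional=(), normal=()):
    """Combine built-in and add-on exception words.

    An add-on word that also appears in the normal input vocabulary is
    rejected, since the two vocabularies must stay distinguishable.
    """
    vocab = set(base)
    for word in regional:
        if word in normal:
            raise ValueError(f"{word!r} is already a normal input word")
        vocab.add(word)
    return vocab


# "mou ippen" is a hypothetical regional variant supplied by add-on memory.
vocab = build_exception_vocabulary(
    BASE_EXCEPTION_WORDS,
    regional=["mou ippen"],
    normal={"zero", "ichi", "ni"},
)
```

Keeping the base set untouched and layering regional words on top mirrors the idea that the main unit ships with standard vocabulary while regional entries arrive separately.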

[0027] The case of entering several pieces of numeric information in succession by speech is now described. Examples are telephone number entry, PIN entry, and timer programming of various audio equipment. Because there are multiple input elements, the time to the next input must be shortened as far as possible in order to complete them. That is, recognition processing must be overlapped so that recognition of the next word starts as soon as one word has been recognized. On the other hand, high-speed continuous utterance is more susceptible to breathlessness and intonation than uttering words one at a time with pauses. In high-speed continuous speech input, it is therefore considerably more likely than with isolated utterance that speech information adequate for recognition will not be obtained partway through. High-speed continuous speech input does not rule out a short pause along the way, but numeric information memorized by heart, such as a telephone number, is often easier to say in one go.

[0028] As described above, according to this embodiment, when several items are entered by high-speed continuous speech input and a recognition error occurs partway through, normal recognition processing is interrupted at that point and exception processing is executed. This yields an easy-to-use device in which no information is dropped along the way and only the information from the point of the error onward needs to be repeated.

[0029] (Embodiment 2) A second embodiment of the present invention is described below. The components are the same as in the first embodiment, except that next process determining means 3 controls both output means 5 and input means 1.

[0030] Output means 5 is given a speech synthesis function, and speech input is inhibited while output means 5 is outputting spoken guidance. The aim is to eliminate as far as possible the noise component caused by the spoken guidance (which is noise from the standpoint of the recognition input) and to reduce erroneous input caused by the user's misunderstanding.

[0031] When users have become accustomed to speech input and want faster input, the device becomes still easier to use if it has vocabulary for recognizing a command to reduce the amount of synthesized speech (shorten the messages) and a command to stop speech synthesis, together with a function that changes, stops, or resumes speech synthesis accordingly when these words are recognized. The display content can similarly be changed, or the display turned off.

[0032] In this way, by using operation commands for the device itself within the speech recognition device, a device can be configured that provides the necessary information in the form the user wants, according to the purpose of use.

[0033]

[Effects of the Invention] As described above, the present invention has recognition means for recognizing a speech input, next process determining means for determining the next process based on the recognition result, exception processing recognition means that is executed when a recognition error occurs during speech recognition, normal information storage means referred to by the recognition means, and exception information storage means referred to by the exception processing recognition means. When the recognition means returns a recognition error, the next process determining means interrupts normal recognition processing, activates the exception processing recognition means, and has recognition performed with reference to the exception information storage means. This realizes a speech recognition device suited to high-speed continuous speech input, in which the user need not repeat everything from the beginning when a recognition error occurs.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a speech recognition device according to one embodiment of the present invention.

FIG. 2 is a control procedure diagram for explaining the operation of the speech recognition device.

FIG. 3 is a diagram showing the contents of the recognition error information output from the recognition means, which is a component of the present invention.

FIG. 4 is a control procedure diagram for explaining the exception processing operation of the speech recognition device.

FIG. 5 is a configuration diagram of the normal information storage means and the exception information storage means, which are components of the present invention.

FIG. 6 is a configuration diagram of a conventional speech recognition device.

[Explanation of symbols]

1 Input means
2 Recognition means
3 Next process determining means
4 Exception processing recognition means
5 Output means
7 Normal information storage means
8 Exception information storage means

Claims

[Claims]

1. A speech recognition device comprising: input means for accepting a speech input; recognition means for recognizing the speech input accepted by the input means; next process determining means for determining the next process based on the result recognized by the recognition means; exception processing recognition means that is executed when a recognition error occurs during speech recognition; output means for outputting the recognition processing result; normal information storage means referred to by the recognition means; and exception information storage means referred to by the exception processing recognition means.