JPS61175696A

JPS61175696A - Voice recognition responder

Info

Publication number: JPS61175696A
Application number: JP60015437A
Authority: JP
Inventors: 純一田村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1985-01-31
Filing date: 1985-01-31
Publication date: 1986-08-07
Anticipated expiration: 2009-05-02
Also published as: JPH0634188B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field] The present invention relates to a voice recognition response device, and in particular to one that is used by unspecified speakers and must recognize speech reliably.

[Prior Art] Conventionally, perfect speech recognition has not been possible with this type of voice recognition response device. In applications where recognition reliability is particularly required, a voice response unit or the like has therefore been used to output the recognition result as a response and to prompt the speaker to confirm it.

As an example, a balance inquiry in a banking service follows the voice recognition response procedure shown in Table 1.

Table 1

As shown, customer input and bank response alternate, and within this exchange the confirmation replies "yes" and "no" are also recognized. That is, if the confirmation of the recognition result is affirmative, the procedure advances to the next step, where a new input and response take place. If it is negative, the same process is repeated, and the procedure cannot advance to the next step until the input has been recognized correctly.

However, simply increasing the number of recognition attempts does not guarantee a correct result. In practice, an input that cannot be recognized after two or three attempts often remains unrecognized no matter how many more times it is spoken. Voice recognition response devices were developed to free operators from tedious key operations, but in reality not everyone can use them. When recognition fails, the speaker must repeat the voice input until it is recognized correctly, and such devices were therefore thought to demand more time and effort from the speaker rather than less.

[Object] The present invention has been made in view of the above drawbacks of the prior art, and its object is to provide a solution for the case in which speech is not recognized no matter how many times it is input.

A further object is to provide a voice recognition response device that can also cope adequately with speakers whose speech is highly idiosyncratic and is therefore considered difficult for this type of device to recognize.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a voice recognition response device according to an embodiment of the present invention. In the figure, 1 is a microphone for voice input; 2 is an A/D converter that converts the voice signal into digital form; 3 is a feature extraction unit that extracts feature parameters of the input voice signal; 4 is a central processing unit (CPU), consisting of a microprocessor with RAM and ROM, that recognizes the input voice and performs voice response control based on the result; 5 is a speaker through which the device outputs voice; and 6 is an external device (a cash dispenser) that operates using the recognition result of this device.

Inside the CPU 4, the figure shows the functional blocks that are realized by executing programs. 7 is a first recognition unit that recognizes inputs of predetermined kinds of speech, for example "zero" through "kyu" (nine) and "hai" (yes); 8 is a second recognition unit that recognizes only a specific utterance, for example "hai", with high performance and a high recognition rate; 9 is a control unit that advances control based on the recognition results; 10 is a voice response unit that synthesizes and outputs voice response signals; and 11 is a message memory that stores the voice data of the response messages.

FIG. 2 shows the contents of the message memory 11. The message memory 11 stores messages divided into groups: 12 is the storage area for guidance messages that direct the speaker, 13 is the area that stores confirmation messages for the result of input voice recognition, and 14 is the area that stores suggestion messages that the device offers to the speaker when recognition is not going well.
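The three message areas can be pictured as indexed lookup tables. The following is a minimal sketch only: the dictionary layout, the English message wording, the four-digit PIN length, and the name `fetch_message` are assumptions made for illustration, and the source stores voice data rather than text.

```python
# Hypothetical text stand-in for message memory 11 (the source stores voice data).
MESSAGE_MEMORY = {
    # Area 12: guidance messages, indexed by index register i (PIN length of 4 is assumed).
    "guidance": {i: f"Please say digit {i} of your PIN." for i in range(1, 5)},
    # Area 13: confirmation messages, indexed by recognition result j.
    "confirmation": {j: f"Is it {j}?" for j in range(10)},
    # Area 14: suggestion messages, indexed by suggestion counter k.
    "suggestion": {k: f"If it is {k}, please answer yes." for k in range(10)},
}


def fetch_message(area: str, index: int) -> str:
    """Look up one message by area name and index, as the control unit would."""
    return MESSAGE_MEMORY[area][index]
```

For instance, `fetch_message("confirmation", 3)` returns the prompt that corresponds to the "Is it san?" confirmation in the embodiment.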

FIG. 3 is a flowchart explaining the operating procedure of the embodiment. Step S1 initializes the retry counter RC, which counts the number of failed recognitions, to 0, and the index register i for the guidance area to 1. Step S2 connects the switch SW to side 1, so that the predetermined kinds of input speech can be recognized and identified. Step S3 accesses the guidance message using the content of the index register i (initially 1) and outputs it to the speaker 5. That is, the voice message "Please say the first digit of your PIN" is output. Step S4 waits for the speaker's voice input. When voice input arrives, the flow proceeds to step S5, which performs voice recognition and stores the code (number) of the recognition result in register j. Step S6 accesses the confirmation message using the content of register j (for example, 3) and outputs it to the speaker 5. That is, "Is it san (three)?" is output as voice. Step S7 waits for the speaker's reply; when a reply arrives, step S8 recognizes the input voice, and step S9 determines whether or not it is "yes".

If the reply is "yes", the recognition in step S5 has been confirmed as correct. Step S10 sets the retry counter RC to 0, step S11 increments the index register i by 1, and step S12 sends the code of the recognition result j to the external device 6. Step S13 determines whether the index register i has reached its maximum (that is, whether the required number of PIN digits has been entered). If so, processing ends; if not, the flow returns to step S2 and the next guidance message is output.

Next, suppose the flow proceeds from step S3 to step S9 in the same way as before. If the speaker's reply is not "yes", the recognition result j was wrong. For example, if the speaker said "ichi" (one) but the device recognized "hachi" (eight), the voice message "Is it hachi?" at address A-Q (8) is output. Because this is wrong, the speaker inputs "iie" (no). Since the reply is not "yes", the flow advances to step S14, which increments the retry counter RC by 1. Step S15 checks the retry counter RC, and if its content is not 2 the flow returns to step S2. Thus, in the embodiment, voice re-input and confirmation by the same method are repeated only once.
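The input-and-confirm loop with its retry counter (steps S1 to S15) can be sketched as follows. This is an illustrative reading of the flowchart, not the patented implementation: the callables `recognize_digit` and `ask_yes` are hypothetical stand-ins for the first recognition unit and for the prompt-plus-reply exchange, and the English prompt text is assumed.

```python
from typing import Callable, Optional


def confirm_digit(
    recognize_digit: Callable[[], int],
    ask_yes: Callable[[str], bool],
    max_failures: int = 2,
) -> Optional[int]:
    """Return the confirmed digit, or None after max_failures rejections."""
    retry_counter = 0                      # RC, initialized in step S1
    while retry_counter < max_failures:    # step S15 compares RC with 2
        j = recognize_digit()              # steps S3 to S5: prompt, listen, recognize
        if ask_yes(f"Is it {j}?"):         # steps S6 to S9: confirm the result
            return j                       # confirmed; steps S10 to S12 would follow
        retry_counter += 1                 # step S14: count the failed attempt
    return None                            # RC reached 2: change the question format
```

Returning `None` here marks the point at which the embodiment switches to the second recognition unit and begins suggesting digits itself.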

If step S15 determines that the retry counter RC = 2, the flow proceeds to step S16, which switches the switch SW to side 2 so that the second recognition unit is used. The second recognition unit 8 of the embodiment is configured to recognize only "yes", from every type of speaker, with high performance and high reliability. It is in this sense that the first and second recognition units are shown separately in this embodiment.

Step S17 then initializes the content of the suggestion counter k to 1. Step S18 accesses the suggestion message using the content of the suggestion counter k and outputs it as voice to the speaker 5. That is, "If it is ichi (one), please answer yes" is output. Step S19 waits for the speaker's reply. Step S20 recognizes the voice input of the reply. Step S21 checks whether the recognition result is "yes". If it is, the digit the speaker wanted to input equals the content of the suggestion counter k, so step S24 transfers the content of k to the index register i and the flow proceeds to step S10, in order to enter the next digit of the PIN.

If the reply in step S21 is not "yes", the suggested number is not the one the speaker intended. The flow proceeds to step S22, which increments the suggestion counter k by 1, and step S23 determines whether the suggestion counter k has reached its maximum. If not, the flow returns to step S18 and suggests the next number; if so, it returns to step S17 and starts again from 1.
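The suggestion loop (steps S17 to S23) can be sketched as a cycle over candidate digits that stops at the first "yes". This is a hedged illustration: the `candidates` list, the `ask_yes` callable, and the `max_prompts` cap are assumptions added so that the example terminates, whereas the flowchart as described keeps cycling until the speaker answers "yes".

```python
from itertools import cycle, islice
from typing import Callable, Sequence


def suggest_digit(
    candidates: Sequence[int],
    ask_yes: Callable[[str], bool],
    max_prompts: int = 100,
) -> int:
    """Offer each candidate in turn, wrapping around, until one is accepted."""
    # cycle() restarts from the first candidate after the last one,
    # mirroring the return from step S23 to step S17.
    for k in islice(cycle(candidates), max_prompts):
        if ask_yes(f"If it is {k}, please answer yes."):  # steps S18 to S21
            return k                                       # speaker accepted k
    raise RuntimeError("no suggestion was accepted")
```

Because the speaker only ever has to produce the single word "yes", this loop depends on the second recognition unit alone, which is why that unit can be tuned for one utterance.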

In the above embodiment, if the first recognition unit can recognize "yes" or "no" with high performance, there is no need to provide a separate second recognition unit.

Also, if it is difficult for the second recognition unit to recognize "yes" or "no" with high performance, the confirmation may instead be made by a simple and reliable method, such as merely detecting the presence or absence of voice (or sound), rather than recognizing "yes" or "no".
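One simple way to detect the presence or absence of sound is to compare the mean-square energy of the sampled signal with a threshold. The source does not specify a detection method, so the energy measure, the threshold value, and the name `sound_present` in this sketch are all assumptions.

```python
from typing import Sequence


def sound_present(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Return True if the mean-square energy of the samples reaches the threshold."""
    if not samples:
        return False  # nothing was captured, so treat it as silence
    energy = sum(s * s for s in samples) / len(samples)
    return energy >= threshold
```

With such a detector, any sound made during the reply window counts as "yes" and silence counts as "no", so the confirmation no longer depends on word recognition at all.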

In the above embodiment, the control unit counts the number of errors when the voice input by the speaker is misrecognized, and changes the question-and-response format when the count exceeds a certain value (two in the embodiment). In that arrangement, the recognition result returned for the speaker's input is the word with the highest score (similarity), which is output to the speaker as the first candidate for confirmation. Operation can be made more efficient still if, when the first candidate is output as the recognition result and the recognized confirmation reply shows it to be wrong, the device does not ask the speaker to input the digit by voice again but immediately makes suggestion responses using the second and third candidates held in the device.
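The candidate-ordered variation can be sketched by sorting the recognizer's similarity scores and offering the runners-up as suggestions. The score dictionary, the function names, and the prompt wording here are hypothetical; the source states only that the second and third candidates are used in order.

```python
from typing import Callable, Dict, List, Optional


def rank_candidates(scores: Dict[int, float]) -> List[int]:
    """Order candidate digits from highest to lowest similarity score."""
    return sorted(scores, key=scores.get, reverse=True)


def confirm_by_rank(
    scores: Dict[int, float], ask_yes: Callable[[str], bool]
) -> Optional[int]:
    """Confirm the top candidate, then suggest the remaining ones in score order."""
    for position, digit in enumerate(rank_candidates(scores)):
        if position == 0:
            prompt = f"Is it {digit}?"                        # first candidate
        else:
            prompt = f"If it is {digit}, please answer yes."  # later candidates
        if ask_yes(prompt):
            return digit
    return None  # every candidate was rejected
```

If "ichi" scored just below "hachi", the speaker in the earlier example would reach the intended digit after a single "no" instead of repeating the input.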

[Effect] As described above, the present invention adds control that counts the number of misrecognitions and, when that number exceeds a certain value, switches from the process of voice input, recognition, response, and confirmation to the process of suggestion output and response recognition. This makes it possible to provide a voice recognition response device capable of efficient and accurate recognition.

Furthermore, by judging the input to the second recognition unit simply by whether or not voice or sound has been input, it is possible to provide a voice recognition response device that anyone can operate more reliably. In addition, applying the present invention to a voice typewriter would make a fast and highly reliable typewriter possible.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a voice recognition response device according to an embodiment of the present invention, FIG. 2 is a diagram showing the contents of the message memory, and FIG. 3 is a flowchart explaining the operating procedure of the embodiment.

Here, 1 ... microphone, 2 ... feature extraction unit, 3 ... A/D converter, 4 ... central processing unit (CPU), 5 ... speaker, 6 ... external device, 7 ... first recognition unit, 8 ... second recognition unit, 9 ... control unit, 10 ... voice response unit, 11 ... message memory.

Claims

[Claims]

(1) A voice recognition response device comprising: recognition means for recognizing voice input; response means for outputting the recognition result of the recognition means; confirmation means for determining whether the recognition result is correct by evaluating the response input to the output of the response means; word suggestion means for outputting, as voice, a word selected by a predetermined method when the confirmation means detects an error in the recognition result; and word identification means for identifying the word intended to be input by evaluating the response input to the output of the word suggestion means.

(2) The voice recognition response device according to claim 1, wherein the word suggestion means sequentially selects words within a predetermined group.

(3) The voice recognition response device according to claim 1, wherein the word suggestion means selects words in order of similarity to the recognition result.