JPH04152397A

JPH04152397A - Voice recognizing device

Info

Publication number: JPH04152397A
Application number: JP2278394A
Authority: JP
Inventors: Shinichi Tsurufuji; 鶴藤　真一; Hiroki Onishi; 宏樹大西
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1990-10-16
Filing date: 1990-10-16
Publication date: 1992-05-26
Anticipated expiration: 2013-12-14
Also published as: JP2834880B2

Abstract

PURPOSE: To obtain correct recognition results while preventing erroneous recognition, by setting a strict threshold in advance and relaxing the threshold by a prescribed amount when input voice rejected at the strict threshold is input again.

CONSTITUTION: The device is provided with a deciding part 26 that outputs a signal corresponding to the selected standard pattern when the maximum similarity of the standard pattern selected by a similarity calculating part 24 is larger than a preset threshold, and decides a recognition rejection when the maximum similarity is smaller than the threshold. When the deciding part 26 decides a rejection, the signal corresponding to the standard pattern selected at that time is saved as a recognition rejection result, and when the same recognition rejection result as for the previous voice input is obtained, the threshold set in the deciding part 26 is relaxed. By deciding with a more lenient threshold than for the previous voice input, a correct recognition result is obtained while erroneous recognition is prevented.

Description

[Detailed description of the invention]

(a) Field of industrial application

The present invention relates to a speech recognition device for controlling various kinds of equipment by voice.

(b) Prior art

In recent years, speech recognition devices have been actively researched and developed, and their practical use is awaited. Such a device generally processes speech patterns, each consisting of parameters that represent the features obtained by analyzing speech. Speech patterns stored in advance for a plurality of utterances (standard speech patterns) are each compared with the unknown speech pattern by pattern matching, the standard pattern with the smallest error (that is, the highest similarity) is found, and a signal corresponding to that standard pattern is output as the recognition result.

In this method, even when the standard speech most similar to the input speech has been found, a very low similarity means that a misrecognition is likely. To prevent this, recognition is generally rejected (hereinafter "reject") unless the maximum similarity exceeds a fixed threshold.

(c) Problems to be solved by the invention

A conventional speech recognition device with such a reject function uses a fixed threshold. If the threshold is too strict, recognition results frequently fail to be obtained even though a word in the recognition vocabulary was spoken, because of subtle ambiguity in the utterance. Conversely, if the threshold is made lenient, even noise other than speech is misrecognized as the closest word in the vocabulary. How to set the threshold has therefore been an important problem for speech recognition devices.
The present invention has been made to solve this conventional problem, and provides a speech recognition device that can set the threshold dynamically according to the rejection situation.

(d) Means for solving the problems

The speech recognition device of the present invention comprises: a microphone for inputting speech; a speech analysis unit that analyzes the speech input from the microphone; a pattern creation unit that converts the analysis result of the speech analysis unit into a speech pattern; a standard pattern memory that stores a plurality of speech patterns in advance as standard speech patterns; a similarity calculation unit that compares each of the standard patterns in the standard pattern memory with the speech pattern of the input speech created by the pattern creation unit, calculates their similarities, and selects the standard pattern with the highest similarity; and a determination unit that outputs a signal corresponding to the selected standard pattern as the recognition result when the maximum similarity of the standard pattern selected by the similarity calculation unit is greater than a preset threshold, and judges a recognition rejection when the maximum similarity is smaller than the threshold.

The device is characterized in that, when the determination unit judges a recognition rejection, the signal corresponding to the standard pattern selected at that time is stored as a recognition rejection result, and when the same recognition rejection result as that of the previous, rejected speech input is obtained, the threshold set in the determination unit is made more lenient (smaller). The device is further characterized in that, when the signal indicating the selected standard pattern obtained from the similarity calculation unit matches the recognition rejection result of the previous, rejected speech input, the determination unit judges the current speech input using a more lenient (smaller) threshold than for the previous speech input.

(e) Operation

In the speech recognition device of the present invention, the threshold is set strictly in advance to prevent misrecognition, and when input speech that was rejected at this threshold is input again, the threshold is relaxed by a predetermined amount so that a correct recognition result can be obtained.

(f) Embodiments

FIG. 2 shows the configuration of the speech recognition device of the present invention, and FIG. 1 shows the configuration of one embodiment of its main part, the determination unit.

In FIG. 2, (21) is a microphone for inputting speech; (22) is a speech analysis unit that analyzes the speech input from the microphone; (23) is a pattern creation unit that converts the analysis result of the speech analysis unit into a speech pattern; (25) is a standard pattern memory that stores a plurality of speech patterns in advance as standard speech patterns; (24) is a similarity calculation unit that compares each of the standard patterns in the standard pattern memory with the speech pattern of the input speech created by the pattern creation unit, calculates their similarities, and selects the standard pattern with the highest similarity; and (26) is a determination unit that outputs a signal corresponding to the selected standard pattern as the recognition result when the maximum similarity of the standard pattern selected by the similarity calculation unit is greater than a preset threshold, and judges a recognition rejection when the maximum similarity is smaller than the threshold.

In the speech recognition device of FIG. 2, the feature of the present invention lies in the determination unit (26). As shown in FIG. 1, it consists of a similarity memory (1), a recognition result memory (2), a first threshold table (3), a first reject determination unit (4), an output unit (5), a reject count memory (6), a reject count determination unit (7), a same-reject determination unit (8), a reject number memory (9), a second threshold table (10), and a second reject determination unit (11).

The operation of the device when the configuration of FIG. 1 is used as the determination unit (26) of FIG. 2 is explained below. To simplify the explanation, it is assumed that a plurality of standard patterns are already stored in the standard pattern memory. For example, the memory area of the standard pattern memory (25) associated with number [0] holds the pattern of the spoken word "zero", the area associated with number [1] holds the pattern of "ichi" (one), the area associated with number [2] holds the pattern of "ni" (two), and so on in order up to number [9], which holds the pattern of "kyuu" (nine). A speech recognition device whose standard pattern memory (25) holds spoken-digit patterns in this way naturally operates to recognize spoken digits.

Suppose that the user says "ichi" into the microphone (21). The speech input from the microphone is analyzed by the speech analysis unit (22) and then converted into a pattern by the pattern creation unit (23). The similarity calculation unit (24) calculates the similarity between each standard pattern in the standard pattern memory (25) and the speech pattern created by the pattern creation unit (23), and passes the number of the standard pattern with the highest similarity, together with that similarity, to the determination unit (26). As shown in FIG. 1, the standard pattern number passed to the determination unit (26) is transferred to the recognition result memory (2), and at the same time the maximum similarity is transferred to the similarity memory (1).

For example, if the standard pattern with the highest similarity is number [1] and its similarity is 170, [1] is passed to the recognition result memory (2) and 170 to the similarity memory (1). The first reject determination unit (4) then receives the similarity 170 from the similarity memory (1) and the first threshold Vth1 (set to 150, for example) from the first threshold table (3), and compares them. In this numerical example the similarity is the larger value (the input is similar), so the first reject determination unit (4) outputs the contents of the recognition result memory (2), that is, number [1], to the output unit (5) and clears the reject count memory.

The explanation continues with reference to FIG. 6. Suppose now that, for an input of the spoken word "ichi", the standard pattern with the highest similarity is number [1] and its similarity is 130, the value labeled イ in FIG. 6(a). [1] is stored in the recognition result memory (2) and 130 in the similarity memory (1), and the first reject determination unit (4) compares the first threshold Vth1 (= 150) from the first threshold table (3) with the similarity 130 from the similarity memory (1), as illustrated. Because the similarity is the smaller value (recognition rejection), the first reject determination unit (4) judges a reject and notifies the reject count determination unit (7). The reject count determination unit (7) reads the reject count from the reject count memory (6). If this is the first rejection, the count is 0, so the unit judges a reject, notifies the output unit (5), adds 1 to the count in the reject count memory (6), and stores the contents of the recognition result memory (2) in the reject number memory (9).
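The first-stage decision described above (select the most similar standard pattern, then compare its similarity with the first threshold) can be sketched in Python as follows. This is a minimal illustration, not the patented circuit: the function names and the dictionary of precomputed similarities are hypothetical, the threshold 150 and the similarities 170 and 130 are the values from the worked example, and the other scores are invented for illustration. The text does not say what happens when the similarity equals the threshold, so treating equality as a reject is an assumption.

```python
from typing import Dict, Optional, Tuple

VTH1 = 150  # first threshold Vth1 from the worked example


def select_best(similarities: Dict[int, int]) -> Tuple[int, int]:
    """Return (pattern number, similarity) of the most similar standard pattern.

    `similarities` maps each standard pattern number to a precomputed
    similarity; how that similarity is computed is outside this sketch.
    """
    number = max(similarities, key=lambda n: similarities[n])
    return number, similarities[number]


def first_stage(similarities: Dict[int, int], vth1: int = VTH1) -> Optional[int]:
    """Return the recognized pattern number, or None for a reject.

    Assumption: a similarity exactly equal to the threshold is rejected.
    """
    number, best = select_best(similarities)
    return number if best > vth1 else None


# Similarity 170 for pattern [1] exceeds Vth1 = 150, so [1] is output.
accepted = first_stage({0: 95, 1: 170, 2: 120})
# Similarity 130 for pattern [1] is below Vth1 = 150, so the input is rejected.
rejected = first_stage({0: 95, 1: 130, 2: 120})
```

Returning `None` for a reject keeps the sketch short; the bookkeeping that follows a reject (reject count and reject number) is deliberately left out here.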

The state reached after this first rejection, with a reject count of 1 and reject number [1] stored, is referred to below as State 1.

Next, suppose that in State 1 the spoken word "ichi" is input again from the microphone (21) and that its similarity is 120, the value labeled ロ in FIG. 6(a). In this case too, the first reject determination unit (4) reads the first threshold Vth1 = 150 from the first threshold table (3) and the similarity 120 from the similarity memory (1), and compares them. As the figure shows, the similarity in the similarity memory (1) is the smaller value, so the first reject determination unit (4) judges a reject and notifies the reject count determination unit (7). The reject count determination unit (7) reads the reject count from the reject count memory (6). Because one rejection has already occurred and the count is 1, the unit does not immediately judge a reject but notifies the same-reject determination unit (8). The same-reject determination unit (8) reads reject number [1] from the reject number memory (9) and recognition result [1] from the recognition result memory (2), and checks whether they are the same. In this case the two numbers match, so it notifies the second reject determination unit (11).

The second reject determination unit (11) receives the reject count [1] from the reject count memory (6) and, from the second threshold table (10), the second threshold Vth2 = 110 that corresponds to that reject count. It then compares the similarity 120 with the second threshold Vth2 = 110. As illustrated, the similarity in the similarity memory (1) is the larger value, so the second reject determination unit (11) outputs the contents of the recognition result memory (2) to the output unit (5) and clears the reject count memory.

If, on the other hand, the input is rejected in this state as well, as shown in FIG. 6(b), 1 is added to the contents of the reject count memory (6), which then holds [2]. This state is referred to as State 2. In the case of FIG. 6(b), when "ichi" is input in State 2 with the similarity 93 labeled ハ, the similarity is smaller than Vth1 and is therefore compared with Vth2, and this Vth2 is set smaller according to the number of rejections so far. In the figure, Vth2 is 80, so the similarity 93 exceeds Vth2 and the recognition result is output. Outputting the recognition result resets the reject count to 0.

Now consider the case in which speech has been input once and rejected with result 1, giving State 1, and speech is then input again but the highest similarity is obtained for number [2], the spoken word "ni". The number [2] of the speech pattern most similar to the new input is passed to the recognition result memory (2), and the similarity 120 is stored in the similarity memory (1). The first reject determination unit (4) reads the threshold 150 from the first threshold table (3) and the similarity 120 from the similarity memory (1), and compares them. Because the similarity is the smaller value, the first reject determination unit (4) judges a reject and notifies the reject count determination unit (7). The reject count determination unit (7) reads the reject count from the reject count memory (6); because the count is 1, it does not judge a reject itself but notifies the same-reject determination unit (8). The same-reject determination unit (8) reads reject number [1] from the reject number memory (9) and recognition result [2] from the recognition result memory (2), and checks whether they are the same. Here [1] and [2] differ, so the same-reject determination unit (8) notifies the output unit (5) of the reject and stores the value 2 from the recognition result memory (2) in the reject number memory (9). The same-reject determination unit (8) also sets the count in the reject count memory to 1.

With the device described above, even if an input is rejected once, the threshold is relaxed and the judgment is made again when the user repeats the same word and it is rejected with the same result. A word in the recognition vocabulary that has been rejected therefore becomes easier to recognize when it is repeated. When something outside the recognition vocabulary is input, the threshold is not relaxed unless the input is rejected with the same recognition result, so malfunctions caused by sudden noises or conversational speech can be reduced.

FIG. 3 shows the configuration of another embodiment of the determination unit of the speech recognition device of the present invention. This configuration adds to FIG. 1 a timer (12) that clears the reject count memory after a fixed time. Its operation is described for State 1, as above.
When the reject count determination unit (7) or the second reject determination unit (11) judges a reject, that unit sets a predetermined time (for example, 10 seconds) in the timer (12). The timer (12) starts counting once the predetermined time has been set, and when the count exceeds the predetermined time, the contents of the reject count memory (6) are cleared. Once a considerable time has passed since the previous speech input, the user is unlikely to be repeating an utterance and the input is probably new speech, so there is no need to relax the threshold.

FIG. 4 shows the configuration of yet another embodiment of the determination unit of the device of the present invention. This configuration adds to FIG. 1 a maximum rejection count determination unit (13) that clears the reject count when the number of rejections in the reject count memory exceeds a predetermined number. In State 2, as described above, when the second reject determination unit (11) judges a reject, the maximum rejection count determination unit (13) is notified, reads the contents of the reject count memory (6), and clears the reject count memory (6) if the contents equal a predetermined value (for example, 4). When the number of rejections becomes very large, the input is probably noise, so there is no need to relax the threshold.

FIG. 5 shows the configuration of a further embodiment of the determination unit of the device of the present invention. This configuration adds a similarity determination unit (14) that clears the reject count memory when the value in the similarity memory of FIG. 4 is smaller than the value in a third threshold table (15). In State 1, as described above, when the reject count determination unit (7) or the second reject determination unit (11) judges a reject, the similarity determination unit (14) reads the similarity from the similarity memory (1). If it is smaller than the value in the third threshold table (15) (for example, 50), meaning that the input speech pattern is far from the standard pattern, the unit clears the contents of the reject count memory (6). When the similarity is extremely small, the input is probably noise, so there is no need to relax the threshold.

In the above description, when the signal indicated by the selected standard pattern obtained from the similarity calculation unit matches the recognition rejection result of the previous, rejected speech input, the similarity is first compared with Vth1 and then with Vth2. The initial comparison with Vth1 may, however, be omitted and the similarity compared with Vth2 immediately.

(g) Effects of the invention

According to the speech recognition device of the present invention, the threshold is set strictly in advance to prevent misrecognition, and when input speech that was rejected at this threshold is input again, the threshold can be relaxed by a predetermined amount. Rejections are thereby suppressed and correct recognition results can be obtained.
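The determination logic of FIG. 1, together with the optional variants of FIGS. 3 to 5, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation. The class and attribute names are hypothetical. The thresholds (Vth1 = 150, Vth2 = 110 after one rejection and 80 after two) come from the FIG. 6 example. Reusing the last table entry for higher reject counts, treating a similarity equal to a threshold as a reject, and passing the current time in as an argument are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DecisionUnit:
    """Sketch of determination unit (26); names and defaults are illustrative."""
    vth1: int = 150                       # first threshold table (3)
    vth2_table: Dict[int, int] = field(   # second threshold table (10)
        default_factory=lambda: {1: 110, 2: 80})
    timeout_s: Optional[float] = None     # FIG. 3 variant, e.g. 10.0
    max_rejects: Optional[int] = None     # FIG. 4 variant, e.g. 4
    noise_floor: Optional[int] = None     # FIG. 5 variant (third threshold), e.g. 50
    reject_count: int = 0                 # reject count memory (6)
    reject_number: Optional[int] = None   # reject number memory (9)
    last_reject_time: Optional[float] = None

    def _clear(self) -> None:
        self.reject_count = 0
        self.reject_number = None

    def _vth2(self) -> int:
        # Assumption: counts beyond the table reuse its last entry.
        return self.vth2_table[min(self.reject_count, max(self.vth2_table))]

    def decide(self, number: int, similarity: int, now: float = 0.0) -> Optional[int]:
        """Return the recognized number, or None for a reject."""
        # FIG. 3: forget earlier rejections once the timer has expired.
        if (self.timeout_s is not None and self.last_reject_time is not None
                and now - self.last_reject_time > self.timeout_s):
            self._clear()

        if similarity > self.vth1:            # first reject determination (4)
            self._clear()
            return number

        if self.reject_count > 0 and number == self.reject_number:
            if similarity > self._vth2():     # second reject determination (11)
                self._clear()
                return number
            self.reject_count += 1            # same candidate rejected again
            if self.max_rejects is not None and self.reject_count >= self.max_rejects:
                self._clear()                 # FIG. 4: probably noise
        else:
            self.reject_count = 1             # first reject, or a different candidate
            self.reject_number = number

        if self.noise_floor is not None and similarity < self.noise_floor:
            self._clear()                     # FIG. 5: far from every standard pattern
        self.last_reject_time = now
        return None


# FIG. 6(a): "ichi" is rejected at 130, then accepted at 120 against Vth2 = 110.
unit = DecisionUnit()
first = unit.decide(1, 130)    # None (reject, State 1)
second = unit.decide(1, 120)   # 1 (accepted with the relaxed threshold)
```

Keeping the three variants as optional fields lets the same sketch cover FIG. 1 alone (all three left as `None`) or any combination of the variants; whether the patent intends them to be combined is not stated.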

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the determination unit of one embodiment of the present invention; FIG. 2 is a block diagram of the speech recognition device of the present invention; FIGS. 3 to 5 are block diagrams each showing a different embodiment of the determination unit of the device of the present invention; and FIG. 6 is a schematic diagram showing the operation of the present invention.

(1) ... similarity memory, (2) ... recognition result memory, (3) ... first threshold table, (4) ... first reject determination unit, (5) ... output unit, (6) ... reject count memory, (7) ... reject count determination unit, (8) ... same-reject determination unit, (9) ... reject number memory, (10) ... second threshold table, (11) ... second reject determination unit.

Claims

[Claims]

(1) A speech recognition device comprising: a microphone for inputting speech; a speech analysis unit that analyzes the speech input from the microphone; a pattern creation unit that converts the analysis result of the speech analysis unit into a speech pattern; a standard pattern memory that stores a plurality of speech patterns in advance as standard speech patterns; a similarity calculation unit that compares each of the standard patterns in the standard pattern memory with the speech pattern of the input speech created by the pattern creation unit, calculates their similarities, and selects the standard pattern with the highest similarity; and a determination unit that outputs a signal corresponding to the selected standard pattern as a recognition result when the maximum similarity of the standard pattern selected by the similarity calculation unit is greater than a preset threshold, and judges a recognition rejection when the maximum similarity is smaller than the threshold, characterized in that, when the determination unit judges a recognition rejection, the signal corresponding to the standard pattern selected at that time is stored as a recognition rejection result, and when the same recognition rejection result as that of the previous, rejected speech input is obtained, the threshold set in the determination unit is set smaller.

(2) The speech recognition device according to claim 1, characterized in that, when the same recognition rejection result is obtained consecutively for consecutive speech inputs, the threshold set in the determination unit is set smaller again, and the current speech input is judged again using the most recently set threshold.

(3) The speech recognition device according to claim 3, further comprising storage means for storing the number of times the same recognition rejection result has been obtained for consecutive speech inputs, characterized in that the threshold set in the determination unit is set smaller according to the number of recognition rejections stored in the storage means.

(4) The speech recognition device according to claim 3, further comprising a timer that starts counting after a recognition rejection result is obtained, characterized in that the number of recognition rejections is cleared when the timer has counted a predetermined time.

(5) The speech recognition device according to claim 3, characterized in that the stored contents of the storage means are cleared when the number of recognition rejections stored in the storage means exceeds a predetermined number.

(6) The speech recognition device according to claim 3, characterized in that the stored contents of the storage means are cleared when the similarity obtained from the similarity calculation unit is smaller than a third threshold that is smaller than the second threshold.

(7) A speech recognition device comprising: a microphone for inputting speech; a speech analysis unit that analyzes the speech input from the microphone; a pattern creation unit that converts the analysis result of the speech analysis unit into a speech pattern; a standard pattern memory that stores a plurality of speech patterns in advance as standard speech patterns; a similarity calculation unit that compares each of the standard patterns in the standard pattern memory with the speech pattern of the input speech created by the pattern creation unit, calculates their similarities, and selects the standard pattern with the highest similarity; and a determination unit that outputs a signal corresponding to the selected standard pattern as a recognition result when the maximum similarity of the standard pattern selected by the similarity calculation unit is greater than a preset threshold, and judges a recognition rejection when the maximum similarity is smaller than the threshold, characterized in that, when the signal indicating the selected standard pattern obtained from the similarity calculation unit matches the recognition rejection result of the previous, rejected speech input, the determination unit judges the current speech input using a threshold smaller than that used for the previous speech input.