JP2975772B2

JP2975772B2 - Voice recognition device

Info

Publication number: JP2975772B2
Application number: JP4173114A
Authority: JP
Inventors: 真一鶴藤
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1992-06-30
Filing date: 1992-06-30
Publication date: 1999-11-10
Anticipated expiration: 2014-11-10
Also published as: JPH0619491A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device that controls various kinds of equipment by voice.

【０００２】[0002]

[Prior Art] In recent years, voice recognition devices capable of recognizing speech have been actively researched and developed, and practical versions of such devices are awaited.

[0003] A device of this type generally processes a voice pattern, such as the one shown in FIG. 7, made up of parameters that represent the features of the voice obtained by analyzing it. Each of the voice patterns stored in advance for a plurality of voices (voice standard patterns) is compared with the unknown voice pattern by pattern matching, the voice standard pattern with the smallest error (that is, the highest similarity) is found, and the signal corresponding to that standard pattern is output as the recognition result.
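The matching step described above can be sketched in Python as follows. This is a minimal illustration, assuming all patterns have the same size (no time alignment) and using a hypothetical similarity measure, since the text does not define one at this point; `similarity`, `best_match`, and the toy patterns are illustrative names and data, not taken from the patent.

```python
import math

def similarity(a, b):
    """Hypothetical similarity: 100 / (1 + Euclidean distance). Not the patent's measure."""
    dist = math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))
    return 100.0 / (1.0 + dist)

def best_match(unknown, standards):
    """Return (label, similarity) of the stored standard pattern closest to `unknown`."""
    scores = {label: similarity(unknown, pattern)
              for label, pattern in standards.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]

# Toy 2x2 patterns (rows could stand for time frames, columns for frequency bands).
standards = {
    "zero": [[1.0, 0.0], [0.0, 1.0]],
    "ichi": [[0.0, 1.0], [1.0, 0.0]],
}
unknown = [[0.9, 0.1], [0.1, 0.9]]
label, score = best_match(unknown, standards)
```

Here `label` is the standard pattern whose signal would be output as the recognition result.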

[0004] In such a voice recognition device, the user first performs a registration operation that stores information on the voices to be recognized in a memory, and the actual recognition processing is performed after registration is complete. If a voice was not registered correctly, that is, if the voice information stored in the memory was wrong, misrecognition resulted. How to store correct voice information (voice standard patterns) in the memory is therefore a major issue in improving recognition performance. Conventionally, voice standard patterns were made more accurate either by always uttering the same word three or more times at registration and creating the voice standard pattern from the two most similar patterns (described in detail in Japanese Examined Patent Publication No. Hei 1-36639), or by checking a registered voice pattern in a test mode or the like. In either case the voice had to be uttered at least twice to register a voice pattern, which complicated registration.

[0005] As another method, correcting the voice standard pattern using the recognition result has also been attempted. In this method, the recognition result is output, the user tells the voice recognition device through a switch or the like that the result is correct, and this information is used to create an average pattern by averaging the voice standard pattern and the input voice pattern; the voice standard pattern is then replaced with this average pattern. However, this method requires the user to tell the voice recognition device whether each recognition result is correct, which is not practical.

[0006] Another method that uses the recognition result outputs the result and corrects the voice standard pattern without judging whether the result is correct. In this case the voice standard pattern is corrected even after a misrecognition, so it may instead be changed into an incorrect voice standard pattern.

【０００７】[0007]

[Problem to Be Solved by the Invention] The present invention solves the above problems and provides a voice recognition device that simplifies the registration operation and corrects voice standard patterns efficiently.

【０００８】[0008]

[Means for Solving the Problem] The device comprises: a voice analysis unit that analyzes voice input from a microphone; a pattern creation unit that creates a voice pattern based on the result of the analysis by the voice analysis unit; a standard pattern memory in which a plurality of voice standard patterns are stored in advance; a similarity calculation unit that calculates the similarity between the voice pattern created by the pattern creation unit and each voice standard pattern stored in the standard pattern memory, and outputs the most similar voice standard pattern and its similarity; a determination unit that outputs the signal corresponding to the voice standard pattern as the recognition result when the similarity obtained from the similarity calculation unit is greater than a preset threshold, and determines recognition rejection when the similarity is smaller than the threshold; a first input pattern buffer memory that stores the voice pattern determined to be rejected by the determination unit; a second input pattern buffer memory that, when the determination unit has determined rejection, stores the voice pattern obtained from a repeated voice input if the voice standard pattern most similar to that voice pattern is the same as the voice standard pattern most similar to the voice pattern in the first input pattern buffer memory; and a pattern correction unit that, based on the voice pattern in the first or second input pattern buffer memory, corrects the voice standard pattern in the standard pattern memory that is most similar to that voice pattern.

【０００９】[0009]

[Operation] According to the voice recognition device of the present invention, even if a voice standard pattern contains an error introduced during voice registration, the erroneous voice standard pattern can be corrected during voice recognition processing.

【００１０】[0010]

[Embodiments] FIG. 1 shows the configuration of the voice recognition device of the present invention, and FIG. 2 shows the configuration of one embodiment of its main part, the pattern correction unit.

[0011] The voice recognition device of FIG. 1 consists of: a voice analysis unit 2 that analyzes voice input from a microphone 1; a pattern creation unit 3 that detects the voice section from the feature parameters produced by the voice analysis unit and converts it into a voice pattern; a voice standard pattern memory 4 in which voice standard patterns are stored in advance; a similarity calculation unit 5 that matches the unknown voice against the voice standard patterns stored in the voice standard pattern memory and calculates a similarity for each voice standard pattern; a determination unit 6 that stores the similarity calculated for each voice standard pattern, selects the most similar voice standard pattern, compares it with a decision criterion (hereinafter, the threshold), and determines whether the recognition result is valid; and a pattern correction unit 7 that corrects the voice standard pattern based on the result from the determination unit.

[0012] In the voice recognition device of FIG. 1, the feature of the present invention lies in the pattern correction unit 7. As shown in FIG. 2, it comprises: a first voice buffer 71 that stores the voice standard pattern to be corrected; a second voice buffer 72 that stores the voice pattern of the input voice; a third voice buffer 73 that stores the voice pattern of the re-entered voice; a pattern similarity calculation unit 74 that calculates the similarities between the voice patterns in the voice buffers; a rejection number storage unit 76 that stores the number of the voice standard pattern transferred to the first voice buffer 71; and a pattern correction averaging unit 77 that averages voice patterns for correction, based on the voice patterns in the first voice buffer 71, the second voice buffer 72, or the third voice buffer 73.

[0013] The operation of the device of the present invention when the configuration of FIG. 2 is used for the pattern correction unit 7 of FIG. 1 is described below.

[0014] [Embodiment 1] The user first registers voices. The device is set to registration mode with a registration switch (not shown), and the words to be registered are uttered one after another toward the microphone 1. For example, the user says "zero". The voice input from the microphone 1 is converted into an electric signal, from which the voice analysis unit 2 extracts feature parameters. For example, a general frequency analysis such as that shown in FIG. 7 is performed with band-pass filters or the like. The feature parameters are passed to the pattern creation unit 3, which detects the voice section and converts it into a voice pattern. The voice pattern produced by the pattern creation unit is stored as a voice standard pattern in a predetermined area of the voice standard pattern memory 4. "Ichi" (one), "ni" (two), ... "kyu" (nine) are then stored in the voice standard pattern memory 4 in turn until all words are registered.

[0015] Next, actual voice recognition is described, taking the case where the operator says "zero" toward the microphone 1. The voice input from the microphone 1 undergoes the same processing as in registration mode, and the pattern creation unit 3 creates a voice pattern. In recognition mode this voice pattern is passed to the similarity calculation unit 5. The similarity calculation unit 5 calculates the similarity between the voice pattern created by the pattern creation unit 3 and each voice standard pattern stored in the voice standard pattern memory 4, and passes the similarities to the determination unit 6. For the input voice "zero", for example, the similarities shown in FIG. 5 are passed. The determination unit 6 then determines the voice standard pattern giving the maximum similarity and that similarity. For the input "zero", as shown in FIG. 5, the voice standard pattern giving the maximum similarity is "zero", with a similarity of 70. The determination unit 6 compares this similarity with the preset threshold to decide whether the input voice is valid. Here the threshold is 80, so the input is determined to be rejected, and the determination unit 6 notifies the pattern correction unit 7 accordingly. [This state is referred to as <State 1>.] If the maximum similarity had been 90, the input would have been judged as recognized, and the determination unit 6 would output the signal corresponding to the voice standard pattern. In that case the pattern correction unit 7 does not correct the voice standard pattern.
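The accept/reject decision made by the determination unit 6 can be sketched as below. The threshold of 80 and the similarities 70 and 90 come from the text; the other similarity values, the name `judge`, and the treatment of a similarity exactly equal to the threshold (rejected here) are assumptions for illustration.

```python
THRESHOLD = 80  # threshold value used in the embodiment

def judge(similarities, threshold=THRESHOLD):
    """Pick the best-matching standard pattern and accept or reject it.

    `similarities` maps each standard pattern name to its similarity.
    A similarity equal to the threshold is treated as rejected (an assumption;
    the text only covers the greater-than and smaller-than cases).
    """
    best = max(similarities, key=similarities.get)
    score = similarities[best]
    status = "recognized" if score > threshold else "rejected"
    return status, best, score

# "zero" scores 70 as in the text; the other values are made up.
first_try = judge({"zero": 70, "ichi": 35, "ni": 20})
# The alternative case from the text, where the maximum similarity is 90.
clear_try = judge({"zero": 90, "ichi": 35, "ni": 20})
```

Only the `"rejected"` outcome is passed on to the pattern correction unit 7; a recognized input leaves the voice standard pattern untouched.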

[0016] Next, the processing of the pattern correction unit 7 is described. On receiving the rejection signal from the determination unit, the pattern correction unit 7 reads the maximum similarity, stores the number of the voice standard pattern giving the maximum similarity in the rejection number storage unit 76, reads that voice standard pattern from the voice standard pattern memory 4 into the first voice buffer 71, and reads the voice pattern of the input voice from the pattern creation unit 3 into the second voice buffer 72.

[0017] When an input voice is rejected, the user normally says the same word again. Suppose here that "zero" is uttered again. The similarities for this input are calculated in the same way, as shown in FIG. 6. As FIG. 6 shows, the determination unit 6 determines that the voice standard pattern giving the maximum similarity is "zero". The pattern correction unit 7 passes the number of the voice standard pattern giving the maximum similarity to the rejection number storage unit 76. If the number already stored matches the number passed to it, the rejection number storage unit 76 transfers the voice pattern created by the pattern creation unit to the third voice buffer 73.

[0018] Next, the pattern similarity calculation unit 74 performs the following calculations. Let the voice pattern in the first voice buffer be P₁(i, j), the voice pattern in the second voice buffer be P₂(i, j), the voice pattern in the third voice buffer be P₃(i, j), and the corrected pattern be P_ref(i, j). The similarity S₁₂ between the voice patterns in the first and second voice buffers is

【００１９】[0019]

【数１】 (Equation 1)

[0020] The similarity S₁₃ between the voice patterns in the first and third voice buffers is

【００２１】[0021]

【数２】 (Equation 2)

[0022] The similarity S₂₃ between the voice patterns in the second and third voice buffers is

【００２３】[0023]

【数３】 (Equation 3)

[0024] Of the calculated values S₁₂, S₁₃, and S₂₃, the largest identifies the most similar pair, and the two voice patterns of that pair are passed to the pattern correction averaging unit 77. The pattern correction averaging unit 77 averages the two voice patterns as follows.
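The pair selection and averaging just described can be sketched as follows. The similarity and averaging equations are not reproduced in this text, so `similarity` below is a hypothetical stand-in and `average` assumes a plain element-wise arithmetic mean; all function names and the toy patterns are illustrative.

```python
import math
from itertools import combinations

def similarity(a, b):
    """Hypothetical stand-in for the similarity equations: 100 / (1 + Euclidean distance)."""
    dist = math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))
    return 100.0 / (1.0 + dist)

def average(a, b):
    """Element-wise arithmetic mean of two equally sized patterns (assumed form)."""
    return [[(x + y) / 2.0 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def corrected_pattern(p1, p2, p3):
    """Average the most similar pair among the three buffered patterns."""
    buffers = {1: p1, 2: p2, 3: p3}
    # Candidate pairs are (1, 2), (1, 3), (2, 3), matching S12, S13, S23.
    m, n = max(combinations(buffers, 2),
               key=lambda pair: similarity(buffers[pair[0]], buffers[pair[1]]))
    return (m, n), average(buffers[m], buffers[n])

p1 = [[1.0, 0.0]]  # stored standard pattern (here assumed to be mis-registered)
p2 = [[0.0, 1.0]]  # first rejected input
p3 = [[0.0, 0.8]]  # re-entered input
pair, p_ref = corrected_pattern(p1, p2, p3)
```

In this toy case the two inputs agree with each other better than either agrees with the stored pattern, so the corrected pattern is built from buffers 2 and 3 alone.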

[0025] The average of the voice patterns in the first and second voice buffers is

【００２６】[0026]

【数４】 (Equation 4)

[0027] The average of the voice patterns in the first and third voice buffers is

【００２８】[0028]

【数５】 (Equation 5)

[0029] The average of the voice patterns in the second and third voice buffers is

【００３０】[0030]

【数６】 (Equation 6)
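The images for Equations 4 to 6 are not reproduced in this text. If the averaging is a plain element-wise arithmetic mean over the indices i and j (an assumption consistent with the wording "average", not a confirmed transcription), the three cases would read:

```latex
\begin{align}
P_{\mathrm{ref}}(i,j) &= \tfrac{1}{2}\bigl(P_1(i,j) + P_2(i,j)\bigr) \\
P_{\mathrm{ref}}(i,j) &= \tfrac{1}{2}\bigl(P_1(i,j) + P_3(i,j)\bigr) \\
P_{\mathrm{ref}}(i,j) &= \tfrac{1}{2}\bigl(P_2(i,j) + P_3(i,j)\bigr)
\end{align}
```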

[0031] The corrected pattern is created from the result of this averaging. Using the number held in the rejection number storage unit, the created voice pattern is stored in the area of the standard pattern memory 4 that belongs to the corresponding voice standard pattern.

[0032] If the number already stored in the rejection number storage unit 76 does not match the number passed to it, the contents of the first, second, and third voice buffers and of the rejection number storage unit are cleared. The number newly giving the maximum similarity is then stored in the rejection number storage unit, the voice standard pattern giving the maximum similarity is stored in the first voice buffer, and the voice pattern created by the pattern creation unit 3 is stored in the second voice buffer.
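The bookkeeping across successive rejections (matching number versus non-matching number) can be sketched as a small state holder. The class and method names are illustrative, patterns are plain nested lists, and the sketch only tracks the buffers; it does not perform the correction itself.

```python
class PatternCorrector:
    """Tracks rejected utterances along the lines of FIG. 2 (illustrative names)."""

    def __init__(self):
        self.rejected_number = None  # rejection number storage unit 76
        self.buffer1 = None          # first voice buffer 71: standard pattern to correct
        self.buffer2 = None          # second voice buffer 72: first rejected input
        self.buffer3 = None          # third voice buffer 73: re-entered input

    def _start(self, number, standard_pattern, input_pattern):
        # Clear everything and begin a new rejection sequence.
        self.rejected_number = number
        self.buffer1 = standard_pattern
        self.buffer2 = input_pattern
        self.buffer3 = None

    def on_rejection(self, number, standard_pattern, input_pattern):
        """Handle one rejected utterance; return True when correction can run."""
        if self.rejected_number is not None and number == self.rejected_number:
            # Same best match as last time: keep the re-entered pattern.
            self.buffer3 = input_pattern
            return True
        # First rejection, or a different best match: start over.
        self._start(number, standard_pattern, input_pattern)
        return False

corrector = PatternCorrector()
first = corrector.on_rejection(0, [[1.0]], [[0.8]])   # first rejected "zero"
second = corrector.on_rejection(0, [[1.0]], [[0.9]])  # "zero" again: ready to correct
```

When `on_rejection` returns `True`, all three buffers are filled and the similarity and averaging steps can run on them.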

[0033] In this embodiment a new voice pattern is created from the two most similar voice patterns. Alternatively, without calculating the similarities, for example,

【００３４】[0034]

【数７】 (Equation 7)

[0035] as expressed by the above equation, the voice patterns in the first, second, and third voice buffers may all be averaged to create the corrected pattern.
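The image for Equation 7 is not reproduced in this text. Assuming the same plain element-wise arithmetic mean, now taken over all three buffers, it would read:

```latex
P_{\mathrm{ref}}(i,j) = \tfrac{1}{3}\bigl(P_1(i,j) + P_2(i,j) + P_3(i,j)\bigr)
```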

[0036] In the voice recognition device of the present invention, the number of input voice patterns (input pattern buffers) used is not limited to two and may be N (for example, five). In that case the repeated voice input processing is simply performed N times.

[0037] [Embodiment 2] FIG. 3 shows the configuration of another embodiment of the pattern correction unit of the voice recognition device of the present invention. The configuration in this figure differs from that of FIG. 2 in that it adds an input time measuring function 75 (hereinafter, the timer), which starts timing after a recognition rejection result is obtained and measures the time interval until the repeated input.

[0038] In the device of this figure, when a recognition rejection signal is received from the determination unit 6 in <State 1> described above, the timer 75 starts timing. When voice is input again and a recognition rejection signal is again received from the determination unit 6, timing ends. If the time from start to end is within a set value (for example, a time of about 10 to 20 seconds is set in advance), the pattern correction unit 7 corrects the voice standard pattern. This avoids erroneous correction when a voice input made after the predetermined time has passed is not appropriate. That is, a device that accepts, for an unlimited time after the first rejection signal, a voice input producing a second rejection signal would erroneously correct the voice standard pattern with a noise pattern whenever the first and second rejections were both caused by similar noise inputs. The device of the present invention reduces the frequency of this problem by providing the time limiting means described above.
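The time limit can be sketched as a simple check on two timestamps. The function name and the 15-second default are assumptions (the text only gives "about 10 to 20 seconds" as an example); timestamps are in seconds, for instance from `time.monotonic()`.

```python
def may_correct(first_reject_time, second_reject_time, limit_seconds=15.0):
    """Return True if the second rejection arrived within the time limit.

    Both times are in seconds on the same clock. The 15 s default is a
    hypothetical value inside the 10 to 20 s range given as an example.
    """
    elapsed = second_reject_time - first_reject_time
    return 0.0 <= elapsed <= limit_seconds

quick_retry = may_correct(100.0, 108.0)   # re-entered after 8 s
late_retry = may_correct(100.0, 130.0)    # re-entered after 30 s
```

A late retry is simply not used for correction, which limits the chance that two unrelated noise inputs are averaged into a standard pattern.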

[0039] [Embodiment 3] FIG. 4 shows the configuration of yet another embodiment of the pattern correction unit of the voice recognition device of the present invention. The configuration in this figure differs from that of FIG. 2 in that it adds a second threshold determination unit, which compares the maximum similarity read after a recognition rejection result is obtained with a predetermined value (hereinafter, the second threshold).

[0040] In the device of this figure, when a recognition rejection signal is received from the determination unit 6 in <State 1> described above, the second threshold determination unit 78 compares the maximum similarity that was read with the second threshold. The second threshold is set lower than the threshold, for example to 45. If the maximum similarity is greater than the second threshold, the pattern correction unit 7 corrects the voice standard pattern. If the maximum similarity is equal to or less than the second threshold, the re-entered voice pattern is treated as invalid because its similarity is too low, and the third voice input is processed in the same way as the second voice input.
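The two-threshold gate can be sketched as a three-way classification. The values 80 and 45 come from the text; the function name, the outcome labels, and the handling of a similarity exactly equal to the first threshold (treated as rejected) are assumptions.

```python
THRESHOLD = 80         # first threshold from the embodiment
SECOND_THRESHOLD = 45  # example value given for the second threshold

def classify(max_similarity, threshold=THRESHOLD, second_threshold=SECOND_THRESHOLD):
    """Classify an utterance by its maximum similarity.

    "recognized":       above the first threshold, recognition result is output.
    "rejected_usable":  rejected, but similar enough to be used for correction.
    "rejected_ignored": rejected and too dissimilar, so the pattern is discarded.
    """
    if max_similarity > threshold:
        return "recognized"
    if max_similarity > second_threshold:
        return "rejected_usable"
    return "rejected_ignored"

outcomes = [classify(s) for s in (90, 70, 45, 30)]
```

Only patterns in the middle band contribute to correction, which keeps very poor matches such as noise out of the buffers.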

[0041] In this embodiment, when the maximum similarity is equal to or less than the second threshold, it is also conceivable to cancel the correction of the voice standard pattern instead of invalidating the re-entered voice pattern and waiting for a third voice input.

【００４２】[0042]

[Effects of the Invention] According to the voice recognition device of the present invention, even if a voice standard pattern contains an error introduced during voice registration, only the erroneous voice standard pattern is corrected, with a simple operation, during voice recognition processing. In addition, because the voice standard pattern is not corrected in response to an erroneous input voice pattern, highly reliable voice standard patterns can be obtained.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the voice recognition device of the present invention.

FIG. 2 shows the configuration of one embodiment of the pattern correction unit of the voice recognition device of the present invention.

FIG. 3 shows the configuration of another embodiment of the pattern correction unit of the voice recognition device of the present invention.

FIG. 4 shows the configuration of yet another embodiment of the pattern correction unit of the voice recognition device of the present invention.

FIG. 5 shows an example of the similarities for an input voice.

FIG. 6 shows an example of the similarities for a re-entered voice.

FIG. 7 shows an example of a voice pattern (a voice pattern obtained by frequency analysis with band-pass filters).

[Explanation of symbols]

1 microphone
2 voice analysis unit
3 pattern creation unit
4 voice standard pattern memory
5 similarity calculation unit
6 determination unit
7 pattern correction unit
71 first voice buffer
72 second voice buffer
73 third voice buffer
74 pattern similarity calculation unit
75 timer
76 rejection number storage unit
77 pattern correction averaging unit
78 second threshold determination unit

Continued from front page: (58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 521; G10L 3/00 561

Claims

(57) [Claims]

1. A voice recognition device comprising: a voice analysis unit that analyzes voice input from a microphone; a pattern creation unit that creates a voice pattern based on the result of the analysis by the voice analysis unit; a standard pattern memory in which a plurality of voice standard patterns are stored in advance; a similarity calculation unit that calculates the similarity between the voice pattern created by the pattern creation unit and each voice standard pattern stored in the standard pattern memory, and outputs the most similar voice standard pattern and its similarity; a determination unit that outputs a signal corresponding to the voice standard pattern as a recognition result when the similarity obtained from the similarity calculation unit is greater than a preset threshold, and determines recognition rejection when the similarity is smaller than the threshold; a first input pattern buffer memory that stores the voice pattern determined to be rejected by the determination unit; a second input pattern buffer memory that, when the determination unit has determined rejection, stores the voice pattern obtained from a repeated voice input if the voice standard pattern most similar to that voice pattern is the same as the voice standard pattern most similar to the voice pattern in the first input pattern buffer memory; and a pattern correction unit that, based on the voice pattern in the first or second input pattern buffer memory, corrects the voice standard pattern in the standard pattern memory that is most similar to that voice pattern.

2. The voice recognition device according to claim 1, wherein the pattern correction unit creates an average pattern by averaging the two most similar patterns among the voice standard pattern giving the maximum similarity, the voice pattern in the first input pattern buffer memory, and the voice pattern in the second input pattern buffer memory, and changes the voice standard pattern to this average pattern.

3. The voice recognition device according to claim 1, wherein the pattern correction unit has an input time measuring function that starts timing after a recognition rejection result is obtained and measures the time interval until the repeated input, and corrects the voice standard pattern only when the measured time is within a predetermined time.

4. The voice recognition device according to claim 1, wherein, when the maximum similarity of the input voice pattern is smaller than the threshold, the maximum similarity is compared with a second threshold set lower than the threshold, and the voice standard pattern is corrected only when the maximum similarity is greater than the second threshold.

5. The voice recognition device according to claim 1, 2 or 3, wherein the voice standard pattern obtained at voice registration is stored in the voice standard pattern memory, and at voice recognition the voice standard pattern in the memory is corrected based on N input voice patterns obtained from the same input voice as that voice standard pattern.