JPS59170899A - Voice registration pattern learning system for voice recognition equipment

Info

Publication number: JPS59170899A
Application number: JP58045496A
Authority: JP
Inventors: 米山　正秀; 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-03-17
Filing date: 1983-03-17
Publication date: 1984-09-27

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical field

The present invention relates to a dictionary learning method for a speech recognition device.

Prior art

When the dictionary for a speech recognition device is created, each word is normally registered as the average of several utterances. However, in a so-called speaker-dependent recognition device, which performs recognition using a dictionary matched to its user, it is known that misrecognition increases because the way of speaking changes as time passes after the dictionary is created. Dictionary learning is needed to prevent this. The following dictionary learning methods have been published so far:

(1) a method that rewrites the dictionary only for words that were successfully recognized (Japanese Unexamined Patent Publication No. S56-81899);
(2) a method that rewrites the dictionary when the similarity between the dictionary and the input speech falls to or below a certain value (Japanese Unexamined Patent Publication No. S56-51800);
(3) a method that updates the dictionary, by operator action, only for words that were misrecognized (Japanese Unexamined Patent Publication No. S56-156900).

In method (1), however, the dictionary is rewritten every time recognition succeeds, so it is also rewritten when ambient noise occurs during the utterance or when the utterance itself is inadequate. The same applies to method (2), and both methods can therefore degrade the dictionary. Method (3) has drawbacks such as the large effect that an operator error has on the dictionary. Publication No. S56-51800 also describes re-registering the average of the dictionary and the input speech. This has the merit of reducing the effect of sudden fluctuations in the input speech on the dictionary, but it has the drawback that registered words with a low frequency of use are not updated sufficiently.

Object

The present invention has been made to eliminate the drawbacks of the prior art described above. In particular, it aims to provide a voice registration pattern learning method that, when the dictionary is rewritten from input speech, reduces the influence of sudden fluctuations and also updates infrequently used words sufficiently.

Configuration

The configuration of the present invention is described below on the basis of one embodiment.

In the present invention, when the input speech is correctly recognized by the recognition device, the average of the parameters of the input speech and the parameters registered in the dictionary is newly registered in the dictionary. In this case, the dictionary parameters are given a considerably larger weight than the input speech parameters. When the uttered speech is not correctly recognized, the word is uttered again, and the parameters of this input speech and the dictionary parameters are averaged and re-registered. In this case, the input speech and the dictionary parameters are given roughly equal weights, or the input speech is given the larger weight. As a result, words that are used infrequently and are easily misrecognized change substantially at each update, while frequently used words are updated little by little, continuously.
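The two weighting cases above can be illustrated with a minimal sketch. The specification gives no numeric weights and does not state how patterns are stored, so the values 0.1 and 0.5, the function name `update_template`, and the representation of a pattern as a flat list of equal-length parameters are all assumptions made for illustration.

```python
# Hypothetical weights given to the INPUT speech; the dictionary receives the rest.
# The specification only says the dictionary weight is "considerably larger" after a
# correct recognition and "roughly equal or smaller" after a misrecognition.
INPUT_WEIGHT_CORRECT = 0.1      # assumed value: dictionary keeps 90 %
INPUT_WEIGHT_AFTER_ERROR = 0.5  # assumed value: equal weighting


def update_template(dictionary_params, input_params, input_weight):
    """Return the weighted average of a stored pattern and an input pattern.

    Both patterns are assumed to be lists of the same length (for example,
    already time-normalized spectral parameters).
    """
    if not 0.0 <= input_weight <= 1.0:
        raise ValueError("input_weight must lie in [0, 1]")
    if len(dictionary_params) != len(input_params):
        raise ValueError("patterns must have the same length")
    dictionary_weight = 1.0 - input_weight
    return [
        dictionary_weight * d + input_weight * x
        for d, x in zip(dictionary_params, input_params)
    ]


if __name__ == "__main__":
    stored = [1.0, 2.0]
    spoken = [3.0, 4.0]
    # After a correct recognition the stored pattern moves only slightly.
    print(update_template(stored, spoken, INPUT_WEIGHT_CORRECT))
    # After a misrecognition and re-utterance it moves much further.
    print(update_template(stored, spoken, INPUT_WEIGHT_AFTER_ERROR))
```

With these assumed weights, a frequently used word drifts slowly toward the speaker's current pronunciation, while a word that was misrecognized is pulled halfway to the new utterance in a single update.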

The figure shows one example of a configuration for implementing the present invention as described above. In the figure, 1 is a microphone, 2 is a filter bank, 3 is a voice section extraction unit, 4 is a changeover switch, 5 is a dictionary, 6 is a matching unit, 7 is a maximum similarity selection unit, 8 is a result display unit, 9 is an averaging unit, 10 is a weighting unit, and 11 is a switch. Dictionary registration and recognition are performed as follows.

<Dictionary registration>
Switch 4 is set to the dictionary side. The speech input from microphone 1 is decomposed into a spectrum by filter bank 2, only the voice section is extracted by voice section extraction unit 3, and the result is registered in dictionary 5.

<Recognition>
Switch 4 is set to the recognition side. The signal, from which the voice section has been extracted in the same way as above, is matched against each word in dictionary 5 and the similarity is calculated. The word that obtains the highest similarity is output as the result. The operator looks at this display, judges whether the recognition is correct or a misrecognition, and enters the judgment with switch 11. This information is reflected in the weight. On a misrecognition, a re-utterance is requested, and the result of averaging it with the dictionary word using a large weight is re-registered in the dictionary. When the recognition is correct, the input is averaged with the dictionary word as it is, using a smaller weight than for a misrecognition.
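The recognition and learning loop described above might look like the following sketch. Several points are assumptions rather than part of the specification: cosine similarity stands in for the unspecified similarity measure, patterns are flat lists of equal length, the weights 0.1 and 0.5 are illustrative, and the operator's switch 11 is modelled as a `confirm` callback. Following the claim, the larger weight is applied when the repeated utterance is recognized correctly.

```python
import math

INPUT_WEIGHT_CORRECT = 0.1      # assumed small input weight after a correct result
INPUT_WEIGHT_AFTER_ERROR = 0.5  # assumed large input weight after a misrecognition


def similarity(a, b):
    """Cosine similarity between two equal-length patterns (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(dictionary, input_params):
    """Matching unit 6 and selection unit 7: return the most similar word."""
    return max(dictionary, key=lambda word: similarity(dictionary[word], input_params))


def process_utterance(dictionary, input_params, confirm, state):
    """Recognize one utterance, then update the dictionary from operator feedback.

    `confirm(word)` plays the role of switch 11 and returns True when the
    displayed result is correct. `state["retry"]` remembers that the previous
    attempt was a misrecognition, so the next correct result uses the larger
    input weight.
    """
    word = recognize(dictionary, input_params)
    if not confirm(word):
        state["retry"] = True  # the speaker is asked to repeat the word
        return word, False
    weight = INPUT_WEIGHT_AFTER_ERROR if state.get("retry") else INPUT_WEIGHT_CORRECT
    dictionary[word] = [
        (1.0 - weight) * d + weight * x
        for d, x in zip(dictionary[word], input_params)
    ]
    state["retry"] = False
    return word, True


if __name__ == "__main__":
    dictionary = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
    state = {}
    intended = "yes"
    for utterance in ([0.6, 0.8], [0.9, 0.1], [1.0, 0.0]):
        result = process_utterance(dictionary, utterance, lambda w: w == intended, state)
        print(result, dictionary["yes"])
```

In this usage example the first utterance is misrecognized and leaves the dictionary unchanged, the repeated utterance updates "yes" with the larger weight, and the third utterance nudges it only slightly.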

Effect

As is clear from the above description, the present invention makes it possible to realize a dictionary for speech recognition that yields a stable recognition rate over a long period without needing a separate update.

[Brief explanation of drawings]

The figure is a configuration diagram for explaining one embodiment of the present invention.

1: microphone, 2: filter bank, 3: voice section extraction unit, 4: changeover switch, 5: dictionary, 6: matching unit, 7: maximum similarity selection unit, 8: result display unit, 9: averaging unit, 10: weighting unit, 11: switch.

Claims

[Claims]

A voice registration pattern learning method for a speech recognition device that registers words in a dictionary and performs matching against them, characterized in that, when the input speech is correctly recognized, the pattern in the dictionary and the pattern of the input speech are averaged with certain weights and the result is registered in the dictionary again, and, when the input speech is misrecognized, the same word is uttered again and, once it is correctly recognized, the patterns are averaged with changed weight values and the result is registered in the dictionary again.