JPS6075894A

JPS6075894A - Dictionary updating system

Info

Publication number: JPS6075894A
Application number: JP58184153A
Authority: JP
Inventors: 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-10-02
Filing date: 1983-10-02
Publication date: 1985-04-30
Anticipated expiration: 2009-05-02
Also published as: JPH0634179B2

Abstract

(57) [Summary] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a dictionary updating method for a speech recognition device.

As a speech pattern matching method that does not use the DP (dynamic programming) method, or as a method of absorbing speaker-dependent variation, a matching method has been proposed in which a pattern connecting the peaks of the time-frequency pattern of speech is superimposed on a broad pattern (Acoustical Society of Japan, Autumn National Meeting, October 1983). An example of this method is shown in Fig. 1. In Fig. 1, 1 is a microphone, 2 is a filter bank, 3 is a speech interval detection unit, 4 is a binarization (threshold 1) unit, 5 is a binarization (threshold 2) unit, 6 is a register, 7 is an adder, 8 is a dictionary unit, 9 is a similarity calculation unit, and 10 is a result display unit.

For dictionary registration, the words to be registered are uttered in order, each word several times. The speech input from microphone 1 is converted into a frequency pattern, the speech interval is extracted, and the pattern is binarized to 0 or 1 using a certain threshold (threshold 2) and stored in register 6. The same word is then uttered a second time, and the resulting binarized pattern is added to the pattern already held in the register and stored in the register again. In this way the patterns from the repeated utterances of one word are summed, and such a summed pattern is registered for each word in the dictionary.
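The registration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of Fig. 1: the function names, the use of `>=` for the threshold comparison, the threshold value 0.4, and the three 8-channel spectra are all hypothetical.

```python
from typing import List


def binarize(spectrum: List[float], threshold: float) -> List[int]:
    """Return 1 where a filter-bank output reaches the threshold, else 0.

    Using >= is an assumption; the source only says "binarized by a threshold".
    """
    return [1 if value >= threshold else 0 for value in spectrum]


def register_word(utterances: List[List[float]], threshold2: float) -> List[int]:
    """Sum the binarized patterns of repeated utterances of one word."""
    register = [0] * len(utterances[0])
    for spectrum in utterances:
        pattern = binarize(spectrum, threshold2)
        # Add the new binarized pattern to what the register already holds.
        register = [r + p for r, p in zip(register, pattern)]
    return register


# Three hypothetical utterances of one word: 8 frequency channels, one time sample.
utterances = [
    [0.1, 0.2, 0.6, 0.9, 0.7, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.5, 0.8, 0.8, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.3, 0.7, 0.9, 0.6, 0.1, 0.0],
]
dictionary_pattern = register_word(utterances, threshold2=0.4)
print(dictionary_pattern)
```

With three utterances, every entry of the resulting pattern is an integer between 0 and 3, which is the value range the later update discussion relies on.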

Next, for recognition, the speech interval is detected and the speech is then binarized with a threshold (threshold 1) different from the one used when the dictionary was created.

If threshold 1 > threshold 2, the regions of "1" after binarization are narrower in the input pattern than in the dictionary pattern.
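The effect of the two thresholds can be illustrated with a small sketch. The spectrum and both threshold values are hypothetical, and `>=` is again an assumed comparison; the point is only that a higher threshold can never produce a wider region of 1s.

```python
def binarize(spectrum, threshold):
    """Return 1 where the value reaches the threshold, else 0 (>= is assumed)."""
    return [1 if value >= threshold else 0 for value in spectrum]


# One hypothetical 8-channel spectrum binarized both ways.
spectrum = [0.1, 0.2, 0.6, 0.9, 0.7, 0.5, 0.1, 0.0]
dictionary_side = binarize(spectrum, threshold=0.4)   # threshold 2 (registration)
input_side = binarize(spectrum, threshold=0.65)       # threshold 1 (recognition)

print(dictionary_side)
print(input_side)
```

Every 1 in the recognition-side pattern also appears in the registration-side pattern, so the input pattern is the narrower of the two.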

Such an input pattern is superimposed on the pattern of each word in the dictionary, a similarity is calculated from the degree of overlap, and the dictionary word with the maximum similarity is taken as the recognition result. In general, human speech changes from moment to moment and also varies with physical condition, so the dictionary needs to be updated while it is in use.

For convenience, consider updating a dictionary formed by three additions, for a single time sample. With 8 samples in the frequency direction, when a dictionary such as Fig. 2(a) is updated with an input pattern of 1s and 0s such as (b), one conceivable method is to add patterns (a) and (b) to obtain pattern (c), and then subtract 1 from every part that is 1 or more, giving a dictionary (d) whose maximum value is "3". With this method, however, the information specific to pattern (b) (arising from a change in the voice or the like), namely the "1" at the second frequency sample, is not reflected in (d). In other words, the drawback is that the dictionary does not actually become new even though it has been updated. Conversely, as shown in Fig. 3, it is also conceivable to first subtract "1" from every part of the current dictionary pattern that is "1" or more (b), and then add the input pattern (d).

In this case, as shown in Fig. 3, the dictionary appears at first glance to be updated with the "1" at the second frequency sample of the input pattern preserved. At the next update, however, the "1" parts of (a) are subtracted first, so the part that has just been updated becomes 0 and is not reflected thereafter, which is a drawback.

A method such as that of Fig. 4 is also conceivable. In Fig. 4, 11 is an input pattern register, 12 is an adder, 13 is a subtractor, 14 is a dictionary unit, 15 is a binarization unit, and 16 is a register. In this method the relevant dictionary pattern (Fig. 5(a)) is binarized into a pattern such as (b) and held in the register; the input pattern and the relevant dictionary pattern are added (d), and the pattern (b) is then subtracted to perform the update (e). With this method the characteristic part A of the input pattern is preserved in the updated dictionary, but a "1" in the dictionary, such as B, is still "1" in the updated dictionary, and no matter how many times it is updated with input patterns it never rises above "1".
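The three conventional update rules discussed above can be sketched as follows, to make their drawbacks concrete. The figures themselves are not reproduced in this text, so the dictionary and input patterns below are hypothetical, the function names are invented for illustration, and binarizing the dictionary at "1 or more" in the Fig. 4 rule is an assumption.

```python
def update_add_then_subtract(dictionary, pattern):
    """Fig. 2 style: add the input, then subtract 1 wherever the sum is 1 or more."""
    summed = [d + x for d, x in zip(dictionary, pattern)]
    return [s - 1 if s >= 1 else s for s in summed]


def update_subtract_then_add(dictionary, pattern):
    """Fig. 3 style: subtract 1 wherever the dictionary is 1 or more, then add the input."""
    reduced = [d - 1 if d >= 1 else d for d in dictionary]
    return [r + x for r, x in zip(reduced, pattern)]


def update_with_binarized_dictionary(dictionary, pattern):
    """Fig. 4 style: add input and dictionary, then subtract the binarized dictionary."""
    binarized = [1 if d >= 1 else 0 for d in dictionary]  # assumed threshold
    return [d + x - b for d, x, b in zip(dictionary, pattern, binarized)]


# Hypothetical patterns: the input has a new "1" at the second frequency sample.
dictionary = [0, 0, 1, 3, 3, 2, 0, 0]
pattern = [0, 1, 1, 1, 1, 0, 0, 0]

fig2 = update_add_then_subtract(dictionary, pattern)
fig3_once = update_subtract_then_add(dictionary, pattern)
fig3_twice = update_subtract_then_add(fig3_once, pattern)
fig4 = update_with_binarized_dictionary(dictionary, pattern)

print(fig2)        # the new "1" at index 1 is lost
print(fig3_twice)  # index 1 stays at 1 even after a second identical input
print(fig4)        # index 2 (dictionary 1, input 1) stays at 1
```

In this sketch the second sample never accumulates beyond 1 under any of the three rules, which is the behaviour the text identifies as the shortcoming.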

The present invention has been made to solve the drawbacks described above, and in particular aims to provide a dictionary updating method that yields a dictionary capable of always maintaining a high recognition rate in a speech recognition device.

The configuration of the present invention is described below on the basis of an embodiment.

To achieve the above object, the present invention performs the following operations. Recognition is performed by the speech recognition device, and when the correct answer is obtained, the "0" parts of the recognized input pattern are replaced with the negative unit and the result is added to the dictionary pattern to be updated. The negative parts of the sum are then converted back to "0", and values exceeding the maximum value of the dictionary pattern are replaced with the maximum value, which yields the updated dictionary pattern. The basic conditions of this method are listed below. For convenience, assume here that the dictionary is made by adding three binarized patterns (the same word is uttered three times and the binarized patterns are summed). The values in the dictionary pattern are then integers from 0 to 3, and the maximum value of the dictionary pattern is 3.

(Conditions) When the dictionary pattern value and the input pattern value below overlap, the updated pattern takes the value shown.

Dictionary pattern   Input pattern   Updated pattern
        0                  0                0
        1                  0                0
        2                  0                1
        3                  0                2
        0                  1                1
        1                  1                2
        2                  1                3
        3                  1                3

To realize these conditions, the update method is configured as described above.
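The eight conditions above follow from the stated operations (replace 0 with the negative unit, add, reset negatives to 0, cap at the maximum). A minimal sketch that checks this is shown below; the function name `update_dictionary` and the `max_value` parameter are hypothetical names chosen for illustration.

```python
def update_dictionary(dictionary, pattern, max_value=3):
    """Apply the described update to one dictionary pattern.

    Input 0 counts as -1, input 1 as +1; the sum is then limited to 0..max_value.
    """
    signed = [1 if x == 1 else -1 for x in pattern]      # replace 0 with -1
    summed = [d + s for d, s in zip(dictionary, signed)]  # add to the dictionary
    return [min(max(v, 0), max_value) for v in summed]    # negatives -> 0, cap at max


# (dictionary value, input value) -> updated value, as listed in the conditions.
conditions = {
    (0, 0): 0, (1, 0): 0, (2, 0): 1, (3, 0): 2,
    (0, 1): 1, (1, 1): 2, (2, 1): 3, (3, 1): 3,
}
results = {key: update_dictionary([key[0]], [key[1]])[0] for key in conditions}
print(results == conditions)
```

Each entry is handled independently, so the same rule applies unchanged to a full time-frequency pattern.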

Under the above conditions, updating a dictionary pattern such as that of Fig. 7(a) with an input pattern such as (b) gives a pattern such as (f).

The procedure of the present invention is shown in Fig. 6, and Figs. 7(c) to (f) show how the patterns of Figs. 7(a) and (b) change to become the pattern of Fig. 7(f). First, it is determined whether the input data contain any 0s, and each 0 is replaced with -1, the negative unit (Fig. 7(c)). This is then added to the dictionary pattern to be updated (Fig. 7(d)).

The negative parts of the pattern of Fig. 7(d) are returned to "0" (Fig. 7(e)). The maximum value of the pattern is then corrected. In this case the dictionary is the sum of three patterns, so the maximum value is 3. Since the dictionaries of the other words are constructed in the same way, all dictionary patterns must always keep the same construction. Therefore every part of the pattern of Fig. 7(e) that is "4" or more is rewritten to "3", which gives the pattern of Fig. 7(f), that is, the updated pattern. This agrees with the pattern (f) of Fig. 7 obtained from the conditions above, showing that the desired updated pattern has been produced.
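The stage-by-stage procedure can be traced with a short sketch that keeps each intermediate pattern. The actual values of Fig. 7 are not available in this text, so the patterns below are hypothetical; the stage labels "c" to "f" merely mirror the figure panels named above, and `trace_update` is an invented name.

```python
def trace_update(dictionary, pattern, max_value=3):
    """Return the intermediate patterns corresponding to stages (c) to (f)."""
    stage_c = [x if x == 1 else -1 for x in pattern]        # (c) 0 replaced by -1
    stage_d = [d + c for d, c in zip(dictionary, stage_c)]   # (d) added to the dictionary
    stage_e = [max(v, 0) for v in stage_d]                   # (e) negatives returned to 0
    stage_f = [min(v, max_value) for v in stage_e]           # (f) values of 4 or more set to 3
    return {"c": stage_c, "d": stage_d, "e": stage_e, "f": stage_f}


# Hypothetical stand-ins for the patterns (a) and (b).
dictionary_a = [0, 1, 3, 3, 2, 1, 0, 0]
input_b = [0, 1, 1, 1, 0, 0, 0, 0]

stages = trace_update(dictionary_a, input_b)
for name in ("c", "d", "e", "f"):
    print(name, stages[name])
```

Keeping the two clipping steps separate, as the flowchart description does, makes it easy to see that only stage (d) can contain values outside the 0 to 3 range.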

As is clear from the above description, according to the present invention the dictionary pattern changes to follow the moment-to-moment changes of the input pattern. A dictionary can therefore be obtained that maintains a high recognition rate, without degradation, even when the state of the speaker's voice changes.
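The tracking behaviour claimed above can be illustrated with a small simulation. It rests on several assumptions: the source says only that similarity is computed "from the degree of overlap", so the measure used here (the sum of dictionary values at positions where the input is 1) is a hypothetical choice, as are the patterns and function names.

```python
def overlap_similarity(dictionary, pattern):
    """Assumed overlap measure: sum of dictionary values where the input is 1."""
    return sum(d for d, x in zip(dictionary, pattern) if x == 1)


def update_dictionary(dictionary, pattern, max_value=3):
    """Described update: input 0 counts as -1, input 1 as +1, result limited to 0..max_value."""
    return [min(max(d + (1 if x == 1 else -1), 0), max_value)
            for d, x in zip(dictionary, pattern)]


# Hypothetical case: the speaker's voice has shifted by one frequency channel.
dictionary = [0, 0, 3, 3, 3, 0, 0, 0]
shifted_input = [0, 0, 0, 1, 1, 1, 0, 0]

scores = [overlap_similarity(dictionary, shifted_input)]
for _ in range(3):
    # In the described system the update happens only after a correct recognition.
    dictionary = update_dictionary(dictionary, shifted_input)
    scores.append(overlap_similarity(dictionary, shifted_input))

print(scores)
print(dictionary)
```

Under these assumptions the overlap score for the shifted voice rises with each update and the dictionary pattern moves to match the new input after three updates.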

[Brief explanation of drawings]

Fig. 1 is an electrical block diagram for explaining an example of a conventional speech matching method. Figs. 2 and 3 are diagrams each explaining an example of a dictionary updating method. Fig. 4 is an electrical block diagram showing another example of a conventional speech matching method. Fig. 5 is a diagram for explaining an example of its dictionary updating method. Fig. 6 is a flowchart for explaining an embodiment of the present invention, and Fig. 7 is a signal pattern diagram for that embodiment.

1: microphone, 2: filter bank, 3: speech interval detection unit, 4: binarization (threshold 1) unit, 5: binarization (threshold 2) unit, 6: register, 7: adder, 8: dictionary unit, 9: similarity calculation unit, 10: result output unit, 11: input pattern register, 12: adder, 13: subtractor, 14: dictionary unit, 15: binarization unit, 16: register.

Claims

[Claims]

A dictionary updating method for a speech recognition device that converts speech into time-frequency patterns, stores some of them as a dictionary, and, when unknown speech is input, converts the unknown speech into a time-frequency pattern in the same manner and calculates a similarity by superimposing it on each pattern stored in the dictionary, the method being characterized in that, when a correct answer is obtained, the "0" parts of the input pattern are replaced with the negative unit, the result is added to the dictionary pattern to be updated, the negative parts of the sum are converted back to "0", and the parts exceeding the maximum value of the dictionary pattern are replaced with the maximum value, thereby creating the updated pattern.