JPH06266385A

JPH06266385A - Dictionary update of speech recognition device

Info

Publication number: JPH06266385A
Application number: JP5055541A
Authority: JP
Inventors: Ryosuke Hamazaki; 良介濱崎
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-03-16
Filing date: 1993-03-16
Publication date: 1994-09-22
Anticipated expiration: 2019-03-31
Also published as: JP3514481B2

Abstract

(57) [Abstract] [Purpose] To maintain a high recognition rate even when the speaker's manner of utterance fluctuates, and to prevent distorted speech patterns from being registered in the dictionary. [Configuration] Speech pattern matching means 3 computes a score between the input speech pattern, whose features have been extracted by acoustic analysis means 2, and the templates in dictionary 4, and also a score between the input speech pattern and the immediately preceding correct input speech pattern held in input speech pattern holding means 6. Recognition result determination means 5 outputs a recognition result based on these scores. When the recognition result is incorrect, a correct label is assigned to the input speech pattern through user input means 9. Matching result determination means 8 compares the score between dictionary 4 and the correct input speech pattern with the score between the correct speech pattern held in input speech pattern holding means 6 and the input speech pattern. Dictionary updating means 7 updates dictionary 4 based on the result of that comparison.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a dictionary update method in a speech recognition device, and in particular to a dictionary update method in a speech recognition device that has means for updating (re-training) the dictionary with input speech patterns.

[0002]

[Prior Art] In a registration-type speech recognition device, the person who will use the device must register the characteristics of his or her own voice in the device's dictionary before starting to use it. Voice characteristics change easily, however. They vary readily with the surrounding environment at recognition time and with the speaker's physical and mental constraints, and they also drift with the passage of time, so the recognition rate gradually falls if the same dictionary continues to be used.

[0003] Conventionally, to cope with this, a multi-template dictionary has been updated as follows: when recognition was correct, the correct template farthest in distance from the acoustically analyzed input pattern is replaced with that input pattern. FIG. 7 shows such a conventional speech recognition device. In the figure, 1 is speech input means for inputting speech through a microphone or the like; 2 is acoustic analysis means that performs frequency analysis and the like on the input speech signal and extracts its features as speech; and 3 is speech pattern matching means that matches the input speech pattern analyzed by acoustic analysis means 2 against each template analyzed in advance and registered in the dictionary, computes either a "similarity" indicating how alike the two patterns are or a "distance" indicating how far apart they are (hereinafter both are called scores), and outputs it.

[0004] Further, 4 is a dictionary in which patterns obtained by analyzing the recognition targets in advance are registered; 5 is recognition result determination means that sorts the templates according to the scores output for them by speech pattern matching means 3 and outputs one or more template labels in descending order of similarity or ascending order of distance; 6 is input speech pattern holding means that temporarily holds the input speech pattern analyzed by acoustic analysis means 2; and 7 is dictionary updating means that registers input speech patterns in dictionary 4 and deletes templates from dictionary 4.

[0005] In the figure, speech input from speech input means 1 is analyzed by acoustic analysis means 2 to extract its features, which are supplied to speech pattern matching means 3 and are also supplied to, and held in, input speech pattern holding means 6. Speech pattern matching means 3 matches the templates registered in dictionary 4 against the feature parameters of the input speech and obtains the scores between them. Recognition result determination means 5 determines the input speech based on input from the user and on the scores output by speech pattern matching means 3, and outputs the recognition result.

[0006] The output of recognition result determination means 5 is also supplied to dictionary updating means 7. When the recognition result was correct, dictionary updating means 7 updates the dictionary by replacing the correct template in dictionary 4 that is farthest in distance with the input speech pattern held in input speech pattern holding means 6.
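The conventional update described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of any actual device: the function names, the use of fixed-length feature vectors and the Euclidean distance are all hypothetical choices made to keep the example short.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed score)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def conventional_update(dictionary, input_pattern, recognized_label, correct_label):
    """Replace the farthest template of the correct label with the input.

    dictionary maps label -> list of templates (a multi-template dictionary).
    Returns True when the dictionary was changed.
    """
    if recognized_label != correct_label:
        return False  # misrecognition: the conventional method never updates
    templates = dictionary[correct_label]
    farthest = max(range(len(templates)),
                   key=lambda i: distance(templates[i], input_pattern))
    templates[farthest] = list(input_pattern)  # swapped in with no distortion check
    return True
```

Note that the swap happens unconditionally once recognition is correct, which is the behaviour the following paragraphs identify as a problem.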

[0007]

[Problems to Be Solved by the Invention] In the conventional dictionary update method described above, when the recognition result is correct, the input speech pattern held in input speech pattern holding means 6 is swapped with the farthest correct template of the correctly recognized category without any check being performed.

[0008] Consequently, the dictionary is updated whenever recognition is correct, even when something has gone wrong, for example when the input speech pattern happens to be distorted. Conversely, a misrecognition may be the natural result of a change in voice quality, yet the conventional method does not update the dictionary in that case, so recognition remains bound to the initial templates.

[0009] FIGS. 8 and 9 are conceptual diagrams of the feature parameter space when voice quality changes, and show the case in which three categories A, B and C exist. FIG. 8 shows the distribution of the categories when the dictionary was created, and FIGS. 9(a) and 9(b) show the distribution after it has shifted because of variation in voice quality. The circled regions are the ranges over which each category is distributed. In FIG. 8 the samples inside them are drawn as black triangles, black circles and black squares; in FIGS. 9(a) and 9(b) a black square and a black circle mark input samples at recognition time. The boundaries between the three categories are drawn as solid lines, and the dotted lines show how the boundaries have moved because of the change in voice quality.

[0010] As shown in FIG. 8, for a while after the dictionary is created the distribution of each category at recognition time matches that of the dictionary, so a high recognition rate is obtained. Now suppose that, as time passes, changes in voice quality and the like cause the actual distribution of each category to change as shown in FIGS. 9(a) and 9(b). For convenience of explanation the figure assumes that the category distributions have all moved in the same direction, but in practice they change in more complex ways.

[0011] As the distribution of each category moves, the actual category boundaries shift from the solid lines to the dotted lines, as shown in FIGS. 9(a) and 9(b). Consider the sample shown as a black square in FIG. 9(a), and suppose that this input sample actually belongs to category C'. It lies far from the distribution of category C' after the passage of time and, by the dotted boundaries, should properly be judged as category B. Under the distribution at dictionary-creation time shown in FIG. 8, however, it is judged as category C, and in the conventional method described above the dictionary is updated with this sample.

[0012] Next, consider the sample shown as a black circle in FIG. 9(b), and suppose that this input sample actually belongs to category B'. By the dotted boundaries it is properly judged as category B, so the dictionary should be updated; under the distribution at dictionary-creation time, however, it is judged as category A, and the dictionary is not updated. Thus the conventional method sometimes updates the dictionary when it should not, and sometimes fails to update the dictionary when it should.

[0013] The present invention was made to remedy the problems of the prior art described above. Its object is to provide a dictionary update method in a speech recognition device that, by replacing the feature patterns in the dictionary with new ones each time recognition is performed, maintains a high recognition rate even as the speaker's manner of utterance changes with the passage of time, and that also prevents an accidentally distorted speech pattern from being registered in the dictionary.

[0014]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention. In the figure, 1 is speech input means for inputting speech through a microphone or the like; 2 is acoustic analysis means that performs frequency analysis and the like on the input speech signal and extracts its features as speech; 3 is speech pattern matching means that matches the input speech pattern analyzed by acoustic analysis means 2 against each template analyzed in advance and registered in the dictionary, also matches it against the immediately preceding speech pattern held in input speech pattern holding means 6, and computes a similarity or a distance; 4 is a dictionary in which patterns obtained by analyzing the recognition targets in advance are registered; 5 is recognition result determination means that outputs a recognition result based on the output of speech pattern matching means 3; 6 is input speech pattern holding means that temporarily holds the input speech pattern analyzed by acoustic analysis means 2; 7 is dictionary updating means that registers input speech patterns in dictionary 4 and deletes templates from dictionary 4; 8 is matching result determination means that, for the template that is the recognition target of the input speech, compares the score on the dictionary 4 side with the score on the input speech pattern holding means 6 side, both output from speech pattern matching means 3, and determines which score is higher; and 9 is user input means for assigning a correct label to the input speech pattern.

[0015] To solve the above problems, the invention of claim 1 is, as shown in FIG. 1, a speech recognition device comprising acoustic analysis means 2 that acoustically analyzes unknown input speech input from speech input means 1; speech pattern matching means 3 that matches the input speech pattern obtained by acoustic analysis means 2 against the standard speech patterns registered in advance in dictionary 4 for each label; recognition result determination means 5 that obtains a recognition result based on the matching result; dictionary updating means 7 that updates dictionary 4 with input speech patterns; and user input means 9 that assigns a correct label to an input speech pattern. In this device, input speech pattern holding means 6 is provided to temporarily hold input speech patterns. For an input speech pattern whose recognition result was correct, or to which a correct label was assigned by user input means 9 at recognition time, speech pattern matching means 3 matches the input speech pattern against each correct standard speech pattern registered in dictionary 4 and, when input speech pattern holding means 6 holds a correct speech pattern bearing the same label as the input speech pattern, also matches the input speech pattern against that held speech pattern. When the similarity between the input speech pattern and the speech pattern held in input speech pattern holding means 6 is greater than the similarity between the input speech pattern and the standard speech pattern registered in dictionary 4, the speech pattern held in input speech pattern holding means 6 is registered in dictionary 4 and a standard speech pattern is deleted from dictionary 4.
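The update rule of claim 1 can be sketched as follows. This is a minimal illustration under assumptions that the text does not fix: patterns are fixed-length vectors, similarity is the negated Euclidean distance, the held pattern is compared against the best-scoring standard pattern for the label, and the lowest-scoring standard pattern is the one deleted. All names (`similarity`, `update_claim1`, `held`) are hypothetical.

```python
import math

def similarity(a, b):
    """Assumed similarity score: negated Euclidean distance (higher is more similar)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def update_claim1(dictionary, held, input_pattern, label):
    """Apply the claim-1 rule to an input whose correct label is known.

    dictionary: label -> list of standard patterns (dictionary 4)
    held:       label -> previously held correct pattern (holding means 6)
    Returns True when the dictionary was updated.
    """
    if label not in held:
        return False  # nothing held for this label yet, so nothing to compare
    templates = dictionary[label]
    dict_scores = [similarity(t, input_pattern) for t in templates]
    held_score = similarity(held[label], input_pattern)
    if held_score > max(dict_scores):
        # The held pattern explains the new input better than the dictionary does:
        # register it and delete the lowest-scoring standard pattern.
        worst = dict_scores.index(min(dict_scores))
        templates[worst] = held[label]
        return True
    return False
```

Under these assumptions a distorted held pattern is rejected automatically, because a normal second utterance will be closer to the existing standard patterns than to the distorted one.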

[0016] In the invention of claim 2, when the similarity between the new input speech pattern and the speech pattern held in input speech pattern holding means 6 is greater than the similarity between the input speech pattern and the standard speech pattern registered in dictionary 4, the newly input speech pattern is registered in dictionary 4, instead of the speech pattern held in input speech pattern holding means 6 as in the invention of claim 1.

[0017] In the invention of claim 3, in the invention of claim 1 or claim 2, when the similarity between the new input speech pattern and the speech pattern held in input speech pattern holding means 6 is smaller than the similarity between the input speech pattern and the standard speech pattern registered in dictionary 4, the speech pattern held in input speech pattern holding means 6 is deleted and the input speech pattern is registered in input speech pattern holding means 6.

[0018] In the invention of claim 4, in the invention of claim 1 or claim 2, when the similarity between the new input speech pattern and the speech pattern held in input speech pattern holding means 6 is smaller than the similarity between the input speech pattern and the standard speech pattern registered in dictionary 4, the speech pattern held in input speech pattern holding means 6 is left as it is and the input speech pattern is deleted.

[0019] In the invention of claim 5, in the invention of claim 1 or claim 2, when the similarity between the new input speech pattern and the speech pattern held in input speech pattern holding means 6 is smaller than the similarity between the input speech pattern and the standard speech pattern registered in dictionary 4, both the speech pattern held in input speech pattern holding means 6 and the input speech pattern are deleted.
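Claims 3 to 5 differ only in what happens to the holding buffer when the held pattern loses the comparison. The three alternatives can be put side by side in a short sketch; the function name `resolve_holding`, the `policy` strings and the dictionary-based buffer are hypothetical and serve only to illustrate the distinction.

```python
def resolve_holding(held, label, input_pattern, policy):
    """Update the holding buffer after a comparison that did NOT favour the held pattern.

    held maps label -> held correct pattern (holding means 6).
    policy "replace" follows claim 3, "keep" claim 4, "clear" claim 5.
    """
    if policy == "replace":
        held[label] = input_pattern   # delete the held pattern, hold the new input
    elif policy == "keep":
        pass                          # leave the held pattern, discard the input
    elif policy == "clear":
        held.pop(label, None)         # discard both the held pattern and the input
    else:
        raise ValueError(f"unknown policy: {policy}")
    return held
```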

[0020]

[Operation] In FIG. 1, speech input from speech input means 1 is analyzed by acoustic analysis means 2 to extract its features, which are supplied to speech pattern matching means 3. Speech pattern matching means 3 matches the input speech pattern analyzed by acoustic analysis means 2 against each standard speech pattern analyzed in advance and registered in dictionary 4. In addition, when the recognition result for the input speech pattern was correct, or when a correct label was assigned to the input speech pattern by user input, it matches the input speech pattern against the immediately preceding correct speech pattern held in input speech pattern holding means 6, and computes the scores.

[0021] Recognition result determination means 5 determines the input speech based on the scores output by speech pattern matching means 3 and outputs the recognition result. When the recognition result is incorrect, a correct label is assigned to the input speech pattern based on user input from user input means 9. For an input speech pattern whose recognition result was correct, or to which a correct label was assigned by user input means 9 at recognition time, matching result determination means 8 compares the result of matching the correct standard speech patterns in dictionary 4 against the input speech pattern with the result of matching the correctly labeled speech pattern held in input speech pattern holding means 6 against the input speech pattern, and determines which score is higher.

[0022] When matching result determination means 8 determines that the matching result between the speech pattern held in input speech pattern holding means 6 and the input speech pattern scores higher than the matching result between dictionary 4 and the input speech pattern, dictionary updating means 7 deletes the lowest-scoring standard speech pattern from dictionary 4 and registers either the speech pattern held in input speech pattern holding means 6 or the input speech pattern in dictionary 4, thereby swapping the standard speech pattern.

[0023] As described above, in the inventions of claims 1 to 5, input speech pattern holding means 6 is provided, which holds input speech and functions as a temporary dictionary separate from the dictionary prepared in advance. When dictionary 4 is to be updated with an input speech pattern, and the same speech pattern is input a second time, the input speech pattern whose recognition result was correct, or to which a correct label was assigned by user input means 9, is matched against each of the correct standard speech patterns registered in dictionary 4 and the correctly labeled speech pattern temporarily held in input speech pattern holding means 6. Dictionary 4 is updated when, of those matching results, the score between the input speech pattern and the speech pattern held in input speech pattern holding means 6 is the higher.

[0024] The immediately preceding correct speech pattern held in input speech pattern holding means 6 can therefore naturally be expected to score higher than the correct standard speech patterns of dictionary 4, which were created in the past, so dictionary 4 can be updated to follow variation in the speaker's manner of utterance. If the immediately preceding speech pattern held in input speech pattern holding means 6 has been distorted by background noise or a faulty utterance, updating dictionary 4 must be avoided. In the inventions of claims 1 to 5, as described above, the correct input speech pattern is matched against each of the correct standard speech patterns registered in dictionary 4 and the correct speech pattern temporarily held in input speech pattern holding means 6, and dictionary 4 is updated only when the score between the input speech pattern and the held speech pattern is the higher. Hence, unless the second utterance is itself faulty, the standard speech patterns registered in dictionary 4 are judged to score higher than the distorted speech pattern held in input speech pattern holding means 6, and updating dictionary 4 with a distorted speech pattern is avoided.

[0025] Further, in the invention of claim 2, the newly input correct speech pattern is registered in dictionary 4 instead of the correct speech pattern held in input speech pattern holding means 6. The same effect as the invention of claim 1 is therefore obtained, and in addition the next uttered speech pattern can be held in input speech pattern holding means 6 as it is, which reduces the number of processing steps.

[0026] Furthermore, in the inventions of claims 3 to 5, when the score between the new correct input speech pattern and the correct speech pattern held in input speech pattern holding means 6 is smaller than the score between the input speech pattern and the standard speech patterns registered in dictionary 4, either one or both of the held speech pattern and the input speech pattern are deleted, so a distorted speech pattern does not remain held in input speech pattern holding means 6.

[0027]

[Embodiments] FIG. 2 shows a first embodiment of the present invention. In the figure, 11 is an AD converter that converts speech input from a microphone or the like into a digital signal; 21 is acoustic analysis means that analyzes the speech signal converted into a digital signal by AD converter 11 and extracts a time series of feature parameter vectors; and 3 is speech pattern matching means that matches the time series of feature parameter vectors analyzed by acoustic analysis means 2 against each template analyzed in advance and registered in the dictionary. When a speech pattern is held in second input speech pattern buffer 62, speech pattern matching means 3 also matches against that speech pattern.

[0028] Further, 4 is a dictionary in which patterns obtained by analyzing the recognition targets in advance are registered; 5 is recognition result determination means that outputs a recognition result based on the scores output for each template by speech pattern matching means 3; 51 is score sorting means that sorts the scores output by speech pattern matching means 3; and 52 is recognition result selection means that selects the correct recognition result according to input from the user and thereby selects the final recognition result.

[0029] 61 is first input speech pattern holding means that temporarily holds the correct input speech pattern analyzed by acoustic analysis means 2, and 62 is second input speech pattern holding means that temporarily holds the correct input speech pattern that was held in the first input speech pattern holding means. The first and second input speech pattern holding means each have at least as many buffers as there are input speech labels. Input speech patterns are held in first and second input speech pattern holding means 61 and 62 together with either the speech label of a correct result supplied by the recognition result determination means or the speech label supplied by user input.

[0030] 7 is dictionary updating means that registers input speech patterns in dictionary 4 and deletes templates from dictionary 4; 71 is speech pattern registration means that registers the speech pattern held in second input speech pattern holding means 62 in dictionary 4; 72 is speech pattern deletion means that deletes the lowest-scoring speech pattern from dictionary 4; and 8 is matching result determination means that, for the correct template of the input speech, compares the score on the dictionary 4 side with the score on the second input speech pattern holding means 62 side, both output from speech pattern matching means 3, and determines which score is higher.

[0031] Next, the operation of the first embodiment shown in FIG. 2 is described. Speech is input from a microphone or the like to AD conversion unit 11, converted into a digital signal, and sent to acoustic analysis means 21 as discretized signal data. Acoustic analysis means 21 extracts a time series of speech feature parameter vectors from the discretized signal data at fixed intervals of, for example, 5 msec to 25 msec.

【００３２】The following methods, among others, are known for extracting the time series of voice feature parameter vectors.
(1) Extracting the spectrum in different frequency bands with a bank of filters.
(2) Performing an FFT (fast Fourier transform) and then obtaining a spectrum power time series divided into a plurality of channels.
(3) Performing linear predictive coding (LPC) analysis and obtaining the time series of its coefficients.
(4) Obtaining a cepstrum coefficient time series by means of an FFT (fast Fourier transform) or linear predictive coding (LPC) analysis.
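
The second extraction method listed above (a transform followed by a split into channels) can be sketched as follows. This is only an illustration under stated assumptions: a naive DFT is used in place of a true FFT to keep the example dependency-free, frames do not overlap, the channels are equal-width, and the names `frame_signal`, `band_powers`, and `extract_features` are hypothetical rather than taken from the patent.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=10.0):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def band_powers(frame, num_bands=4):
    """Power spectrum of one frame (naive DFT), summed into equal-width channels."""
    n = len(frame)
    half = n // 2  # keep only the non-negative frequency bins
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = half // num_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]


def extract_features(samples, sample_rate, frame_ms=10.0, num_bands=4):
    """Return one channel-power vector per frame: a feature vector time series."""
    return [band_powers(f, num_bands)
            for f in frame_signal(samples, sample_rate, frame_ms)]
```

The 10 msec default frame length is an arbitrary choice inside the 5 msec to 25 msec range given in the text.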

【００３３】The feature parameter time series vector extracted by the acoustic analysis means 2 is output to the voice pattern matching means 3 and matched against the templates registered in advance in the dictionary 4. A commonly used technique such as DP matching can serve as the matching method in the voice pattern matching means 3. As the matching result, the voice pattern matching means 3 obtains a score between the two patterns and outputs it to the recognition result judging means 5.
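
The DP matching mentioned above can be sketched as a dynamic-programming alignment between two feature vector sequences. The text does not specify local distances, path constraints, or how the alignment cost becomes a score, so the Euclidean local distance, the three-way step pattern, and the mapping `1 / (1 + cost)` below are all assumptions, as are the names `dp_distance` and `similarity`.

```python
import math


def dp_distance(a, b):
    """Accumulated alignment cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance (assumed)
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def similarity(a, b):
    """Map the alignment cost to a score in which higher means more similar."""
    return 1.0 / (1.0 + dp_distance(a, b))
```

Because the alignment may stretch either sequence, two utterances of the same word spoken at different speeds can still reach a cost near zero.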

【００３４】The score sorting means 51 of the recognition result judging means 5 sorts the scores calculated by the voice pattern matching means 3 in descending order. The recognition result selecting means 52 selects the correct result according to the user's input and outputs the final result. Meanwhile, among the voice patterns analyzed by the acoustic analysis means 2, a voice pattern whose recognition result was correct, or a voice pattern to which a correct-answer label was given by user input, is output to the voice pattern matching means 3 and also to the first input voice pattern buffer 61. When a new voice is subsequently input, the correct voice pattern analyzed by the acoustic analysis means 2 is held in the first input voice pattern buffer 61, and the correct voice pattern that had been held in the first input voice pattern buffer 61 is output to the second input voice pattern buffer 62 and held there. In other words, in this embodiment the first input voice pattern buffer 61 functions as a buffer for the input voice pattern, and the second input voice pattern buffer 62 functions as a primary dictionary.
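
The hand-off between the two buffers described above can be sketched as a pair of per-label slots. The class name `PatternBuffers` and its method are hypothetical; keeping one slot per label follows the earlier statement that each holding means has at least as many buffers as there are labels.

```python
class PatternBuffers:
    """Two-stage holding of correct-answer voice patterns, one slot per label."""

    def __init__(self):
        self.first = {}   # plays the role of buffer 61: newest correct pattern
        self.second = {}  # plays the role of buffer 62: previous correct pattern

    def push(self, label, pattern):
        """Hold a newly labelled pattern; the previous one moves from 61 to 62."""
        if label in self.first:
            self.second[label] = self.first[label]
        self.first[label] = pattern
```

After two correct utterances of the same label, `second[label]` holds the earlier utterance and `first[label]` the later one, which is the state the matching step relies on.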

【００３５】When a voice pattern is held in the first or second input voice pattern buffer 61, 62, it is given a correct-answer label as described above: either the label assigned when the recognition result judging means 5 judged the result correct, or the label entered by the user when the recognition result was incorrect. The voice pattern matching means 3 then matches the input voice pattern bearing the correct-answer label against the correct-answer template registered in advance in the dictionary 4. It also matches the same input voice pattern against the immediately preceding labelled voice pattern held in the input voice pattern buffer 62, which functions as a dictionary, and obtains a score for each.

【００３６】For an input voice pattern whose recognition result was correct, or to which the user input means 9 gave a correct-answer label at recognition time, the matching result determination means 8 compares two results: the result of matching the input voice pattern against the correct-answer template of the dictionary 4, and the result of matching the input voice pattern against the labelled voice pattern held in the input voice pattern holding means 6. It then determines which score is higher.

【００３７】When the matching result determination means 8 finds that the score between the correct voice pattern held in the input voice pattern buffer 62 and the correct input voice pattern is higher than the score between the dictionary 4 and the correct input voice pattern, the dictionary updating means 7 replaces a template: it deletes the lowest-scoring template from the dictionary 4 and registers the correct voice pattern held in the input voice pattern buffer 62 in the dictionary 4. That is, the voice pattern deleting means 72 of the dictionary updating means 7 deletes the template from the dictionary 4, and the voice pattern registration means 71 registers the correct voice pattern held in the input voice pattern buffer 62 in the dictionary.
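
The replacement rule of the first embodiment can be sketched as below. Several points are assumptions rather than statements from the text: `score` is a simple stand-in for DP matching (negated squared Euclidean distance on fixed-length vectors), the dictionary side is represented by the best score among the templates of the correct label, and the function name `update_dictionary` is hypothetical.

```python
def score(a, b):
    """Stand-in similarity: higher means closer (negated squared distance)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def update_dictionary(templates, input_pattern, held_pattern):
    """Apply the first-embodiment rule to the templates of the correct label.

    templates     -- list of dictionary templates for the label (mutated in place)
    input_pattern -- the newly input, correctly labelled pattern
    held_pattern  -- the previous pattern held in buffer 62, or None
    Returns True if a template was replaced.
    """
    if held_pattern is None or not templates:
        return False
    dict_scores = [score(input_pattern, t) for t in templates]
    held_score = score(input_pattern, held_pattern)
    if held_score > max(dict_scores):
        # Role of deleting means 72: drop the lowest-scoring template.
        del templates[dict_scores.index(min(dict_scores))]
        # Role of registration means 71: register the held pattern.
        templates.append(held_pattern)
        return True
    return False
```

With templates near the origin, an input of `[4.0, 4.0]` and a held pattern of `[3.9, 4.1]` trigger a replacement, whereas a held pattern far from a normal input leaves the dictionary untouched.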

【００３８】As described above, when the dictionary is updated with input voice in this embodiment, the result of matching the labelled input voice pattern against the correct-answer template registered in the dictionary is compared with the result of matching it against the immediately preceding correct voice pattern held in the second input voice pattern holding means. Depending on the comparison, the dictionary is updated with the immediately preceding voice pattern held in the second input voice pattern holding means. The dictionary can therefore be updated in a way that follows changes in the speaker's utterance state.

【００３９】It is also possible to prevent an accidentally distorted voice pattern from being registered in the dictionary. That is, as long as the second utterance is free of problems, the template registered in the dictionary 4 scores higher than the distorted voice pattern held in the input voice pattern buffer 62, so the dictionary 4 is not updated with the distorted voice pattern.

【００４０】This point will be explained with reference to FIGS. 8 and 9 described above. In FIG. 9(a), suppose that the input sample shown as a black square was the first utterance and was temporarily held in the input voice pattern holding means 62. If the second utterance is free of problems and falls within category C', then, of the result of matching the second input voice pattern against the dictionary 4 and the result of matching it against the sample held in the input voice pattern holding means 62, the dictionary side score is naturally higher. Updating the dictionary 4 with such a distorted voice pattern is thus avoided.

【００４１】In FIG. 9(b), suppose that the input sample shown as a black circle was the first utterance and was temporarily held in the input voice pattern holding means 62. If the second utterance is free of problems and falls within category B', no update would take place if the decision were based on the category boundary from when the dictionary was created. However, by calculating the score between the second input voice pattern and the dictionary 4 and the score between the second input voice pattern and the sample held in the input voice pattern holding means 6, there are cases in which the dictionary 4 can be updated.

【００４２】That is, the dictionary 4 is updated when the score between the first input voice pattern and the farthest sample of category B is smaller than the score between the first input voice pattern and the second input voice pattern.

FIG. 3 is a diagram showing a second embodiment of the present invention. The configuration of this embodiment is basically the same as that of the first embodiment shown in FIG. 2. The two embodiments differ in how the first and second input voice pattern buffers 61 and 62 are connected to the dictionary updating means 7.

【００４３】That is, in the first embodiment the labelled voice pattern held in the second input voice pattern buffer 62 is supplied to the dictionary updating means 7, whereas in the second embodiment the labelled voice pattern held in the first input voice pattern buffer 61 is supplied to the dictionary updating means 7.

【００４４】Therefore, in the embodiment of FIG. 3, when the score between the correct voice pattern held in the input voice pattern buffer 62 and the correct input voice pattern is higher than the score between the dictionary 4 and the correct input voice pattern, the lowest-scoring template is deleted from the dictionary 4 and the dictionary 4 is updated with the voice pattern held in the first input voice pattern buffer 61.

【００４５】This embodiment provides the same effect as the first embodiment while requiring fewer processing steps. Because the dictionary 4 is updated with the voice pattern held in the first input voice pattern buffer 61, updating the dictionary 4 empties the first input voice pattern buffer 61, and the next correct voice pattern uttered can be held in the first input voice pattern buffer 61 as it is.

【００４６】FIG. 4 is a flowchart showing a third embodiment of the present invention. The configuration of this embodiment is the same as that of the first embodiment shown in FIG. 2. In this embodiment, when the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is larger than the similarity between the correct input voice pattern and the dictionary 4, the correct feature pattern in the second input voice pattern buffer 62 is registered in the dictionary 4. When the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is smaller, the correct feature pattern in the second input voice pattern buffer 62 is deleted and the correct feature pattern in the first input voice pattern buffer 61 is registered in the second input voice pattern buffer 62.
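
The two branches of the third embodiment can be sketched as follows. The function name, the dictionary-of-buffers representation, and the return strings are hypothetical. The sketch covers only what the paragraph above states: it does not delete a dictionary template on registration, it leaves buffer 61 as it is after copying, and it treats equal similarities as "not larger", which is an assumption because the text does not cover that case.

```python
def third_embodiment_step(dictionary, buffers, sim_dict, sim_held):
    """One pass through the flow described for FIG. 4.

    dictionary -- list of templates for the correct label (mutated in place)
    buffers    -- {"first": pattern in buffer 61, "second": pattern in buffer 62}
    sim_dict   -- similarity between the input pattern and the dictionary 4
    sim_held   -- similarity between the input pattern and buffer 62
    """
    if sim_held > sim_dict:
        # Buffer 62 side is closer: register its pattern in the dictionary.
        dictionary.append(buffers["second"])
        return "registered"
    # Dictionary side is closer: discard the pattern in buffer 62 and
    # register the pattern from buffer 61 there instead.
    buffers["second"] = buffers["first"]
    return "held_replaced"
```

The second branch is what clears a distorted pattern out of buffer 62 once a normal utterance follows it.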

【００４７】In this embodiment, the feature pattern held in the second input voice pattern buffer 62 is deleted when the similarity on the dictionary 4 side is larger than the similarity on the second input voice pattern buffer 62 side. Consequently, if the input voice was distorted (deformed) and the feature pattern of that distorted input voice was held in the second input voice pattern buffer 62, that feature pattern can be deleted.

【００４８】Although similarity is used as the matching score in the above embodiment, distance may be used as the score instead. In that case, the inequality signs shown in FIG. 4 are reversed.

FIG. 5 is a flowchart showing a fourth embodiment of the present invention. The configuration of this embodiment is the same as that of the first embodiment shown in FIG. 2.

【００４９】In this embodiment, when the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is larger than the similarity between the correct input voice pattern and the dictionary 4, the correct feature pattern in the second input voice pattern buffer 62 is registered in the dictionary 4. When the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is smaller, the correct feature pattern in the first input voice pattern buffer 61 is deleted.

【００５０】In this embodiment, the feature pattern held in the first input voice pattern buffer 61 is deleted when the similarity on the dictionary 4 side is larger than the similarity on the second input voice pattern buffer 62 side. Consequently, if the input voice was distorted (deformed) and the feature pattern of that distorted input voice was held in the first input voice pattern buffer 61, that feature pattern can be deleted.

【００５１】Although similarity is used as the matching score in the above embodiment, distance may be used as the score, as in the third embodiment. In that case, the inequality signs shown in FIG. 5 are reversed.

FIG. 6 is a flowchart showing a fifth embodiment of the present invention. The configuration of this embodiment is the same as that of the first embodiment shown in FIG. 2.

【００５２】In this embodiment, when the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is larger than the similarity between the correct input voice pattern and the dictionary 4, the correct feature pattern in the second input voice pattern buffer 62 is registered in the dictionary 4. When the similarity between the correct input voice pattern and the second input voice pattern buffer 62 is smaller, the correct feature patterns in both the first and second input voice pattern buffers 61 and 62 are deleted.

【００５３】In this embodiment, the correct feature patterns held in the first and second input voice pattern buffers 61 and 62 are deleted when the similarity on the dictionary 4 side is larger than the similarity on the second input voice pattern buffer 62 side. Consequently, if the input voice was distorted (deformed) and the feature pattern of that distorted input voice was held in either of the first and second input voice pattern buffers 61 and 62, that feature pattern can be deleted.
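
The fourth and fifth embodiments differ only in what is discarded when the dictionary 4 side is the more similar one. A minimal sketch of that branch, using the hypothetical function name `reject_branch` and `None` to stand for an emptied buffer:

```python
def reject_branch(embodiment, first, second):
    """Buffer contents after the branch where the dictionary side scores higher.

    first, second -- patterns currently in buffers 61 and 62
    Returns the new (first, second) pair; None marks a deleted pattern.
    """
    if embodiment == 4:
        # Fourth embodiment: delete only the pattern in buffer 61.
        return None, second
    if embodiment == 5:
        # Fifth embodiment: delete the patterns in both buffers.
        return None, None
    raise ValueError("only embodiments 4 and 5 are sketched here")
```

The fourth variant trusts the older held pattern and drops the newer one, while the fifth makes no assumption about which of the two was distorted.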

【００５４】Although similarity is used as the matching score in the above embodiment, distance may be used as the score, as in the third embodiment. In that case, the inequality signs shown in FIG. 6 are reversed.
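
The remark about reversing the inequality signs when distance replaces similarity can be made concrete with a small helper. The function name and the `kind` parameter are hypothetical.

```python
def held_side_wins(score_held, score_dict, kind="similarity"):
    """True when the held-pattern side is the closer match to the input.

    kind -- "similarity" (larger is closer) or "distance" (smaller is closer)
    """
    if kind == "similarity":
        return score_held > score_dict
    if kind == "distance":
        return score_held < score_dict  # same decision, inequality reversed
    raise ValueError("kind must be 'similarity' or 'distance'")
```

Routing every comparison through one helper of this kind keeps the update logic identical for both score types.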

【００５５】[0055]

[Effects of the Invention] As described above, when the dictionary is updated with an input voice pattern in the present invention and the same voice pattern is input a second time, the input voice pattern whose recognition result was correct, or to which the user input means gave a correct-answer label, is matched against both the correct standard voice pattern registered in the dictionary 4 and the labelled voice pattern temporarily held in the input voice pattern holding means. The dictionary 4 is updated when, of these matching results, the score between the input voice pattern and the voice pattern held in the input voice pattern holding means is the higher one. Therefore, even if the speaker's utterance state changes over time, the standard voice patterns in the dictionary can be replaced with new ones each time recognition is performed. The dictionary can also be updated without being constrained by the recognition criteria that applied before the voice quality or utterance conditions changed, so that a high recognition rate matched to the speaker's utterance state can be obtained.

【００５６】In addition, an accidentally distorted voice pattern can be kept from being registered in the dictionary.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram showing a first embodiment of the present invention.

FIG. 3 is a diagram showing a second embodiment of the present invention.

FIG. 4 is a diagram showing a third embodiment of the present invention.

FIG. 5 is a diagram showing a fourth embodiment of the present invention.

FIG. 6 is a diagram showing a fifth embodiment of the present invention.

FIG. 7 is a diagram showing a conventional example.

FIG. 8 is a diagram showing the distribution of categories when the dictionary is created.

FIG. 9 is a diagram showing the category distribution when the distribution has shifted because of a change in voice quality.

[Explanation of symbols]

1: voice input means
11: AD converter
2, 21: acoustic analysis means
3: voice pattern matching means
4: dictionary
5: recognition result judging means
51: score sorting means
52: recognition result selecting means
6, 61, 62: input voice pattern holding means
7: dictionary updating means
71: voice pattern registration means
72: voice pattern deleting means
8: matching result determination means
9: user input means

Claims

[Claims]

1. A dictionary updating system in a voice recognition device comprising: an acoustic analysis means (2) for acoustically analyzing unknown input voice input through a voice input means (1); a voice pattern matching means (3) for matching the input voice pattern obtained by the acoustic analysis means (2) against standard voice patterns corresponding to respective labels registered in advance in a dictionary (4); a recognition result judging means (5) for obtaining a recognition result based on the matching result; a user input means (9) for giving a correct-answer label to the input voice pattern; and a dictionary updating means (7) for updating the dictionary (4) with the input voice pattern, characterized in that an input voice pattern holding means (6) for temporarily holding the input voice pattern is provided; that, for an input voice pattern whose recognition result was correct or to which the user input means (9) gave a correct-answer label at recognition time, the voice pattern matching means (3) matches the input voice pattern against each correct standard voice pattern registered in the dictionary (4) and, when a correct voice pattern bearing the same label as the input voice pattern is held in the input voice pattern holding means (6), matches the input voice pattern against the voice pattern held in the input voice pattern holding means (6); and that, when the similarity between the input voice pattern and the voice pattern held in the input voice pattern holding means (6) is larger than the similarity between the input voice pattern and the standard voice pattern registered in the dictionary (4), the voice pattern held in the input voice pattern holding means (6) is registered in the dictionary (4) and a standard voice pattern of the dictionary (4) is deleted.

2. A dictionary updating system in a voice recognition device comprising: an acoustic analysis means (2) for acoustically analyzing unknown input voice input through a voice input means (1); a voice pattern matching means (3) for matching the input voice pattern obtained by the acoustic analysis means (2) against standard voice patterns corresponding to respective labels registered in advance in a dictionary (4); a recognition result judging means (5) for obtaining a recognition result based on the matching result; a user input means (9) for giving a correct-answer label to the input voice pattern; and a dictionary updating means (7) for updating the dictionary (4) with the input voice pattern, characterized in that an input voice pattern holding means (6) for temporarily holding the input voice pattern is provided; that, for an input voice pattern whose recognition result was correct or to which the user input means gave a correct-answer label at recognition time, the voice pattern matching means (3) matches the input voice pattern against each correct standard voice pattern registered in the dictionary (4) and, when a correct voice pattern bearing the same label as the input voice pattern is held in the input voice pattern holding means (6), matches the input voice pattern against the voice pattern held in the input voice pattern holding means (6); and that, when the similarity between the input voice pattern and the voice pattern held in the input voice pattern holding means (6) is larger than the similarity between the input voice pattern and the standard voice pattern registered in the dictionary (4), the newly input voice pattern is registered in the dictionary (4) and a standard voice pattern of the dictionary (4) is deleted.

3. The dictionary updating system in a voice recognition device according to claim 1 or 2, characterized in that, when the similarity between the new input voice pattern and the voice pattern held in the input voice pattern holding means (6) is smaller than the similarity between the input voice pattern and the standard voice pattern registered in the dictionary (4), the voice pattern held in the input voice pattern holding means (6) is deleted and the input voice pattern is held in the input voice pattern holding means (6).

4. The dictionary updating system in a voice recognition device according to claim 1 or 2, characterized in that, when the similarity between the new input voice pattern and the voice pattern held in the input voice pattern holding means (6) is smaller than the similarity between the input voice pattern and the standard voice pattern registered in the dictionary (4), the voice pattern held in the input voice pattern holding means (6) is left as it is and the input voice pattern is deleted.

5. The dictionary updating system in a voice recognition device according to claim 1 or 2, characterized in that, when the similarity between the new input voice pattern and the voice pattern held in the input voice pattern holding means (6) is smaller than the similarity between the input voice pattern and the standard voice pattern registered in the dictionary (4), both the voice pattern held in the input voice pattern holding means (6) and the input voice pattern are deleted.