JPS63229496A

JPS63229496A - Pattern updating system for voice recognition

Info

Publication number: JPS63229496A
Application number: JP62063405A
Authority: JP
Inventors: 沢井　秀文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-03-18
Filing date: 1987-03-18
Publication date: 1988-09-26

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a pattern updating method for speech recognition and, more specifically, to a method of updating the standard patterns in a speech recognition device that recognizes word speech of unspecified speakers.

Conventionally, one speaker-adaptation technique for improving the recognition rate of word speech recognition devices for unspecified speakers is to update the pre-registered standard patterns using the speaker's own utterance patterns. However, if an updated pattern is defective, the recognition rate may actually fall. In addition, updating a pattern every time a misrecognition occurs is cumbersome. Both points are serious problems from the viewpoint of providing users with an easy-to-use recognition device.

Object

The present invention was made in view of the circumstances described above. In a speech recognition device, and in particular a device that recognizes words uttered by unspecified speakers, its object is to adapt the performance of the device to the user and improve the recognition rate. It does so by applying operations such as addition and updating to the standard patterns of the recognition target words that are registered and stored in advance, using utterance patterns of the speaker who uses the device, without modifying the pre-stored standard patterns for unspecified speakers.

Structure

To achieve the above object, the present invention, in a speech recognition device that recognizes word speech of unspecified speakers, is characterized by either of the following. First, when a word uttered by a speaker is misrecognized a certain number of times or more against the pre-registered word standard patterns for unspecified speakers, the registered standard pattern corresponding to the misrecognized word is excluded from the recognition targets without being modified, and the speaker's utterance pattern is newly registered and added. Second, similar words in the set of recognition target words are listed in advance and, before entering recognition mode, the speaker utters all of the similar words, which are additionally registered alongside the standard patterns for unspecified speakers.

The present invention is described below on the basis of embodiments.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a speech input unit, 2 is a feature extraction unit, 3 is a registration/recognition mode switching unit, 4 is a standard pattern storage unit, 5 is a word speech recognition unit, and 6 is a recognition result output unit. Word speech input from the speech input unit 1 is converted into a time series of feature parameters in the feature extraction unit 2, through processing such as analysis by a bank of band-pass filters or LPC analysis. When the switch SW of the mode switching unit 3 is in registration mode 31, a word standard pattern is registered and stored in the standard pattern storage unit 4. When the switch SW is in recognition mode 32, the standard patterns are read from the standard pattern storage unit 4, the word recognition unit 5 performs pattern matching against the unknown input speech, and the single most similar pattern is selected and output by the recognition result output unit 6.
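The two modes of FIG. 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation described in the publication: the class name `WordRecognizer`, the function `dtw_distance`, and the choice of dynamic time warping (DTW) as the pattern-matching measure are all hypothetical, since the text does not say how similarity between feature time series is computed.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    Each sequence is a list of equal-length feature vectors (lists of floats).
    DTW is an assumed stand-in for the unspecified pattern-matching measure.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class WordRecognizer:
    """Hypothetical model of units 3 to 6 in FIG. 1."""

    def __init__(self):
        # Stands in for standard pattern storage unit 4: word -> list of patterns.
        self.standard_patterns = {}

    def register(self, word, features):
        """Registration mode 31: store a feature time series as a standard pattern."""
        self.standard_patterns.setdefault(word, []).append(features)

    def recognize(self, features):
        """Recognition mode 32: return the word of the most similar stored pattern."""
        best_word, best_dist = None, inf
        for word, patterns in self.standard_patterns.items():
            for pattern in patterns:
                d = dtw_distance(features, pattern)
                if d < best_dist:
                    best_word, best_dist = word, d
        return best_word
```

With toy one-dimensional features, registering `[[0.0], [1.0], [2.0]]` under one word and `[[5.0], [6.0], [7.0]]` under another, an input close to the first sequence is assigned to the first word. Real feature vectors would come from the filter-bank or LPC analysis in the feature extraction unit.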

FIG. 2 is a block diagram for explaining the standard pattern updating method of the present invention. In the figure, 7 is a misrecognition counting unit, 8 is an update mode switching determination unit, 41 is an update pattern storage unit, and C is an update mode switching control signal. Parts that function in the same way as in the embodiment shown in FIG. 1 are given the same reference numbers as in FIG. 1.

In FIG. 2, when the switch SW of the mode switching unit 3 is in recognition mode 32, unknown input speech from the speech input unit 1 passes through the feature extraction unit 2 and is recognized by the word recognition unit 5. When a misrecognition occurs, there are two possible causes: the utterance pattern is defective, or the standard pattern for unspecified speakers in storage unit 4 is defective. If the same word is still misrecognized after being uttered several times, however, it is more reasonable to conclude that the standard pattern itself differs greatly from the speaker's utterance pattern than that the utterances are defective. For this reason, when the same word has been misrecognized a certain number of times or more, the control signal C is sent to the mode switching unit 3, the switch SW is switched to update mode 31, and the pattern is updated. In doing so, the existing standard pattern for unspecified speakers is not erased; instead, the speaker's pattern is newly registered in addition. When the switch SW is returned to recognition mode 32, the update pattern 41 is accessed preferentially and the standard pattern for unspecified speakers that corresponds to it is not accessed. This makes it possible to update the standard patterns so that they are adapted to the speaker, without modifying the patterns for unspecified speakers.
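The bookkeeping described for FIG. 2 might look like the sketch below. It is an illustration only: the class `AdaptiveStore`, its method names, and the default threshold of 3 are hypothetical, and it assumes the device is somehow told which word the speaker intended, a detail the text does not specify.

```python
from collections import defaultdict


class AdaptiveStore:
    """Hypothetical model of units 4, 41, 7, and 8 in FIG. 2."""

    def __init__(self, speaker_independent, threshold=3):
        self.speaker_independent = dict(speaker_independent)  # unit 4, never modified
        self.update_patterns = {}                  # update pattern storage unit 41
        self.error_counts = defaultdict(int)       # misrecognition counting unit 7
        self.threshold = threshold                 # assumed value of the "certain number"

    def report_result(self, intended_word, recognized_word):
        """Count a misrecognition and return True when update mode should be
        entered (the role of determination unit 8 and control signal C)."""
        if recognized_word != intended_word:
            self.error_counts[intended_word] += 1
        return self.error_counts[intended_word] >= self.threshold

    def add_update_pattern(self, word, pattern):
        """Update mode: register the speaker's pattern without touching unit 4."""
        self.update_patterns[word] = pattern
        self.error_counts[word] = 0

    def active_patterns(self):
        """Patterns consulted in recognition mode: an update pattern hides the
        corresponding speaker-independent pattern, which remains stored."""
        active = dict(self.speaker_independent)
        active.update(self.update_patterns)
        return active
```

Because the speaker-independent dictionary is only ever read, clearing `update_patterns` restores the original behaviour for the next speaker, which matches the point that the patterns for unspecified speakers are never altered.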

FIG. 3 is a diagram for explaining a method in which, prior to recognition, similar words in the set of recognition target words are listed in advance and the speaker utters the words in that similar-word list so that they are additionally registered alongside the standard patterns for unspecified speakers. In the figure, 9 is the similar-word list derived from the list of recognition target words in the standard pattern storage unit 4, and it can be shown to the speaker on a display 10. Before entering recognition mode, the speaker utters the similar words shown on the display 10, and they are registered in the additional standard pattern registration unit 41. The switch SW is then switched to recognition mode 32 and recognition is performed. As in the embodiment shown in FIG. 2, the additional patterns in the additional registration unit 41 are accessed preferentially.
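The pre-recognition enrollment of FIG. 3 could be sketched as below. The function names and the `record_utterance` callback are hypothetical; the callback stands in for prompting the speaker on the display and extracting a feature pattern from the utterance.

```python
def words_to_enroll(similar_pairs):
    """Flatten similar-word pairs (list 9) into the unique words shown to the speaker,
    keeping the order in which they first appear."""
    seen = []
    for first, second in similar_pairs:
        for word in (first, second):
            if word not in seen:
                seen.append(word)
    return seen


def enroll(similar_pairs, record_utterance):
    """Before recognition mode: collect one utterance pattern per similar word.

    `record_utterance` is a hypothetical callback that prompts the speaker
    (display 10) and returns the extracted feature pattern for the given word.
    """
    additional_patterns = {}  # stands in for additional registration unit 41
    for word in words_to_enroll(similar_pairs):
        additional_patterns[word] = record_utterance(word)
    return additional_patterns
```

Only the words that appear in some similar pair are enrolled, so the speaker is not asked to utter the whole vocabulary.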

FIG. 4 is a diagram for explaining an example of how the similar-word pairs of the similar-word list 9 in FIG. 3 are created. In the figure, the inter-word distances of "data" (DEETA) and "delta" (DERUTA), "correction" (TEESEE) and "formation" (KEESEE), and "koku" (KOKU) and "six" (ROKU) are defined, based on the number of differing phonemes, as 2.0, 1.0, and 1.0, respectively. In this way, one possible method is to list the word pairs among the recognition target words whose inter-word distance is at or below a certain value (for example, 2.0).
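One way to make the inter-word distance concrete is the Levenshtein edit distance over the romanized phoneme strings. This is an assumption: the text says only that the distance is based on the number of differing phonemes. The function names are hypothetical, and each Latin letter is treated as one phoneme symbol.

```python
from itertools import combinations


def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme strings (one letter = one phoneme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(words, max_distance=2):
    """List every word pair whose distance is at or below max_distance."""
    result = []
    for w1, w2 in combinations(words, 2):
        d = phoneme_distance(w1, w2)
        if d <= max_distance:
            result.append((w1, w2, d))
    return result


words = ["DEETA", "DERUTA", "TEESEE", "KEESEE", "KOKU", "ROKU"]
print(similar_pairs(words))
```

Under this assumption the three pairs named in the text come out with distances 2, 1, and 1, and no other pair in this small vocabulary falls within the example threshold of 2.0.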

FIG. 5 is a diagram for explaining another example of how the similar-word list 9 in FIG. 3 is created; it shows an example of a confusion matrix between words. In the figure, the recognition target words are numbered 1, 2, ..., N (N is the number of words), and the confusion value between the n-th and m-th words is written $p(n, m)$. If there is no confusion at all between words, the matrix is the identity matrix (all diagonal elements 1 and all other elements 0), but in general a pair of similar words takes a value $0 < p(n, m) < 1$. Another possible method is therefore to select every word pair whose value $p(n, m)$ is at or above a threshold $Th$, that is, every $(n, m) \in \{(n, m) \mid p(n, m) \ge Th\}$.
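The threshold selection on the confusion matrix can be illustrated as follows. The function name and the example values are hypothetical, and skipping the diagonal (a word paired with itself) is an assumption made here so that only pairs of distinct words are returned.

```python
def confusable_pairs(matrix, threshold):
    """Return 1-based index pairs (n, m), n != m, with p(n, m) >= threshold."""
    pairs = []
    for n, row in enumerate(matrix, start=1):
        for m, p in enumerate(row, start=1):
            if n != m and p >= threshold:
                pairs.append((n, m))
    return pairs


# Hypothetical 3-word confusion matrix; the values are illustrative only.
example = [
    [0.8, 0.2, 0.0],
    [0.3, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
```

A lower threshold puts more word pairs on the list the speaker is asked to utter, so the threshold trades enrollment effort against coverage of confusable words.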

In this way, the speaker's speech can be additionally registered by a simple procedure without modifying the standard patterns, so standard patterns adapted to the speaker can be created and the recognition rate can be greatly improved. Moreover, the drawback of conventional standard pattern updating, in which the patterns end up adapted to one particular speaker only, does not arise: by following the same procedure for other speakers, the patterns can be adapted to them as well and their recognition rate improved.

Effect

As is clear from the above description, according to the present invention, when a word speech recognition device for unspecified speakers is used, patterns are added and updated using the speaker's utterance patterns without erasing the standard patterns, and similar words are listed in advance from among the recognition target words so that the speaker's speech for them can be additionally registered. The device can thereby be adapted to the speaker's utterance patterns without modifying the original standard patterns for unspecified speakers. In other words, because the pre-registered standard patterns for unspecified speakers need not be modified even for use by a particular speaker, the harmful effect of the standard patterns becoming adapted to one user and unsuitable for others does not arise, and standard patterns applicable to every user can be provided, improving the recognition rate.

[Brief explanation of drawings]

FIGS. 1 to 3 are block diagrams for explaining embodiments of the pattern updating method according to the present invention, and FIGS. 4 and 5 are diagrams for explaining methods of creating the list of similar words.

1... speech input unit, 2... feature extraction unit, 3... registration/recognition mode switching unit, 4... standard pattern storage unit, 5... word speech recognition unit, 6... recognition result output unit, 7... misrecognition counting unit, 8... update mode switching determination unit, 9... similar-word list, 10... display.

Claims

[Claims]

(1) In a speech recognition device that recognizes word speech of unspecified speakers, a pattern updating method for speech recognition, characterized in that, when a word uttered by a speaker is misrecognized a certain number of times or more against the pre-registered word standard patterns for unspecified speakers, the registered standard pattern corresponding to the misrecognized word is excluded from the recognition targets without being altered, and the speaker's utterance pattern is newly registered and added.

(2) A pattern updating method for speech recognition, characterized in that similar words in the set of recognition target words are listed in advance and, before entering recognition mode, the speaker utters all of the similar words, which are additionally registered alongside the standard patterns for unspecified speakers.

(3) The pattern updating method for speech recognition according to claim (2), characterized in that, as the method for listing similar words in the set of recognition target words, word pairs are selected for which the distance between the kana sequences or between the phoneme sequences of the words is equal to or less than a certain threshold.

(4) The pattern updating method for speech recognition according to claim (2), characterized in that, as the method for listing similar words in the set of recognition target words, word pairs are selected for which the value of the corresponding element of a confusion matrix between words is equal to or greater than a certain threshold.