JPH0261700A

JPH0261700A - Speech recognition device

Info

Publication number: JPH0261700A
Application number: JP63213405A
Authority: JP
Inventors: Takeshi Nishibe; 西部　毅; Seiko Ishikawa; 石川　せい子
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1988-08-27
Filing date: 1988-08-27
Publication date: 1990-03-01

Abstract

PURPOSE: To prepare correct phoneme candidates by adding further candidates to the per-phoneme candidates selected by a speech recognition unit, with reference to the rules in a rule dictionary storage means. CONSTITUTION: A standard pattern storage means 3 stores standard patterns created for each phoneme, and speech is input through an input means 1. A speech recognition unit 2 compares the input speech with the standard patterns, calculates the degree of matching with each standard pattern, and selects candidates for each phoneme. A phoneme processing unit 4 then adds further candidates to the per-phoneme candidates selected by the speech recognition unit 2, referring to a rule dictionary storage means 5 that stores recognition rules derived from past speech recognition experience. Because the rule dictionary 5 increases the number of phoneme candidates, a correct phoneme candidate can be prepared even if an error occurs during speech recognition.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a speech recognition device used in speech word processors and the like.

[Prior Art] When a speech recognition device recognizes speech in phoneme units, simply concatenating the first-ranked candidate for each phoneme is unlikely to recognize a phrase correctly. For example, even if the recognition rate for each phoneme is 95%, the recognition rate for a syllable (two phonemes) is $(0.95)^2 \approx 0.90$, and for an utterance of four syllables the overall recognition rate is $\left((0.95)^2\right)^4 \approx 0.66$, which is quite low.
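The figures above follow from multiplying independent per-phoneme recognition rates. A short Python check, assuming (as the example does) that errors are independent and that each syllable consists of two phonemes; the function name is illustrative:

```python
def utterance_accuracy(phoneme_rate: float, phonemes_per_syllable: int, syllables: int) -> float:
    """Probability that every phoneme is correct when the first-ranked candidates
    are simply concatenated, assuming independent errors."""
    syllable_rate = phoneme_rate ** phonemes_per_syllable  # e.g. consonant + vowel
    return syllable_rate ** syllables


print(f"{utterance_accuracy(0.95, 2, 1):.2f}")  # 0.90
print(f"{utterance_accuracy(0.95, 2, 4):.2f}")  # 0.66
```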

Conventional devices therefore raise the accuracy rate by outputting several candidates for each phoneme instead of narrowing them down to one.

[Problem to be Solved by the Invention] However, if several candidates are output for each phoneme in this way, the number of combinations of candidates becomes enormous, so subsequent processing such as kana-kanji conversion takes too long in practice.
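To make the growth concrete, the number of candidate strings is the product of the number of candidates at each position. The sketch below assumes, purely for illustration, four syllables with four consonant candidates and two vowel candidates each (the upper limits used later in the embodiment); the function name is hypothetical:

```python
from math import prod


def combination_count(candidates_per_position: list[int]) -> int:
    """Number of distinct strings formed by choosing one candidate at each position."""
    return prod(candidates_per_position)


# Illustrative assumption: 4 syllables, each with 4 consonant and 2 vowel candidates.
print(combination_count([4, 2] * 4))  # 4096
```

Under those assumptions a single four-syllable phrase already yields 4,096 strings to pass on to kana-kanji conversion.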

Furthermore, in continuous speech, coarticulation effects mean that even when several candidates are given for each phoneme, the correct one may not be among them, and the phonemes themselves may not be segmented correctly. Accurate kana-kanji conversion is therefore not possible as is.

An object of the present invention is to provide a speech recognition device that solves these problems of conventional speech recognition devices.

[Means for Solving the Problems] The invention according to claim 1 comprises: input means for inputting speech; standard pattern storage means for storing standard patterns created for each phoneme; a speech recognition unit that compares the speech input from the input means with the standard patterns stored in the standard pattern storage means, calculates the degree of matching with the standard patterns, and selects candidates for each phoneme; rule dictionary storage means that collects recognition rules derived from past speech recognition experience; and a phoneme processing unit that adds further candidates to the per-phoneme candidates selected by the speech recognition unit while referring to the rules in the rule dictionary storage means.

The invention according to claim 2 comprises: input means for inputting speech; standard pattern storage means for storing standard patterns created for each phoneme; a speech recognition unit that compares the speech input from the input means with the standard patterns stored in the standard pattern storage means, calculates the degree of matching with the standard patterns, and selects candidates for each phoneme; a clustering unit that combines the candidates selected by the speech recognition unit, compares the resulting words with the words in a word dictionary prepared in advance, and outputs a word if the word dictionary contains a corresponding word; and a word DP matching unit that selects a more appropriate word from the candidate words selected by the clustering unit, based on at least the per-phoneme matching degree calculated by the speech recognition unit.

[Operation] In the invention according to claim 1, standard patterns created in advance for each phoneme are stored in the standard pattern storage means, and speech is input through the input means. The speech recognition unit compares the input speech with the standard patterns, calculates the degree of matching with each standard pattern, and selects candidates for each phoneme. The phoneme processing unit then adds further candidates to the per-phoneme candidates selected by the speech recognition unit, referring to the rule dictionary storage means, which collects recognition rules derived from past speech recognition experience.

In the invention according to claim 2, standard patterns created in advance for each phoneme are stored in the standard pattern storage means, and speech is input through the input means. The speech recognition unit compares the input speech with the standard patterns, calculates the degree of matching with each standard pattern, and selects candidates for each phoneme. The clustering unit combines the candidates selected by the speech recognition unit, compares the resulting words with the words in a word dictionary prepared in advance, and outputs any word that matches. The word DP matching unit then selects a more appropriate word from the candidate words selected by the clustering unit, based on at least the per-phoneme matching degree calculated by the speech recognition unit.

[Embodiment] The present invention is described below with reference to the drawings showing an embodiment.

FIG. 1 is a block diagram showing an embodiment of the speech recognition device according to the present invention.

The standard pattern storage means 3 is a means, such as a ROM (read-only memory), that stores standard pattern waveforms of the various phonemes.

The speech recognition unit 2 is a means that compares the speech input from the microphone of the input means 1 with the standard patterns in the standard pattern storage means 3, calculates for each phoneme the degree of matching (distance) with the standard patterns, and takes those with a high degree of matching as phoneme candidates.

For example, candidates up to 4th place are kept for consonants, and up to 1st or 2nd place for vowels. In addition, a flag indicating ambiguity is attached to vowel candidates.

The rule dictionary storage means 5 is a ROM that stores rules obtained from past analysis experience in speech recognition.

The phoneme processing unit 4 is a means that adds other candidates, as necessary, to the per-phoneme candidates selected by the speech recognition unit 2, while referring to the various rules in the rule dictionary storage means 5.

The clustering unit 6 combines with one another the per-phoneme candidates selected by the phoneme processing unit 4. It then checks whether each combined phoneme candidate string (word) corresponds to a word in the word dictionary (stored in the word dictionary storage means 7) extracted from the kana-kanji conversion dictionary described later, and if it does, outputs that word as a phoneme candidate string.

The inter-word DP unit 8 is a means that calculates the degree of matching of each candidate word output from the clustering unit, using the state of the ambiguity flags attached to the vowel candidates and the degree of matching (distance) with the standard patterns. Words judged to have a high degree of matching are passed to the kana-kanji conversion unit 9.

The kana-kanji conversion unit 9 is a means that uses the kana-kanji conversion dictionary to perform kana-kanji conversion on the kana sent from the inter-word DP unit 8, calculates the validity (cost) of each result as an output for the input speech, and outputs the results judged most valid.

Next, the operation of the above embodiment is described, taking as an example the case where the phrase "こうばの" (kōba no) is input by voice.

FIG. 2 shows the output of the speech recognition unit 2 when the phrase "こうばの" is input by voice from the input means 1. The "こう" part of "こうばの" is actually pronounced as "ko" with the vowel "o" drawn out, so the continuous-vowel flag "4" is attached. As shown in FIG. 3, ambiguity flag "4" means that a continuous vowel may be present. In addition, the consonant "n" of "の" is sometimes misrecognized as "ン", and in this case the single syllable "の" has been recognized as two syllables, as in [*N go]. However, ambiguity flag "2" is attached to this [*N], meaning that it may be an added (spurious) syllable.

The notation for the phoneme candidates shown in FIG. 2 is as follows.

Phoneme candidates generally follow the romanized spelling of each Japanese consonant and vowel. However, "ン" is written with a capital "N", and an asterisk "*" is written in the consonant position of the a-row syllables (あ, い, う, え, お) and of "ン".
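One way to hold the recognition output of FIG. 2 in code is a list of syllable slots, each with ranked consonant and vowel candidates and an ambiguity flag. In this sketch the flag codes 1 (dropout), 2 (addition) and 4 (continuous vowel) are those described in the text, while the value 0 for "no flag" and all class and field names are assumptions. The example lists only the first-ranked candidates mentioned in the text, and the second slot is assumed:

```python
from dataclasses import dataclass
from enum import IntEnum


class Ambiguity(IntEnum):
    NONE = 0              # assumed code for "no flag"
    DROPOUT = 1           # a syllable may be missing here
    ADDITION = 2          # this syllable may be spurious
    CONTINUOUS_VOWEL = 4  # a vowel sequence (drawn-out vowel) may be present


@dataclass
class SyllableSlot:
    consonants: list[str]  # ranked consonant candidates; "*" = a-row syllable or "N"
    vowels: list[str]      # ranked vowel candidates; "N" = moraic nasal
    flag: Ambiguity = Ambiguity.NONE


# Hypothetical, first-ranked-only rendering of the FIG. 2 output for "こうばの".
fig2 = [
    SyllableSlot(["k"], ["o"], Ambiguity.CONTINUOUS_VOWEL),
    SyllableSlot(["b"], ["a"]),
    SyllableSlot(["*"], ["N"], Ambiguity.ADDITION),
    SyllableSlot(["g"], ["o"]),
]
```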

The phoneme processing unit 4 receives per-phoneme candidates such as those in FIG. 2 and processes them according to the rule dictionary.

FIG. 4 shows the result. Based on past speech analysis experience, the candidates "o*u" and "o*o" are added to the "o" sound carrying the continuous-vowel flag. In addition, considering the "N" sound carrying the addition flag together with the following vowel candidate, the one-syllable candidates "mo" and "no" are added for the portion that speech recognition judged to be two syllables. Furthermore, consonants in continuous speech change easily under the influence of the preceding and following vowels, and the correct consonant may not be among the four candidates, so consonant candidates are also added as shown in the figure.
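The two rules described for FIG. 4 can be expressed as table lookups that add edges to a candidate lattice, in which an edge may span two recognized slots when a merged one-syllable candidate such as "mo" or "no" is added. This is a minimal sketch: the rule tables hold only the two rules stated above, the lattice representation and all names are assumptions, and the consonant-substitution rules are not modeled:

```python
from dataclasses import dataclass

DROPOUT, ADDITION, CONTINUOUS_VOWEL = 1, 2, 4  # ambiguity flag codes from the text


@dataclass
class Slot:
    syllables: list[str]  # romanized candidates; "*" marks an absent consonant
    flag: int = 0


# Hypothetical rule tables holding only the two rules stated for FIG. 4.
CONTINUOUS_VOWEL_RULES = {"o": ["o*u", "o*o"]}      # drawn-out "o" may be o+u or o+o
ADDITION_MERGE_RULES = {("*N", "o"): ["mo", "no"]}  # "*N" plus vowel "o" may be one syllable


def expand(slots: list[Slot]) -> list[tuple[int, int, str]]:
    """Return lattice edges (start, end, syllable); slot i spans [i, i + 1)."""
    edges: set[tuple[int, int, str]] = set()
    for i, slot in enumerate(slots):
        for syl in slot.syllables:
            edges.add((i, i + 1, syl))  # keep the original candidate
            if slot.flag == CONTINUOUS_VOWEL:
                consonant, vowel = syl[:-1], syl[-1]
                for alt in CONTINUOUS_VOWEL_RULES.get(vowel, []):
                    edges.add((i, i + 1, consonant + alt))
        if slot.flag == ADDITION and i + 1 < len(slots):
            next_vowels = {s[-1] for s in slots[i + 1].syllables}
            for syl in slot.syllables:
                for v in next_vowels:
                    for merged in ADDITION_MERGE_RULES.get((syl, v), []):
                        edges.add((i, i + 2, merged))  # one syllable covering two slots
    return sorted(edges)


# Simplified, hypothetical version of the FIG. 2 output for "こうばの".
slots = [Slot(["ko"], CONTINUOUS_VOWEL), Slot(["ba"]), Slot(["*N"], ADDITION), Slot(["go"])]
edges = expand(slots)
```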

Comparing the phoneme candidates of FIG. 2, output by the speech recognition unit 2, with those of FIG. 4, output by the phoneme processing unit 4, shows that no combination of the FIG. 2 candidates yields the correct kana string for the input "こうばの". With the FIG. 4 candidates, however, "こうばの" can be obtained from a suitable combination.

FIG. 5 shows how the clustering unit 6 forms every combination of the phoneme candidates output from the phoneme processing unit 4 and searches for those contained in the word dictionary. In the figure, the numbers to the left of each extracted word indicate how it was combined, that is, from which syllable to which syllable of the result obtained by the speech recognition unit 2 the word was extracted. In this embodiment, speech is input phrase by phrase, so the number of words in the input speech is not fixed, and it cannot be determined in advance at which syllable of the phrase a word begins. The clustering unit 6 therefore extracts words without restricting the number of words in a phrase, the number of characters in each word, or the position of each word.

The numbers shown to the left of each word are used by the inter-word DP unit 8 and the kana-kanji conversion unit 9.

FIG. 6 illustrates DP matching in the inter-word DP unit 8, taking as an example the word "こうば" from among the various words, shown in FIG. 5, that the clustering unit 6 outputs as candidates.

"こうば" is one word candidate obtained from the portion corresponding to the 1st and 2nd syllables of the recognition result (see 1-2 in FIG. 5), so the matching degrees of that portion are used in DP matching. That is, the degrees of matching (distances) of each phoneme with each standard pattern, passed from the speech recognition unit 2, are entered into the cells of the DP matching table shown in FIG. 6, and DP matching is performed. The same DP matching is performed for the other word candidates for syllables 1-2, such as "こうぼ", "こな", and "こま".
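DP matching of this kind is commonly implemented as a dynamic-time-warping recurrence over the table of local distances. The sketch below uses the standard three-predecessor recurrence; the specification does not give its exact recurrence or slope constraints, and the distance values, the default cost for phonemes with no recorded distance, and the function name are assumptions:

```python
import math


def dp_match(word: list[str], frames: list[dict[str, float]], default: float = 1.0) -> float:
    """Accumulated distance of the best alignment between a word's phonemes and the
    recognized positions; frames[t][p] is the distance from position t to phoneme p.
    Lower is better (a higher matching degree)."""
    n, m = len(frames), len(word)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for t in range(1, n + 1):
        for j in range(1, m + 1):
            local = frames[t - 1].get(word[j - 1], default)  # unknown phoneme: default cost
            acc[t][j] = local + min(acc[t - 1][j - 1],  # advance both
                                    acc[t - 1][j],      # one phoneme spans two positions
                                    acc[t][j - 1])      # one position absorbs two phonemes
    return acc[n][m]


# Hypothetical distances for the four recognized positions of syllables 1-2.
frames = [
    {"k": 0.1, "g": 0.3},
    {"o": 0.1, "u": 0.4},
    {"b": 0.2, "m": 0.3},
    {"a": 0.1, "o": 0.5},
]
kouba = dp_match(["k", "o", "u", "b", "a"], frames)  # こうば
koubo = dp_match(["k", "o", "u", "b", "o"], frames)  # こうぼ
```

With these made-up distances, "こうば" accumulates a lower total distance than "こうぼ", so it would be chosen as the representative for syllables 1-2.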

The word candidate with the highest degree of matching obtained in this way is taken as the representative for syllables 1-2; in this case it is "こうば".

In this way, up to five representative word candidates are selected for each span: 1-1, 1-2, 1-3, ..., 2-2, 2-3, and so on.
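Keeping up to five representatives per span amounts to grouping scored candidates by their span label and keeping the lowest-distance entries. A sketch with hypothetical scores and a hypothetical function name:

```python
from collections import defaultdict


def representatives(scored: list[tuple[str, str, float]], limit: int = 5) -> dict[str, list[str]]:
    """scored holds (span label such as "1-2", word, DP distance).
    Returns, per span, up to `limit` words ordered from lowest to highest distance."""
    by_span: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for span, word, dist in scored:
        by_span[span].append((dist, word))
    return {span: [w for _, w in sorted(items)[:limit]] for span, items in by_span.items()}


# Hypothetical DP distances.
scored = [("1-2", "こうば", 0.9), ("1-2", "こうぼ", 1.3), ("1-2", "こな", 2.0), ("3-4", "の", 0.4)]
reps = representatives(scored)
```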

If a vowel carries an ambiguity flag, a special value may be entered in the cells of the DP matching table instead of the matching degree described above. FIGS. 7 and 8 are flowcharts showing the flow of this special processing.

FIG. 7 is a flowchart for judging the situation at the positions of the ambiguity flags, namely the dropout flag, the addition flag, and the continuous-vowel flag.

The processing of dropout flag "1" is described below as an example.

Dropout flag "1" is attached in the following case.

Suppose that, as shown in FIG. 9(a), the speech waveform of the utterance "ふそく" (fusoku) is input from the microphone of the input means 1.

As shown in FIG. 9(b), the speech recognition unit 2 first detects the speech interval (A), then detects the vowel intervals (B) and recognizes the vowels. Assume that, for the input "ふそく", the vowel "u" of "ふ" could not be detected.

Next, the portions of the speech interval not occupied by the vowel intervals (B) are treated as consonant intervals (C), and consonant recognition is performed. Vowel and consonant candidates are thus obtained as shown in FIG. 9(c) (see also FIG. 2). However, the consonant interval marked ☆ is too long to be a single consonant.

Because a vowel may lie within this interval, the dropout flag "1" is turned on. In the figure, the consonant and vowel of the dropout interval are shown with dedicated marks (C1 for the consonant). Moreover, because the consonant in this interval sounds like "h" in its first half and "s" in its second half, both may appear as consonant candidates.
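The segmentation and the dropout test can be sketched as follows: consonant intervals are the parts of the speech interval not covered by vowel intervals, and an interval longer than a threshold is flagged. The millisecond timings, the 150 ms threshold, and all names are hypothetical; the specification only says that the interval is too long for one consonant:

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start_ms: float
    end_ms: float

    @property
    def length(self) -> float:
        return self.end_ms - self.start_ms


def consonant_intervals(speech: Interval, vowels: list[Interval]) -> list[Interval]:
    """Parts of the speech interval (A) not covered by any vowel interval (B)."""
    gaps: list[Interval] = []
    cursor = speech.start_ms
    for v in sorted(vowels, key=lambda iv: iv.start_ms):
        if v.start_ms > cursor:
            gaps.append(Interval(cursor, v.start_ms))
        cursor = max(cursor, v.end_ms)
    if cursor < speech.end_ms:
        gaps.append(Interval(cursor, speech.end_ms))
    return gaps


def dropout_flags(consonants: list[Interval], max_consonant_ms: float) -> list[bool]:
    """Hypothetical rule: a consonant interval longer than max_consonant_ms may hide a vowel."""
    return [c.length > max_consonant_ms for c in consonants]


# Hypothetical timings for "fusoku" in which the vowel of "fu" was not detected.
speech = Interval(0, 450)
vowels = [Interval(200, 280), Interval(380, 450)]  # "o" of "so", "u" of "ku"
cons = consonant_intervals(speech, vowels)          # h+s merged, then k
flags = dropout_flags(cons, max_consonant_ms=150)
```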

For a recognition result carrying dropout flag "1", the inter-word DP unit 8 proceeds as shown in FIG. 7. Since dropout flag "1" is on (step S1), it next checks whether there may be no character at the flag position (step S2).

That is, when candidate kana strings are generated, processing is performed both with and without a character at the position of the dropout flag. Candidates for "ふそく" therefore include both two-character and three-character strings. A two-character candidate corresponds to processing under the judgment that there is no character at the dropout position, so the judgment is YES and 1 is assigned to chk (step S3).

Step S4 checks whether the consonant of the next character belongs to this position. The correct recognition result "h" for the first character of "ふそく" may appear in the character following the dropout flag. In that case, the consonant of the first character of a word candidate obtained by dictionary lookup ranks high among the consonant recognition results for the second character. The judgment is then YES (step S4), and 2 is assigned to chk (step S5).

Step S6 checks whether the flag position may be a geminate consonant (sokuon, written with a small "っ"). A sokuon is a silent interval that leaves a fairly long gap between vowels, so a dropout flag may be attached to it (the sokuon has its own dedicated flag, but it is sometimes judged as a dropout). This fact is also registered in the rule dictionary, so words containing "っ" at the position of the dropout flag are also raised as candidates. Only a few consonants can follow a small "っ" (k, s, t, p). Therefore, when these consonants appear among the candidates for the character following the dropout flag and the character at the flag position is a small "っ", the judgment is YES (step S6), and 3 is assigned to chk (step S7).

The situations at the positions of the addition flag and the continuous-vowel flag are judged in the same way, and the results are assigned to chk.

FIG. 8 shows how each cell of the DP matching table is corrected using chk, the result of judging the situation at the flag position.

In FIG. 8, since dropout flag "1" is on (step S1), the value of chk is checked in steps S2 to S5, and the DP matching table is corrected accordingly (steps S6 to S9).

In step S6, "子" and "母" denote the consonant and vowel of the word from the dictionary, and C and V with the dropout mark denote the consonant and vowel at the position of the dropout flag. Unmarked C and V are recognition results without flags, C being a consonant and V a vowel. The values 1.0 and 0.5 are matching degrees entered by this processing; one of them, 1.0 or 0.5, is entered.

In step S7, "C-子1" is the entry for 子1 among the recognition results of the character following the dropout flag, and "C-子2" is the entry for 子2 among the consonant scores prepared for the dropout flag. (Because there is no recognition result at the dropout position, values are prepared in advance for both consonants and vowels.)

In step S8, "子" and "母" are the small "っ".

In step S9, (c) and (v) are the consonant and vowel at positions not directly involved in DP matching, and "(c)-子" is the entry for 子 among the recognition results of the next character. The thick line marks the end of the matrix in actual DP matching (used when the word candidate covers the utterance only up to a point partway through).

In step S10, (C) and (V) are the dropout-flagged consonant and vowel at positions not directly involved in DP matching, and "(C)-子" is the entry for 子 among the consonant scores for the dropout flag.

The thick line marks the start of the matrix in actual DP matching (used when the word candidate begins partway through the utterance).

In step S11, "V付" denotes a vowel candidate with the addition flag on, and "1.3☆" means that 1.3 is entered into the DP matching table and then corrected on the basis of the recognition results of the preceding and following consonants.

In step S13, c1 and v1 are recognition results without flags.

In step S14, "V連" denotes a vowel candidate with the continuous-vowel flag on, and for "vrnb" the value corresponding to 母 is taken from the values prepared for continuous vowels.

子2 and 母2 carry the suffix "2" to distinguish them from 子 and 母 above.

In this way, the inter-word DP unit 8 performs DP matching and, for each syllable span (syllable 1, syllables 1 to 2, syllables 1 to 3, ..., syllable 2, syllables 2 to 3, ...), passes the group of words with a high degree of matching to the kana-kanji conversion unit 9.

The kana-kanji conversion unit 9 performs kana-kanji conversion using the kana-kanji conversion dictionary, assembles phrases using the span numbers attached to the word candidates by the clustering unit 6, and ranks and outputs the phrase candidates using the matching degrees from the inter-word DP matching unit 8 together with linguistic knowledge.
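Phrase assembly from the span numbers can be sketched as a search for chains of word candidates whose spans are contiguous and cover the whole phrase, ranked by total DP distance. This sketch leaves out the linguistic knowledge the text mentions; the converted words, their distances, and the function name are hypothetical:

```python
def assemble(words: list[tuple[int, int, str, float]], n_syllables: int,
             limit: int = 3) -> list[tuple[float, list[str]]]:
    """words holds (first syllable, last syllable, word, DP distance) with 1-based,
    inclusive spans. Returns up to `limit` word chains that cover syllables
    1..n_syllables without gaps or overlaps, lowest total distance first."""
    results: list[tuple[float, list[str]]] = []

    def extend(next_syllable: int, chain: list[str], cost: float) -> None:
        if next_syllable == n_syllables + 1:  # whole phrase covered
            results.append((cost, chain))
            return
        for first, last, word, dist in words:
            if first == next_syllable:  # word must start where the chain ends
                extend(last + 1, chain + [word], cost + dist)

    extend(1, [], 0.0)
    results.sort(key=lambda r: r[0])
    return results[:limit]


# Hypothetical converted candidates and distances for a four-syllable phrase.
words = [(1, 2, "工場", 0.9), (1, 2, "酵母", 1.3), (3, 4, "の", 0.4), (3, 3, "ん", 1.5), (4, 4, "ご", 1.0)]
ranked = assemble(words, n_syllables=4)
```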

FIG. 10 shows the result.

The rule dictionary 5 is created taking into account the ambiguity flags from vowel recognition; however, when the speech recognition results contain no such flags, a rule dictionary can also be created from the tendencies of recognition errors.

Needless to say, the calculation of the degree of matching in the inter-word DP matching unit 8 is not limited to the method described above; other calculation methods can also be used.

[Effects of the Invention] As is clear from the above description, the invention according to claim 1 uses a rule dictionary to reinforce the phoneme candidates, so correct phoneme candidates can be prepared even if an error occurs during the speech recognition process.

Further, in the invention according to claim 2, the clustering unit and the inter-word DP matching unit 8 narrow down the number of candidates, which has the advantage of reducing the load on subsequent processing such as kana-kanji conversion.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device according to the present invention; FIG. 2 is a diagram showing an example of the output of the speech recognition unit of the embodiment; FIG. 3 is a diagram showing the meanings of the ambiguity flags used in the embodiment; FIG. 4 is a diagram showing the processing result of the phoneme processing unit of the embodiment; FIG. 5 is a diagram showing the processing result of the clustering unit of the embodiment; FIG. 6 is a diagram showing the state of processing in the inter-word DP matching unit of the embodiment; FIGS. 7 and 8 are flowcharts showing the processing based on the ambiguity flags in the inter-word DP matching unit of the embodiment; FIGS. 9(a), 9(b), and 9(c) are diagrams for explaining the dropout flag in the embodiment; and FIG. 10 is a diagram showing the processing result of the kana-kanji conversion unit of the embodiment.

1: input means
2: speech recognition unit
3: standard pattern storage means
4: phoneme processing unit
5: rule dictionary storage means
6: clustering unit
8: word DP matching unit

Applicant: Brother Industries, Ltd.

Claims

[Claims]

(1) A speech recognition device comprising: input means for inputting speech; standard pattern storage means for storing standard patterns created for each phoneme; a speech recognition unit that compares the speech input from the input means with the standard patterns stored in the standard pattern storage means, calculates the degree of matching with the standard patterns, and selects candidates for each phoneme; rule dictionary storage means that collects recognition rules derived from past speech recognition experience; and a phoneme processing unit that adds further candidates to the per-phoneme candidates selected by the speech recognition unit while referring to the rules in the rule dictionary storage means.

(2) A speech recognition device comprising: input means for inputting speech; standard pattern storage means for storing standard patterns created for each phoneme; a speech recognition unit that compares the speech input from the input means with the standard patterns stored in the standard pattern storage means, calculates the degree of matching with the standard patterns, and selects candidates for each phoneme; a clustering unit that combines the candidates selected by the speech recognition unit, compares the resulting words with the words in a word dictionary prepared in advance, and outputs a word if the word dictionary contains a corresponding word; and a word DP matching unit that selects a more appropriate word from the candidate words selected by the clustering unit, based on at least the per-phoneme matching degree calculated by the speech recognition unit.