JPS63250698A

JPS63250698A - Voice recognition equipment

Info

Publication number: JPS63250698A
Application number: JP62084679A
Authority: JP
Inventors: 入間野　孝雄
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 1987-04-08
Filing date: 1987-04-08
Publication date: 1988-10-18

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a speech recognition device that recognizes words by using a phoneme likelihood matrix to calculate the similarity between input speech and the dictionary phoneme sequence of each entry in a phonemically transcribed word dictionary.

(Prior Art) A first example of a conventional word speech recognition device is described with reference to FIG. 2, which is a flowchart of the device. The input speech is analyzed to extract parameters, and phoneme recognition is then performed. The recognized phoneme sequence is matched against the dictionary phoneme sequence of each entry in the word dictionary. The likelihood of each pair of corresponding phonemes is obtained from a likelihood matrix, the per-phoneme likelihoods are summed over the sequence, and the sum is divided by the number of phonemes to give the similarity between the two sequences. The dictionary entry with the highest similarity is taken as the recognition result.
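The scoring step of this first conventional example can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the probabilities in `PROB`, the two-entry dictionary, and the function names are hypothetical, and the sketch assumes the two sequences have equal length (no dropped or added phonemes).

```python
import math

# Hypothetical toy probabilities: PROB[dictionary_phoneme][recognized_phoneme].
# These numbers are illustrative and do not come from the patent.
PROB = {
    "A": {"A": 0.9, "I": 0.1},
    "I": {"A": 0.2, "I": 0.8},
}
# The likelihood matrix holds the logarithm of each probability.
LOGLIK = {d: {r: math.log(p) for r, p in row.items()} for d, row in PROB.items()}

def similarity(dict_seq, rec_seq):
    """Sum the per-phoneme log-likelihoods and divide by the number of phonemes."""
    # Simplifying assumption of this sketch: one-to-one correspondence only.
    assert len(dict_seq) == len(rec_seq)
    total = sum(LOGLIK[d][r] for d, r in zip(dict_seq, rec_seq))
    return total / len(dict_seq)

def recognize(dictionary, rec_seq):
    """Return the dictionary entry whose phoneme sequence is most similar."""
    candidates = [w for w, seq in dictionary.items() if len(seq) == len(rec_seq)]
    return max(candidates, key=lambda w: similarity(dictionary[w], rec_seq))

dictionary = {"AI": "AI", "II": "II"}  # hypothetical entries
best = recognize(dictionary, "IA")
```

Dividing by the phoneme count keeps long dictionary entries from being penalized merely for containing more terms in the sum.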

Here the word dictionary is written in phonemes, which are units close to romanized letters. In this conventional example the dictionary entries are written as SAPPORO (Sapporo), TOOKYOO (Tokyo), OOSAKA (Osaka), and so on.

FIG. 3 shows a phoneme confusion matrix (hereinafter abbreviated C.M.). The vertical axis lists the phonemes of the dictionary phoneme sequence and the horizontal axis lists the recognized phonemes. For example, the probability that I is correctly recognized as I is 75%, the probability that I is misrecognized as U is 5%, and the probability that I is not recognized and is dropped (its presence is ignored) is 8%. The likelihood matrix mentioned above is obtained by taking the logarithm of each value in the C.M.

The C.M. shown in FIG. 3 was created by performing phoneme recognition on a large amount of speech data for which the spoken words were known in advance. A probability of 0% that I is recognized as A therefore means that I was never recognized as A in the speech data used to create the C.M. If word recognition is performed with this C.M. and the phoneme recognition result for some input contains I recognized as A, the probability that the dictionary phoneme sequence of the input word produces such a result is judged to be 0, and the input has no chance at all of being recognized correctly. To prevent a phoneme correspondence that never appeared in the C.M. training data from ruling out correct recognition in this way, the first conventional example treated every C.M. element of 0.1% or less as 0.1% when creating the likelihood matrix used for recognition.
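The conversion from confusion matrix to likelihood matrix with the 0.1% floor can be sketched as below. The 75%, 5%, 8%, and 0% figures for the phoneme I are the ones quoted above; the dictionary layout, the function name, and the use of "-" as the key for a dropped phoneme are assumptions made for illustration.

```python
import math

FLOOR = 0.001  # 0.1 %, the floor used in the first conventional example

def likelihood_matrix(confusion):
    """Take the log of each confusion probability after flooring it at 0.1 %."""
    return {
        d: {r: math.log(max(p, FLOOR)) for r, p in row.items()}
        for d, row in confusion.items()
    }

# Row for dictionary phoneme I. "-" marks a dropped phoneme (an assumption of this sketch).
confusion = {"I": {"I": 0.75, "U": 0.05, "-": 0.08, "A": 0.0}}
loglik = likelihood_matrix(confusion)
```

Without the floor, `math.log(0.0)` would raise an error, which corresponds to the "no chance of correct recognition" problem described in the text.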

However, the first conventional example had drawbacks such as the following. When the input speech was Nabeshima (dictionary phoneme sequence NABESIMA) and the trailing A was mis-segmented as AWA, giving the recognized phoneme sequence NABESIMAWA, the word recognition result was Namerigawa (dictionary phoneme sequence NAMERIGAWA). As long as the utterance is normal, R is almost never recognized as S. Because the conventional example assumed a 0.1% probability of recognizing R as S, however, the input was misrecognized as Namerigawa, whose phoneme sequence is otherwise very similar.

A second conventional example, described below, is a speech recognition method that addresses the drawbacks of the first. It differs from the first in how C.M. elements with a probability of 0% are handled when a C.M. such as that in FIG. 3 is created. Rather than uniformly treating them as 0.1%, a person judges in advance whether each phoneme correspondence is possible given the nature of speech. If it is possible, it is treated as 0.1% as in the first conventional example; if it is impossible, it is treated as 0.0001%, and the phoneme likelihood matrix is created accordingly. The impossible case is set to 0.0001% rather than 0% because exactly 0% would become −∞ when the logarithm is taken to form the likelihood matrix, so a value that can be regarded as practically 0% is used instead. With this second conventional example, even in the misrecognition case above, the likelihood matrix carries the information that R is almost never recognized as S, so the similarity between the recognized phoneme sequence NABESIMAWA for Nabeshima and the dictionary phoneme sequence NAMERIGAWA for Namerigawa becomes small, and the word is recognized correctly.

The second conventional example also handles the following case. When the input speech is Shinmachi (dictionary phoneme sequence SINMACI) and the recognized phoneme sequence is SINOMACI, the first conventional example gave Ishinomaki (ISINOMAKI) as the recognition result. Because the probability of a word-initial I being dropped is 8%, the input was mistaken for Ishinomaki, whose phoneme sequence is similar from S onward. This problem is not as simple as the Nabeshima example, because a word-initial I really can be dropped. The second conventional example dealt with it using the rule described below.

That is, as long as Ishinomaki is uttered clearly, the word-initial I is voiced and the following S is a long unvoiced sound, so the I is never completely absorbed into the S such that IS together is recognized as S. If the I is uttered somewhat weakly it may not be recognized as I, but in that case it becomes U or a nasal, or, even when it merges with the S, voicing remains and it is recognized as Z or the like. A rule is therefore set that IS may be recognized as US, NS, Z, and so on, but never as S, and for phoneme correspondences that violate this rule the likelihood obtained from the likelihood matrix is penalized. In the Shinmachi input example, when the dictionary entry is Ishinomaki, the matching is that the word-initial I was dropped and S was correctly recognized as S. This violates the rule, so the likelihood is penalized and the similarity is naturally lower than in the first conventional example. The dictionary entry Shinmachi, on the other hand, is unaffected by the rule and its similarity is the same as in the first conventional example, so for this input the second conventional example gives the correct word recognition result. The second conventional example has a number of such rules; they prevent a high similarity from being obtained despite phoneme correspondences that are impossible given the nature of speech, and high recognition performance is obtained. However, because it checks a large number of rules every time the likelihood of a phoneme is calculated, the second conventional example has the drawback that the time required for similarity calculation increases greatly.

(Problems to be Solved by the Invention) In the first conventional example, when the input speech is matched against a dictionary entry for a different word, a high similarity can be obtained even when the match contains phoneme correspondences that are clearly impossible given the nature of speech, so words may be misrecognized. The second conventional example solved this problem but had the drawback that the time required for recognition increased greatly.

An object of the present invention is to eliminate these conventional drawbacks and provide an excellent speech recognition device with good recognition performance and a short recognition time.

(Means for Solving the Problems) The speech recognition device of the present invention comprises: a phonemically transcribed word dictionary; a likelihood matrix indicating, for each phoneme in the word dictionary, which phoneme it is recognized as in phoneme recognition, or whether it is dropped or a phoneme is added; phoneme recognition means; means for calculating, by DP (dynamic programming), the similarity between a dictionary phoneme sequence and a recognized phoneme sequence using these phoneme likelihoods; and a table, in the same format as the likelihood matrix, that indicates rules on whether the correspondence between a dictionary phoneme and a recognized phoneme is appropriate given the nature of speech.

The likelihood of each phoneme correspondence is taken from the likelihood matrix. Among the elements of the likelihood matrix with a low probability of occurrence, an element that cannot occur given the nature of speech is given a probability of 0 or a value that can be regarded as 0, and an element that can occur is given a finite value close to 0 but not 0.

Rules are provided such that the judgment of whether a correspondence is possible given the nature of speech is made not only independently for each phoneme but also conditioned on the recognition result or dropping of the preceding phoneme, or on the addition of a phoneme between it and the preceding phoneme.

With this configuration, when input speech is recognized, phoneme recognition of the input speech is performed first, and the similarity between the resulting recognized phoneme sequence and the dictionary phoneme sequence is calculated by DP. In the DP calculation, the likelihood sum from the beginning of the phoneme sequence up to the point corresponding to each phoneme is obtained, and whether the phoneme correspondence, including the recognition result of the preceding phoneme, is appropriate given the nature of speech is checked by referring to the rule table. When no special rule applies, the likelihood value shown in the likelihood matrix is used as is; when a special rule applies, the rule indicated by the table is applied to calculate the likelihood.
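The DP similarity calculation named above can be illustrated with a standard alignment recurrence over three moves: a dictionary phoneme recognized as some phoneme, a dictionary phoneme dropped, and a phoneme added. The patent does not give the recurrence, so this is a sketch under stated assumptions: the function names, the callback interface, and the toy probabilities (0.8, 0.01, 0.08, 0.02) are all hypothetical, and the rule-table check and any length normalization are left out.

```python
import math

def dp_score(dict_seq, rec_seq, sub, drop, add):
    """Best total log-likelihood over all alignments of dict_seq with rec_seq.

    sub(d, r): log-likelihood that dictionary phoneme d is recognized as r
    drop(d):   log-likelihood that dictionary phoneme d is dropped
    add(r):    log-likelihood that r is an added phoneme
    """
    n, m = len(dict_seq), len(rec_seq)
    # best[i][j] = best likelihood sum from the start up to dict_seq[:i] and rec_seq[:j]
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                step = sub(dict_seq[i - 1], rec_seq[j - 1])
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + step)
            if i > 0:
                best[i][j] = max(best[i][j], best[i - 1][j] + drop(dict_seq[i - 1]))
            if j > 0:
                best[i][j] = max(best[i][j], best[i][j - 1] + add(rec_seq[j - 1]))
    return best[n][m]

# Hypothetical toy likelihoods, used only to exercise the recurrence.
def sub(d, r):
    return math.log(0.8) if d == r else math.log(0.01)

def drop(d):
    return math.log(0.08)

def add(r):
    return math.log(0.02)

score = dp_score("SINMACI", "SINOMACI", sub, drop, add)
```

Each cell keeps a running likelihood sum from the start of the sequences, which matches the description of obtaining the sum up to the point corresponding to each phoneme.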

(Operation) According to the present invention, referring to a rule table in the same format as the likelihood matrix when calculating likelihoods secures a high recognition rate with almost no increase in the time required for recognition.

(Embodiment) An embodiment of the present invention is described with reference to FIG. 1.

FIG. 1 is a flowchart of the operation of the speech recognition device of the present invention. This embodiment improves on the first and second conventional examples, and its operation is based on them.

In FIG. 1, the input speech is analyzed to extract parameters, and phoneme recognition is then performed. The recognized phoneme sequence is compared with the dictionary phoneme sequence of each entry in the word dictionary, the likelihood of each pair of corresponding phonemes is obtained, the likelihoods are summed to give the similarity between the sequences, and the dictionary entry with the highest similarity is taken as the recognition result.

Here the word dictionary is the same as in the first and second conventional examples, and the likelihood matrix is the same as in the second conventional example. That is, it has the same format as in the first conventional example and is likewise created by taking the logarithm of the probabilities in a phoneme C.M. such as that in FIG. 3. However, elements with a probability of 0 in the C.M. are treated as 0.1% if they are possible given the nature of speech and as 0.0001% if they are impossible.
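A sketch of this two-level floor follows. The 0.1% and 0.0001% values and the statement that R is almost never recognized as S come from the text. The function name, the `impossible` set as a way of recording the human judgment, and the R-to-D entry are hypothetical; flooring small nonzero values at 0.1% follows the first conventional example and is an assumption here.

```python
import math

POSSIBLE_FLOOR = 1e-3    # 0.1 %: zero-count pairs that can occur in speech
IMPOSSIBLE_FLOOR = 1e-6  # 0.0001 %: zero-count pairs judged impossible

def build_likelihood_matrix(confusion, impossible):
    """Log-likelihoods with a floor that depends on a prior plausibility judgment."""
    loglik = {}
    for d, row in confusion.items():
        loglik[d] = {}
        for r, p in row.items():
            if p == 0 and (d, r) in impossible:
                p = IMPOSSIBLE_FLOOR          # practically 0 %, but its log stays finite
            else:
                p = max(p, POSSIBLE_FLOOR)    # assumption: same floor as the first example
            loglik[d][r] = math.log(p)
    return loglik

confusion = {"R": {"R": 0.9, "D": 0.0, "S": 0.0}}  # hypothetical row
impossible = {("R", "S")}                          # judged in advance by a person
loglik = build_likelihood_matrix(confusion, impossible)
```

The gap between the two floors is a factor of 1000 in probability, so a single impossible correspondence lowers a candidate's likelihood sum by roughly 6.9 more (in natural-log units) than a merely unseen one.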

The parts that differ from the conventional examples are described below. The rule table has the same structure as the likelihood matrix; each element corresponds to a pair of phonemes and is referred to during likelihood calculation in the same way as the likelihood matrix. When the phoneme correspondence represented by an element, that is, one phoneme being recognized as another, is always possible or always impossible given the nature of speech, the element holds the flag 0, indicating that no special rule is needed. Thus, when the likelihood of a phoneme is obtained, the rule table is consulted, and if the value is 0 the likelihood from the likelihood matrix is simply used. An element that is not 0 means that whether the correspondence is possible depends not on that phoneme alone but on the recognition result of the preceding phoneme. One example is the rule that prevents the case described for the conventional examples, in which Shinmachi is input, the phoneme recognition result is SINOMACI, and the input is misrecognized as Ishinomaki (dictionary phoneme sequence ISINOMAKI). The rule states that a word-initial I may be dropped, but a word-initial I followed by S is never dropped with that S then recognized as S. This rule itself is stored in a rule base.

In this case, the rule-table element corresponding to the dropping of a word-initial I holds the address of the above rule in the rule base. When the similarity between the Shinmachi input and Ishinomaki is calculated, and S is matched with S in the likelihood calculation for S, the word-initial I before the S must have been dropped, so the rule table is consulted to see whether there is a rule for the dropping of a word-initial I. This shows that the above rule exists and gives its address in the rule base. The rule is then consulted, the condition is found to hold, and the correspondence is judged to be a phoneme recognition result that is impossible given the nature of speech. A likelihood corresponding to a probability of 0.0001% is therefore assigned, greatly reducing the likelihood.
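The lookup sequence just described (rule table first, rule base only when the table element is nonzero) can be sketched as below. The data layout is an assumption made for illustration: the patent states only that the table has the same format as the likelihood matrix and holds either flag 0 or a rule-base address. The "^I" notation for a word-initial I, the "-" marker for a drop, the address value 1, and the function names are all hypothetical.

```python
import math

DROPPED = "-"                 # marker for a dropped phoneme (assumption of this sketch)
IMPOSSIBLE = math.log(1e-6)   # likelihood corresponding to 0.0001 %

def rule_initial_i_drop(d, r):
    """A word-initial I may drop, but not when the following S is recognized as S."""
    return d == "S" and r == "S"

RULE_BASE = {1: rule_initial_i_drop}   # address -> rule; the address 1 is hypothetical
# Indexed like the likelihood matrix; any pair not listed carries flag 0.
RULE_TABLE = {("^I", DROPPED): 1}      # "^I" denotes a word-initial I (hypothetical notation)

def phoneme_likelihood(loglik, d, r, prev=None):
    """Likelihood of dictionary phoneme d recognized as r.

    prev is the (dictionary, recognized) pair scored just before, or None.
    """
    address = RULE_TABLE.get(prev, 0)
    if address == 0:
        return loglik[d][r]            # common case: plain likelihood-matrix lookup
    violated = RULE_BASE[address](d, r)
    return IMPOSSIBLE if violated else loglik[d][r]

loglik = {"S": {"S": math.log(0.9), "Z": math.log(0.05)}}   # hypothetical values
penalized = phoneme_likelihood(loglik, "S", "S", prev=("^I", DROPPED))
```

Because most table elements are 0, the usual cost is one extra table read per phoneme, and the rule base is touched only for the few correspondences that have a context-dependent rule.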

Thus, according to the above embodiment, the likelihood value used when calculating each phoneme's likelihood is changed depending on whether such a recognition, dropping, or addition is possible given the nature of speech. The judgment considers not just the phoneme alone but also the recognition result of the preceding phoneme, so the likelihood is reliably low for phoneme correspondences that are impossible given the nature of speech, and high recognition performance is obtained.

Furthermore, the judgment of whether a phoneme correspondence is possible given the nature of speech when the recognition result of the preceding phoneme is also considered, which cannot be handled by the likelihood matrix alone, is made by referring to a rule table in the same format as the likelihood matrix. In the great majority of cases the flag in the table element shows that no special rule applies, and the likelihood value in the likelihood matrix is simply adopted, so there is almost no increase in likelihood calculation time.

Even when a special rule is used, the rule-base address held in the table element can be referenced immediately. This is far more efficient than the second conventional example, which checked whether each rule applied in every case, and the increase in recognition time over the first conventional example is slight.

Although the claims and the embodiment use likelihood and similarity, which are measures of closeness, using distance, a measure of remoteness, is equivalent except that larger and smaller are reversed.

The present invention also covers the case where distance is used in the calculation.
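The equivalence between similarity and distance can be shown in a few lines. Taking the distance to be the negated similarity is one simple choice made for this sketch, not something the patent specifies, and the scores are illustrative values only.

```python
def best_by_similarity(similarities):
    """Pick the entry with the largest similarity (closeness)."""
    return max(similarities, key=similarities.get)

def best_by_distance(distances):
    """Pick the entry with the smallest distance (remoteness)."""
    return min(distances, key=distances.get)

similarities = {"SINMACI": -2.1, "ISINOMAKI": -8.7}    # illustrative values only
distances = {w: -s for w, s in similarities.items()}   # assumption: distance = -similarity

same_choice = best_by_similarity(similarities) == best_by_distance(distances)
```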

Although the above embodiment shows isolated word recognition, the present invention applies equally to continuous speech recognition.

(Effects of the Invention) According to the present invention, a rule table in the same format as the likelihood matrix is provided. Whether the correspondence between a dictionary phoneme and a recognized phoneme is possible given the nature of speech is judged by referring to the rule table, and the rule base is consulted as necessary, so that an appropriate likelihood value is obtained. This gives high recognition performance together with a short recognition time, and the practical benefit is considerable.

[Brief Description of the Drawings]

FIG. 1 is a flowchart of the operation of a speech recognition device according to an embodiment of the present invention, FIG. 2 is a flowchart of the operation of the speech recognition device of the first conventional example, and FIG. 3 is a chart showing part of a phoneme C.M. Patent applicant: Matsushita Communication Industrial Co., Ltd.

Claims

[Claims]

A speech recognition device comprising: a phonemically transcribed word dictionary; a likelihood matrix indicating, for each phoneme in the word dictionary, which phoneme it is recognized as in phoneme recognition, or whether it is dropped or a phoneme is added; phoneme recognition means; means for calculating, by DP, the similarity between a dictionary phoneme sequence and a recognized phoneme sequence using the phoneme likelihoods; and a table, in the same format as the likelihood matrix, indicating rules on whether the correspondence between a phoneme in the word dictionary and a recognized phoneme is appropriate given the nature of speech; wherein the likelihood of each phoneme correspondence, addition, or dropping is taken from the likelihood matrix, the likelihood matrix being one in which, among elements with a low probability of occurrence, that is, a low likelihood, an element that cannot occur given the nature of speech is given a probability of 0 or a value that can be regarded as 0, and an element that can occur given the nature of speech is given a finite value close to 0 but not 0; wherein rules are provided such that the judgment of whether a correspondence is possible given the nature of speech is made not only for each phoneme independently but also conditioned on the recognition result or dropping of the preceding phoneme, or on the addition of a phoneme between it and the preceding phoneme (at the beginning of a word, the addition of a phoneme at the word beginning); and wherein, when input speech is recognized with the above configuration, phoneme recognition of the input speech is performed first, the similarity between the resulting recognized phoneme sequence and the dictionary phoneme sequence is calculated by DP, the likelihood sum from the beginning of the phoneme sequence up to the point corresponding to each phoneme is obtained in the DP calculation, whether the phoneme correspondence, including the recognition result of the preceding phoneme, is appropriate given the nature of speech is checked by referring to the rule table, the likelihood value shown in the likelihood matrix is used as is when no special rule applies, and the rule indicated by the table is applied to calculate the likelihood when a special rule applies.