JPS6039522A

JPS6039522A - Word voice recognizing method

Info

Publication number: JPS6039522A
Application number: JP58147312A
Authority: JP
Inventors: Takao Irumano; 入間野　孝雄; Kunio Akiba; 秋場　国夫; Hisanori Kanezashi; 金指　久則
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1983-08-13
Filing date: 1983-08-13
Publication date: 1985-03-01
Also published as: JPH0158519B2

Abstract

PURPOSE: To maintain a high recognition rate by correcting the value of the tolerance weight (called the likelihood weight in the description), whenever an erroneous recognition occurs, in the direction that reduces the difference between the first-ranked weighted similarity and the weighted similarity of the dictionary entry of the input word, and by using the corrected value from the next recognition onward. CONSTITUTION: Suppose that, when word A ranks first in similarity, the tolerance weight applied to the similarity of word B is n points. If the word recognition result is A for an input of B, the weight is automatically corrected to n+m points. If a correct recognition result is obtained for an input of A or B, the weight is left unchanged. Conversely, if the word recognition result is B for an input of A, the weight is corrected to n-m points. By continuing this, the tolerance weight converges to the value at which the overall recognition rate is highest.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a word speech recognition method that recognizes words by matching input speech against a word dictionary written in phoneme notation.

Configuration of the Conventional Example and Its Problems

A conventional word speech recognition method is explained with reference to the figure. As shown in the figure, the input speech is first analyzed, the features of the input word speech are extracted, and the phonemes that make up the input word speech are recognized.

The recognized phoneme sequence is then matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary. The similarity between the two phoneme sequences is calculated by obtaining the recognition probability of each phoneme from a confusion matrix (CM) between phonemes. Next, a likelihood weight, predetermined for each combination of a dictionary entry and the dictionary entry ranked first in similarity, is added to or subtracted from the similarity of each dictionary entry, and the dictionary entry with the maximum weighted similarity is taken as the recognized word. Table 1 shows an example of a word dictionary used in this word speech recognition method; each word is written in the phoneme notation shown in Table 2.
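The decision step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `recognize`, the dictionary-of-pairs layout for the weights, and all scores are assumptions made for the example, and only the additive case is shown.

```python
from typing import Dict, Tuple


def recognize(similarity: Dict[str, float],
              weights: Dict[Tuple[str, str], float]) -> str:
    """Return the dictionary entry with the highest weighted similarity.

    similarity: raw similarity of each dictionary entry to the input speech.
    weights:    weights[(top, entry)] is the likelihood weight added to the
                similarity of `entry` when `top` ranks first in raw similarity.
                Pairs that are not listed are treated as weight 0 (assumption).
    """
    # Entry ranked first by raw (unweighted) similarity.
    top = max(similarity, key=similarity.get)
    # Apply the weight that belongs to each (top, entry) combination.
    weighted = {entry: score + weights.get((top, entry), 0.0)
                for entry, score in similarity.items()}
    return max(weighted, key=weighted.get)


# Hypothetical scores: A ranks first on raw similarity, but the weight for
# the pair (A, B) lifts B above A (60 + 3 = 63 > 62).
scores = {"A": 62.0, "B": 60.0, "C": 41.0}
print(recognize(scores, {("A", "B"): 3.0}))  # B
print(recognize(scores, {}))                 # A
```

Keying the weights by the pair (first-ranked entry, candidate entry) mirrors the statement that a weight is predetermined for each such combination.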

[Table 1] [Table 2] [Table 3]

Table 3 shows part of the confusion matrix. In Table 3, the rows are the phonemes in the word dictionary and the columns are the recognized phonemes. The numbers in Table 3 give, in percent, the probability with which each phoneme in the word dictionary is recognized as each phoneme. For example, for the dictionary phoneme printed as "l" in the source, Table 3 indicates that the probability of recognition as ■ (character illegible in the source) is 75%, the probability of recognition as U is 5%, the probability of recognition as A is "oq6" (value illegible in the source), and the probability that the phoneme is dropped is 8%.
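A confusion matrix of this kind can be held as a nested mapping and used to score a recognized phoneme sequence against a dictionary phoneme sequence. The sketch below rests on several assumptions that are not in the source: the two sequences are already aligned position by position (with "-" marking a dropped phoneme), the per-phoneme probabilities are combined by summing their logarithms, unlisted pairs get a small floor probability, and every value in `CM` is made up for illustration.

```python
import math
from typing import Dict, Sequence

# Hypothetical confusion matrix: CM[dictionary phoneme][recognized phoneme].
# "-" stands for a dropped phoneme. Values are illustrative only.
CM: Dict[str, Dict[str, float]] = {
    "I": {"I": 0.75, "U": 0.05, "E": 0.12, "-": 0.08},
    "U": {"U": 0.80, "I": 0.06, "O": 0.08, "-": 0.06},
}
FLOOR = 1e-3  # assumed probability for pairs missing from the matrix


def similarity(dictionary_seq: Sequence[str],
               recognized_seq: Sequence[str],
               cm: Dict[str, Dict[str, float]] = CM) -> float:
    """Sum of log recognition probabilities over aligned phoneme pairs."""
    if len(dictionary_seq) != len(recognized_seq):
        raise ValueError("sequences must be aligned to the same length")
    return sum(math.log(cm.get(d, {}).get(r, FLOOR))
               for d, r in zip(dictionary_seq, recognized_seq))


# A perfect match scores higher than a sequence with a substitution and a drop.
print(similarity(["I", "U"], ["I", "U"]) > similarity(["I", "U"], ["U", "-"]))  # True
```

A real system would also need an alignment step for insertions and deletions; that step is left out here because the source does not describe it.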

In the above conventional example, the recognized phoneme sequence of word A may show a characteristic pattern of phoneme recognition errors, so that it is more similar to the dictionary phoneme sequence of word B than to that of word A. In that case the input speech of word A would be systematically misrecognized as B. The purpose of the likelihood weight is to adjust the similarity values so as to prevent this.

The value of this likelihood weight was set in advance, by running recognition experiments on a large amount of speech data, to the value that maximized the word recognition rate. If the phoneme recognition tendency of each word never changed, the conventional method would be adequate. However, when the phoneme recognition tendency changes because of the transmission path, ambient noise, and so on, the likelihood weight is no longer optimal and the word recognition rate falls.

Object of the Invention

The object of the present invention is to solve the above problem of the conventional example and to provide a word speech recognition method that maintains a high word recognition rate even when the environment changes.

Structure of the Invention

To achieve the above object, the word speech recognition method of the present invention is characterized in that, each time a misrecognition occurs during actual word speech recognition, the value of the likelihood weight used to calculate the weighted similarity between the input speech and a dictionary phoneme sequence is corrected in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary entry of the input word, and the corrected value is used in subsequent word speech recognition.

Description of the Embodiment

The word speech recognition method of this embodiment is an improvement on the conventional example, and the outline of its recognition algorithm is shown in the figure, as for the conventional example. It shares the following steps with the conventional example: phoneme recognition is performed on the input speech, the similarity between the recognized phoneme sequence and the dictionary phoneme sequence of each dictionary entry in the word dictionary is obtained, a likelihood weight is added to or subtracted from this similarity to give the weighted similarity, and the dictionary entry with the maximum weighted similarity is taken as the recognized word. It differs from the conventional example in that the likelihood weight is not a fixed value but is corrected toward the optimal value as the phoneme recognition tendency changes.

The correction method is as follows. Suppose that, when word A ranks first in similarity, the likelihood weight added to the similarity of word B is n points. If the word recognition result is A for an input of B, the likelihood weight is automatically corrected to n + m points, and n + m points is used in subsequent recognition. If a correct recognition result is obtained for an input of A or B, the likelihood weight is left unchanged. If the word recognition result is B for an input of A, the likelihood weight is corrected in the opposite direction, to n - m points. By continuing this process, the likelihood weight converges to the value at which A and B are equally likely to be misrecognized as each other, which is the value at which the overall recognition rate is highest. Whether the initial value is 0 points or a value obtained experimentally in advance, repeated correction converges to the same value.
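The correction rule for a single word pair can be sketched as below. The function name `update_weight`, the weight table layout, and the step size default are assumptions for illustration. The sketch also assumes that the word actually spoken is known after each recognition (for example from user confirmation), which the source does not spell out.

```python
from typing import Dict, Tuple


def update_weight(weights: Dict[Tuple[str, str], float],
                  a: str, b: str,
                  spoken: str, recognized: str,
                  m: float = 1.0) -> float:
    """Correct the weight n that is added to b's similarity when a ranks first.

    weights[(a, b)] holds n; a missing entry starts from 0 points.
    """
    n = weights.get((a, b), 0.0)
    if spoken == b and recognized == a:
        n += m   # B was misrecognized as A: favor B more (n -> n + m)
    elif spoken == a and recognized == b:
        n -= m   # A was misrecognized as B: favor B less (n -> n - m)
    # Correct recognitions leave n unchanged.
    weights[(a, b)] = n
    return n


weights: Dict[Tuple[str, str], float] = {}
print(update_weight(weights, "A", "B", spoken="B", recognized="A"))  # 1.0
print(update_weight(weights, "A", "B", spoken="B", recognized="B"))  # 1.0
print(update_weight(weights, "A", "B", spoken="A", recognized="B"))  # 0.0
```

Because the weight moves only on errors and always in the direction that would have helped the spoken word, it keeps adjusting while recognition is in use and needs no separate retraining pass.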

In this embodiment, even if the phoneme recognition tendency changes because of the transmission path, ambient noise, and so on, a corrected likelihood weight is obtained while word speech recognition is being performed, and using it in the next word speech recognition makes it possible to maintain a high recognition rate.
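The convergence behavior described for the embodiment can be read as a balance condition on the expected change of the weight. The symbols below are assumptions introduced for this sketch and do not appear in the source: $p_A$ and $p_B$ are the probabilities that A or B is spoken, and $e_{B \to A}(n)$ and $e_{A \to B}(n)$ are the probabilities that B is misrecognized as A, and A as B, when the weight is $n$.

```latex
% Expected change of the weight per utterance, with step size m:
\mathbb{E}[\Delta n] \;=\; m \,\bigl( p_B \, e_{B \to A}(n) \;-\; p_A \, e_{A \to B}(n) \bigr)

% The weight stops drifting at a value n^* where the two error rates balance:
p_B \, e_{B \to A}(n^*) \;=\; p_A \, e_{A \to B}(n^*)
```

Raising $n$ favors B, so $e_{B \to A}(n)$ falls and $e_{A \to B}(n)$ rises as $n$ grows; the drift is therefore positive below $n^*$ and negative above it. When A and B are spoken equally often, the balance reduces to $e_{B \to A}(n^*) = e_{A \to B}(n^*)$, the equal-misrecognition state named in the description.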

The present invention is not limited to the case where the recognized phoneme sequence is uniquely determined. The same effect is obtained when a recognized phoneme sequence in lattice form is used, and also when the word speech recognition algorithm does not produce an explicit recognized phoneme sequence but instead obtains the likelihood by directly matching the analysis result of the input speech against the dictionary phoneme sequence of each entry in the phoneme-notated word dictionary.

Effects of the Invention

According to the present invention, a high word recognition rate can be maintained even when the environment, such as the transmission path or ambient noise, changes.

[Brief explanation of the drawing]

The figure shows an outline of the word speech recognition method, for explaining an embodiment of the present invention and its conventional example.

Claims

A word speech recognition method in which words are recognized by calculating the similarity between input speech and the dictionary phoneme sequence of each dictionary entry in a phoneme-notated word dictionary, wherein, when the similarity is calculated for the input speech, a likelihood weight predetermined for each combination of a dictionary entry and the dictionary entry ranked first in similarity is applied to the similarity of each dictionary entry by addition, subtraction, multiplication, or division to give a weighted similarity, and the dictionary entry with the highest weighted similarity is taken as the recognized word, the method being characterized in that, each time a misrecognition occurs, the value of the likelihood weight is automatically corrected in the direction that reduces the difference between the first-ranked weighted similarity at that time and the weighted similarity of the dictionary entry of the input word.