JPS617894A

JPS617894A - Voice recognition

Info

Publication number: JPS617894A
Application number: JP59129854A
Authority: JP
Inventors: 入間野　孝雄; 金指　久則; 秋場　国夫
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 1984-06-22
Filing date: 1984-06-22
Publication date: 1986-01-14
Also published as: JPH0458636B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application
The present invention relates to a speech recognition method that recognizes words by matching input speech against a word dictionary written in phoneme notation.

Configuration of the Conventional Example and Its Problems
A conventional speech recognition method will be explained with reference to the drawing.

In the figure, the word dictionary lists every word to be recognized in phoneme notation; for example, the words asahi, yūki, and yuki are written /ASAHI/, /JUUKI/, and /JUKI/. A standard pattern for each phoneme is prepared in advance, for example through preliminary experiments.

Next, the operation of the conventional example is described. The input speech is analyzed frame by frame, and parameters are extracted to form a parameter time series; these parameters are computed in advance. A similarity is then computed for each dictionary entry. During this computation, the input speech is segmented into phonemes according to the phoneme sequence of that entry, and for each segmented interval a likelihood is computed: a measure of how probable it is that the interval is an utterance of that phoneme. The similarity of the entry is the mean of the likelihoods of its phonemes, and the entry with the largest similarity is taken as the recognized word.
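The final matching step can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. It assumes that segmentation has already produced one likelihood per phoneme for each dictionary entry. The names `similarity` and `recognize` and all likelihood values are hypothetical.

```python
from statistics import mean


def similarity(phoneme_likelihoods: list[float]) -> float:
    # Similarity of one dictionary entry: the mean of its phoneme likelihoods.
    return mean(phoneme_likelihoods)


def recognize(likelihoods_by_entry: dict[str, list[float]]) -> str:
    # The entry with the largest similarity is taken as the recognized word.
    return max(likelihoods_by_entry, key=lambda word: similarity(likelihoods_by_entry[word]))


# Hypothetical likelihoods, as if the input had already been segmented
# against each entry's phoneme sequence (one value per phoneme).
scores = {
    "ASAHI": [0.2, 0.1, 0.3, 0.2, 0.1],
    "JUUKI": [0.8, 0.7, 0.7, 0.9, 0.8],
    "JUKI": [0.8, 0.5, 0.9, 0.8],
}
print(recognize(scores))  # JUUKI
```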

For a long vowel such as the /UU/ in /JUUKI/, the boundary between the first and second /U/ usually cannot be found. /UU/ is therefore segmented as a single unit, and its likelihood is also computed as a single unit. A long vowel such as /UU/ could be treated as one long-vowel phoneme, but this conventional example treats it as two consecutive /U/ phonemes. Accordingly, the likelihood computed for /UU/ counts as the likelihood of two phonemes.
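One way to read "counts as the likelihood of two phonemes" is that the grouped /UU/ segment is weighted twice when the entry's phoneme likelihoods are averaged. The sketch below assumes that reading. The `Segment` type, its fields, and the numbers are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    label: str         # segmented unit, e.g. "J", "UU", "K", "I"
    likelihood: float  # likelihood computed for the whole interval
    n_phonemes: int    # 2 for a grouped long vowel such as "UU", otherwise 1


def similarity(segments: list[Segment]) -> float:
    # Mean over phonemes: a grouped long vowel contributes its likelihood once per phoneme.
    total = sum(s.likelihood * s.n_phonemes for s in segments)
    return total / sum(s.n_phonemes for s in segments)


juuki = [Segment("J", 0.8, 1), Segment("UU", 0.6, 2), Segment("K", 0.9, 1), Segment("I", 0.7, 1)]
print(round(similarity(juuki), 2))  # 0.72
```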

In this conventional example, the likelihood $\ell_i$ of the i-th phoneme in the dictionary phoneme sequence is given by

$$\ell_i = \ell_{s,i} - \ell_{p,i} \qquad (1)$$

Here $\ell_{s,i}$ measures how well the parameters of each frame in the segmented interval match the standard pattern of that phoneme. $\ell_{p,i}$ is the penalty subtracted from the likelihood when the segmented interval is too long or too short. A long vowel groups several phonemes for the likelihood computation, so its formula differs slightly from Equation (1). It is essentially the same, however: the likelihood is determined by the degree of match with the standard pattern minus a length-dependent penalty.

The words /JUUKI/ and /JUKI/ differ only in whether /U/ is long or short. To discriminate such words, this conventional example applies two penalties. It penalizes the likelihood of a long vowel (here /UU/) when the vowel's interval is shorter than a predetermined threshold. It also penalizes the likelihood of an ordinary short vowel (here /U/) when that vowel's interval is longer than a threshold.
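Equation (1) together with the conventional length check can be sketched as follows. Only the two rules used for long/short discrimination are modeled. The text gives no values, so the frame thresholds, the fixed penalty size, and the function names are all hypothetical.

```python
VOWEL_LONG_MIN = 12   # hypothetical: a long vowel shorter than this (frames) is too short
VOWEL_SHORT_MAX = 10  # hypothetical: a short vowel longer than this (frames) is too long
PENALTY = 0.3         # hypothetical fixed penalty size


def conventional_penalty(is_long_vowel: bool, vowel_frames: int) -> float:
    # Conventional rule: only the vowel's own interval length is checked.
    if is_long_vowel and vowel_frames < VOWEL_LONG_MIN:
        return PENALTY
    if not is_long_vowel and vowel_frames > VOWEL_SHORT_MAX:
        return PENALTY
    return 0.0


def likelihood(match: float, penalty: float) -> float:
    # Equation (1): match with the standard pattern minus the length penalty.
    return match - penalty


print(likelihood(0.8, conventional_penalty(True, 9)))   # /UU/ of 9 frames: penalized
print(likelihood(0.8, conventional_penalty(False, 9)))  # /U/ of 9 frames: not penalized
```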

However, the conventional example above has the following drawback. Suppose the input word is /JUUKI/ and the dictionary phoneme sequence is also /JUUKI/. The segmentation of /J/ uses the region of large parameter change to locate the boundary between /J/ and /U/. As a result, the /J/ interval often becomes very long and the /UU/ interval becomes correspondingly short. The reason is that when a semivowel is followed by a long vowel, its characteristic parameter transition lasts longer than when it is followed by a short vowel. Perceptually, the /UU/ of /JUU/ is clearly longer than the /U/ of /JU/. Yet the /UU/ of /JUU/ is not simply a stretched /U/: the character of /J/ itself persists longer. Consequently, when /J/ is segmented as described, the longer /J/ absorbs most of the extra length of /JUU/ over /JU/, and the remaining /UU/ turns out short for a long vowel. For the input /JUUKI/, the too-short penalty then lowers the likelihood of /UU/ in the dictionary entry /JUUKI/. Meanwhile, the /U/ in the entry /JUKI/ is not penalized and keeps a high likelihood. The similarity of /JUKI/ therefore becomes larger, and the word is misrecognized as /JUKI/.
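The mechanism can be illustrated with a toy model. The sketch below is an assumption, not the patent's segmentation algorithm. A single hypothetical parameter glides from a /J/ value to a /U/ value, and /J/ is taken to last as long as that parameter keeps changing. As the text describes, the glide is made longer before a long vowel. The function names and all frame counts are made up.

```python
def segment_semivowel(x: list[float], min_change: float = 0.05) -> tuple[int, int]:
    # Toy boundary rule (assumed): /J/ lasts while the parameter is still moving;
    # the vowel starts at the first frame after which the parameter stays put.
    for t in range(len(x) - 1):
        if abs(x[t + 1] - x[t]) < min_change:
            return t, len(x) - t  # (semivowel frames, vowel frames)
    return len(x), 0


def glide(transition: int, steady: int) -> list[float]:
    # Hypothetical parameter track: linear glide from the /J/ value (1.0)
    # to the /U/ value (0.0), followed by a steady vowel.
    return [1.0 - i / transition for i in range(transition)] + [0.0] * steady


ju = glide(transition=6, steady=8)    # /JU/: 14 frames in total
juu = glide(transition=14, steady=9)  # /JUU/: 23 frames, but with a longer glide
print(segment_semivowel(ju))   # (6, 8)
print(segment_semivowel(juu))  # (14, 9)
```

With these made-up numbers, the syllable grows from 14 to 23 frames, yet the vowel gains only one frame (8 to 9). The rest is absorbed by /J/, and that is the situation in which the conventional too-short penalty hits /UU/.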

Object of the Invention
The present invention eliminates the above drawback of the conventional example. Its object is to improve the accuracy of the similarity computation and thereby the word recognition rate.

Constitution of the Invention
To achieve this object, the present invention judges whether the length of a long or short vowel following a semivowel is appropriate by using the sum of that vowel's interval length and the preceding semivowel's interval length. Even when the long-vowel interval is short, the long vowel receives no too-short penalty if its sum with the preceding semivowel interval is long enough. Conversely, even when the short-vowel interval is short, the likelihood is penalized if its sum with the preceding semivowel interval is long. This improves the accuracy of the likelihood and similarity computations.

Description of an Embodiment
An embodiment of the present invention is described below with reference to the drawing. The basic configuration of this embodiment is the same as that of the conventional example, as are the word dictionary and the phoneme standard patterns.

The operation of this embodiment is described next. It differs from the conventional example only in how the likelihood of a vowel following a semivowel is computed. Everything else is the same, so only that likelihood computation is described.

Here, "semivowel" covers not only semivowels at the beginning of a word or between vowels. It also covers the semivowel part of a contracted sound (yōon), i.e., the /J/ in sounds such as rya (/RJA/).

As in the conventional example, the input speech is segmented according to the dictionary phoneme sequence. The likelihood is obtained with Equation (1), or, for a long vowel, with an equation modeled on Equation (1). In this embodiment, however, the penalty (that is, the way the penalty term $\ell_{p,i}$ of Equation (1) is determined) differs from the conventional example.

For a long vowel that follows a semivowel, the too-short penalty is not based on the length of the long-vowel interval alone. Instead, a threshold is set on the sum of the lengths of the long-vowel interval and the preceding semivowel interval, and the penalty is applied when that sum is shorter than the threshold. Likewise, for a short vowel that follows a semivowel, the too-long penalty is based on the sum of the lengths of the short-vowel interval and the preceding semivowel interval. A threshold is set on that sum, and the penalty is applied when the sum is longer than the threshold.
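The embodiment's rule can be sketched as a change to the penalty term only. Everything here is illustrative. The thresholds (in frames), the fixed penalty, and the function name are hypothetical. Keeping the conventional vowel-only check for vowels that do not follow a semivowel is also an assumption. /J/ is the only semivowel the text names.

```python
from typing import Optional

SEMIVOWELS = {"J"}                        # /J/ is the semivowel named in the text
VOWEL_LONG_MIN, VOWEL_SHORT_MAX = 12, 10  # hypothetical vowel-only thresholds (frames)
SUM_LONG_MIN, SUM_SHORT_MAX = 20, 18      # hypothetical semivowel + vowel thresholds (frames)
PENALTY = 0.3                             # hypothetical fixed penalty size


def length_penalty(is_long_vowel: bool, vowel_frames: int,
                   prev_phoneme: Optional[str] = None, prev_frames: int = 0) -> float:
    if prev_phoneme in SEMIVOWELS:
        # Proposed rule: judge the vowel together with the preceding semivowel.
        length, long_min, short_max = prev_frames + vowel_frames, SUM_LONG_MIN, SUM_SHORT_MAX
    else:
        # Assumed: other vowels keep the conventional vowel-only check.
        length, long_min, short_max = vowel_frames, VOWEL_LONG_MIN, VOWEL_SHORT_MAX
    if is_long_vowel and length < long_min:
        return PENALTY  # long vowel too short
    if not is_long_vowel and length > short_max:
        return PENALTY  # short vowel too long
    return 0.0


print(length_penalty(True, 9, "J", 14))   # /UU/ after a 14-frame /J/: no penalty
print(length_penalty(False, 9, "J", 14))  # /U/ after the same /J/: penalized
```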

An example of the effect of this embodiment follows. It uses the same input word /JUUKI/ shown in the figure, as in the conventional example.

When the dictionary entry is also /JUUKI/, the segmentation result is the same as in the conventional example: the /J/ interval is long, and the /UU/ interval is short for a long vowel. In this embodiment, however, the sum of the /UU/ interval and the /J/ interval is examined. Because the /J/ interval is long, no too-short penalty was applied to /UU/.

When the same input /JUUKI/ is matched against the entry /JUKI/, the segmentation gave /J/ the same interval as the /J/ of /JUUKI/. It gave /U/ the same interval as the /UU/ of /JUUKI/. The length of this /U/ was a standard length for the short vowel /U/. In this embodiment, however, the sum of the /U/ interval and the /J/ interval is examined. Because the /J/ interval is very long, the sum exceeded the threshold, and the likelihood of /U/ received the too-long penalty.

As a result, for the input /JUUKI/, the similarity of the entry /JUUKI/ is larger than in the conventional example by the amount of the avoided too-short penalty on /UU/. The similarity of the entry /JUKI/ is smaller by the amount of the too-long penalty on /U/. The word was therefore correctly recognized as /JUUKI/.
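The effect described above can be reproduced with made-up numbers. In this sketch, the input /JUUKI/ is segmented identically for both entries (/J/ = 14 frames, vowel = 9 frames). The match scores, thresholds, and penalty are hypothetical, and the grouped /UU/ counts as two phonemes in the average. The sketch only illustrates the direction of the effect; it does not reproduce any measured result.

```python
PENALTY = 0.3                             # hypothetical fixed penalty size
VOWEL_LONG_MIN, VOWEL_SHORT_MAX = 12, 10  # hypothetical vowel-only thresholds (frames)
SUM_LONG_MIN, SUM_SHORT_MAX = 20, 18      # hypothetical semivowel + vowel thresholds (frames)

# Hypothetical segmentation of the input /JUUKI/, shared by both entries.
J_FRAMES, VOWEL_FRAMES = 14, 9
J_MATCH, VOWEL_MATCH, K_MATCH, I_MATCH = 0.8, 0.8, 0.9, 0.8


def penalty(is_long: bool, use_sum: bool) -> float:
    if use_sum:  # proposed: include the preceding semivowel /J/
        length, long_min, short_max = J_FRAMES + VOWEL_FRAMES, SUM_LONG_MIN, SUM_SHORT_MAX
    else:        # conventional: vowel interval only
        length, long_min, short_max = VOWEL_FRAMES, VOWEL_LONG_MIN, VOWEL_SHORT_MAX
    if is_long and length < long_min:
        return PENALTY
    if not is_long and length > short_max:
        return PENALTY
    return 0.0


def similarity(is_long: bool, use_sum: bool) -> float:
    vowel = VOWEL_MATCH - penalty(is_long, use_sum)  # Equation (1) for the vowel
    weight = 2 if is_long else 1                     # grouped /UU/ counts as two phonemes
    return (J_MATCH + weight * vowel + K_MATCH + I_MATCH) / (3 + weight)


def recognize(use_sum: bool) -> str:
    scores = {"JUUKI": similarity(True, use_sum), "JUKI": similarity(False, use_sum)}
    return max(scores, key=scores.get)


print("conventional:", recognize(use_sum=False))  # conventional: JUKI
print("proposed:", recognize(use_sum=True))       # proposed: JUUKI
```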

This embodiment applies two penalties by thresholding the sum of the vowel interval length and the preceding semivowel interval length. One penalizes the likelihood of a long vowel that follows a semivowel and is too short. The other penalizes the likelihood of a short vowel that follows a semivowel and is too long. As a result, long and short vowels following a semivowel are penalized appropriately, which improves the accuracy of the likelihood computation.

Effects of the Invention
With the configuration described above, the present invention restricts a long vowel following a semivowel from being too short, and a short vowel following a semivowel from being too long. It does so by setting thresholds on the sum of the vowel's interval length and the preceding semivowel's interval length and penalizing the likelihood accordingly. This improves the accuracy of the similarity computation by sharpening the discrimination between long and short vowels, and yields a high word recognition rate.

[Brief explanation of drawings]

The figure shows the speech recognition method of the conventional example and of an embodiment of the present invention.

Claims

[Claims]

(1) A speech recognition method comprising the following features:
- a word dictionary in which the words to be recognized are written in phonemes, and a standard pattern for each phoneme;
- when the input speech is matched against each entry of the word dictionary, the input speech is segmented into phonemes, and a likelihood is computed for each segmented speech interval as a measure of the probability that the interval is an utterance of the corresponding phoneme;
- the input word is recognized from the similarity of each dictionary entry obtained from these likelihood values;
- the restriction on a long vowel following a semivowel being too short is applied to the sum of the lengths of the long vowel and the preceding semivowel.

(2) The speech recognition method according to claim 1, wherein the restriction on a short vowel following a semivowel being too long is applied to the sum of the lengths of the short vowel and the preceding semivowel.