JPH01260499A

JPH01260499A - Consonant recognizing method

Info

Publication number: JPH01260499A
Application number: JP8982988A
Authority: JP
Inventors: Masakatsu Hoshimi; 昌克星見; Katsuyuki Futayada; 二矢田　勝行
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1988-04-12
Filing date: 1988-04-12
Publication date: 1989-10-17

Abstract

PURPOSE: To improve the accuracy of the major classification of consonants, and thereby the recognition rate, by applying the magnitudes of the dips in the low-frequency power and high-frequency power of the input speech spectrum to a discriminant diagram prepared for each subsequent vowel, and classifying the consonant into four consonant groups and several intermediate regions. CONSTITUTION: A power dip detecting part 10 detects power dips in the low-frequency power and the high-frequency power, and a power dip magnitude extracting part 11 derives the magnitude of each dip. For each phoneme group, a standard pattern consisting of variances, covariances and mean values is generated using the low-frequency and high-frequency power dip magnitudes as parameters, and a discriminant diagram is generated in advance that assigns each input to the phoneme group of highest similarity. When consonants are roughly divided into four phoneme groups using this discriminant diagram, misclassifications occur near the boundaries of the diagram, and recognition cannot be performed correctly. Therefore, intermediate regions are set near the boundaries between phoneme groups, and a consonant classified into an intermediate region is matched against the standard patterns of both adjacent phoneme groups to recognize the consonant. In this way, the recognition rate can be improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of the Invention

The present invention relates to a consonant recognition method for use in a speech recognition method characterized by performing phoneme recognition.

Background of the Invention

Consonant recognition methods have recently come into wide use in the field of speech recognition. One known configuration is described, for example, in "Word speech recognition system using the outline form of the speech spectrum and its dynamic characteristics" (Journal of the Acoustical Society of Japan, Vol. 34, No. 3, 1978).

A conventional consonant recognition method is described below with reference to FIGS. 13 and 14.

FIG. 13 is a functional block diagram of a conventional word recognition device, which first divides the input speech into phoneme units and recognizes it as a combination of phonemes (called phoneme recognition), then computes the similarity to a word dictionary written in phoneme units and outputs the recognition result. First, speech from many speakers is analyzed in advance, frame by frame (one frame is 10 msec), by the acoustic analysis unit 1 using a filter bank, and the feature extraction unit 2 derives feature parameters from the resulting spectral information. From these feature parameters, a standard pattern is created for each of the five vowels and for each consonant phoneme group, and is registered in the standard pattern registration unit 3. When recognition is actually performed, the segmentation unit 4 segments the consonants using the feature parameters obtained by the feature extraction unit 2.

Based on this result, the phoneme discrimination unit 5 determines each phoneme by matching against the standard patterns in the standard pattern registration unit 3. Finally, the resulting time series of phonemes is sent to the word recognition unit 6, and the word whose entry in the word dictionary 7, likewise expressed as a time series of phonemes, has the highest similarity is output as the recognition result.

Segmentation works as follows. As shown in FIG. 14, when the temporal change 8 of the full-band power has a concave shape (called a dip), let n1 be the frame at which the power reaches its local minimum, and let n2 and n3 be the frames before and after n1 at which the rate of change of the power over time 9 (called the power difference value) reaches its negative and positive extrema, respectively. Writing the difference value at frame n as WD(n), the section from n2 to n3 is taken as a consonant section when WD(n2) and WD(n3) satisfy

    WD(n2) ≤ -θw,  WD(n3) ≥ θw

Here θw is a threshold for preventing the insertion of spurious consonants.
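The dip-based segmentation rule can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name find_consonant_sections, the use of a plain frame-to-frame difference for WD(n), and the way n2 and n3 are searched on the falling and rising sides of the dip are hypothetical choices, not details given in the text. The sketch also assumes the threshold condition reads WD(n2) ≤ -θw and WD(n3) ≥ θw.

```python
def find_consonant_sections(power, theta_w):
    """Return (n2, n3) frame pairs for dips in a full-band power contour.

    Hypothetical helper: WD(n) is approximated by power[n] - power[n-1].
    """
    wd = [0.0] + [power[n] - power[n - 1] for n in range(1, len(power))]
    sections = []
    for n1 in range(1, len(power) - 1):
        # n1 is a local minimum when the power falls into it and rises out of it.
        if not (wd[n1] < 0 and wd[n1 + 1] > 0):
            continue
        # Falling side: contiguous frames up to n1 with a negative difference.
        left = n1
        while left > 1 and wd[left - 1] < 0:
            left -= 1
        n2 = min(range(left, n1 + 1), key=lambda n: wd[n])
        # Rising side: contiguous frames after n1 with a positive difference.
        right = n1 + 1
        while right < len(power) - 1 and wd[right + 1] > 0:
            right += 1
        n3 = max(range(n1 + 1, right + 1), key=lambda n: wd[n])
        # Assumed threshold test: both slopes must be steep enough.
        if wd[n2] <= -theta_w and wd[n3] >= theta_w:
            sections.append((n2, n3))
    return sections
```

Raising theta_w rejects shallow dips, which is how the threshold suppresses spurious consonant sections. The sketch ignores flat-bottomed dips, where the difference at the minimum is exactly zero.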

Next, feature parameters representing the phoneme characteristics are obtained for each frame of this consonant section and compared with the standard pattern prepared in advance for each phoneme, so that consonant classification is performed frame by frame. The results are then applied to a consonant classification tree, and the consonant is assigned to the class whose conditions are met.

Problems to be Solved by the Invention

In the above configuration, however, word-medial consonants are first segmented using power dips, and consonant classification is then performed for each frame.

Finally, the frame-by-frame classification results are applied to the consonant classification tree and the consonant is assigned to the class whose conditions are met. The algorithm is therefore very complex and laborious. In addition, because only the full-band power is used as power information, the accuracy of consonant segmentation is poor.

The present invention solves these problems of the prior art, and its object is to perform the major classification and recognition of consonants in input speech very simply and accurately.

Means for Solving the Problems

The present invention achieves the above object by applying the magnitudes of the dips in the low-frequency power and the high-frequency power of the input speech spectrum to a discriminant diagram prepared for each subsequent vowel, thereby classifying the consonant into four consonant groups and several intermediate regions.

Operation

With the above configuration, the present invention applies the magnitudes of the low-frequency and high-frequency power dips to a discriminant diagram for each subsequent vowel, so that the major classification of consonants can be performed simply and accurately, and the recognition rate can be improved.

Embodiments

Embodiments of the present invention are described below with reference to the drawings.

This embodiment describes an example in which the phonemes /p/, /t/, /k/, /c/, /b/, /d/, /m/, /n/, /s/ and /h/ are divided into four phoneme groups, namely voiceless plosives (/p/, /t/, /k/, /c/), voiced plosives (/b/, /d/), nasals (/m/, /n/) and voiceless fricatives (/s/, /h/), with the vicinity of the boundaries between the groups treated as intermediate regions, and the consonants are recognized after this major classification.

FIG. 1 is a functional block diagram embodying the consonant recognition method in one embodiment of the present invention. In FIG. 1, 10 is a power dip detection unit, and 11 is a power dip magnitude extraction unit, which obtains the magnitude of the dip detected by the power dip detection unit 10. 12 is a frame-by-frame vowel recognition unit, which performs vowel recognition for each frame starting from the end of the dip detected by the power dip detection unit 10. 13 is a subsequent vowel recognition unit, which recognizes the subsequent vowel from the frame-by-frame results obtained by the vowel recognition unit 12. 14 is a discriminant diagram selection unit, which selects, from the per-vowel discriminant diagram storage unit 15, the discriminant diagram for the subsequent vowel recognized by the subsequent vowel recognition unit 13.

16 is a major classification determination unit, which determines the major classification of the consonant using the discriminant diagram selected by the discriminant diagram selection unit 14. The standard pattern selection unit 17 retrieves the standard patterns required by the result of the major classification determination unit 16 from the standard pattern storage unit 18, and the consonant recognition unit 19 matches the input against these standard patterns to recognize the consonant.
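The data flow through units 14 to 19 of FIG. 1 can be pictured with the following Python sketch. It is a hypothetical wiring only: the function name recognize_consonant, the representation of a discriminant diagram as a callable that returns a list of group names, and the match callback that scores a standard pattern are all assumptions made for illustration, since the text does not specify data structures.

```python
def recognize_consonant(pl, ph, vowel, maps_by_vowel, patterns_by_group, match):
    """Sketch of units 14-19 in FIG. 1 for one detected power dip.

    maps_by_vowel:     vowel -> callable(pl, ph) returning a list of group names
                       (one group, or two for an intermediate region)
    patterns_by_group: group name -> {phoneme: standard pattern}
    match:             callable(pattern) returning a similarity score
    All three structures are illustrative assumptions.
    """
    discriminant_map = maps_by_vowel[vowel]   # unit 14 selects from storage unit 15
    groups = discriminant_map(pl, ph)         # unit 16: major classification
    candidates = {}                           # unit 17 gathers patterns from unit 18
    for group in groups:
        candidates.update(patterns_by_group[group])
    # Unit 19: pick the phoneme whose standard pattern matches best.
    return max(candidates, key=lambda phoneme: match(candidates[phoneme]))
```

Because the major classification narrows the candidate set before any pattern matching, only the standard patterns of one or two phoneme groups need to be scored for each consonant.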

The operation of the above configuration is described below.

The present invention uses low-frequency power and high-frequency power as feature parameters. For voiced consonants a power dip tends to appear in the high-frequency power, and for voiceless consonants in the low-frequency power.

Therefore, using low-frequency and high-frequency power together makes it possible to handle all consonants. Furthermore, because the magnitude of the power dip is affected by the subsequent vowel, accuracy improves if a separate discriminant diagram is created for each subsequent vowel.

More specifically, power dips are first detected in the low-frequency power and the high-frequency power, and the magnitude of each dip is obtained. FIGS. 2(a) and 2(b) illustrate how the magnitude is obtained. In the figures, let n1 be the frame at which the temporal rate of change 21 of the high-frequency power reaches its positive maximum, and let n2 be the frame at which the temporal rate of change 23 of the low-frequency power reaches its positive maximum. Let WD(n1) and WD(n2) be the magnitudes of the rate of change at these frames. The magnitude PL of the low-frequency power dip and the magnitude PH of the high-frequency power dip are then defined as

    PH = WD(n1)
    PL = WD(n2)
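As a rough illustration of these definitions, the Python sketch below computes PL and PH from two band-power contours. The names dip_magnitudes and rate_of_change, the frame-to-frame difference used for WD(n), and the start and end arguments delimiting the search range are assumptions for the example; the text states only that the positive maximum of the rate of change is taken in each band.

```python
def rate_of_change(power):
    """Approximate WD(n) by the frame-to-frame difference (an assumption)."""
    return [0.0] + [power[n] - power[n - 1] for n in range(1, len(power))]


def dip_magnitudes(low_power, high_power, start, end):
    """Return (PL, PH) for the frames start..end inclusive.

    PH = WD(n1), where n1 maximizes the high-frequency rate of change.
    PL = WD(n2), where n2 maximizes the low-frequency rate of change.
    """
    wd_low = rate_of_change(low_power)
    wd_high = rate_of_change(high_power)
    frames = range(start, end + 1)
    n1 = max(frames, key=lambda n: wd_high[n])
    n2 = max(frames, key=lambda n: wd_low[n])
    return wd_low[n2], wd_high[n1]
```

The two bands are searched independently, so n1 and n2 need not coincide; in the toy contours used to check this sketch they fall on different frames.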

Using these dip magnitudes, the distributions of PL and PH for voiceless plosives (/p/, /t/, /k/, /c/), voiced plosives (/b/, /d/), nasals (/m/, /n/) and voiceless fricatives (/s/, /h/) are as shown in FIGS. 3 to 6. In each figure the horizontal axis is PL, the vertical axis is PH, and the numbers indicate how many times phonemes occurred. As the figures show, phonemes with a plosive character have large PL and PH; in particular, voiceless plosives have large PL and voiced plosives have large PH. Phonemes without a plosive character have small PL and PH, but they separate as in FIGS. 5 and 6 depending on whether the consonant is voiced or voiceless.

Therefore, consonants can be broadly classified using the magnitudes of the low-frequency and high-frequency power dips. For each phoneme group, a standard pattern consisting of the variances, covariances and mean values is created with PL and PH as parameters, and a discriminant diagram is created in advance that assigns each input to the phoneme group with the highest similarity. When consonants are classified into the four phoneme groups with this discriminant diagram, misclassifications occur near its boundaries. In this example, about 7% of (/p/, /t/, /k/, /c/) is misclassified as (/b/, /d/), and about 8.3% of (/b/, /d/) is misclassified as (/p/, /t/, /k/, /c/). Left as is, a wrong major classification in the discriminant diagram means that matching is performed against the standard patterns of the wrong phoneme group, and correct recognition becomes impossible.
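One way to picture a standard pattern made of means, variances and covariances, and a discriminant diagram precomputed from it, is the Python sketch below. The similarity measure is an assumption: the text does not name one, so a two-dimensional Gaussian log-likelihood (up to a constant) is used here. The function names, the population covariance (dividing by the sample count), and the grid representation of the diagram are likewise hypothetical.

```python
import math


def fit_pattern(samples):
    """Mean and 2x2 covariance of (PL, PH) samples for one phoneme group."""
    n = len(samples)
    ml = sum(pl for pl, _ in samples) / n
    mh = sum(ph for _, ph in samples) / n
    sll = sum((pl - ml) ** 2 for pl, _ in samples) / n
    shh = sum((ph - mh) ** 2 for _, ph in samples) / n
    slh = sum((pl - ml) * (ph - mh) for pl, ph in samples) / n
    return (ml, mh), (sll, slh, shh)


def similarity(point, pattern):
    """Assumed measure: Gaussian log-likelihood without the constant term."""
    (ml, mh), (sll, slh, shh) = pattern
    det = sll * shh - slh * slh  # must be positive (non-degenerate samples)
    dl, dh = point[0] - ml, point[1] - mh
    mahalanobis = (shh * dl * dl - 2 * slh * dl * dh + sll * dh * dh) / det
    return -0.5 * (mahalanobis + math.log(det))


def build_discriminant_map(patterns, pl_values, ph_values):
    """Precompute the most similar group for every (PL, PH) grid point."""
    return {
        (pl, ph): max(patterns, key=lambda g: similarity((pl, ph), patterns[g]))
        for pl in pl_values
        for ph in ph_values
    }
```

Precomputing the map means that classification at recognition time is a table lookup rather than a similarity calculation, which fits the stated aim of keeping the major classification simple.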

Therefore, intermediate regions are set near the boundaries between phoneme groups, and a consonant classified into an intermediate region is matched against the standard patterns of both adjacent phoneme groups. This improves the recognition rate.

FIG. 7 shows an example of a discriminant diagram. The horizontal axis is the magnitude of the low-frequency power dip and the vertical axis is the magnitude of the high-frequency power dip.

The solid lines dividing the diagram into four regions are the discrimination boundaries that apply when there are no intermediate regions. The regions I to V enclosed by dotted lines are the intermediate regions. When the power dip magnitudes of a consonant section fall in an intermediate region, matching is performed against the phoneme standard patterns of the adjacent phoneme groups, as follows:

    Intermediate region I:   /p/, /t/, /k/, /c/, /b/, /d/
    Intermediate region II:  /p/, /t/, /k/, /c/, /s/, /h/
    Intermediate region III: /b/, /d/, /s/, /h/
    Intermediate region IV:  /b/, /d/, /m/, /n/
    Intermediate region V:   /m/, /n/, /s/, /h/

For the four phoneme groups outside the intermediate regions, matching is performed against the standard patterns of the respective phoneme group.
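The effect of an intermediate region on the candidate set can be sketched in Python as follows. The text does not say how the dotted boundaries of the intermediate regions are drawn, so this example substitutes a simple stand-in: an input counts as intermediate when the similarity scores of its two best groups differ by less than a margin. The margin rule, the function names, and the group labels are assumptions; the phoneme lists follow the four groups of this embodiment.

```python
GROUP_PHONEMES = {
    "voiceless_plosive": ["p", "t", "k", "c"],
    "voiced_plosive": ["b", "d"],
    "nasal": ["m", "n"],
    "voiceless_fricative": ["s", "h"],
}


def candidate_groups(similarities, margin):
    """Return one group, or the two best groups when the input is near a boundary.

    similarities: group name -> similarity score for the input (PL, PH).
    margin: assumed width of the intermediate region in similarity units.
    """
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if similarities[best] - similarities[second] < margin:
        return [best, second]
    return [best]


def candidate_phonemes(groups):
    """Phonemes whose standard patterns must be matched for these groups."""
    return [phoneme for group in groups for phoneme in GROUP_PHONEMES[group]]
```

With an input that scores almost equally for voiceless and voiced plosives, the candidate list becomes p, t, k, c, b, d, which corresponds to intermediate region I.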

Examining the magnitude of the power dip for each subsequent vowel also shows that the dip magnitude differs even for the same consonant. Therefore, to improve the accuracy of consonant segmentation, subsequent vowel information is used and the major classification of consonants is performed separately for each subsequent vowel. As an example, FIGS. 8 to 12 show the temporal power patterns for the phoneme /r/ with each subsequent vowel: FIG. 8 shows /ra/, FIG. 9 /ri/, FIG. 10 /ru/, FIG. 11 /re/ and FIG. 12 /ro/.

In the figures, the horizontal axis is time, the vertical axis is the magnitude of the power, the solid lines show the temporal change of the power, the dotted lines show its temporal rate of change, and PL and PH show the behavior of the low-frequency and high-frequency power, respectively. The subsequent vowels are /a/, /i/, /u/, /e/ and /o/ in order from FIG. 8. Comparing the dip magnitudes by subsequent vowel, the high-frequency power dip for /u/ and /o/ (FIGS. 10 and 12) is smaller than for the other subsequent vowels (/a/, /i/, /e/). This is because the power differs slightly from vowel to vowel, so the temporal rate of change of the power from consonant to vowel differs. Examining the temporal change of power in the same way for consonants other than /r/ shows that the magnitude of the power dip likewise depends on the subsequent vowel.

Therefore, accuracy improves if the major classification of consonants is performed for each subsequent vowel using the magnitudes of the low-frequency and high-frequency power dips. The subsequent vowel is recognized frame by frame by computing the similarity between the input speech data and standard patterns of the five vowels created in advance from a large amount of data. FIG. 2(c) shows the first- and second-ranked vowel recognition results for each frame. To determine the subsequent vowel simply, the vowel recognition results for, say, the five frames following the consonant section candidate are used: a vowel scores 2 points when it is ranked first and 1 point when it is ranked second, the points are totalled per vowel over the five frames, and the vowel with the highest score is taken as the recognition result. (In FIG. 2, /e/ has the highest score.)

With the method described above, samples of the feature parameters, namely the magnitudes of the low-frequency and high-frequency power dips, are collected in advance from a large amount of speech data, and a discriminant diagram that discriminates among the four phoneme groups is created. Intermediate regions are set at the boundaries between the phoneme groups in this diagram, and when an input falls in an intermediate region, the consonant is recognized by matching against the standard patterns of both adjacent phoneme groups. The subsequent vowel is determined from the frame-by-frame vowel recognition results, and the discriminant diagram for that vowel is applied.
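The 2-point and 1-point scoring of frame-level vowel results can be written as a few lines of Python. The function name subsequent_vowel and the representation of each frame as a (first-ranked, second-ranked) pair are assumptions for illustration; the five-frame window and the point values follow the text.

```python
def subsequent_vowel(frame_results, start, n_frames=5):
    """Pick the subsequent vowel by rank-weighted voting.

    frame_results: list of (first_ranked, second_ranked) vowel labels per frame.
    start: index of the first frame after the consonant section candidate.
    """
    scores = {}
    for first, second in frame_results[start:start + n_frames]:
        scores[first] = scores.get(first, 0) + 2    # ranked first: 2 points
        scores[second] = scores.get(second, 0) + 1  # ranked second: 1 point
    # Ties are resolved in favor of the vowel that was seen first.
    return max(scores, key=scores.get)
```

For example, with the frame results (e, i), (e, i), (i, e), (e, a), (e, i), the totals are 9 for /e/, 5 for /i/ and 1 for /a/, so /e/ is selected even though one frame ranked /i/ first.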

Effects of the Invention

As described above, the present invention applies the magnitudes of the low-frequency and high-frequency power dips to a discriminant diagram, classifies the input into four phoneme groups and intermediate regions, and matches it against the phoneme standard patterns that are candidates for recognition, so that consonants can be recognized accurately. Because low-frequency and high-frequency power are used together as parameters, the method is effective both for voiced consonants, whose power dip tends to appear in the high-frequency power, and for voiceless consonants, whose power dip tends to appear in the low-frequency power. By setting intermediate regions in the discriminant diagram and matching consonants assigned to these regions against the phoneme standard patterns of both adjacent phoneme groups, the recognition rate of consonants that would be misclassified near the boundaries of the diagram is improved. In addition, because a discriminant diagram is created for each subsequent vowel, more accurate discriminant diagrams can be obtained. Thus the method of the present invention enables accurate consonant recognition, and its effect is substantial.

Brief Description of the Drawings

FIG. 1 is a functional block diagram embodying the consonant recognition method in one embodiment of the present invention; FIG. 2 is an explanatory diagram of the power dip in this embodiment; FIGS. 3 to 7 are distribution diagrams of the power dips of voiceless plosives, voiced plosives, nasals and voiceless fricatives in this embodiment; FIGS. 8 to 12 are diagrams showing the temporal change of the power and its rate of change for /ra/, /ri/, /ru/, /re/ and /ro/ uttered word-medially in this embodiment; FIG. 13 is a block diagram of a conventional word recognition system; and FIG. 14 is an explanatory diagram of a conventional consonant segmentation method.

10: power dip detection unit
11: power dip magnitude extraction unit
12: frame-by-frame vowel recognition unit
13: subsequent vowel recognition unit
14: discriminant diagram selection unit
15: per-vowel discriminant diagram storage unit
16: major classification determination unit
17: standard pattern selection unit
18: standard pattern storage unit
19: consonant recognition unit

Agent: Patent attorney Toshio Nakao and one other

Claims

(1) A consonant recognition method in a speech recognition method characterized by phoneme recognition, wherein the low-frequency power and the high-frequency power of the speech spectrum are obtained, the magnitude of the power dip produced by the temporal change of each is extracted, and these magnitudes are applied to a discriminant diagram created in advance for each subsequent vowel so that the consonant is classified into one of several consonant groups or intermediate regions; a consonant classified into a consonant group is matched against the phoneme standard patterns of that consonant group, and a consonant classified into an intermediate region is matched against the phoneme standard patterns of both adjacent consonant groups.

(2) The consonant recognition method according to claim 1, wherein the magnitude of the power dip is extracted as the magnitude of the temporal rate of change of the power in the transition from the consonant to the subsequent vowel.

(3) The consonant recognition method according to claim 1, wherein the discriminant diagram is created by obtaining in advance, from a large amount of data, the distribution of the power dip magnitudes appearing for each consonant group, and expressing as a diagram the discrimination results for all inputs anticipated in advance.

(4) The consonant recognition method according to claim 1, wherein intermediate regions are set near the boundaries of the discriminant diagram to reduce errors caused by the discriminant diagram.