JPH03239299A

JPH03239299A - Fricative consonant discriminating system

Info

Publication number: JPH03239299A
Application number: JP2036734A
Authority: JP
Inventors: Hideki Kojima; 英樹小島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-02-16
Filing date: 1990-02-16
Publication date: 1991-10-24

Abstract

PURPOSE: To obtain high discrimination performance by deriving a discrimination parameter from both a consonant section and the succeeding vowel transition part. CONSTITUTION: A consonant section analyzing means 2 analyzes the inside of a consonant section and derives a discrimination parameter for the consonant section. A succeeding vowel transition part analyzing means 6 analyzes the succeeding vowel transition part, which extends from the consonant section end point to the succeeding vowel rise point, and derives a discrimination parameter for that part. A likelihood arithmetic means 3 derives the likelihood between the input voice and each fricative consonant category from the two discrimination parameters, and a deciding means 4 selects the category whose derived likelihood is the largest, thereby discriminating the consonant. In this way, fricative consonants that were difficult to discriminate from the consonant part alone can be discriminated, and discrimination performance is improved.

Description

[Detailed Description of the Invention]

[Table of Contents]
Overview
Industrial Application Field
Prior Art and Problems to Be Solved by the Invention
Means for Solving the Problems
Operation
Embodiments
Effects of the Invention

[Overview]

The invention concerns a fricative consonant identification method that identifies fricative consonants in speech during speech recognition processing. To achieve high identification performance, the identification parameter ① is obtained not only from the consonant section but also from the transition parts of the preceding and following vowels. In addition, the analysis conditions, for example the analysis window period (F) and the analysis window length (W), are changed between the consonant section and the vowel transition parts.

[Industrial Application Field]

The present invention relates to a method for identifying fricative consonants, which is important in speech recognition processing.

With recent advances in speech recognition technology, efforts are under way to put voice word processors and voice-based data entry in factories into practical use.

In these fields data is entered by voice, so a high recognition rate for the input speech is required.

In general, accurate speech recognition depends on recognizing both vowels and consonants. Vowels are comparatively easy to recognize, whereas consonants are often difficult.

Improving the consonant recognition rate is therefore important for making voice data entry practical.

[Prior Art and Problems to Be Solved by the Invention]

FIG. 6 is a diagram showing the configuration of a conventional fricative consonant identification method.

In the figure, consonant section detection means 1 detects the start point and the end point of a consonant section from the digital time-series signal of the input speech.

Analysis means 2 analyzes the inside of the detected consonant section and obtains the identification parameter ①.

Next, likelihood calculation means 3 uses standard patterns of the fricative consonants to compute the likelihood between the identification parameter ① and each fricative consonant category (that is, the consonants "φ", "s", and "h").

Decision means 4 then selects the category with the highest likelihood and thereby identifies the fricative consonant.

In this conventional method, identification relies only on the identification parameter ① obtained by analyzing the inside of the consonant section. Even when the transition parts of the following vowel and the preceding vowel contain information useful for identification, that information cannot be used, so sufficient identification performance is not obtained.

In view of this drawback, the object of the present invention is to provide a fricative consonant identification method that identifies fricative consonants in input speech with high identification performance.

[Means to solve the problem]

FIG. 1 is a block diagram of the principle of the present invention. Part (a) shows the case where the identification parameter ① is obtained from both the consonant section and the following vowel transition part. Part (b) shows the case where the identification parameter ① is obtained from each of the preceding vowel transition part, the consonant section, and the following vowel transition part.

The above problem is solved by a fricative consonant identification method configured as follows.

(1) In a fricative consonant identification method for identifying fricative consonants in input speech, the identification parameter ① is obtained from both the consonant section and the following vowel transition part.

(2) In the above fricative consonant identification method, different analysis window periods (F) are used for the consonant section and the following vowel transition part.

(3) In the above fricative consonant identification method, different analysis window lengths (W) are used for the consonant section and the following vowel transition part.

(4) In a fricative consonant identification method for identifying fricative consonants in input speech, the identification parameter ① is obtained from three locations: the preceding vowel transition part, the consonant section, and the following vowel transition part.

(5) In the fricative consonant identification method described in (4) above, different analysis window periods (F) are used for the preceding vowel transition part, the consonant section, and the following vowel transition part.

(6) In the fricative consonant identification method described in (4) above, different analysis window lengths (W) are used for the preceding vowel transition part, the consonant section, and the following vowel transition part.

[Operation]

According to the present invention, to identify fricative consonants, which are important in speech recognition processing, the identification parameter ① is obtained not from the consonant section alone, as in the conventional method, but from the consonant section and also from the transition parts of the vowels before and after the consonant.

First, the digital time-series signal of the input speech is supplied to consonant section detection means 1 and following vowel rise point detection means 5.

Consonant section detection means 1 detects the start point and the end point of the consonant section. Following vowel rise point detection means 5 detects the point at which the transition from the consonant to the following vowel ends and the utterance of the following vowel becomes established.

Consonant section analysis means 2 analyzes the inside of the consonant section and obtains the identification parameter ① for the consonant section.

Following vowel transition part analysis means 6 analyzes the following vowel transition part, which extends from the consonant section end point to the following vowel rise point, and obtains the identification parameter ① for the following vowel transition part.

Likelihood calculation means 3 obtains, from the identification parameter ① for the consonant section and the identification parameter ① for the following vowel transition part, the likelihood between the input speech and each fricative consonant category (that is, "φ", "s", and "h").

Decision means 4 selects the category with the highest likelihood and thereby identifies the consonant.

Furthermore, when ample data is available for building the dictionary, dictionaries can be built separately for each preceding vowel and each following vowel, so that information can also be obtained from the preceding vowel transition part. The principle for this case is shown in FIG. 1(b).

Preceding vowel transition start point detection means 7 detects the point at which the transition from the preceding vowel toward the consonant begins.

Preceding vowel transition part analysis means 8 analyzes the preceding vowel transition part, which extends from the preceding vowel transition start point to the consonant section start point, and obtains the identification parameter ① for the preceding vowel transition part.

Likelihood calculation means 3 obtains the likelihood between the input speech and each fricative consonant category from the identification parameter ① for the preceding vowel transition part, the identification parameter ① for the consonant section, and the identification parameter ① for the following vowel transition part.

Decision means 4 selects the category (consonant) with the highest likelihood and thereby identifies the consonant.

In this way, the present invention extracts features not only from the consonant section but also from the transition parts of the preceding and following vowels, uses all of the identification parameters ① to obtain the likelihood for each category, and takes the category with the highest likelihood as the identification result.

As a result, fricative consonants that were difficult to identify from the consonant part alone can be identified, and identification performance improves.

[Embodiments]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1, described above, is a block diagram of the principle of the present invention. FIG. 2 shows one embodiment of the present invention, and FIG. 3 shows another embodiment. FIG. 4 shows examples of the analysis conditions in the consonant section and the vowel transition parts according to the present invention: (a) for the consonant section, (b) for the following vowel transition part, and (c) for the preceding vowel transition part. FIG. 5 shows an example of detecting the consonant section and the vowel transition parts from the low-band power of the input speech and thresholds. The means needed to carry out the present invention are means that obtain the identification parameter ① not only from the consonant section but also from the transition parts to the vowels before and after the consonant, and that change the analysis conditions, for example the analysis window period (F) and the analysis window length (W), between the consonant section and the vowel transition parts. The same reference numerals denote the same objects throughout the figures.

The fricative consonant identification method of the present invention is explained below using FIGS. 2 and 3, with reference to FIG. 1.

FIG. 2 shows the configuration corresponding to claims (1) to (3) of the fricative consonant identification method according to the present invention.

In this embodiment, word speech containing the voiceless fricative consonants "φ", "s", and "h" is input, and discrimination among those voiceless fricative consonants is performed.

In the figure, the speech data memory is supplied with, and records, the digital time-series signal of input speech consisting of a voiced/voiceless fricative consonant together with the vowel that precedes it and the vowel that follows it.

The digital time-series signal read from this speech data memory is supplied to consonant section start point detection unit 10, consonant section end point detection unit 11, and following vowel rise point detection unit 5.

Consonant section start point detection unit 10 detects, as the consonant section start point, the time at which the low-band power of the input speech falls below a fixed threshold TH1. Consonant section end point detection unit 11 detects, as the consonant section end point, the time at which the low-band power of the input speech rises above a fixed threshold TH2.
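The threshold crossings used here (TH1 and TH2, together with the threshold TH3 for the following vowel rise point described in the next paragraph) can be sketched as a scan over a per-frame low-band power sequence. This is a minimal sketch under stated assumptions: the function names, the representation of the signal as a list of per-frame power values, and the choice to search for each point only after the previous one are illustrative and are not taken from the patent.

```python
def first_index(values, predicate, start=0):
    """Return the first index >= start whose value satisfies predicate, or None."""
    for i in range(start, len(values)):
        if predicate(values[i]):
            return i
    return None


def detect_boundaries(low_power, th1, th2, th3):
    """Locate (consonant start, consonant end, following vowel rise) frame indices.

    low_power is an assumed per-frame low-band power sequence.
    The start is the first frame whose power is below th1, the end is the
    first later frame whose power is above th2, and the rise point is the
    first frame from the end onward whose power is above th3 (th3 > th2).
    Returns None when any of the three points is not found.
    """
    start = first_index(low_power, lambda p: p < th1)
    if start is None:
        return None
    end = first_index(low_power, lambda p: p > th2, start + 1)
    if end is None:
        return None
    rise = first_index(low_power, lambda p: p > th3, end)
    if rise is None:
        return None
    return start, end, rise
```

The returned indices delimit the consonant section (start to end) and the following vowel transition part (end to rise).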

Following vowel rise point detection unit 5 detects, as the following vowel rise point, the time at which the low-band power of the input speech rises above a fixed threshold TH3 (> TH2). (See FIGS. 4 and 5.)

Next, consonant section analysis position setting unit 20, for example when three frames are to be analyzed, divides the span between the consonant section start point and the consonant section end point into four equal parts (this determines the analysis period (F); the same applies below) and sets analysis frames of window length W1 at the three central division points.
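The frame placement just described can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, positions are treated as plain numbers (for example sample indices), and centering each window on its division point is an assumption, since the text says only that frames are set at the division points. The `interior_only=False` option covers the two-division case described later for the vowel transition parts, where all three division points are used.

```python
def frame_centers(seg_start, seg_end, n_div, interior_only=True):
    """Divide [seg_start, seg_end] into n_div equal parts and return division points.

    The spacing between neighbouring points plays the role of the analysis
    period F. With interior_only=True the two end points are dropped, which
    gives the three central points of a four-way division.
    """
    if n_div < 1:
        raise ValueError("n_div must be at least 1")
    step = (seg_end - seg_start) / n_div
    points = [seg_start + k * step for k in range(n_div + 1)]
    return points[1:-1] if interior_only else points


def frame_spans(centers, window_len):
    """Return (begin, end) pairs for windows of length window_len.

    Centering the window on each point is an assumption of this sketch.
    """
    half = window_len / 2
    return [(c - half, c + half) for c in centers]
```

For a consonant section running from 100 to 180, `frame_centers(100, 180, 4)` gives the three frame positions 120, 140, and 160, with an analysis period of 20.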

(See FIG. 4(a).) Consonant section analysis unit 21 then performs frequency analysis on these analysis frames. For example, it divides the frequency range 0 to 10 kHz into 32 bands, obtains the power spectrum of each of the 32 bands, and takes the resulting 96 power spectrum values in total (32 bands × 3 frames) as identification parameter X (elements x_i, i = 1 to 96) ①.
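The band analysis above can be illustrated with a small sketch that turns three frames into 96 values. Several details are assumptions not stated in the patent: a 20 kHz sampling rate, equal-width bands, a rectangular window, and a naive discrete Fourier transform chosen for clarity rather than speed. The function names are hypothetical.

```python
import cmath
import math


def band_powers(frame, sample_rate=20000.0, n_bands=32, f_max=10000.0):
    """Sum DFT power into n_bands equal-width bands covering 0..f_max Hz."""
    n = len(frame)
    band_width = f_max / n_bands
    powers = [0.0] * n_bands
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq > f_max:
            break
        # Naive DFT coefficient for bin k (O(n) per bin).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        band = min(int(freq / band_width), n_bands - 1)
        powers[band] += abs(coeff) ** 2
    return powers


def segment_parameters(frames, **kwargs):
    """Concatenate the band powers of each frame (3 frames x 32 bands = 96 values)."""
    params = []
    for frame in frames:
        params.extend(band_powers(frame, **kwargs))
    return params
```

A real implementation would typically use an FFT and a tapered window; the sketch only shows how 32 bands per frame and three frames yield the 96 elements of the parameter.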

Next, following vowel transition part analysis position setting unit 60, for example when three frames are to be analyzed, divides the span between the consonant section end point and the following vowel rise point into two equal parts and sets analysis frames of window length W2 at the three division points. (See FIG. 4(b).)

Following vowel transition part analysis unit 61 then performs frequency analysis on these analysis frames. For example, it divides the frequency range 0 to 10 kHz into 32 bands, obtains the power spectrum of each of the 32 bands, and takes the resulting 96 power spectrum values in total as identification parameter X (elements x_i, i = 97 to 192) ①.

Standard pattern dictionary 31 stores the following, all obtained in advance from a large amount of data: the principal component coefficient vector M of the identification parameter ①; the mean vectors Eφ, Es, and Eh of the "φ", "s", and "h" groups of the 28-dimensional data after principal component expansion; and the inverse matrix V of the mean variance-covariance matrix, that is, of the mean of the variance-covariance matrices of the groups.

Here, principal component expansion means compressing the 192-dimensional parameter described above into a 28-dimensional parameter in a way that preserves as much of the information as possible. The principal component coefficient vector M holds the coefficients needed for this expansion and is a 192 × 28 matrix (elements m_ij, i = 1 to 192, j = 1 to 28). The mean vector E of each group (Eφ, Es, Eh) is a 28-dimensional vector (elements e_j, j = 1 to 28). The variance-covariance matrix is a matrix that represents the variance of the many data items (elements) in each group, and V is a 28 × 28 matrix (elements v_ij, i = 1 to 28, j = 1 to 28).

Next, principal component analysis unit 30 takes the identification parameter X ① obtained by consonant section analysis unit 21 and following vowel transition part analysis unit 61, together with the principal component coefficient vector M from standard pattern dictionary 31, and performs the following operation to obtain the principal component R (elements r_j, j = 1 to 28) of the parameter X ①, that is, the compressed data.

$$ r_j = \sum_{i=1}^{192} m_{ij}\, x_i \qquad (j = 1, \dots, 28) \tag{1} $$

Next, distance calculation unit 32 takes the principal component R, the mean vectors Eφ, Es, and Eh from standard pattern dictionary 31, and the inverse matrix V, and performs the following operation to obtain the distance P (a scalar) to each of the categories "φ", "s", and "h". (When the distance P to each consonant is computed on the basis of the variance of each element, normalizing by the variances would ordinarily require division; using the inverse of the variance-covariance matrix reduces that division to multiplication.)

$$ P_q = (R - E_q)^{t} \cdot V \cdot (R - E_q) \tag{2} $$

(Here q = φ, s, h, and t denotes transposition.) The distances Pφ, Ps, and Ph take smaller values the higher the likelihood for the corresponding category.
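Equations (1) and (2) can be written out directly as a short sketch. The function names and the representation of M and V as nested lists are assumptions for illustration; the small matrices in the usage note stand in for the 192 × 28 and 28 × 28 matrices of the text.

```python
def project(x, m):
    """Equation (1): r_j = sum_i m[i][j] * x[i].

    x has one entry per input dimension and m has one row per input
    dimension and one column per principal component.
    """
    if len(m) != len(x):
        raise ValueError("m must have one row per element of x")
    n_components = len(m[0])
    return [sum(m[i][j] * x[i] for i in range(len(x)))
            for j in range(n_components)]


def distance(r, mean, v_inv):
    """Equation (2): (r - mean)^t * v_inv * (r - mean), a scalar."""
    d = [a - b for a, b in zip(r, mean)]
    return sum(d[i] * v_inv[i][j] * d[j]
               for i in range(len(d))
               for j in range(len(d)))
```

For example, `project([1, 2, 3], [[1, 0], [0, 1], [1, 1]])` gives `[4, 5]`, and `distance([4, 5], [1, 1], [[1, 0], [0, 1]])` gives 25, the squared Euclidean distance that results when V is the identity matrix.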

The derivation of equation (1) for the principal component R is described in detail in, for example, "Multivariate Analysis Methods" (多変量解析法), pp. 84-100, "Principal Component Analysis", by Haruo Yanai and Yoshio Takane, Asakura Shoten, first edition published September 30, 1977, so only the calculation formula is given here. The derivation of equation (2) for the distance P is likewise described in detail in the same work, "Discriminant Analysis", p. 69, so only the calculation formula is given.

Decision unit 4 selects, from the per-category distances P supplied by distance calculation unit 32, the category whose distance is smallest and whose likelihood is therefore highest, and takes it as the identification result.
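The decision step reduces to picking the category with the smallest distance. A minimal sketch follows, in which the function name and the use of a mapping from category label to distance are assumptions; it also returns the full ranking, which the patent does not require.

```python
def decide(distances):
    """Return (best_category, ranking) for a mapping of category -> distance.

    The smallest distance corresponds to the highest likelihood, so the
    ranking is sorted by increasing distance.
    """
    if not distances:
        raise ValueError("no categories to compare")
    ranking = sorted(distances, key=distances.get)
    return ranking[0], ranking
```

With distances of 4.2 for "φ", 1.3 for "s", and 7.9 for "h", the identification result is "s".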

Furthermore, when the speech data available for building standard pattern dictionary 31 is very plentiful, dictionaries can be built separately for each preceding vowel and each following vowel, and the identification parameter ① can also be obtained from the preceding vowel transition part.

FIG. 3 is a block diagram showing the configuration for that case and corresponds to claims (4) to (6) of the fricative consonant identification method according to the present invention. In this embodiment as well, word speech containing the voiceless fricative consonants "φ", "s", and "h" is input, and discrimination among those voiceless fricative consonants is performed.

In this case, the digital time-series signal read from the speech data memory is supplied to preceding vowel transition start point detection unit 7, consonant section start point detection unit 10, consonant section end point detection unit 11, and following vowel rise point detection unit 5.

Preceding vowel transition start point detection unit 7 detects, as the preceding vowel transition start point, the time at which the low-band power of the input speech falls below a fixed threshold TH0. Consonant section start point detection unit 10 detects, as the consonant section start point, the time at which the low-band power of the input speech, with the high band removed, falls below a fixed threshold TH1 (< TH0). Consonant section end point detection unit 11 detects, as the consonant section end point, the time at which the low-band power of the input speech rises above a fixed threshold TH2. Following vowel rise point detection unit 5 detects, as the following vowel rise point, the time at which the low-band power of the input speech rises above a fixed threshold TH3 (> TH2). (See FIGS. 4 and 5.)

Next, preceding vowel transition part analysis position setting unit 80, for example when three frames are to be analyzed, divides the span between the preceding vowel transition start point and the consonant section start point into two equal parts and sets analysis frames of window length W0 at the three division points. (See FIG. 4(c).)

Preceding vowel transition part analysis unit 81 then performs frequency analysis on these analysis frames. For example, it divides the frequency range 0 to 10 kHz into 32 bands, obtains the power spectrum of each of the 32 bands, and takes the resulting 96 power spectrum values in total as identification parameter Y (elements y_i, i = 1 to 96) ①.

Next, consonant section analysis position setting unit 20, for example when three frames are to be analyzed, divides the span between the consonant section start point and the consonant section end point into four equal parts and sets analysis frames of window length W1 at the three central division points. (See FIG. 4(a).)

Consonant section analysis unit 21 then performs frequency analysis on these analysis frames. For example, it divides the frequency range 0 to 10 kHz into 32 bands, obtains the power spectrum of each of the 32 bands, and takes the resulting 96 power spectrum values in total as identification parameter Y (elements y_i, i = 97 to 192) ①.

Next, following vowel transition part analysis position setting unit 60, for example when three frames are to be analyzed, divides the span between the consonant section end point and the following vowel rise point into two equal parts and sets analysis frames of window length W2 at the three division points. (See FIG. 4(b).)

Following vowel transition part analysis unit 61 then performs frequency analysis on these analysis frames. For example, it divides the frequency range 0 to 10 kHz into 32 bands, obtains the power spectrum of each of the 32 bands, and takes the resulting 96 power spectrum values in total as identification parameter Y (elements y_i, i = 193 to 288) ①.
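The index ranges used for Y (1 to 96, 97 to 192, and 193 to 288) amount to concatenating the three 96-element segment parameters in time order. A minimal sketch, with a hypothetical function name and a length check added as an assumption:

```python
SEGMENT_SIZE = 96  # 32 bands x 3 frames per segment


def assemble_parameters(preceding, consonant, following):
    """Concatenate the three segment parameters into one 288-element vector.

    Order: preceding vowel transition part, consonant section,
    following vowel transition part.
    """
    segments = (("preceding", preceding),
                ("consonant", consonant),
                ("following", following))
    for name, seg in segments:
        if len(seg) != SEGMENT_SIZE:
            raise ValueError(f"{name} segment must have {SEGMENT_SIZE} values")
    return list(preceding) + list(consonant) + list(following)
```

Element y_97 of the text is index 96 of the returned list, and y_193 is index 192, because the list is zero-based.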

Standard pattern dictionary 31 stores the following, all obtained in advance from a large amount of data: the principal component coefficient vector N of the identification parameter ①; the mean vectors Fφ, Fs, and Fh of the "φ", "s", and "h" groups of the 28-dimensional data after principal component expansion; and the inverse matrix U of the mean variance-covariance matrix, that is, of the mean of the variance-covariance matrices of the groups.

Here, principal component expansion means compressing the 288-dimensional parameter described above into a 28-dimensional parameter in a way that preserves as much of the information as possible. The principal component coefficient vector N holds the coefficients needed for this expansion and is a 288 × 28 matrix (elements n_ij, i = 1 to 288, j = 1 to 28). The mean vector F of each group (Fφ, Fs, Fh) is a 28-dimensional vector (elements f_j, j = 1 to 28). The variance-covariance matrix is a matrix that represents the variance of the many data items in each group, and U is a 28 × 28 matrix (elements u_ij, i = 1 to 28, j = 1 to 28).

Principal component analysis unit 30 takes the identification parameter Y ① obtained by preceding vowel transition part analysis unit 81, consonant section analysis unit 21, and following vowel transition part analysis unit 61, together with the principal component coefficient vector N from standard pattern dictionary 31, and performs the following operation, of the same form as equation (1), to obtain the principal component S (elements s_j, j = 1 to 28) of the parameter Y.

$$ s_j = \sum_{i=1}^{288} n_{ij}\, y_i \qquad (j = 1, \dots, 28) $$

Next, distance calculation unit 32 takes the principal component S, the mean vectors Fφ, Fs, and Fh from standard pattern dictionary 31, and the inverse matrix U, and performs the following operation, of the same form as equation (2), to obtain the distance Q (a scalar) to each of the categories "φ", "s", and "h".

$$ Q_q = (S - F_q)^{t} \cdot U \cdot (S - F_q) $$

(Here q = φ, s, h, and t denotes transposition.) The distances Qφ, Qs, and Qh take smaller values the higher the likelihood for the corresponding category.

Decision unit 4 selects, from the per-category distances Q supplied by distance calculation unit 32, the category whose distance is smallest and whose likelihood is therefore highest, and takes it as the identification result.

The characteristic feature of the present invention is thus that, in addition to the consonant section, the transition parts of the preceding and following vowels are analyzed under analysis conditions (period (F), window length (W)) suited to each, and the likelihood is determined from all of them, so that higher identification performance than before is obtained.

〔Effect of the invention〕

As described above, the fricative consonant identification method of the present invention obtains the identification parameter ①, in speech recognition processing, not only from the consonant section but also from the transition parts of the preceding and following vowels, and changes the analysis conditions (analysis window period (F), analysis window length (W)) between the consonant section and the preceding and following vowel transition parts. High identification performance is therefore obtained, which is extremely useful in practice.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the principle of the present invention.
FIG. 2 is a diagram showing one embodiment of the present invention.
FIG. 3 is a diagram showing another embodiment of the present invention.
FIG. 4 is a diagram showing examples of the analysis conditions in the consonant section and the vowel transition parts according to the present invention.
FIG. 5 is a diagram showing an example of detecting the consonant section and the vowel transition parts from the low-band power of the input speech and thresholds.
FIG. 6 is a diagram showing the configuration of a conventional fricative consonant identification method.

In the drawings:
1: consonant section detection means
10: consonant section start point detection unit
11: consonant section end point detection unit
2: analysis means, or consonant section analysis means
20: consonant section analysis position setting unit
21: consonant section analysis unit
3: likelihood calculation means
30: principal component analysis unit
31: standard pattern dictionary
32: distance calculation unit
4: decision means, or decision unit
5: following vowel rise point detection means, or following vowel rise point detection unit
6: following vowel transition part analysis means
60: following vowel transition part analysis position setting unit
61: following vowel transition part analysis unit
7: preceding vowel transition start point detection means, or preceding vowel transition start point detection unit
8: preceding vowel transition part analysis means
80: preceding vowel transition part analysis position setting unit
81: preceding vowel transition part analysis unit
①: identification parameter
M, N: principal component coefficient vectors
Eφ, Es, Eh, Fφ, Fs, Fh: mean vectors of the respective consonants
V, U: inverse matrices of the mean variance-covariance matrix
S, R: principal components
P, Q: distances

Claims

[Claims]

(1) A fricative consonant identification method for identifying fricative consonants in input speech, characterized in that the identification parameter (①) is obtained from both the consonant section and the following vowel transition part.

(2) The fricative consonant identification method according to claim 1, characterized in that different analysis window periods (F) are used for the consonant section and the following vowel transition part.

(3) The fricative consonant identification method according to claim 1, characterized in that different analysis window lengths (W) are used for the consonant section and the following vowel transition part.

(4) A fricative consonant identification method for identifying fricative consonants in input speech, characterized in that the identification parameter (①) is obtained from three locations: the preceding vowel transition part, the consonant section, and the following vowel transition part.

(5) The fricative consonant identification method according to claim 4, characterized in that different analysis window periods (F) are used for the preceding vowel transition part, the consonant section, and the following vowel transition part.

(6) The fricative consonant identification method according to claim 4, characterized in that different analysis window lengths (W) are used for the preceding vowel transition part, the consonant section, and the following vowel transition part.