JPS6325699A - Formant extractor

Info

Publication number: JPS6325699A
Application number: JP61170058A
Authority: JP
Inventors: 修司高田; 道代後藤; 上川　豊
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-07-18
Filing date: 1986-07-18
Publication date: 1988-02-03

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a formant extraction device used for speech analysis and recognition.

Prior Art

A formant is a resonance peak that appears in the frequency spectrum of a speech wave owing to the poles of the vocal tract transfer function, and it is an important factor in shaping phonemic quality. Formants are called the first formant, the second formant, and so on, in order from the lowest frequency.

Conventionally, various methods have been used for formant extraction, such as reading from a sonagram (sound spectrogram), using a filter bank, and analysis by synthesis.

The method based on linear predictive analysis is one of these; it approximates the vocal tract transfer function with an all-pole model.

An example of the conventional formant extraction device mentioned above is described below with reference to the drawings. FIG. 2 is a block diagram of the main parts of a conventional formant extraction device. In the figure, 21 is a linear prediction analysis section, 22 is a high-order equation root-finding section, and 23 is a formant selection section.

The operation of the conventional formant extraction device configured as described above is explained below.

FIG. 3 is a processing flow diagram of the linear prediction analysis section 21. Here the speech sampling frequency is 10 kHz and the duration of one frame is 20 ms. One frame of speech waveform data is denoted y(i), i = 1 to 200.

For pre-emphasis, the first-order difference given by the following equation is used.

$$y'(i) = y(i+1) - y(i) \qquad (1)$$

To reduce the effect of the frequency distortion caused by frame extraction, the Hamming window given by the following equations is applied.

$$y''(i) = H(i)\, y'(i) \qquad (2)$$

$$H(i) = 0.54 - 0.46 \cos(2\pi i / 200) \qquad (3)$$

Next, the short-time autocorrelation function $R_i$ given by the following equation is calculated.

[Equation (4), the short-time autocorrelation function, is not legible in the source text.]

The α parameters are obtained by solving the following simultaneous linear equations.

In practice, these equations can be solved efficiently by a recursive method such as the Durbin method, which also yields the k parameters at the same time.
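The analysis flow described above (pre-emphasis, Hamming window, short-time autocorrelation, Durbin recursion) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the autocorrelation is assumed to be the usual sum of lagged products, and the sign convention assumes a denominator polynomial of the form 1 + Σ α_i z^(-i).

```python
import math


def preprocess(frame):
    """Apply the first-order difference and the Hamming window to one frame."""
    # Pre-emphasis: y'(i) = y(i+1) - y(i); the result is one sample shorter.
    diff = [frame[i + 1] - frame[i] for i in range(len(frame) - 1)]
    n = len(frame)  # window period, 200 in the text's example
    # Hamming window H(i) = 0.54 - 0.46 cos(2*pi*i / n), with i counted from 1.
    return [
        (0.54 - 0.46 * math.cos(2.0 * math.pi * (i + 1) / n)) * d
        for i, d in enumerate(diff)
    ]


def autocorrelation(signal, max_lag):
    """Short-time autocorrelation R_0 .. R_max_lag (assumed lagged-product sum)."""
    return [
        sum(signal[t] * signal[t + lag] for t in range(len(signal) - lag))
        for lag in range(max_lag + 1)
    ]


def durbin(r, order):
    """Durbin recursion; returns (alpha parameters, k parameters).

    Assumes r[0] > 0, i.e. the frame is not silent.
    """
    alpha = [0.0] * (order + 1)  # alpha[0] is unused
    ks = []
    error = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(alpha[j] * r[m - j] for j in range(1, m))
        k = -acc / error
        previous = alpha[:]
        for j in range(1, m):
            alpha[j] = previous[j] + k * previous[m - j]
        alpha[m] = k
        ks.append(k)
        error *= 1.0 - k * k  # prediction error after adding order m
    return alpha[1:], ks
```

For a 200-sample frame and analysis order P, the three steps chain as `durbin(autocorrelation(preprocess(frame), P), P)`.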

The transfer function of the vocal tract is expressed by the following equation.

Therefore, in the high-order equation root-finding section 22, the poles of the vocal tract transfer function can be obtained by solving for the roots of the high-order equation $1 + \sum_i \alpha_i z^{-i} = 0$ with the Newton-Raphson method or a similar method. The formant selection section 23 then selects, from among the multiple roots, the roots corresponding to the desired formants, taking frequency and bandwidth into account.

Problems to Be Solved by the Invention

In a formant extraction device configured as described above, a high-order algebraic equation must be solved, which requires floating-point arithmetic and takes considerable processing time. In addition, errors occur when selecting, from among the multiple roots obtained, the root corresponding to the desired formant.

The present invention has been made in view of these points, and its object is to provide a formant extraction device that can obtain the desired formant frequencies simply and uniquely.

Means for Solving the Problems

To solve the above problems, the present invention comprises: a vowel discrimination section that discriminates the vowel of input vowel speech waveform data from speech features; a formant estimation coefficient storage section that stores regression coefficients obtained in advance by multiple regression analysis, which estimates formant frequencies from the speech features separately for each gender and each vowel on the basis of speech feature data and formant frequency data for the relevant vowels of multiple speakers, and that outputs the regression coefficients for the applicable vowel and gender according to the vowel discrimination result and the gender information of the input speech waveform data; and a formant estimation section that calculates and outputs a formant frequency estimate, as a regression line value, from the speech features and the regression coefficients for the applicable vowel and gender.

Operation

With the above configuration, the present invention uses the vowel discrimination result and the gender information to select the applicable regression coefficients from among those prepared for each gender and each vowel, and obtains the formant frequencies by a product-sum operation between the speech features and these regression coefficients.

Multiple regression analysis is a method of predicting a variable $y$ from data on other variables $x_1, x_2, \ldots, x_P$ that are considered to influence it, in the form

$$y = a_0 + a_1 x_1 + \cdots + a_P x_P$$

The accuracy of the prediction depends on the degree of correlation between the variables $x_1, x_2, \ldots, x_P$ and the variable $y$ (the goodness of fit to the regression line). When the data fit the regression line calculated from $N$ data points poorly, accuracy can be improved by partitioning the space of the variables $x_1, x_2, \ldots, x_P$ and performing the prediction separately on each smaller data set (with $n_1, n_2, \ldots \ll N$ data points). In the present invention, regression coefficients are obtained separately for each gender and each vowel.

The method for calculating the regression coefficients is explained here.

The regression model that estimates the j-th formant frequency $f_j$, j = 1, ..., from the P-dimensional speech features $\{x_i\}$, i = 1, ..., P is given by the following equation.

The estimation error $e_j$ is then given by the following equation.

Let $\varepsilon^2$ be the sum of $e_j^2$ over N samples covering multiple vowels from multiple speakers.

From the condition that minimizes $\varepsilon^2$, namely $\partial \varepsilon^2 / \partial a_{ji} = 0$ ($0 \le i \le P$), the regression coefficients can be obtained by solving the following simultaneous linear equations.
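The least-squares step can be sketched as follows, assuming the linear model f = a_0 + a_1 x_1 + ... + a_P x_P. Setting the derivative of the squared error to zero gives the normal equations (XᵀX) a = Xᵀ f, which are solved here by Gaussian elimination. The function names are hypothetical and the data in the usage note are made up for illustration; in the scheme described in the text, one such fit would be run for each gender, vowel, and formant.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_regression(features, targets):
    """Return [a_0, a_1, ..., a_P] minimizing the summed squared error."""
    rows = [[1.0] + list(x) for x in features]  # leading 1 carries a_0
    size = len(rows[0])
    normal = [
        [sum(r[i] * r[j] for r in rows) for j in range(size)]
        for i in range(size)
    ]
    moment = [sum(r[i] * f for r, f in zip(rows, targets)) for i in range(size)]
    return solve_linear(normal, moment)
```

With the synthetic samples `[[0, 0], [1, 0], [0, 1], [1, 1]]` and targets `[300, 400, 250, 350]`, the fit recovers a_0 = 300, a_1 = 100, a_2 = -50 up to rounding.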

Embodiments

A formant extraction device according to an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of the main parts of a first embodiment of the formant extraction device of the present invention. In FIG. 1, 11 is an acoustic analysis section, 12 is a vowel discrimination section, 13 is a formant estimation coefficient storage section, and 14 is a formant estimation section.

The operation of the formant extraction device configured as described above is explained below.

First, the acoustic analysis section 11 analyzes the input speech waveform data and calculates the P-dimensional speech features $\{x_i\}$, i = 1, ..., P, which represent the spectral envelope.

Vowel discrimination is a method of determining, from the values of the speech features $x_i$, i = 1, ..., P, which vowel the input speech waveform data belongs to, on the basis of data samples obtained for each vowel. Linear discriminant functions are often used for this purpose.
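A linear discriminant classifier of the kind mentioned above can be sketched as follows: each vowel v has a function g_v(x) = w_v0 + Σ w_vi x_i, and the vowel with the largest value is chosen. The weights below are hypothetical placeholders for a two-dimensional feature vector; real weights would be trained from the per-vowel data samples, and the source does not give the exact form of its discriminant.

```python
def discriminate_vowel(features, discriminants):
    """Return the vowel whose linear discriminant value is largest."""
    def score(weights):
        # weights[0] is the constant term; the rest pair with the features.
        return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

    return max(discriminants, key=lambda vowel: score(discriminants[vowel]))


# Hypothetical weights, for illustration only.
EXAMPLE_DISCRIMINANTS = {
    "a": [0.0, 1.0, 0.0],
    "i": [0.0, 0.0, 1.0],
    "u": [0.5, -1.0, -1.0],
}
```

For instance, `discriminate_vowel([2.0, 0.5], EXAMPLE_DISCRIMINANTS)` selects `"a"` because its score (2.0) exceeds those of `"i"` (0.5) and `"u"` (-2.0).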

Assuming that the two formant frequencies of the first and second formants are to be estimated, the formant estimation coefficient storage section 13 stores in advance the regression coefficients obtained by multiple regression analysis, which estimates the first and second formant frequencies from the speech features separately for each gender and each vowel, on the basis of the speech feature data and the first and second formant frequency data for the five Japanese vowels of multiple male and female speakers. It then outputs the regression coefficients for the applicable vowel and gender according to the vowel discrimination result and the gender information. These are denoted $a_{ji}$, j = 1, 2, i = 0, ..., P.

In the formant estimation section 14, the first and second formant frequency estimates $f_j$, j = 1, 2 are calculated by the following equation from the speech features $\{x_i\}$, i = 1, ..., P and the regression coefficients $a_{ji}$, j = 1, 2, i = 0, ..., P.
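The coefficient lookup and product-sum estimation can be sketched as below, assuming the regression line has the form f_j = a_j0 + Σ a_ji x_i, as suggested by the index range i = 0, ..., P. The table layout, the function name, and all coefficient values are hypothetical and chosen only to make the example concrete.

```python
# Hypothetical table: (gender, vowel) -> one coefficient row per formant,
# each row being [a_j0, a_j1, ..., a_jP]. Values are illustrative only.
COEFFICIENTS = {
    ("male", "a"): [[700.0, 10.0, -5.0], [1200.0, 0.0, 20.0]],
    ("female", "a"): [[850.0, 12.0, -6.0], [1400.0, 0.0, 25.0]],
}


def estimate_formants(features, gender, vowel, table=COEFFICIENTS):
    """Return [f_1, f_2] as a_j0 plus the product-sum of a_ji and x_i."""
    rows = table[(gender, vowel)]  # role of the coefficient storage section
    return [
        row[0] + sum(a * x for a, x in zip(row[1:], features))
        for row in rows
    ]
```

With the features `[2.0, 4.0]` and the male /a/ entry above, the result is `[700.0, 1280.0]`. Each formant costs only P multiplications and additions, with no root finding.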

As described above, according to this embodiment, providing the vowel discrimination section 12, the formant estimation coefficient storage section 13, and the formant estimation section 14 makes it possible to estimate the formant frequencies from the speech features representing the spectral envelope.

A second embodiment of the present invention is described below with reference to the drawings.

FIG. 4 is a block diagram of the main parts of a formant extraction device according to a second embodiment of the present invention. The configurations of the vowel discrimination section 42, the formant estimation coefficient storage section 43, and the formant estimation section 44 are the same as in FIG. 1. A linear prediction analysis section 41 is used as the acoustic analysis section and outputs the α parameters as the speech features representing the spectral envelope.

The processing in the linear prediction analysis section 41 is the same as that shown in the processing flow diagram (FIG. 3) of the linear prediction analysis section 21 in the conventional example of FIG. 2.

By providing the vowel discrimination section 42, the formant estimation coefficient storage section 43, and the formant estimation section 44, the amount of computation can be reduced substantially compared with the conventional approach of solving a high-order algebraic equation to obtain the formant frequencies.

A third embodiment of the present invention is described below with reference to the drawings.

FIG. 5 is a block diagram of the main parts of a formant extraction device according to a third embodiment of the present invention. The configurations of the vowel discrimination section 52, the formant estimation coefficient storage section 53, and the formant estimation section 54 are the same as in FIG. 1. The acoustic analysis section consists of a linear prediction analysis section 511, which calculates the α parameters from the input speech waveform data, and a cepstrum calculation section 512, which calculates the LPC cepstral coefficients from the α parameters.

The processing in the linear prediction analysis section 511 is the same as that shown in the processing flow diagram (FIG. 3) of the linear prediction analysis section 21 in the conventional example of FIG. 2.

In the cepstrum calculation section 512, the LPC cepstral coefficients $c_i$, i = 1, ..., P are calculated by the following equation.

Effects of the Invention

As described above, by providing a vowel discrimination section, a formant estimation coefficient storage section, and a formant estimation section, the present invention can estimate the formant frequencies from speech features representing the spectral envelope. When linear predictive analysis is used in the acoustic analysis section and the α parameters are used as the speech features, the amount of computation is reduced substantially compared with conventional formant extraction devices, which solve a high-order algebraic equation. The same effect is obtained when a linear prediction analysis section and a cepstrum calculation section are used in the acoustic analysis section and the LPC cepstral coefficients are used as the speech features.
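The conversion from α parameters to LPC cepstral coefficients in the third embodiment is commonly done with the recursion c_n = −α_n − (1/n) Σ_{k=1}^{n−1} k·c_k·α_{n−k}, which follows from taking the logarithm of an all-pole model 1/(1 + Σ α_i z^(-i)). The sketch below assumes that sign convention and uses a hypothetical function name; the source's own equation is not legible, so this is the standard recursion rather than a transcription of it.

```python
def lpc_cepstrum(alpha, count):
    """Return [c_1, ..., c_count] for the all-pole model 1 / (1 + sum alpha_i z^-i)."""
    order = len(alpha)
    cepstrum = []
    for n in range(1, count + 1):
        a_n = alpha[n - 1] if n <= order else 0.0  # alpha_n is zero beyond the order
        acc = sum(
            k * cepstrum[k - 1] * alpha[n - k - 1]
            for k in range(1, n)
            if n - k <= order
        )
        cepstrum.append(-a_n - acc / n)
    return cepstrum
```

As a check, a single-pole model with α_1 = −0.5 has the known cepstrum c_n = 0.5ⁿ/n, which the recursion reproduces for the first few terms.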

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the main parts of the formant extraction device according to the first embodiment of the present invention; FIG. 2 is a block diagram of the main parts of a conventional formant extraction device; FIG. 3 is a processing flow diagram of the linear prediction analysis section of FIG. 2; FIG. 4 is a block diagram of the main parts of the formant extraction device according to the second embodiment of the present invention; and FIG. 5 is a block diagram of the main parts of the formant extraction device according to the third embodiment of the present invention.

11: acoustic analysis section; 12: vowel discrimination section; 13: formant estimation coefficient storage section; 14: formant estimation section; 21: linear prediction analysis section; 22: high-order equation root-finding section; 23: formant selection section; 41: linear prediction analysis section; 42: vowel discrimination section; 43: formant estimation coefficient storage section; 44: formant estimation section; 511: linear prediction analysis section; 512: cepstrum calculation section; 52: vowel discrimination section; 53: formant estimation coefficient storage section; 54: formant estimation section.

Claims

(1) A formant extraction device comprising: an acoustic analysis section that performs acoustic analysis on input vowel speech waveform data to calculate speech features representing a spectral envelope; a vowel discrimination section that discriminates the vowel of the input vowel speech waveform data from the speech features; a formant estimation coefficient storage section that stores regression coefficients obtained in advance by multiple regression analysis, which estimates formant frequencies from the speech features separately for each gender and each vowel on the basis of the speech feature data and formant frequency data for the relevant vowels of multiple speakers, and that outputs the regression coefficients for the applicable vowel and gender according to the vowel discrimination result and the gender information of the input speech waveform data; and a formant estimation section that calculates and outputs a formant frequency estimate, as a regression line value, from the speech features and the regression coefficients for the applicable vowel and gender.

(2) The formant extraction device according to claim (1), characterized in that α parameters are used as the speech features.

(3) The formant extraction device according to claim (1), characterized in that LPC cepstral coefficients are used as the speech features.