JPS63124100A

JPS63124100A - Fundamental frequency analyzer

Info

Publication number: JPS63124100A
Application number: JP27207986A
Authority: JP
Inventors: 藤崎　博也; 広瀬　啓吉; 清水　圭典; 幹雄山口
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 1986-11-13
Filing date: 1986-11-13
Publication date: 1988-05-27
Anticipated expiration: 2011-11-27
Also published as: JP2558658B2

Abstract

(57) [Summary] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to an improvement of a device for analyzing the fundamental frequency of speech, and more specifically to a speech fundamental frequency analysis device used in applications such as speech analysis and synthesis, speech verification, and high-efficiency coded transmission of speech.

[Conventional technology]

There are many methods for extracting the fundamental frequency of speech, such as the cepstrum method and methods that use the autocorrelation function of the speech waveform itself. As the prior art corresponding to the present invention, a method that uses the autocorrelation function of the residual waveform obtained by linear predictive analysis is described here.

FIG. 2 shows how the fundamental frequency is obtained in the prior art, and FIG. 3 schematically shows the data at each step up to the normalized autocorrelation function.

First, the input speech waveform is a temporally continuous waveform whose state changes over time (for example, from voiced to unvoiced, or in manner of articulation). A short-term waveform extraction process 21 cuts out a fixed period of it (for example, 20 msec). Next, a process 22 applies a window function such as a Hamming or Hanning window, which attenuates the amplitude at the edges of the window and so reduces the influence of the cut-out boundaries. Linear predictive analysis 23 is then performed on the assumption that the state of the system generating the speech waveform is stationary over the length of the window. Linear predictive analysis yields a residual waveform corresponding to the sound source information and coefficients corresponding to the vocal tract transfer function. For unvoiced sounds the residual waveform is close to random noise; for voiced sounds it is an impulse train corresponding to the vibration of the vocal cords (34 in FIG. 3).
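The extraction and windowing steps 21 and 22 can be sketched as follows. This is only an illustration under stated assumptions: the function names `hamming` and `windowed_frame` are hypothetical, the standard 0.54/0.46 Hamming coefficients are assumed (the text only names the window type), and the linear predictive analysis 23 is not included.

```python
import math

def hamming(length):
    """Hamming window coefficients (standard 0.54 / 0.46 form, an assumption)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def windowed_frame(signal, start, frame_ms, fs):
    """Cut frame_ms milliseconds starting at index `start` and taper the edges."""
    length = int(round(frame_ms * 1e-3 * fs))
    frame = signal[start:start + length]
    # Multiplying by the window attenuates the samples near the frame boundaries.
    return [s * w for s, w in zip(frame, hamming(len(frame)))]

# Example: a 20 msec frame at a 10 kHz sampling rate is 200 samples long.
frame = windowed_frame([1.0] * 1000, 100, 20.0, 10000)
```

With a constant input the returned frame is simply the window shape, which makes the edge attenuation easy to inspect.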

To obtain the fundamental frequency, it suffices to find the repetition period of this impulse train. When the normalized autocorrelation function of the residual waveform is computed (the autocorrelation function divided by the power, so that its value is 1 at delay time $\tau = 0$), it shows local maxima at delays that are integer multiples of the period of the fundamental frequency (denoted $\tau_p$). The fundamental frequency is therefore obtained by finding such a local maximum in the normalized autocorrelation function of the residual waveform and taking the reciprocal of $\tau_p$. In practice the input may also be unvoiced or silent (a period in which neither voiced nor unvoiced sound is produced), so the processing is carried out as described below.
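As a rough illustration of this idea, the sketch below computes a normalized autocorrelation function and reads the fundamental frequency off its largest peak. The names `normalized_autocorrelation` and `f0_from_peak`, and the lag search range, are assumptions made for the example; a synthetic impulse train stands in for the residual of a voiced sound.

```python
def normalized_autocorrelation(x, max_lag):
    """Autocorrelation divided by the power, so that the value at lag 0 is 1."""
    power = sum(v * v for v in x)
    if power == 0.0:
        return [0.0] * (max_lag + 1)
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) / power
            for k in range(max_lag + 1)]

def f0_from_peak(x, fs, min_lag, max_lag):
    """Return fs / lag for the lag with the largest value in [min_lag, max_lag]."""
    r = normalized_autocorrelation(x, max_lag)
    best = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return fs / best

# Impulse train with a period of 80 samples at fs = 8000 Hz (a 100 Hz voice).
residual = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
f0 = f0_from_peak(residual, 8000, 20, 200)
```

Lags near zero are excluded through `min_lag`, since the function always equals 1 at lag 0.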

For an unvoiced sound, the normalized autocorrelation function of the residual waveform takes relatively small values except for its maximum of 1 at delay time $\tau = 0$. For a voiced sound, besides the maximum of 1 at $\tau = 0$, it has local maxima at $\tau = \tau_p, 2\tau_p, \ldots$, where $\tau_p$ is the period of the fundamental frequency. Local maxima also often appear at values of $\tau$ unrelated to $\tau_p$.

In silence, no signal should in principle appear in the speech waveform, but noise is usually present (in particular 50 or 60 Hz hum from the power supply), and this noise often makes the normalized autocorrelation function show periodic local maxima just as for a voiced sound. Because the noise is small compared with the speech waveform, however, silence can be identified by the small power of the residual waveform. The actual processing, including the voiced/unvoiced/silent decision, is performed as follows.

First, a threshold $\theta_1$ is set, and if the power of the residual waveform obtained over the window length does not exceed $\theta_1$, the frame is judged to be silent (the fundamental frequency is then "none"). If it is not silent, a second threshold $\theta_2$ is set, and if no local maximum of the normalized autocorrelation function of the residual waveform, other than those near $\tau = 0$, exceeds $\theta_2$, the frame is judged to be unvoiced (the fundamental frequency is again "none"). A frame that is neither silent nor unvoiced is voiced. In that case, the values of $\tau$ at which the normalized autocorrelation function has local maxima exceeding $\theta_2$, excluding the vicinity of $\tau = 0$, are taken in descending order of the local maximum as the first, second, third, ... candidates for the period of the fundamental frequency, and their reciprocals are the first, second, third, ... candidates for the fundamental frequency (reference numeral 26).
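The three-way decision and the ranking of candidates might be sketched as below. The function name `classify_and_rank`, the use of the sum of squares as the "power", and the concrete threshold values are assumptions for illustration; the text does not give numerical values for the two thresholds.

```python
def classify_and_rank(residual, fs, theta1, theta2, min_lag, max_lag):
    """Return ("silence" | "unvoiced" | "voiced", F0 candidates, best first).

    Assumes len(residual) > max_lag + 1 and min_lag >= 1.
    """
    power = sum(v * v for v in residual)          # residual power over the window
    if power <= theta1:
        return "silence", []
    r = [sum(residual[n] * residual[n + k] for n in range(len(residual) - k)) / power
         for k in range(max_lag + 2)]
    # Local maxima away from lag 0 that exceed the second threshold.
    peaks = [(r[k], k) for k in range(min_lag, max_lag + 1)
             if r[k] > r[k - 1] and r[k] >= r[k + 1] and r[k] > theta2]
    if not peaks:
        return "unvoiced", []
    peaks.sort(reverse=True)                      # largest local maximum first
    return "voiced", [fs / k for _, k in peaks]

voiced = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
label, candidates = classify_and_rank(voiced, 8000, 0.5, 0.3, 20, 200)
```

For this synthetic impulse train the first candidate is the true pitch and the second is its sub-octave, which is the kind of ambiguity the later selection step has to resolve.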

These fundamental frequency candidates are obtained for each window position while the window is shifted little by little (for example, by 10 msec). FIG. 4 schematically shows how the window position is shifted, and FIG. 5 shows the fundamental frequency candidates obtained at each window position, up to the second candidate; the first and second candidates are drawn with different markers. The fundamental frequency finally adopted is basically the first candidate, but the continuity with the candidates at the preceding and following window positions is examined, and if the second or third candidate gives a smoother time-variation pattern of the fundamental frequency across window positions, that candidate is adopted instead (see reference numeral 27 in FIG. 2). The fundamental frequency at window position N is selected from the candidates, for example, as follows.

First, the first candidates for the fundamental frequency at window positions N-2, N-1, N, N+1, and N+2 are obtained; in this example they are 128, 104, 121, 107, and 108 Hz, in that order. Sorted in descending order they are 128, 121, 108, 107, and 104 Hz, and the third (that is, the median) value is 108 Hz. At window position N, the candidate closest to this median is selected as the fundamental frequency.

That is, the second candidate, 106 Hz, is closer to 108 Hz than the first candidate, 121 Hz, so the second candidate is finally adopted as the fundamental frequency. This process is applied to the fundamental frequency candidates at every window position to determine the fundamental frequency at each position.
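The worked example above can be reproduced with a few lines. The function name `select_f0` is hypothetical; the numbers are the ones given in the text.

```python
from statistics import median

def select_f0(first_candidates, candidates_at_n):
    """Pick the candidate at position N closest to the median of the
    first candidates at positions N-2 .. N+2."""
    centre = median(first_candidates)
    return min(candidates_at_n, key=lambda f: abs(f - centre))

# First candidates at N-2 .. N+2, and the first and second candidates at N.
chosen = select_f0([128, 104, 121, 107, 108], [121, 106])
```

Here the median is 108 Hz, so the second candidate (106 Hz) is chosen over the first (121 Hz).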

[Problem that the invention seeks to solve]

In the prior art, a window is set and the analysis assumes that the speech production system is in a steady state within that period. This causes the following problems.

(1) The position of the window relative to the speech waveform is not always optimal, so the accuracy of the fundamental frequency analysis may deteriorate depending on the window position.

(2) The window length is not always optimal. The longer the window, the longer the period that is assumed to be stationary (although it is actually non-stationary), and the fundamental frequency can no longer be extracted where it changes rapidly. Conversely, if the window is short and becomes less than twice the period of the fundamental frequency (which tends to happen with low voices), the fundamental frequency may not be extractable.

The fundamental frequency of a single speaker can vary over as much as two octaves in conversation, so a window of finite, fixed length can hardly be optimal at all times. To improve the analysis accuracy, it is desirable to vary the window length adaptively.

[Means for solving problems]

FIG. 1 shows a block diagram of the processing according to the present invention.

Block 1 continuously performs linear predictive analysis (here taken to include partial autocorrelation analysis, which is equivalent to linear predictive analysis, and variants of these methods) and outputs a temporally continuous residual waveform. Block 2 is a low-pass filter that emphasizes the low-frequency components of the residual waveform.

Block 3 applies an exponentially decaying window of semi-infinite length, and block 4 continuously calculates the autocorrelation function. Blocks 3 and 4 are shown separately for ease of explanation, but they can be processed in a single step, as described in the embodiment. Block 5 applies thresholds to the local maxima of the autocorrelation function and to the power of the residual waveform, so that the search for fundamental frequency candidates proceeds only for voiced sounds. Block 6 obtains fundamental frequency candidates from the delay times $\tau$ at which the autocorrelation function has local maxima. Block 7 selects the fundamental frequency from the candidates.

[Operation]

FIG. 6 conceptually shows how a speech waveform is processed according to the present invention. Reference numeral 61 is the input speech waveform.

Waveform 62 is the residual waveform obtained from it by the continuous linear predictive analysis of block 1. Passing this through the low-pass filter 2, which attenuates the high-frequency components and emphasizes the low-frequency range, gives a waveform such as 63.

Reference numeral 64 indicates the portion of the residual waveform that is multiplied by the window function, and 65 is the exponentially decaying window function of semi-infinite length. Multiplying 64 and 65 together at each sample time gives waveform 66, whose amplitude is large at the start of the window and shrinks as the samples become older. In other words, the current value is weighted more heavily than past values.

Calculating the autocorrelation function of waveform 66 gives the autocorrelation function 67. For clarity, it is shown in FIG. 6 normalized and enlarged horizontally.

Whether the sound is voiced can be determined from the power of the residual waveform and the values of the normalized autocorrelation function, in the same way as described for the prior art. As in the prior art, the delay times $\tau_1, \tau_2, \ldots$ at which the autocorrelation function has local maxima are taken as candidates for the period of the fundamental frequency; the fundamental frequency candidates are obtained from them, and the fundamental frequency is selected from among the candidates. For conceptual explanation, FIG. 6 shows the result 66 of multiplying the residual waveform by the exponentially decaying window function of semi-infinite length, but, as shown in the embodiment, the autocorrelation function can be obtained directly without computing 66.

The above processing gives the fundamental frequency when the start of the window is at sample point n. By shifting the start position of the window successively and obtaining the fundamental frequency at each position, the time-variation pattern of the fundamental frequency is obtained.

[Example]

First, as an embodiment of the linear predictive analysis used to obtain a continuous residual waveform, FIG. 7 shows a method of obtaining the prediction residual waveform by the lattice method. The lattice method is explained in Saito and Nakata, "Fundamentals of Speech Information Processing" (音声情報処理の基礎), Ohmsha (1981), pp. 110-112. Here the conventional lattice method is modified so that it can be processed continuously in time.

In FIG. 7(a), $k_i^{(t)}$ ($i = 1, \ldots, p$) is the partial correlation coefficient at sample time $t$, obtained at every sample time by the continuous cross-correlation calculator $C_i$. Let the forward prediction residual obtained as the output of the $i$-th stage at sample time $t$ be $\varepsilon_{f,t}^{(i)}$ and the backward prediction residual be $\varepsilon_{b,t}^{(i)}$. Then the forward prediction residual of the $(i+1)$-th stage (FIG. 7(b)) is

$$\varepsilon_{f,t}^{(i+1)} = \varepsilon_{f,t}^{(i)} - k_{i+1}^{(t)}\,\varepsilon_{b,t}^{(i)}$$

and the backward prediction residual is

$$\varepsilon_{b,t}^{(i+1)} = \varepsilon_{b,t-1}^{(i)} - k_{i+1}^{(t-1)}\,\varepsilon_{f,t-1}^{(i)}.$$

In practice the backward prediction residual is computed at the previous sample time $t' = t - 1$ as $\varepsilon_{b,t'}^{(i)} - k_{i+1}^{(t')}\,\varepsilon_{f,t'}^{(i)}$, and this value is passed through a delay element so that it is output at sample time $t$ as $\varepsilon_{b,t}^{(i+1)}$. The forward and backward prediction residuals of the initial stage ($i = 0$) are

$$\varepsilon_{f,t}^{(0)} = x_t \quad \text{(the input waveform)}, \qquad \varepsilon_{b,t}^{(0)} = x_{t-1} \quad \text{(the input waveform at the previous sample time)}.$$

The coefficient $k_{i+1}^{(t)}$ is calculated by an expression that is not reproduced in this text, in which an overbar denotes a time average. Here, in order to compute continuously a time average in which past values are weighted, the time average is calculated by applying an exponentially decaying window. As an embodiment of the window, taking the current time as 0 and writing $W(j\tau)$ for the value of the window function at a time $j\tau$ in the past, the function

$$W(j\tau) = \exp(-j\tau / T_c)$$

is used, where $\tau$ is the sampling period and $T_c$ is a parameter that determines how quickly the window function decays, chosen to be about 10 to 20 msec.

The time average $\bar{y}_t$ at sample time $t$ of sample values $y_t$ obtained as a time series is defined by

$$\bar{y}_t = \sum_{j=0}^{\infty} y_{t-j}\, W(j\tau).$$

Using the time average $\bar{y}_{t-1}$ obtained at the immediately preceding sample time $t - 1$, $\bar{y}_t$ can be computed by the recurrence

$$\bar{y}_t = y_t + \exp(-\tau / T_c)\, \bar{y}_{t-1}.$$

With this recurrence, the time averages of the products and squares of the forward and backward prediction residuals of each stage can be calculated efficiently. The residual waveform fed to the low-pass filter 2 of FIG. 1 is the forward prediction residual obtained as the output of the $p$-th stage.
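A possible sample-by-sample realization of such a lattice is sketched below. Two points are assumptions rather than statements of the source: the coefficient of each stage is taken here as a Burg-type ratio, $2\,\overline{fb} / (\overline{f^2} + \overline{b^2})$, because the patent's own expression for the coefficient is not available in this text; and the names `leaky_average` and `continuous_lattice_residual` are hypothetical. The exponentially weighted time averages use the recurrence given above with `alpha` standing for $\exp(-\tau/T_c)$.

```python
import math

def leaky_average(samples, alpha):
    """Running time average: ybar_t = y_t + alpha * ybar_{t-1}."""
    acc, out = 0.0, []
    for y in samples:
        acc = y + alpha * acc
        out.append(acc)
    return out

def continuous_lattice_residual(x, order, alpha):
    """Forward prediction residual of the last stage, one value per input sample."""
    num = [0.0] * order            # leaky average of 2 * f * b, per stage
    den = [0.0] * order            # leaky average of f^2 + b^2, per stage
    delayed = [0.0] * (order + 1)  # outputs of the delay elements at time t
    residual = []
    for sample in x:
        f = sample                                  # stage-0 forward residual
        next_delayed = [sample] + [0.0] * order     # stage-0 backward residual for t+1
        for i in range(order):
            b = delayed[i]                          # backward residual, already delayed
            num[i] = 2.0 * f * b + alpha * num[i]
            den[i] = f * f + b * b + alpha * den[i]
            k = num[i] / den[i] if den[i] > 0.0 else 0.0   # assumed Burg-type estimate
            next_delayed[i + 1] = b - k * f         # enters the delay element
            f = f - k * b                           # forward residual of the next stage
        delayed = next_delayed
        residual.append(f)
    return residual

# A pure sinusoid is predictable with two stages, so the residual becomes small.
x = [math.sin(2.0 * math.pi * 0.05 * t) for t in range(2000)]
e = continuous_lattice_residual(x, 2, 0.99)
```

Because every coefficient is re-estimated at each sample from the leaky averages, no frame boundaries are needed, which is the point of the continuous variant.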

Next, the low-pass filter 2 is described.

The fundamental frequency can be obtained without a low-pass filter, but using a low-pass filter with a cutoff frequency of about 500 Hz emphasizes the fundamental frequency component contained in the residual waveform and improves the analysis accuracy.
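The text specifies only the cutoff of about 500 Hz, not the filter structure. Purely as an illustration, the sketch below uses a one-pole recursive low-pass filter; this choice of filter, the function name `one_pole_lowpass`, and the coefficient formula are assumptions and not taken from the patent.

```python
import math

def one_pole_lowpass(x, fs, cutoff_hz=500.0):
    """y[n] = (1 - a) * x[n] + a * y[n-1], with a = exp(-2*pi*fc/fs)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for v in x:
        y = (1.0 - a) * v + a * y
        out.append(y)
    return out

fs = 10000
dc = one_pole_lowpass([1.0] * 2000, fs)                           # constant input
hf = one_pole_lowpass([(-1.0) ** n for n in range(2000)], fs)     # fs/2 component
```

A constant input passes with unit gain, while a component at half the sampling rate is strongly attenuated; a steeper filter would emphasize the low band more sharply.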

Next, an embodiment of the semi-infinite window function of block 3 is the first-order exponential function

$$W_1(n\tau) = A_1 \exp(-n\tau / T_1) \qquad (n \ge 0),$$

where $A_1 = 1$ and $\tau$ is the sampling period. $T_1$ is a parameter that determines how quickly the window function decays; for example, a value of about 15 msec is used for male speakers and about 10 msec for female speakers.

Let $\sigma(n, k)$ be the value of the autocorrelation function at sample point $n$ for a lag of $k$ samples. Using the value $\sigma(n-1, k)$ at the immediately preceding sample point $n - 1$, the autocorrelation function at sample point $n$ can be computed by the recurrence

$$\sigma(n, k) = \alpha^{2}\,\sigma(n-1, k) + \alpha^{k}\, x_n x_{n-k},$$

where $\alpha = \exp(-\tau / T_1)$, $x_n$ is the sample value of the residual waveform, $\tau$ is the sampling period, and $k = 0, 1, 2, \ldots$. The initial value, that is, the state before $x_1$ is input, is $\sigma(0, k) = 0$ ($k = 0, 1, 2, \ldots$).
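The recurrence can be checked numerically against the direct definition, namely the autocorrelation of the residual after multiplication by the first-order exponential window. The sketch below assumes the recurrence in the form $\sigma(n,k) = \alpha^2 \sigma(n-1,k) + \alpha^k x_n x_{n-k}$, which is what follows from windowing with $\alpha^j$ and then correlating; the function names are hypothetical and samples before the start of the signal are taken as zero.

```python
def recursive_autocorrelation(x, k_max, alpha):
    """sigma(n, k) for the last sample n, updated one sample at a time."""
    sigma = [0.0] * (k_max + 1)    # sigma(0, k) = 0 before any input
    past = [0.0] * (k_max + 1)     # past[k] holds x_{n-k}
    for x_n in x:
        past = [x_n] + past[:-1]
        for k in range(k_max + 1):
            sigma[k] = alpha * alpha * sigma[k] + (alpha ** k) * x_n * past[k]
    return sigma

def direct_autocorrelation(x, k_max, alpha):
    """Window the signal (newest sample first), then correlate directly."""
    n = len(x) - 1
    w = [(alpha ** j) * x[n - j] for j in range(len(x))]
    return [sum(w[j] * w[j + k] for j in range(len(w) - k))
            for k in range(k_max + 1)]

x = [1.0, -0.5, 0.25, 2.0, 0.0, -1.0]
rec = recursive_autocorrelation(x, 3, 0.9)
ref = direct_autocorrelation(x, 3, 0.9)
```

Each new sample costs one multiply-accumulate per lag, independent of how far back the window effectively reaches.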

With this recurrence, the autocorrelation function at each sample point can be obtained with a relatively small amount of computation. The case $k = 0$, that is $\sigma(n, 0)$, represents the power of the residual waveform after the window has been applied. The normalized autocorrelation function is expressed as

$$\sigma(n, k) / \sigma(n, 0) \qquad (k = 0, 1, 2, \ldots).$$

The range of $k$ that actually needs to be computed, $k = 0, 1, 2, \ldots, k_{\max}$, is determined as follows. Let $F_{\min}$ be the lower limit of the fundamental frequency that can be observed. The local maximum of the autocorrelation function due to $F_{\min}$ appears at the delay time $k\tau = 1 / F_{\min}$. Therefore, if the autocorrelation function is computed up to $k_{\max} = \operatorname{ceil}(1 / (\tau F_{\min}))$, that local maximum can be found. Here $\operatorname{ceil}[x]$ denotes the smallest integer exceeding $x$. $F_{\min}$ is set to, for example, about 70 Hz for the normal speech of a male speaker.
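As a small worked example of the lag limit, the following assumes a sampling frequency of 10 kHz (the value mentioned elsewhere in the text) together with the 70 Hz lower limit; the function name `max_lag` is hypothetical.

```python
import math

def max_lag(fs, f_min):
    """k_max = ceil(1 / (tau * F_min)) with tau = 1 / fs."""
    return math.ceil(fs / f_min)

k_max = max_lag(10000, 70)   # 10000 / 70 = 142.86, rounded up
```

So with these values the autocorrelation function needs to be maintained for lags 0 through 143.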

Blocks 5 and 6 can be processed in the same way as in the prior art. The selection of the fundamental frequency in block 7 can also be done as in the prior art, except that fundamental frequency candidates are obtained at every sample point and can therefore be compared at a finer time resolution.

As is clear from the above, $\sigma(n, k)$ is obtained at every sample point $n$, that is, continuously in time, so the fundamental frequency can also be output at every sample point $n$. From the standpoint of using the fundamental frequency, however, a value at every sample point (for example, every 0.1 msec at a sampling frequency of 10 kHz) is often unnecessary, and a value every 10 msec, for example, is sufficient. For that purpose, the fundamental frequency obtained from block 7 can be decimated to one value every 10 msec, or the processing from block 3 onward or from block 4 onward can be performed only every 10 msec.
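The first option, thinning the per-sample output, amounts to keeping one value per hop. A minimal sketch, with the hypothetical name `decimate_track` and a placeholder list standing in for the per-sample fundamental frequency track:

```python
def decimate_track(f0_per_sample, fs, step_ms=10.0):
    """Keep one fundamental-frequency value every step_ms milliseconds."""
    hop = max(1, int(round(step_ms * 1e-3 * fs)))
    return f0_per_sample[::hop]

# At fs = 10 kHz, 10 msec corresponds to every 100th sample.
track = decimate_track(list(range(1000)), 10000)
```

The second option in the text, running the later blocks only every 10 msec, saves computation as well as output volume, but is not shown here.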

As further embodiments of the semi-infinite window, higher-order exponential windows are also possible, for example

$$W_2(n\tau) = A_2\, n\tau \exp(1 - n\tau / T_2), \qquad A_2 = 1 / T_2,$$

or

$$W_3(n\tau) = A_3\, (n\tau)^2 \exp(2 - n\tau / T_3), \qquad A_3 = 1 / (2 T_3)^2.$$

For $W_2(n\tau)$ and $W_3(n\tau)$ as well, the autocorrelation function can be computed by a recurrence in the same way as for $W_1(n\tau)$, although the recurrence is somewhat more complicated, so the computation remains efficient.

FIG. 8 shows the shapes of the window functions $W_1(n\tau)$, $W_2(n\tau)$, and $W_3(n\tau)$ as 81, 82, and 83, respectively.

In FIG. 8, the values of $T_2$ and $T_3$ are chosen so that the time at which $W_1(n\tau)$ falls to 0.5 coincides with the time at which $W_2(n\tau)$ and $W_3(n\tau)$ reach their maxima; specifically, $T_2 = T_1 \ln 2$ and $T_3 = T_2 / 2$.

[Effect of the invention]

All of the problems that were unavoidable, or that required special care, because a window of finite length was used in the prior art are avoided. That is, the accuracy of the fundamental frequency analysis no longer deteriorates depending on the window position, and the window length no longer needs to be strictly controlled.

Furthermore, whereas the prior art performed the analysis on the premise of stationarity within the window, the present invention weights the current speech waveform more heavily than the past speech waveform and does not require stationarity over the range of the window, so an improvement in analysis accuracy can be expected.

In addition, because an exponentially decaying function is used as the window, the autocorrelation function at the next sample point can be computed from the autocorrelation function at the immediately preceding sample point by relatively simple processing.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of how the fundamental frequency is obtained according to the present invention. FIG. 2 is an explanatory diagram of how the fundamental frequency is obtained in the prior art. FIG. 3 is an explanatory diagram of the processing steps up to the normalized autocorrelation function in the prior art. FIG. 4 is an explanatory diagram of the shifting of the window position. FIG. 5 is an explanatory diagram of the fundamental frequency candidates. FIG. 6 is a schematic explanatory diagram of the processing up to the calculation of the autocorrelation function according to the present invention. FIG. 7(a) is an overall diagram of the lattice-type partial correlation calculation circuit, and FIG. 7(b) is a circuit diagram of one part of it. FIG. 8 is an outline diagram of the window functions.

1: continuous linear predictive analysis
2: low-pass filter
3: processing with an exponentially decaying window of semi-infinite length
4: continuous calculation of the autocorrelation function
5, 25: voiced-sound decision by thresholds
6, 26: selection of fundamental frequency candidates
7, 27: selection of the fundamental frequency
21: extraction of a short-time waveform by windowing
22: application of a window function
23: linear predictive analysis
24: calculation of the autocorrelation function
31, 61: speech waveform
32: speech waveform of one window length
33: waveform multiplied by the window function
34: residual waveform
35: normalized autocorrelation function
62: temporally continuous residual waveform
63: residual waveform after the low-pass filter
64: portion of the residual waveform multiplied by the semi-infinite window function
65: window function of semi-infinite length
66: windowing result
67: autocorrelation function
81: outline of the window function expressed by a first-order exponential function
82: outline of the window function expressed by a second-order exponential function
83: outline of the window function expressed by a third-order exponential function

Patent applicants: 藤崎 博也; Sumitomo Electric Industries, Ltd.
Agent: 鎌田 文二

[Figures: FIG. 1; FIG. 2; FIG. 5 (vertical axis: fundamental frequency, Hz); FIG. 6 (61 speech waveform, 62 residual waveform, 66 windowing result, candidates for the period of the fundamental frequency); FIG. 7(a) and (b) (legend: adder; one-sample delay element); FIG. 8]

Claims

[Claims]

(1) A fundamental frequency analysis device comprising: means for continuously performing linear predictive analysis on an input speech waveform to obtain a residual; means for multiplying the residual by an exponentially decaying window function of semi-infinite length and obtaining its autocorrelation function; and means for determining the period of the fundamental frequency from a local maximum of the autocorrelation function of the residual; wherein the residual is obtained continuously from the speech waveform, the autocorrelation function of the residual multiplied by the exponentially decaying window function of semi-infinite length is obtained, and the fundamental frequency is obtained from a local maximum of that autocorrelation function.

(2) The fundamental frequency analysis device according to claim 1, wherein the means for continuously performing linear predictive analysis to obtain the residual is based on continuous calculation of partial correlation by a lattice method.

(3) The fundamental frequency analysis device according to claim 1 or 2, wherein the window function $W(n\tau)$ is expressed by $A_1 \exp(-n\tau / T_1)$, where $A_1$ and $T_1$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.

(4) The fundamental frequency analysis device according to claim 1 or 2, wherein the window function $W(n\tau)$ is expressed by $A_2\, n\tau \exp(1 - n\tau / T_2)$, where $A_2$ and $T_2$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.

(5) The fundamental frequency analysis device according to claim 1 or 2, wherein the window function $W(n\tau)$ is expressed by $A_3\, (n\tau)^2 \exp(2 - n\tau / T_3)$, where $A_3$ and $T_3$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.