JPS62194300A - Pitch extraction system

Info

Publication number: JPS62194300A
Application number: JP61035151A
Authority: JP
Inventors: 浅川　吉章; 宮本　宜則; 和弘近藤; 市川　熹; 鈴木　俊郎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-02-21
Filing date: 1986-02-21
Publication date: 1987-08-26
Anticipated expiration: 2012-02-26
Also published as: JP2585214B2

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to speech analysis, and in particular to an improved pitch extraction method suited to extracting the pitch period of speech in real time.

[Conventional technology]

Pitch period information is extremely important to sound quality in high-efficiency coding, in which speech is decomposed and then transmitted or stored, and in speech synthesis, so a means of extracting it with high accuracy is indispensable. Transmission in particular requires real-time processing, and a fast pitch extraction algorithm with a low processing load is also desirable for reducing equipment cost.

Considering men, women, and children, the pitch frequency of speech ranges from 70 to 500 Hz (a period of 2 to 15 ms). From the viewpoint of the quality of coded or synthesized speech, it is desirable to extract the pitch period at intervals of 0.1 ms or less, or at least 0.3 ms or less. Conventionally, therefore, a pitch period with sufficient time resolution was extracted by also using the signal obtained by analog-to-digital (A/D) conversion at the sampling frequency of 8 to 10 kHz that is used for extracting the spectral information of the speech.

To extract the pitch period, it is common to compute the autocorrelation coefficients of the speech waveform or of the prediction residual waveform for delays of 2 to 15 ms, and to take the delay that gives the peak correlation coefficient as the pitch period. With 8 kHz sampling, the delay per sample is 125 μs, so delays of 2 to 15 ms correspond to delays of 16 to 120 sample points. For the extracted autocorrelation coefficients to be reliable, about 100 data points are needed even at the largest delay of 120 samples, so about 220 speech samples are required, and the amount of computation for the autocorrelation coefficients at delays of 16 to 120 points becomes very large.
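The arithmetic in the preceding paragraph can be checked with a short sketch. The function names and the simple cost model (a fixed number of products per lag) are assumptions made for illustration and are not taken from the patent.

```python
def lag_range(fs_hz, min_period_ms, max_period_ms):
    """Convert a pitch-period search range in milliseconds to lags in samples."""
    min_lag = round(fs_hz * min_period_ms / 1000)
    max_lag = round(fs_hz * max_period_ms / 1000)
    return min_lag, max_lag


def autocorr_cost(min_lag, max_lag, points_per_lag):
    """Samples needed and multiply-accumulates for a direct autocorrelation.

    Assumes every lag is evaluated over the same number of points.
    """
    samples_needed = max_lag + points_per_lag
    macs = (max_lag - min_lag + 1) * points_per_lag
    return samples_needed, macs


min_lag, max_lag = lag_range(8000, 2, 15)
print(min_lag, max_lag)                       # 16 120
print(autocorr_cost(min_lag, max_lag, 100))   # (220, 10500)
```

Under these assumptions, a direct computation at 8 kHz needs about 220 samples and on the order of ten thousand multiply-accumulates per frame, which is the load the resampling approach is meant to reduce.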

One method that reduces the amount of computation for this pitch extraction, so that it can run in real time (normally 10 to 20 ms) on a general-purpose signal processing microcomputer (DSP), is described in, for example, Japanese Patent Application Laid-Open No. 57-82897. In that method the input speech is resampled to, for example, 1/4 of the original rate before the correlation coefficients are calculated, which reduces both the amount of data and the amount of computation, and parabolic interpolation around the peak of the correlation coefficients yields a pitch period with the necessary time resolution. Japanese Patent Application Laid-Open No. 58-76891 discloses a method in which low-order linear predictive analysis is performed at the time of resampling, and the pitch is extracted after the influence of formants has been removed. Japanese Patent Application Laid-Open No. 58-1140798 further discloses a method that obtains a guide index from the pitch periods of the past several frames and extracts the pitch period taking the continuity of the pitch period into account.

[Problems to be solved by the invention]

When the techniques disclosed in the above-mentioned Japanese Patent Application Laid-Open No. 57-8289 or No. 58-76891 are applied to speech that has passed through a telephone line, sufficient performance is not always obtained. This is because band-limited speech (300 to 3400 Hz) is more susceptible to the harmonic components of the speech. That is, the harmonics are emphasized relative to the pitch component, so that an integer fraction (1/n) of the true pitch period is likely to be selected. Conversely, an integer multiple of the period is sometimes selected. These problems also occur at the normal sampling rate, but resampling makes them worse because of the mismatch between the true pitch period and the resampling period.

On the other hand, because the pitch period is extracted independently for each frame, discontinuities are likely to occur. The method of Japanese Patent Application Laid-Open No. 58-1140798 is effective in maintaining the continuity of the pitch period. However, because it does not evaluate the correlation values of the pitch period candidates, errors may propagate when the extracted pitch periods contain many errors. To prevent this, the pitch period must be reselected after pitch periods for about 8 frames have been extracted in advance. This means that the coding delay increases by 80 ms, and the effect on call quality cannot be ignored.

An object of the present invention is to provide a highly accurate pitch extraction method that requires little data and little processing, and that inherently involves little coding delay.

[Means for solving problems]

To achieve the above object, the present invention derives a plurality of candidates from the pitch period extracted from the peak of the correlation coefficients of the resampled speech signal, and evaluates a short-segment correlation value of the speech signal for each candidate, so that the most appropriate pitch period is selected from among the candidates. In addition, the correlation values are weighted on the basis of the pitch periods extracted up to the immediately preceding frame, so that a stable pitch period with assured continuity is selected.

[Operation]

Waveform 21 in FIG. 3 shows an example of a speech waveform, and section 31 indicates the frame from which the pitch is extracted. The original waveform x_i (the i-th sample of the waveform) is low-pass filtered and then resampled at 4:1 to give the waveform y_i, for which the autocorrelation coefficient R(t) is calculated by equation (1) (the equation is missing from this text).

The neighborhood of the maximum of R(t) is interpolated, and the time delay that gives the maximum is denoted T (with the resolution of the original sampling).

The pitch period candidates are then T, nT, and T/n (n is an integer of 2 or more) that lie within the pitch period search range. FIG. 4 shows the result of calculating, by equation (2) (the equation is missing from this text), the correlation values r(j) between section 32 of FIG. 3, taken as the reference, and the sections that lie T/3, T/2, T, and 2T away from it (sections 33, 34, 35, and 36 of FIG. 3, respectively).

Here x_i is the amplitude of the i-th sample of the speech waveform, and for convenience the start of section 32 is taken as i = 0.

M is a predetermined number of data points, and j is the starting data number (address) of sections 33, 34, 35, and 36, that is, T/3, T/2, T, and 2T (expressed as integers). According to FIG. 3, r(T/2) is about as large as r(T), so the correct pitch period can be judged to be T/2. Because the x_i used in equation (2) are the original data, that is, the data before resampling, a stable decision is possible with a relatively small number of data points if the reference section (section 32 in FIG. 3) is chosen appropriately.
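Equation (2) does not survive in this text, so the following is only a sketch of one plausible form: a normalized correlation between a reference section of M original-rate samples and the section that starts j samples later. The normalization, the function name `short_corr`, and the synthetic test signal are all assumptions made for illustration.

```python
import math


def short_corr(x, start, lag, m):
    """Normalized correlation between x[start:start+m] and the section `lag` samples later."""
    a = x[start:start + m]
    b = x[start + lag:start + lag + m]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


# Synthetic "speech" with a true period of 40 samples (5 ms at 8 kHz).
period = 40
x = [math.sin(2 * math.pi * i / period) + 0.5 * math.sin(4 * math.pi * i / period)
     for i in range(320)]

# Suppose the coarse search returned T = 80, twice the true period.
T = 80
for lag in (round(T / 3), T // 2, T, 2 * T):
    print(lag, round(short_corr(x, 0, lag, 32), 3))
```

In this toy case the values at T/2 and T are both close to 1 while the value at T/3 is much lower, which mirrors the situation described above in which T/2 is judged to be the correct pitch period.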

In general, the pitch period candidates include both T/n and nT, but the decision becomes easier if they can be restricted to one of the two cases. If R(t) of equation (1) is multiplied by an appropriate weight W(t) that depends on the lag t, the T that gives the maximum of R(t) is either the correct pitch period or an integer multiple of it. It is then sufficient to calculate r(j) of equation (2) only for j = T/n (n ≥ 1), and to take as the pitch period the smallest j for which r(j) ≈ r(T).

On the other hand, because R(t) is calculated frame by frame, a value that is a non-integer multiple of the correct pitch is occasionally selected, and in such cases the correct pitch period cannot be extracted. In normal speech, the variation of the pitch period between frames stays within a range that can be regarded as continuous, so the problem can be avoided by favoring a value close to the pitch periods extracted up to the immediately preceding frame. Specifically, the weight W(t) applied to R(t) is made larger than its standard value only at the lag t corresponding to the pitch period of the immediately preceding frame and in its neighborhood. The same operation can be applied when the pitch period is selected by evaluating r(T/n) (n is an integer of 1 or more) of equation (2): the correlation value r(T/n) is weighted for the T/n that is close to the pitch period of the immediately preceding frame.

The distinctive feature of this method is that the continuity of the pitch period is reflected in the correlation values. As a result, even if the pitch period extracted in the previous frame is wrong, the correlation value for the correct pitch period in the current frame is likely to exceed the mistakenly weighted correlation value, so errors are unlikely to propagate.

[Embodiments]

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram of a pitch extraction device using the present invention. In FIG. 1, speech data digitized at a predetermined sampling period is stored in buffer memory 1. Here the sampling period is 125 μs (8 kHz sampling) and the frame period is 20 ms. The buffer memory holds 40 ms of data centered on the current frame. The speech data x_i is read from buffer memory 1 and input to resampling unit 2, which outputs the waveform y_i resampled at 2 kHz.

In resampling unit 2, the speech data x_i passes through a low-pass filter with a cutoff frequency of 500 Hz and is decimated at 4:1.
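The resampling step can be sketched as a low-pass FIR filter followed by keeping every fourth sample. The patent gives only the cutoff frequency (500 Hz) and the 4:1 ratio; the Hamming-windowed sinc design, the 31-tap length, and the function names below are assumptions chosen for illustration.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=31):
    """Hamming-windowed sinc low-pass taps, scaled to unity gain at DC."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def decimate(x, taps, factor=4):
    """Filter x (fully overlapped region only) and keep every `factor`-th output."""
    n_out = len(x) - len(taps) + 1
    return [sum(h * x[i + k] for k, h in enumerate(taps))
            for i in range(0, n_out, factor)]


taps = lowpass_fir(500, 8000)
x = [math.sin(2 * math.pi * 100 * i / 8000) for i in range(320)]  # 40 ms at 8 kHz
y = decimate(x, taps)
print(len(y))  # number of 2 kHz samples obtained from the 40 ms buffer
```

Because the taps are symmetric, sliding them over the input without reversal is equivalent to convolution. A production implementation would likely use a polyphase structure so that only the retained outputs are computed, as this sketch already does by stepping the output index by four.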

The resampled speech data y_i is input to autocorrelation coefficient calculation unit 3, which calculates the autocorrelation coefficient R(t) according to equation (1). The pitch period search range is 2 to 15 ms, which corresponds to τ_min = 16 to τ_max = 120 samples at 8 kHz sampling, and to 4 to 30 samples for the waveform resampled at 2 kHz. Because the pitch period candidate is obtained by parabolic interpolation, however, two extra samples are needed, so R(t) must be calculated for time delays t = 3 to 31.
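Since equation (1) is not available in this text, the sketch below assumes the simplest form, an unnormalized sum of products over a fixed number of points, evaluated for lags 3 to 31 of the 2 kHz waveform. The function name `autocorr`, the number of points, and the sinusoidal test input are illustrative assumptions.

```python
import math


def autocorr(y, lags, n_points):
    """Unnormalized autocorrelation R(t) = sum over i < n_points of y[i] * y[i + t]."""
    return {t: sum(y[i] * y[i + t] for i in range(n_points)) for t in lags}


# Test input: a sinusoid with a period of 10 samples at 2 kHz (5 ms, i.e. 200 Hz).
y = [math.sin(2 * math.pi * i / 10) for i in range(80)]

R = autocorr(y, range(3, 32), 40)             # lags 3..31, including the two guard lags
t0 = max(range(4, 31), key=lambda t: R[t])    # peak search restricted to lags 4..30
print(t0, round(R[t0], 3))
```

For a pure sinusoid the peaks at 10, 20, and 30 samples are equally high, which illustrates why the later stages need a way to choose between a period and its integer multiples.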

R(t) is input to pitch period candidate extraction unit 4, where it is first weighted:

$R'(t) = R(t) \cdot W(t)$ (3)

The standard characteristic of W(t) is, for example, as shown in FIG. 5. It is a kind of low-frequency emphasis and has the effect of suppressing the extraction of an integer fraction (1/n) of the correct pitch period.

Next, the maximum of R'(t) over t = 4 to 30 is detected. If the lag that gives the maximum of R'(t) is t = t0, the pitch period candidate T is obtained with the time resolution of 8 kHz sampling (125 μs) by parabolic interpolation according to equation (4) (the equation is missing from this text).
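Equation (4) is not reproduced in this text. The sketch below uses the standard three-point parabolic fit around the peak lag, which is one common way to obtain a sub-sample peak position; it should be read as an assumption rather than as the patent's exact formula. The name `refine_peak` and the factor-of-four scaling back to 8 kHz units follow from the 4:1 resampling described earlier.

```python
def refine_peak(R, t0, factor=4):
    """Fit a parabola through R[t0-1], R[t0], R[t0+1] and return the peak
    position in original-rate samples (rounded to an integer)."""
    left, mid, right = R[t0 - 1], R[t0], R[t0 + 1]
    denom = left - 2 * mid + right
    # A zero denominator means the three points are collinear; keep t0 unchanged.
    delta = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return round(factor * (t0 + delta))


# Toy correlation curve whose true maximum lies at lag 10.25 (2 kHz units).
R = {t: -(t - 10.25) ** 2 for t in range(3, 32)}
t0 = max(range(4, 31), key=lambda t: R[t])
print(t0, refine_peak(R, t0))  # 10 41
```

The guard lags 3 and 31 are what allow the fit to be evaluated when the peak falls at either end of the 4 to 30 search range.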

Pitch period candidate extraction unit 4 outputs the maximum value R'(t0) of R'(t) to decision unit 6 and the pitch period candidate T to partial correlation calculation unit 5.

In partial correlation calculation unit 5, the speech data x_i is read from buffer memory 1, and r(T/n) is calculated according to equation (2) for each T/n that satisfies

$T/n \ge \tau_{\min}$ (5)

Here n is an integer of 1 or more, and T/n is a value expressed as an integer. The following explains how to find the starting address of the reference section, which is taken as i = 0 in equation (2) for convenience.

The purpose of partial correlation calculation unit 5 is to obtain the correlation coefficient of equation (2) with good sensitivity to T/n. To this end, it is desirable to use the most periodic part of the frame as the reference. One way to find the reference section is first to detect the sample with the largest absolute amplitude among the speech data in the frame, then to compute the sum of absolute amplitudes a(k) of equation (6) (the equation is missing from this text) for each run of M consecutive speech samples that contains that sample, and to take the k = k0 that gives the maximum as the starting address of the reference section. Power may be used instead of a(k) of equation (6). When the reference section is determined in this way, it was found that the number of data points M in equation (2) need only be about twice the minimum pitch period τ_min.
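The reference-section search described above can be sketched as follows. Equation (6) is missing from this text, so a(k) is assumed here to be the plain sum of absolute amplitudes over M consecutive samples starting at k; the function name `reference_start` and the handling of frame boundaries are also assumptions.

```python
def reference_start(x, m):
    """Return the start index k0 of the m-sample window that contains the
    largest-magnitude sample of x and has the largest sum of |x|."""
    i0 = max(range(len(x)), key=lambda i: abs(x[i]))   # largest |amplitude|
    k_lo = max(0, i0 - m + 1)                          # earliest window containing i0
    k_hi = min(i0, len(x) - m)                         # latest window that still fits
    return max(range(k_lo, k_hi + 1),
               key=lambda k: sum(abs(v) for v in x[k:k + m]))


frame = [0, 0, 1, 5, 2, 0, 0, 0]
print(reference_start(frame, 3))  # 2
```

In the example the three candidate windows starting at 1, 2, and 3 have absolute sums 6, 8, and 7, so the window starting at index 2 is selected. The sketch assumes the frame holds at least m samples.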

The address k0 determined in this way is redefined as i = 0, and equation (2) is calculated. When k0 is in the latter half of the frame, an alternative expression (missing from this text) may be used instead of equation (2).

Partial correlation calculation unit 5 outputs T/n and r(T/n) to decision unit 6.

Decision unit 6 first determines whether the frame is voiced or unvoiced by applying a threshold to the output R'(t0) of pitch period candidate extraction unit 4. That is, when

$R'(t_0) \ge \theta_1$ (7)

the frame is taken to be voiced and the decision on r(T/n) is made. Here θ1 is a positive threshold. Otherwise the frame is taken to be unvoiced, τ = 0 is output as pitch period 7, and processing of the frame ends. In the voiced case, the outputs T/n and r(T/n) of the partial correlation calculation unit are used, and the smallest T/n (n is an integer of 2 or more) that satisfies

$r(T/n) \ge r(T) - \theta_2$ (8)

is taken as the pitch period, where θ2 is a positive threshold. If no T/n satisfies equation (8), the pitch period is τ = T.
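The decision rule of the first embodiment can be sketched as below. The voiced test and the comparison against r(T) minus a threshold follow the description; the threshold values, the use of rounding to express T/n as an integer, and the name `decide_pitch` are assumptions made for illustration.

```python
def decide_pitch(r_peak, T, r, theta1, theta2, tau_min=16):
    """Return the pitch period in samples, or 0 for an unvoiced frame.

    r_peak : weighted autocorrelation maximum R'(t0)
    T      : pitch period candidate from the coarse search
    r      : callable mapping a lag to its short-segment correlation value
    """
    if r_peak < theta1:            # voiced/unvoiced test
        return 0
    best = T                       # fallback when no submultiple qualifies
    n = 2
    while round(T / n) >= tau_min:
        lag = round(T / n)
        if r(lag) >= r(T) - theta2:
            best = lag             # lags shrink as n grows, so the last hit is the smallest
        n += 1
    return best


table = {80: 0.9, 40: 0.88, 27: 0.1, 20: 0.3, 16: 0.2}
print(decide_pitch(0.8, 80, table.get, theta1=0.5, theta2=0.1))  # 40
print(decide_pitch(0.3, 80, table.get, theta1=0.5, theta2=0.1))  # 0
```

In the first call the candidate T = 80 is replaced by T/2 = 40 because its correlation is within the threshold of r(T), while T/3, T/4, and T/5 are rejected.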

When pitch period 7 has been output, processing of the frame ends.

Next, a second embodiment of the present invention will be described with reference to FIG. 2.

The difference from the first embodiment is that weight control unit 8 has been added. Its purpose is to make pitch extraction more stable by using the pitch information up to the frame immediately preceding the current frame. In weight control unit 8, the following processing is performed when the pitch period of the immediately preceding frame has been determined.

Weight control unit 8 stores the pitch period τ1 of the frame one before the current frame and the pitch period τ2 of the frame two before it. When

$|\tau_1 - \tau_2| \le \theta_3$ (9)

the first control parameter P1 and the second control parameter P2 are set to

$P_1 = \tau_1 / 4$ (10)

$P_2 = \tau_1$ (11)

and when equation (9) is not satisfied they are set to

$P_1 = 0$ (10)'

$P_2 = 0$ (11)'

Here θ3 is a positive threshold that represents the width within which the change in pitch period between two consecutive frames can be regarded as continuous.
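The parameter update performed by the weight control unit can be written as a small function. The sketch assumes that the two stored pitch periods are compared by absolute difference against a threshold, that the first parameter is the previous period divided by four (that is, expressed in 2 kHz lag units), and that the second is the previous period itself; the names `control_params`, `tau1`, `tau2`, and `theta3` are chosen here for illustration.

```python
def control_params(tau1, tau2, theta3):
    """Return (p1, p2) from the pitch periods of the last two frames.

    tau1 : pitch period one frame back (8 kHz samples)
    tau2 : pitch period two frames back (8 kHz samples)
    """
    if abs(tau1 - tau2) <= theta3:
        return tau1 / 4, tau1      # history is continuous: enable both weights
    return 0, 0                    # history is not continuous: disable weighting


print(control_params(80, 78, theta3=4))  # (20.0, 80)
print(control_params(80, 40, theta3=4))  # (0, 0)
```

Setting both parameters to zero acts as an "off" value because no lag in the search range lies near zero.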

Processing in the current frame is the same as in the first embodiment up to autocorrelation calculation unit 3. In pitch period candidate extraction unit 4, the value of the autocorrelation coefficient R(t) is partially corrected using the first control parameter P1 supplied from weight control unit 8, according to equation (12) (the equation is missing from this text). Here W1 is a weight of 1 or more, normally about 1.1 to 1.2, and ΔP indicates the width of the lags to be corrected and is about 1 to 2. In equation (12) W1 is constant, but the weight may instead be given a peaked shape centered on t = P1. With equation (12), when continuous pitch periods have been extracted in the two immediately preceding frames, a period close to them becomes more likely to be selected.
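Equation (12) is missing from this text. One reading consistent with the description is that the autocorrelation values at lags within a small width of the first control parameter are multiplied by a constant weight and all other lags are left unchanged. The sketch below implements that reading; the function name `boost_near_previous` and the default values (a weight of 1.15 and a width of 1) are assumptions within the ranges quoted above.

```python
def boost_near_previous(R, p1, w1=1.15, delta_p=1):
    """Multiply R[t] by w1 for lags with |t - p1| <= delta_p; leave others unchanged.

    A value of p1 == 0 means "no usable history" and disables the correction.
    """
    if p1 == 0:
        return dict(R)
    return {t: (v * w1 if abs(t - p1) <= delta_p else v) for t, v in R.items()}


R = {9: 1.0, 10: 1.0, 11: 1.0, 20: 1.05}
boosted = boost_near_previous(R, p1=10)
print(max(R, key=R.get), max(boosted, key=boosted.get))
```

Without the correction the slightly higher value at lag 20 wins; with it, a lag next to the previous frame's period is preferred.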

In decision unit 6, on the other hand, the correlation coefficients r(T/n) (n is an integer of 1 or more) supplied from the partial correlation calculation unit are selectively weighted using the second control parameter P2 supplied from weight control unit 8. That is, for n ≥ 1, when

$|T/n - P_2| \le \theta_4$ (13)

is satisfied, the weighted value is

$r'(T/n) = r(T/n) \cdot w_2$ (14)

and otherwise r'(T/n) = r(T/n). Here θ4 is the range of pitch period variation that can be regarded as continuous, and normally θ4 = θ3; w2 is a weight of 1 or more. When this weighting is applied, the maximum over n ≥ 1 is defined as

$r_{\max} = \max_{n} r'(T/n)$ (15)

and, instead of the decision by equation (8), the smallest T/n (n ≥ 1) that satisfies

$r'(T/n) \ge r_{\max} - \theta_2'$ (16)

is taken as the pitch period. θ2' is a threshold analogous to θ2. When the pitch period τ of the current frame has been determined, the values are updated as τ2 = τ1 and τ1 = τ.
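The weighted decision of the second embodiment can be sketched as follows. The equations in this passage are badly damaged in the source, so the sketch assumes that a candidate's correlation is multiplied by a weight when it lies within a tolerance of the second control parameter, that the maximum is then taken over all candidates, and that the smallest candidate within a threshold of that maximum is chosen. The names `decide_pitch_weighted`, `p2`, `w2`, `theta4`, and `theta2p`, and all numeric values, are illustrative assumptions.

```python
def decide_pitch_weighted(T, r, p2, w2, theta4, theta2p, tau_min=16):
    """Choose the pitch period among T/n (n >= 1), favoring candidates near p2.

    Assumes T >= tau_min so that at least the candidate n = 1 exists.
    """
    candidates = []
    n = 1
    while round(T / n) >= tau_min:
        lag = round(T / n)
        value = r(lag)
        if p2 > 0 and abs(lag - p2) <= theta4:   # close to the previous pitch period
            value *= w2
        candidates.append((lag, value))
        n += 1
    r_max = max(v for _, v in candidates)
    return min(lag for lag, v in candidates if v >= r_max - theta2p)


table = {80: 0.9, 40: 0.85, 27: 0.1, 20: 0.3, 16: 0.2}
print(decide_pitch_weighted(80, table.get, p2=0, w2=1.2, theta4=4, theta2p=0.1))   # 40
print(decide_pitch_weighted(80, table.get, p2=80, w2=1.2, theta4=4, theta2p=0.1))  # 80
```

With no usable history the rule behaves like the first embodiment and picks 40. When the previous frame's period was 80, the weighted value at 80 rises to 1.08 and the candidate at 40 no longer falls within the threshold, so continuity with the previous frame is preserved.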

The way control parameters P1 and P2 are determined in weight control unit 8 has been shown for the simplest case. Needless to say, the method of determining the control parameters admits various modifications.

The processing of the first and second embodiments requires relatively little computation and memory, and can easily be implemented on a general-purpose microprocessor or the like. When the pitch of speech that had passed through a telephone line was extracted with the second embodiment, extraction errors decreased from about 25% to 5%.

[Effect of the invention]

According to the present invention, pitch period candidates can be extracted with a small amount of processing and a precise decision can be made that takes the continuity of the pitch period into account, so the pitch period can be extracted more accurately.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the pitch extraction device of the first embodiment of the present invention, FIG. 2 is a block diagram of the pitch extraction device of the second embodiment of the present invention, FIG. 3 is a diagram showing a speech waveform, FIG. 4 is a diagram showing the principle of the present invention, and FIG. 5 is a diagram showing the weighting function.

2: resampling unit, 3: autocorrelation coefficient calculation unit, 4: pitch period candidate extraction unit, 5: partial correlation calculation unit, 6: decision unit, 8: weight control unit.

Claims

[Claims]

1. A pitch extraction method in which a digitized speech signal is divided into frames of a predetermined time length and a pitch period is extracted from correlation function values of the speech signal within a frame, characterized in that a pitch period candidate is extracted from the correlation function values, correlation values at time delays corresponding to the period of the pitch candidate are calculated precisely for the speech signal within the frame, and the pitch period is extracted on the basis of those correlation values.

2. The pitch extraction method according to claim 1, wherein a predetermined weighting is applied to the correlation function values according to the lag.

3. The pitch extraction method according to claim 2, wherein the weight is controlled on the basis of the pitch periods extracted up to the immediately preceding frame.

4. The pitch extraction method according to claim 1, 2, or 3, wherein the correlation function values of the speech signal within the frame are obtained by interpolating correlation function values calculated after the speech signal within the frame has been decimated at a predetermined ratio.