JPH0377998B2 - Pitch extractor - Google Patents

Info

Publication number
JPH0377998B2
JPH0377998B2 JP58175454A JP17545483A JPH0377998B2 JP H0377998 B2 JPH0377998 B2 JP H0377998B2 JP 58175454 A JP58175454 A JP 58175454A JP 17545483 A JP17545483 A JP 17545483A JP H0377998 B2 JPH0377998 B2 JP H0377998B2
Authority
JP
Japan
Prior art keywords
pitch
period
pitch period
voiced
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP58175454A
Other languages
Japanese (ja)
Other versions
JPS6068000A (en)
Inventor
Satoru Taguchi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP58175454A
Publication of JPS6068000A
Publication of JPH0377998B2
Granted

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a pitch extraction device for extracting the pitch period of speech, and in particular to a pitch extraction device capable of the real-time pitch extraction required by variable-length-frame vocoders and the like.

It is known that the voiced portions of a speech waveform consist of a periodically repeating waveform, and that the way this period (the pitch period) changes over time is an important parameter in speech analysis-synthesis, recognition, and the like. In a speech analysis-synthesis system, for example, the pitch extracted in the analysis section strongly affects the quality of the speech produced in the synthesis section.

Various methods using different analysis parameters are conventionally known for extracting the pitch period of a speech waveform, such as computing autocorrelation coefficients for each frame, where a frame has a duration on the order of the pitch period.

Pitch extraction based on autocorrelation coefficients is widely used because the coefficients can be computed entirely in the time domain and because the phase relationship between the analyzed waveform and the frame has relatively little influence. However, as described below, this method has the drawback that it often erroneously detects, as the pitch period, an integer multiple of the pitch period or a period N1/N2 times the pitch period (where N1 and N2 are integers and N1 < N2). Classified by waveform shape, the waveforms in which this drawback occurs fall broadly into two groups: so-called steady voiced portions, and transient voiced portions such as word onsets.

One reason the drawback occurs in steady voiced portions is that the analyzed waveform is extremely stationary. When a so-called steady voiced portion is observed over a relatively long time, for example several hundred msec, both the pitch period and the waveform segment spanning one pitch period are seen to change gradually. However, when segments of a steady voiced portion are observed only over a relatively short time, on the order of the length of the waveform cut out at each frame period (for example 30 msec), the waveform often exhibits almost perfect stationarity, that is, periodicity. As is well known, the autocorrelation coefficient waveform of a stationary, that is, periodic, waveform is itself periodic; for example, the autocorrelation coefficient waveform of a sine wave is a cosine wave with the same period. Therefore, when the waveform cut out at each frame period with a length of about 30 msec is almost perfectly periodic, its autocorrelation coefficient waveform is also almost perfectly periodic. As shown in FIG. 1, for example, the local maximum 101 of the autocorrelation coefficient at the pitch period and the local maximum 102 at twice the pitch period then become almost equal, and because of limited arithmetic precision, slight disturbances, and the like, the local maximum 102 at twice the pitch period frequently exceeds the local maximum 101 at the pitch period.
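The near-equality of the two maxima can be illustrated with a short Python sketch. The normalized correlation used here (a 120-sample reference window compared with the same window shifted by the lag) is an assumed form, since the text does not reproduce the exact autocorrelation formula; the 50-sample period and the two-harmonic test waveform are likewise hypothetical.

```python
import math

def normalized_autocorr(x, lag, j=120):
    """Normalized correlation between x[0:j] and x[lag:lag+j].

    This normalized form is an assumption made for illustration.
    """
    a = x[:j]
    b = x[lag:lag + j]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0

# Hypothetical frame: 240 samples (30 msec at 8 kHz) of a perfectly
# periodic waveform with a pitch period of 50 samples.
PERIOD = 50
frame = [math.sin(2 * math.pi * n / PERIOD)
         + 0.5 * math.sin(4 * math.pi * n / PERIOD + 0.3)
         for n in range(240)]

rho_pitch = normalized_autocorr(frame, PERIOD)       # plays the role of maximum 101
rho_double = normalized_autocorr(frame, 2 * PERIOD)  # plays the role of maximum 102

# Both values equal 1 up to rounding error, so which one is larger is
# decided by arithmetic precision or any slight disturbance.
print(rho_pitch, rho_double)
```

With a perfectly periodic frame the two coefficients differ only by rounding error, which is why a per-frame maximum search cannot reliably prefer the true pitch period.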

Another cause of the drawback in steady voiced portions arises with speakers whose first formant, for example, has a narrow bandwidth and a center frequency that is an integer multiple, such as twice, the pitch frequency (the reciprocal of the pitch period). In that case, for example, the second harmonic of the pitch frequency resonates with the first formant, the frequency component at twice the pitch frequency is strongly emphasized, and the fundamental frequency of the analyzed waveform appears to be twice the pitch frequency. When the apparent period of such a waveform, that is, the reciprocal of its apparent fundamental frequency, becomes 1/2 of the true pitch period, the autocorrelation coefficient waveform of the analyzed waveform exhibits periodicity with a period of 1/2 the true pitch period. As shown in FIG. 2, for example, the local maximum 201 of the autocorrelation coefficient produced by the resonance between formant and pitch then becomes almost equal to the local maximum 202 at the true pitch period, which causes erroneous detection of the pitch period.

The drawback occurs in transient voiced portions because there the pitch period and the shape of the speech waveform change greatly and relatively irregularly. The autocorrelation coefficient waveform of such a waveform usually shows a rough periodicity at the pitch period, but the local maxima of the autocorrelation coefficient at the pitch period and at integer multiples of the pitch period are relatively uneven. The local maximum at the pitch period is often smaller than a local maximum at an integer multiple of the pitch period, so that so-called integer-multiple pitch period errors occur frequently.

In transient voiced portions the pitch frequency generally changes considerably, so the effect on the speech waveform of resonance between harmonics of the pitch frequency and a formant frequency is smaller than in steady voiced portions. So-called formant pitch errors therefore occur less frequently in transient voiced portions than in steady voiced portions.

Perceptually, pitch detection errors in steady voiced portions have a large effect on the quality of the synthesized speech in a speech analysis-synthesis system, whereas errors in transient voiced portions have a relatively minor effect.

Various methods have been tried to reduce or eliminate pitch detection errors in steady voiced portions, which have a particularly large effect on synthesized speech quality. However, because conventional methods treat the steady portions of voiced sounds and transient portions such as word onsets uniformly, they have the drawback that if a pitch detection error happens to occur at, for example, a word onset, that error adversely affects subsequent pitch detection.

One known conventional method exploits the fact that the pitch period of speech changes relatively slowly: it prevents pitch period detection errors by extracting the pitch period with the difference in pitch period between adjacent frames limited to a predetermined range. However, this method of restricting the search range has the following drawback. As shown in FIG. 3, for example, once a detection error or the like causes a so-called double pitch period on the double-pitch-period curve 302, which has twice the pitch period, to be detected instead of a point on the curve 301 of the basic pitch period, it becomes difficult to detect the correct basic pitch period again. The risk of erroneously detecting the double pitch period is especially high at a transition from a silent portion to a voiced portion, such as a word onset, or from an unvoiced portion to a voiced portion.
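The lock-in behaviour of the range-limited search can be sketched in Python as follows. The function `greedy_track`, the candidate values, and the step limit of 5 samples are hypothetical and chosen only to reproduce the situation of FIG. 3; they are not taken from any particular prior-art device.

```python
def greedy_track(frames, max_step):
    """Frame-by-frame search limited to +/- max_step around the previous pitch.

    frames is a list of candidate lists; each candidate is a
    (pitch period in samples, autocorrelation value) pair.
    """
    track = []
    prev = None
    for candidates in frames:
        if prev is not None:
            allowed = [c for c in candidates if abs(c[0] - prev) <= max_step]
            candidates = allowed or candidates  # fall back if nothing is in range
        period, _ = max(candidates, key=lambda c: c[1])
        track.append(period)
        prev = period
    return track

# Hypothetical data: the true pitch period is 50 samples. In the first
# frame (a word onset) the double period 100 happens to score slightly higher.
frames = [[(50, 0.90), (100, 0.92)]] + [[(50, 0.95), (100, 0.90)] for _ in range(5)]

print(greedy_track(frames, max_step=5))  # [100, 100, 100, 100, 100, 100]
```

Although the true period scores higher in every later frame, it lies outside the restricted range around 100, so the single early error is never corrected.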

If, to mitigate this drawback, the pitch search range is determined from the pitch periods detected in the past several frames, there is the drawback that the search range is difficult to determine at a word onset.

An object of the present invention is to provide a pitch extraction device that performs pitch extraction based on autocorrelation coefficients or the like, prevents pitch period detection errors, and detects the correct pitch more reliably.

The pitch extraction device of the present invention uses pitch determination intervals of a fixed length and comprises: means for evaluating the continuity of the pitch period by dynamic programming that takes the most recent pitch period in the immediately preceding pitch determination interval as its sole starting point and that has a slope restriction, corresponding to the rate of change of the pitch period, for the purpose of constraining that rate of change; and means for determining the optimal pitch period sequence exactly once from a plurality of pitch period sequence candidates at the end of the pitch determination interval.

An embodiment of the present invention will now be described with reference to the drawings. FIG. 4 is a block diagram showing an embodiment of the present invention; the portion enclosed by the dash-dot line 4013 indicates the scope of the invention.

The speech waveform to be analyzed is supplied through a waveform input terminal 4001 to an A/D converter 4002. The A/D converter 4002 band-limits the speech waveform to, for example, 3.4 kHz, samples it at 8 kHz, and linearly quantizes each sample to 12 bits. The A/D converter 4002 outputs the quantized speech signal to a window processor 4003. The window processor 4003 accumulates the quantized speech signal and outputs 30 msec of it at a time (240 samples in this embodiment) to an autocorrelation coefficient calculator 4004 and a voiced/unvoiced discriminator 4005. The output repetition period of the window processor 4003 coincides with the frame period of the pitch extraction process and is, for example, 10 msec. From the 240 input speech samples, the autocorrelation coefficient calculator 4004 computes the autocorrelation coefficient sequence ρ(τMIN), ρ(τMIN+1), ρ(τMIN+2), ..., ρ(τMAX) by the operation shown in equation (1) below.

[Equation (1) is not reproduced in this text.]

where x5(i) denotes the quantized speech samples and j is the number of reference speech samples (120 in this embodiment). The computed autocorrelation coefficient sequence is supplied to a local maximum searcher 4006. The local maximum searcher 4006 searches the coefficient sequence for local maxima and outputs each local maximum, together with its corresponding delay time, as the search result to a DP processor 4008 through a transmission line 4007. The DP processor 4008 takes as its sole starting point the most recent pitch period in the immediately preceding pitch determination interval, supplied through a transmission line 4009 described below. Using dynamic programming with a slope restriction corresponding to the rate of change of the pitch period, it selects the optimal path through the local maxima and corresponding delay times supplied from the local maximum searcher 4006 through the transmission line 4007, over the pitch determination interval (20 frames, or 200 msec, in this embodiment). The DP processor 4008 then outputs the selected path (that is, 20 pitch period values in this embodiment) through a pitch output terminal 4010. The DP processor 4008 also outputs the most recent pitch period value on the path, in other words the pitch period value at the end of the path, to a pitch memory 4011 through the transmission line 4009. The pitch memory 4011 temporarily stores this pitch period value and returns it to the DP processor 4008 through the transmission line 4009 as the pitch starting point for the next pitch determination interval.
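The front end described above (window processor, autocorrelation coefficient calculator, and local maximum searcher) might be sketched in Python as below. The frame length, frame period, and j = 120 follow the embodiment; the lag range `TAU_MIN`, `TAU_MAX`, the normalized form of the autocorrelation, and all function names are assumptions made for illustration, because the text gives neither the values of τMIN and τMAX nor the exact formula.

```python
import math

FRAME_LEN = 240              # 30 msec at 8 kHz, as in the embodiment
FRAME_HOP = 80               # 10 msec frame period at 8 kHz
REF_LEN = 120                # j, the number of reference samples
TAU_MIN, TAU_MAX = 20, 120   # hypothetical lag range in samples

def split_frames(signal):
    """Return the overlapping 240-sample windows, one every 80 samples."""
    last_start = len(signal) - FRAME_LEN
    return [signal[s:s + FRAME_LEN] for s in range(0, last_start + 1, FRAME_HOP)]

def autocorr_sequence(frame):
    """Assumed normalized autocorrelation for each lag TAU_MIN..TAU_MAX."""
    ref = frame[:REF_LEN]
    ref_energy = sum(v * v for v in ref)
    rho = []
    for tau in range(TAU_MIN, TAU_MAX + 1):
        seg = frame[tau:tau + REF_LEN]
        num = sum(a * b for a, b in zip(ref, seg))
        den = math.sqrt(ref_energy * sum(v * v for v in seg))
        rho.append(num / den if den > 0.0 else 0.0)
    return rho

def local_maxima(rho):
    """Return (delay, value) pairs for the interior local maxima of rho."""
    return [(TAU_MIN + k, rho[k])
            for k in range(1, len(rho) - 1)
            if rho[k - 1] < rho[k] >= rho[k + 1]]

# Hypothetical input: 400 samples of a periodic waveform, period 50 samples.
signal = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50 + 0.3)
          for n in range(400)]

for frame in split_frames(signal):
    # Each frame yields several candidate delays, including 50 and 100.
    print([delay for delay, _ in local_maxima(autocorr_sequence(frame))])
```

Every frame produces candidates at both the pitch period and its double, which is the ambiguity the DP processor 4008 is then asked to resolve over the whole pitch determination interval.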

The above describes the operation of the invention when the pitch determination interval contains no unvoiced frames (including silent frames). When the pitch determination interval contains unvoiced frames, the invention operates as follows.

The voiced/unvoiced discriminator 4005 performs voiced/unvoiced discrimination on the 30 msec of speech signal supplied from the window processor 4003, using a linear discriminant or the like, for example by a technique that uses vocal tract cross-sectional area ratio functions as discrimination parameters. Such a technique is described, for example, in Japanese Unexamined Patent Publication No. Sho 54-151303, "Voiced/Unvoiced Discriminating Device". The voiced/unvoiced discriminator 4005 outputs the result as a voiced/unvoiced signal to the DP processor 4008 through a transmission line 4012. When the voiced/unvoiced signal indicates unvoiced, the DP processor 4008 sets the pitch data of the corresponding frame to, for example, "0" to indicate that no pitch exists. When the voiced/unvoiced signal changes from voiced to unvoiced, the DP processor 4008 selects the optimal pitch period sequence from the plurality of pitch period sequence candidates at the last voiced frame. When the voiced/unvoiced signal changes from unvoiced to voiced, the DP processor 4008 takes the first voiced frame as the beginning and takes each of the plurality of pitch period candidates at that beginning as a starting point.

It is evident that, with the above processing, the invention is in no way constrained even when an unvoiced section extends across several pitch determination intervals.

It is also easy to have the voiced/unvoiced discriminator 4005 output its result as a continuous quantity rather than restricting it to the two values voiced and unvoiced, which makes the following processing possible. The temporal variation of the pitch period is known empirically to have the following property: when the degree of voicing is high, as in steady voiced portions where the speech energy is large and the phonemes change slowly, the pitch period varies gently; when the degree of voicing is low, as in transient voiced portions where the speech energy changes and the phonemes change quickly, the pitch period varies sharply. The discrimination parameters used in the voiced/unvoiced discriminator 4005 include, for example, the speech energy and spectral envelope parameters corresponding to the phonemes. The DP processor 4008 can therefore adaptively change the slope restriction setting according to the degree of voicing of the speech, enabling more stable pitch extraction.
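One way such an adaptive setting might look is sketched below in Python. The text does not specify how the degree of voicing maps to the slope restriction, so the linear interpolation, the [0, 1] voicing scale, and the limits `tight` and `loose` are all hypothetical.

```python
def slope_limit(voicing, tight=0.05, loose=0.25):
    """Map a degree of voicing in [0, 1] to an allowed relative pitch change per frame.

    High voicing (steady voiced portions) gives a tight limit; low voicing
    (transient voiced portions) gives a loose one. The linear mapping and
    the default limits are illustrative assumptions.
    """
    v = min(1.0, max(0.0, voicing))   # clamp to the assumed [0, 1] scale
    return loose + (tight - loose) * v

for v in (0.0, 0.5, 1.0):
    print(v, slope_limit(v))
```

Any monotonically decreasing mapping would serve the same purpose; the linear form is used only because it is the simplest to state.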

The operation of the DP processor 4008 will now be described in detail with reference to a waveform diagram. FIG. 5 is a waveform diagram for explaining the operation of the DP processor 4008. Point 5100 represents the most recent pitch period in the pitch determination interval immediately preceding pitch determination interval 1. Points 5201, 5202, and 5203 are the delay times corresponding to the local maxima of the autocorrelation coefficient in the first frame of pitch determination interval 1. Line segments 5101 and 5102 indicate the slope restriction. The dotted line 5211 connecting point 5100 and point 5201 lies outside the slope restriction, so the pitch period sequence candidates 5001 and 5002 that lead from point 5201 are excluded. Similarly, the dotted line 5213 connecting points 5100 and 5203 lies outside the slope restriction, so the pitch period sequence candidate 5005 that leads from point 5203 is also excluded. The solid line 5212 connecting point 5100 and point 5202 lies within the slope restriction, so the pitch period sequence candidates 5003 and 5004 that lead from point 5202 remain as candidates at the first frame. In pitch determination interval 1, for every frame from the second to the 20th, the points on pitch period sequence candidates 5003 and 5004 lie within the slope restriction from their respective preceding frames. The optimal pitch period sequence is therefore determined at the 20th frame as the one associated with RMAX obtained from equation (2) below.

RMAX = Max(ρ5003, ρ5004)   ……(2)

where ρ5003 and ρ5004 are the autocorrelation coefficient values at the pitch periods contained in pitch period sequence candidates 5003 and 5004, respectively. Suppose now that 5003 has been determined to be the optimal pitch period sequence. Of the pitch period sequence candidates 5006, 5007, and 5008 contained in the first through ninth frames of pitch determination interval 2, only 5007 has its starting point within the slope restriction referenced to point 5120, so 5007 is determined to be the optimal pitch period sequence. Of course, if another candidate 5009 with its starting point within the slope restriction referenced to point 5120 existed, the pitch period sequence would be determined at the ninth frame using equation (2), as was done at the 20th frame of pitch determination interval 1.
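The selection procedure walked through above can be sketched as a small dynamic program in Python. This is a minimal illustration under stated assumptions, not the device's actual implementation: the score of a candidate sequence is taken to be the sum of its autocorrelation values (the text elsewhere speaks of comparing such sums), the slope restriction is expressed as a maximum change in samples per frame, and the name `dp_pitch_track` and the example data are hypothetical.

```python
def dp_pitch_track(start_pitch, frames, max_step):
    """Pick the best pitch period sequence over one pitch determination interval.

    start_pitch: sole starting point (last pitch of the preceding interval).
    frames: per-frame lists of (pitch period, autocorrelation value) candidates.
    max_step: assumed slope restriction, in samples per frame.
    Returns (path, score), or None if no sequence satisfies the restriction.
    """
    # best[period] = (cumulative score, path) of the best surviving sequence
    best = {start_pitch: (0.0, [])}
    for candidates in frames:
        nxt = {}
        for period, rho in candidates:
            reachable = [sp for prev, sp in best.items()
                         if abs(period - prev) <= max_step]
            if reachable:  # candidates outside the slope restriction are dropped
                score, path = max(reachable, key=lambda sp: sp[0])
                nxt[period] = (score + rho, path + [period])
        if not nxt:
            return None
        best = nxt
    # The decision is made once, at the end of the interval.
    score, path = max(best.values(), key=lambda sp: sp[0])
    return path, score

# The first frame favours the double period 100, but it is unreachable from 50.
doubled = [[(50, 0.90), (100, 0.92)]] + [[(50, 0.95), (100, 0.90)] for _ in range(5)]
print(dp_pitch_track(50, doubled, max_step=5)[0])   # [50, 50, 50, 50, 50, 50]

# Two sequences survive; the one with the larger total wins at the end,
# even though its first frame scored lower.
forked = [[(48, 0.8), (52, 0.9)], [(46, 0.9), (54, 0.7)]]
print(dp_pitch_track(50, forked, max_step=5)[0])    # [48, 46]
```

The last element of the returned path corresponds to the value held in the pitch memory 4011 as the starting point for the next interval.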

The 10th through 12th frames form an unvoiced frame section, so no pitch period needs to be determined for them. The pitch period sequence candidates 5010, 5011, and 5012 that start at the 13th frame are evaluated at the 20th frame by an expression equivalent to equation (2), and the optimal pitch period sequence is determined.
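The way an unvoiced section divides a pitch determination interval can be sketched in Python as follows. The helper `voiced_runs` is hypothetical; it only locates the voiced stretches, each of which would then be evaluated separately, with unvoiced frames given the pitch value "0".

```python
def voiced_runs(voiced):
    """Return (start, end) index pairs, end exclusive, for each run of voiced frames."""
    runs = []
    start = None
    for i, is_voiced in enumerate(voiced):
        if is_voiced and start is None:
            start = i                    # a voiced run begins here
        elif not is_voiced and start is not None:
            runs.append((start, i))      # the run ended at the previous frame
            start = None
    if start is not None:
        runs.append((start, len(voiced)))
    return runs

# Interval 2 of the example: frames 1-9 voiced, 10-12 unvoiced, 13-20 voiced
# (zero-based indices 0-8, 9-11, and 12-19).
flags = [True] * 9 + [False] * 3 + [True] * 8
print(voiced_runs(flags))   # [(0, 9), (12, 20)]

# Unvoiced frames carry the pitch value 0; voiced frames are filled in later.
pitch = [None if v else 0 for v in flags]
```

In this example the first run ends at the ninth frame, where its sequence is decided, and the second run starts afresh at the 13th frame with every candidate there treated as a starting point.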

Voiced frames near an unvoiced frame section have a low degree of voicing, and their pitch period changes sharply. Making the slope restriction variable according to the degree of voicing is therefore effective for detecting the pitch period stably.

As described above, the present invention evaluates and selects the optimal pitch period sequence using dynamic programming with a slope restriction corresponding to the rate of change of the pitch period. This makes it possible to easily determine the most likely pitch period sequence among the candidates that have continuity of pitch period, and to extract pitch with only a small processing delay. Specifically, the invention reduces pitch extraction errors as follows in steady voiced portions and transient voiced portions, where conventional methods produce many such errors. First, a transient voiced portion adjoins a steady voiced portion in the temporal past or future, and the pitch period of the transient portion is determined in relation to that of the steady portion; hence, provided at least that the steady voiced portion contains no pitch extraction errors, the transient voiced portion contains almost none either. Second, in steady voiced portions, double-pitch-period errors and the like are eliminated by determining the optimal pitch period sequence from the set of candidates through the overall evaluation of equation (2). This is because, when the maximum of the autocorrelation coefficient or the like is searched frame by frame, the proportion of frames in which a double-pitch-period error occurs is normally about 0 to 30%, and it is known empirically that when the sums of the autocorrelation coefficients along the pitch period sequence and along the double-pitch-period sequence are compared, the sum for the pitch period sequence is almost certainly the larger. The processing delay of the invention is at most, for example, 200 msec, which is roughly equal to the processing delay required for spectral envelope analysis in a variable-length-frame vocoder; the invention therefore enables stable pitch extraction in real time.

The pitch extraction parameter used in the invention is not necessarily limited to the autocorrelation coefficient. The invention can clearly also be implemented easily using the cepstrum, the absolute value of waveform differences, and the like.

[Brief Description of the Drawings]

FIG. 1 is a waveform diagram for explaining the double pitch period; FIG. 2 is a waveform diagram for explaining pitch detection errors caused by the influence of the first formant; FIG. 3 is a waveform diagram for explaining the drawback of the pitch extraction method that restricts the pitch search range; FIG. 4 is a block diagram for explaining an embodiment of the present invention; and FIG. 5 is a waveform diagram for explaining the operation of the DP processor 4008.

4001: waveform input terminal; 4002: A/D converter; 4003: window processor; 4004: autocorrelation coefficient calculator; 4005: voiced/unvoiced discriminator; 4006: local maximum searcher; 4007: transmission line; 4008: DP processor; 4009: transmission line; 4010: pitch output terminal; 4011: pitch memory; 4012: transmission line; 4013: scope of the invention.

Claims (1)

[Claims]

1. A pitch extraction device for extracting the pitch period of speech, characterized by using pitch determination intervals of a fixed length and comprising: means for evaluating the continuity of the pitch period by dynamic programming that takes the most recent pitch period in the immediately preceding pitch determination interval as its sole starting point and that has a slope restriction, corresponding to the rate of change of the pitch period, for the purpose of constraining that rate of change; and means for determining the optimal pitch period sequence exactly once from a plurality of pitch period sequence candidates at the end of the pitch determination interval.

2. The pitch extraction device according to claim 1, characterized in that the setting of the slope restriction can be changed according to the degree of voicing of the speech.

3. The pitch extraction device according to claim 1, characterized in that, when the pitch determination interval contains an unvoiced section, the pitch period sequence is determined with the voiced portion preceding the unvoiced section as the end, the voiced portion following the unvoiced section is taken as a new beginning, and each of the plurality of pitch period candidates at that beginning is taken as a starting point.
JP58175454A 1983-09-22 1983-09-22 Pitch extractor Granted JPS6068000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58175454A JPS6068000A (en) 1983-09-22 1983-09-22 Pitch extractor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58175454A JPS6068000A (en) 1983-09-22 1983-09-22 Pitch extractor

Publications (2)

Publication Number Publication Date
JPS6068000A (en) 1985-04-18
JPH0377998B2 (en) 1991-12-12

Family

ID=15996350

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58175454A Granted JPS6068000A (en) 1983-09-22 1983-09-22 Pitch extractor

Country Status (1)

Country Link
JP (1) JPS6068000A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003042977A1 (en) * 2001-11-13 2003-05-22 Nec Corporation Code conversion method, apparatus, program, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0731504B2 (en) * 1985-05-28 1995-04-10 日本電気株式会社 Pitch extractor
US4879748A (en) * 1985-08-28 1989-11-07 American Telephone And Telegraph Company Parallel processing pitch detector

Also Published As

Publication number Publication date
JPS6068000A (en) 1985-04-18
