JPS6151320B2 - - Google Patents

Info

Publication number
JPS6151320B2
JPS6151320B2 JP53047263A JP4726378A JPS6151320B2 JP S6151320 B2 JPS6151320 B2 JP S6151320B2 JP 53047263 A JP53047263 A JP 53047263A JP 4726378 A JP4726378 A JP 4726378A JP S6151320 B2 JPS6151320 B2 JP S6151320B2
Authority
JP
Japan
Prior art keywords
pitch
pitch period
voiced
sound
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP53047263A
Other languages
Japanese (ja)
Other versions
JPS54139307A (en)
Inventor
Satoru Taguchi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP4726378A
Publication of JPS54139307A
Publication of JPS6151320B2
Granted

Abstract

PURPOSE: To determine the pitch-period search range accurately by limiting the search range in each frame of a voiced section, except the first word beginning, to the neighborhood of the pitch period detected in past frames. CONSTITUTION: Voiced/unvoiced discriminator 405 determines whether each frame is voiced or unvoiced, and detectors 407 and 408 detect continuous voiced sections and continuous unvoiced sections, respectively, from the discrimination signal. Busy discriminator 410 then determines the talking or talk-break state from the continuous voiced section information and the continuous unvoiced section information, and first word-beginning detector 411 detects a first word beginning having a fixed time interval from the voiced/unvoiced information and the talking/talk-break information. In the first word-beginning section, integer-multiple pitch period errors are corrected (413) and a pitch period with formant errors corrected is extracted (414). In each frame of a voiced section except the first word beginning, the pitch period is searched with the search range limited to the neighborhood of the pitch period detected in past frames (409). As a result, the search range can be determined accurately.

Description

【発明の詳細な説明】 本発明は音声波形をピツチ周期程度のフレーム
周期で分析して得られる自己相関係数に基づいて
ピツチ抽出を行なうピツチ抽出装置に関し、殊に
聴覚的に重要な有声音連続部におけるピツチ抽出
誤りを大幅に減少し得るピツチ抽出装置に係るも
のである。
DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a pitch extraction device that performs pitch extraction based on autocorrelation coefficients obtained by analyzing a speech waveform with a frame period approximately equal to the pitch period, and in particular to a pitch extraction device that can significantly reduce pitch extraction errors in continuous voiced sections, which are perceptually important.

音声波形における有声音部分は周期的な繰り返
し波形を持ちその周期(ピツチ周期)の変化特性
は音声の分析合成、認識等における重要なパラメ
ータであることが知られている。例えば、音声の
分析合成系においては分析部で抽出されるピツチ
抽出結果が合成部おいて合成される合成音の品質
に大きな影響を及ぼす。
It is known that the voiced part of a speech waveform has a periodically repeating shape, and that the way its period (the pitch period) changes is an important parameter in speech analysis-synthesis, recognition, and so on. For example, in a speech analysis-synthesis system, the pitch extracted by the analysis section has a large effect on the quality of the speech produced by the synthesis section.

音声波形のピツチ周期の抽出方法としては、従
来、ピツチ周期程度の時間長を持つフレーム毎に
自己相関係数を算出し抽出する方法等、種々の分
析パラメータを用いる方法が知られている。
Various methods using different analysis parameters are conventionally known for extracting the pitch period of a speech waveform, such as calculating autocorrelation coefficients for each frame whose length is approximately the pitch period and extracting the pitch from them.
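
The autocorrelation approach mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 8 kHz sampling rate, and the synthetic pulse-train frame are assumptions made for the example.

```python
def autocorrelation(frame, max_lag):
    """Normalized autocorrelation coefficients r[0..max_lag] of one frame."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def pitch_lag(frame, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation coefficient."""
    r = autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


# Hypothetical 30 ms frame at an assumed 8 kHz rate (240 samples):
# a pulse train whose period is 50 samples.
frame = [1.0 if i % 50 == 0 else 0.0 for i in range(240)]
print(pitch_lag(frame, 20, 80))  # lag of the strongest peak in the range
```

Restricting the lag range passed to `pitch_lag` is the knob that the rest of the description is concerned with.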

自己相関係数に基づくピツチ抽出法は、自己相
関係数が時間領域内の処理で求め得る点と、被分
析波形とフレームとの位相の影響が比較的に小い
さい点とから広く用いられている。しかしながら
自己相関系数に基づくピツチ抽出法は、後述する
ように、ピツチ周期の整数倍、又はピツチ周期の
N1/N2倍の周期をピツチ周期として誤つて検出
することが多いという欠点を有している。(但し
N1、N2は整数であり、N1<N2である)前記欠点
の発生する被分析波形を、その波形形状から分類
すると、言ゆる有声音定常部と、言ゆる語頭等の
有声音過渡部とに大別される。
Pitch extraction based on autocorrelation coefficients is widely used because the coefficients can be obtained by time-domain processing and because the phase relationship between the analyzed waveform and the frame has relatively little influence. However, as described later, this method has the disadvantage that it often mistakenly detects an integer multiple of the pitch period, or N1/N2 times the pitch period, as the pitch period (where N1 and N2 are integers and N1 < N2). Classified by waveform shape, the analyzed waveforms in which this defect occurs fall broadly into so-called voiced stationary parts and voiced transient parts such as word beginnings.

有声音定常部に前記欠点が発生する一つの原因
は被分析波形の定常性が著しく強いことである。
なぜならば、言ゆる有声音定常部は、例えば数百
mSEC程度の比較的に長時間について観察する
ならば、そのピツチ周期を一単位とする波形素片
は、ピツチ周期、波形素片形共に、除々に変化し
ていることが認められる。しかしながら、有声音
定常部の種々のセグメントについて、フレーム周
期毎に切り出される波形の時間長(例えば30m
SEC)程度の比較的に短時間に限定して観察す
ると、その波形は、ほぼ完全な定常性、すなわち
周期性を示すことがしばしばある。例えば正弦波
の自己相関係数波形が前記正弦波と同一周期を有
する余弦波となる等、よく知られている様に、定
常性、すなわち、周期性を有する波形の自己相関
係数波形は周期性を有する。従つてフレーム周期
毎に例えば30mSEC程度の時間長で切り出され
る波形がほぼ完全に定常性すなわち周期性を示す
場合には、その自己相関係数波形は、ほぼ完全な
周期性を示す。故に例えば第2図に示す様にピツ
チ周期における自己相関係数の極大値、201と倍
ピツチ周期における極大値、202とがほとんど等
しくなり、演算精度や、わずかな外乱等の影響で
ピツチ周期における極大値、201よりも倍ピツチ
周期における極大値、202が大きくなることが頻
繁に発生するからである。
One reason the defect occurs in voiced stationary parts is that the analyzed waveform is extremely stationary. When a so-called voiced stationary part is observed over a relatively long time, for example several hundred ms, both the pitch period and the shape of the waveform segments (each one pitch period long) are seen to change gradually. However, when various segments of the stationary part are observed only over the short span cut out at each frame period (for example, about 30 ms), the waveform often exhibits almost perfect stationarity, that is, periodicity. As is well known, the autocorrelation coefficient waveform of a stationary, that is periodic, waveform is itself periodic; for example, the autocorrelation of a sine wave is a cosine wave of the same period. Therefore, when the waveform cut out at each frame period over, for example, about 30 ms is almost perfectly periodic, its autocorrelation coefficient waveform is almost perfectly periodic as well. As a result, as shown for example in Figure 2, the local maximum 201 of the autocorrelation coefficient at the pitch period and the local maximum 202 at the double pitch period become almost equal, and computational precision or slight disturbances frequently make maximum 202 larger than maximum 201.
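
The sine-wave example can be written out explicitly. As a worked illustration (the symbols $A$, $\omega$, $\phi$ and $T_0$ are introduced here and do not appear in the original text), the long-term autocorrelation of a sine wave is:

```latex
x(t) = A \sin(\omega t + \phi), \qquad
R(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\, x(t + \tau)\, dt
        = \frac{A^{2}}{2} \cos(\omega \tau)
```

With $T_0 = 2\pi/\omega$, this gives $R(T_0) = R(2T_0) = A^{2}/2$: in the ideal case the peaks at the period and at twice the period are exactly equal, so only small perturbations decide which of the two is larger.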

有声音定常部に前記欠点が発生する他の原因は
被分析波形の発声者において、例えば1ホルマン
トの帯域巾が狭く、更に第1ホルマントの中心周
波数がピツチ周波数(ピツチ周期の逆数)の2倍
等の整数倍の場合に、ピツチ周波数の例えば第2
高調波がホルマントと共振し、ピツチ周波数の2
倍の周波数成分が極端に強調され被分析波形の基
本周波数が、あたかもピツチ周波数の2倍となる
ことである。ピツチ周波数の2倍の周波数成分が
極端に強調された被分析波形の見かけ上の周期す
なわち見かけ上の基本周波数の逆数が本来のピツ
チ周期1/2になると、被分析波形の自己相関係数
波形は本来のピツチ周期の1/2の周期で周期性を
示す。故に例えば第1図に示す様に、ホルマント
とピツチとの共振により出現する自己相関係数の
極大値、101と本来のピツチ周期における極大
値、102とがほとんど等しくなり、ピツチ周期の
誤検出の原因となる。
Another reason the defect occurs in voiced stationary parts is as follows. If, for the speaker of the analyzed waveform, the bandwidth of the first formant is narrow and its center frequency is an integer multiple, for example twice, the pitch frequency (the reciprocal of the pitch period), then a harmonic of the pitch frequency, for example the second, resonates with the formant. The component at twice the pitch frequency is then extremely emphasized, and the fundamental frequency of the analyzed waveform appears to be twice the pitch frequency. When the apparent period of such a waveform, that is, the reciprocal of the apparent fundamental frequency, becomes 1/2 of the true pitch period, the autocorrelation coefficient waveform shows periodicity with a period of 1/2 the true pitch period. Therefore, as shown for example in Figure 1, the local maximum 101 of the autocorrelation coefficient produced by the resonance between formant and pitch and the local maximum 102 at the true pitch period become almost equal, which causes false detection of the pitch period.
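
The effect of an emphasized second harmonic can be checked numerically. The sketch below is only an illustration under assumed values: the period of 80 samples, the amplitudes `A1` and `A2`, and the helper `unbiased_autocorr` are hypothetical and not taken from the patent.

```python
import math


def unbiased_autocorr(x, lag):
    """Mean of x[i] * x[i + lag] over all valid i."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n


PERIOD = 80          # assumed true pitch period in samples
A1, A2 = 1.0, 3.0    # fundamental and strongly emphasized 2nd harmonic

x = [
    A1 * math.cos(2 * math.pi * i / PERIOD)
    + A2 * math.cos(4 * math.pi * i / PERIOD)
    for i in range(8000)
]

r0 = unbiased_autocorr(x, 0)
half = unbiased_autocorr(x, PERIOD // 2) / r0   # peak at half the pitch period
full = unbiased_autocorr(x, PERIOD) / r0        # peak at the true pitch period
print(round(half, 3), round(full, 3))
```

With these amplitudes the normalized peak at half the period is (A2² − A1²)/(A1² + A2²) = 0.8 against 1.0 at the true period; the larger the harmonic, the closer the two peaks become.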

声音過渡部に前記欠点が発生する原因は、有声
音過渡部はピツチ周期及び音声波形の形状の変化
が大きく、かつ比較的に不規則なことにある。ピ
ツチ周期及び音声波形の形状の変化が大きく、か
つ比較的に不規則な被分析波形の自己相関係数波
形は多くの場合にピツチ周期による大まかな周期
性は認められるが、ピツチ周期又はピツチ周期の
整数倍の周期における自己相関係数の極大値が比
較的に不揃いとなり、しばしばピツチ周期におけ
る自己相関係数の極大値がピツチ周期の整数倍周
期における極大値より小さくなり、いわゆる整数
倍ピツチ周期エラーが多く起る。
The defect occurs in voiced transient parts because the pitch period and the shape of the speech waveform change greatly and relatively irregularly there. The autocorrelation coefficient waveform of such a waveform usually still shows rough periodicity due to the pitch period, but the local maxima at the pitch period and at its integer multiples are relatively uneven. The maximum at the pitch period is often smaller than the maximum at an integer multiple of the pitch period, so that so-called integer-multiple pitch period errors occur frequently.

なお、有声音過渡部は一般にピツチ周波数の変
化が大きく、ピツチ周波数の高調波とホルマント
周波数との共振による音声波形への影響は、有声
音定常部における影響と比較すると小いさく、有
声過渡部おけるいわゆるホルマントピツチエラー
の発生頻度は有声音定常部における発生頻度より
ちいさい。
Note that in voiced transient parts the pitch frequency generally changes greatly, so resonance between harmonics of the pitch frequency and the formant frequency affects the speech waveform less than in voiced stationary parts, and so-called formant pitch errors occur less frequently in voiced transient parts than in voiced stationary parts.

ピツチ検出エラーの影響が音声分析合成系にお
ける合成音の品質に与える影響は、聴覚的には、
有性音定常部におけるエラーが大きく、有性音過
渡部におけるエラーの影響は比較的に軽微であ
る。
As for the effect of pitch detection errors on the perceived quality of synthesized speech in a speech analysis-synthesis system, errors in voiced stationary parts have a large effect, whereas errors in voiced transient parts have a relatively minor effect.

従来、特に合成音の品質に大きな影響を与える
有声音定常部におけるピツチ検出エラーを軽減な
いし除去するために、種々の方法が試みられてい
る。しかしながら従来の方式は有声音の定常部と
語頭等の過渡部とを一率に扱つていたために、例
えば語頭において、たまたまピツチ検出誤りが発
生すると、前記ピツチ検出誤りが将来のピツチ検
出特性に悪影響を及ぼすという欠点を有してい
る。
Various methods have been tried to reduce or eliminate pitch detection errors in voiced stationary parts, which particularly affect the quality of synthesized speech. However, because conventional methods treat the stationary parts of voiced sounds and transient parts such as word beginnings uniformly, they have the disadvantage that if a pitch detection error happens to occur, for example at the beginning of a word, that error adversely affects subsequent pitch detection.

従来の方法として例えば音声のピツチ周期の変
化が比較的にゆるやかであることを利用して相隣
るフレームにおけるピツチ周期の差分を、あらか
じめ定められた範囲内に限定してピツチ周期の抽
出を行なうことによりピツチ周期の検出誤りを防
ぐ方式が知られている。しかしながら、前記の検
索範囲内に制限する方式は例えば第3図に示すよ
うに基本ピツチ周期の曲線301上から検出誤り
等のため一度例えば2倍のピツチ周期を持つ倍ピ
ツチ周期曲線302上のいわゆる倍ピツチ周期を
検出してしまうと、再び正しい基本ピツチ周期を
検出することが困難となる欠点を持つている。特
にいわゆる語頭等の無音部から有声音部に移行す
る場合、あるいは無声音部から有声音部に移行す
る場合には前記倍ピツチ周期を誤つて検出する危
険性が大きい。
One conventional method exploits the fact that the pitch period of speech changes relatively slowly: it prevents pitch period detection errors by limiting the difference in pitch period between adjacent frames to a predetermined range when extracting the pitch period. However, as shown for example in Figure 3, this range-limited search has the disadvantage that once a detection error moves the result from the basic pitch period curve 301 onto, for example, the double pitch period curve 302 (the so-called double pitch period), it becomes difficult to detect the correct basic pitch period again. The risk of erroneously detecting the double pitch period is particularly high at a transition from a silent part to a voiced part, such as a word beginning, or from an unvoiced part to a voiced part.
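
The lock-in behaviour of this conventional scheme can be reproduced with a toy tracker. The sketch assumes each frame has already been reduced to a list of candidate peaks `(lag, value)`; the function name, the fallback to all peaks when none is in range, and the numbers are illustrative assumptions only.

```python
def track_with_delta_limit(frames, delta):
    """Pick, per frame, the strongest peak within +/- delta of the previous lag.

    frames: list of per-frame candidate peaks [(lag, value), ...].
    The first frame has no history, so every peak is a candidate.
    """
    track = []
    prev = None
    for peaks in frames:
        if prev is None:
            candidates = peaks
        else:
            # Assumed fallback: use all peaks if none lies in the allowed range.
            candidates = [p for p in peaks if abs(p[0] - prev) <= delta] or peaks
        prev = max(candidates, key=lambda p: p[1])[0]
        track.append(prev)
    return track


# True period is 50 samples. In the first (word-beginning) frame the double
# period happens to win; afterwards the true peak is clearly stronger.
frames = [[(50, 0.90), (100, 0.92)]] + [[(50, 0.95), (100, 0.85)]] * 4
print(track_with_delta_limit(frames, delta=10))  # [100, 100, 100, 100, 100]
```

A single error in the first frame keeps the tracker on the double-period curve even though the correct peak dominates in every later frame.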

前記欠点を緩和するために、過去数フレームで
検出されたピツチ周期からピツチの検索範囲を決
定する場合には、いわゆる語頭におけるピツチの
検索範囲の決定が困難であるという欠点を有して
いた。
When, to alleviate this drawback, the pitch search range is determined from the pitch periods detected in the past several frames, there has been the drawback that the search range is difficult to determine at a so-called word beginning.

本発明の目的は自己相関係数に基づいてピツチ
抽出を行なうピツチ抽出装置において、ピツチ周
期の検出誤りを防止し、より確実に正しいピツチ
の検出を可能とするピツチ抽出装置を提供するこ
とにある。
SUMMARY OF THE INVENTION An object of the present invention is to provide a pitch extraction device that performs pitch extraction based on an autocorrelation coefficient, which prevents pitch cycle detection errors and enables more reliable pitch detection. .

本発明は各フレームの有声/無声を判別する手
段と、前記判別された有声/無声情報から連続有
声区間と連続無声区間とを検出する手段と、前記
連続有声区間情報と連続無声区間情報とから話
中/話断を判別する手段、前記有声/無声情報と
前記話中/話断情報とから一定時間間隔を有する
第1の語頭を検出する手段とを有し、この第1の
語頭区間で整数倍ピツチ周期誤りを訂正するとと
もにホルマント誤りを訂正したピツチ周期を抽出
し前記第1の語頭を除く有声音区間の各フレーム
におけるピツチ周期の検索範囲を過去フレームに
おいて検出されたピツチ周期の近傍に制限してピ
ツチ周期の検索を行なうことを特徴とするピツチ
抽出装置が得られる。
According to the present invention, there is provided a pitch extraction device comprising: means for determining whether each frame is voiced or unvoiced; means for detecting continuous voiced sections and continuous unvoiced sections from the voiced/unvoiced information thus determined; means for determining a talking or talk-break state from the continuous voiced section information and the continuous unvoiced section information; and means for detecting, from the voiced/unvoiced information and the talking/talk-break information, a first word beginning having a fixed time interval; wherein, in the first word-beginning section, the device corrects integer-multiple pitch period errors and extracts a pitch period in which formant errors are also corrected, and, in each frame of a voiced section excluding the first word beginning, searches for the pitch period with the search range limited to the neighborhood of the pitch period detected in past frames.

次に本発明の実施例を図面を参照して説明す
る。
Next, embodiments of the present invention will be described with reference to the drawings.

第4図は本発明の実施例を説明するためのブロ
ツク図、第4図において一点鎖線401で囲んだ
部分は本発明の構成を示す。波形入力端子402
を介して高域周波数を除去された音声波形入力信
号がA/D変換器403へ入力される。A/D変
換器403は前記音声波形入力信号に含まれる最
高周波数成分の2倍以上の周波数で前記音声波形
入力信号を標本化し、更に量子化して標本化音声
波形信号を発生し、前記標本化音声波形信号を一
時記憶器404へ出力する。
FIG. 4 is a block diagram for explaining an embodiment of the present invention; the portion surrounded by dash-dotted line 401 shows the configuration of the invention. A speech waveform input signal from which high-frequency components have been removed is input to A/D converter 403 via waveform input terminal 402. A/D converter 403 samples the input signal at a frequency at least twice its highest frequency component, quantizes it to generate a sampled speech waveform signal, and outputs that signal to temporary storage 404.

一時記憶器404は前記標本化音声波形信号を
一時的に記憶し、フレーム周期毎に前記標本化音
声波形信号を有無判定器405と自己相関係数計
測器406とへ出力する。有無判定器405はフ
レーム毎に切り出される標本化音声波形信号を一
般によく知られている種々の有声音/無声音判別
手段により有声音と無声音とに分類する。なお、
前記無声音は無音を含む。
Temporary storage 404 temporarily stores the sampled speech waveform signal and outputs it at each frame period to presence/absence determiner 405 (the voiced/unvoiced determiner) and autocorrelation coefficient measuring device 406. Presence/absence determiner 405 classifies the sampled speech waveform cut out for each frame as voiced or unvoiced using any of the generally well-known voiced/unvoiced discrimination methods. Note that unvoiced here includes silence.

更に有無判定器405は前記分類結果を有無信
号として連続有声検出器407と連続無声検出器
408とピツチ検索範囲計測器、409と第1語
頭検出器411と第2語頭検出器412とへ出力
する。連続有声検出器407は後記する話中判定
器410から供給される話中判定信号が話断を示
すときに有無判定器405から供給される有無信
号が連続して有声の場合(例えば100mSEC連続
有声の場合)に連続有声信号を後記する話中判定
器410へ供給する。
Presence/absence determiner 405 further outputs the classification result as a presence/absence signal to continuous voiced detector 407, continuous unvoiced detector 408, pitch search range measuring device 409, first word-beginning detector 411, and second word-beginning detector 412. When the busy determination signal supplied from busy determiner 410 (described later) indicates a talk break and the presence/absence signal supplied from determiner 405 is continuously voiced (for example, for 100 ms), continuous voiced detector 407 supplies a continuous voiced signal to busy determiner 410.

また、連続有声検出器407は後記する話中判
定器410から供給される話中判定信号が話中を
示すときは機能を抑圧される。連続無声検出器4
08は後記する話中判定器410から供給される
話中判定信号が話中を示すときに有無判定器、4
05から供給される有無信号が連続して無声の場
合(例えば300mSEC連続して無声の場合)に連
続無声信号を後記する話中判定器410へ供給す
る。また連続無声検出器408は後記する話中判
定器410から供給される話中判定信号が話断を
示すときは機能を抑圧される。
Continuous voiced detector 407 is disabled while the busy determination signal supplied from busy determiner 410 (described later) indicates talking. When the busy determination signal indicates talking and the presence/absence signal supplied from presence/absence determiner 405 is continuously unvoiced (for example, for 300 ms), continuous unvoiced detector 408 supplies a continuous unvoiced signal to busy determiner 410. Continuous unvoiced detector 408 is disabled while the busy determination signal indicates a talk break.

話中判定器410は双安定マルチバイブレータ
であり、前記連続有声信号により話中に、又前記
連続無声信号により話断に設定される。話中判定
器410は話中判定信号を連続有声検出器407
及び連続無声検出器408以外に第1語頭検出器
411と第2語頭検出器412とへ供給する。
Busy determiner 410 is a bistable multivibrator that is set to the talking state by the continuous voiced signal and to the talk-break state by the continuous unvoiced signal. Besides continuous voiced detector 407 and continuous unvoiced detector 408, busy determiner 410 supplies the busy determination signal to first word-beginning detector 411 and second word-beginning detector 412.
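
The interplay of continuous voiced detector 407, continuous unvoiced detector 408, and busy determiner 410 amounts to a two-state machine with hysteresis. The following is a minimal software sketch of that behaviour, not the hardware described; the class name and the assumed 10 ms frame period (so that 100 ms and 300 ms become 10 and 30 frames) are illustrative assumptions.

```python
class TalkStateTracker:
    """Two-state (talking / talk-break) tracker driven by per-frame voicing."""

    def __init__(self, voiced_frames_to_start=10, unvoiced_frames_to_stop=30):
        self.voiced_frames_to_start = voiced_frames_to_start
        self.unvoiced_frames_to_stop = unvoiced_frames_to_stop
        self.talking = False  # state of the bistable element (410)
        self.run = 0          # length of the current qualifying run

    def update(self, voiced):
        """Feed one frame's voiced/unvoiced flag; return the talking state."""
        if not self.talking:
            # Only the continuous voiced detector (407) is active here.
            self.run = self.run + 1 if voiced else 0
            if self.run >= self.voiced_frames_to_start:
                self.talking = True
                self.run = 0
        else:
            # Only the continuous unvoiced detector (408) is active here.
            self.run = self.run + 1 if not voiced else 0
            if self.run >= self.unvoiced_frames_to_stop:
                self.talking = False
                self.run = 0
        return self.talking


tracker = TalkStateTracker()
onset = [tracker.update(True) for _ in range(10)]
offset = [tracker.update(False) for _ in range(30)]
print(onset.index(True), offset.index(False))
```

Because each detector is disabled in the state it cannot change, short unvoiced gaps inside speech and short voiced bursts inside pauses do not flip the state.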

第1語頭検出器411はANDゲートであり、
話中判定器410から供給される話中判定信号が
話断であり、しかも有無判定器405から供給さ
れる有無信号が有声を示すときに第1語頭信号を
発生しピツチ検索範囲計測器409と整数倍ピツ
チ周期誤り訂正器413とホルマントピツチ誤り
訂正指示器414とへ前記第1語頭信号を供給す
る。従つて第1語頭信号の最大時間巾は連続有声
検出器407が設定した時間、例えば100mSEC
と一致する。
First word-beginning detector 411 is an AND gate. When the busy determination signal supplied from busy determiner 410 indicates a talk break and the presence/absence signal supplied from presence/absence determiner 405 indicates voiced, it generates a first word-beginning signal and supplies it to pitch search range measuring device 409, integer-multiple pitch period error corrector 413, and formant pitch error correction indicator 414. The maximum duration of the first word-beginning signal therefore equals the time set in continuous voiced detector 407, for example 100 ms.

第2語頭検出器412はタイマ付ANDゲート
であり、話中判定器410から供給される話中判
定信号が話断であり、しかも有無判定器405か
ら供給される有無信号が有声を示すときに第2語
頭信号を発生し、同時にタイマが作動を開始す
る。前記タイマは例えば30mSEC程度に設定さ
れ、前記話中判定信号が話断であり、しかも前記
有無信号が有声を示すときは歩進し、前記2つの
条件の少なくとも一つが満されない場合には初期
状態に戻る。また前記タイマがあらかじめ設定さ
れた時間例えば30mSECに達すると、第2語頭
信号は抑圧される。第2語頭検出器412は第2
語頭信号をホルマントピツチ誤り訂正指示器41
4へ出力する。ホルマント誤り訂正指示器414
は前記第1語頭信号を供給され、しかも前記第2
語頭信号を供給されていないときに訂正指示信号
をホルマント誤り訂正器415へ出力する。
Second word-beginning detector 412 is an AND gate with a timer. When the busy determination signal supplied from busy determiner 410 indicates a talk break and the presence/absence signal supplied from presence/absence determiner 405 indicates voiced, it generates a second word-beginning signal and at the same time starts the timer. The timer is set to, for example, about 30 ms; it advances while both conditions hold and returns to its initial state when at least one of them is not satisfied. When the timer reaches the preset time, for example 30 ms, the second word-beginning signal is suppressed. Second word-beginning detector 412 outputs the second word-beginning signal to formant pitch error correction indicator 414. Indicator 414 outputs a correction instruction signal to formant error corrector 415 when it is receiving the first word-beginning signal but not the second word-beginning signal.
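
The two word-beginning signals and the correction instruction derived from them can be sketched per frame as below. This is an illustrative software analogue under assumptions: the class name is hypothetical, and the 30 ms timer is expressed as 3 frames of an assumed 10 ms frame period.

```python
class WordBeginningDetector:
    """Per-frame first/second word-beginning signals and correction flag."""

    def __init__(self, second_frames=3):
        self.second_frames = second_frames  # timer limit in frames (assumed)
        self.timer = 0

    def update(self, talking, voiced):
        """Return (first, second, correct_formant) for one frame."""
        # AND gate (411): talk break and voiced.
        first = (not talking) and voiced
        if first:
            # Timed AND gate (412): active only until the timer expires.
            second = self.timer < self.second_frames
            self.timer += 1
        else:
            second = False
            self.timer = 0  # timer returns to its initial state
        # Indicator (414): first signal present, second signal absent.
        correct_formant = first and not second
        return first, second, correct_formant


det = WordBeginningDetector()
signals = [det.update(talking=False, voiced=True) for _ in range(5)]
signals.append(det.update(talking=True, voiced=True))
for s in signals:
    print(s)
```

Formant correction is thus withheld for the first few frames of a word beginning, during which no recent pitch history exists to predict from.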

ピツチ検索範囲計測器409は有無判定器40
5から供給される有無信号が無声を示す場合には
ピツチ検索器416へあらかじめ設定されたピツ
チ検索範囲を指示するか、又はピツチ検索の禁止
を指示する。またピツチ検索範囲計測器409は
第1語頭検出器411が第1語頭信号を出力して
いるときはピツチ検索器416へあらかじめ設定
されたピツチ検索範囲を指示する。更にピツチ検
索範囲計測器409は有無判定器405から供給
される有無信号が有声を示し、なおかつ第1語頭
検出器411が第1語頭信号を出力していないと
きにはホルマント誤り訂正器415から供給され
る過去のピツチ出力データよりピツチ検索範囲を
決定し、ピツチ検索器416のピツチ検索範囲を
指示する。なおピツチ検索範囲は例えば LMAX=C1AVMIN=C2AV として決定される。又LAVは1フレーム毎に例え
ば aLAV+bPITCHP に置き替えられる。但しLMAXはピツチ周期検索
範囲の最大値、LMINはピツチ周期検索範囲の最
小値、LAVは過去の平均的なピツチ周期PITCHP
は直前のフレームにおけるピツチ周期、aは定数
(0<a<1)、bは定数(0<b<1)である。
また一般にa>bである。またC1、C2は定数で
ありC1>1>C2>0である。なおピツチ検索範
囲藩を決定する方法は上記の方法以外にも種々実
施し得ることは明らかである。
When the presence/absence signal supplied from presence/absence determiner 405 indicates unvoiced, pitch search range measuring device 409 either instructs pitch searcher 416 to use a preset pitch search range or instructs it to inhibit the pitch search. While first word-beginning detector 411 is outputting the first word-beginning signal, device 409 likewise instructs pitch searcher 416 to use a preset pitch search range. When the presence/absence signal indicates voiced and the first word-beginning signal is not being output, device 409 determines the pitch search range from the past pitch output data supplied from formant error corrector 415 and indicates that range to pitch searcher 416. The pitch search range is determined, for example, as $L_{MAX} = C_1 L_{AV}$ and $L_{MIN} = C_2 L_{AV}$, and $L_{AV}$ is replaced at every frame by, for example, $a L_{AV} + b\,\mathrm{PITCHP}$. Here $L_{MAX}$ is the maximum of the pitch period search range, $L_{MIN}$ is its minimum, $L_{AV}$ is the past average pitch period, PITCHP is the pitch period in the immediately preceding frame, and $a$ and $b$ are constants with $0 < a < 1$ and $0 < b < 1$; in general $a > b$. $C_1$ and $C_2$ are constants with $C_1 > 1 > C_2 > 0$. Clearly, the pitch search range can also be determined in various ways other than the above.
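
The example formulas for the search range translate directly into code. In the sketch below the constant values are assumptions chosen only to satisfy the stated constraints (0 < a, b < 1, a > b, C1 > 1 > C2 > 0); choosing a + b = 1 is an additional assumption that keeps the running average on the same scale as the pitch period.

```python
def update_average(l_av, pitchp, a=0.75, b=0.25):
    """L_AV <- a * L_AV + b * PITCHP (running average of the pitch period)."""
    return a * l_av + b * pitchp


def search_range(l_av, c1=1.25, c2=0.75):
    """Return (L_MIN, L_MAX) = (C2 * L_AV, C1 * L_AV)."""
    return c2 * l_av, c1 * l_av


l_av = update_average(50.0, 60.0)   # previous average 50, last pitch 60
l_min, l_max = search_range(l_av)
print(l_av, l_min, l_max)
```

With these assumed constants the range spans 0.75 to 1.25 times the average, so lags at half or double the running pitch period fall outside it.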

自己相関係数計測器406は一時記憶器404
からフレーム周期毎に供給される標本化音声波形
信号から自己相関係数列を計測し、前記自己相関
係数列をピツチ検索器416へ出力する。ピツチ
検索器416はピツチ検索範囲計測器409によ
り指示されたピツチ検索範囲について前記自己相
関係数列の最大値及び極大値を検索し、更に前記
最大値及び極大値に対応する自己相関係数の各遅
れ時間を求める。
Autocorrelation coefficient measuring device 406 computes a sequence of autocorrelation coefficients from the sampled speech waveform signal supplied from temporary storage 404 at each frame period and outputs the sequence to pitch searcher 416. Pitch searcher 416 searches the autocorrelation coefficient sequence for its maximum and local maxima within the pitch search range indicated by pitch search range measuring device 409, and determines the delay time (lag) of each.
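
A peak search of this kind might look as follows. The sketch makes a simplifying assumption: it reports only interior local maxima and treats the highest of them as the maximum, whereas the text does not say how a maximum lying on the range boundary is handled. The function name and sample data are hypothetical.

```python
def find_peaks(r, lag_min, lag_max):
    """Local maxima of r within [lag_min, lag_max] as (lag, value), highest first."""
    first = max(lag_min, 1)
    last = min(lag_max, len(r) - 2)
    peaks = [
        (lag, r[lag])
        for lag in range(first, last + 1)
        if r[lag - 1] < r[lag] >= r[lag + 1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks


# Toy autocorrelation sequence indexed by lag.
r = [1.0, 0.2, 0.5, 0.3, 0.1, 0.6, 0.9, 0.4, 0.2]
print(find_peaks(r, 1, 7))  # [(6, 0.9), (2, 0.5)]
```

The first entry plays the role of the maximum and the remaining entries are the other local maxima handed on to the error correctors.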

更にピツチ検索器416は前記各遅れ時間情報
と各遅れ時間に対応する各自己相関係数値とを整
数倍ピツチ周期誤り訂正器413へ出力する。整
数倍ピツチ周期誤り訂正器413は第1語頭検出
器411より第1語頭信号が供給される場合のみ
整数倍ピツチ周期誤り訂正を実施する。整数倍ピ
ツチ周期誤り訂正は例えば前記最大値に対応する
遅れ時間の整数分の1の遅れ時間(例えば1/2、
1/3、1/4、………)付近に前記最大値とほとんど
変らない値の極大値が存在する場合に前記極大値
に対応する遅れ時間をピツチ周期と決定し、前記
極大値が存在しない場合には前記最大値に対応す
る遅れ時間をピツチ周期と決定する。
Pitch searcher 416 then outputs each delay time and the corresponding autocorrelation coefficient value to integer-multiple pitch period error corrector 413. Corrector 413 performs integer-multiple pitch period error correction only while the first word-beginning signal is supplied from first word-beginning detector 411. The correction works, for example, as follows: if a local maximum whose value is almost the same as the maximum exists near a delay time that is an integer fraction (for example 1/2, 1/3, 1/4, ...) of the delay time of the maximum, the delay time of that local maximum is taken as the pitch period; if no such local maximum exists, the delay time of the maximum is taken as the pitch period.
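
The integer-multiple correction rule can be sketched as below. The text does not quantify "near" or "almost the same value", nor the order in which the fractions are tried, so the `tolerance`, `ratio`, `max_divisor` parameters and the largest-divisor-first order are assumptions of this example.

```python
def correct_integer_multiple(peaks, tolerance=2, ratio=0.9, max_divisor=4):
    """Replace the strongest lag by a sub-multiple lag with a comparable peak.

    peaks: [(lag, value), ...] sorted with the highest value first.
    """
    best_lag, best_val = peaks[0]
    # Assumed order: try the largest divisor first, preferring the shortest lag.
    for divisor in range(max_divisor, 1, -1):
        target = best_lag / divisor
        for lag, val in peaks[1:]:
            if abs(lag - target) <= tolerance and val >= ratio * best_val:
                return lag
    return best_lag


print(correct_integer_multiple([(100, 0.92), (50, 0.90), (30, 0.30)]))  # 50
print(correct_integer_multiple([(50, 0.95), (100, 0.85)]))              # 50
```

In the first call the peak at lag 50 is nearly as high as the maximum at lag 100, so the double-period result is corrected; in the second the maximum is already the shortest candidate and is kept.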

整数倍ピツチ周期誤り訂正器413は更に前記
ピツチ周期及び前記各遅れ時間並びに各遅れ時間
に対応する自己相関係数値をホルマント誤り訂正
器415へ出力する。ホルマント誤り訂正器41
5はホルマント誤り訂正指示器414より訂正指
示信号が供給される場合のみホルマント誤り訂正
を実施する。ホルマント誤り訂正は過去数フレー
ムにおけるピツチ周期による予測値(以下予測ピ
ツチ周期という)の近傍に整数倍ピツチ周期誤り
訂正器413より供給されたピツチ周期が存在し
ない場合で更にピツチ周期に対応する極大値以外
の極大値の各遅れ時間に予測ピツチ周期の近傍に
あるものが存在する場合に、前記近傍にある遅れ
時間をピツチ周期と訂正して決定することにより
実施される。
Integer-multiple pitch period error corrector 413 further outputs the pitch period, the delay times, and the corresponding autocorrelation coefficient values to formant error corrector 415. Formant error corrector 415 performs formant error correction only while the correction instruction signal is supplied from formant pitch error correction indicator 414. The correction applies when the pitch period supplied from corrector 413 does not lie near the value predicted from the pitch periods of the past several frames (hereinafter, the predicted pitch period) and the delay time of some local maximum other than the one corresponding to that pitch period does lie near the predicted pitch period; in that case the delay time near the prediction is adopted as the corrected pitch period.

なお、全極大値の各遅れ時間が全て予測ピツチ
周期の近傍に存在しない場合には、整数倍ピツチ
周期誤り訂正器413より供給されたピツチ周期
をそのままピツチ周期と決定する。ホルマント誤
り訂正器415は訂正後のピツチ周期データをピ
ツチ検索範囲計測器409へ供給するとともに、
ピツチ周期データ出力端子417を介して出力す
る。
If none of the delay times of the local maxima lies near the predicted pitch period, the pitch period supplied from integer-multiple pitch period error corrector 413 is adopted unchanged. Formant error corrector 415 supplies the corrected pitch period data to pitch search range measuring device 409 and also outputs it via pitch period data output terminal 417.
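
The formant error correction rule, including the fallback case, can be sketched as follows. How the predicted pitch period is computed and how wide "the vicinity" is are not specified in the text, so `predicted` is simply passed in and `tolerance` is an assumed parameter; the function name is hypothetical.

```python
def correct_formant_error(pitch, peaks, predicted, tolerance=5):
    """Move the pitch to a peak near the predicted period, if one exists.

    pitch:     lag delivered by the integer-multiple corrector
    peaks:     [(lag, value), ...] of all local maxima in the frame
    predicted: pitch period predicted from the past few frames
    """
    if abs(pitch - predicted) <= tolerance:
        return pitch  # already consistent with the prediction
    near = [
        lag for lag, _ in peaks
        if lag != pitch and abs(lag - predicted) <= tolerance
    ]
    if near:
        # Assumed tie-break: take the candidate closest to the prediction.
        return min(near, key=lambda lag: abs(lag - predicted))
    return pitch  # no peak near the prediction: keep the supplied value


peaks = [(25, 0.90), (50, 0.88)]
print(correct_formant_error(25, peaks, predicted=52))  # 50
print(correct_formant_error(50, peaks, predicted=52))  # 50
```

Here a half-period peak at lag 25 caused by formant resonance is replaced by the peak at lag 50, which agrees with the prediction from recent frames.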

以上説明した様に本発明の特徴は有声/無声情
報から連続有声区間と連続無声区間とを検出し、
更に前記2つの区間から話中/話断を判別するこ
とにより比較的に長時間(例えば100mSEC程
度)持続する第1の語頭と、比較的に短時間(例
えば30mSEC程度)持続する第2の語頭とを検
出し、第1の語頭において検出されたピツチ周期
データによりピツチ検索範囲の初期値を設定し第
1の語頭を除く有声音区間の各フレームにおける
ピツチ周期の検索範囲を過去のフレームにおいて
検出されたピツチ周期の近傍に制限してピツチ周
期の検出を行ない、更に第1の語頭においては整
数倍ピツチ周期の訂正を行ない、第2の語頭を除
く第1の語頭においてはホルマントの影響による
ピツチ周期の誤りを、比較的に近い過去の各フレ
ームにおけるピツチ周期データからの予測値を参
照して訂正することにある。
As explained above, the invention is characterized as follows. Continuous voiced sections and continuous unvoiced sections are detected from the voiced/unvoiced information, and the talking/talk-break state is determined from those two kinds of section. This makes it possible to detect a first word beginning that lasts a relatively long time (for example, about 100 ms) and a second word beginning that lasts a relatively short time (for example, about 30 ms). The initial value of the pitch search range is set from the pitch period data detected in the first word beginning, and in each frame of a voiced section excluding the first word beginning the pitch period is detected with the search range limited to the neighborhood of the pitch period detected in past frames. In addition, integer-multiple pitch period errors are corrected in the first word beginning, and, in the part of the first word beginning that excludes the second word beginning, pitch period errors caused by formants are corrected by referring to a value predicted from the pitch period data of relatively recent frames.

これらの手段によりいわゆる語頭におけるピツ
チ検索範囲の決定が困難であるという前記欠点を
解決し得る。またいわゆる語頭を除く有声音区間
については、ピツチ周期の検索範囲を過去のフレ
ームにおいて検出されたピツチ周期により、2倍
ピツチ周期とホルマントの影響による自己相関係
数の極大値における遅れ時間(ほぼピツチ周期の
N1/N2の遅れ時間となる。但しN1、N2は整数で
あり、N1<N2である。)とを含まない程度に設定
し、ピツチ周期誤りの原因の大部分を除去し得る
のは言うまでもない。
These means resolve the aforementioned drawback that the pitch search range is difficult to determine at a so-called word beginning. Needless to say, for voiced sections excluding the word beginning, the search range can be set from the pitch period detected in past frames so that it contains neither the double pitch period nor the delay time of the autocorrelation maximum caused by formants (approximately N1/N2 of the pitch period, where N1 and N2 are integers and N1 < N2), which removes most causes of pitch period errors.

また、いわゆる語頭については第1の語頭と第
2の語頭とを設けることにより、整数倍ピツチ周
期誤りとホルマントの影響によるピツチ周期誤り
の訂正をより正確に実施し得る。すなわち、特に
無声音部分と有声音部分との境界付近の有声音部
分において多く発生する傾向のある整数倍ピツチ
周期誤りについては全語頭区間つまり第1の語頭
について訂正を実施する。また、注意深く訂正し
ないと正しいピツチ周期を例えば倍ピツチ周期に
誤つて訂正する可能性が大きいホルマントの影響
によるピツチ誤りについては過去の数フレームの
ピツチ周期の予測値を参照しながら訂正を実施す
ることが望ましい。
Furthermore, providing a first and a second word beginning allows integer-multiple pitch period errors and formant-induced pitch period errors to be corrected more accurately. Integer-multiple pitch period errors, which tend to occur especially in voiced parts near the boundary between unvoiced and voiced parts, are corrected over the whole word-beginning section, that is, the first word beginning. Formant-induced pitch errors, by contrast, carry a high risk that a correct pitch period will be wrongly changed to, for example, the double pitch period unless the correction is made carefully, so it is desirable to correct them while referring to a value predicted from the pitch periods of the past several frames.

本発明においては第2の語頭を設けることによ
り、第2の語頭におけるピツチ周期よりピツチ周
期の予測値をつくり、第2の語頭に含まれない第
1の語頭におけるホルマントの影響によるピツチ
誤りを適切に訂正し得る。以上のようにして正確
に求められた第1の語頭におけるピツチ周期デー
タを用いることにより、前記第1の語頭区間を除
く有声音区間の各フレームにおけるピツチ周期の
検索範囲の決定が適確に行ない得るのは明らかで
ある。
In the present invention, providing the second word beginning makes it possible to form a predicted pitch period from the pitch periods in the second word beginning and thereby to correct appropriately the formant-induced pitch errors in the part of the first word beginning not included in the second. Clearly, using the pitch period data thus accurately obtained in the first word beginning allows the pitch period search range to be determined properly in each frame of a voiced section excluding the first word-beginning section.

As described above, the present invention has the effect that the pitch period search range can be determined properly and easily, and that accurate pitch periods can be extracted even for the so-called word beginning, which supplies the data used to determine that search range.

[Brief Description of the Drawings]

Fig. 1 is an autocorrelation coefficient waveform diagram for explaining the influence of formants. Fig. 2 is an autocorrelation coefficient waveform diagram for explaining the influence of a double pitch period. Fig. 3 is a diagram for explaining an example of a pitch detection error that occurs in the conventional method. Fig. 4 is a block diagram for explaining an embodiment of the present invention.

402: waveform input terminal
403: A/D converter
404: temporary memory
405: voiced/unvoiced (presence/absence) discriminator
406: autocorrelation coefficient measuring unit
407: continuous voiced detector
408: continuous unvoiced detector
409: pitch search range measuring unit
410: speech-active/speech-break discriminator
411: first word-beginning detector
412: second word-beginning detector
413: integer-multiple pitch period error corrector
414: formant pitch error correction indicator
415: formant error corrector
416: pitch searcher
417: pitch period data output terminal

Claims (1)

[Claims]
1. A pitch extraction device that performs pitch extraction based on autocorrelation coefficients obtained by analyzing a speech waveform at a frame period on the order of the pitch period, the device comprising: means for discriminating whether each frame is voiced or unvoiced; means for detecting continuous voiced sections from the discriminated voiced/unvoiced information; means for detecting continuous unvoiced sections from the discriminated voiced/unvoiced information; means for discriminating speech-active/speech-break states from the continuous voiced section information and the continuous unvoiced section information; and means for detecting, from the voiced/unvoiced information and the speech-active/speech-break information, a first word beginning having a fixed time interval; wherein, in the first word-beginning section, a pitch period is extracted in which integer-multiple pitch period errors and formant errors have been corrected, and, in each frame of the voiced section excluding the first word beginning, the pitch period is searched with the search range limited to the vicinity of the pitch period detected in a past frame.
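
The chain of means recited in the claim (per-frame voiced/unvoiced flags, continuous voiced and unvoiced sections, speech-active/speech-break state, first word beginning of fixed length) can be sketched in Python as follows. This is only one possible reading: the thresholds `min_voiced` and `min_unvoiced`, the length `head_len`, the function names, and the rule that a word beginning opens on the first sufficiently long voiced run after a speech break are all assumptions, not details taken from the claim.

```python
from itertools import groupby

def runs(flags):
    """Run-length encode a flag sequence as (value, start, length) tuples."""
    out, pos = [], 0
    for value, group in groupby(flags):
        n = len(list(group))
        out.append((value, pos, n))
        pos += n
    return out

def first_word_heads(voiced, min_voiced=3, min_unvoiced=5, head_len=4):
    """Flag the frames that belong to a first word beginning.

    Assumed rules: an unvoiced run of at least min_unvoiced frames is a
    speech break; the first voiced run of at least min_voiced frames after
    a break (or at the start) switches to speech-active, and its first
    head_len frames form the word beginning.
    """
    heads = [False] * len(voiced)
    talking = False
    for value, start, n in runs(voiced):
        if value:
            if not talking and n >= min_voiced:
                talking = True
                for i in range(start, min(start + head_len, start + n)):
                    heads[i] = True
        elif n >= min_unvoiced:
            talking = False
    return heads

# Per-frame voiced (1) / unvoiced (0) decisions for a short utterance.
voiced = [0] * 6 + [1] * 8 + [0] * 2 + [1] * 5 + [0] * 6 + [1] * 2 + [0] + [1] * 6
heads = first_word_heads(voiced)
head_frames = [i for i, h in enumerate(heads) if h]
```

Under these assumptions a short unvoiced gap inside an utterance does not open a new word beginning, and an isolated voiced burst shorter than `min_voiced` is ignored.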
JP4726378A 1978-04-20 1978-04-20 Pitch extraction unit Granted JPS54139307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4726378A JPS54139307A (en) 1978-04-20 1978-04-20 Pitch extraction unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4726378A JPS54139307A (en) 1978-04-20 1978-04-20 Pitch extraction unit

Publications (2)

Publication Number Publication Date
JPS54139307A JPS54139307A (en) 1979-10-29
JPS6151320B2 true JPS6151320B2 (en) 1986-11-08

Family

ID=12770396

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4726378A Granted JPS54139307A (en) 1978-04-20 1978-04-20 Pitch extraction unit

Country Status (1)

Country Link
JP (1) JPS54139307A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58193597A (en) * 1982-05-07 1983-11-11 日本電気株式会社 Pitch extractor
JPS6022197A (en) * 1983-07-18 1985-02-04 松下電器産業株式会社 Pitch extractor
JPS60150099A (en) * 1984-01-18 1985-08-07 三洋電機株式会社 Pitch parameter smoothing method
US6240386B1 (en) * 1998-08-24 2001-05-29 Conexant Systems, Inc. Speech codec employing noise classification for noise compensation
CN111063372B (en) * 2019-12-30 2023-01-10 广州酷狗计算机科技有限公司 Method, device and equipment for determining pitch characteristics and storage medium

Also Published As

Publication number Publication date
JPS54139307A (en) 1979-10-29

Similar Documents

Publication Publication Date Title
US7756700B2 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
KR100463417B1 (en) The pitch estimation algorithm by using the ratio of the maximum peak to candidates for the maximum of the autocorrelation function
JPH0431898A (en) Voice/noise separating device
US8942977B2 (en) System and method for speech recognition using pitch-synchronous spectral parameters
JPS6151320B2 (en)
US20030078770A1 (en) Method for detecting a voice activity decision (voice activity detector)
JPS6214839B2 (en)
JPS5912185B2 (en) Voiced/unvoiced determination device
KR0136608B1 (en) Phoneme recognizing device for voice signal status detection
JPH0377998B2 (en)
JPS60129796A (en) Sillable boundary detection system
KR100539176B1 (en) Device and method of extracting musical feature
KR100345402B1 (en) An apparatus and method for real - time speech detection using pitch information
JPH04230798A (en) Noise predicting device
JPS59149400A (en) Syllable boundary selection system
JPH1097288A (en) Background noise removing device and speech recognition system
JP2891259B2 (en) Voice section detection device
JPH02192335A (en) Word head detecting system
JP2557497B2 (en) How to identify male and female voices
KR0128669B1 (en) Real time detecting method for voice signal
JPS60138599A (en) Voice section detector
JPS5969798A (en) Extraction of pitch
JPH05108088A (en) Speech section detection device
JPH04211299A (en) Monosyllabic voice recognizing device
JPH01170998A (en) Phoneme section information generating device