JP2006171008A

JP2006171008A - Device, method and program for extracting fundamental frequency, and recording medium with the program stored thereon

Info

Publication number: JP2006171008A
Application number: JP2006014305A
Authority: JP
Inventors: Tomohiro Nakatani (中谷 智広); Toshio Irino (入野 俊夫)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-09-28
Filing date: 2006-01-23
Publication date: 2006-06-29
Anticipated expiration: 2022-03-07
Also published as: JP4125322B2

Abstract

PROBLEM TO BE SOLVED: To make fundamental frequency extraction more stable in the presence of noise and other interfering voices or speech.
SOLUTION: Preprocessing transforms the frequency characteristics of an input signal, such as a speech or music signal, into characteristics suitable for fundamental frequency extraction. A power S(ω_c) is calculated for each frequency of the transformed input signal. An envelope-removed power is then obtained by removing the envelope component from this power. The mean of the envelope-removed power is subtracted from the envelope-removed power at each frequency. Each of a plurality of frequencies in the range where the fundamental frequency is assumed to lie is set as a fundamental frequency candidate. For each candidate, the center frequencies nearest to its integer multiples are found and the subtracted values at those center frequencies are summed. This sum is taken as the harmonic structure power. The frequency giving the maximum harmonic structure power is output as the fundamental frequency.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a fundamental frequency extraction method and apparatus, a fundamental frequency extraction program, and a recording medium storing the program. They extract the fundamental frequency of an acoustic signal, such as a mixture of several sounds or speech in noise, by dividing the signal into narrow frequency bands.
Fundamental frequency extraction is used as preprocessing for signal processing such as speech synthesis, speech recognition, and speech coding. Accurate fundamental frequency extraction under noise therefore improves the performance of the signal processing devices that follow it. Such devices include the following:
1. A sound source separation device that separates each component sound from a mixture of multiple sound sources on the basis of fundamental frequency information
2. A speech encoding/decoding device that encodes speech on the basis of fundamental frequency information
3. A music search device that extracts a melody from the fundamental frequency of a tune hummed by a person in a noisy environment and searches for the corresponding piece of music
4. An automatic music transcription device that receives the acoustic signal of a musical performance and extracts a score or equivalent musical information
5. A machine control interface that passes commands to a machine according to the pitch (fundamental frequency) of a person's voice, and a device for dialogue with machines

Conventional example 1 of a fundamental frequency extraction apparatus is described with reference to FIG. 14.
This example exploits the fact that periodic peaks appear in the logarithmic power spectrum at integer multiples of the fundamental frequency. The logarithmic power spectrum extraction unit 12 applies a short-time Fourier transform to the input signal from the signal input unit 11. It computes the log power spectrum as the logarithm of the squared magnitude of each spectral component. The periodicity extraction unit 13 applies a short-time inverse Fourier transform to this log power spectrum to obtain the level corresponding to each period, that is, the periodicity. The maximum value extraction unit 14 extracts the time lag at which the periodicity is maximal. This lag is the period, and its reciprocal is the fundamental frequency.
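Conventional example 1 is essentially cepstral pitch estimation. The Python/NumPy sketch below handles a single frame and is meant only as an illustration. The Hann window, the 60 to 500 Hz search range, the small constant inside the logarithm, and the function name `cepstrum_f0` are assumptions that do not come from the source.

```python
import numpy as np

def cepstrum_f0(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate F0 of one frame from the peak of the real cepstrum (hypothetical helper)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # log power spectrum (cf. unit 12)
    cepstrum = np.fft.irfft(log_power, n=len(frame))   # level per lag, i.e. periodicity (cf. unit 13)
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    # Lag (in samples) with the largest periodicity inside the search range (cf. unit 14)
    lag = lag_min + int(np.argmax(cepstrum[lag_min:lag_max + 1]))
    return fs / lag                                     # F0 is the reciprocal of the period
```

For a 1024-sample frame at 16 kHz, the search range covers lags from 32 to 266 samples. The frame must therefore be longer than twice the longest period of interest.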

Conventional example 2, described in Non-Patent Document 1, uses the instantaneous frequency to further emphasize the same log power spectrum peaks as conventional example 1, with the aim of extracting the fundamental frequency with high accuracy. The instantaneous frequency component of the input signal is extracted. A peak-enhanced instantaneous frequency spectrum G(λ₀) is then obtained from two quantities using the following equation. The first is the instantaneous frequency φ′(ω) of each frequency band, where ω is the center frequency of the band. The second is the spectrum S(ω) extracted by the log power spectrum extraction unit.

The fundamental frequency is then obtained by extracting the periodicity of the peaks of the instantaneous frequency spectrum G(λ₀).

Non-Patent Document 1: Toshihiko Abe, Takao Kobayashi, Satoshi Imai, "Pitch estimation in noisy environments based on instantaneous frequency" (瞬時周波数に基づく雑音環境下でのピッチ推定), Transactions of the IEICE, vol. J79-D-II, no. 11, pp. 1771-1781, Nov. 1996.

In conventional example 1, when the input signal contains voices or noise other than the target sound, their features are superimposed on the log power spectrum. As the power of these other sounds increases, the fundamental frequency extraction error therefore grows.
In conventional example 2, the instantaneous frequency spectrum emphasizes frequency peaks using the slope of the instantaneous frequency over small intervals. Under noise, the unstable behavior of the instantaneous frequency therefore appears directly in the instantaneous frequency spectrum. This makes it unsuitable as a feature for stable fundamental frequency extraction under noise.

The apparatus of the present invention comprises four units:
- a power extraction unit that extracts the power of an input acoustic signal, such as a speech or music signal;
- an average value calculation unit that calculates the mean of the power over frequencies;
- a harmonic structure power extraction unit that, for each of a plurality of frequencies, computes the sum of the values obtained by subtracting the mean from the power at that frequency and at each of its integer multiples;
- a maximum value extraction unit that finds the maximum of these sums and outputs the corresponding frequency as the fundamental frequency.
Likewise, the method of the present invention comprises four steps: a power extraction step that extracts the power of an input acoustic signal, such as a speech or music signal; an average value calculation step that calculates the mean of the power over frequencies; a harmonic structure power extraction step that, for each of a plurality of frequencies, computes the harmonic structure power as the sum of the values obtained by subtracting the mean from the power at that frequency and at each of its integer multiples; and a maximum value extraction step that finds the maximum of these sums and outputs the corresponding frequency as the fundamental frequency.
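The harmonic structure power can be sketched as follows, assuming the power spectrum of one frame is already available as a NumPy array on a uniform frequency grid. The sketch makes several illustrative assumptions:
- it omits the envelope removal and preprocessing mentioned in the abstract;
- the caller chooses the candidate grid;
- every integer multiple up to the top of the spectrum is summed;
- the name `harmonic_structure_f0` is hypothetical.

Because the mean is subtracted, multiples that land between harmonics contribute negative values. This penalizes subharmonic candidates such as F0/2.

```python
import numpy as np

def harmonic_structure_f0(power, freqs, candidates):
    """Return the candidate F0 with the largest harmonic structure power, plus all scores.

    power      : power spectrum S(w_c)^2, one value per frequency bin
    freqs      : centre frequency of each bin in Hz (uniform spacing, ascending)
    candidates : candidate fundamental frequencies in Hz
    """
    centred = power - power.mean()              # subtract the mean power over frequency
    bin_width = freqs[1] - freqs[0]
    scores = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        k = 1
        while k * f0 <= freqs[-1]:              # every integer multiple inside the spectrum
            idx = int(round((k * f0 - freqs[0]) / bin_width))  # nearest centre frequency
            if 0 <= idx < len(power):
                scores[i] += centred[idx]
            k += 1
    best = int(np.argmax(scores))               # maximum value extraction
    return candidates[best], scores
```

On a flat spectrum with peaks at multiples of 200 Hz, the candidate 100 Hz also hits every peak. It still scores lower than 200 Hz because its odd multiples fall on below-mean bins.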
First, the degree of occupancy related to the present invention is described, together with its extraction and fundamental frequency extraction based on it.
The related invention defines an occupancy that expresses how far each frequency component of the input sound is unaffected by noise. It provides a method and apparatus for extracting the occupancy, and a method and apparatus for extracting the fundamental frequency using it. To this end, it exploits the following property of the instantaneous frequency.
Suppose the frequency bins of a short-time Fourier transform (hereinafter STFT) are regarded as the outputs of a bank of equally spaced narrow band-pass filters. The instantaneous frequency φ′ is then the time derivative of the phase φ of each output wave. It is known that when a dominant frequency component with strong power is present in some band at some time, the instantaneous frequency is nearly constant in the STFT bins near that frequency. Consider a harmonic sound in a low-noise input signal, with its instantaneous frequency plotted on the vertical axis against the STFT frequency bin on the horizontal axis. The result is a staircase pattern, shown by the thin solid line in FIG. 15A. The frequencies of the harmonic components can be taken as the points where a horizontal step coincides with the center frequency ω_c of a bin (φ′ = ω_c), hereinafter called fixed points. In an input signal with strong noise, on the other hand, the instantaneous frequency does not form a clear staircase. It becomes a gently rising line instead, as shown by the portion of the thin solid line above 600 Hz in FIG. 15B.
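In digital processing, the phase derivative can be approximated by the phase advance of each STFT bin between two frames a few samples apart. The sketch below shows one common way to do this and is not necessarily the source's procedure. The Hann window, `n_fft`, `hop`, and the function name are assumptions. For a single sinusoid, every bin inside the main lobe reports nearly the same instantaneous frequency, which is the flat step of the staircase described above.

```python
import numpy as np

def instantaneous_frequency(x, fs, n_fft=1024, hop=16):
    """Instantaneous frequency (Hz) of each STFT bin, from the phase advance between two frames."""
    win = np.hanning(n_fft)
    X1 = np.fft.rfft(win * x[:n_fft])
    X2 = np.fft.rfft(win * x[hop:hop + n_fft])
    centres = np.arange(len(X1)) * fs / n_fft            # bin centre frequencies w_c in Hz
    expected = 2 * np.pi * centres * hop / fs            # phase advance of a wave exactly at w_c
    dphi = np.angle(X2) - np.angle(X1) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi          # wrap the deviation into [-pi, pi)
    inst_freq = centres + dphi * fs / (2 * np.pi * hop)  # phi'(w_c) in Hz
    return centres, inst_freq
```

The phase wrap limits the measurable deviation from a bin center to fs / (2 * hop), which is 500 Hz for the defaults at 16 kHz. A small hop therefore keeps the estimate unambiguous across the main lobe.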

The degree of dominance (occupancy) D₀(ω_c) evaluates, using the property of the instantaneous frequency described above, how much the harmonic structure occupies the output of each frequency bin. It is defined as follows.

B(ω_c)² is formed over the bins near the frequency bin with center frequency ω_c (ω ∈ Ω_c). It is the sum of the squared differences between each instantaneous frequency (phase derivative) φ′(ω) and ω_c, weighted by the power spectrum S(ω)². Near a fixed point corresponding to a dominant frequency component, φ′(ω) and ω_c take almost the same value, so B(ω_c)² is expected to reach a local minimum. D₀(ω_c) is the logarithm of the reciprocal of B(ω_c)², so it reaches a local maximum at the same point. Weighting by S(ω)² is not strictly necessary, but it further emphasizes the features of frequencies with strong power. The denominator of equation (2) is a normalization by power.
The occupancy D₀(ω_c) can itself be regarded as a spectrum in which the harmonic structure is emphasized, called an occupancy spectrum. Fundamental frequency extraction methods based on the log power spectrum, such as the cepstrum method, can therefore be applied directly to the occupancy spectrum. The following spectrum D_p can also be used as an occupancy spectrum. It is obtained by weighting the log power spectrum with the occupancy, where a and b are weighting coefficients:
$$D_p(\omega_c) = \log\bigl(S(\omega_c)^{2a}\bigr) + b\,D_0(\omega_c) \qquad (3)$$
$$D_p(\omega_c) = \log\bigl(S(\omega_c)^{2a} / B(\omega_c)^{2b}\bigr) \qquad (4)$$
Both D₀(ω_c) and D_p(ω_c) emphasize the harmonic structure and are therefore expected to give accurate fundamental frequency extraction. Even at poor SNR, frequency components little affected by noise remain emphasized, while components buried in noise are suppressed. As a result, fundamental frequency extraction is robust even under noise.
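Equations (1) and (2) do not survive in this text. A plausible reconstruction follows from the description above and from the computation in FIG. 2A. That computation squares the deviations of the instantaneous frequency from ω_c, weights them by the power spectrum, normalizes by the summed power over Ω_c, and then takes the reciprocal and logarithm. The original notation may differ, for instance by using a sum over bins instead of an integral. With this form, substituting (1) into (3) gives (4) directly, since b·log(1/B²) = log(1/B^{2b}).

```latex
D_0(\omega_c) = \log \frac{1}{B(\omega_c)^2} \qquad (1)

B(\omega_c)^2 = \frac{\int_{\Omega_c} \bigl(\varphi'(\omega) - \omega_c\bigr)^2 \, S(\omega)^2 \, d\omega}
                     {\int_{\Omega_c} S(\omega)^2 \, d\omega},
\qquad \Omega_c = [\,\omega_c - \Delta\omega,\ \omega_c + \Delta\omega\,] \qquad (2)
```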

A refined fundamental frequency is obtained using fixed points, defined as follows. Let ω_c1 and ω_c2 (ω_c1 < ω_c2) be the center frequencies of adjacent STFT frequency bins, and let φ′(ω_c1) and φ′(ω_c2) be their instantaneous frequencies. If the following inequalities hold, a frequency ω with φ′(ω) = ω exists between ω_c1 and ω_c2. This frequency is called a fixed point.
$$\varphi'(\omega_{c1}) > \omega_{c1} \quad \text{and} \quad \varphi'(\omega_{c2}) < \omega_{c2}$$
The frequency of a fixed point is considered to correspond to the frequency of one of the components of the sound having the fundamental frequency. In particular, a fixed point with a large occupancy is expected to correspond to a component that is sufficiently stronger than the background noise. Its frequency should therefore give an accurate component frequency. Dividing a component frequency by an integer yields a fundamental frequency candidate. The candidates derived from the fixed points are averaged, with larger weights on those with large occupancy. This gives a fundamental frequency extraction method that is accurate even under noise.
A refined fundamental frequency extraction method can also be built using the signal power, or the power with the envelope component removed, instead of the occupancy. In general, at a fixed point corresponding to a high-power component, that component dominates the background noise. The frequency of the fixed point is therefore expected to approximate the component frequency well. Accordingly, the present invention averages the fundamental frequency candidates with larger weights on high-power frequencies, which gives a fundamental frequency extraction method that is accurate even under noise.
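The fixed-point search and the weighted average can be sketched as follows. The source only states that a fixed point lies between the two bins and that a component frequency is divided "by an integer", so three choices here are assumptions:
- the zero crossing of φ′(ω) − ω is located by linear interpolation between the bins;
- each fixed point's harmonic number is set by rounding against a rough F0 estimate;
- the weights (occupancy or power) are supplied by the caller.

Both function names are hypothetical.

```python
import numpy as np

def fixed_points(centres, inst_freq):
    """Frequencies w with phi'(w) = w, located between adjacent bins where phi' - w_c goes from + to -."""
    d = inst_freq - centres                              # phi'(w_c) - w_c for every bin
    i = np.where((d[:-1] > 0) & (d[1:] < 0))[0]           # phi'(w_c1) > w_c1 and phi'(w_c2) < w_c2
    frac = d[i] / (d[i] - d[i + 1])                      # linear interpolation of the zero crossing
    return centres[i] + frac * (centres[i + 1] - centres[i])

def refined_f0(fp_freqs, weights, f0_rough):
    """Weighted mean of F0 candidates fp / k; weights are the occupancy or power of each fixed point."""
    fp_freqs = np.asarray(fp_freqs, dtype=float)
    k = np.maximum(1, np.round(fp_freqs / f0_rough))     # assumed harmonic number of each fixed point
    candidates = fp_freqs / k
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * candidates) / np.sum(w))
```

On an ideal staircase with harmonics of 201 Hz and 10 Hz bins, the interpolation recovers 201, 402, 603, and 804 Hz. It ignores the opposite (− to +) crossings that occur midway between harmonics.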

Combining the method with a sound source separation device yields an even more accurate fundamental frequency extraction method. Given two or more input signals measured at spatially different positions, a sound source separation device is known to emphasize or suppress the signal coming from a source at a specific position. However, the separated signal still contains a certain amount of distortion. Fundamental frequency extraction methods such as conventional examples 1 and 2 could therefore lose performance because of that distortion. In contrast, fundamental frequency extraction using the occupancy relies only on dominant frequency components and is less affected by distortion. More accurate fundamental frequency extraction is thus possible while avoiding the influence of the noise that the separation device suppresses.

In the signal power without logarithmic compression, the difference between noise components and frequency components is large. The present invention focuses on this property of the signal power to provide a fundamental frequency extraction method for signals whose frequency characteristics have not been altered. For signals whose frequency characteristics have been altered, it provides a fundamental frequency extraction method combined with a compensation method that restores the signal to its state before the alteration. Together, these make fundamental frequency extraction robust under background noise.

Before the embodiments of the present invention are described, a related fundamental frequency extraction apparatus and method are described.
Occupancy extraction (apparatus)
FIG. 1 shows an example of an occupancy extraction apparatus. An acoustic signal is converted into an input signal and supplied from the input unit 11. The instantaneous frequency extraction unit 21 extracts, for each frequency band, the instantaneous frequencies φ′(ω₁) to φ′(ω_n) of the input signal at each time. Here ω₁ to ω_n are the center frequencies of the bands, which are equally spaced, for example at intervals of 50 to 100 Hz. For example, the short-time Fourier transform unit 22 applies a short-time Fourier transform to the input signal every 30 to 50 ms, converting it to the frequency domain. The band-division phase detection units 23₁ to 23_n then divide the resulting spectrum into n frequency bands and detect the phases φ(ω₁) to φ(ω_n) of the complex spectrum in each band. Other transforms, such as the wavelet transform or the cosine transform, may be used to convert the input signal to the frequency domain. Alternatively, a band-pass filter bank with a spacing of 50 to 100 Hz may divide the input signal into bands. Each filter output is then regarded as a sinusoid and its phase obtained. This apparatus is generally implemented with digital processing.
The differentiation units 24₁ to 24_n differentiate the phases φ(ω₁) to φ(ω_n) of the bands to give the instantaneous frequencies φ′(ω₁) to φ′(ω_n).
These instantaneous frequencies are input to the frequency difference extraction unit 25. For each band with center frequency ω_c (c = 1, 2, ..., n), the unit considers the range ω_c − Δω to ω_c + Δω, which includes given neighboring bands on either side of ω_c. Within that range it computes the difference between each instantaneous frequency and ω_c, giving:
φ′(ω₁ − Δω) − ω₁ to φ′(ω₁ + Δω) − ω₁,
φ′(ω₂ − Δω) − ω₂ to φ′(ω₂ + Δω) − ω₂,
...,
φ′(ω_n − Δω) − ω_n to φ′(ω_n + Δω) − ω_n.
The integration range may be a suitable fixed value corresponding to 50 to 100% of the assumed fundamental frequency. Alternatively, it may be changed adaptively, as described later.

Meanwhile, the input signal is also supplied to the signal power extraction unit 26. This unit extracts the input signal power S(ω_c)² at the center frequency ω_c of each band. For example, it takes the spectrum S(ω_c) at the relevant center frequency from the frequency-domain coefficients of the short-time Fourier transform unit 22 and squares it.
The frequency differences from the frequency difference extraction unit 25, beginning with φ′(ω_c − Δω) − ω_c, are input to the occupancy calculation unit 27. The center-frequency powers S(ω)² from the signal power extraction unit 26 are input to the same unit. It calculates the occupancy, either D₀(ω_c) defined by equation (1) or D_p(ω_c) defined by equation (3) or (4).
To obtain D₀(ω_c), as shown for example in FIG. 2A, the weighted addition unit 271 adds the frequency differences φ′(ω_c − Δω) − ω_c to φ′(ω_c + Δω) − ω_c, weighted by the power spectrum. This proceeds in three steps:
- the squaring unit 272 squares each frequency difference φ′(p) − ω_c, for p = ω_c − Δω, ..., ω_c, ..., ω_c + Δω;
- the multiplication unit 273 multiplies each squared value (φ′(p) − ω_c)² by S(ω_c)²;
- the addition unit 274 adds the products.

The result is the weighted sum Σ(φ′(p) − ω_c)² · S(ω_c)², where Σ runs from p = ω_c − Δω to p = ω_c + Δω.
Meanwhile, the power spectra S(ω_c − Δω)² to S(ω_c + Δω)² of the input signal at the frequencies in the same band are input to the addition unit 275 and summed. The division unit 276 divides the output of the weighted addition unit 271 by this sum to obtain B(ω_c)². Finally, the reciprocal/logarithm unit 278 computes the logarithm of the reciprocal of B(ω_c)², log(1/B(ω_c)²) = D₀(ω_c), and outputs it.
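The FIG. 2A computation can be sketched in NumPy for a single frame, given arrays of per-bin instantaneous frequency and power. The sketch rests on several assumptions:
- each squared deviation is weighted by the power S(p)² of its own bin p, following the description of equation (2);
- Δω is given as a whole number of bins;
- the window is truncated at the spectrum edges;
- a small `eps` guards against division by zero and log(0);
- the name `occupancy_d0` is hypothetical.

The `weighted=False` option corresponds to omitting the multiplication unit 273, with equal weights.

```python
import numpy as np

def occupancy_d0(inst_freq, centres, power, half_width, weighted=True, eps=1e-12):
    """Occupancy D0(w_c) = log(1 / B(w_c)^2) for every bin (hypothetical helper).

    inst_freq  : instantaneous frequency phi'(p) of each bin, in Hz
    centres    : bin centre frequencies w_c, in Hz
    power      : power spectrum S(p)^2 of each bin
    half_width : integration half-width Delta omega, in bins
    weighted   : if False, use equal weights, so B^2 becomes a plain mean
    """
    n = len(centres)
    d0 = np.empty(n)
    for c in range(n):
        lo, hi = max(0, c - half_width), min(n, c + half_width + 1)  # w_c - dw .. w_c + dw
        dev2 = (inst_freq[lo:hi] - centres[c]) ** 2                    # squaring, cf. unit 272
        w = power[lo:hi] if weighted else np.ones(hi - lo)             # weights, cf. unit 273
        b2 = np.sum(dev2 * w) / (np.sum(w) + eps)                      # cf. units 274 to 276
        d0[c] = np.log(1.0 / (b2 + eps))                               # cf. unit 278
    return d0
```

On a staircase-shaped instantaneous frequency, D₀ is close to 0 at a bin whose neighbors all sit on a step near its own center frequency. It is strongly negative midway between harmonics, where the deviations are large.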

To obtain the occupancy D_p(ω_c) of equation (3), as shown for example in FIG. 2B, the power unit 279 raises the center-frequency power S(ω_c)² of each band to the power a. The logarithm unit 281 takes the logarithm of the result, S(ω_c)^(2a). Meanwhile, the multiplication unit 282 multiplies the D₀(ω_c) obtained in FIG. 2A by b. The addition unit 283 adds the result bD₀(ω_c) to the output log(S(ω_c)^(2a)) of the logarithm unit 281, and the sum is output as D_p(ω_c).
To obtain D_p(ω_c) of equation (4), as shown for example in FIG. 2C, the power unit 279 raises S(ω_c)² to the power a. In parallel, the power unit 284 raises the output B(ω_c)² of the division unit 276 in FIG. 2A to the power b. The division unit 285 divides the former by the latter to compute S(ω_c)^(2a)/B(ω_c)^(2b). The logarithm unit 285 takes the logarithm of this ratio, which is output as D_p(ω_c).
In FIGS. 2B and 2C, a = b may be used. In that case, the power unit 279 and the multiplication unit 282 are omitted in FIG. 2B, and the power units 279 and 284 are omitted in FIG. 2C. The coefficients a and b need only be greater than 0. They are chosen according to whether S(ω_c)² or D₀(ω_c) (or B(ω_c)²) is to be emphasized, and by how much. This choice depends on conditions such as how much noise is mixed into the input signal.
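The two routes of FIGS. 2B and 2C compute the same quantity, because b·log(1/B²) = −log(B^(2b)). The sketch below makes this concrete, assuming the per-bin arrays hold S(ω_c)² and B(ω_c)² (so S^(2a) is (S²)^a). The function names are hypothetical.

```python
import numpy as np

def occupancy_dp_eq3(power_c, d0, a=1.0, b=1.0):
    """Equation (3), FIG. 2B: D_p = log(S^(2a)) + b * D0, with power_c = S(w_c)^2."""
    return np.log(power_c ** a) + b * d0

def occupancy_dp_eq4(power_c, b2, a=1.0, b=1.0):
    """Equation (4), FIG. 2C: D_p = log(S^(2a) / B^(2b)), with b2 = B(w_c)^2."""
    return np.log(power_c ** a / b2 ** b)
```

Equation (4) avoids computing D₀ separately. Equation (3) reuses D₀ when it is already available.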

The occupancy calculation unit 27 adds the frequency differences with weighting by the center-frequency power S(ω_c)², but this weighting may be omitted. That is, the multiplication unit 273 in FIG. 2A may be omitted and the frequency differences simply added. Even plain addition of the frequency differences emphasizes the harmonic structure more than the log power spectrum does. In some cases the normalization by power may also be omitted, by removing the addition unit 275 and the division unit 276 in FIG. 2A.
The integration range in equation (2), ω_c − Δω to ω_c + Δω, may be fixed. It is preferably changed adaptively, however, according to an estimate of the fundamental frequency of the input signal. For this purpose an integration range determination unit 28 is provided, as shown by the broken line in FIG. 1. The Δω it determines is input to the frequency difference extraction unit 25, which sets the range ω_c − Δω to ω_c + Δω over which frequency differences are computed.
The optimal integration range varies with the fundamental frequency of the input speech, so choosing a more appropriate range gives a more accurate fundamental frequency. For example, it may be known in advance whether the speaker producing the input signal is male or female. A fixed range optimal for each can then be set in the integration range determination unit 28, such as Δω of about 80 Hz for a male speaker and about 140 Hz for a female speaker. Alternatively, before equation (2) is applied, unit 28 may obtain an initial estimate F₀ of the fundamental frequency with another method, such as those described under the prior art. It then sets the integration range from that estimate, for example with 2Δω equal to about 50 to 100% of the fundamental frequency, preferably 2Δω ≈ 0.75 F₀. The resulting Δω is supplied to the frequency difference extraction unit 25.
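The rule for choosing Δω fits in a small helper. The 80 Hz and 140 Hz values and the preferred 2Δω ≈ 0.75 F₀ setting come from the text above. Converting Δω to a whole number of STFT bins, the minimum of one bin, and the function name are assumptions made for illustration.

```python
def integration_half_width_bins(bin_width_hz, f0_estimate=None, speaker=None, ratio=0.75):
    """Half-width Delta omega of the integration range, rounded to whole bins (hypothetical helper).

    Uses 2 * Delta_omega = ratio * F0 when an initial F0 estimate is available,
    otherwise fixed values of about 80 Hz (male) or 140 Hz (female).
    """
    if f0_estimate is not None:
        delta_hz = ratio * f0_estimate / 2.0
    elif speaker == "male":
        delta_hz = 80.0
    elif speaker == "female":
        delta_hz = 140.0
    else:
        raise ValueError("need an initial F0 estimate or a speaker type")
    return max(1, int(round(delta_hz / bin_width_hz)))
```

With 10 Hz bins, an initial estimate of 240 Hz gives Δω = 90 Hz, or 9 bins on each side of ω_c.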

Occupancy extraction (method)
Next, the processing procedure of the occupancy extraction device described above, that is, the occupancy extraction method, is described.
FIG. 3 shows an example of the basic procedure. The instantaneous frequency of each frequency band of the input signal is extracted in the instantaneous frequency extraction step (S1). As in the device description, this is done, for example, by converting the input signal into a frequency-domain signal with a short-time Fourier transform (Sa), dividing the frequency-domain signal into narrow frequency bands (Sb), extracting the phase φ(ω_c) of each band signal (Sc), and differentiating each phase φ(ω_c) to obtain the instantaneous frequency φ′(ω_c) (Sd).
For these instantaneous frequencies φ′(ω_c), the center frequency ω_c is subtracted from each value in the range ω_c − Δω to ω_c + Δω, which covers the bands on both sides of ω_c, to extract the frequency differences (S2).
The frequency-difference components over ω_c − Δω to ω_c + Δω are summed, and the occupancy of ω_c is computed from this sum (S3).
An example of computing the occupancy D₀(ω_c) in step S3 is described with reference to FIG. 4A. First, a power-spectrum-weighted sum of the frequency differences is formed for each band (S1): for each ω_c, each frequency difference in the band ω_c − Δω to ω_c + Δω is squared (S1a), the squared value is multiplied by the power spectrum S(ω_c)² (S1b), and the products are summed over the band ω_c − Δω to ω_c + Δω (S1c).
Separately, for each center frequency ω_c, the power spectrum is summed over the same band ω_c − Δω to ω_c + Δω (S2), and the weighted sum for that band is divided by this power sum to normalize it, giving B(ω_c)² (S3). The reciprocal of B(ω_c)² is taken, and its logarithm gives D₀(ω_c) (S4).
In FIG. 4A, steps S1 and S2 may be performed in reverse order.
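The following is a minimal, stdlib-only sketch of FIGS. 3 and 4A for a single analysis frame. It assumes a Hann-windowed DFT, approximates the phase derivative by the phase advance between two frames one sample apart, and weights each neighbouring bin by its own power (the text writes the weight as S(ω_c)²; a literally constant weight would cancel in the normalization, so per-bin weighting is an interpretation). All function names are hypothetical.

```python
import cmath
import math


def hann(n_len):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_len) for i in range(n_len)]


def dft(frame):
    """Plain O(N^2) DFT returning bins 0 .. N/2 (Sa, Sb)."""
    n_len = len(frame)
    return [sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_len)
                for i in range(n_len))
            for k in range(n_len // 2 + 1)]


def instantaneous_frequency(x, start, n_len, fs):
    """Per-bin instantaneous frequency phi'(w_c) in Hz and power S^2 (Sc, Sd).

    The phase derivative is approximated by the phase advance between two
    windowed frames that start one sample apart.
    """
    w = hann(n_len)
    a = dft([x[start + i] * w[i] for i in range(n_len)])
    b = dft([x[start + 1 + i] * w[i] for i in range(n_len)])
    inst = [cmath.phase(bk * ak.conjugate()) * fs / (2 * math.pi)
            for ak, bk in zip(a, b)]
    power = [abs(ak) ** 2 for ak in a]
    return inst, power


def occupancy_d0(inst, power, df, half_width_bins, eps=1e-12):
    """D0(w_c) = log(1 / B(w_c)^2) for every bin centre w_c = c * df."""
    n_bins = len(inst)
    d0 = []
    for c in range(n_bins):
        wc = c * df
        lo, hi = max(0, c - half_width_bins), min(n_bins, c + half_width_bins + 1)
        num = sum((inst[j] - wc) ** 2 * power[j] for j in range(lo, hi))  # S1a-S1c
        den = sum(power[j] for j in range(lo, hi))                        # S2
        b2 = num / (den + eps)                                            # S3
        d0.append(math.log(1.0 / (b2 + eps)))                             # S4
    return d0


fs = 8000
x = [math.cos(2 * math.pi * 440 * n / fs) for n in range(300)]
inst, power = instantaneous_frequency(x, 0, 256, fs)
d0 = occupancy_d0(inst, power, fs / 256, half_width_bins=2)
```

For a 440 Hz tone, the bins around 437.5 Hz report instantaneous frequencies close to 440 Hz, so their B(ω_c)² is small and D₀ is high; distant bins see frequencies far from their own centre and get a low D₀.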

Next, the procedure for obtaining the occupancy D_p(ω_c) by equation (3) is described with reference to FIG. 5A. The occupancy D₀(ω_c) obtained in FIG. 4A is multiplied by a weight constant b to obtain bD₀(ω_c) (S1); the power spectrum of ω_c is raised to the power of the weight constant a to obtain S(ω_c)^2a (S2); its logarithm log(S(ω_c)^2a) is computed (S3); and this is added to bD₀(ω_c) to give the occupancy D_p(ω_c) (S4). Steps S1 to S3 may be performed in any order.
The procedure for obtaining D_p(ω_c) by equation (4) is described with reference to FIG. 5B. B(ω_c)², obtained in step S3 of FIG. 4A, is raised to the power of the weight constant b (S1), and the power spectrum of ω_c is raised to the power of the weight constant a (S2). The ratio of these results, S(ω_c)^2a / B(ω_c)^2b, is formed (S3), and its logarithm is taken as the occupancy D_p(ω_c) (S4). Either step S1 or step S2 may be performed first.
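Because D₀(ω_c) = log(1/B(ω_c)²) (FIG. 4A), the two procedures give the same value: b·log(1/B²) + log(S^2a) = log(S^2a / B^2b). The small sketch below follows the step lists of FIGS. 5A and 5B and checks this numerically; the function names and sample values are hypothetical.

```python
import math


def occupancy_dp_eq3(d0, power, a, b):
    """FIG. 5A: D_p = b*D0 + log(S(w_c)^(2a)), where power = S(w_c)^2."""
    weighted_d0 = b * d0                     # S1
    powered = power ** a                     # S2: (S^2)^a = S^(2a)
    return weighted_d0 + math.log(powered)   # S3 (log) and S4 (sum)


def occupancy_dp_eq4(b_sq, power, a, b):
    """FIG. 5B: D_p = log(S(w_c)^(2a) / B(w_c)^(2b)), where b_sq = B(w_c)^2."""
    return math.log(power ** a / b_sq ** b)  # S1-S4


b_sq, power = 6.25, 100.0
d0 = math.log(1.0 / b_sq)
print(occupancy_dp_eq3(d0, power, 0.5, 1.0), occupancy_dp_eq4(b_sq, power, 0.5, 1.0))
```

The constant a sets how strongly loud bins are favoured and b how strongly the occupancy term counts; with a = 0 and b = 1 both reduce to D₀.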

The modifications described for the occupancy extraction device apply equally to the occupancy extraction method described with reference to FIGS. 3 to 5, and the various conditions are also the same. For example, the adaptive determination of the integration range Δω applies to this method as well. The instantaneous frequency extraction performed by the instantaneous frequency extraction unit 21 in FIG. 1 and in the instantaneous frequency extraction step S1 in FIG. 3 is not limited to the techniques shown in these figures; for example, the technique described in L. Cohen, "Time-Frequency Analysis" (Japanese translation by Akira Yoshikawa and Shunsuke Sato), Chapter 2, Asakura Shoten (1998), or other techniques may be used.

Fundamental frequency extraction (device)
Next, an example of a fundamental frequency extraction device that uses the occupancy extraction device described above is described.
As shown in FIG. 6, the input signal from the input unit 11 is fed to the occupancy extraction device described above (hereinafter called the occupancy extraction unit) 31, which extracts the band occupancies D₀(ω₁) to D₀(ω_n) or D_p(ω₁) to D_p(ω_n). These occupancies are input to the periodicity calculation unit 32, which computes the periodicity of the occupancy along the frequency axis. For example, a short-time inverse Fourier transform is applied to the occupancy spectrum D₀(ω₁) to D₀(ω_n) or D_p(ω₁) to D_p(ω_n) obtained at each time, for example every 30 to 50 milliseconds, to extract the periodicities of the spectral peaks P₀(T₁) to P₀(T_n). This periodicity looks, for example, as shown in FIG. 16, with time (period) T on the horizontal axis and level on the vertical axis.
These periodicities P₀(T₁) to P₀(T_n) are input to the maximum value extraction unit 33, which extracts the period T₀ that gives the maximum; the reciprocal calculation unit 34 computes the reciprocal of T₀ and outputs it as the fundamental frequency F₀ = 1/T₀.
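A sketch of units 32 to 34 of FIG. 6 for one frame, assuming the periodicity P(T) is the real part of an inverse Fourier transform of the occupancy spectrum (a cosine sum over bins), evaluated at integer lags T = lag/fs between 1/f_max and 1/f_min. The 80 to 400 Hz search range and all names are assumptions.

```python
import math


def occupancy_periodicity(occ, df_hz, fs_hz, lag):
    """P(T) at T = lag / fs: cosine sum over bins, bin k centred at k * df."""
    t = lag / fs_hz
    return sum(d * math.cos(2 * math.pi * k * df_hz * t) for k, d in enumerate(occ))


def f0_from_periodicity(occ, df_hz, fs_hz, f_min=80.0, f_max=400.0):
    """Units 33 and 34: lag T0 with maximal periodicity, then F0 = 1 / T0."""
    lags = range(int(math.ceil(fs_hz / f_max)), int(fs_hz / f_min) + 1)
    best_lag = max(lags, key=lambda lag: occupancy_periodicity(occ, df_hz, fs_hz, lag))
    return fs_hz / best_lag


# Synthetic occupancy spectrum with peaks at multiples of 125 Hz (df = 31.25 Hz).
occ = [1.0 if k % 4 == 0 and k > 0 else 0.0 for k in range(128)]
print(f0_from_periodicity(occ, 31.25, 8000.0))   # peaks repeat every 125 Hz
```

A harmonic series spaced F₀ apart along the frequency axis adds up in phase at T = 1/F₀, which is why the lag of maximal periodicity gives the fundamental period.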

Next, another example of the fundamental frequency extraction device is described with reference to FIG. 7.
As in FIG. 6, the occupancy extraction unit 31 extracts the occupancy (spectrum) from the input signal supplied by the input unit 11. In this example, the occupancy spectrum is input to the harmonic structure occupancy calculation unit 35, and the fundamental frequency is obtained as the ω₀ that maximizes the sum of the occupancies over the harmonic structure, D_t0(ω₀) (or D_tp(ω₀)), defined as follows:
$D_{t0}(\omega_0) = \sum_q D_0(r(q \cdot \omega_0))$   (5)
$D_{tp}(\omega_0) = \sum_q D_p(r(q \cdot \omega_0))$   (6)
Here ω₀ is an arbitrary frequency, q is the harmonic order, and r(·) is a function that maps the frequency q·ω₀ to the nearest band center frequency ω_c of the band division used for occupancy extraction. The harmonic order q may be made arbitrarily high, but that only increases the amount of computation; it is therefore sufficient for q·ω₀ to range up to about 1500 Hz to 3000 Hz.
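A sketch of equations (5) and (6) and the maximum search, assuming band centres at k·df starting from 0 Hz, a harmonic limit of 3000 Hz (the text allows 1500 to 3000 Hz), and a 1 Hz candidate grid from 90 Hz upward. Names are hypothetical; note that Python's `round` resolves exact .5 ties to the even integer.

```python
def nearest_bin(freq_hz, df_hz):
    """r(.): index of the band whose centre k * df is closest to freq_hz."""
    return int(round(freq_hz / df_hz))


def harmonic_occupancy(occ, df_hz, w0_hz, f_limit_hz=3000.0):
    """Eq. (5)/(6): sum of occupancies at the bins nearest q * w0, q = 1, 2, ..."""
    total, q = 0.0, 1
    while q * w0_hz <= f_limit_hz:
        k = nearest_bin(q * w0_hz, df_hz)
        if k >= len(occ):
            break
        total += occ[k]
        q += 1
    return total


def f0_from_harmonic_sum(occ, df_hz, candidates):
    """Maximum value extraction: the candidate w0 with the largest sum."""
    return max(candidates, key=lambda w0: harmonic_occupancy(occ, df_hz, w0))


occ = [1.0 if k % 4 == 0 and k > 0 else 0.0 for k in range(128)]
print(f0_from_harmonic_sum(occ, 31.25, range(90, 401)))
```

Starting the grid around 90 Hz, as the text suggests for male speech, also keeps sub-harmonic candidates such as F₀/2 out of the search.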

The values D_t0(ω₀) or D_tp(ω₀) computed by the harmonic structure occupancy calculation unit 35 for the candidate frequencies are input to the maximum value extraction unit 36, which extracts the largest of them and outputs the corresponding ω₀ as the fundamental frequency F₀.
As shown for example in FIG. 8, the harmonic structure occupancy calculation unit 35 sets ω₀ in the multiplication unit 351 one value at a time and computes q·ω₀ for each ω₀. If the average male pitch is taken as 125 Hz, ω₀ may be set successively to frequencies starting at about 90 Hz to 100 Hz and increasing in steps of 1 to several Hz. The product q·ω₀ from the multiplication unit 351 is input to the corresponding center frequency detection unit 352, which finds the ω_c among ω₁ to ω_n closest to q·ω₀ as ω_cq. The occupancy retrieval unit 353 retrieves the occupancy D₀(ω_cq) or D_p(ω_cq) for each ω_cq, and the occupancies retrieved for all q are summed for each ω₀ and output as D_t0(ω₀) or D_tp(ω₀).
When the occupancy D₀(ω_c) is used, a fundamental frequency extraction device that is even more robust to noise than one using equation (5) is obtained by finding the ω₀ that maximizes
$D_{t0}(\omega_0) = \sum_q \left( D_0(r(q \cdot \omega_0)) - D_{0AV} \right)$   (7)
where D_0AV is the average of the occupancies D₀(ω₁) to D₀(ω_n).
In this case, as indicated by the broken lines in FIG. 8, the average value calculation unit 355 computes the average D_0AV of D₀(ω₁) to D₀(ω_n), and the addition unit 356 computes Σ_q (D₀(ω_cq) − D_0AV), which is output as D_t0(ω₀).
When the occupancy D_p(ω_c) is used, higher accuracy is obtained by treating D_p(ω₁) to D_p(ω_n) as a time series, applying high-pass filtering, and using the filtered D_p(ω₁) to D_p(ω_n) in equation (6). That is, as indicated by the broken lines in FIG. 8, the filter processing unit 357 high-pass filters D_p(ω₁) to D_p(ω_n) as a time series to extract their fine variation components D′_p(ω₁) to D′_p(ω_n); the occupancy retrieval unit 358 retrieves the D′_p(ω_cq) corresponding to each detected ω_cq; and the addition unit 359 sums them and outputs D_tp(ω₀) = Σ_q D′_p(ω_cq).
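Why equation (7) helps: if every occupancy carries a positive offset, the plain sum of equation (5) rewards candidates that simply have more harmonics below the limit (lower ω₀). Subtracting D_0AV removes that bias. The sketch below shows this together with one possible high-pass filter for unit 357; the filter (subtracting a centred moving average along the band index) and all names are assumptions, since the source does not specify the filter.

```python
def nearest_bin(freq_hz, df_hz):
    return int(round(freq_hz / df_hz))


def harmonic_sum(values, df_hz, w0_hz, f_limit_hz=3000.0):
    """Sum of values at the bins nearest q * w0 (q = 1, 2, ...)."""
    total, q = 0.0, 1
    while q * w0_hz <= f_limit_hz and nearest_bin(q * w0_hz, df_hz) < len(values):
        total += values[nearest_bin(q * w0_hz, df_hz)]
        q += 1
    return total


def mean_removed(occ):
    """Eq. (7) units 355/356: subtract D_0AV from every occupancy."""
    avg = sum(occ) / len(occ)
    return [d - avg for d in occ]


def highpass_along_frequency(occ, width=5):
    """Hypothetical unit 357: occupancy minus a centred moving average over
    `width` bands (window shrinks at the edges), leaving fine variations."""
    half = width // 2
    out = []
    for k in range(len(occ)):
        lo, hi = max(0, k - half), min(len(occ), k + half + 1)
        out.append(occ[k] - sum(occ[lo:hi]) / (hi - lo))
    return out


# Harmonic peaks every 125 Hz on top of a constant offset of 3.
occ = [3.0 + (1.0 if k % 4 == 0 and k > 0 else 0.0) for k in range(128)]
centred = mean_removed(occ)
best = max(range(90, 401), key=lambda w0: harmonic_sum(centred, 31.25, w0))
```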

The fundamental frequency extraction device of FIG. 6 is robust to noise, while that of FIG. 7 is highly accurate. The two can therefore be combined. As shown in FIG. 6, the periodicity of the occupancy spectrum is computed, the period of its maximum is extracted, and the fundamental frequency F₀ is obtained from its reciprocal. Then, as shown by the broken line in FIG. 6, this F₀ is supplied to the harmonic-structure-occupancy-based fundamental frequency extraction unit 38. Unit 38 takes each frequency in the neighborhood of the input F₀, for example within F₀ ± 10% of F₀, as ω₀, performs the harmonic structure occupancy calculation described with reference to FIGS. 7 and 8, finds the ω₀ that maximizes equation (5), (6), or (7), or Σ_q D′_p(r(q·ω₀)), and outputs that ω₀ as the final fundamental frequency F₀. This yields a fundamental frequency extraction device that is both robust to noise and highly accurate.
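The refinement stage of unit 38 can be sketched as a local search around the coarse estimate. Here `score` stands for any harmonic-occupancy function of ω₀ (equation (5), (6), (7), or Σ_q D′_p); the 0.5 Hz grid step, the function name and the stand-in score in the example are assumptions.

```python
def refine_f0(f0_coarse, score, rel_band=0.1, step_hz=0.5):
    """Search w0 in [F0*(1-rel_band), F0*(1+rel_band)] on a step_hz grid and
    return the candidate that maximises score(w0)."""
    lo = f0_coarse * (1.0 - rel_band)
    n_steps = int(round(2.0 * rel_band * f0_coarse / step_hz))
    candidates = [lo + i * step_hz for i in range(n_steps + 1)]
    return max(candidates, key=score)


# Stand-in score peaking at 131.3 Hz; the search is limited to 112.5-137.5 Hz.
print(refine_f0(125.0, lambda w0: -(w0 - 131.3) ** 2))
```

Restricting the accurate but octave-error-prone harmonic search to ±10% of the noise-robust coarse estimate is what lets the combination keep the strengths of both devices.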

Fundamental frequency extraction (method)
Next, the processing procedure of the fundamental frequency extraction device described above, that is, an example of the fundamental frequency extraction method, is described.
FIG. 9 corresponds to the device shown in FIG. 6. First, the occupancy (spectrum) D₀(ω_c) or D_p(ω_c) is extracted from the input signal by the occupancy extraction method of this invention shown in FIGS. 3 to 5 (S1). The periodicity of this occupancy spectrum along the frequency axis is then computed, for example by applying a short-time Fourier transform to the occupancy spectrum at each time (S2). The period (time) T₀ that gives the maximum of this periodicity is extracted (S3), and its reciprocal 1/T₀ = F₀ is computed to obtain the fundamental frequency F₀ (S4).

Next, an example of a fundamental frequency extraction method corresponding to the device shown in FIG. 7 is described with reference to FIG. 10. As before, the occupancy (spectrum) D₀(ω_c) or D_p(ω_c) is extracted from the input signal by the occupancy extraction method of this invention shown in FIGS. 3 to 5 (S1). Next, in this embodiment, for each of a plurality of frequencies ω₀, the occupancies at the integer multiples of ω₀ are summed to obtain the harmonic structure occupancy D_t0(ω₀) or D_tp(ω₀) (S2).
In step S2, for example, each ω₀ is multiplied by q (q = 1, 2, ...) (S2a), and the ω_c closest to each q·ω₀ is found, that is, the center frequency among ω₁, ..., ω_n of the narrow bands into which the input signal was divided for occupancy extraction that lies closest to q·ω₀; this ω_c is written ω_cq (S2b). The occupancy D₀(ω_cq) or D_p(ω_cq) of each ω_cq is obtained (S2c), and for each ω₀ the sum Σ_q D₀(ω_cq) or Σ_q D_p(ω_cq) is computed, giving the harmonic structure occupancy D_t0(ω₀) or D_tp(ω₀) (S2d).
The largest of the harmonic structure occupancies D_t0(ω₀) or D_tp(ω₀) obtained for the various ω₀ is extracted, and the ω₀ of that maximum is taken as the fundamental frequency F₀ (S3).

Modifications similar to those described with reference to FIG. 8 are possible for the method of FIG. 10. That is, as indicated by the broken lines in FIG. 10, after step S2 or beforehand, the average D_0AV of the occupancies D₀(ω₁) to D₀(ω_n) is computed (S4); for each ω₀, the sum of the differences between D₀(ω_cq) and D_0AV, Σ_q (D₀(ω_cq) − D_0AV), is computed as D_t0(ω₀) (S5); the process then moves to step S3, where the ω₀ giving the maximum of these D_t0(ω₀) is found to obtain F₀.
Alternatively, after step S2b or beforehand, the occupancies D_p(ω₁) to D_p(ω_n) are high-pass filtered as a time series to remove slowly varying components, yielding D′_p(ω₁) to D′_p(ω_n), which consist only of fine variation components (S6). In step S2c, D′_p(ω_cq) is then obtained for each q instead of D_p(ω_cq), and in step S2d, D_tp = Σ_q D′_p(ω_cq) is computed before moving to step S3.
As shown in FIG. 6, the device can be configured to obtain the periodicity of the occupancy, find the period T₀ that gives its maximum, take the reciprocal F₀ = 1/T₀ as the fundamental frequency, and then, as indicated by the broken line in FIG. 6, have the harmonic-structure-occupancy-based fundamental frequency extraction unit 38 use frequencies near F₀ as ω₀ to obtain an even more accurate fundamental frequency. Likewise, in the fundamental frequency extraction method, as indicated by the broken line in FIG. 9, after step S4 the frequencies near the F₀ obtained in step S4, for example each frequency in the band F₀ ± F₀ × 0.1, may be used as ω₀ in the processing of FIG. 10 from step S2 onward to obtain a more accurate fundamental frequency (S5). The various modifications indicated by broken lines in FIG. 10 can also be applied in step S5.

Modification
FIG. 11 shows a modification of the fundamental frequency extraction device of this invention. It differs from the devices of FIGS. 6 and 7 in that the occupancy periodicities P₀(T₁) to P₀(T_n) from the periodicity calculation unit 32, or the occupancy sums D_t0(ω₁) to D_t0(ω_n) or D_tp(ω₁) to D_tp(ω_n) from the harmonic structure occupancy calculation unit 35, are smoothed by the fundamental period or fundamental frequency smoothing unit 37 so that they are continuous in time. The smoothed occupancy periodicity or occupancy sum is then supplied to the maximum value extraction unit 33 or 36, which prevents erroneous extraction caused by outliers.
In other words, this modification uses temporal continuity to further improve the accuracy of the fundamental frequency obtained at each time. It is realized in the smoothing step S7, indicated by the broken line after step S2 in FIG. 9 and after step S2d in FIG. 10, by tracking, along the time axis, peak positions with small frequency jumps in the time series of the periodicity (method of FIG. 9) or of the harmonic-structure occupancy sums (method of FIG. 10).

A known algorithm such as dynamic programming (hereinafter DP) can be applied to this peak tracking. Because fundamental frequency extraction is intended as preprocessing for various kinds of speech processing, sequential processing may be preferable to batch processing such as DP. In that case, sequential DP, an improved form of the DP algorithm, can be applied. In sequential DP, at each time, ordinary DP is run on the time series of periodicities or occupancy sums obtained up to the current time to determine the current fundamental frequency. This allows the fundamental frequency at the current time to be estimated while taking into account frequency continuity from the past to the present. Moreover, because DP is inherently a sequential algorithm that updates the optimal path up to the current time as it runs, sequential DP requires no extra computation compared with ordinary DP.
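A sketch of the sequential DP idea, assuming a Viterbi-style forward recursion over a fixed grid of F₀ candidates and a linear penalty on the frequency jump between consecutive frames. The class name, the penalty form and its weight are assumptions; the per-frame scores would be the periodicities or harmonic occupancy sums.

```python
class SequentialF0Tracker:
    """Hypothetical sequential DP tracker over a fixed F0 candidate grid.

    Path score = sum of frame scores - jump_penalty * |F0 jump in Hz|.
    After each frame, the end point of the best path so far is reported.
    """

    def __init__(self, candidates_hz, jump_penalty=0.05):
        self.cands = list(candidates_hz)
        self.penalty = jump_penalty
        self.acc = None   # best accumulated score of a path ending at each candidate

    def push(self, frame_scores):
        """Add one frame of scores; return the current F0 estimate."""
        if self.acc is None:
            self.acc = list(frame_scores)
        else:
            self.acc = [
                s + max(a - self.penalty * abs(f - g)
                        for a, g in zip(self.acc, self.cands))
                for s, f in zip(frame_scores, self.cands)
            ]
        best = max(range(len(self.cands)), key=lambda i: self.acc[i])
        return self.cands[best]


tracker = SequentialF0Tracker([100.0, 150.0, 200.0], jump_penalty=0.05)
estimates = [tracker.push(s) for s in ([1.0, 0.0, 0.0],
                                       [1.0, 0.0, 0.0],
                                       [0.9, 0.0, 1.0])]   # frame 3 is an outlier
```

Each push costs one forward DP step, the same work a batch DP does per frame, which matches the text's point that the sequential form adds no computation. In the example, the third frame's slightly higher score at 200 Hz does not outweigh the jump penalty, so the tracker stays at 100 Hz.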

Next, FIG. 12 shows an embodiment of a device that extracts the fundamental frequency of a sound source signal separated by a sound source separation device. Acoustic signals of two or more channels are input through the signal input unit 41. From these multichannel input signals, the sound source separation device 42 separates the target source signal by emphasizing it, or by suppressing acoustic signals other than it, based on the positional relationship between the sound sources and the signal input unit. The fundamental frequency of the separated target source signal is then extracted by the fundamental frequency extraction device 43 shown in any of FIGS. 6, 7, and 11.

FIG. 13 shows an example configuration of the sound source separation device 42 using a dummy-head microphone. The frequency analysis units 421_R and 421_L apply, for example, a short-time Fourier transform to each of the two-channel input signals from the left and right ear signal input units 41_L and 41_R. From the transformed spectra, the intensity and phase of the left and right signals are obtained at each frequency, and the interaural intensity difference and phase difference at each frequency are computed by the intensity difference extraction unit 422 and the phase difference extraction unit 423, respectively. Using the dummy head's characteristics for the intensity and phase differences of sound arriving from the target source direction, the range of intensity differences and time differences of sound from the target direction is obtained for each frequency. Using this property, the target-direction frequency band selection units 424 and 425 check at each frequency whether the input sound falls within this range, and the target-direction frequency band signal passing unit 426 replaces the input signal at that frequency with 0 if the sound does not come from the target direction. Applying a short-time inverse Fourier transform to the resulting left and right signals separates only the sound arriving from the target direction. For this sound source separation device, see, for example, J. Acoust. Soc. Jpn. (E) 20, 2 (1999), pp. 147-149.
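A per-frame sketch of units 422 to 426, assuming the target-direction ranges have already been derived from the dummy head's response and are given per bin as an intensity-difference range in dB and a phase-difference range in radians (a per-bin phase range stands in for the time-difference range mentioned in the text). The function name and argument layout are hypothetical.

```python
import cmath
import math


def binaural_mask(left, right, ild_range_db, ipd_range_rad):
    """Zero every bin whose interaural level or phase difference lies outside
    the target-direction range for that bin (both channels are masked).

    left, right: complex STFT values of one frame, same length.
    ild_range_db, ipd_range_rad: per-bin (low, high) tuples.
    """
    out_l, out_r = [], []
    for xl, xr, (ild_lo, ild_hi), (ipd_lo, ipd_hi) in zip(
            left, right, ild_range_db, ipd_range_rad):
        if xl == 0 or xr == 0:            # no level difference defined; drop the bin
            out_l.append(0j)
            out_r.append(0j)
            continue
        ild = 20 * math.log10(abs(xl) / abs(xr))      # unit 422: intensity difference
        ipd = cmath.phase(xl * xr.conjugate())        # unit 423: phase difference
        keep = ild_lo <= ild <= ild_hi and ipd_lo <= ipd <= ipd_hi   # units 424/425
        out_l.append(xl if keep else 0j)              # unit 426: pass or set to 0
        out_r.append(xr if keep else 0j)
    return out_l, out_r


l, r = binaural_mask([1 + 0j, 2 + 0j], [1 + 0j, 0.5 + 0j],
                     [(-3.0, 3.0)] * 2, [(-0.5, 0.5)] * 2)
```

An inverse STFT of the masked frames then gives the separated signal; the zeroed bins are the source of the distortion discussed next.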

A sound signal separated in this way is heavily distorted, because the sound in some frequency bands has been replaced with 0. However, when the target sound signal has dominant frequency components that are stronger than the noise, those components remain intact in the separated signal. The occupancy-based fundamental frequency extraction method of this invention can therefore be applied directly, giving a fundamental frequency extraction method that benefits from the noise suppression of the sound source separation device and is also resistant to separation distortion.
Many multi-microphone sound source separation methods are known, such as independent component analysis, the null beamformer method, the delay-and-sum method, and the MINT method. Whichever method is used, extracting the fundamental frequency from the separated signal with the occupancy-based method of this invention yields a high-accuracy device, because occupancy is an evaluation measure that is insensitive to separation distortion.

Adaptive integration range determination method
FIG. 19 shows a processing procedure that adaptively determines the integration range and extracts the fundamental frequency when no approximate fundamental frequency of the input signal is available.
First, the occupancy-based fundamental frequency extraction unit receives the input signal from the input unit and extracts the occupancy given by equations (1) and (2). For the integration range required in equation (2), a range that can be used in common for the fundamental frequencies of the sounds in the input (about 260 Hz wide for utterances by adult speakers) is used. From the occupancy obtained in this way, the same fundamental frequency extraction unit then computes the harmonic structure occupancy, for example with the following equation, which relates to the method described with reference to FIG. 8:
$D_{t0}(\omega_0) = \sum_q \left( D_0(r(q \cdot \omega_0)) - E(D_0(\omega_c)) \right)$
Here q is the harmonic order, r(·) is a function that maps q·ω₀ to the center frequency ω_c of the nearest frequency bin, and E(D₀(ω_c)) is the average of D₀(ω_c) over all frequencies. From this harmonic structure occupancy, the fundamental frequency extraction unit extracts the initial value of the fundamental frequency as the value giving the maximum, according to the following equation (S1):
$F_0 = \arg\max_{\omega_0} D_{t0}(\omega_0)$

Next, the integration range determination unit 28 determines the optimal integration range for the initial fundamental frequency obtained in this way (S2). The optimal integration range is centered on each STFT frequency bin and spans about 60% to 100% of the initial fundamental frequency estimate.
Using the integration range obtained in this way, the occupancy-based fundamental frequency extraction unit computes the occupancy, the harmonic structure occupancy, and their maximum for the same input signal, in the same way as for the initial value, and extracts a more accurate fundamental frequency (S3).
For the occupancy extraction, the intermediate results obtained when the integration has been partly carried out while computing equation (2) for the initial value can be saved; the second pass can then reuse these intermediate results instead of computing equation (2) again, which reduces the computational cost.
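One way to realize this reuse, offered as an assumption rather than the source's own scheme, is to store cumulative sums of S², S²·φ′ and S²·φ′² along the bins. Because Σ S²(φ′ − ω_c)² = Σ S²φ′² − 2ω_c Σ S²φ′ + ω_c² Σ S², B(ω_c)² for any integration width then costs O(1), so the narrower second-pass range is served from the first pass's sums. Class and method names are hypothetical.

```python
import math


class BandSums:
    """Hypothetical cache: cumulative sums along the bins of S^2, S^2*phi',
    and S^2*phi'^2, so B(w_c)^2 can be evaluated for any half-width."""

    def __init__(self, inst_hz, power):
        self.p0, self.p1, self.p2 = [0.0], [0.0], [0.0]
        for f, s in zip(inst_hz, power):
            self.p0.append(self.p0[-1] + s)
            self.p1.append(self.p1[-1] + s * f)
            self.p2.append(self.p2[-1] + s * f * f)

    def b2(self, c, centre_hz, half_bins, eps=1e-12):
        """B(w_c)^2 over bins c-half_bins .. c+half_bins (clipped at the edges)."""
        lo = max(0, c - half_bins)
        hi = min(len(self.p0) - 1, c + half_bins + 1)
        s0 = self.p0[hi] - self.p0[lo]
        s1 = self.p1[hi] - self.p1[lo]
        s2 = self.p2[hi] - self.p2[lo]
        num = s2 - 2.0 * centre_hz * s1 + centre_hz ** 2 * s0
        return max(num, 0.0) / (s0 + eps)   # clamp tiny negatives from round-off

    def d0(self, c, centre_hz, half_bins, eps=1e-12):
        return math.log(1.0 / (self.b2(c, centre_hz, half_bins) + eps))


inst = [100.0, 130.0, 90.0, 250.0, 260.0]
power = [1.0, 2.0, 0.5, 3.0, 1.0]
sums = BandSums(inst, power)
wide = sums.d0(2, 93.75, half_bins=2)     # first pass: wide range
narrow = sums.d0(2, 93.75, half_bins=1)   # second pass: narrower range, same cache
```

With a bin spacing df, the first pass would use roughly half_bins ≈ 130 Hz / df (the ~260 Hz-wide range), and the second roughly 0.3 to 0.5 × F₀ / df. The expanded form can lose precision when the frequencies are large relative to their spread, so double precision is assumed.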

[Embodiment of the Invention]
Fundamental frequency extraction method using the power spectrum instead of occupancy
FIGS. 20 and 21 show a fundamental frequency extraction device and processing procedure that use the power of the input signal with its envelope component removed.
First, preprocessing transforms the frequency characteristics of the input signal into characteristics suitable for fundamental frequency extraction. Examples include applying a high-pass filter to the time-series input signal to suppress low frequencies and emphasize high frequencies or, conversely, applying a low-pass filter to suppress high frequencies. For an input signal whose frequency characteristics have not been altered, or one that needs no such correction, this processing can be omitted. (This is the processing of S1.)
Next, the power extraction unit 51 computes the power S(ω_c)² for each frequency ω_c (ω_c1 to ω_cn) of the input signal, for example by taking the squared magnitude of the output of each STFT frequency bin.
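A sketch of step S1 and the power extraction unit 51. The source does not name a specific filter; a first-order pre-emphasis high-pass with coefficient 0.97 is assumed here as one common choice, and a plain unwindowed DFT stands in for one STFT frame. Function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass (pre-emphasis): y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def bin_powers(frame):
    """S(w_c)^2 for bins 0 .. N/2: squared magnitude of a plain DFT."""
    n_len = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_len)
                    for i in range(n_len))) ** 2
            for k in range(n_len // 2 + 1)]


y = pre_emphasis([1.0, 1.0, 1.0])   # a constant (0 Hz) input is almost cancelled
p = bin_powers([1.0] * 8)           # all power lands in bin 0
```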

Next, the envelope component removal unit 52 removes the envelope component from this power, for example as follows. First, a discrete Fourier transform is applied to the powers S(ω_c)² arranged along the frequency axis (called the frequency characteristic). Next, the components corresponding to the low frequencies of this discrete Fourier transform are replaced with 0, and an inverse discrete Fourier transform is applied to return to a signal corresponding to the frequency characteristic. Because the resulting signal is generally complex, its real part is extracted; this real part is the power with the envelope component removed.
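A stdlib-only sketch of the envelope removal. As an assumption, it zeroes both the `cutoff` lowest DFT indices and their mirrored counterparts, so the real part of the inverse DFT removes the whole smooth envelope and the discarded imaginary part is only round-off. `cutoff` must stay below the index that corresponds to the harmonic spacing along the frequency axis. The function name is hypothetical.

```python
import cmath
import math


def remove_envelope(power, cutoff):
    """Power spectrum (list of floats) with its slowly varying envelope removed.

    DFT along the frequency axis, zero indices k < cutoff and k > N - cutoff,
    inverse DFT, keep the real part.
    """
    n_len = len(power)
    spec = [sum(p * cmath.exp(-2j * math.pi * k * i / n_len)
                for i, p in enumerate(power)) for k in range(n_len)]
    for k in range(n_len):
        if k < cutoff or k > n_len - cutoff:
            spec[k] = 0j
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * i / n_len)
                 for k in range(n_len)) / n_len).real
            for i in range(n_len)]


# Flat envelope (5.0) plus harmonic peaks every 8 bins.
power = [5.0 + (1.0 if i % 8 == 0 else 0.0) for i in range(64)]
flat = remove_envelope(power, cutoff=4)   # peaks minus their mean (1/8) remain
```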

Next, the harmonic structure power extraction unit 53 extracts the harmonic structure power $S_{t0}(\omega_0)^2$ from the envelope-removed power obtained in this way. For each candidate fundamental frequency $\omega_0$, it sums, over the harmonic orders $q$, the power at the frequency bin $r(q\omega_0)$ minus the average power.

Here, $q$ is the harmonic order, $r(\cdot)$ is a function that maps $q\omega_0$ to the center frequency $\omega_c$ of the nearest frequency bin, and $E(S(\omega_c))$ is the average of $S(\omega_c)$ over all frequencies (computed by the average value extraction unit 54).
The maximum value extraction unit 55 extracts the maximum of the harmonic structure power obtained in this way. It then takes the frequency that gives this maximum, $\arg\max_{\omega_0} S_{t0}(\omega_0)^2$, as the fundamental frequency. (The above is the processing of S2.)
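The following sketches the S2 harmonic structure power and the maximum search for one frame, following the description above. The exact equation is not reproduced here, so the range of the harmonic order $q$ is an assumption. By default, every harmonic up to the highest bin is summed, with an optional hypothetical cap `max_harmonics`. The input may be the raw power or the envelope-removed power.

```python
import numpy as np


def harmonic_structure_f0(power, center_freqs, f0_candidates, max_harmonics=None):
    """Pick F0 as the candidate with the largest harmonic structure power.

    power         : per-bin power of one frame (possibly envelope-removed)
    center_freqs  : center frequency of each bin, ascending
    max_harmonics : optional cap on the harmonic order q (assumed parameter);
                    by default all harmonics up to the highest bin are used.
    """
    power = np.asarray(power, dtype=float)
    center_freqs = np.asarray(center_freqs, dtype=float)
    f0_candidates = np.asarray(f0_candidates, dtype=float)
    mean_power = power.mean()                     # E(.) over all bins
    scores = np.empty(len(f0_candidates))
    for j, f0 in enumerate(f0_candidates):
        n_q = int(center_freqs[-1] // f0)         # harmonics inside the band
        if max_harmonics is not None:
            n_q = min(n_q, max_harmonics)
        harmonics = f0 * np.arange(1, n_q + 1)
        # r(q * f0): index of the bin whose center is nearest to each harmonic
        idx = np.abs(center_freqs[None, :] - harmonics[:, None]).argmin(axis=1)
        scores[j] = np.sum(power[idx] - mean_power)
    best = int(np.argmax(scores))
    return f0_candidates[best], scores
```

Subtracting the mean makes a candidate score negatively on every harmonic that lands on a low-power bin. This penalizes a subharmonic candidate such as half the true $F_0$, only every other harmonic of which falls on a spectral peak.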

As shown in FIG. 22, omitting the envelope component removal unit slightly reduces the calculation accuracy but, in return, reduces the calculation cost.

Refined fundamental frequency extraction method
FIG. 23 shows a functional configuration for refining an approximately calculated fundamental frequency $F'_0$ into a more precise estimate.
When the input signal is received, the instantaneous frequency extraction unit 61 extracts the instantaneous frequency of each frequency bin. From the obtained instantaneous frequencies, the fixed point extraction unit 62 extracts the fixed points, that is, frequencies at which a bin's instantaneous frequency coincides with its center frequency. It computes the frequency $\varphi'$ of each fixed point by equation (5), where $\varphi_1' > \omega_{c1}$ and $\varphi_2' < \omega_{c2}$.
Here, $\omega_{c1}$ and $\omega_{c2}$ ($\omega_{c1} < \omega_{c2}$) are the center frequencies of adjacent frequency bins, and $\varphi_1'$ and $\varphi_2'$ are their respective instantaneous frequencies. Setting $\varphi' = \omega_{c1}$ or $\varphi' = \omega_{c2}$ instead of evaluating equation (5) slightly reduces the calculation accuracy but also reduces the calculation cost.
In parallel with the above calculation, the occupancy extraction unit 63 extracts the occupancy of each frequency bin. This step is unnecessary if the occupancy has already been calculated when the approximate fundamental frequency extraction unit 64 extracted the approximate fundamental frequency.
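The following sketches the fixed point extraction of unit 62. It assumes NumPy and that the instantaneous frequency of each bin has already been computed. The exact form of equation (5) is not given here, so the interpolation below is an assumption: it places the fixed point where the straight line through $(\omega_{c1}, \varphi_1')$ and $(\omega_{c2}, \varphi_2')$ crosses the line $\varphi = \omega$. The `interpolate=False` branch corresponds to the cheaper choice $\varphi' = \omega_{c1}$. The function name is hypothetical.

```python
import numpy as np


def extract_fixed_points(center_freqs, inst_freqs, interpolate=True):
    """Fixed points: frequencies where the instantaneous frequency meets the bin center.

    For adjacent bins with centers w1 < w2 and instantaneous frequencies
    phi1, phi2, a fixed point lies between them when phi1 > w1 and phi2 < w2.
    """
    w = np.asarray(center_freqs, dtype=float)
    phi = np.asarray(inst_freqs, dtype=float)
    points = []
    for k in range(len(w) - 1):
        w1, w2, phi1, phi2 = w[k], w[k + 1], phi[k], phi[k + 1]
        if phi1 > w1 and phi2 < w2:
            if interpolate:
                # Assumed form of equation (5): intersect the straight line
                # through (w1, phi1) and (w2, phi2) with the line phi = w.
                # The denominator equals (w2 - phi2) + (phi1 - w1) > 0.
                points.append((phi1 * w2 - phi2 * w1) / ((w2 - w1) - (phi2 - phi1)))
            else:
                points.append(w1)  # cheaper variant from the text: phi' = w_c1
    return np.array(points)
```

The condition $\varphi_1' > \omega_{c1}$, $\varphi_2' < \omega_{c2}$ guarantees that the interpolated crossing lies strictly between the two bin centers.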

Finally, the refined fundamental frequency extraction unit 65 extracts the fixed points $\varphi' \in \Phi'(i \cdot F'_0)$ that lie in the vicinity (about ±10%) of each integer multiple $i$ of the approximate fundamental frequency $F'_0$, where $\Phi'(F)$ denotes the set of fixed points near frequency $F$. The instantaneous frequency $\varphi'$ of each such fixed point, divided by the integer $i$, is taken as a fundamental frequency candidate. The refined fundamental frequency is obtained as the average of these candidates, each weighted by its occupancy $D_0(r(\varphi'))$.

In this weighted average, $c$ is a bias that makes the occupancy of every fixed point positive, and $\varepsilon$ may be any small positive value.
This occupancy-based refinement can be configured in exactly the same way by using, in place of the occupancy, either the power extracted by the power extraction unit 51 or the power from which the envelope component has been removed by the envelope component removal unit 68. FIG. 24 shows this functional configuration.
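The refinement step of unit 65 can be sketched as a weighted mean, assuming NumPy. `weights` holds one value per fixed point: for example, the occupancy $D_0(r(\varphi'))$, or, in the FIG. 24 variant, the (envelope-removed) power at the nearest bin. Several choices are assumptions: the function name, the default harmonic count, and the bias $c = \max(0, \varepsilon - \min w)$, which makes every weight at least $\varepsilon$. The ±10% neighborhood follows the text.

```python
import numpy as np


def refine_f0(f0_approx, fixed_points, weights, n_harmonics=5, rel_width=0.10, eps=1e-6):
    """Weighted mean of phi'/i over fixed points near each multiple i * F0'."""
    fp = np.asarray(fixed_points, dtype=float)
    wt = np.asarray(weights, dtype=float)
    candidates, cand_weights = [], []
    for i in range(1, n_harmonics + 1):
        target = i * f0_approx
        near = np.abs(fp - target) <= rel_width * target  # phi' in Phi'(i * F0')
        candidates.extend(fp[near] / i)                    # candidate phi' / i
        cand_weights.extend(wt[near])
    if not candidates:
        return float(f0_approx)  # nothing to refine with
    candidates = np.array(candidates)
    cand_weights = np.array(cand_weights)
    c = max(0.0, eps - cand_weights.min())  # assumed bias: all weights >= eps
    cand_weights = cand_weights + c
    return float(np.sum(cand_weights * candidates) / np.sum(cand_weights))
```

With ±10% windows, the neighborhoods of adjacent multiples start to overlap beyond the fifth multiple, so one fixed point could then contribute two candidates. Keeping `n_harmonics` at 5 avoids this.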

Each of the above-described occupancy extraction device and fundamental frequency extraction device can also be implemented by having a computer execute a program. In this case, either of two programs may be installed on the computer from a recording medium such as a CD-ROM or a flexible magnetic disk, or via a communication line: an occupancy extraction program that causes a computer to execute any of the occupancy extraction methods described in the embodiments, or a fundamental frequency extraction program that causes a computer to execute the fundamental frequency extraction method.

Experimental examples
FIGS. 15A and 15B show, as thick solid lines, the occupancy $D_0(\omega_c)$ in each frequency bin for speech without noise and with white noise added at 0 dB, respectively. FIG. 15A shows that the occupancy has sharp peaks at frequencies near the center of each harmonic component. FIG. 15B shows sharp peaks up to the third harmonic, while the peaks at the fourth and higher harmonics are suppressed, indicating that the white noise strongly affects those harmonics. This agrees well with a visual assessment of the logarithmic power spectrum (broken line), indicating that the occupancy is an appropriate measure of the influence of noise.

FIGS. 17A and 17B show the fundamental frequency extraction accuracy for target speech, defined as the proportion of extracted fundamental frequencies within ±5% of the correct value. FIG. 17A shows results under white noise, and FIG. 17B under white noise plus interfering speech. The target speech consisted of 30 sentences spoken by each of two male and two female speakers (four speakers, 120 sentences in total). Two background noises were used: white noise alone (Noise-1), and white noise plus the speech of one interfering speaker (Noise-2; one male and one female interferer, 60 sentences in total). In Noise-2, the two noise sources have equal power, and the SNR is given as the power ratio of the target speech to one of them. Three methods were compared: the method that adaptively determines the integration range (occupancy method 1), the method that uses prior information on whether the speaker is male or female (occupancy method 2), and the cepstrum method (conventional method). They are shown as a broken line, a thick solid line, and a broken line with □ markers, respectively. The correct fundamental frequency of the target speech was obtained from the EGG (electroglottograph) waveform recorded simultaneously with the speech. $D_p(\omega_c)$ was used as the occupancy. Both figures show that occupancy method 2 extracts the fundamental frequency most stably under every background noise. Occupancy method 1 also degrades little as the noise level increases and, near 0 dB, achieves the second-highest accuracy after occupancy method 2. These results indicate that using the occupancy enables noise-robust fundamental frequency extraction.
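The accuracy measure used in these experiments counts an estimate as correct when it is within ±5% of the reference. It can be computed as below, assuming NumPy and frame-aligned arrays of estimated and reference values. The function name is hypothetical. Treating a reference value of 0 as an unvoiced frame excluded from scoring is an assumption, not stated in the text.

```python
import numpy as np


def f0_accuracy(estimated, reference, tolerance=0.05):
    """Fraction of estimates within +/- tolerance (relative) of the reference F0.

    Only entries with a positive reference F0 are scored (assumption: 0 marks
    unvoiced frames with no reference value).
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    scored = ref > 0
    if not np.any(scored):
        raise ValueError("no reference F0 values to score")
    hits = np.abs(est[scored] - ref[scored]) <= tolerance * ref[scored]
    return float(np.mean(hits))
```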

FIG. 18 shows time series of fundamental frequencies extracted under 0 dB white noise by the cepstrum method (conventional method) and by occupancy method 1 with $D_p(\omega_c)$ as the occupancy. FIG. 18A shows the correct values, FIG. 18B the conventional method, and FIG. 18C occupancy method 1. Compared with the correct values, the cepstrum method has very large errors, whereas occupancy method 1 stably extracts values close to the correct ones.
FIG. 25 shows the $F_0$ accuracy for target speech under background noise, that is, the proportion of estimated $F_0$ values within ±5% of the correct value. The target speech consisted of 30 sentences from each of two male and two female speakers (four speakers, 120 sentences in total). White noise and multi-talker noise were used as background noise. Multi-talker noise simulates a cocktail-party environment and was created by overlapping 10 utterances randomly selected from the above 120 sentences. The proposed method combines two parts: the method that adaptively determines the integration range using the occupancy (using equation (1) to maximize the harmonic structure occupancy), and the occupancy-based refinement method. This combined method (labeled "proposed") was compared with the conventional cepstrum method. The correct $F_0$ was extracted, with each $F_0$ extraction method, from the EGG (electroglottograph) signal recorded simultaneously with the speech. It was then compared with the $F_0$ extracted from the target speech in noise. The figure shows that the occupancy-based method extracts $F_0$ more robustly than the conventional method at every SNR.
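The noise conditions can be sketched as follows, assuming NumPy. One function forms multi-talker noise by overlapping 10 randomly chosen utterances. The other scales a noise so that the target-to-noise power ratio equals a given SNR in dB. The function names are hypothetical. Aligning all utterances at time zero is also an assumption, since the text does not say how the utterances were positioned.

```python
import numpy as np


def make_multitalker_noise(utterances, n_talkers=10, seed=None):
    """Overlap n_talkers randomly chosen utterances into one babble signal."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(utterances), size=n_talkers, replace=False)
    length = max(len(utterances[i]) for i in chosen)
    babble = np.zeros(length)
    for i in chosen:
        u = np.asarray(utterances[i], dtype=float)
        babble[:len(u)] += u  # all utterances start at t = 0 (assumption)
    return babble


def mix_at_snr(target, noise, snr_db):
    """Scale noise so that 10*log10(P_target / P_noise) == snr_db, then add it."""
    target = np.asarray(target, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(target):
        raise ValueError("noise must be at least as long as the target")
    noise = noise[:len(target)]
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2)
    scaled = noise * np.sqrt(p_target / (p_noise * 10.0 ** (snr_db / 10.0)))
    return target + scaled, scaled
```

For the Noise-2 condition of FIG. 17B, each of the two noise sources would be scaled with `mix_at_snr` relative to the same target. This gives them equal power, matching the SNR definition used there.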

FIG. 26 shows the results of the fundamental frequency extraction method according to an embodiment of this invention that uses the power from which the envelope component has been removed. Three variants were compared, differing in whether the input signal is high-pass filtered to correct its frequency characteristic before extraction: no filtering (PowerSpec-1), filtering (PowerSpec-2), and filtering only when extracting the correct $F_0$ (PowerSpec-3). PowerSpec-3 gave the best results. This shows that, with the method using the envelope-removed power, the preprocessing may need to differ between extracting the correct $F_0$ and extracting the $F_0$ of the target speech. On the other hand, it also shows that the method is robust against background noise when the preprocessing is chosen appropriately.

FIG. 1 is a diagram showing the functional configuration of an example occupancy extraction apparatus.
FIG. 2 is a diagram showing the functional configuration of a specific example of the occupancy calculation unit in FIG. 1.
FIG. 3 is a flowchart showing the procedure of an example occupancy extraction method.
FIG. 4 is a flowchart showing an example of the specific procedure of the occupancy processing in step S3 of FIG. 3.
FIG. 5 is a flowchart showing another example of the specific procedure of the occupancy calculation processing in step S3 of FIG. 3.
FIG. 6 is a diagram showing the functional configuration of an example fundamental frequency extraction apparatus.
FIG. 7 is a diagram showing the functional configuration of another example fundamental frequency extraction apparatus.
FIG. 8 is a diagram showing the functional configuration of each specific example of the harmonic structure occupancy calculation unit 35 in FIG. 7.
FIG. 9 is a flowchart showing the procedure of an example fundamental frequency extraction method.
FIG. 10 is a flowchart showing the procedure of another example fundamental frequency extraction method.
FIG. 11 is a diagram showing a partially modified functional configuration of the example fundamental frequency extraction apparatus.
FIG. 12 is a diagram showing an example fundamental frequency extraction apparatus provided with a sound source separation apparatus.
FIG. 13 is a diagram showing the functional configuration of a specific example of the sound source separation apparatus 42 in FIG. 12.
FIG. 14 is a diagram showing the functional configuration of a conventional fundamental frequency extraction apparatus.
FIG. 15 is a diagram showing examples of the instantaneous frequency, logarithmic power spectrum, and occupancy of a voiced sound.
FIG. 16 is a diagram showing an example of the periodicity of an occupancy spectrum.
FIG. 17 is a diagram showing experimental accuracy results for the prior art and each fundamental frequency extraction method.
FIG. 18 is a diagram showing experimental results of fundamental frequency extraction.
FIG. 19 is a flowchart showing the procedure of the adaptive integration range determination method and of the fundamental frequency extraction method that uses it.
FIG. 20 is a diagram showing an example functional configuration of the fundamental frequency extraction apparatus of this invention that uses the power of the input signal with the envelope component removed.
FIG. 21 is a flowchart showing the procedure of a fundamental frequency extraction method that combines frequency characteristic correction with the fundamental frequency extraction method using the power of the input signal or the power with the envelope component removed.
FIG. 22 is a diagram showing the functional configuration of a fundamental frequency extraction apparatus that uses the power of the input signal.
FIG. 23 is a diagram showing the functional configuration of a more refined fundamental frequency extraction apparatus that uses the occupancy.
FIG. 24 is a diagram showing the functional configuration of a more refined fundamental frequency extraction apparatus that uses the power of the input signal or the power with the envelope component removed.
FIG. 25 is a diagram showing experimental accuracy results comparing the conventional cepstrum method with a fundamental frequency extraction method that combines the method adaptively determining the integration range using the occupancy and the occupancy-based refinement method.
FIG. 26 is a diagram showing experimental accuracy results for three variants of the fundamental frequency extraction method using the envelope-removed power: without high-pass filtering (PowerSpec-1), with high-pass filtering (PowerSpec-2), and with high-pass filtering only when obtaining the correct value (PowerSpec-3).

Claims

1. A fundamental frequency extraction device comprising:
a power extraction unit that calculates the power of an input acoustic signal such as an audio signal or music signal (hereinafter referred to as the input signal) at the center frequency of each frequency band;
an average value calculation unit that calculates the average of the powers at all center frequencies;
a harmonic structure power extraction unit that takes each of a plurality of frequencies within a frequency range in which the fundamental frequency is assumed to lie as a fundamental frequency candidate, obtains the center frequencies close to the integer multiples of each candidate, and calculates, for each candidate, the sum over those center frequencies of the power at each center frequency minus the average value; and
a maximum value extraction unit that extracts the maximum of the sums over the fundamental frequency candidates and outputs the corresponding frequency as the fundamental frequency.

2. The fundamental frequency extraction device according to claim 1, further comprising an envelope component removal unit that extracts the envelope of the frequency characteristic from the extracted power of the input signal and removes the envelope from the power, wherein the fundamental frequency is extracted from the power from which the envelope component has been removed.

3. The fundamental frequency extraction device according to claim 1, further comprising a frequency characteristic correction unit that corrects the frequency characteristic of the input signal as preprocessing before all other processing.

4. A fundamental frequency extraction device comprising:
a power extraction unit that calculates the power of the input signal at the center frequency of each frequency band;
an instantaneous frequency extraction unit that extracts, from the input signal, the instantaneous frequency at the center frequency of each frequency band;
a fixed point extraction unit that extracts fixed points, each being a frequency at which the center frequency of a frequency band and its instantaneous frequency coincide;
an approximate fundamental frequency extraction unit that calculates an approximate value of the fundamental frequency; and
a fundamental frequency refinement unit that further refines the approximate fundamental frequency,
wherein the fundamental frequency refinement unit selects the fixed points lying near integer multiples of the approximate fundamental frequency. It extracts a more refined fundamental frequency by averaging the fundamental frequency candidates, obtained by dividing the frequencies of those fixed points by the corresponding integers, weighted by the power obtained by the power extraction unit.

5. A fundamental frequency extraction method comprising:
a power extraction process of calculating the power of an input acoustic signal such as an audio signal or music signal (hereinafter referred to as the input signal) at the center frequency of each frequency band;
an average value calculation process of calculating the average of the powers at all center frequencies;
a harmonic structure power extraction process of taking each of a plurality of frequencies within a frequency range in which the fundamental frequency is assumed to lie as a fundamental frequency candidate, obtaining the center frequencies close to the integer multiples of each candidate, and calculating, for each candidate, the sum over those center frequencies of the power at each center frequency minus the average value; and
a process of extracting the maximum of the sums over the fundamental frequency candidates and outputting the corresponding frequency as the fundamental frequency.

6. The fundamental frequency extraction method according to claim 5, further comprising an envelope component removal process of extracting the envelope of the frequency characteristic from the extracted power of the input signal and removing the envelope from the power, wherein the fundamental frequency is extracted from the power from which the envelope component has been removed.

7. The fundamental frequency extraction method according to claim 5, further comprising a frequency characteristic correction process of correcting the frequency characteristic of the input signal as preprocessing before all other processes.

8. A fundamental frequency extraction method comprising:
a power extraction process of calculating the power of the input signal at the center frequency of each frequency band;
an instantaneous frequency extraction process of extracting, from the input signal, the instantaneous frequency of each frequency band;
a fixed point extraction process of extracting fixed points, each being a frequency at which the center frequency of a frequency band and its instantaneous frequency coincide;
an approximate fundamental frequency extraction process of calculating an approximate value of the fundamental frequency; and
a fundamental frequency refinement process of further refining the approximate fundamental frequency,
wherein the fundamental frequency refinement process selects the fixed points lying near integer multiples of the approximate fundamental frequency. It extracts a more refined fundamental frequency by averaging the fundamental frequency candidates, obtained by dividing the frequencies of those fixed points by the corresponding integers, weighted by the power obtained in the power extraction process.

9. A fundamental frequency extraction program for causing a computer to execute each step of the fundamental frequency extraction method according to claim 5.

10. A computer-readable recording medium on which the fundamental frequency extraction program according to claim 9 is recorded.