JP3892379B2 - Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, and harmonic structure section estimation threshold determination program and recording medium recording the program

Publication number
JP3892379B2
Authority
JP
Japan
Legal status
Expired - Lifetime
Application number
JP2002274525A
Other languages
Japanese (ja)
Other versions
JP2004109742A (en)
Inventor
智広 中谷
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a harmonic structure section estimation method that estimates, from an acoustic signal containing multiple sounds or noise, the sections in which the harmonic structure of a target sound is present (for speech, the voiced sections).
Harmonic structure section estimation is used as preprocessing for signal processing such as speech synthesis, speech recognition, and speech coding. Accurate estimation of harmonic structure sections under noise therefore improves the performance of the signal processing devices that run downstream of it. Such devices include the following.
1. A sound source separation device that separates each component sound from a mixture of several sound sources based on information about the harmonic structure sections.
2. A speech encoding/decoding device that encodes speech based on information about the voiced sections.
3. A music search device that estimates a melody from the voiced sections of a tune hummed by a person in a noisy environment and searches for the corresponding piece of music.
4. An automatic music transcription device that receives the acoustic signal of a musical performance, estimates the harmonic structure sections, and estimates a score or score-equivalent music information.
5. A machine control interface, and a device for dialogue with machines, that passes commands to a machine by the height of the fundamental frequency in the voiced sections of a person's voice.
[0002]
[Prior art]
FIG. 5 shows the configuration of a conventional harmonic structure section estimation device.
The conventional harmonic structure section estimation device and method are described with reference to FIG. 5.
The device comprises a cepstrum peak extraction unit that extracts the cepstrum peak of the input acoustic signal, a signal power extraction unit that extracts the power of the input acoustic signal, a zero crossing ratio extraction unit that extracts the number of zero-crossing points per unit time, threshold processing units that each compare one extracted value with a preset threshold, and an integration unit that determines the harmonic structure sections from the comparison results of the threshold processing units.
This conventional example determines harmonic structure sections using the following characteristics that the input acoustic signal shows within such sections.
(1) The peak value of the cepstrum coefficients becomes large.
(2) The power value becomes large.
(3) The zero crossing ratio becomes small.
A threshold is set for each of these three values, and an input section is judged to be a harmonic structure section only when values (1) and (2) exceed their thresholds and value (3) is at or below its threshold (see, for example, Non-Patent Document 1).
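The conventional decision rule can be sketched as follows. This is a minimal illustration, not the cited algorithm: the function names, the dictionary of thresholds, and the threshold values are all assumptions, and the cepstrum peak is taken as an already computed input rather than derived here.

```python
def zero_crossing_ratio(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_power(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def conventional_is_harmonic(cepstrum_peak, power, zcr, thresholds):
    """Conventional rule: features (1) and (2) must exceed their thresholds
    and feature (3) must be at or below its threshold."""
    return (cepstrum_peak > thresholds["cepstrum_peak"]
            and power > thresholds["power"]
            and zcr <= thresholds["zcr"])
```

Because `power` scales with the square of the input level, a fixed `thresholds["power"]` gives different decisions for the same sound recorded at different gains, which is the weakness of the conventional approach.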
[0003]
[Non-Patent Document 1]
Ahmadi, S., and Spanias, A. S., "Cepstrum-based pitch detection using a new statistical V/UV classification algorithm," IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 3, pp. 333-338, May 1999.
[0004]
[Problems to be solved by the invention]
In the conventional harmonic structure section estimation device and method described above, the thresholds had to be changed according to the state of the input acoustic signal. For example, when the level of the input acoustic signal rises, its cepstrum coefficients and power values rise regardless of whether the section is a harmonic structure section, so unless the thresholds are changed, sections that are not harmonic structure sections are more often misjudged as harmonic structure sections. Likewise, when the background noise increases, the cepstrum coefficients and power values outside the harmonic structure sections rise, so the thresholds again had to be raised to reduce misjudgments. The zero crossing ratio itself does not depend on the input level, but it is an unreliable measure for estimating harmonic structure sections; it is therefore normally used in combination with the other two features and could not be used alone.
Moreover, determining appropriate thresholds for each situation is not easy and required adjustment by an expert. In addition, since the levels of the input sound and the background noise are generally not constant even within a single recording environment, the thresholds had to be readjusted each time in order to improve estimation accuracy.
[0005]
[Means for Solving the Problems]
To solve the above problems, the present invention uses an occupancy that represents the degree to which each frequency component of the input acoustic signal is unaffected by noise, and provides a harmonic structure section estimation method and apparatus, and a threshold determination method and apparatus for harmonic structure section estimation, that are based on this occupancy. The occupancy-based approach identifies the harmonic components that have dominant power under noise and extracts features using only those components, which yields a harmonic structure section estimation method, and a corresponding threshold determination method, that remain stable regardless of the condition of the input sound.
[0006]
The occupancy is defined on the basis of the following property of the instantaneous frequency. The instantaneous frequency $\phi'$ is the time derivative of the phase $\phi$ of each output wave when the frequency bins of the short-time Fourier transform (hereinafter, STFT) are regarded as a bank of equally spaced narrow band-pass filters. It is known that when a dominant frequency component with strong power is present in some band at some time, the instantaneous frequency takes an almost constant value in the STFT bins near that frequency. Consequently, for a sound with harmonic structure in an input acoustic signal with little noise, plotting the instantaneous frequency on the vertical axis against the STFT frequency bin on the horizontal axis gives a staircase shape, as shown by the thin solid line in FIG. 6(a). The points where a horizontal part of this staircase coincides with the center frequency $\omega_c$ of a frequency bin ($\phi' = \omega_c$, hereinafter called fixed points) can be regarded as the frequencies of the individual harmonic components. In an input signal with strong noise, on the other hand, the instantaneous frequency does not form a clear staircase but becomes a gently rising line, as shown by the thin solid line (600 [Hz] and above) in FIG. 6(b).
[0007]
To evaluate how much of the output of a frequency bin is occupied by the harmonic structure, the occupancy (degree of dominance) $D_0(\omega_c)$ is defined from the above property of the instantaneous frequency as follows.
$$D_0(\omega_c) = 10 \log_{10}\left(\frac{1}{B(\omega_c)^2}\right) \qquad (1)$$
$$B(\omega_c)^2 = \frac{\sum_{\omega_c - \Delta\omega/2 < \omega < \omega_c + \Delta\omega/2} \left(\phi'(\omega) - \omega_c\right)^2 S(\omega)^2}{\sum_{\omega_c - \Delta\omega/2 < \omega < \omega_c + \Delta\omega/2} S(\omega)^2} \qquad (2)$$
$B(\omega_c)^2$ is the sum, over the bins in the neighborhood ($\omega_c - \Delta\omega/2 < \omega < \omega_c + \Delta\omega/2$) of the frequency bin with center frequency $\omega_c$, of the squared difference between $\phi'(\omega)$ and $\omega_c$, weighted by the power spectrum $S(\omega)^2$. In the vicinity of a fixed point corresponding to a dominant frequency component, $\phi'(\omega)$ and $\omega_c$ take almost the same value, so $B(\omega_c)^2$ is expected to reach a local minimum. $D_0(\omega_c)$ takes the reciprocal (and its logarithm) so that it reaches a local maximum at the same point.
Because the occupancy is also normalized by the power spectrum $S(\omega)^2$, it evaluates only the strength of the harmonic components relative to the non-harmonic components. It does not depend on the absolute power of the signal and takes values in roughly the same range (about −40 to 0 dB).
[0008]
FIG. 7 plots the occupancy for each frequency bin in voiced and unvoiced sections of infant speech. FIG. 7(a) shows that in a voiced section the occupancy has regular, sharp peaks corresponding to the harmonic components, and that these peaks coincide with the fixed points of the instantaneous frequency (the points where the instantaneous frequency equals the center frequency of an FFT bin). Peaks other than the harmonic components, which originate from background noise, appear in the log spectrum, but they vanish in the occupancy. FIG. 7(b) shows that in an unvoiced section the peak positions are irregular and indistinct and do not coincide with the fixed points. These properties make the occupancy highly effective for extracting the structure of the dominant harmonic components in the input signal.
[0009]
The occupancy-based method sums the occupancy peak values that correspond to the harmonic structure and judges voiced/unvoiced sections (V/UV) from the magnitude of the sum. The following harmonic structure occupancy is computed for each frame (for example, with a 1 [msec] shift), smoothed in the time direction with a median filter (for example, 61 sample points), and then thresholded.
$$D_{t0}(f_0) = \sum_{l} \left\{ D_{0,F}(l \cdot 2\pi f_0) - E(D_0(\omega_c)) \right\} \qquad (3)$$
Here $l$ is the harmonic order, $f_0$ is the estimate of the fundamental frequency $F_0$, and $D_{0,F}(l \cdot 2\pi f_0)$ is a function that returns the occupancy at the fixed point near the $l$-th harmonic (or $E(D_0(\omega_c))$ if there is no fixed point). $E(D_0(\omega_c))$ is the term that removes the bias of the occupancy; it returns the mean of the occupancy along the frequency axis.
[0010]
As described above, in a voiced section the frequency of each harmonic coincides with a fixed point and with an occupancy peak, so $D_{t0}(f_0)$ takes a larger value than in an unvoiced section. Moreover, the occupancy of a dominant harmonic component takes nearly the same value regardless of the power of the input speech, so in voiced sections $D_{t0}(f_0)$ stays within a certain range of large values. In unvoiced sections, by contrast, it stays within a certain range of small values. V/UV determination can therefore be performed by setting a threshold at the boundary between the ranges that the harmonic structure occupancy can take in voiced and unvoiced sections.
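The harmonic structure occupancy described above can be sketched as a sum over harmonics of the bias-removed occupancy. This is only one reading of the text: the number of harmonics `n_harmonics`, the `tolerance` that decides what counts as "near" a harmonic, and the use of Hz instead of angular frequency are assumptions made for illustration.

```python
def harmonic_structure_occupancy(f0, fixed_points, mean_occupancy,
                                 n_harmonics=10, tolerance=0.1):
    """Sum the bias-removed occupancy over the harmonics of f0.

    fixed_points: list of (frequency_hz, occupancy_db) pairs.
    mean_occupancy: mean occupancy along the frequency axis (the bias term).
    n_harmonics and tolerance (a fraction of f0) are illustrative assumptions.
    """
    total = 0.0
    for order in range(1, n_harmonics + 1):
        target = order * f0
        near = [(abs(f - target), d) for f, d in fixed_points
                if abs(f - target) <= tolerance * f0]
        # With no fixed point nearby the term equals the mean, so it adds 0.
        d_harmonic = min(near)[1] if near else mean_occupancy
        total += d_harmonic - mean_occupancy
    return total
```

A frame whose fixed points line up with multiples of `f0` accumulates one positive term per harmonic, whereas a frame with no aligned fixed points stays near zero.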
[0011]
DETAILED DESCRIPTION OF THE INVENTION
(Estimation method and apparatus for harmonic structure section)
FIG. 1 shows a configuration example of a harmonic structure section estimation apparatus of the present invention.
The configuration of the harmonic structure section estimation device and the harmonic structure section estimation method will be described with reference to FIG.
The harmonic structure section estimation device comprises a fundamental frequency estimation unit; an instantaneous frequency estimation unit consisting of window extraction means and instantaneous frequency estimation means; an occupancy calculation unit consisting of signal power estimation means and occupancy calculation means; a fixed point estimation unit; a harmonic structure occupancy estimation unit; and a threshold processing unit.
In this embodiment, first, the fundamental frequency estimation unit estimates the fundamental frequency from the input acoustic signal without distinguishing between voiced / unvoiced sections. The method shown in Japanese Patent Application No. 2002-62513 can be used for estimating the fundamental frequency. In addition, various existing fundamental frequency estimation methods can be used.
[0012]
Next, voiced sections are determined using this fundamental frequency estimate $f_0$.
First, the window extraction means extracts windowed segments so that the input acoustic signal can be analyzed in short-time units. To estimate the instantaneous frequency accurately in the later stages, it is known to be effective to vary the window length according to the estimated fundamental frequency $f_0$ of the input acoustic signal (for details, see reference [1]: Atake et al., "Pitch estimation in noisy environments based on instantaneous frequency," IEICE Transactions, Vol. J79-D-II, No. 11, pp. 1771-1781, 1996). For example, if the estimated fundamental frequency at a given time is $f_0$ [Hz], a Hanning window about $3.5/f_0$ [sec] long may be used. Various other existing window functions, such as the Hamming window or the Blackman window, can be used instead of the Hanning window. A fixed-length time window (for example, a 42 [msec] Hanning window) can also be used in place of the variable-length window, and the subsequent analysis proceeds in the same way, although the accuracy of the instantaneous frequency estimate decreases.
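As a minimal sketch of the variable-length window, the following builds a Hanning window spanning about 3.5 fundamental periods. The function name and the `sample_rate_hz` parameter are assumptions introduced for illustration; the text specifies only the approximate duration.

```python
import math


def hann_window_for_f0(f0_hz, sample_rate_hz, periods=3.5):
    """Hanning window whose duration is about `periods` / f0 seconds."""
    length = max(2, round(periods * sample_rate_hz / f0_hz))
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]
```

For example, at a 16 kHz sampling rate an estimate of 200 Hz gives a 280-sample window (17.5 ms), and an estimate of 100 Hz gives a window twice as long.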
[0013]
Next, the quantities needed to compute the occupancy are obtained: the power spectrum $S(\omega)^2$ of the signal and the instantaneous frequency $\phi'(\omega_c)$.
In the signal power estimation means, the power spectrum is obtained as follows. Applying a short-time Fourier transform to each extracted short-time segment converts it to the frequency domain and yields a complex signal that represents the characteristics of each frequency band (whose center frequency is hereinafter written $\omega_c$). Computing the squared magnitude of this complex signal gives the signal power $S(\omega_c)^2$ at each frequency $\omega_c$. Other transforms, such as the wavelet transform or the cosine transform, may be used for the conversion to the frequency domain.
In the instantaneous frequency estimation means, the instantaneous frequency is obtained by differentiating the phase of the frequency-domain complex signal with respect to time at each frequency. For example, the variable-length window extraction can be applied at two positions one sample apart, giving two waveforms whose frequency-domain complex signals have phases $\phi(t_1)$ and $\phi(t_2)$; dividing the phase difference by the one-sample time difference $\Delta t = t_2 - t_1$ approximates the derivative.
$$\phi'(\omega_c) \approx \frac{\phi(t_2) - \phi(t_1)}{\Delta t} \qquad (4)$$
Alternatively, methods that compute the instantaneous frequency directly from the digital waveform extracted with a single time window are also known, such as the one described in reference [2] (Abe et al., "Fundamental frequency estimation method using instantaneous frequencies of harmonic components," IEICE Transactions, Vol. J83-D-II, No. 11, pp. 2077-2086, 2000).
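The one-sample phase-difference approximation can be sketched as follows. The helper names, the direct single-bin DFT, and the wrapping of the phase difference into one period are assumptions of this sketch; a practical implementation would use an FFT over all bins.

```python
import cmath
import math


def stft_bin(signal, start, window, k):
    """One DFT bin of the windowed segment that begins at sample `start`."""
    n_fft = len(window)
    return sum(window[m] * signal[start + m]
               * cmath.exp(-2j * math.pi * k * m / n_fft)
               for m in range(n_fft))


def instantaneous_frequency_hz(signal, start, window, k, sample_rate_hz):
    """Approximate the time derivative of the phase of bin k from two
    frames extracted one sample apart (dt = 1 / sample_rate_hz)."""
    phi1 = cmath.phase(stft_bin(signal, start, window, k))
    phi2 = cmath.phase(stft_bin(signal, start + 1, window, k))
    # Wrap the phase difference into [-pi, pi) before dividing by dt.
    dphi = (phi2 - phi1 + math.pi) % (2.0 * math.pi) - math.pi
    return dphi * sample_rate_hz / (2.0 * math.pi)
```

For a bin dominated by a single sinusoid, the returned value is close to the frequency of that sinusoid rather than to the bin's own center frequency, which is the staircase behavior the occupancy relies on.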
[0014]
The occupancy calculation means estimates the occupancy from equations (1) and (2), using the signal power $S(\omega_c)^2$ and the instantaneous frequency $\phi'(\omega_c)$ obtained as above for each center frequency $\omega_c$ of the frequency transform. For each center frequency $\omega_c$, the squared error $(\phi'(\omega) - \omega_c)^2$ between $\omega_c$ and the instantaneous frequency $\phi'(\omega)$ at each nearby frequency is computed and multiplied by the signal power $S(\omega)^2$; these products are summed over the whole neighborhood, and the result is divided by the sum of the signal power $S(\omega)^2$ alone over the same neighborhood. The occupancy for $\omega_c$ then follows from equation (1). The width $\Delta\omega$ of the neighborhood over which the sums are taken can be chosen more appropriately by using the fundamental frequency $f_0$ [Hz]; for example, a range obtained by multiplying $f_0$ [Hz] by a value of about 0.9 may be used.
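The procedure in this paragraph can be sketched as a power-weighted mean squared deviation followed by a logarithm. The function signature and the small `floor` that guards against a zero deviation are assumptions; the absolute dB values also depend on the frequency unit used, which the sketch leaves to the caller.

```python
import math


def occupancy_db(center, freqs, inst_freqs, powers, delta, floor=1e-12):
    """Occupancy at `center`, in dB.

    freqs, inst_freqs, powers: per-bin center frequency, instantaneous
    frequency, and power. delta is the neighborhood width, in the same
    unit as freqs (for example about 0.9 * f0).
    """
    weighted = 0.0
    total_power = 0.0
    for f, phi, p in zip(freqs, inst_freqs, powers):
        if center - delta / 2 < f < center + delta / 2:
            weighted += (phi - center) ** 2 * p
            total_power += p
    if total_power == 0.0:
        raise ValueError("no signal power in the neighborhood")
    # The floor is an assumption that avoids log10(1 / 0) for a perfect match.
    b_squared = max(weighted / total_power, floor)
    return 10.0 * math.log10(1.0 / b_squared)
```

Multiplying every entry of `powers` by the same gain leaves the result unchanged, because the gain cancels between numerator and denominator; this is why the measure is insensitive to the input level.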
[0015]
Meanwhile, the fixed point estimation unit obtains the fixed points from each center frequency $\omega_c$ and the instantaneous frequency $\phi'(\omega_c)$ at that frequency. When inequality (6) holds for two adjacent center frequencies $\omega_{c1}, \omega_{c2}$ and their instantaneous frequencies $\phi'(\omega_{c1}), \phi'(\omega_{c2})$, a fixed point exists between $\omega_{c1}$ and $\omega_{c2}$, and the frequency $\phi'$ of the fixed point is given by equation (5).
[Equation (5): equation image not reproduced in this text]
$$\phi'_1 > \omega_{c1} \quad \text{and} \quad \phi'_2 < \omega_{c2} \qquad (6)$$
Here $\phi'_1 = \phi'(\omega_{c1})$ and $\phi'_2 = \phi'(\omega_{c2})$. Because $\phi'$ takes a value between $\phi'(\omega_{c1})$ and $\phi'(\omega_{c2})$, the subsequent digital signal processing treats whichever of $\phi'(\omega_{c1})$ and $\phi'(\omega_{c2})$ is closer to $\phi'$ as the fixed point.
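The fixed point search can be sketched as a scan over adjacent bins. The inequality test follows the text, but the formula for the fixed point frequency is not legible here, so the linear interpolation below (solving for the frequency at which the interpolated instantaneous frequency equals the frequency itself) is an assumption, as is the function name.

```python
def find_fixed_points(center_freqs, inst_freqs):
    """Return (bin_index, interpolated_frequency) for each fixed point."""
    result = []
    for i in range(len(center_freqs) - 1):
        wc1, wc2 = center_freqs[i], center_freqs[i + 1]
        p1, p2 = inst_freqs[i], inst_freqs[i + 1]
        if p1 > wc1 and p2 < wc2:  # the inequality condition in the text
            # Assumed linear interpolation between the two bins; the
            # denominator is positive whenever the inequality holds.
            fixed = (p1 * wc2 - p2 * wc1) / ((wc2 - wc1) - (p2 - p1))
            # Snap to the bin whose instantaneous frequency is closer.
            nearest = i if abs(p1 - fixed) <= abs(p2 - fixed) else i + 1
            result.append((nearest, fixed))
    return result
```

When the instantaneous frequency simply tracks the bin centers, as under strong noise, the strict inequalities never hold and no fixed point is reported.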
[0016]
Next, the harmonic structure occupancy estimation unit estimates the harmonic structure occupancy according to equation (3). It is also effective to smooth the harmonic structure occupancy at each time in the time direction with a median filter. A median filter extracts, for each time, a fixed number of samples before and after that time and takes the median of the extracted samples as the value at that time. Applying the same operation at every time smooths the sequence. An effective extent is, for example, about 30 [msec] on each side (60 [msec] in total).
Finally, the threshold processing unit estimates the harmonic structure sections by taking the times at which the time-smoothed harmonic structure occupancy exceeds a predetermined threshold as harmonic structure sections.
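The smoothing and thresholding steps can be sketched together. The function names, the truncation of the median window at the ends of the sequence, and the half-open `(start, end)` output format are assumptions of this sketch.

```python
from statistics import median


def median_smooth(values, half_width):
    """Median over each sample and up to half_width neighbors on each side."""
    n = len(values)
    return [median(values[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def harmonic_sections(values, threshold, half_width):
    """Return (start, end) frame-index pairs, end exclusive, in which the
    smoothed harmonic structure occupancy exceeds the threshold."""
    smoothed = median_smooth(values, half_width)
    sections, start = [], None
    for i, v in enumerate(smoothed):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(smoothed)))
    return sections
```

With a 1 ms frame shift, `half_width=30` corresponds to the 61-point filter mentioned earlier. The median removes isolated spikes and fills isolated dips, so short misdetections do not split or create sections.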
[0017]
(Threshold estimation method for harmonic structure sections)
FIG. 2 shows a configuration example of the threshold value determination device for the harmonic structure section.
A threshold value determination method and apparatus for estimating a harmonic structure section will be described using a database composed of many acoustic signals.
First, for each acoustic signal in the database, the harmonic structure occupancy is computed for each time window by the method described in the preceding section. That is, the harmonic structure occupancy is obtained using the units shown in FIG. 1: the fundamental frequency estimation unit; the instantaneous frequency estimation unit consisting of window extraction means and instantaneous frequency estimation means; the occupancy calculation unit consisting of signal power estimation means and occupancy calculation means; the fixed point estimation unit; and the harmonic structure occupancy estimation unit.
Next, the histogram calculation unit divides the occupancy range of −10 to 180 [dB] into a number of fine intervals, determines which interval each computed harmonic structure occupancy falls in, and counts the number of harmonic structure occupancy values in each interval. Plotting these counts against the harmonic structure occupancy value on the horizontal axis produces a histogram with two peaks, one on the left and one on the right (see FIG. 3).
The distribution boundary extraction unit takes the low point at the boundary between the two peaks (found, for example, by detecting the minimum) as the threshold for harmonic structure section estimation.
Since the value of the harmonic structure occupancy does not depend much on the state of the input acoustic signal, the threshold value obtained in this way can also be used for acoustic signals recorded in different environments.
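The histogram-based threshold determination can be sketched as follows. The text says only that the minimum between the two peaks is taken; the way the two peaks are located here (the tallest bin, then the bin that maximizes count times squared distance from it) is a substituted heuristic, and the bin count of 38 (5 dB wide intervals) is likewise an assumption.

```python
def histogram_counts(values, low=-10.0, high=180.0, n_bins=38):
    """Count values per equal-width interval on [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            counts[min(int((v - low) / width), n_bins - 1)] += 1
    return counts, width


def valley_threshold(values, low=-10.0, high=180.0, n_bins=38):
    """Center of the least populated interval between the two peaks."""
    counts, width = histogram_counts(values, low, high, n_bins)
    # Assumed peak heuristic: tallest bin, then the bin that is both tall
    # and far from it.
    first = max(range(n_bins), key=lambda i: counts[i])
    second = max(range(n_bins), key=lambda i: counts[i] * (i - first) ** 2)
    left, right = sorted((first, second))
    valley = min(range(left, right + 1), key=lambda i: counts[i])
    return low + (valley + 0.5) * width
```

The returned value can then serve as the fixed threshold in the section estimation step; real data may need a smoothed histogram before the valley is located, which this sketch does not attempt.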
[0018]
The harmonic structure section estimation apparatus and the harmonic structure section threshold value determination apparatus according to the present invention include a computer having a CPU and a memory, a terminal used by a user, and a recording medium. The recording medium is a machine-readable recording medium such as a CD-ROM, a magnetic disk device, or a semiconductor memory. The harmonic structure section estimation program and the harmonic structure section threshold determination program recorded therein are read by a computer. The operation of the computer is controlled, and the above-described components are realized on the computer.
[0019]
[Effects of the Invention]
FIG. 3(a) shows a histogram of the harmonic structure occupancy extracted from 1749 items selected at random from infant speech data, and FIG. 3(b) shows the corresponding histogram for noise-free adult speech (28 speakers × 30 utterances) and female speech. In each figure, one distribution peak appears on either side of about 34 [dB] on the horizontal axis. The left distribution corresponds to unvoiced sections and the right to voiced sections. Distributions with similar properties are obtained for adults and infants alike, and the boundary between the voiced and unvoiced distributions falls at nearly the same value. The harmonic structure occupancy is therefore an effective measure for V/UV determination, and the threshold for judging speech as voiced or unvoiced can be set near 34 [dB].
Finally, FIG. 4 shows an example of analyzing infant speech with the configuration example of the invention. As the solid lines in the figure show, the fundamental frequency $f_0$ and the voiced/unvoiced sections (V/UV) are estimated almost correctly.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of a harmonic structure section estimation device according to the present invention.
FIG. 2 is a diagram illustrating a configuration example of a threshold value determination device for a harmonic structure section according to the present invention.
FIG. 3 shows (a) a histogram of the harmonic structure occupancy of infant speech data under noise and (b) a histogram of the harmonic structure occupancy of noise-free adult male and female speech data.
FIG. 4 is a diagram showing a spectrogram of infant speech together with the fundamental frequency estimate and the voiced/unvoiced section determination result.
FIG. 5 is a diagram showing a configuration of a conventional harmonic structure section estimation device.
FIG. 6 shows (a) the instantaneous frequency, log power spectrum, and occupancy of a voiced sound without noise and (b) the instantaneous frequency, log power spectrum, and occupancy of a voiced sound in white noise.
FIG. 7 shows (a) the occupancy, log power spectrum, and fixed points of a voiced section of infant speech and (b) the occupancy, log power spectrum, and fixed points of an unvoiced section of infant speech.

Claims (8)

A harmonic structure section estimation method comprising:
a step in which fundamental frequency estimation means estimates the fundamental frequency at each time for a continuously input acoustic signal;
a step in which instantaneous frequency estimation means estimates instantaneous frequencies in the acoustic signal;
a step in which fixed point estimation means estimates fixed points based on the instantaneous frequencies;
a step in which occupancy calculation means calculates an occupancy;
a step in which harmonic structure occupancy estimation means calculates a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency; and
a step in which threshold processing means estimates harmonic structure sections by treating, for the time-smoothed harmonic structure occupancy, each time at which its value is larger than a predetermined threshold as a harmonic structure section.
A threshold determination method for harmonic structure sections, comprising:
a step in which fundamental frequency estimation means estimates the fundamental frequency at each time for each acoustic signal contained in a database, the database being one in which many acoustic signals are stored in advance;
a step in which instantaneous frequency estimation means estimates instantaneous frequencies in the acoustic signal;
a step in which fixed point estimation means estimates fixed points based on the instantaneous frequencies;
a step in which occupancy calculation means calculates an occupancy;
a step in which harmonic structure occupancy estimation means calculates a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency;
a step in which histogram calculation means divides the range of occupancy values into several bins, determines which bin each obtained harmonic structure occupancy falls into, counts the number of harmonic structure occupancy values contained in each bin, and plots the counts against the harmonic structure occupancy values, thereby generating a histogram having two distribution peaks; and
a step in which distribution boundary extraction means extracts, as the threshold, the value at the local minimum between the two distribution peaks.
A harmonic structure section estimation device comprising:
a fundamental frequency estimation unit that estimates the fundamental frequency at each time for a continuously input acoustic signal;
an instantaneous frequency estimation unit that estimates instantaneous frequencies in the acoustic signal;
a fixed point estimation unit that estimates fixed points based on the instantaneous frequencies;
an occupancy calculation unit that calculates an occupancy;
a harmonic structure occupancy estimation unit that calculates a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency; and
a threshold processing unit that estimates harmonic structure sections by treating, for the time-smoothed harmonic structure occupancy, each time at which its value is larger than a predetermined threshold as a harmonic structure section.
A threshold determination device for harmonic structure sections, comprising:
a database in which many acoustic signals are stored in advance;
a fundamental frequency estimation unit that estimates the fundamental frequency at each time for the acoustic signals;
an instantaneous frequency estimation unit that estimates instantaneous frequencies in the acoustic signal;
a fixed point estimation unit that estimates fixed points based on the instantaneous frequencies;
an occupancy calculation unit that calculates an occupancy;
a harmonic structure occupancy estimation unit that calculates a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency;
a histogram calculation unit that divides the range of occupancy values into several bins, determines which bin each obtained harmonic structure occupancy falls into, counts the number of harmonic structure occupancy values contained in each bin, and plots the counts against the harmonic structure occupancy values, thereby generating a histogram having two distribution peaks; and
a distribution boundary extraction unit that extracts, as the threshold, the value at the local minimum between the two distribution peaks.
A harmonic structure section estimation program for causing a computer to execute:
a process of estimating the fundamental frequency at each time for a continuously input acoustic signal;
a process of estimating instantaneous frequencies in the acoustic signal;
a process of estimating fixed points based on the instantaneous frequencies;
a process of calculating an occupancy;
a process of calculating a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency; and
a process of estimating harmonic structure sections by treating, for the time-smoothed harmonic structure occupancy, each time at which its value is larger than a predetermined threshold as a harmonic structure section.
A threshold determination program for harmonic structure sections, for causing a computer to execute:
a process of estimating the fundamental frequency at each time for each acoustic signal contained in a database, the database being one in which many acoustic signals are stored in advance;
a process of estimating instantaneous frequencies in the acoustic signal;
a process of estimating fixed points based on the instantaneous frequencies;
a process of calculating an occupancy;
a process of calculating a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency;
a process of dividing the range of occupancy values into several bins, determining which bin each obtained harmonic structure occupancy falls into, counting the number of harmonic structure occupancy values contained in each bin, and plotting the counts against the harmonic structure occupancy values, thereby generating a histogram having two distribution peaks; and
a process of extracting, as the threshold, the value at the local minimum between the two distribution peaks.
A recording medium on which is recorded a harmonic structure section estimation program for causing a computer to execute:
a process of estimating the fundamental frequency at each time for a continuously input acoustic signal;
a process of estimating instantaneous frequencies in the acoustic signal;
a process of estimating fixed points based on the instantaneous frequencies;
a process of calculating an occupancy;
a process of calculating a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency; and
a process of estimating harmonic structure sections by treating, for the time-smoothed harmonic structure occupancy, each time at which its value is larger than a predetermined threshold as a harmonic structure section.
A recording medium on which is recorded a threshold determination program for harmonic structure sections, for causing a computer to execute:
a process of estimating the fundamental frequency at each time for each acoustic signal contained in a database, the database being one in which many acoustic signals are stored in advance;
a process of estimating instantaneous frequencies in the acoustic signal;
a process of estimating fixed points based on the instantaneous frequencies;
a process of calculating an occupancy;
a process of calculating a harmonic structure occupancy as the sum of the occupancy at fixed points in the vicinity of integer multiples of the fundamental frequency;
a process of dividing the range of occupancy values into several bins, determining which bin each obtained harmonic structure occupancy falls into, counting the number of harmonic structure occupancy values contained in each bin, and plotting the counts against the harmonic structure occupancy values, thereby generating a histogram having two distribution peaks; and
a process of extracting, as the threshold, the value at the local minimum between the two distribution peaks.
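
The harmonic structure occupancy recited in the claims, a sum of occupancy over fixed points near integer multiples of the fundamental frequency, can be sketched for a single analysis frame as follows. This is an illustrative sketch only: the claims do not define how wide the "vicinity" is or which units the occupancy is summed in, so the `tolerance` parameter (a fraction of f0), the optional `max_harmonic` limit, and the function name are assumptions.

```python
def harmonic_structure_occupancy(fixed_points, f0, tolerance=0.1, max_harmonic=None):
    """Sum the occupancy of fixed points lying near integer multiples of f0.

    fixed_points: iterable of (frequency_hz, occupancy) pairs for one frame.
    tolerance: allowed deviation from k * f0 as a fraction of f0 (assumed).
    max_harmonic: optional upper limit on the harmonic number k (assumed).
    """
    if f0 <= 0:
        return 0.0
    total = 0.0
    for freq, occ in fixed_points:
        k = round(freq / f0)               # nearest harmonic number
        if k < 1 or (max_harmonic is not None and k > max_harmonic):
            continue
        if abs(freq - k * f0) <= tolerance * f0:
            total += occ
    return total
```

Computing this value for every frame yields the sequence that is then smoothed over time and compared with the threshold.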
JP2002274525A 2002-09-20 2002-09-20 Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, harmonic structure section estimation threshold determination program and program Recording media Expired - Lifetime JP3892379B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2002274525A JP3892379B2 (en) 2002-09-20 2002-09-20 Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, harmonic structure section estimation threshold determination program and program Recording media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2002274525A JP3892379B2 (en) 2002-09-20 2002-09-20 Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, harmonic structure section estimation threshold determination program and program Recording media

Publications (2)

Publication Number Publication Date
JP2004109742A (en) 2004-04-08
JP3892379B2 (en) 2007-03-14

Family

ID=32270970

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2002274525A Expired - Lifetime JP3892379B2 (en) 2002-09-20 2002-09-20 Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, harmonic structure section estimation threshold determination program and program Recording media

Country Status (1)

Country Link
JP (1) JP3892379B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102009058415B4 (en) 2009-12-16 2012-12-06 Siemens Medical Instruments Pte. Ltd. Method for frequency transposition in a hearing aid device and hearing aid device
JP5696828B2 (en) * 2010-01-12 2015-04-08 ヤマハ株式会社 Signal processing device
JP6152690B2 (en) * 2013-05-02 2017-06-28 ヤマハ株式会社 Acoustic analyzer

Also Published As

Publication number Publication date
JP2004109742A (en) 2004-04-08

Similar Documents

Publication Publication Date Title
Gonzalez et al. PEFAC-A pitch estimation algorithm robust to high levels of noise
US7035742B2 (en) Apparatus and method for characterizing an information signal
US8440901B2 (en) Musical score position estimating apparatus, musical score position estimating method, and musical score position estimating program
Dhananjaya et al. Voiced/nonvoiced detection based on robustness of voiced epochs
US9093056B2 (en) Audio separation system and method
US7660718B2 (en) Pitch detection of speech signals
US7567900B2 (en) Harmonic structure based acoustic speech interval detection method and device
KR101266894B1 (en) Apparatus and method for processing an audio signal for speech enhancement using a feature extraction
US8193436B2 (en) Segmenting a humming signal into musical notes
EP2988297A1 (en) Complexity scalable perceptual tempo estimation
Sukhostat et al. A comparative analysis of pitch detection methods under the influence of different noise conditions
CN103854662A (en) Self-adaptation voice detection method based on multi-domain joint estimation
CN111128213A (en) Noise suppression method and system for processing in different frequency bands
Rossignol et al. Feature extraction and temporal segmentation of acoustic signals
CN107210029B (en) Method and apparatus for processing a series of signals for polyphonic note recognition
Magre et al. A comparative study on feature extraction techniques in speech recognition
US5809453A (en) Methods and apparatus for detecting harmonic structure in a waveform
Katmeoka et al. Separation of harmonic structures based on tied Gaussian mixture model and information criterion for concurrent sounds
CN104036785A (en) Speech signal processing method, speech signal processing device and speech signal analyzing system
Markel Application of a digital inverse filter for automatic formant and F o analysis
KR100744288B1 (en) Method of segmenting phoneme in a vocal signal and the system thereof
JP5924968B2 (en) Score position estimation apparatus and score position estimation method
JP3892379B2 (en) Harmonic structure section estimation method and apparatus, harmonic structure section estimation program and recording medium recording the program, harmonic structure section estimation threshold determination method and apparatus, harmonic structure section estimation threshold determination program and program Recording media
Zhao et al. A processing method for pitch smoothing based on autocorrelation and cepstral F0 detection approaches
Elton et al. A novel voice activity detection algorithm using modified global thresholding

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20040727

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20060714

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20060725

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060920

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20060920

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20061017

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20061101

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20061128

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20061206

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 3892379

Country of ref document: JP

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101215

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111215

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121215

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131215

Year of fee payment: 7

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term