JP2004341339A

JP2004341339A - Noise restriction device

Info

Publication number: JP2004341339A
Application number: JP2003139248A
Authority: JP
Inventor: Satoshi Furuta (古田 訓)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-05-16
Filing date: 2003-05-16
Publication date: 2004-12-02

Abstract

PROBLEM TO BE SOLVED: To provide a noise restriction device that performs noise restriction favorable in audibility and suffers little quality deterioration even in a very noisy environment.

SOLUTION: The device is equipped with a band division part 4 which divides the amplitude spectrum of an input speech signal into a plurality of frequency bands and outputs a mean amplitude spectrum for each band; a band-classified speech/noise decision part 6 which analyzes the spectrum shape in each frequency band, discriminates among speech, noise, and speech-like noise similar to speech, and outputs an estimated noise spectrum update flag; a noise spectrum estimation part 7 which determines, band by band, whether to update the estimated noise spectrum according to the estimated noise spectrum update flag; and a noise restriction part 8 which selects the optimum noise restriction method for each band according to the estimated noise spectrum update flag, performs noise restriction of the amplitude spectrum, and outputs a noise-restricted spectrum.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は、雑音抑圧装置に関するものである。
【０００２】
【従来の技術】
携帯電話やＴＶ会議システム等の音声通信システムや音声認識システムは、種々の雑音を含む環境下で用いられる。目的信号である音声信号以外の雑音信号を抑制することにより、目的信号が強調され、音質の改善や、音声認識率の向上を図ることができる。
【０００３】
雑音が混入した入力信号から雑音信号を抑圧するための様々な技術が公表されている。
例えば、特許文献１に開示された従来の雑音抑圧装置は、非特許文献１に示されたスペクトルサブトラクション（ＳｐｅｃｔｒａｌＳｕｂｔｒａｃｔｉｏｎ：以下、ＳＳ法という。）により雑音の抑圧を行うものである。ＳＳ法では、振幅スペクトルから、別途推定した平均的な雑音スペクトルを減算することにより雑音の抑圧を行う。
【０００４】
また、特許文献２に開示された背景雑音除去装置では、入力信号を周波数成分に変換すると共に入力信号の音声・雑音区間判定を行う。現フレームの入力信号が雑音と判定された場合には、現フレームにおいて推定した背景雑音と、過去のフレームにおいて推定された背景雑音の平均を取って推定背景雑音を更新する。一方、現フレームが音声区間と判断された場合には、周波数成分から推定背景雑音を減算して雑音抑圧信号を求める。この減算処理で得られた雑音抑圧信号の周波数成分を信号とし、推定された背景雑音を雑音として、全周波数帯域での信号対雑音比（ＳＮ比）と全周波数帯域を複数に分割した小領域毎のＳＮ比を計算する。小帯域別のＳＮ比と全帯域のＳＮ比の差が所定値以下の小領域については、雑音抑圧信号と推定背景雑音成分とを所定の割合で含む再更新背景雑音を生成し、雑音抑圧信号から再更新背景雑音をさらに減算して再雑音抑圧信号を求め、この信号を時間領域で表現される信号に戻して雑音抑圧信号を得る。
【０００５】
また、非特許文献２に開示された従来の雑音抑圧方法は、特許文献１と同様にＳＳ法を基本としている。入力信号の周波数変換を行うと共に、現フレームの有音・雑音判定を行い、現フレームが有音区間である場合には入力信号スペクトルの包絡線と推定雑音スペクトルの包絡線の交点を求め、その交点をカットオフ周波数とした高域通過形フィルタ（ＨｉｇｈＰａｓｓＦｉｌｔｅｒ：以下、ＨＰＦと記す。）と低域通過形フィルタ（ＬｏｗＰａｓｓＦｉｌｔｅｒ：以下、ＬＰＦと記す。）を用いて入力信号を高域成分と低域成分に分離する。そして、低域成分では通常のＦＦＴ（ＦａｓｔＦｏｕｒｉｅｒＴｒａｎｓｆｏｒｍ：高速フーリエ変換）を用いたＳＳ法による雑音抑圧方式を選択し、高域成分ではＭＷＳＥ（Ｍｕｌｔｉ−ＷｉｎｄｏｗＳｐｅｃｔｒａｌＥｓｔｉｍａｔｉｏｎ）法を用いたＳＳ法による雑音抑圧方式を選択する。このように、高域と低域で特性の異なる雑音抑圧方法をとることにより、良好な雑音抑圧を行うことを可能にしている。
【０００６】
【特許文献１】
特開２０００−３４７６８８号公報
【特許文献２】
特開平１０−１７１４９７号公報
【非特許文献１】
Ｓ．Ｆ．Ｂｏｌｌ，”ＳｕｐｐｒｅｓｓｉｏｎｏｆＡｃｏｕｓｔｉｃｎｏｉｓｅｉｎｓｐｅｅｃｈｕｓｉｎｇｓｐｅｃｔｒａｌｓｕｂｔｒａｃｔｉｏｎ”，ＩＥＥＥＴｒａｎｓ．ＡＳＳＰ，Ａｐｒｉｌ１９７９，Ｖｏｌ．ＡＳＳＰ−２７，Ｎｏ．２
【非特許文献２】
Ｃ．ＨｅａｎｄＧ．Ｚｗｅｉｇ，”ＡｄａｐｔｉｖｅＴｗｏ−ｂａｎｄＳｐｅｃｔｒａｌＳｕｂｔｒａｃｔｉｏｎｗｉｔｈＭｕｌｔｉ−ＷｉｎｄｏｗＳｐｅｃｔｒａｌＥｓｔｉｍａｔｉｏｎ”，ＩＥＥＥＣｏｎｆｅｒｅｎｃｅｏｆＡｃｏｕｓｔｉｃＳｐｅｅｃｈＰｒｏｃｅｓｓｉｎｇ，１９９９，ｐｐ．７９３−７９６
【０００７】
【発明が解決しようとする課題】
雑音の中には、例えば多人数の人声が混じった雑音のように、スペクトル形状が音声スペクトルに似た雑音がある。このような雑音を音声的雑音（Ｓｐｅｅｃｈ−ｌｉｋｅｎｏｉｓｅ）という。
【０００８】
特許文献２に開示された従来の背景雑音除去装置は、全帯域ＳＮ比と各帯域ＳＮ比との差が所定の閾値以下の帯域に対し、雑音スペクトルの再減算処理を行うので大きな抑圧量が得られる利点がある。しかし、再減算処理を行うかどうかは、単に全帯域ＳＮ比と各帯域ＳＮ比との差の値によって判断しており、その帯域のスペクトルが音声スペクトル的なものか、または雑音スペクトル的なものかどうかは判定していない。そのため、音声的雑音が入力信号に混入している場合には以下のような問題が生じる。
【０００９】
まず、音声的雑音を雑音として扱う場合、次のような問題がある。ＳＳ法に基づいて雑音抑圧を行う場合、推定背景雑音スペクトルは周波数軸方向の変動が少ない方が望ましい。しかし、音声的雑音は周波数軸方向の変動が大きいため、音声的雑音が雑音として推定雑音スペクトルに混入すると、推定雑音スペクトルの精度が劣化するという問題がある。
【００１０】
一方、音声的雑音を誤って「音声」と判定した場合には、音声的雑音は音声として雑音抑圧されることとなる。しかし、音声的雑音は信号パワーは小さいがスペクトル形状が音声スペクトル的であることから、スペクトル減算処理を行うことにより、スペクトル振幅が比較的大きなスペクトル成分だけが孤立して残る。特許文献２の装置のように、更に再減算処理を行うことにより、不要なスペクトル成分がさらに強調されてしまい、残留雑音に含まれる耳障りな人工的雑音（ミュージカルノイズ）が増大してしまう。
【００１１】
また、特許文献２の装置を、臨場感が求められるＴＶ会議システムのように、７ｋＨｚを上限とした広帯域音声通信システムに適用する場合を考える。４ｋＨｚ以上の高域の音声スペクトル成分のＳＮ比とパワーはかなり小さくなるため、音声・雑音判定において４ｋＨｚ以上の音声を雑音に誤る場合がある。誤って雑音と判定されると、高域の音声が大きくスペクトル減算されるので、高域においてはスペクトル振幅が比較的大きなスペクトル成分だけが残ることになる。これによりミュージカルノイズが発生して音質が劣化する。
【００１２】
また、非特許文献２に開示された従来の雑音抑圧装置は、入力信号スペクトルの包絡線と雑音スペクトルの包絡線との交点から定めたＨＰＦとＬＰＦを用いて、入力信号を低域と高域の２帯域に分離し、各帯域に応じた雑音抑圧方式を選択する構成なので、各帯域に応じた良好な雑音抑圧を行うことができる。しかし、例えば、３帯域以上の有音帯域及び雑音帯域が存在するような場合の雑音抑圧には適さない。
【００１３】
この発明は上記のような課題を解決するためになされたもので、聴感上好ましい雑音抑圧が可能で、高雑音下でも品質劣化の少ない雑音抑圧装置を得ることを目的とする。
【００１４】
【課題を解決するための手段】
この発明に係る雑音抑圧装置は、時間領域で表される入力音声信号を周波数領域の表現に変換し、周波数成分から振幅スペクトルと位相スペクトルを生成する時間・周波数変換部と、振幅スペクトルを複数の周波数帯域に分割し、帯域毎の平均振幅スペクトルを出力する帯域分割部と、周波数帯域毎に平均振幅スペクトルのスペクトル形状を解析して音声、雑音、及び音声に類似した音声的雑音の区別を行い、スペクトル形状判定結果を出力する帯域別音声・雑音判定部と、スペクトル形状判定結果に基づいて、周波数帯域毎の推定雑音スペクトルの更新を行なうかどうかを決定する雑音スペクトル推定部と、スペクトル形状判定結果に基づいて、周波数帯域毎に最適な雑音抑圧方法を選択し、選択した方法に従って各周波数帯域の振幅スペクトルから周波数帯域毎の推定雑音スペクトルを抑圧することにより得られる雑音抑圧スペクトルを出力する雑音抑圧部と、雑音抑圧スペクトルを時間領域で表される信号に変換することにより雑音抑圧信号を生成する周波数・時間変換部とを備えたものである。
【００１５】
【発明の実施の形態】
以下、この発明の実施の様々な形態を説明する。
実施の形態１．
図１は、この発明の実施の形態１による雑音抑圧装置１００の構成を示すブロック図である。
図に示すように、雑音抑圧装置１００は、入力端子１、時間・周波数変換部２、雑音らしさ分析部３、帯域分割部４、帯域ＳＮ比計算部５、帯域別音声・雑音判定部６、雑音スペクトル推定部７、雑音抑圧部８、周波数・時間変換部１２、出力端子１３を備えている。
また、雑音抑圧部８は、雑音抑圧制御部９、スペクトル減算部１０、スペクトル振幅抑圧部１１を備えている。
【００１６】
雑音抑圧装置１００による雑音抑圧処理について説明する。
雑音が混入した入力信号ｓ［ｔ］が入力端子１に入力されると、入力信号ｓ［ｔ］は所定のサンプリング周波数でサンプリングされ、所定の周期でフレーム分割されて時間・周波数変換部２へ入力される。なお、ここではサンプリング周波数を８ｋＨｚ、フレーム周期を２０ｍｓとする。
【００１７】
時間・周波数変換部２は、例えば２５６点の高速フーリエ変換（ＦａｓｔＦｏｕｒｉｅｒＴｒａｎｓｆｏｒｍ：以下、ＦＦＴと記す。）を用いてフレーム分割された入力信号ｓ［ｔ］を周波数解析し、振幅スペクトルＳ［ｆ］と位相スペクトルＰ［ｆ］とを生成して出力する。なおＦＦＴは周知の手法であるので説明は省略する。
【００１８】
雑音らしさ分析部３は、入力端子１から出力されたフレーム分割された入力信号ｓ［ｔ］と、時間・周波数変換部２から出力された振幅スペクトルＳ［ｆ］の入力を受ける。雑音らしさ分析部３は、入力された現フレームの入力信号ｓ［ｔ］を解析し、現フレームが音声区間であるか雑音区間であるかの状態を示す指標である雑音らしさ信号Ｎｓｔｔを帯域別音声・雑音判定部６へ出力する。また、雑音スペクトル推定部７に雑音らしさ信号Ｎｓｔｔに対応した雑音スペクトル更新係数ｒを出力する。雑音スペクトル更新係数ｒは、後述する推定雑音スペクトルＮ［ｆ_Ｂ］の算出に用いられる。
【００１９】
図２に、雑音らしさ信号Ｎｓｔｔおよび雑音スペクトル更新係数ｒと、現フレームの様態との関係を示す。図に示すように、雑音らしさ信号Ｎｓｔｔは、レベル値１〜５で出力される。Ｎｓｔｔが４〜５の範囲にあるとき、現フレームは雑音区間であることを表し、Ｎｓｔｔが１〜３の範囲にあるとき現フレームは音声区間であるとする。雑音らしさ信号Ｎｓｔｔの算出方法としては、例えば特許文献１に開示されている雑音らしさ分析処理と同様に行うことができるので、ここでは簡単に説明する。
雑音らしさ分析部３は、ローパスフィルタにより入力信号ｓ［ｔ］から高域雑音の影響を取り除き、ローパスフィルタ信号を得る。次に、ローパスフィルタ信号の線形予測分析を行う。次に、得られた線形予測係数を用いて、ローパスフィルタ信号の逆フィルタ処理を行う。逆フィルタ処理の結果得られたローパス残差信号の自己相関係数の正のピーク値と、ローパス残差信号のパワーおよびフレームパワーに基づいてＮｓｔｔを算出する。
【００２０】
帯域分割部４は、時間・周波数変換部２から出力された振幅スペクトルＳ［ｆ］の入力を受け、振幅スペクトルＳ［ｆ］を例えばバークスペクトル間隔として示される２０の周波数帯域に分割する。帯域分割部４は、分割した帯域毎に振幅スペクトルＳ［ｆ］の平均スペクトルを求め、帯域分割した振幅スペクトルＳｐ［ｆ_Ｂ］として出力する。なお、ｆ_Ｂはバークスペクトルにおける帯域番号を表す。
なお、バークスペクトルについては、ＥｂｅｒｈａｒｄＺｗｉｃｋｅｒ著、「心理音響学」、西村書店、１９９２、７４ページの表１に示されている。バークスペクトル間隔は人間の聴感特性に対応した周波数帯域の分割方法で、低周波数領域では帯域幅が狭く、周波数が高くなるにつれて帯域幅が広くなる特性を持つ。以下、周波数帯域毎の計算処理は、断りが無い限りバークスペクトル帯域ｆ_Ｂにおいて行うものとする。
【００２１】
帯域ＳＮ比計算部５は、帯域分割された振幅スペクトルＳｐ［ｆ_Ｂ］と、後述する推定雑音スペクトルＮ［ｆ_Ｂ］の入力を受け、下記の式（１）に従って帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］を算出する。すなわち、帯域ＳＮ比は、各帯域の信号スペクトルパワーと雑音スペクトルパワーの比として算出される。

【００２２】
すなわち、式（１）において、計算の結果ＳＮＲ［ｆ_Ｂ］の値が負になる場合には、ＳＮＲ［ｆ_Ｂ］＝０とする。
【００２３】
帯域別音声・雑音判定部６は、帯域ＳＮ比計算部５が出力する現フレームの１つ前のフレームの帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］を受け、帯域別の音声・雑音判定を行い、判定結果に応じて帯域別音声・雑音判定フラグｓｖａｄ［ｆ_Ｂ］を算出する。帯域別音声・雑音判定の方法として、例えば式（２）のように帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］と所定の閾値ＴＨ１（第１の閾値）の比較による判定を行うことができる。
ＳＮＲ［ｆ_Ｂ］＞ＴＨ１の時
ｓｖａｄ［ｆ_Ｂ］＝Ｖｏｉｃｅ（音声）
ＳＮＲ［ｆ_Ｂ］≦ＴＨ１の時
ｓｖａｄ［ｆ_Ｂ］＝Ｎｏｉｓｅ（雑音）
ただし、ｆ_Ｂ＝｛１，・・・，２０｝（２）
【００２４】
ここで、ＴＨ１は帯域別音声・雑音判定に用いる閾値であり、閾値ＴＨ１には、多数の音声ＳＮ比のサンプルから得られた好適な値として、例えばＴＨ１＝１．５ｄＢを用いることができる。
【００２５】
さらに、帯域別音声・雑音判定部６は、式（２）によって算出した各帯域の帯域別音声・雑音判定フラグｓｖａｄ［ｆ_Ｂ］、および雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔに基づいて、雑音帯域の連続性の判定処理を行い、判定結果に基づいて帯域毎の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］（スペクトル形状判定結果）を設定する。図３のフローチャートを用いて、帯域別音声・雑音判定部６における、帯域毎の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］の設定処理について説明する。
【００２６】
まず、ステップＳＴ１０１では、帯域別音声・雑音判定部６は雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔを解析する。図２に示したように、Ｎｓｔｔの値が１，２，３である場合には音声区間と判断し、ステップＳＴ１０２へ進む。
【００２７】
一方、Ｎｓｔｔの値が４，５の場合には、全帯域において雑音区間であると判断し、ステップＳＴ１１０へ進む。
ステップＳＴ１１０では、すべての帯域の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］にＮＯＩＳＥを設定し出力する。後述するように、これにより、全帯域の推定雑音スペクトルＮ［ｆ_Ｂ］の更新が行われる。
【００２８】
ステップＳＴ１０２〜ステップＳＴ１０９の処理は、分割された帯域毎に行われる。まず、ステップＳＴ１０２では、帯域別音声・雑音判定フラグｓｖａｄ［ｆ_Ｂ］の値を判定し、ｓｖａｄ［ｆ_Ｂ］の値が雑音（ＮＯＩＳＥ）を示す場合にはステップＳＴ１０３へ進み、音声（ＶＯＩＣＥ）を示す場合にはステップＳＴ１０５へ進む。
【００２９】
ステップＳＴ１０３では、ＮＯＩＳＥと判断された帯域数がインクリメントされる。得られたカウント数をｃｏｕｎｔとする。次に、ステップＳＴ１０４では、処理中の帯域の帯域番号ｆ_Ｂが最大値２０になったと判定された場合にはステップＳＴ１０５へ進む。ｆ_Ｂが１９以下の場合には、ステップＳＴ１０２へ戻る。この繰り返し処理により、雑音帯域が連続している場合に、その連続した帯域数をカウントすることができる。
【００３０】
ステップＳＴ１０５では、ｃｏｕｎｔの値を連続カウント閾値ＴＨｃ（第２の閾値）とを比較する。連続カウント閾値ＴＨｃには、経験上得られる好適な値として例えばＴＨｃ＝３を設定することができる。ｃｏｕｎｔが閾値ＴＨｃよりも大きい場合、すなわち、雑音帯域の連続数が閾値で定められた数よりも多い場合には、その連続した帯域すべてが雑音であると判定し、ステップＳＴ１０６へ進む。一方、ｃｏｕｎｔが閾値ＴＨｃ以下の場合には、雑音と判定せずステップＳＴ１０７へ進む。
【００３１】
ステップＳＴ１０６では、ステップＳＴ１０５で雑音帯域と判定された各帯域の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］に、ＮＯＩＳＥを設定する。これにより、該当する帯域については、後述する推定雑音スペクトルＮ［ｆ_Ｂ］の更新が行われる。
推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］の設定処理をｃｏｕｎｔの回数分繰り返したらステップＳＴ１０８へ進む。
【００３２】
ステップＳＴ１０７では、ステップＳＴ１０５で雑音帯域ではないと判定された各帯域の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］に、ＶＯＩＣＥを設定する。これにより、該当する帯域については、後述する推定雑音スペクトルＮ［ｆ_Ｂ］の更新は行われない。
推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］の設定処理をｃｏｕｎｔの回数分繰り返したらステップＳＴ１０８へ進む。
【００３３】
ステップＳＴ１０８では、ｃｏｕｎｔの値を０にリセットする。次に、ステップＳＴ１０９では、処理中の帯域の帯域番号ｆ_Ｂが最大値２０になったと判定された場合には当処理を終了する。ｆ_Ｂが１９以下の場合には、ステップＳＴ１０２へ戻る。これにより、全帯域について処理が行われる。
【００３４】
ここで、図４および図５を用いて帯域別音声・雑音判定部６による推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］設定処理の結果の具体例を示す。図４は、雑音信号が混入した音声入力信号の音声スペクトルと雑音スペクトルの例である。また、図５は、図４に示すスペクトル分布より得られる帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］と、図３の処理によって得られた推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］の例である。図５において、帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］が判定閾値ＴＨ１を下回る帯域が帯域幅閾値ＴＨｃ＝３以上連続する帯域群については、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］が雑音帯域（ＮＯＩＳＥ）と設定されており、それ以外の帯域については音声帯域（ＶＯＩＣＥ）となっている。なお、図に示すように、音声帯域または雑音帯域として判定された連続した複数の帯域の組を帯域群とする。
【００３５】
次に、帯域別音声・雑音判定部６は、図３の処理で推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］にＮＯＩＳＥが設定された帯域について、更に判定精度を高めるための処理を行う。
すなわち、雑音帯域と判定された帯域群について、さらに、雑音であるか音声的雑音であるかの判定を行う。音声的雑音と判定された帯域については、推定雑音スペクトルＮ［ｆ_Ｂ］の更新が行われないように設定される。これは、入力信号に含まれる雑音成分の平均的なスペクトル形状を保持している推定雑音スペクトルに周波数方向の変動が大きい音声的雑音が混入すると、推定雑音スペクトルの精度が劣化するからである。
なお、図３の処理で推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］にＶＯＩＣＥが設定された帯域群、すなわち音声帯域群については、判定精度を高める処理は行わない。
【００３６】
ここでは、判定精度を高めるための１つの方法として、帯域群別に帯域ＳＮ比の帯域間の分散を求め、その値によって当該帯域群が雑音であるか音声的雑音であるかを判断し、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］を修正する。
図５に示す一連の帯域群の通し番号をｎとし、Ｌ［ｎ］を帯域群番号ｎにおける帯域幅、すなわち、帯域群に含まれる帯域数とする。ＮＯＩＳＥと判断された帯域群番号ｎにおける、帯域ＳＮ比の帯域間の分散ＳＮＲ_ｄｅｖ［ｎ］は、式（３）によって求めることができる。
【数１】

【００３７】
ここで、ｆ_Ｂ（ｎ）は帯域群ｎに属する帯域番号ｆ_Ｂであり、ｆ_Ｂ（ｎ_Ｌ）は帯域群ｎの帯域番号下限値、ｆ_Ｂ（ｎ_Ｈ）は帯域群ｎの帯域番号上限値である。図５で、ｎ＝２の場合を例に説明すると、帯域群２においては、ｆ_Ｂ（２）＝｛１０，１１，１２，１３｝であり、ｆ_Ｂ（２_Ｌ）＝１０、ｆ_Ｂ（２_Ｈ）＝１３、Ｌ［２］＝４である。
【００３８】
帯域別音声・雑音判定部６は、雑音と判定された全ての帯域群について、式（３）によって求められた帯域群ｎの帯域ＳＮ比の分散ＳＮＲ_ｄｅｖ［ｎ］と閾値ＴＨ２（第３の閾値）を比較する。ここで、閾値ＴＨ２は雑音か音声的雑音かを決定するための所定の閾値であり、閾値ＴＨ２には、経験上得られる好適な値として例えばＴＨ２＝１６．０を設定することができる。
帯域ＳＮ比の分散が閾値ＴＨ２よりも小さい場合、その帯域群は周波数方向のスペクトルのばらつき（スペクトルの凹凸）が小さく定常的であることを表しており、帯域別音声・雑音判定部６は、その帯域群を雑音であると判断する。一方、帯域ＳＮ比の分散が閾値ＴＨ２以上である場合には、その帯域群においては周波数方向のスペクトルのばらつきが大きいことを表しており、帯域別音声・雑音判定部６は、その帯域群を音声に似たスペクトル形状の雑音、すなわち音声的雑音であると判断する。
【００３９】
帯域別音声・雑音判定部６は、音声的雑音と判断された帯域群については、帯域群に含まれるすべての帯域の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］を、ＮＯＩＳＥから音声的雑音であることを表すＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥに変更する。
なお、後述する雑音抑圧方式の変更については、音声的雑音であっても、雑音帯域であるものとして取り扱う。
【００４０】
雑音スペクトル推定部７は、雑音らしさ分析部３が出力する雑音スペクトル更新係数ｒと、帯域分割部４が出力する振幅スペクトルＳｐ［ｆ_Ｂ］と、帯域別音声・雑音判定部６が出力する雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］と、過去の平均的な雑音スペクトル形状を示す推定雑音スペクトルＮ_ｏｌｄ［ｆ_Ｂ］とを用いて、式（４）に従い、推定雑音スペクトルＮ［ｆ_Ｂ］の更新を行う。推定雑音スペクトルＮ_ｏｌｄ［ｆ_Ｂ］は、雑音スペクトル推定部７が保有するＲＡＭ等の内部記憶手段に記憶されていてもよいし、雑音スペクトル推定部７がアクセス可能な外部の記憶装置に記憶されていてもよい。なお、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］がＶＯＩＣＥまたはＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥの場合には推定雑音スペクトルＮ［ｆ_Ｂ］の更新は行わない。
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＮＯＩＳＥの時
Ｎ［ｆ_Ｂ］＝ｒ・Ｎ_ｏｌｄ［ｆ_Ｂ］＋（１−ｒ）・Ｓｐ［ｆ_Ｂ］
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＶＯＩＣＥまたは
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥの時
Ｎ［ｆ_Ｂ］＝Ｎ_ｏｌｄ［ｆ_Ｂ］
ただし、ｆ_Ｂ＝｛１，・・・，２０｝（４）
【００４１】
雑音抑圧制御部９は、帯域別音声・雑音判定部６が出力する推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］と、帯域ＳＮ比計算部５が出力する帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］を入力として、後述するスペクトル振幅抑圧とスペクトル減算に用いる各係数である、スペクトル減算量α［ｆ_Ｂ］とスペクトル振幅抑圧量β［ｆ_Ｂ］を計算する。それぞれ計算された係数用い、スペクトル減算部１０にて振幅スペクトルＳ［ｆ］から推定雑音スペクトルＮ［ｆ_Ｂ］を減算した後、スペクトル振幅抑圧部１１で、更にスペクトル振幅抑圧することにより雑音抑圧を行い、雑音抑圧されたスペクトルＳｒ［ｆ］を出力する。
【００４２】
まず、雑音抑圧制御部９における、スペクトル減算とスペクトル振幅抑圧に用いる各係数の算出方法について説明する。
まず、式（５）に従ってスペクトル振幅抑圧量β［ｆ_Ｂ］を求める。なお、式（５）中のＧＡＩＮは帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］の重み係数であり所定の定数である。帯域別音声・雑音判定部６が出力する推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］がＶＯＩＣＥ、すなわち音声帯域の場合には、式（５）に従ってスペクトル振幅抑圧量β［ｆ_Ｂ］を求めるが、β［ｆ_Ｂ］が０（ｄＢ）を越える場合にはβ［ｆ_Ｂ］＝０（ｄＢ）とし、この場合スペクトル振幅抑圧を行わない。
一方、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］がＮＯＩＳＥまたはＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥ、すなわち雑音帯域もしくは音声的雑音帯域の場合には、スペクトル減算処理に伴う残留雑音成分がミュージカルノイズの原因となる。そのため、スペクトル減算を行わずスペクトル振幅抑圧だけを行う必要があることから、式（５）に示すようにスペクトル振幅抑圧量β［ｆ_Ｂ］に最大抑圧量−Ｇｍｉｎ（ｄＢ）を設定する。
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＶＯＩＣＥの時
β［ｆ_Ｂ］＝Ｍｉｎ｛ＳＮＲ［ｆ_Ｂ］・ＧＡＩＮ−Ｇｍｉｎ，０｝
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＮＯＩＳＥまたは
ｕｐｄａｔｅ［ｆ_Ｂ］＝ＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥの時
β［ｆ_Ｂ］＝−Ｇｍｉｎ（５）
【００４３】
式（５）に従ってスペクトル振幅抑圧量β［ｆ_Ｂ］を求めた後、雑音抑圧制御部９は、このβ［ｆ_Ｂ］を用い、式（６）に従ってスペクトル減算量α［ｆ_Ｂ］を求める。雑音抑圧制御部９は、得られたスペクトル減算量α［ｆ_Ｂ］をスペクトル減算部１０へ、スペクトル振幅抑圧量β［ｆ_Ｂ］をスペクトル振幅抑圧部１１へ出力する。
α［ｆ_Ｂ］＝−（Ｇｍｉｎ＋β［ｆ_Ｂ］）（６）
【００４４】
スペクトル減算部１０は、スペクトル減算量α［ｆ_Ｂ］をパーセンテージ値であるスペクトル減算率α_Ｐ［ｆ_Ｂ］に変換する。スペクトル減算部１０は、式（７）に従い、雑音スペクトル推定部７から出力された推定雑音スペクトルＮ［ｆ_Ｂ］にスペクトル減算率α_Ｐ［ｆ_Ｂ］を乗じたスペクトルを時間・周波数変換部２から出力された振幅スペクトルＳ［ｆ］から減算し、雑音引き去りスペクトルＳ_Ｓ［ｆ］を出力する。雑音引き去りスペクトルＳ_Ｓ［ｆ］が負になる場合には、入力信号の振幅スペクトルＳ［ｆ］に与えられた所定の定数ＧＬ_ｍｉｎを振幅スペクトルＳ［ｆ］に乗じたものを雑音引き去りスペクトルＳ_Ｓ［ｆ］とする埋め戻し処理を行う。
なお、本処理においては、各帯域番号ｆ_Ｂに対応した推定雑音スペクトルＮ［ｆ_Ｂ］とスペクトル減算率α_Ｐ［ｆ_Ｂ］を、各帯域番号ｆ_Ｂに対応した振幅スペクトル成分Ｓ［ｆ］に展開して計算を行うものとする。
Ｓ［ｆ］＞α_Ｐ［ｆ_Ｂ］・Ｎ［ｆ_Ｂ］の時
Ｓ_Ｓ［ｆ］＝Ｓ［ｆ］−α_Ｐ［ｆ_Ｂ］・Ｎ［ｆ_Ｂ］
Ｓ［ｆ］≦α_Ｐ［ｆ_Ｂ］・Ｎ［ｆ_Ｂ］の時
Ｓ_Ｓ［ｆ］＝Ｓ［ｆ］・ＧＬ_ｍｉｎ（７）
【００４５】
次に、スペクトル振幅抑圧部１１は、スペクトル振幅抑圧量β［ｆ_Ｂ］をリニア値β_１［ｆ_Ｂ］に変換し、式（８）に従って雑音引き去りスペクトルＳ_Ｓ［ｆ］にβ_１［ｆ_Ｂ］を乗じて、雑音抑圧スペクトルＳｒ［ｆ］を算出する。
Ｓｒ［ｆ］＝β_１［ｆ_Ｂ］・Ｓ_Ｓ［ｆ］（８）
【００４６】
式（５）と式（６）から分かるように、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］がＶＯＩＣＥに設定されている音声帯域では、帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］が大きくなればスペクトル振幅抑圧量β［ｆ_Ｂ］が小さくなり、振幅抑圧が弱まると共に、スペクトル減算量α［ｆ_Ｂ］は大きくなり、スペクトル減算が強くなる。逆に、帯域ＳＮ比ＳＮＲ［ｆ_Ｂ］が小さくなればスペクトル振幅抑圧量β［ｆ_Ｂ］は大きくなり、振幅抑圧が強まると共に、スペクトル減算が弱くなる。これにより、ＳＮ比が高い帯域では主にスペクトル減算で雑音抑圧量を稼ぎ、ＳＮ比が低い帯域では音声スペクトル成分を保持しつつ振幅抑圧を行うことになるので、高い雑音抑圧量と音質を両立することができる。
【００４７】
また、推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］がＮＯＩＳＥもしくはＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥに設定されている雑音帯域または音声的雑音帯域では、スペクトル振幅抑圧量β［ｆ_Ｂ］が最大抑圧量Ｇｍｉｎになっているので、スペクトル減算量α［ｆ_Ｂ］の値は０となり、スペクトル減算処理は行われず、雑音はそのスペクトル形状を保持したまま音量が小さくなるだけでスペクトル変形が発生しないので雑音抑圧処理音声の「自然性」が保たれる。
【００４８】
図６〜図８を用いて、音声区間における雑音抑圧処理の具体例を説明する。図６は入力信号中の音声信号と音声的雑音信号のそれぞれのスペクトル成分を示した図である。図７は、図６の入力信号を従来のように、音声的雑音帯域が誤って音声帯域と判断された場合の雑音抑圧処理後のスペクトルを示した図である。図８は、図６に示す入力信号をこの実施の形態１の雑音抑圧装置１００に入力した場合の雑音抑圧処理後のスペクトルを示す図である。
【００４９】
図７に示す例では、図中、ＳＰＥＥＣＨＬＩＫＥ＿ＮＯＩＳＥで示された音声的雑音帯域部分が音声と誤って判定され、その判定に基づいてスペクトル減算が行われる。このため、図に示すように、音声的雑音帯域においてスペクトル変形が生じ、音声スペクトルの高域成分に、大きな振幅の孤立した残留スペクトル成分が発生している。
一方、図８に示す例では、音声的雑音帯域は音声的雑音と判定され、その判定に基づいて、スペクトル減算は行わず、スペクトルの振幅抑圧のみが行われるので、スペクトル変形は生じず、孤立した残留スペクトルは発生しない。よって、良好な雑音抑圧が実現されている。
【００５０】
また、図９〜図１１を用いて、雑音区間における雑音抑圧処理の具体例を説明する。図９は音声的雑音スペクトルの例を示した図である。図１０は従来のように、音声的雑音帯域が誤って音声帯域と判断された場合の雑音抑圧処理後の音声的雑音スペクトルを示す図である。また、図１１はこの実施の形態１の雑音抑圧装置１００による雑音抑圧処理後の音声的雑音スペクトルを示す図である。
【００５１】
図１０に示す例では、点在する音声的雑音スペクトルが音声と誤って判定され、その判定に基づいてスペクトル減算が行われる。これにより、スペクトル変形が発生し、振幅の大きな孤立スペクトル成分が発生して音声的雑音スペクトルが強調されている。一方、図１１では、点在する音声的雑音スペクトルは音声的雑音として正しく判定され、その判定に基づいてスペクトル減算ではなくスペクトル振幅抑圧が行われるので、スペクトル変形は生じず、全帯域において雑音のスペクトル形状が保持されたまま信号パワーのみが減少する。すなわち、雑音の自然性が保たれたまま音量のみが小さくなり、良好な雑音抑圧を行うことができる。
【００５２】
周波数・時間変換部１２は、雑音抑圧スペクトルＳｒ［ｆ］と時間・周波数変換部２が出力する位相スペクトルＰ［ｆ］を時間信号に変換し、一部、前フレームの雑音抑圧信号と重ね合わせ処理を行い、雑音抑圧信号ｓｒ［ｔ］を出力端子１３より出力する。
【００５３】
以上のように、この実施の形態１によれば、入力信号のスペクトルを周波数帯域で分割し、帯域毎にスペクトルのＳＮ比に基づいて音声・雑音の判定を行なう。さらに、雑音と判定された帯域については、雑音区間の連続の程度を解析し、一定以上雑音帯域が続いた場合にのみ再度雑音と判定する。さらに、雑音帯域については、連続した帯域間でのＳＮ比の分散に基づいて、雑音と音声的雑音の区別を行なうようにした。
これらの区別に基づいて、帯域毎に適正な雑音スペクトルを推定し、また、帯域毎のスペクトル形状に適した雑音抑圧方法を選択して雑音抑圧を行なうようにしたので、帯域毎に最適な雑音抑圧が行なわれ、聴感上好ましい音声を得ることが可能である。
【００５４】
音声的雑音の判定が正しくできると、周波数軸方向の変動が大きな音声的雑音のスペクトル成分が、推定雑音スペクトルに混入することが避けられるので、推定雑音スペクトルの精度劣化を防止することができる。
【００５５】
また、音声的雑音の判定ができることにより、音声的雑音の帯域については雑音抑圧方法にスペクトル減算を用いず、スペクトル振幅抑圧だけを行う。これにより、抑圧後のスペクトルに変形が生じず、スペクトル形状を保持したまま音量だけが小さくなるようにできる。このため、孤立した残留スペクトル成分が発生しないので、雑音抑圧処理後の音声の自然性は保たれ、残留雑音に含まれる耳障りな人工的雑音（ミュージカルノイズ）の増大を防ぐことができる。
【００５６】
なお、実施の形態１においては、図３に示したように帯域別音声・雑音判定部６は、雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔの値を利用して帯域毎の推定雑音スペクトル更新フラグｕｐｄａｔｅ［ｆ_Ｂ］の設定処理を行なっているが、雑音らしさ信号Ｎｓｔｔの値による判定処理は行なわず、帯域別音声・雑音判定フラグｓｖａｄ［ｆ_Ｂ］のみを用いて処理をおこなってもよい。
また、同じく雑音らしさ分析部３が出力する雑音スペクトル更新係数ｒについても、図２に示したように雑音らしさ信号Ｎｓｔｔに対応した値を用いず、固定値を用いて推定雑音スペクトルの算出を行なうようにしてもよい。
【００５７】
実施の形態２．
実施の形態１においては、電話等、４ｋＨｚ程度までの音声帯域を対象とした音声通信システムに利用する雑音抑圧装置を考え、入力信号のサンプリング周波数として８ｋＨｚを利用した。実施の形態２では、例えばサンプリング周波数を１６ｋＨｚまで拡張することにより、音声帯域が７ｋＨｚを上限とする、広帯域音声通信システムに利用できる雑音抑圧装置を考える。
【００５８】
４ｋＨｚ以上の音声スペクトル成分のＳＮ比は、４ｋＨｚ以下の電話の音声帯域におけるＳＮ比よりも更に小さくなる。そのため、実施の形態１で用いた閾値ＴＨ１およびＴＨｃとは別に４ｋＨｚ以上の高域に適した閾値を用意する。
【００５９】
すなわち、４ｋＨｚ以上の高域では、帯域別音声・雑音判定に用いる閾値を４ｋＨｚ以下の帯域よりも小さくし、例えばＴＨ１_ｈ＝０．５（ｄＢ）とする。これにより、音声のＳＮ比が小さい高域の音声が、より音声として判定され易くなる。また、雑音帯域の連続カウント閾値を大きくし、例えばＴＨｃ_ｈ＝４と設定する。これにより、音声帯域を判定されやすくなる。
また４ｋＨｚ以上の音声が誤って雑音または音声的雑音と判断された場合でも、実施の形態２の雑音抑圧装置では、雑音、音声的雑音帯域に対してはスペクトル減算処理を行わず、スペクトル振幅抑圧処理のみ行う。これにより、スペクトル減算によるスペクトル変形が生じず、孤立した残留スペクトル成分は発生しないので音質が劣化することは避けられる。
【００６０】
以上のように、この実施の形態２によれば、広帯域音声通信システムに適用した場合でも、各閾値を各帯域に適した値に設定することにより、高域の音声に対しても適切な雑音抑圧処理を行うことができる。
また、４ｋＨｚ以上の帯域の音声成分を雑音または音声的雑音と誤って判定した場合でも、スペクトル減算処理は行わずスペクトル振幅抑圧処理のみが行われることから、スペクトル減算によるスペクトル変形が生じず孤立した残留スペクトル成分は発生しないので音質の劣化を防ぐことができる。
【００６１】
実施の形態３．
実施の形態１および実施の形態２では、帯域別音声・雑音判定部６において用いられる各判定閾値は、全帯域で、あるいは帯域別に一定値に設定されていた。実施の形態３では、例えば雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔの値に基づいて、各閾値を動的に変化させる。
【００６２】
図１２は、この発明の実施の形態３による雑音抑圧装置３００の構成を示すブロック図である。図１と同一の符号は同一の構成要素を表している。図に示すように、雑音抑圧装置３００は、閾値変更部１４を備える。実施の形態３では、閾値変更部１４以外の各部は実施の形態１と同様に動作する。
【００６３】
閾値変更部１４は、内部に、図１３に示すような定数テーブルを有している。このテーブルは、雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔに対応する帯域別音声・雑音判定判定用閾値ＴＨ１と雑音区間の帯域幅閾値ＴＨｃとを関連付けている。図に示すように、雑音らしさ信号Ｎｓｔｔが音声と予測される値（Ｎｓｔｔ＝１，２，３）の場合には、より音声として判定され易くするために、ＴＨ１を小さくすると共にＴＨｃを大きくする。逆に、雑音らしさ信号Ｎｓｔｔが雑音と予測される値（Ｎｓｔｔ＝４，５）の場合には、より雑音として判定され易くするために、ＴＨ１を大きくＴＨｃを小さく設定している。
【００６４】
閾値変更部１４は、この内部テーブルを参照し、雑音らしさ分析部３から出力されたＮｓｔｔに対応する閾値ＴＨ１及びＴＨｃを選択し、帯域別音声・雑音判定部６に出力する。帯域別音声・雑音判定部６は、閾値変更部１４から通知された閾値ＴＨ１及びＴＨｃを用いて、実施の形態１と同様の処理を行う。
【００６５】
以上のように、この実施の形態３によれば、雑音らしさ分析部３が出力する雑音らしさ信号Ｎｓｔｔの結果に応じて各判定閾値を選択することにより、帯域別音声・雑音判定処理を入力信号の状態に適した条件で行うことができる。これにより、帯域別音声・雑音判定の判定精度が向上し、雑音抑圧処理後の信号の音質を更に向上させることができる。
【００６６】
なお、本実施の形態３では、閾値ＴＨ１および閾値ＴＨｃの２つの閾値を動的に変更しているが、どちらか一方だけを変更するようにしてもよい。
【００６７】
また、実施の形態３においても、サンプリング周波数を例えば１６ｋＨｚまで拡張することにより、音声帯域幅が７ｋＨｚの広帯域音声通信システム向けに利用することができる。
【００６８】
【発明の効果】
以上のように、この発明によれば、聴感上好ましい雑音抑圧が可能で、高雑音下でも品質劣化の少ない雑音抑圧装置を得られるという効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１による雑音抑圧装置の構成を示すブロック図である。
【図２】雑音らしさ信号及び雑音スペクトル更新係数と、現フレームの様態との関係を示す図である。
【図３】この発明の実施の形態１による、帯域別音声・雑音判定部における帯域毎の推定雑音スペクトル更新フラグの設定処理のフローチャートである。
【図４】入力信号の音声スペクトルと雑音スペクトルの例を示す図である。
【図５】この発明の実施の形態１による、図４に示す入力信号から得られる推定雑音スペクトル更新フラグの例を示す図である。
【図６】音声スペクトルと音声的雑音信号のスペクトルの例を示す図である。
【図７】音声的雑音信号が誤って音声と判断された場合の、雑音抑圧処理後のスペクトルの例を示す図である。
【図８】この発明の実施の形態１による、雑音抑圧処理後のスペクトルの例を示す図である。
【図９】雑音区間における音声的雑音スペクトルの例である。
【図１０】音声的雑音信号が誤って音声と判断された場合の、雑音抑圧処理後の音声的雑音スペクトルの例を示す図である。
【図１１】この発明の実施の形態１による、雑音抑圧処理後の音声的雑音スペクトルの例を示す図である。
【図１２】この発明の実施の形態３による雑音抑圧装置の構成を示すブロック図である。
【図１３】雑音らしさ信号と各判定閾値との関係を示す図である。
【符号の説明】
１入力端子、２時間・周波数変換部、３雑音らしさ分析部、４帯域分割部、５帯域ＳＮ比計算部、６帯域別音声・雑音判定部、７雑音スペクトル推定部、８雑音抑圧部、９雑音抑圧制御部、１０スペクトル減算部、１１スペクトル振幅抑圧部、１２周波数・時間変換部、１３出力端子、１４閾値変更部、１００，３００雑音抑圧装置。

[0001]
[Technical field of the invention]
The present invention relates to a noise suppression device.
[0002]
[Prior art]
Voice communication systems such as mobile phones and video conferencing systems, as well as speech recognition systems, are used in environments containing various kinds of noise. Suppressing noise signals other than the speech signal, which is the target signal, emphasizes the target signal and thereby improves sound quality and the speech recognition rate.
[0003]
Various techniques have been published for suppressing the noise in a noise-contaminated input signal.
For example, the conventional noise suppression device disclosed in Patent Document 1 suppresses noise by spectral subtraction (hereinafter, the SS method) as described in Non-Patent Document 1. In the SS method, noise is suppressed by subtracting a separately estimated average noise spectrum from the amplitude spectrum.
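The subtraction at the heart of the SS method can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `spectral_subtract` and the choice to clip negative results to zero are assumptions made for illustration, not details taken from Patent Document 1 or Non-Patent Document 1.

```python
def spectral_subtract(amplitude, noise_estimate):
    """Subtract an estimated noise spectrum from an amplitude spectrum, bin by bin.

    Negative differences are clipped to zero (an assumed floor for this sketch).
    """
    return [max(s - n, 0.0) for s, n in zip(amplitude, noise_estimate)]


# Three frequency bins: the middle bin is weaker than the noise estimate.
cleaned = spectral_subtract([5.0, 1.0, 3.0], [1.0, 2.0, 1.0])
print(cleaned)
```

Bins that fall below the noise estimate are removed entirely while stronger bins survive, which is how isolated residual components can arise.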
[0004]
In the background noise elimination device disclosed in Patent Document 2, the input signal is converted into frequency components, and each frame of the input signal is classified as a speech section or a noise section. When the input signal of the current frame is determined to be noise, the estimated background noise is updated by averaging the background noise estimated in the current frame with the background noise estimated in past frames. When the current frame is determined to be a speech section, the estimated background noise is subtracted from the frequency components to obtain a noise-suppressed signal. Taking the frequency components of the noise-suppressed signal obtained by this subtraction as the signal and the estimated background noise as the noise, the device calculates the signal-to-noise ratio (SN ratio) over the entire frequency band and the SN ratio of each sub-band obtained by dividing the entire frequency band into a plurality of parts. For each sub-band in which the difference between the sub-band SN ratio and the full-band SN ratio is equal to or less than a predetermined value, the device generates re-updated background noise containing the noise-suppressed signal and the estimated background noise component in a predetermined proportion, further subtracts the re-updated background noise from the noise-suppressed signal to obtain a re-suppressed signal, and converts this signal back into a time-domain signal to obtain the noise-suppressed output.
[0005]
The conventional noise suppression method disclosed in Non-Patent Document 2 is, like Patent Document 1, based on the SS method. The input signal is frequency-transformed and the current frame is classified as speech or noise. When the current frame is a speech section, the intersection of the envelope of the input signal spectrum and the envelope of the estimated noise spectrum is found, and a high-pass filter (hereinafter, HPF) and a low-pass filter (hereinafter, LPF) with that intersection as their cutoff frequency are used to separate the input signal into a high-band component and a low-band component. For the low-band component, a noise suppression scheme based on the SS method using an ordinary FFT (Fast Fourier Transform) is selected; for the high-band component, a noise suppression scheme based on the SS method using the MWSE (Multi-Window Spectral Estimation) method is selected. Using noise suppression methods with different characteristics for the high and low bands in this way makes good noise suppression possible.
[0006]
[Patent Document 1]
JP 2000-347688 A
[Patent Document 2]
JP-A-10-171497
[Non-patent document 1]
S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Trans. ASSP, April 1979, Vol. ASSP-27, No. 2
[Non-patent document 2]
C. He and G. Zweig, "Adaptive Two-band Spectral Subtraction with Multi-Window Spectral Estimation", IEEE Conference of Acoustic Speech Processing, 1999, pp. 793-796
[0007]
[Problems to be solved by the invention]
Some noise, such as noise in which the voices of many people are mixed, has a spectral shape similar to that of a speech spectrum. Such noise is called speech-like noise.
[0008]
The conventional background noise elimination device disclosed in Patent Document 2 re-subtracts the noise spectrum in bands where the difference between the full-band SN ratio and the band SN ratio is equal to or less than a predetermined threshold, which has the advantage of providing a large amount of suppression. However, whether to perform the re-subtraction is decided solely from the difference between the full-band SN ratio and each band SN ratio; the device does not determine whether the spectrum in that band is speech-like or noise-like. Consequently, the following problems arise when speech-like noise is mixed into the input signal.
[0009]
First, treating speech-like noise as noise causes the following problem. When noise is suppressed based on the SS method, the estimated background noise spectrum should preferably vary little along the frequency axis. Speech-like noise, however, varies strongly along the frequency axis, so when it is mixed into the estimated noise spectrum as noise, the accuracy of the estimated noise spectrum deteriorates.
[0010]
On the other hand, if speech-like noise is erroneously determined to be speech, it is noise-suppressed as if it were speech. Speech-like noise has low signal power but a speech-like spectral shape, so spectral subtraction leaves only the spectral components with relatively large amplitude, isolated from one another. If re-subtraction is then performed, as in the device of Patent Document 2, these unwanted spectral components are emphasized further, and the harsh artificial noise (musical noise) contained in the residual noise increases.
[0011]
Consider also the case where the device of Patent Document 2 is applied to a wideband voice communication system with an upper limit of 7 kHz, such as a video conferencing system that requires a sense of presence. Because the SN ratio and power of speech spectral components at 4 kHz and above are quite small, speech at 4 kHz and above may be mistaken for noise in the speech/noise determination. When it is erroneously determined to be noise, the high-band speech is heavily spectrum-subtracted, so only the spectral components with relatively large amplitude remain in the high band. This generates musical noise and degrades the sound quality.
[0012]
The conventional noise suppression device disclosed in Non-Patent Document 2 uses an HPF and an LPF, determined from the intersection of the envelope of the input signal spectrum and the envelope of the noise spectrum, to separate the input signal into two bands, a low band and a high band, and selects a noise suppression scheme for each band, so good noise suppression suited to each band is possible. However, it is not suited to noise suppression when, for example, three or more speech bands and noise bands exist.
[0013]
The present invention has been made to solve the above problems, and its object is to obtain a noise suppression device that can perform perceptually favorable noise suppression with little quality degradation even under high noise.
[0014]
[Means for Solving the Problems]
A noise suppression device according to the present invention comprises: a time/frequency conversion unit that converts an input speech signal represented in the time domain into a frequency-domain representation and generates an amplitude spectrum and a phase spectrum from the frequency components; a band division unit that divides the amplitude spectrum into a plurality of frequency bands and outputs an average amplitude spectrum for each band; a band-specific speech/noise determination unit that analyzes the spectral shape of the average amplitude spectrum in each frequency band, discriminates among speech, noise, and speech-like noise resembling speech, and outputs a spectral shape determination result; a noise spectrum estimation unit that decides, based on the spectral shape determination result, whether to update the estimated noise spectrum of each frequency band; a noise suppression unit that selects, based on the spectral shape determination result, the optimum noise suppression method for each frequency band and outputs a noise-suppressed spectrum obtained by suppressing the estimated noise spectrum of each frequency band from the amplitude spectrum of that band according to the selected method; and a frequency/time conversion unit that generates a noise-suppressed signal by converting the noise-suppressed spectrum into a signal represented in the time domain.
[0015]
[Embodiments of the invention]
Hereinafter, various embodiments of the present invention will be described.
Embodiment 1.
FIG. 1 is a block diagram showing a configuration of a noise suppression device 100 according to Embodiment 1 of the present invention.
As shown in the figure, the noise suppression device 100 includes an input terminal 1, a time/frequency conversion unit 2, a noise-likeness analysis unit 3, a band division unit 4, a band SN ratio calculation unit 5, a band-specific speech/noise determination unit 6, a noise spectrum estimation unit 7, a noise suppression unit 8, a frequency/time conversion unit 12, and an output terminal 13.
The noise suppression unit 8 includes a noise suppression control unit 9, a spectrum subtraction unit 10, and a spectrum amplitude suppression unit 11.
[0016]
The noise suppression processing performed by the noise suppression device 100 will be described.
When a noise-contaminated input signal s[t] is applied to the input terminal 1, it is sampled at a predetermined sampling frequency, divided into frames at a predetermined period, and input to the time/frequency conversion unit 2. Here, the sampling frequency is 8 kHz and the frame period is 20 ms.
[0017]
The time/frequency conversion unit 2 performs frequency analysis of the framed input signal s[t] using, for example, a 256-point fast Fourier transform (hereinafter, FFT), and generates and outputs an amplitude spectrum S[f] and a phase spectrum P[f]. The FFT is a well-known technique, so its description is omitted.
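As a rough illustration of this step, the following Python sketch turns one 20 ms frame (160 samples at 8 kHz) into an amplitude spectrum and a phase spectrum with a 256-point FFT. The text does not say how a 160-sample frame is fitted to 256 points or whether a window is applied, so the zero-padding and the absence of a window are assumptions of this sketch, as are the helper names `fft` and `amplitude_phase`.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def amplitude_phase(frame, n_fft=256):
    """Zero-pad a frame to n_fft samples and return (amplitude, phase) for bins 0..n_fft/2."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spec = fft(padded)[: n_fft // 2 + 1]
    return [abs(c) for c in spec], [cmath.phase(c) for c in spec]


FS = 8000                    # sampling frequency from the text
FRAME_LEN = FS * 20 // 1000  # 20 ms frame -> 160 samples

# A 1 kHz test tone should peak at bin 1000 * 256 / 8000 = 32.
frame = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(FRAME_LEN)]
S, P = amplitude_phase(frame)
peak_bin = max(range(len(S)), key=S.__getitem__)
print(peak_bin)
```

Keeping the phase spectrum alongside the amplitude spectrum matters because only the amplitude is modified by the suppression stages; the phase is reused when the signal is converted back to the time domain.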
[0018]
The noise-likeness analysis unit 3 receives the framed input signal s[t] output from the input terminal 1 and the amplitude spectrum S[f] output from the time/frequency conversion unit 2. It analyzes the input signal s[t] of the current frame and outputs, to the band-specific speech/noise determination unit 6, a noise-likeness signal Nstt, which is an index indicating whether the current frame is a speech section or a noise section. It also outputs to the noise spectrum estimation unit 7 a noise spectrum update coefficient r corresponding to the noise-likeness signal Nstt. The noise spectrum update coefficient r is used to calculate the estimated noise spectrum N[f_B] described later.
[0019]
FIG. 2 shows the relationship between the noise-likeness signal Nstt, the noise spectrum update coefficient r, and the state of the current frame. As shown in the figure, the noise-likeness signal Nstt is output as a level value from 1 to 5. A value of Nstt in the range 4 to 5 indicates that the current frame is a noise section, and a value in the range 1 to 3 indicates that it is a speech section. Nstt can be calculated, for example, in the same way as the noise-likeness analysis processing disclosed in Patent Document 1, so only a brief description is given here.
The noise-likeness analysis unit 3 first removes the influence of high-frequency noise from the input signal s[t] with a low-pass filter to obtain a low-pass filtered signal. It then performs linear prediction analysis on the low-pass filtered signal and, using the resulting linear prediction coefficients, inverse-filters the low-pass filtered signal. Nstt is calculated from the positive peak value of the autocorrelation coefficients of the low-pass residual signal obtained by the inverse filtering, the power of the low-pass residual signal, and the frame power.
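One of the features named above, the positive autocorrelation peak of the linear prediction residual, can be sketched as follows. This is an illustrative reconstruction only: the low-pass filtering step is omitted, the prediction order (10) and the lag search range (20 to 100 samples) are assumed values, and the mapping from the features to the five Nstt levels is not reproduced because the text defers it to Patent Document 1.

```python
def autocorr(x, max_lag):
    """Raw autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns the prediction-error filter coefficients a (with a[0] = 1)
    and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def residual_peak(frame, order=10, min_lag=20, max_lag=100):
    """Largest normalized autocorrelation of the LPC residual within a lag range."""
    a, _ = levinson(autocorr(frame, order), order)
    # Inverse filtering: apply the prediction-error filter to the frame.
    res = [sum(a[j] * frame[n - j] for j in range(order + 1) if n - j >= 0)
           for n in range(len(frame))]
    r = autocorr(res, max_lag)
    if r[0] <= 0.0:
        return 0.0
    return max(r[k] / r[0] for k in range(min_lag, max_lag + 1))


# A pulse train with a 40-sample period stands in for a strongly periodic (voiced) frame.
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
peak = residual_peak(pulses)
a_demo, err_demo = levinson([1.0, 0.5, 0.25], 2)
print(peak, a_demo, err_demo)
```

A strongly periodic frame gives a residual autocorrelation peak close to 1, whereas noise-like frames give small peaks, which is why this peak is useful as one ingredient of a noise-likeness index.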
[0020]
The band division unit 4 receives the amplitude spectrum S[f] output from the time/frequency conversion unit 2 and divides it into, for example, 20 frequency bands spaced at Bark spectrum intervals. For each band, it computes the average of the amplitude spectrum S[f] and outputs the result as the band-divided amplitude spectrum Sp[f_B], where f_B denotes the band number in the Bark spectrum.
The Bark spectrum is given in Table 1 on page 74 of Eberhard Zwicker, "Psychoacoustics", Nishimura Shoten, 1992. Bark spectrum spacing is a way of dividing the frequency range that corresponds to human auditory characteristics: the bandwidth is narrow at low frequencies and widens as the frequency increases. Hereinafter, unless otherwise noted, the per-band calculations are performed on the Bark spectrum bands f_B.
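The averaging performed by the band division unit can be sketched as below. The band edges used here are hypothetical bin indices chosen only so that the bands widen with frequency; the actual 20 Bark band edges come from the cited table and are not reproduced in the text.

```python
def band_average(amplitude, edges):
    """Return the mean of amplitude[edges[b]:edges[b + 1]] for each band b."""
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bins = amplitude[lo:hi]
        bands.append(sum(bins) / len(bins))
    return bands


# Toy 10-bin amplitude spectrum and three bands of width 2, 3 and 5 bins
# (hypothetical edges; a real implementation would use the 20 Bark bands).
S = [1.0, 1.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0]
EDGES = [0, 2, 5, 10]
Sp = band_average(S, EDGES)
print(Sp)
```

Working on per-band averages rather than individual bins reduces the number of decisions per frame to one per band and smooths out bin-to-bin fluctuation before the speech/noise determination.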
[0021]
The band SN ratio calculation unit 5 receives the band-divided amplitude spectrum Sp[f_B] and the estimated noise spectrum N[f_B] described later, and calculates the band SN ratio SNR[f_B] according to equation (1) below. That is, the band SN ratio is calculated as the ratio of the signal spectrum power to the noise spectrum power in each band.

[0022]
Note that when the result SNR[f_B] of equation (1) is negative, SNR[f_B] is set to 0.
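Equation (1) itself is not reproduced in this text. The following sketch therefore assumes one common form, a power ratio expressed in decibels, and applies the floor at 0 stated above. The function name band_snr_db and the guard value eps are hypothetical.

```python
import math

def band_snr_db(Sp, N, eps=1e-12):
    """Per-band SN ratio in dB, floored at 0.

    Assumption: the ratio of signal power to noise power is expressed as
    10 * log10(Sp^2 / N^2). eps only guards against division by zero.
    """
    snr = []
    for sp, n in zip(Sp, N):
        ratio = (sp * sp + eps) / (n * n + eps)
        snr.append(max(10.0 * math.log10(ratio), 0.0))
    return snr

snr = band_snr_db([10.0, 1.0], [1.0, 2.0])
```

The second band has less signal than noise, so its raw value would be negative and is clamped to 0.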
[0023]
The band-specific speech/noise determination unit 6 receives the band SN ratio SNR[f_B] of the frame immediately before the current frame, output by the band SN ratio calculation unit 5, performs speech/noise determination for each band, and sets the per-band speech/noise determination flag svad[f_B] according to the result. One method of per-band speech/noise determination is, for example, to compare the band SN ratio SNR[f_B] with a predetermined threshold TH1 (first threshold), as in equation (2).
When SNR[f_B] > TH1:
svad[f_B] = VOICE (speech)
When SNR[f_B] ≤ TH1:
svad[f_B] = NOISE (noise)
where f_B = {1, ..., 20} (2)
[0024]
Here, TH1 is a threshold value used for speech / noise determination for each band. As the threshold value TH1, for example, TH1 = 1.5 dB can be used as a suitable value obtained from a large number of samples of the speech SN ratio.
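The threshold comparison of equation (2) can be illustrated with a few lines of Python. The function name band_vad is an assumption; the default TH1 = 1.5 dB is the example value given in the text.

```python
VOICE, NOISE = "VOICE", "NOISE"

def band_vad(snr_db, th1=1.5):
    """Per-band speech/noise flag svad[f_B] following equation (2)."""
    # Strictly greater than TH1 means speech; equal or lower means noise.
    return [VOICE if s > th1 else NOISE for s in snr_db]

svad = band_vad([3.0, 1.5, 0.0])
```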
[0025]
Further, the band-specific speech/noise determination unit 6 performs a continuity determination process for noise bands based on the per-band speech/noise determination flag svad[f_B] obtained by equation (2) and the noise-likeness signal Nstt output from the noise-likeness analysis unit 3, and sets the per-band estimated noise spectrum update flag update[f_B] (spectrum shape determination result) according to the result. The setting process for update[f_B] in the band-specific speech/noise determination unit 6 is described below with reference to the flowchart of FIG. 3.
[0026]
First, in step ST101, the band-specific speech/noise determination unit 6 examines the noise-likeness signal Nstt output from the noise-likeness analysis unit 3. As shown in FIG. 2, when the value of Nstt is 1, 2, or 3, the frame is determined to be a speech section, and the process proceeds to step ST102.
[0027]
On the other hand, when the value of Nstt is 4 or 5, all bands are determined to be noise sections, and the process proceeds to step ST110.
In step ST110, the estimated noise spectrum update flags update[f_B] of all bands are set to NOISE and output. As will be described later, the estimated noise spectrum N[f_B] is consequently updated.
[0028]
The processing of steps ST102 to ST109 is performed for each of the divided bands. First, in step ST102, the per-band speech/noise determination flag svad[f_B] is examined. If svad[f_B] indicates noise (NOISE), the process proceeds to step ST103; if it indicates speech (VOICE), the process proceeds to step ST105.
[0029]
In step ST103, the number of bands determined to be NOISE is incremented; this running count is denoted count. Next, in step ST104, if the band number f_B of the band being processed has reached the maximum value 20, the process proceeds to step ST105; if f_B is 19 or less, the process returns to step ST102. Through this repetition, when noise bands are consecutive, the number of consecutive bands is counted.
[0030]
In step ST105, the value of count is compared with the continuous count threshold THc (second threshold). For example, THc = 3 can be set as a suitable value obtained from experience. If count is greater than THc, that is, if the number of consecutive noise bands exceeds the number set by the threshold, all of those consecutive bands are determined to be noise and the process proceeds to step ST106. On the other hand, if count is equal to or smaller than THc, the bands are not determined to be noise and the process proceeds to step ST107.
[0031]
In step ST106, the estimated noise spectrum update flag update[f_B] of each band determined to be a noise band in step ST105 is set to NOISE. As a result, the estimated noise spectrum N[f_B] of the corresponding band is updated.
When the setting of update[f_B] has been repeated count times, the process proceeds to step ST108.
[0032]
In step ST107, the estimated noise spectrum update flag update[f_B] of each band determined not to be a noise band in step ST105 is set to VOICE. As a result, the estimated noise spectrum N[f_B] of the corresponding band is not updated.
When the setting of update[f_B] has been repeated count times, the process proceeds to step ST108.
[0033]
In step ST108, the value of count is reset to 0. Next, in step ST109, if the band number f_B of the band being processed has reached the maximum value 20, the processing ends; if f_B is 19 or less, the process returns to step ST102. In this way, all bands are processed.
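The flow of steps ST101 to ST110 can be approximated by the run-length sketch below. It is a simplified reading of the flowchart, not the flowchart itself: the function name set_update_flags is an assumption, bands are indexed from 0 rather than 1, and the comparison follows the "count greater than THc" wording of step ST105.

```python
VOICE, NOISE = "VOICE", "NOISE"

def set_update_flags(svad, nstt, thc=3):
    """Set update[f_B] from svad[f_B] and the frame-level signal Nstt."""
    # ST101 / ST110: a noise frame (Nstt of 4 or 5) flags every band NOISE.
    if nstt >= 4:
        return [NOISE] * len(svad)
    update = [VOICE] * len(svad)
    i = 0
    while i < len(svad):
        if svad[i] != NOISE:
            i += 1
            continue
        # ST102 to ST104: count the run of consecutive NOISE bands.
        j = i
        while j < len(svad) and svad[j] == NOISE:
            j += 1
        count = j - i
        # ST105 to ST107: only a run longer than THc is kept as NOISE.
        if count > thc:
            update[i:j] = [NOISE] * count
        i = j
    return update

svad = [NOISE, NOISE, VOICE, NOISE, NOISE, NOISE, NOISE, VOICE]
flags = set_update_flags(svad, nstt=2)
```

Here the run of two noise bands is too short and stays VOICE, while the run of four is flagged NOISE.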
[0034]
Here, a specific example of the result of setting the estimated noise spectrum update flag update[f_B] by the band-specific speech/noise determination unit 6 is described with reference to FIGS. 4 and 5. FIG. 4 is an example of the speech spectrum and the noise spectrum of a speech input signal mixed with a noise signal. FIG. 5 is an example of the band SN ratio SNR[f_B] obtained from the spectrum distribution shown in FIG. 4 and the estimated noise spectrum update flag update[f_B] obtained by the processing of FIG. 3. In FIG. 5, for a band group in which bands whose band SN ratio SNR[f_B] is below the determination threshold TH1 continue for the bandwidth threshold THc = 3 or more, update[f_B] is set to noise band (NOISE), and the other bands are speech bands (VOICE). As shown in the figure, a set of consecutive bands determined to be speech bands or noise bands is defined as a band group.
[0035]
Next, the band-specific speech/noise determination unit 6 performs processing to further improve the determination accuracy for the bands whose estimated noise spectrum update flag update[f_B] is set to NOISE.
That is, each band group determined to be a noise band is further classified as either noise or speech noise. For bands determined to be speech noise, the estimated noise spectrum N[f_B] is not updated. This is because the estimated noise spectrum holds the average spectral shape of the noise component in the input signal, and its accuracy deteriorates if speech noise, which fluctuates strongly in the frequency direction, is mixed into it.
Note that this accuracy-improving processing is not performed for band groups whose update[f_B] is set to VOICE, that is, speech band groups.
[0036]
Here, as one method for improving the determination accuracy, the inter-band variance of the band SN ratio is obtained for each band group, whether the band group is noise or speech noise is determined from that value, and the estimated noise spectrum update flag update[f_B] is set accordingly.
Let n be the serial number of the band groups shown in FIG. 5, and let L[n] be the bandwidth of band group n, that is, the number of bands in the band group. The inter-band variance SNR_dev[n] of the band SN ratio in a band group n determined to be NOISE can be obtained by equation (3).
(Equation 1)

[0037]
Here, f_B(n) is the set of band numbers f_B belonging to band group n, f_B(n_L) is the lowest band number of band group n, and f_B(n_H) is the highest band number of band group n. Taking n = 2 in FIG. 5 as an example, f_B(2) = {10, 11, 12, 13}, f_B(2_L) = 10, f_B(2_H) = 13, and L[2] = 4.
[0038]
For every band group determined to be noise, the band-specific speech/noise determination unit 6 compares the variance SNR_dev[n] of the band SN ratio of band group n obtained by equation (3) with a threshold TH2 (third threshold). TH2 is a predetermined threshold for deciding between noise and speech noise. For example, TH2 = 16.0 can be set as a suitable value obtained from experience.
When the variance of the band SN ratio is smaller than TH2, the spectrum of the band group varies little in the frequency direction (the spectrum has little unevenness) and is stationary, so the band group is determined to be noise. On the other hand, when the variance is equal to or larger than TH2, the spectrum of the band group varies greatly in the frequency direction, so the band group is determined to be noise whose spectral shape resembles speech, that is, speech noise.
[0039]
For a band group determined to be speech noise, the band-specific speech/noise determination unit 6 changes the estimated noise spectrum update flags update[f_B] of all bands in the band group from NOISE to SPEECHLIKE_NOISE, which indicates speech noise.
For the switching of the noise suppression method described later, a speech noise band is also treated as a noise band.
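The variance test on each noise band group can be sketched as below. Because equation (3) is not reproduced in this text, the sketch assumes the population variance (dividing by L[n]); the function name reclassify_noise_groups is likewise hypothetical. TH2 = 16.0 is the example value from the text.

```python
from statistics import pvariance

NOISE, SPEECHLIKE_NOISE = "NOISE", "SPEECHLIKE_NOISE"

def reclassify_noise_groups(update, snr_db, th2=16.0):
    """Relabel NOISE band groups with high SN-ratio variance."""
    out = list(update)
    i = 0
    while i < len(update):
        if update[i] != NOISE:
            i += 1
            continue
        # Find the end of this group of consecutive NOISE bands.
        j = i
        while j < len(update) and update[j] == NOISE:
            j += 1
        # Assumption: SNR_dev[n] is the population variance of the group.
        if pvariance(snr_db[i:j]) >= th2:
            out[i:j] = [SPEECHLIKE_NOISE] * (j - i)
        i = j
    return out

update = [NOISE] * 4 + ["VOICE"] + [NOISE] * 4
snr_db = [0.0, 10.0, 0.0, 10.0, 20.0, 1.0, 1.0, 1.0, 1.0]
result = reclassify_noise_groups(update, snr_db)
```

The first group has a variance of 25 and becomes SPEECHLIKE_NOISE; the second is flat and stays NOISE.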
[0040]
The noise spectrum estimation unit 7 receives the noise spectrum update coefficient r output from the noise-likeness analysis unit 3, the band-divided amplitude spectrum Sp[f_B] output from the band division unit 4, and the estimated noise spectrum update flag update[f_B] output from the band-specific speech/noise determination unit 6, and updates the estimated noise spectrum N[f_B] according to equation (4) using the estimated noise spectrum N_old[f_B], which represents the past average noise spectrum shape. N_old[f_B] may be stored in an internal storage unit such as a RAM held by the noise spectrum estimation unit 7, or in an external storage device accessible by the noise spectrum estimation unit 7. When update[f_B] is VOICE or SPEECHLIKE_NOISE, the estimated noise spectrum N[f_B] is not updated.
When update[f_B] = NOISE:
N[f_B] = r · N_old[f_B] + (1 − r) · Sp[f_B]
When update[f_B] = VOICE or update[f_B] = SPEECHLIKE_NOISE:
N[f_B] = N_old[f_B]
where f_B = {1, ..., 20} (4)
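Equation (4) is a per-band recursive average gated by the update flag. A minimal sketch follows; the function name update_noise_spectrum and the value r = 0.5 in the example are illustrative assumptions, since the text takes r from FIG. 2.

```python
def update_noise_spectrum(N_old, Sp, update, r):
    """Apply equation (4): smooth only the bands flagged NOISE."""
    return [
        r * n + (1.0 - r) * sp if u == "NOISE" else n
        for n, sp, u in zip(N_old, Sp, update)
    ]

N = update_noise_spectrum(
    N_old=[2.0, 2.0, 2.0],
    Sp=[4.0, 4.0, 4.0],
    update=["NOISE", "VOICE", "SPEECHLIKE_NOISE"],
    r=0.5,  # hypothetical update coefficient
)
```

Only the first band moves toward the current spectrum; the other two keep their previous estimate.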
[0041]
The noise suppression control unit 9 receives the estimated noise spectrum update flag update[f_B] output from the band-specific speech/noise determination unit 6 and the band SN ratio SNR[f_B] output from the band SN ratio calculation unit 5, and calculates the spectrum subtraction amount α[f_B] and the spectrum amplitude suppression amount β[f_B], which are the coefficients used for the spectrum subtraction and spectrum amplitude suppression described later. Using the calculated coefficients, the spectrum subtraction unit 10 subtracts the estimated noise spectrum N[f_B] from the amplitude spectrum S[f], the spectrum amplitude suppression unit 11 then suppresses the noise further by suppressing the spectrum amplitude, and the noise suppression spectrum Sr[f] is output.
[0042]
First, the method by which the noise suppression control unit 9 calculates the coefficients used for spectrum subtraction and spectrum amplitude suppression is described.
The spectrum amplitude suppression amount β[f_B] is obtained first, according to equation (5). GAIN in equation (5) is a predetermined constant applied to the band SN ratio SNR[f_B]. When the estimated noise spectrum update flag update[f_B] output by the band-specific speech/noise determination unit 6 is VOICE, that is, for a speech band, β[f_B] is calculated from the band SN ratio; if β[f_B] exceeds 0 (dB), it is set to β[f_B] = 0 (dB), in which case spectrum amplitude suppression is not performed.
On the other hand, when update[f_B] is NOISE or SPEECHLIKE_NOISE, that is, for a noise band or a speech noise band, the residual noise component left by spectrum subtraction causes musical noise, so only spectrum amplitude suppression should be performed, without spectrum subtraction. Therefore, as shown in equation (5), β[f_B] is set to the maximum suppression amount −Gmin (dB).
When update[f_B] = VOICE:
β[f_B] = min{SNR[f_B] · GAIN − Gmin, 0}
When update[f_B] = NOISE or update[f_B] = SPEECHLIKE_NOISE:
β[f_B] = −Gmin (5)
[0043]
After obtaining the spectrum amplitude suppression amount β[f_B] by equation (5), the noise suppression control unit 9 uses the value of β[f_B] to calculate the spectrum subtraction amount α[f_B] according to equation (6). The noise suppression control unit 9 outputs the obtained α[f_B] to the spectrum subtraction unit 10 and β[f_B] to the spectrum amplitude suppression unit 11.
α[f_B] = −(Gmin + β[f_B]) (6)
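Equations (5) and (6) split a fixed total suppression of −Gmin (dB) between the two methods. The sketch below shows this split; the function name suppression_amounts and the values GAIN = 1.0 and Gmin = 12.0 are assumptions for illustration, since the text does not state them.

```python
def suppression_amounts(update, snr_db, gain, gmin):
    """Return (alpha, beta) in dB per band, following equations (5), (6)."""
    alpha, beta = [], []
    for u, snr in zip(update, snr_db):
        if u == "VOICE":
            b = min(snr * gain - gmin, 0.0)   # equation (5), speech band
        else:
            b = -gmin                         # NOISE or SPEECHLIKE_NOISE
        beta.append(b)
        alpha.append(-(gmin + b))             # equation (6)
    return alpha, beta

alpha, beta = suppression_amounts(
    update=["VOICE", "VOICE", "NOISE"],
    snr_db=[6.0, 20.0, 0.0],
    gain=1.0,   # hypothetical GAIN
    gmin=12.0,  # hypothetical Gmin in dB
)
```

In every band alpha + beta equals −Gmin, so a noise band gets all of the suppression as amplitude suppression and none as subtraction.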
[0044]
The spectrum subtraction unit 10 converts the spectrum subtraction amount α[f_B] into a percentage value, the spectrum subtraction rate α_P[f_B]. According to equation (7), the spectrum subtraction unit 10 multiplies the estimated noise spectrum N[f_B] output from the noise spectrum estimation unit 7 by the spectrum subtraction rate α_P[f_B], subtracts the product from the amplitude spectrum S[f] output from the time/frequency conversion unit 2, and outputs the noise removal spectrum S_S[f]. When the noise removal spectrum S_S[f] would become negative, backfilling is performed: the amplitude spectrum S[f] of the input signal is multiplied by a predetermined constant GL_min, and the product is used as S_S[f].
In this processing, the noise spectrum N[f_B] and the spectrum subtraction rate α_P[f_B] of each band number f_B are expanded to the amplitude spectrum components S[f] belonging to that band before the calculation is performed.
When S[f] > α_P[f_B] · N[f_B]:
S_S[f] = S[f] − α_P[f_B] · N[f_B]
When S[f] ≤ α_P[f_B] · N[f_B]:
S_S[f] = S[f] · GL_min (7)
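Equation (7) with its backfilling branch can be sketched as follows. The conversion from α[f_B] in dB to the rate α_P[f_B] is not spelled out in the text, so this sketch takes α_P as an input. The function name spectral_subtract, the bin-to-band map band_of_bin, and GL_min = 0.1 are hypothetical.

```python
def spectral_subtract(S, band_of_bin, alpha_p, N, gl_min):
    """Per-bin spectral subtraction with backfilling, equation (7)."""
    out = []
    for f, s in enumerate(S):
        b = band_of_bin[f]           # band number f_B that bin f belongs to
        sub = alpha_p[b] * N[b]      # amount to subtract in this band
        # Subtract when the result stays positive, otherwise backfill.
        out.append(s - sub if s > sub else s * gl_min)
    return out

S_s = spectral_subtract(
    S=[5.0, 1.0, 3.0],
    band_of_bin=[0, 0, 1],
    alpha_p=[0.5, 0.0],
    N=[4.0, 4.0],
    gl_min=0.1,  # hypothetical backfill constant
)
```

The second bin is smaller than the amount to subtract, so it is replaced by the scaled input instead of going negative.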
[0045]
Next, the spectrum amplitude suppression unit 11 converts the spectrum amplitude suppression amount β[f_B] into the linear value β_l[f_B] and, according to equation (8), multiplies the noise removal spectrum S_S[f] by β_l[f_B] to calculate the noise suppression spectrum Sr[f].
Sr[f] = β_l[f_B] · S_S[f] (8)
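Equation (8) can be illustrated as below. The text does not state how the dB value is converted to a linear value; this sketch assumes the usual amplitude convention, a gain of 10^(β/20). The function name amplitude_suppress and the map band_of_bin are also assumptions.

```python
def amplitude_suppress(S_s, band_of_bin, beta_db):
    """Scale each bin by its band's linear gain, equation (8)."""
    # Assumption: beta is an amplitude-domain dB value.
    beta_lin = [10.0 ** (b / 20.0) for b in beta_db]
    return [beta_lin[band_of_bin[f]] * s for f, s in enumerate(S_s)]

Sr = amplitude_suppress(S_s=[2.0, 2.0], band_of_bin=[0, 1],
                        beta_db=[0.0, -20.0])
```

A band with β = 0 dB passes unchanged, while β = −20 dB scales the amplitude by one tenth.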
[0046]
As can be seen from equations (5) and (6), in a band whose estimated noise spectrum update flag update[f_B] is set to VOICE, as the band SN ratio SNR[f_B] increases, the spectrum amplitude suppression amount β[f_B] becomes smaller and amplitude suppression weakens, while the spectrum subtraction amount α[f_B] becomes larger and spectrum subtraction strengthens. Conversely, as the band SN ratio SNR[f_B] decreases, the spectrum amplitude suppression amount β[f_B] becomes larger and amplitude suppression strengthens, while spectrum subtraction weakens. As a result, in a band with a high SN ratio, noise suppression is obtained mainly by spectrum subtraction, and in a band with a low SN ratio, amplitude suppression can be performed while retaining the speech spectrum components.
[0047]
Also, in a band whose estimated noise spectrum update flag update[f_B] is set to NOISE or SPEECHLIKE_NOISE, that is, a noise band or a speech noise band, the spectrum amplitude suppression amount β[f_B] is the maximum suppression amount −Gmin, so the spectrum subtraction amount α[f_B] is 0 and spectrum subtraction is not performed. The noise is only reduced in volume while its spectral shape is maintained, and no spectral deformation occurs, so the “naturalness” of the noise-suppressed speech is maintained.
[0048]
A specific example of the noise suppression processing in a speech section is described with reference to FIGS. 6 to 8. FIG. 6 is a diagram showing the spectral components of a speech signal and a speech noise signal in the input signal. FIG. 7 is a diagram showing the spectrum after noise suppression processing when, as in the related art, a speech noise band in the input signal of FIG. 6 is erroneously determined to be a speech band. FIG. 8 is a diagram showing the spectrum after noise suppression processing when the input signal shown in FIG. 6 is input to the noise suppression device 100 according to the first embodiment.
[0049]
In the example shown in FIG. 7, the speech noise band indicated by SPEECHLIKE_NOISE in the figure is erroneously determined to be speech, and spectrum subtraction is performed based on that determination. For this reason, as shown in the figure, spectrum deformation occurs in the speech noise band, and isolated residual spectrum components with large amplitude are generated in the high-frequency components of the speech spectrum.
On the other hand, in the example shown in FIG. 8, the speech noise band is determined to be speech noise, and based on that determination only spectrum amplitude suppression is performed, without spectrum subtraction, so no residual spectrum is generated. Good noise suppression is therefore realized.
[0050]
Further, a specific example of the noise suppression processing in a noise section is described with reference to FIGS. 9 to 11. FIG. 9 is a diagram showing an example of a speech noise spectrum. FIG. 10 is a diagram showing the speech noise spectrum after noise suppression processing when, as in the related art, a speech noise band is erroneously determined to be a speech band. FIG. 11 is a diagram showing the speech noise spectrum after noise suppression processing by the noise suppression device 100 according to the first embodiment.
[0051]
In the example shown in FIG. 10, the scattered speech noise spectrum is erroneously determined to be speech, and spectrum subtraction is performed based on the determination. As a result, spectrum deformation occurs, an isolated spectrum component having a large amplitude is generated, and the speech noise spectrum is emphasized. On the other hand, in FIG. 11, the scattered speech noise spectrum is correctly determined as speech noise, and spectrum amplitude suppression is performed instead of spectrum subtraction based on the decision. Only the signal power decreases while the spectral shape is maintained. That is, only the sound volume is reduced while the naturalness of noise is maintained, and good noise suppression can be performed.
[0052]
The frequency/time conversion unit 12 converts the noise suppression spectrum Sr[f] and the phase spectrum P[f] output from the time/frequency conversion unit 2 into a time signal, performs overlap processing with part of the noise suppression signal of the previous frame, and outputs the noise suppression signal sr[t] from the output terminal 13.
[0053]
As described above, according to the first embodiment, the spectrum of an input signal is divided into frequency bands, and speech / noise determination is performed for each band based on the SN ratio of the spectrum. Further, for the band determined to be noise, the degree of continuity of the noise section is analyzed, and only when the noise band continues for a certain length or more, it is determined again as noise. Further, with respect to the noise band, noise and speech noise are distinguished based on the variance of the SN ratio between consecutive bands.
Based on these distinctions, an appropriate noise spectrum is estimated for each band, and a noise suppression method suited to the spectrum shape of each band is selected to perform noise suppression, so that a sound that is preferable in terms of hearing can be obtained.
[0054]
If speech noise can be correctly determined, it is possible to prevent the spectral components of speech noise having large fluctuations in the frequency axis direction from being mixed into the estimated noise spectrum, thereby preventing the accuracy of the estimated noise spectrum from deteriorating.
[0055]
In addition, since speech noise can be determined, only spectrum amplitude suppression is performed for a speech noise band without using spectrum subtraction in the noise suppression method. As a result, the suppressed spectrum is not deformed, and only the sound volume can be reduced while maintaining the spectrum shape. For this reason, since no isolated residual spectral components are generated, the naturalness of the sound after the noise suppression processing is maintained, and annoying artificial noise (musical noise) included in the residual noise can be prevented from increasing.
[0056]
In Embodiment 1, as shown in FIG. 3, the band-specific speech/noise determination unit 6 uses the noise-likeness signal Nstt output from the noise-likeness analysis unit 3 in setting the per-band estimated noise spectrum update flag update[f_B]. However, the determination based on the value of Nstt may be omitted, and the processing may be performed using only the per-band speech/noise determination flag svad[f_B].
Also, for the noise spectrum update coefficient r output from the noise-likeness analysis unit 3, the estimated noise spectrum may be calculated using a fixed value instead of the value corresponding to the noise-likeness signal Nstt shown in FIG. 2.
[0057]
Embodiment 2.
The first embodiment assumes a noise suppression device used in a speech communication system with a speech band up to about 4 kHz, such as a telephone, and uses 8 kHz as the sampling frequency of the input signal. The second embodiment considers a noise suppression device that can be used in a wideband speech communication system whose speech band extends up to 7 kHz, obtained by raising the sampling frequency to, for example, 16 kHz.
[0058]
The S / N ratio of the voice spectrum component of 4 kHz or higher is even smaller than the S / N ratio in the voice band of a telephone of 4 kHz or lower. Therefore, in addition to the threshold values TH1 and THc used in the first embodiment, a threshold value suitable for a high band of 4 kHz or more is prepared.
[0059]
That is, in the high range of 4 kHz or more, the threshold used for per-band speech/noise determination is set smaller than in the range of 4 kHz or less, for example, TH1_h = 0.5 (dB). This makes high-frequency speech, whose SN ratio is low, more likely to be determined as speech. The continuous count threshold for noise bands is also increased, for example, to THc_h = 4. This too makes a band more likely to be determined as a speech band.
Further, even when speech at 4 kHz or more is erroneously determined to be noise or speech noise, the noise suppression device of the second embodiment does not perform spectrum subtraction on noise and speech noise bands and performs only spectrum amplitude suppression. As a result, spectrum deformation due to spectrum subtraction does not occur, and no isolated residual spectrum components are generated, so that deterioration in sound quality can be avoided.
[0060]
As described above, according to the second embodiment, even when the device is applied to a wideband speech communication system, setting each threshold to a value suited to each band allows appropriate noise suppression processing to be performed for high-frequency speech as well.
Further, even when a voice component in a band of 4 kHz or more is erroneously determined to be noise or voice noise, only spectral amplitude suppression processing is performed without performing spectrum subtraction processing. Since no residual spectrum component is generated, deterioration of sound quality can be prevented.
[0061]
Embodiment 3.
In the first and second embodiments, each determination threshold used in band-specific speech / noise determination unit 6 is set to a constant value for the entire band or for each band. In the third embodiment, for example, each threshold is dynamically changed based on the value of the noise likeness signal Nstt output from the noise likeness analyzer 3.
[0062]
FIG. 12 is a block diagram showing the configuration of a noise suppression device 300 according to Embodiment 3 of the present invention. The same reference numerals as in FIG. 1 denote the same components. As shown in the figure, the noise suppression device 300 includes a threshold changing unit 14. In the third embodiment, each unit other than the threshold changing unit 14 operates in the same way as in the first embodiment.
[0063]
The threshold changing unit 14 internally holds a constant table as shown in FIG. 13. This table associates each value of the noise-likeness signal Nstt output by the noise-likeness analysis unit 3 with the per-band speech/noise determination threshold TH1 and the bandwidth threshold THc of the noise section. As shown in the figure, when Nstt has a value predicted to be speech (Nstt = 1, 2, 3), TH1 is decreased and THc is increased so that bands are more likely to be determined as speech. Conversely, when Nstt has a value predicted to be noise (Nstt = 4, 5), TH1 is increased and THc is decreased so that bands are more likely to be determined as noise.
[0064]
The threshold changing unit 14 refers to the internal table, selects the thresholds TH1 and THc corresponding to Nstt output from the noise likeness analyzing unit 3, and outputs the selected thresholds TH1 and THc to the band-based speech / noise determining unit 6. The band-specific speech / noise determination unit 6 performs the same processing as in the first embodiment using the thresholds TH1 and THc notified from the threshold changing unit 14.
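The table lookup performed by the threshold changing unit 14 might be sketched as follows. The actual values of FIG. 13 are not reproduced in this text, so every number in THRESHOLD_TABLE is a hypothetical placeholder chosen only to follow the stated trend (lower TH1 and higher THc for speech-like frames); the function name select_thresholds is also an assumption.

```python
# Hypothetical table: Nstt -> (TH1 in dB, THc in bands).
# Values are placeholders, not the contents of FIG. 13.
THRESHOLD_TABLE = {
    1: (0.5, 5),
    2: (1.0, 4),
    3: (1.5, 3),
    4: (2.0, 2),
    5: (2.5, 2),
}

def select_thresholds(nstt):
    """Return (TH1, THc) for the given noise-likeness level Nstt."""
    return THRESHOLD_TABLE[nstt]

th1_speech, thc_speech = select_thresholds(1)
th1_noise, thc_noise = select_thresholds(5)
```

The selected pair would then be passed to the per-band determination in place of the fixed TH1 and THc of the first embodiment.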
[0065]
As described above, according to the third embodiment, selecting each determination threshold according to the noise-likeness signal Nstt output from the noise-likeness analysis unit 3 allows the per-band speech/noise determination to be carried out under conditions suited to the state of the input signal. As a result, the accuracy of speech/noise determination for each band is improved, and the sound quality of the signal after the noise suppression processing can be further improved.
[0066]
In the third embodiment, the two thresholds TH1 and THc are dynamically changed, but only one of them may be changed.
[0067]
Also, in the third embodiment, by expanding the sampling frequency to, for example, 16 kHz, it can be used for a wideband voice communication system having a voice bandwidth of 7 kHz.
[0068]
[Effects of the Invention]
As described above, according to the present invention, it is possible to obtain a noise suppression device which can suppress noise which is preferable in terms of audibility and which has little quality deterioration even under high noise.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a noise suppression device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a relationship between a noise-likeness signal and a noise spectrum update coefficient, and a state of a current frame.
FIG. 3 is a flowchart of a setting process of an estimated noise spectrum update flag for each band in a band-based speech / noise determination unit according to Embodiment 1 of the present invention.
FIG. 4 is a diagram illustrating an example of a speech spectrum and a noise spectrum of an input signal.
FIG. 5 is a diagram showing an example of an estimated noise spectrum update flag obtained from the input signal shown in FIG. 4 according to the first embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of a speech spectrum and a spectrum of a speech noise signal.
FIG. 7 is a diagram illustrating an example of a spectrum after noise suppression processing when a speech noise signal is erroneously determined to be speech.
FIG. 8 is a diagram showing an example of a spectrum after noise suppression processing according to the first embodiment of the present invention.
FIG. 9 is an example of a speech noise spectrum in a noise section.
FIG. 10 is a diagram illustrating an example of a speech noise spectrum after noise suppression processing when a speech noise signal is erroneously determined to be speech.
FIG. 11 is a diagram showing an example of a speech noise spectrum after the noise suppression processing according to the first embodiment of the present invention.
FIG. 12 is a block diagram showing a configuration of a noise suppression device according to Embodiment 3 of the present invention.
FIG. 13 is a diagram illustrating a relationship between a noise likeness signal and each determination threshold.
[Explanation of symbols]
Reference Signs List 1 input terminal, 2 time / frequency conversion unit, 3 noise likeness analysis unit, 4 band division unit, 5 band SN ratio calculation unit, 6 band speech / noise determination unit, 7 noise spectrum estimation unit, 8 noise suppression unit, 9 Noise suppression control unit, 10 spectrum subtraction unit, 11 spectrum amplitude suppression unit, 12 frequency / time conversion unit, 13 output terminal, 14 threshold change unit, 100, 300 noise suppression device.

Claims

A noise suppression device comprising:
a time/frequency conversion unit that converts an input speech signal represented in the time domain into a frequency domain representation and generates an amplitude spectrum and a phase spectrum from the frequency components;
a band division unit that divides the amplitude spectrum into a plurality of frequency bands and outputs an average amplitude spectrum for each band;
a band-specific speech/noise determination unit that analyzes the spectrum shape of the average amplitude spectrum for each frequency band, distinguishes among speech, noise, and speech noise resembling speech, and outputs a spectrum shape determination result;
a noise spectrum estimation unit that determines, based on the spectrum shape determination result, whether to update the estimated noise spectrum for each frequency band;
a noise suppression unit that selects, based on the spectrum shape determination result, an optimum noise suppression method for each frequency band, and outputs a noise suppression spectrum obtained by suppressing, according to the selected method, the estimated noise spectrum of each frequency band from the amplitude spectrum of that frequency band; and
a frequency/time conversion unit that generates a noise suppression signal by converting the noise suppression spectrum into a signal represented in the time domain.

The noise suppression device according to claim 1, further comprising a noise-likeness analysis unit that analyzes the noise likeness of the input speech signal and outputs a noise-likeness signal indicating whether the input speech signal is speech or noise and a noise spectrum update coefficient corresponding to the noise likeness,
wherein the band-specific speech/noise determination unit uses the noise-likeness signal in analyzing the spectrum shape of each frequency band, and
the noise spectrum estimation unit calculates the estimated noise spectrum using the noise spectrum update coefficient and updates the estimated noise spectrum based on the spectrum shape determination result.

The noise suppression device according to claim 2, further comprising a band SN ratio calculation unit that calculates, for each frequency band, a band SN ratio expressed as the ratio of the power of the average amplitude spectrum to the power of the estimated noise spectrum,
wherein the band-specific speech/noise determination unit determines the spectrum shape of the average amplitude spectrum based on the value of the band SN ratio and the value of the variance of the band SN ratio over a plurality of bands.

4. The noise suppression device according to claim 3, wherein the band-based speech/noise determination unit determines the spectrum shape of a run of consecutive frequency bands to be noise when the band S/N ratio of each band in the run is equal to or smaller than a first threshold and the number of consecutive bands is equal to or larger than a second threshold, and determines the spectrum shape to be speech in other cases.

5. The noise suppression device, wherein, when the number of consecutive frequency bands whose band S/N ratio is equal to or smaller than the first threshold is equal to or larger than the second threshold, the band-based speech/noise determination unit determines the spectrum shape of those consecutive bands to be noise if the variance of the band S/N ratio in those bands is smaller than a third threshold,
and determines the spectrum shape of those consecutive bands to be speech-like noise if the variance of the band S/N ratio in those bands is equal to or greater than the third threshold.
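The decision rule in the two preceding claims can be sketched as a scan for runs of low-S/N bands. The function name, the label strings, and the threshold parameter names are hypothetical. Labeling every band outside a qualifying run as speech is this sketch's reading of the "other cases" branch.

```python
def classify_bands(snr, snr_thresh, min_run, var_thresh):
    """Label each band 'speech', 'noise', or 'speech_like_noise'.

    snr_thresh, min_run, var_thresh play the roles of the first, second,
    and third thresholds respectively.
    """
    labels = ["speech"] * len(snr)
    start = 0
    while start < len(snr):
        if snr[start] > snr_thresh:
            start += 1
            continue
        # Extend the run of consecutive bands at or below the first threshold.
        end = start
        while end < len(snr) and snr[end] <= snr_thresh:
            end += 1
        run = snr[start:end]
        if len(run) >= min_run:
            mean = sum(run) / len(run)
            var = sum((v - mean) ** 2 for v in run) / len(run)
            # Low variance: flat, noise-like run. High variance: speech-like noise.
            label = "noise" if var < var_thresh else "speech_like_noise"
            labels[start:end] = [label] * len(run)
        start = end
    return labels
```

Runs shorter than the second threshold are left as speech, so an isolated low-S/N band does not trigger a noise decision.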

6. The noise suppression device according to any one of claims 1 to 5, wherein the noise spectrum estimating unit does not update the estimated noise spectrum of a band determined to be speech-like noise.

7. The noise suppression device according to claim 1, wherein the noise suppression unit includes a noise suppression control unit that controls the ratio of the suppression amounts of a plurality of noise suppression methods based on the spectrum shape determination result and the band S/N ratio,
and outputs the noise-suppressed spectrum obtained by suppressing the noise spectrum of each frequency band from the amplitude spectrum of that band according to the controlled ratio.

8. The noise suppression device according to claim 7, wherein, for a band whose spectrum shape determination result is noise or speech-like noise, the noise suppression control unit performs only noise suppression by spectrum amplitude suppression, without performing noise suppression by spectrum subtraction.

9. The noise suppression device according to claim 7 or 8, wherein, for a band whose spectrum shape determination result is speech, the noise suppression control unit increases the ratio of noise suppression by spectrum subtraction and decreases the ratio of noise suppression by spectrum amplitude suppression as the band S/N ratio increases.
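One possible reading of the ratio control in the three preceding claims is a per-band weight that blends the two suppression methods. The linear ramp between `snr_lo` and `snr_hi`, the fixed attenuation gain `atten`, the weighted-sum combination, and all names are assumptions of this sketch. The claims fix only the direction in which the ratio changes.

```python
def suppress_band(amp, noise, label, snr_db, snr_lo=0.0, snr_hi=20.0, atten=0.3):
    """Return the suppressed amplitude for one band.

    label: 'speech', 'noise', or 'speech_like_noise' (hypothetical strings).
    The weight w is the share given to spectrum subtraction; 1 - w is the
    share given to spectrum amplitude suppression.
    """
    if label == "speech":
        # The share of spectrum subtraction grows with the band S/N ratio.
        w = min(1.0, max(0.0, (snr_db - snr_lo) / (snr_hi - snr_lo)))
    else:
        # Noise and speech-like noise bands: amplitude suppression only.
        w = 0.0
    subtracted = max(amp - noise, 0.0)  # spectrum subtraction, floored at 0
    attenuated = atten * amp            # spectrum amplitude suppression
    return w * subtracted + (1.0 - w) * attenuated
```

For a speech band at the midpoint of the ramp, half of the output comes from each method. For a noise band the S/N ratio is ignored and only attenuation is applied.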

10. The noise suppression device according to any one of claims 1 to 9, wherein the band division unit divides the amplitude spectrum of the input speech signal into frequency bands corresponding to human auditory characteristics.
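Band division of this kind can be sketched as averaging the amplitude spectrum over bin ranges that widen with frequency. The edge values in `EXAMPLE_EDGES` are purely hypothetical and only mimic the idea of narrower bands at low frequencies. The claim does not specify any particular auditory scale.

```python
def mean_band_amplitudes(amp_spectrum, band_edges):
    """Average amplitude over each bin range [edges[i], edges[i+1])."""
    means = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        if not 0 <= lo < hi <= len(amp_spectrum):
            raise ValueError("band edges must be increasing and within range")
        means.append(sum(amp_spectrum[lo:hi]) / (hi - lo))
    return means

# Hypothetical edges: narrow bands at low frequencies, wider at high ones.
EXAMPLE_EDGES = [0, 1, 2, 4, 8]
```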

11. The noise suppression device according to any one of claims 4 to 10, further comprising a threshold changing unit that selects and outputs the first threshold according to the noise-likeness signal output by the noise-likeness analysis unit,
wherein the band-based speech/noise determination unit determines the spectrum shape using the selected first threshold.

12. The noise suppression device according to any one of claims 4 to 10, further comprising a threshold changing unit that selects and outputs the second threshold according to the noise-likeness signal output by the noise-likeness analysis unit,
wherein the band-based speech/noise determination unit determines the spectrum shape using the selected second threshold.