JP4509413B2

JP4509413B2 - Electronics

Info

Publication number: JP4509413B2
Application number: JP2001095039A
Authority: JP
Inventors: 由利子塚原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-03-29
Filing date: 2001-03-29
Publication date: 2010-07-21
Anticipated expiration: 2021-03-29
Also published as: JP2002300687A

Abstract

PROBLEM TO BE SOLVED: To provide an electronic apparatus which realizes a satisfactory noise suppressing function and voice switch function. SOLUTION: The electronic apparatus comprises a voice switch for inserting a loss corresponding to a signal attenuation A in a transmitted or received signal and a noise canceler which detects voice components from the transmitted or received signal, computes a signal attenuation B and compares the attenuation A with the attenuation B to obtain an adjusted final signal attenuation C for suppressing noise.

Description

【０００１】
【発明の属する技術分野】
本発明は、音声信号の送受を取り扱う携帯電話などの電子機器に関する。
【０００２】
【従来の技術】
携帯電話などの音声の送受を行う通信システムにおいては、ハウリングなどの防止のため、ボイススイッチが通信システムを構成する電子機器（携帯電話など）に搭載されている。
【０００３】
ボイススイッチは送受信の信号ループのなかに恒常的にＬＯＳＳを挿入するものであり、受信信号もしくは送信信号のどちらか一方に所定の信号低減量を課すものである。
【０００４】
一般的には、受話のみのときには送信信号に、送話のみのときには受信信号にＬＯＳＳを挿入するように設定されている。いわゆるダブルトークの際もしくは無話状態のときには、例えば、送話側にＬＯＳＳが挿入されるように設定されている。
【０００５】
また携帯電話など音声通信を行う電子機器では、ＣＥＬＰ(Code Excited Linear Prediction)方式などの音声符号化方式が用いられている。
【０００６】
このような機器を背景雑音の大きい環境下で使用すると、この背景雑音が取り込まれて符号化され音声の明瞭感が低下してしまう。そのため、背景雑音を除去もしくは抑制して音声のみの信号に近づけて音声符号化を行う技術（ノイズキャンセラ）が研究され、電子機器に搭載されてきている。
【０００７】
【発明が解決しようとする課題】
前述のごとく、ボイススイッチ及びノイズキャンセラはそれぞれ異なる目的を有するため、両者を備えた電子機器においては、それぞれ独自にその機能を発揮するように個別に動作している。
【０００８】
従って両者が同一の信号系（たとえば送話系）に信号低減をいれた場合はどちらの機能にとっても必要以上の信号低減が行われてしまうという問題点がある。
【０００９】
また、音声検出機能及び信号低減機能は共通の機能である。従って信号処理的には冗長な領域が存在していることになる。
【００１０】
本発明は以上の点を考慮してなされたものであり、ノイズキャンセラとボイススイッチとを有する際の音声信号処理が改善された電子機器の提供を目的とする。
【００１１】
【課題を解決するための手段】
本発明は、送受信される音声信号を取得し、送信信号または受信信号の少なくとも一方への信号低減を行う電子機器であって、
送信信号または受信信号の音声検出結果に基づいて送信信号への第１の信号低減量を設定するボイススイッチと、
送信信号を取得し、前記ボイススイッチによって設定された第１の信号低減量以下の最終信号低減量で送信信号への信号低減を行うノイズキャンセラとを備えたことを特徴とする電子機器である。
【００１２】
すなわちボイススイッチによる信号低減量（Ａ）とノイズキャンセラによる信号低減量（Ｂ）とを単純に加算することなく、両者を比較して適正な信号減衰量を決定するというものである。
【００１３】
一般的にノイズキャンセラの方がきめ細かいノイズ低減を行うように動作するので（信号減衰量の時間変化が細やか）、ノイズキャンセラの信号低減量（Ｂ）を基準に送信／受信系に挿入する信号低減量を決定することが好ましい。
【００１４】
しかしながらハウリング防止などの観点から一定量の信号低減は送信／受信系のいずれかには入れる必要がある。
【００１５】
従って、信号低減量の基準としてはノイズキャンセラの信号低減量（Ｂ）を採用し、信号低減量の上限（絶対値としては下限）をボイススイッチの信号低減量（Ａ）とすることが好ましい。
【００１６】
すなわち、ノイズキャンセラの信号低減量（Ｂ）がボイススイッチの信号低減量（Ａ）以下となるように調整することで達成できる。
【００１７】
またボイススイッチのＬＯＳＳは送信系，受信系のいずれかに挿入されるが、その判断は送信系／受信系のどちらに音声信号が存在するかの判断による。この判断手法は種々の方法があり特段限定するものではない。
【００１８】
しかしながら、ノイズキャンセラでも音声信号の有無の判断を行っているので、この結果を利用することでボイススイッチにおける独自の音声信号有無検出の処理を省略することができる。
【００２０】
本発明で用いるノイズキャンセラとしては時間領域におけるノイズ抑制方式，周波数領域におけるノイズ抑制方式など各種方式を採用することができる。
【００２１】
例えば特許２９９５７３７号公報には、入力信号を所定期間のフレームに分割し、フレーム毎に雑音／音声を判断し、雑音フレームと判断された場合には帯域別のゲイン値を最小に設定し、音声フレームと判断された場合にはそれを超えるゲイン値を設定し、ノイズ抑圧を行う方式が開示されており、これを用いることもできる。
【００２２】
また、入力信号を定められた時間単位のフレームに分割し、この分割されたフレームを所定の周波数帯域に分割し、この分割された帯域ごとに雑音の抑圧処理を行うノイズ抑制方法において：前記フレームが雑音フレームであるか音声フレームであるかの判定を行う音声フレーム判定ステップと；前記音声フレーム判定ステップの結果に基づき各フレームの帯域別ゲイン値を設定する帯域別ゲイン決定ステップと；前記帯域ゲイン決定ステップにより決定された帯域別ゲイン値を用いて帯域毎に雑音抑圧を行った後にフレームを再構成して雑音抑制された出力信号を生成する信号生成ステップとを具備し、前記帯域別ゲイン決定ステップでは、決定対象のフレームが音声フレームであると判定された場合の帯域別ゲイン値が、決定対象のフレームが雑音フレームであると判定された場合の帯域別ゲイン値より小さい値を取り得るように帯域別ゲイン値の設定が行われることを特徴とするノイズキャンセラを採用することもできる。
【００２３】
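With this method, noise in the bands of a voice frame that contain no voice component is suppressed sufficiently, so good noise reduction is obtained. That is, the per-band gain value used for noise suppression is not merely determined separately for voice frames and noise frames; the minimum per-band gain value for voice frames is also set lower than the minimum per-band gain value for noise frames, which improves the perceived quality of the audio signal after noise suppression.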
【００２４】
音声フレームと判断されたフレームでも全ての帯域に音声成分が含まれているとは限らない。音声フレームと判断されたフレーム内の音声成分が含まれない（若しくは音声成分が少ない）と推定される帯域に関しては、雑音フレームの帯域別ゲイン値より小さいゲインを設定し、音声フレーム内での音声成分の含まれる帯域を際立たせることで良好な聴感が得られるのである。
【００２５】
すなわち、決定対象フレームが音声フレームであると判定されたフレーム内の音声成分が含まれないと推定された帯域の帯域別ゲイン値を、決定対象のフレームが雑音フレームであると判定された場合の帯域別ゲイン値より小さい値を取るように帯域別ゲイン値の設定を行うことにより、音声フレーム内の音声成分を含む帯域をより際立たせることができ、結果として聴感の良好なノイズ抑圧された出力信号を得ることができる。
【００２６】
なお、雑音フレームにおいては、各帯域に関し一定値のゲイン値を設定する様にしても良いし、帯域別パワーと雑音パワーとの差に基づいて変化するように設定してもよい。
【００２７】
また、音声フレームに関しては帯域別パワーと雑音パワーの差に基づく指標が大きくなるにつれ帯域別ゲイン値が大きくなるように設定し、この指標が所定値以下の場合は一定値とする設定も可能である。連続的に減少する関数を採用することも構わない。
【００２８】
このようなノイズ抑制方法においては、前述の帯域別ゲインを決定する前段階として、雑音パワーの推定値の更新を行う段階がある。この雑音推定値は、所定の条件で更新され、例えば特表平１０−５１３０３０号に開示されたノイズ抑圧方法に開示された更新方法を採用することができる。
【００２９】
この更新方法は、各フレームの個々の帯域ごとのＳＮＲ（信号エネルギ／雑音エネルギの対数値）に重み付けを行ったものの合計であるボイスメトリックを用いるものであり、個々の帯域ごとの偏差（信号エネルギの対数値−過去の信号エネルギの平均値の対数値）の絶対値をとったものの合計であるスペクトル偏差を用いて雑音推定値を更新する技術であり、このスペクトル偏差がしきい値を一定時間（例えば１秒間）下回った場合は推定雑音値が更新される。
【００３０】
また、スペクトル偏差の値をそのまま判定に用いるのではなく、過去フレームとの間で、帯域パワーと雑音パワーとの差の偏差合計をその平均値で正規化し、この正規化値をもとに雑音区間の判定を行うことで、上記方法に比べフレーム間の変動の大きい雑音を雑音として認識することができる方法を採用することもできる。
【００３１】
すなわち、帯域別パワーと帯域別雑音パワー推定値との差に所定の重み付けを行った帯域別有意値(suby)の現フレームと前フレームとの差を合計した値(sum)を、その平均値(sum＿average)で正規化した比率(r)をもとに現フレームが雑音フレームであるか否かの判定を行う方法である。
【００３２】
このように過去フレームとの帯域別有意値(suby)の偏差を利用し、この偏差合計値の平均値で偏差を正規化した値を判定根拠に用いることでフレーム毎のばらつきを緩和することができるので、安定した雑音フレーム判定を行うことができる。従ってフレーム間のばらつきが大きい雑音に対しても雑音としての認識を良好に行うことができる。
【００３３】
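In more detail, the method comprises: a frame division step of dividing the transmit input signal into frames of a predetermined time unit; a frequency band division step of dividing each frame into a plurality of frequency bands; a band power calculation step of calculating the band power (channel_power) of each frequency band; a significance value calculation step of calculating, for each frequency band, the difference (tmp) between the band noise power estimate (noise_power) and the band power (channel_power), weighting this difference (tmp) to obtain a per-band significance value (suby), and summing the per-band significance values under a predetermined condition to obtain a significance value (y); a per-band significance sum step of taking, between the current frame and the previous frame, the sum of absolute differences (sum) of the per-band significance values (suby) over the frequency bands; and a significance normalization step of calculating the average (sum_average) of the absolute-difference sum (sum) and the ratio (r) obtained by normalizing the sum by this average.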
【００３４】
雑音パワー推定値の更新は以下の２種類のステップを有する。
【００３５】
すなわち、前記有意値(y)が所定のしきい値を下回った場合に現フレームを雑音フレームと判断し、前記帯域別雑音パワー推定値(noise＿power)を更新する第1の雑音パワー推定値更新ステップと；前記比率(r)が所定のしきい値を所定の期間連続して下回った際に現フレームを雑音フレームと判断し、前記帯域別雑音パワー推定値(noise＿power)の更新を行う第2の雑音パワー推定値更新ステップとである。
【００３６】
上記第１の雑音パワー推定値更新ステップは、良好に雑音推定が行われて有意値判定により雑音フレームであると判定される場合であり、第２の雑音パワー推定値更新ステップは、有意値がフレーム間でばらついたりして有意値では良好な雑音フレーム判定ができない場合でも強制更新を可能とするものである。
【００３７】
なお正規化に用いる平均値は、前記絶対値和(sum)のリーク積分を用いての推定値を使用することができる。また、前記絶対値和(sum)の標準偏差のリーク積分を用いて得られた前記平均値(sum＿average)の推定値を用いることも可能である。
【００３８】
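When the per-band gains described above are set, different functions are used for voice frames and noise frames, but in both cases the variable that determines the gain value is basically derived from the ratio of band power to band noise power (a difference in the log domain: the SNR). In a voice frame, a band with a large SNR is presumed to contain voice components, so its gain value is set high; a band with a small SNR is presumed to contain no voice components, so its gain value is set low.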
【００３９】
ところで、雑音（Background Noise）は一般に定常と仮定されるが、屋外では変動する場合がある。特に、自動車が通り過ぎるときに発生する雑音のエネルギは自動車の接近とともに大きくなる。この状態で送話音声が入力されると、音声と雑音とのエネルギ差が小さいため、抑圧後の音声を歪ませることがある。また、雑音のスペクトル形状と音声のスペクトル形状が似ている場合も、雑音エネルギをもとに抑圧を行うと音声のスペクトルに干渉しやすくなるため、抑圧後の音声に歪みが発生する。雑音エネルギが変動した場合でもその影響を排除して安定な雑音抑圧処理を行えるように、ＳＮＲを基本としながら、ゲイン値決定の変数を調整することでそのような影響を抑えることも可能である。
【００４０】
このような調整は、前記帯域別ゲイン値の決定に際し、前記周波数帯域ごとに信号のパワーを求め、この帯域パワーをもとに帯域別の雑音パワーを推定する雑音パワー推定ステップと；前記帯域パワー及び帯域別雑音パワーのうちの少なくとも一方について、複数のフレーム期間に亘りパワーの最小値を検出する最小値検出ステップと；前記周波数帯域ごとにその帯域パワーと前記最小値検出ステップにより検出された帯域別最小値との差を求める帯域別最小値決定ステップから求められた差をもとに周波数帯域別の雑音抑圧量を決定することにより行うことができる。
【００４１】
さらに、フレームごとに異なる帯域共通の調整値を生成する調整値を用い、前記周波数帯域ごとに、前記帯域別最小値と前記調整値を加えた値とその帯域パワーとの差を求め、この差をもとに周波数帯域別の雑音抑圧量を決定することにより行うこともできる。
【００４２】
この調整値は、雑音区間においては、前記帯域別最小値間の平均値と前記帯域別雑音パワー間の平均値との差に基づいて帯域共通の調整値を決定し；音声区間においては、１フレームにおける複数の帯域パワーの中の最小値と複数の帯域別最小値の中の最大値との差に基づいて帯域共通の調整値を決定することで得ることができる。
【００４３】
なお、音声フレームと雑音フレームとの判定には：周波数帯域ごとに信号のパワーを求め、この帯域パワーをもとに帯域別の雑音パワーを推定する雑音パワー推定ステップと；前記周波数帯域ごとに帯域別雑音パワーと帯域パワーとの差を求め、これらの帯域別差を所定のしきい値と比較する比較ステップと；周波数順に配列された前記各帯域別差のうち隣接する複数の帯域の帯域別差がしきい値を超えると判定された場合に、これらの帯域別差を所定の重み付けを行った上で相互に加算する加算ステップと；この加算ステップにより得られた帯域別差の加算値に基づいて、前記入力信号について音声区間か雑音区間であるかを判定する判定ステップとからなる判定方法を採用することができる。
【００４４】
この加算ステップでは、各帯域別差に対し、周波数が高くなるに従い重みが小さくなるような重み付けを行うことができ、前記判定ステップでは、前記加算値に基づいて、前記入力信号について音声区間か、雑音区間か或いは両区間の中間領域である過渡区間かを判定することが可能である。
【００４５】
この様な本発明は、ＡＣＥＬＰ，ＥＶＲＣ，ＥＦＲ，ＡＭＲなどの各種音声符号化方式を用いたディジタル音声符号化方法を採用する携帯電話など電子機器に適応できる。すなわち、音声信号入力部（マイクなどの直接入力手段，電子ファイルなどからの信号送出でも構わない）と、音声符号化部とを有する電子機器において、音声信号入力部の音声信号を受け、上述のノイズ抑制方法によりノイズ抑制された信号を音声符号化部へ供給するノイズキャンセラと、前述のボイススイッチとを具備した電子機器である。
【００４６】
なおボイススイッチ，ノイズキャンセラは、例えば音声符号化などと同様にＤＳＰ内の信号処理により実行することが可能である。
【００４７】
【発明の実施の形態】
本発明の実施態様を説明する。
【００４８】
図１は本発明の実施態様を示す電子機器の概略ブロック図である。
【００４９】
受話信号（１）は受話信号増幅器（２）を介してスピーカなどの音声出力器（３）に供給される。またマイクなどの音声入力器（４）からの入力信号はノイズキャンセラＮＣ（５）を介して送話信号（９）として送り出される。
【００５０】
ボイススイッチＶＳ（６）は、受話信号から受話信号中の音声信号の有無を検出し、またノイズキャンセラＮＣ（５）から送話信号中の音声信号の有無を示す信号ｓｐ［音声検出フラグ］を受け取る。その結果からダブルトーク判定部ＤＴＤ（７）で送受話信号のどちらに音声信号が含まれているかを判断する。
【００５１】
この判断結果を受けてＬＯＳＳ決定部（８）では、設定に基づき送信系／受信系に挿入するＬＯＳＳを決定し、増幅器（２）にはＲ＿ｌｏｓｓを、ノイズキャンセラＮＣ（５）にはＳ＿ｌｏｓｓを通知する。
【００５２】
この設定としては、例えば、一定量のＶＳ＿ｌｏｓｓ（例えば−１２ｄＢ）を送話／受話のどちらか一方に挿入するものとする。
【００５３】
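There are four cases depending on the presence or absence of voice, and the loss is set as follows.
(1) Receive (no voice) / transmit (no voice): S_loss = VS_loss
(2) Receive (voice) / transmit (no voice): S_loss = VS_loss
(3) Receive (no voice) / transmit (voice): R_loss = VS_loss
(4) Receive (voice) / transmit (voice): S_loss = VS_loss
The attenuation on the side where VS_loss is not inserted is set to 0.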
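The four-case table above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `decide_loss` and the use of plain floats in dB are hypothetical, and the value of -12 dB is the example value given in the text.

```python
VS_LOSS_DB = -12.0  # example value from the text (dB)


def decide_loss(rx_voice: bool, tx_voice: bool, vs_loss: float = VS_LOSS_DB):
    """Return (r_loss, s_loss) in dB for one frame.

    Hypothetical helper: the loss goes on the receive side only when
    the transmit side alone carries voice (case 3); in every other
    case (1, 2, 4) it goes on the transmit side.
    """
    if tx_voice and not rx_voice:
        return vs_loss, 0.0  # case (3): R_loss = VS_loss
    return 0.0, vs_loss      # cases (1), (2), (4): S_loss = VS_loss
```

Only one of the two returned values is ever non-zero, which matches the rule that the side without VS_loss gets an attenuation of 0.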
【００５４】
この設定は適宜変更することが可能である。例えば、ボイススイッチの切り替え時（ＶＳ＿ｌｏｓｓから“０”への切り替え）には急激に減衰量が変化するのでいわゆるスイッチ感がユーザーに感じられる。これを低減するため、切り替え時に減衰量に傾きをつけることも可能である。
【００５５】
このボイススイッチの決定する信号減衰量（Ａ）とノイズキャンセラの決定する信号減衰量（Ｂ）とから送話／受話信号の信号減衰量を決定する。
【００５６】
本実施例では、受話信号の信号減衰量はボイススイッチの信号減衰量（Ａ）となり、送話信号の信号低減量はボイススイッチの信号低減量（Ａ）で制御されたノイズキャンセラの最終信号減衰量（Ｃ）となる。
【００５７】
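That is, when the signal attenuation that the noise canceller calculates for noise reduction is greater than the attenuation supplied by the voice switch, the noise canceller's attenuation is used; when the noise canceller's attenuation is smaller than the attenuation supplied by the voice switch, the voice switch's attenuation is used. The larger of the two attenuations is applied, rather than their sum.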
【００５８】
従って、送話信号の最終信号低減量は、ボイススイッチの提供する信号減衰量を上限とした形で推移することになる。
【００５９】
送話側の音声信号の有無もボイススイッチで送話信号を取り込んで同様の処理を施すことで判定してもよいが、本実施形態では送話信号中の音声の有無はノイズキャンセラの判断を用いる設定にしている。
【００６０】
次にノイズキャンセラの動作について説明する。
【００６１】
ノイズキャンセラは例えばＤＳＰ（Digital Signal Processor）により実現されるものであり、その処理プログラムはノイズキャンセラ内のメモリまたは制御回路に付属するメモリに格納されている。図２はこの処理プログラムにより実現される機能構成を示すノイズキャンセラのブロック図である。
【００６２】
マイクなどの音声入力部からの音声信号がＡ／Ｄ変換されたデジタル音声信号は、まずフレーム分割部２１に入力される。フレーム分割部は、例えば１２８サンプルに整えられたフレームを出力する（フレーム分割ステップ）。このときディジタル送話信号を例えば８０サンプルのフレームに分割した後、ウインドウがけを行うことによりフレーム端をオーバーラップさせても構わない。このディジタル送話信号フレームを高速フーリエ変換部（ＦＦＴ）２２に入力する。
【００６３】
ＦＦＴ２２の出力はノイズキャンセラの最終信号低減量（Ｃ）を決定する最終低減量決定部３４からの出力に基づき乗算器２３にてノイズ抑制がなされ、ＩＦＦＴ部２４にて逆ＦＦＴをかけフレーム合成部２５にてフレームに戻し、送信信号として出力される。
【００６４】
また、ＦＦＴ２２の出力は帯域パワー計算部２６に入力され、その出力は有意値計算部２７及び帯域別ゲイン決定部３３に供給される。また更新判定部３１の判定結果により雑音リーク積分値更新部３２にもその出力が供給される。
【００６５】
有意値計算部２７の出力は更新判定部３１及び音声重み計算部２８に供給される。
【００６６】
雑音リーク積分値更新部３２の出力は、有意値計算部２７，音声重み計算部２８及び帯域別ゲイン決定部３３に供給される。
【００６７】
音声重み計算部２８の出力は雑音最小値推定部２９及び帯域別ゲイン決定部３３に供給されるとともに、ボイススイッチＶＳにＳＰ値（音声検出フラグ）として出力される。
【００６８】
帯域別ゲイン決定部３３は、雑音リーク積分値更新部３２，雑音最小値推定部２９及び音声重み計算部２８からの出力を入力とし、最終低減量決定部３４へ信号を出力する。
【００６９】
最終低減量決定部３４は、ボイススイッチＶＳからのＳ＿ｌｏｓｓと帯域別ゲイン決定部３３からの出力を入力とし、乗算器２３へゲインを出力する。
【００７０】
以下各ブロックにおける動作を説明する。
【００７１】
高速フーリエ変換部ＦＦＴ２２は、入力されたディジタル送話信号フレームに対し高速フーリエ変換処理を行い、低域から高域まで順に１６帯域（ｋ＝０，１，２，・・・１５）に周波数分割された変換係数を得る。この変換係数は各帯域において同じである必要はない。この帯域分割された変換係数を、帯域パワー計算部２６に出力する（周波数帯域分割ステップ）。
＜帯域パワー計算＞
帯域パワー計算部２６は、各帯域ごとにエネルギ（変換係数の二乗平均値）を求めて対数をとり、帯域パワーchannel＿power(m,k)、［mはフレーム番号，ｋは帯域番号（０〜１５）］を出力する（帯域別パワー算定ステップ）。この帯域パワーは有意値計算部２７に出力される。
＜有意値計算＞
有意値計算部２７では、後述する雑音リーク積分値更新部３２から出力される雑音リーク積分値noise＿power(m,k)と、上記帯域パワーchannel＿power(m,k)との差tmpをもとめ、帯域別の差tmpを所定のしきい値と比較する。周波数順に配列された上記帯域別の差tmpの内、隣接する複数の帯域の帯域別差tmpがしきい値を超えると判定された場合に、これらの帯域別差tmpに所定の重み付けを行った上で相互に加算する。この重み付け後の値suby(m,k)の条件付き総和（隣接する複数の帯域の帯域別差tmpがしきい値を超えると判定された場合）を有意値ｙとして出力する（有意値算定ステップ）。
【００７２】
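The average of the significance value y is also output (y_average; an estimate obtained by leak integration can be substituted, calculated for example with the expression below).
【００７３】
y(m): significance value, the conditional sum of suby(m,k)
y_average(m) = y_average(m-1) × 0.9 + y(m) × 0.1
FIG. 3 is a flowchart showing the processing procedure of the significance value calculation unit 27. The flow that outputs the significance value y is described with reference to FIG. 3.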
【００７４】
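After the frame number is reset to its initial value m = 0 in step 3a, step 3b increments the frame number m and initializes the significance value y, the band number k, and the run-length flag (the count of consecutive bands whose per-band difference tmp exceeds the threshold) to 0.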
【００７５】
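Next, in step 3c, for band k = 0, the difference tmp between the band power and the noise leak-integration value, and the weighted value suby(m,k) of this per-band difference, are calculated as follows.
【００７６】
tmp = channel_power(m,k) − noise_power(m,k)
suby(m,k) = {200 − (k−1)²} / 100 × (tmp − 1)
Here {200 − (k−1)²} is the weighting coefficient. In this example it is set to become smaller as the band frequency rises, but it can be changed as appropriate.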
【００７７】
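Once the per-band difference tmp for band k = 0 has been calculated, the significance value calculation unit 27 compares it with a threshold (for example 1) in step 3d. If the threshold is exceeded, the band may contain voice, so processing passes through steps 3e and 3g to step 3i, where the run-length flag is set to 1. Step 3k then increments the band number to k = 1, and processing returns to step 3c to handle band k = 1 in the same way.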
【００７８】
ここで帯域ｋ＝１においても帯域ｋ＝０に引き続いて帯域別差tmpがしきい値を超えたとする。連続数flagは既に１なのでステップ３ｅからステップ３ｆに移行して、ここで
y＝y＋suby(m,k−1)
なる演算を実行する。そして連続数flagを２に設定し、ステップ３ｇを経てステップ３ｈに移行して、下記演算を実行する。
【００７９】
y＝y＋suby(m,k)
ついでステップ３ｋで帯域番号ｋを更にインクリメントしｋ＝２として、ステップ３ｃに戻り、帯域ｋ＝２についての処理を実行する。
【００８０】
以降同様に、隣接する帯域の帯域別差tmpが連続してしきい値を超える毎に、その帯域のsuby(m,k)が一つ前の帯域までに得られた有意値ｙに順次加算され、帯域別差tmpの重み付け加算値ｙが求められる。
【００８１】
なお、いずれかの帯域ｋ＝ｉにおいて、帯域別差tmpがしきい値以下になると、有意値計算部２７はステップ３ｄからステップ３ｊに移行し、連続数flagを０にリセットする。
【００８２】
こうして１フレームを構成する１６個の全ての帯域（ｋ＝０〜１５）について処理が完了すると、有意値計算部２７は、ステップ３ｍからステップ３ｎに移行し、有意値ｙと、帯域ごとに算出した重み付け後の帯域別差suby(m,k)（k＝0〜15）を夫々出力する。
【００８３】
このようにして各フレーム毎に有意値ｙが求められ、音声フレームであるか雑音フレームであるかの判定に供される。
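The flow of FIG. 3 can be sketched as follows: per-band differences are weighted, and a band contributes to y only when it belongs to a run of two or more adjacent bands above the threshold. This is a minimal sketch, not the patented implementation; the function name `significance_value` and the list-based interface are assumptions, while the weight {200 − (k−1)²}/100 and the threshold of 1 are the example values from the text.

```python
def significance_value(channel_power, noise_power, threshold=1.0):
    """Return (y, suby) for one frame of 16 log-domain band powers.

    `run` plays the role of the run-length flag in FIG. 3:
    0 = previous band below threshold, 1 = one band above so far,
    2 = two or more consecutive bands above.
    """
    y = 0.0
    run = 0
    suby = []
    for k, (cp, npw) in enumerate(zip(channel_power, noise_power)):
        tmp = cp - npw
        suby.append((200 - (k - 1) ** 2) / 100 * (tmp - 1))
        if tmp > threshold:
            if run == 1:
                # Second band of a run: add the first band retroactively.
                y += suby[k - 1]
                run = 2
            if run == 2:
                y += suby[k]
            else:
                run = 1
        else:
            run = 0
    return y, suby
```

An isolated band above the threshold therefore adds nothing to y, which keeps narrow-band noise peaks from being counted as voice.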
【００８４】
また有意値計算部２７では雑音パワー強制更新を判定する有意区間のカウントをも行う。この処理を図４のフローチャートに基づいて説明する。
【００８５】
まず有意値ｙ(m)の平均値y＿average(m)を求める。
【００８６】
In step 4a the initial values are set to frame number m = 0, sum_average(0) = 0.1, y_average(0) = 10, and counter(0) = 0; step 4b then increments the frame number m and reads in the significance value y and suby(m,k).
【００８７】
ついでステップ４ｃで有意値ｙの平均値を算出する。平均値はメモリ容量，計算量などの関係から適宜期間を設定することができるが（例えば０．１〜０．３秒くらいの平均をとれば十分であるので、過去２０フレーム分を加算して平均を求めるなど）、一般的にはリーク積分を用い下記のように推定算出する。平均値の求め方はリーク積分以外の手法を用いても良いことは言うまでもない。
【００８８】
y＿average(m)=y＿average(m-1)×0.9＋y(m)×0.1
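Next, step 4d obtains the sum of absolute differences, sum, between suby(m,k) and suby(m-1,k) (the per-band significance sum step), and step 4e divides it by the average sum_average of this sum to calculate the ratio r (the significance normalization step).
【００８９】
sum(m) / sum_average(m−1)
This value may be used directly as r, but to remove outliers it is compared with r(m−1) multiplied by a fixed decay rate (for example 0.99), and the larger of the two is adopted as r(m).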
【００９０】
この比率ｒは有意値区間算定のカウンタ加算の判定基準となるものであり、例えば上限は８に設定される。従って、ステップ４ｆでｒ（ｍ）が８を超えていると判定されるとステップ４ｇでｒ（ｍ）＝８に設定し直される。
【００９１】
ついでステップ４ｈでsum＿averageが更新される。この平均値もメモリ容量，計算量などの関係から適宜期間を設定することができるが（例えば０．１〜０．３秒くらいの平均をとれば十分であるので、過去２０フレーム分を加算して平均を求めるなど）、一般的にはリーク積分を用い下記のように推定算出することができる。平均値の求め方はリーク積分以外の手法を用いても良いことは言うまでもない。
【００９２】
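sum_average(m) = sum_average(m−1) × 0.9 + sum(m) × 0.1
An estimate of the standard deviation may also be used for sum_average. In that case too, an estimate can be obtained with the leak integration below and used in its place.
【００９３】
sum_average(m) = sqrt(sum_average(m−1)² × 0.9 + sum(m)² × 0.1)
Next, the counter of the significant interval, counter(m), is calculated.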
【００９４】
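When y > 10, counter(m) < 100, and r(m) ≤ THR, counter(m) is incremented by 1. When y ≤ 10 or r(m) > THR, counter(m) is reset to 0. When y > 10 but the counter has already reached 100, it is held at that value.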
【００９５】
ＴＨＲは固定値でも構わないし、y＿averageによって変化させることも可能である。本実施形態では、下記の式で変化するＴＨＲを採用している。
【００９６】
THR = 1.7 + (y_average − 40) / 200, limited to 1.7 ≤ THR ≤ 2.0:
y_average > 100: THR = 2.0
y_average ≤ 40: THR = 1.7
40 < y_average ≤ 100: THR = 1.7 + (y_average − 40) / 200
従ってステップ４ｉでy＿average(m)が１００を超えると判定された場合はステップ４ｊにてＴＨＲ＝２．０に設定され、ステップ４ｋでy＿average(m)が４０を超えると判定された場合はステップ４ｌでＴＨＲが上記式の可変値に設定される。その他の場合はステップ４ｍにてＴＨＲ＝１．７に設定される。
【００９７】
ステップ４ｎで有意値ｙが１０を超えていると判定され、ステップ４ｏでカウンタcounterが１００未満と判定され、ステップ４ｐで比率ｒがＴＨＲ以下と判定された場合は、ステップ４ｑでカウンタcounterが加算され、それ以外の場合はステップ４ｒにてカウンタcounterは０にリセットされる。
【００９８】
同様にステップ４ｎで有意値ｙが１０以下と判定された場合はステップ４ｓでカウンタcounter(m)は０にリセットされ、ステップ４ｏでcounterが１００以上（すなわち１００）の場合はステップ４ｔでcounter(m)＝counter(m−1)に据え置かれる。
【００９９】
以上の処理で各フレームｍに対して、counter(m)とy＿average(m)が出力されることになる（ステップ４ｕ）。
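The counting logic of FIG. 4 can be sketched in three small functions: the normalized ratio r, the variable threshold THR, and the counter update. The function names are hypothetical, and the constants (decay 0.99, cap 8, limits 1.7 and 2.0, counter ceiling 100) are the example values from the text. The order of the tests in `update_counter` follows steps 4n, 4o, and 4p as described.

```python
def update_ratio(sum_m, sum_average_prev, r_prev, decay=0.99, cap=8.0):
    """r(m): normalized deviation, smoothed by a decaying maximum and capped."""
    r = max(sum_m / sum_average_prev, r_prev * decay)
    return min(r, cap)


def thr(y_average):
    """THR = 1.7 + (y_average - 40) / 200, limited to [1.7, 2.0]."""
    return min(2.0, max(1.7, 1.7 + (y_average - 40.0) / 200.0))


def update_counter(prev_counter, y, r, y_average):
    """counter(m) from counter(m-1), following steps 4n to 4t."""
    if y <= 10:
        return 0                 # step 4s: reset
    if prev_counter >= 100:
        return prev_counter      # step 4t: hold at the ceiling
    if r <= thr(y_average):
        return prev_counter + 1  # step 4q: count this frame
    return 0                     # step 4r: reset
```

A counter that reaches 100 signals a long stretch of frames with a large but stable significance value, which is the condition used for the forced noise update.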
＜更新判定＞
これらの出力（counter(m),suby(m,k),y(m),y＿average(m)）を受け更新判定部３１で帯域別雑音パワー値noise＿power(m,k)の更新の有無を判定し、雑音リーク積分値更新部３２で帯域別雑音パワー値を更新する。
【０１００】
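The significance value y is about 20 to 30 for ordinary speech, and is roughly y < 15 when noise estimation is working well. Therefore, when y < 15, the update is performed, for example, with the following expression (first noise power estimate update step).
noise_power(m+1,k) = noise_power(m,k) × 0.9 + channel_power(m,k) × 0.1
k = 0, 1, ..., 15
An ordinary noise power update such as that specified in IS127 [a US standard for variable-rate speech coding: "Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems" (TIA IS127)] may also be used.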
【０１０１】
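If y cannot be calculated accurately for some reason, a forced update is carried out based on the counter value (counter) (second noise power estimate update step). For example, when counter(m) ≥ 100 and y < y_average(m) + 5, the update follows the expression above.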
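The two update conditions and the leak-integration update can be sketched as below. The function names are hypothetical; the thresholds (15, 100, +5) and the coefficient 0.9 are the example values from the text.

```python
def should_update(y, counter, y_average):
    """True when either noise-update condition holds for this frame."""
    normal = y < 15                                  # first update step
    forced = counter >= 100 and y < y_average + 5    # second (forced) step
    return normal or forced


def update_noise_power(noise_power, channel_power, alpha=0.9):
    """Leak integration of the per-band noise power toward the band power."""
    return [alpha * n + (1.0 - alpha) * c
            for n, c in zip(noise_power, channel_power)]
```

With alpha = 0.9, each update moves the estimate one tenth of the way toward the current band power, so a single misjudged frame has limited effect.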
【０１０２】
続いて帯域別ゲイン決定部３３において帯域別のゲインを決定する。このとき有意値計算部において算出された有意値(y)，帯域別有意値(suby)などを参照して、各帯域毎に設定される。
＜音声重み計算＞
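Taking the significance value y output from the significance value calculation unit, the voice weight calculation unit calculates the voice weight sp used to determine the noise suppression gain. The voice weight sp is a number in the range 0 ≤ sp ≤ 6 that expresses how much voice a frame contains: sp = 0 denotes a noise interval and sp = 6 a voice interval. These values and the number of steps can be set as appropriate.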
【０１０３】
このｓｐ値が前述のボイススイッチに供給され、ボイススイッチにおける送話信号中の音声の有無の判定に供される。例えば、ｓｐ＝０のときは音声なしと判断し、それ以外のときは音声有りと判断することができる。
【０１０４】
図５は、この音声重み計算部２８における音声重みｓｐの計算手順とその処理内容を示すフローチャートである。
【０１０５】
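First, step 5a resets the frame number m to 0, and step 5b increments the frame number m. Next, step 5c compares the weighted sum y with an arbitrary threshold of 13; if y < 13, the frame is judged to be a noise frame and processing moves to step 5d, where the voice weight sp(m) is set to
sp(m) = sp(m−1) − 0.5
As described later, the final step applies
sp(m) = MAX(sp(m), 0)
so that the minimum is 0; sp(m) therefore converges to 0 if noise frames continue.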
【０１０６】
一方、ｙ≧１３だった場合には、音声若しくは過渡期のフレームであり、ステップ５ｅに移行して仮の音声重みｚ
ｚ＝（ｙ−１３）×１．５＋１
を計算する。
【０１０７】
まず、ステップ５ｆにおいて
ｓｐ（ｍ−１）≦０．５
を判定する。すなわち、１フレーム前の音声重みｓｐ（ｍ−１）が０．５以下と十分小さいかどうかを判断し、雑音フレームだったかどうかを判定する。
【０１０８】
１フレーム前のフレームが雑音フレームと判断されていた場合、すなわちｓｐ（ｍ−１）が０．５以下の場合はステップ５ｇに移行し、ここで現フレームの音声重みｓｐ（ｍ）を上記仮の音声重みｚに設定する。
【０１０９】
このケースは、雑音から音声への切り替わりの時点であり、語頭が切れないように、雑音を抑制して音声をはっきりたち上げる必要がある。従って音声重みとして大きい値を取るように設定されることになる。
【０１１０】
これに対し、１フレーム前の音声重みｓｐ（ｍ−１）が雑音フレームではなかった場合（ｓｐ（ｍ−１）＞０．５）には、ステップ５ｈに移行して、
ｚ＞ｓｐ（ｍ−１）＋０．５
を判定する。ｚが（ｓｐ（ｍ−１）＋０．５）より大であればステップ５ｉで現フレームの音声重みｓｐ（ｍ）を（ｓｐ（ｍ−１）＋０．５）に設定する。
【０１１１】
このケースは、音声フレームの過渡期と判断されている時点であり、連続性を重視し、前フレームからｚの上昇を０．５に抑えていることになる。
【０１１２】
一方、ｚが（ｓｐ（ｍ−１）＋０．５）以下であればステップ５ｊに移行し、
ｚ＜ｓｐ（ｍ−１）−０．５
を判定し、ｚが（ｓｐ（ｍ−１）−０．５）より小であればステップ５ｋで現フレームの音声重みｓｐ（ｍ）を（ｓｐ（ｍ−１）−０．５）に設定する。
【０１１３】
このケースは、やはり音声フレームの過渡期と判断されている時点であり、連続性を重視し、前フレームからｚの下降を０．５に抑えていることになる。
【０１１４】
If z is at least (sp(m−1) − 0.5), processing moves to step 5m and the voice weight of the current frame is set to sp(m) = z.
【０１１５】
以上のステップを経て、ｓｐ（ｍ）＝ｚ，ｓｐ（ｍ−１）±０．５の３種類の値のいずれかに設定され、最終的に
ｓｐ（ｍ）＝ＭＩＮ（ｓｐ（ｍ），６）
ｓｐ（ｍ）＝ＭＡＸ（ｓｐ（ｍ），０）
により、ｓｐ（ｍ）＝０〜６の値が決定される。
【０１１６】
すなわち上記ステップ５ｆからステップ５ｍにおいて、現フレームで算出した仮の音声重みｚが、１つ前のフレームで設定した音声重みｓｐ（ｍ−１）を考慮して補正され、ステップ５ｎでｓｐ（ｍ）として出力され、ステップ５ｂに戻り全てのｍに対してｓｐ（ｍ）が求められる。
【０１１７】
このように求めた音声重みｓｐ（ｍ）を使用することで、フレーム間の連続性を考慮した音声／雑音／過渡域の調整を行うことができる。
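The voice weight update of FIG. 5 amounts to a rate-limited tracker with a fast attack from noise. A minimal sketch, assuming a hypothetical function name `update_sp` and the example constants from the text (threshold 13, slope 1.5, step 0.5, range 0 to 6):

```python
def update_sp(y, sp_prev):
    """Return sp(m) from the significance value y and sp(m-1)."""
    if y < 13:
        sp = sp_prev - 0.5            # noise frame: decay toward 0
    else:
        z = (y - 13) * 1.5 + 1        # provisional voice weight
        if sp_prev <= 0.5:
            sp = z                    # noise-to-voice onset: jump at once
        elif z > sp_prev + 0.5:
            sp = sp_prev + 0.5        # limit the rise to 0.5 per frame
        elif z < sp_prev - 0.5:
            sp = sp_prev - 0.5        # limit the fall to 0.5 per frame
        else:
            sp = z
    return max(0.0, min(sp, 6.0))     # clamp to the range 0..6
```

The immediate jump at onset keeps the start of a word from being cut off, while the 0.5 step limit keeps the weight continuous between frames once voice is under way.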
【０１１８】
上記音声重み計算部２８により求められた音声重みｓｐ（ｍ）は、雑音最小値推定部２９及び帯域別ゲイン決定部３３に入力される。
【０１１９】
なおこのｓｐ値は音声検出フラグとしてボイススイッチＶＳにも供給される。これをもとにＶＳ側で、例えばｓｐ＝０［音声なし］／ｓｐ＞０［音声あり］のような判断することになる。
＜雑音最小値推定＞
雑音最小値推定部２９は、上記音声重みｓｐがｓｐ＝０となる１００フレームの期間ごとに、各帯域における雑音のリーク積分値noise＿power(m,k)の最小値を調べる。そして、この最小値を次の１００フレームの期間において、雑音最小値noise＿min(m,k)として使用する。またそれと共に、各帯域の雑音最小値の帯域間平均値min＿allを求める。
【０１２０】
図６及び図７は、この雑音最小値推定部２９において実行される最小値推定処理の手順と内容を示すフローチャートである。
【０１２１】
同図において、雑音最小値推定部２９は先ずステップ６ａで、フレーム番号ｍをｍ＝０にリセットすると共に、フレームカウンタの値をｆｃ＝９６に、雑音最小値をnoise＿min(k)＝３６に、帯域をk ＝０，・・・,１５にそれぞれ初期設定する。
【０１２２】
さらに
noise＿min＿h(k)＝MAX(noise＿power(m,2k)，noise＿power(m,2k+1))，
ｋ＝０，・・・，７
雑音最小値の帯域間平均min＿allをnoise＿min＿h(n)：ｎ＝０，１，・・・，７の値の合計値の平均値である
min＿all＝Σ noise＿min＿h(n)/8 ｎ＝０〜７
にそれぞれ初期設定する。
【０１２３】
すなわち、隣接する帯域で大きいノイズパワーを有する値をとり、その平均値をmin＿allと設定する。
【０１２４】
次に雑音最小値推定部２９は、ステップ６ｂでフレーム番号ｍをインクリメントしたのち、ステップ６ｃで上記音声重みがｓｐ＝０であるか否か、つまり雑音フレームであるか否かを判定する。
【０１２５】
そして、雑音フレームであれば、ステップ６ｂに戻ってフレーム番号ｍをインクリメントし、上記ステップ６ｃによる雑音フレームの判定を行う。ｓｐ＝０ではないと判定された場合、すなわち、音声フレーム又は過渡域フレームが検出されると、雑音最小値推定部２９はステップ６ｄに移行してここでフレームカウンタｆｃをインクリメントすると共に、帯域ｋ＝０を選択する。
【０１２６】
そして、ステップ６ｅで
ｘ＝MAX(noise＿power(m,2k)，noise＿power(m,2k+1))
に設定したのち、ステップ６ｆに移行して
noise＿min＿h(k)＞ｘ
であるか否か判定する。
【０１２７】
noise＿min＿h(k)＞ｘであればステップ６ｇに移行してここで雑音最小値をnoise＿min＿h(k)＝ｘに設定する。そして、ステップ６ｈに移行する。
【０１２８】
これに対しnoise＿min＿h(k)≦ｘであれば、そのままステップ６ｈに移行して次の帯域ｋ＝１を選択し、帯域ｋ＝８に達するまでは上記ステップ６ｅ〜ステップ６ｇによる雑音最小値noise＿min＿h(k)の設定処理を繰り返す。
【０１２９】
そして、帯域ｋ＝８に達すると、雑音最小値推定部２９はステップ６ｊでフレームカウンタｆｃが１００に達したか否かを判定する。そして、１００フレームに達するまではステップ６ｂに戻って次のフレームを選択し、この選択したフレームについて上記ステップ６ｃ〜ステップ６ｉによる処理を繰り返す。
【０１３０】
一方、上記１００フレームに対する処理を終了すると、雑音最小値推定部２９はステップ７ａに移行し、ここで雑音最小値の帯域間平均(min＿all)をnoise＿min＿h(n)：ｎ＝０，１，・・・，７の値の合計値の平均値として下記のように算出する。
【０１３１】
min＿all＝Σ noise＿min＿h(n)/8 ｎ＝０〜７
またそれと共に、noise＿min(0)及びnoise＿min(1)をそれぞれ
noise＿min(0)＝noise＿min＿h(0)
noise＿min(1)＝0.75×noise＿min＿h(0)＋0.25×noise＿min＿h(1)
とすると共に、帯域をｋ＝１とする。
【０１３２】
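The noise minimum estimation unit 29 then moves to step 7b and, from the eight noise minimums obtained earlier for k = 0 to 7, calculates the noise minimums of the intermediate bands (bands 2 to 13, for k = 1, ..., 6) as follows.
noise_min(2k) = 0.75 × noise_min_h(k) + 0.25 × noise_min_h(k-1)
noise_min(2k+1) = 0.75 × noise_min_h(k) + 0.25 × noise_min_h(k+1)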
【０１３３】
そして、以上の演算が終了すると、雑音最小値推定部２９はステップ７ｄからステップ７ｅに移行し、ここで
noise＿min(14)＝0.75×noise＿min＿h(7)＋0.25×noise＿min＿h(6)
noise＿min(15)＝noise＿min＿h(7)
を算出する。
【０１３４】
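That is, in steps 7a to 7e the noise minimum estimation unit 29 interpolates the 16 values noise_min(k) from the eight values noise_min_h(k).
【０１３５】
Having calculated the 16 values of noise_min, the noise minimum estimation unit 29 in step 7f resets the frame counter fc to 0, resets the noise minimums to noise_min_h(k) = 36, and resets the band to k = 0, ..., 7.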
【０１３６】
そして、ステップ７ｇにおいて、先に算出した雑音最小値の帯域間平均値min＿all、及び雑音最小値noise＿min(m,k)，ｋ＝０，・・・，１５を出力し、ステップ６ｂに戻って次のフレーム（ｍ＝ｍ＋１）について同様の雑音最小値及びその帯域間平均値の算出処理を繰り返す。
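The interpolation of steps 7a to 7e, which expands eight paired-band minimums into 16 per-band minimums, can be sketched as follows. The function name `interpolate_noise_min` is hypothetical; the 0.75 / 0.25 weights and the edge handling follow the expressions in the text.

```python
def interpolate_noise_min(noise_min_h):
    """Expand 8 paired-band minimums into 16 per-band noise minimums."""
    if len(noise_min_h) != 8:
        raise ValueError("expected 8 paired-band values")
    h = noise_min_h
    out = [0.0] * 16
    out[0] = h[0]                                   # lowest band: copy
    out[1] = 0.75 * h[0] + 0.25 * h[1]
    for k in range(1, 7):                           # bands 2..13
        out[2 * k] = 0.75 * h[k] + 0.25 * h[k - 1]
        out[2 * k + 1] = 0.75 * h[k] + 0.25 * h[k + 1]
    out[14] = 0.75 * h[7] + 0.25 * h[6]
    out[15] = h[7]                                  # highest band: copy
    return out
```

Each output band leans three quarters on its own pair and one quarter on the neighboring pair, so the interpolated spectrum stays smooth across pair boundaries.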
＜帯域別ゲイン決定＞
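Based on the band power channel_power(m,k) output from the band power calculation unit 26, the noise power noise_power(m,k) output from the noise leak-integration update unit 32, the voice weight sp(m) output from the voice weight calculation unit 28, and the noise minimum noise_min(m,k) output from the noise minimum estimation unit 29, the per-band gain determination unit 33 determines the per-band gain gain(m,k).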
【０１３７】
先ず雑音リーク積分値noise＿power(m,k)の帯域平均値noise＿allを、noise＿power(m,k)：ｋ＝０，１，・・・，１５の値の合計値の平均値として
noise＿all= Σ noise＿power(m,k)/16 ｋ＝０〜１５
により求める。
【０１３８】
Next, the band minimum min_band of the band power channel_power(m,k) and the band maximum max_band of the noise minimum noise_min(m,k) are obtained as
min_band = MIN(channel_power(m,k), k = 2, ..., 11)
max_band = MAX(noise_min(m,k), k = 0, ..., 15)
【０１３９】
次に、帯域共通の調整値ｍｄを
md＝（noise＿all−min＿all）×（１−sp/6）＋（min＿band−max＿band）×sp/6
により決定する。この式によると、
sp=0すなわち雑音区間のとき、md=noise＿all−min＿all
sp=6すなわち音声区間のとき、md=min＿band−max＿band
となり、過渡域はこれらの中間の値をとることがわかる。
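The adjustment value md is a linear blend between the noise-interval value and the voice-interval value, weighted by sp/6. A minimal sketch with a hypothetical function name; the inputs are the quantities defined in the preceding paragraphs.

```python
def adjustment_md(noise_all, min_all, min_band, max_band, sp):
    """Band-common adjustment value md for one frame (sp in 0..6)."""
    w = sp / 6.0
    return (noise_all - min_all) * (1.0 - w) + (min_band - max_band) * w
```

For example, with noise_all = 20, min_all = 15, min_band = 30, and max_band = 18, md is 5 in a noise interval (sp = 0), 12 in a voice interval (sp = 6), and 8.5 midway (sp = 3).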
【０１４０】
雑音フレームの場合、及び音声フレームの場合の周波数対パワー特性の一例を、それぞれ図８及び図９に示す。
【０１４１】
雑音フレームでは、図８に示すように、帯域パワーは雑音最小値に近くなる。雑音最小値に調整値を加えた値は、雑音最小値のスペクトル特性はそのままで平均値が雑音パワーの平均値noise＿allに変更されたものとなる。
【０１４２】
これに対し音声フレームの場合には、図９に示すように、雑音最小値に調整値を加えた値は、最小値のスペクトル特性はそのままで帯域の最大値が帯域パワーの最低値と一致するよう調整されることになる。
【０１４３】
帯域別ゲインgain(m,k)は、帯域パワーchannel＿power(m,k)と、雑音最小値noise＿min(m,k)と、調整値とから次のように決定される。
【０１４４】
まず、
tmp=channel＿power(m,k)−noise＿min(m,k)−md−1.625
と設定する。
【０１４５】
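Next, the way gain(m,k) (gain(m,k) ≤ 0) is determined is switched according to the voice weight sp.
(1) sp > 0, that is, a voice or transition frame:
gain(m,k) = {sqrt(1.4 + (0.7 × tmp)²) + 0.7 × tmp − 10} × 2
(2) sp = 0, that is, a noise frame:
gain(m,k) = {sqrt(1.4 + (0.03125 × tmp)²) + 0.03125 × tmp − 10} × 2
This is calculated independently for each of k = 0, ..., 15.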
【０１４６】
このｇａｉｎを決定する関数形は適宜設定可能である。ｔｍｐの値の小さい領域で音声フレームの方が雑音フレームより下回っていればよい。
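The two gain curves can be compared with a short sketch. Two points here are assumptions rather than statements from the text: the gain is treated as a value in dB (it is later compared with S_loss, which is given in dB), and the result is clamped at 0 to honor the stated constraint gain(m,k) ≤ 0, since the raw expression becomes positive for large tmp. The function names are hypothetical.

```python
import math


def band_gain_db(tmp, sp):
    """Per-band gain for one band; slope 0.7 for voice, 0.03125 for noise."""
    a = 0.7 if sp > 0 else 0.03125
    x = a * tmp
    gain = (math.sqrt(1.4 + x * x) + x - 10.0) * 2.0
    return min(0.0, gain)  # assumption: enforce gain <= 0 by clamping


def db_to_linear(gain_db):
    """Amplitude multiplier for a dB gain (assumes an amplitude-dB scale)."""
    return 10.0 ** (gain_db / 20.0)
```

At tmp = 0 both curves give the same value, and for negative tmp the voice-frame curve falls below the noise-frame curve, which is the property the text requires of any replacement function.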
【０１４７】
そして、以上のように求められた帯域別ゲインgain(m,k)は、乗算器２３において帯域ごとに変換係数に乗算され、これによりノイズキャンセルがなされる。
【０１４８】
図１０にｔｍｐ−ｇａｉｎの関係をグラフとして示す。実線で示したのが音声フレーム（ｓｐ＞０）の場合であり、点線が雑音フレーム（ｓｐ＝０）の場合である。
【０１４９】
ｔｍｐが０を下回った場合には音声フレームのゲインの方が雑音フレームのゲインを下回っている。これはｔｍｐが帯域のＳＮＲからｍｄと定数（上記例では１．６２５）を差引いたものと考えることができるため、調整値ｍｄの変動分はあるものの、帯域のＳＮＲが小さい場合には、音声区間の方が雑音区間より小さいゲイン値を採ることになる。
【０１５０】
これは音声区間における小さいＳＮＲを示す帯域（これは音声成分を含まないと推定できる帯域である）を積極的に抑圧（小ゲイン値）することで、音声フレーム中の音声成分を含む帯域を際立たせる結果となる。この効果は雑音フレームの帯域ゲイン値より小さく設定することで達成される。
【０１５１】
このようなゲイン値設定は上記双曲線的な関数に限らず種々の設定で行うことが可能である。
【０１５２】
たとえば図１１に示すように、

のように設定することも可能である。
【０１５３】
このノイズキャンセルされた各帯域ごとの変換係数は、ＩＦＦＴ２４において逆高速フーリエ変換されて時間軸上の信号フレームに戻されたのち、フレーム合成部２５においてフレーム合成されて送話信号として、例えば音声符号化回路に供給される。
【０１５４】
以上述べたようにこの実施形態によれば、音声フレームと判断されたフレームでも、音声成分が含まれないと判断された帯域については、雑音フレームと判断されたフレームの帯域別ゲインより小さいゲインが設定されているので、音声フレームにおける音声成分（帯域）が強調されることになり、結果として聴覚的に良好なノイズ抑制出力信号を得ることができる。
【０１５５】
また、雑音最小値推定回路２９において各帯域の雑音パワーの最小値を求め、この雑音最小値のスペクトル形状を帯域別ゲイン決定部３３による帯域別ゲインの決定に用いるようにしているため、例えば自動車の通過時のような雑音スペクトルの短期的な変化に影響されず、音声スペクトルを歪ませにくいノイズキャンセル処理を実現することができる。
【０１５６】
また、各フレームの有意値ｙが大きく（通常は音声と判断される）が前フレームとの帯域別差の差分の変化が小さい（ただし平均値で正規化したもので判断）フレームが連続した場合（例えば１００フレーム）は雑音フレームと判断し、雑音パワー推定値を強制更新する。この強制更新の判定の際には、スペクトル偏差の平均値で正規化した値をもって連続区間をカウントしているため、スペクトル偏差がフレーム間でばらつくような雑音の場合でも実質的に連続区間としてカウントすることができる。従って、良好な雑音フレーム判定がなされないような有意値の変動があっても強制更新がかかることにより良好な雑音パワー推定値の更新が可能となり、もって良好なノイズ抑制が行われることになる。
【０１５７】
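After the signal reduction amount (B) of the noise canceller has been determined as above, the final reduction determination unit 34, which adjusts the final signal attenuation (C), determines the final signal attenuation (lg) of the transmit signal from the S_loss supplied by the voice switch and gain.
【０１５８】
This signal attenuation is determined, for example, as follows.
【０１５９】
lg(m,k) = MIN(S_loss(m), gain(m,k))
That is, the smaller of S_loss and gain(m,k), in other words the one giving the larger signal attenuation, is adopted.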
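The combination rule is a per-band minimum in the dB domain. A minimal sketch with a hypothetical function name, assuming both S_loss and the per-band gains are non-positive dB values:

```python
def final_gain_db(s_loss_db, gains_db):
    """lg(m,k) = MIN(S_loss(m), gain(m,k)) for every band k."""
    return [min(s_loss_db, g) for g in gains_db]
```

For instance, with S_loss = -12 and band gains of -3, -18, and 0, the result is -12, -18, and -12: bands the noise canceller already attenuates by more than 12 dB are left alone, and the rest are pulled down to the voice-switch loss. With S_loss = 0 (transmit-only speech) the noise canceller gains pass through unchanged.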
【０１６０】
図１２に送話信号及び受話信号のサンプルを、図１３にＳ＿ｌｏｓｓ及びｌｇのパワー推移をサンプルを提示する。
【０１６１】
ｌｇは必ずＳ＿ｌｏｓｓ以下であるので、通信系（送信−受信）には必ずＶＳ＿ｌｏｓｓ以下のＬＯＳＳが挿入される。また送話のみのときにはＳ＿ｌｏｓｓ＝０であるから、ノイズキャンセラ本来のノイズ抑圧が実現される。送話音声が終った時点において、上記例では、ｇａｉｎ＜Ｓ＿ｌｏｓｓとなるので、ボイススイッチによるスイッチ感（送受話系のＬＯＳＳの急激な切替えによる不自然に音が小さくなる現象）は低減される。
【０１６２】
次いでボイススイッチの動作に関して説明する。
【０１６３】
上述のノイズキャンセラを受話側にも設けて音声の有無の判断を行ってもよいが、本実施形態では受話信号中の音声の有無はダブルトーク判定部ＤＴＤで行う設定にしている。判定方法は各種方法が採用できるがその一例を図１４を用いて説明する。
【０１６４】
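In this example, the frame power P of each frame of the receive signal (for example, P = 10 log(mean square of the samples)) is compared with a predetermined reference to determine whether voice is present. The determination uses the minimum value min of the frame power P over a noise-level update interval INTVL (for example 1 second, equivalent to 50 frames) and the long-term average avg of P (for example a leak-integration value: avg = γ × avg + (1 − γ) × P, with γ set as appropriate, for example 0.99).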
【０１６５】
ステップ（２ａ）でｃｏｕｎｔｅｒ＝０とし計算を開始する。初期値は適宜設定可能であるが、例えば、ｍｉｎ＝ＭＡＸ＿ＮＯＩＳＥ，ｎｏｉｓｅ＝５，ａｖｇ＝５と設定する。ＭＡＸ＿ＮＯＩＳＥは雑音の最高値であり、例えば３６とする。
【０１６６】
ステップ（２ｂ）で対象フレームのフレームパワーＰを計算／入力する。
【０１６７】
ステップ（２ｃ）でｃｏｕｎｔｅｒを“１”増分し、ステップ（２ｄ）でフレームパワーＰとｍｉｎとの比較を行う。Ｐ＜ｍｉｎであればｍｉｎ＝Ｐと書き換え（ステップ２ｅ）、それ以外はｍｉｎの現状値を維持して次ステップ（２ｆ）に進む。
【０１６８】
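Step (2f) obtains the long-term average avg of the frame power P. For example, a leak-integration value can be used as the long-term average, as in the expression below.
【０１６９】
avg = γ × avg + (1 − γ) × P, with γ set as appropriate (for example 0.99)
Step (2g) compares the counter value with the INTVL value. If they do not match, processing proceeds to step (2m), which outputs the noise value for the frame being measured. If counter = INTVL, processing proceeds to the next step (2h) to update the noise value.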
【０１７０】
ステップ（２ｈ）ではｍｉｎ−ｎｏｉｓｅ＞−２を判定する。Ｙｅｓであればステップ（２ｉ）に進み、ｎｏｉｓｅ＝ｍｉｎと設定する。Ｎｏであれば次ステップ（２ｊ）に進み、ａｖｇ＜ｎｏｉｓｅ−１を判定する。Ｙｅｓであればステップ（２ｋ）に進み、ｎｏｉｓｅ＝ａｖｇと設定する。
【０１７１】
次いでステップ（２ｌ）に進み、ｃｏｕｎｔｅｒ＝０とリセットし、ｍｉｎ＝ＭＡＸ＿ＮＯＩＳＥに設定して次ステップ（２ｍ）に進み、更新されたｎｏｉｓｅ値を出力する。
【０１７２】
各フレーム毎のフレームパワーＰとｎｏｉｓｅ値を入力として受話信号中の音声の有無を判定する。
【０１７３】
ステップ（２ｏ）でフレームパワーＰとｎｏｉｓｅを入力し、次ステップ（２ｐ）でＰ＜ｎｏｉｓｅ＋ＴＨの判定を行う。ＴＨは閾値であり、例えば１８に設定される。Ｐ＜ｎｏｉｓｅ＋ＴＨであれば受話＝無［音声信号なし］（ステップ（２ｑ）），そうでなければ受話＝有［音声信号あり］（ステップ（２ｒ））と判断する。
【０１７４】
この結果を対象フレームの音声の有無として出力する（ステップ（２ｓ））。
【０１７５】
これを各フレーム毎に繰返すことで、受話信号中の音声の有無の検出を行うことができる。
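The receive-side detection of FIG. 14 combines a periodically refreshed noise floor with a fixed margin. The sketch below is an illustration under assumptions: the class name `ReceiveVad` is hypothetical, the leak integration is written as avg = γ × avg + (1 − γ) × P (the usual form of a leaky average), and the constants are the example values from the text.

```python
class ReceiveVad:
    """Frame-power voice detector with a periodically refreshed noise floor."""

    def __init__(self, intvl=50, gamma=0.99, th=18.0, max_noise=36.0):
        self.intvl = intvl
        self.gamma = gamma
        self.th = th
        self.max_noise = max_noise
        self.counter = 0
        self.min_p = max_noise   # minimum frame power in this interval
        self.noise = 5.0         # current noise-floor estimate
        self.avg = 5.0           # long-term average of frame power

    def step(self, p):
        """Process one frame power P; return True if voice is detected."""
        self.counter += 1
        self.min_p = min(self.min_p, p)
        self.avg = self.gamma * self.avg + (1.0 - self.gamma) * p
        if self.counter == self.intvl:
            if self.min_p - self.noise > -2:
                self.noise = self.min_p      # step 2i
            elif self.avg < self.noise - 1:
                self.noise = self.avg        # step 2k
            self.counter = 0                 # step 2l
            self.min_p = self.max_noise
        return p >= self.noise + self.th     # steps 2p to 2r
```

Feeding 50 frames at P = 10 raises the noise floor from 5 to 10 at the end of the interval, after which a frame at P = 40 exceeds the floor by more than the margin of 18 and is reported as voice.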
【０１７６】
この結果を受け、音声有りの場合は受話側にＬＯＳＳを挿入するため、Ｒ＿ｌｏｓｓの信号低減率を増幅器で実行することになる。一方、送信側にはノイズキャンセラにおいてノイズキャンセラの信号減衰量（Ｂ）と比較調整のためＳ＿ｌｏｓｓ（信号減衰量（Ａ））が送られることになる。
【０１７７】
上述のボイススイッチのｌｏｓｓは“０”／ＶＳ＿ｌｏｓｓの２値であったが、この設定は適宜変更することが可能であり、例えば下記のように設定することもできる。
（１）受話（音声なし）／送話（音声なし）：
Ｒ＿ｌｏｓｓ＝ＶＳ＿ｌｏｓｓ−ｈ；Ｓ＿ｌｏｓｓ＝ｈ
（２）受話（音声有り）／送話（音声なし）：
Ｒ＿ｌｏｓｓ＝０；Ｓ＿ｌｏｓｓ＝ＶＳ＿ｌｏｓｓ
（３）受話（音声なし）／送話（音声有り）：
Ｒ＿ｌｏｓｓ＝ＶＳ＿ｌｏｓｓ；Ｓ＿ｌｏｓｓ＝０
（４）受話（音声有り）／送話（音声有り）：
Ｒ＿ｌｏｓｓ＝０；Ｓ＿ｌｏｓｓ＝ＶＳ＿ｌｏｓｓ
ただし、ｈ＝（ｓｐ＝０が続いたフレーム数）×（−０．１）
ＶＳ＿ｌｏｓｓ≦ｈ≦０
ｓｐ：ノイズキャンセラから受け取る音声／雑音の判断変数
上記実施形態に加えエコーキャンセラＥＣ（１０）を加えてもよい（図１５）。エコーキャンセラは音声出力部からの出力が音声入力部から入力された場合にその信号を除去／低減するものでありエコー検出の方式は各種の方式を採用することができる。
【０１７８】
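Apart from the echo canceller EC (10), the configuration is the same as that of FIG. 1, so its description is omitted. The EC compares the input signal from the audio input unit, the receive signal fed to the audio output unit, and the signal obtained by subtracting the receive signal from the audio input signal, judges whether the receive signal is superimposed on the input signal, and if so subtracts that component to produce the input signal to the noise canceller NC.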
【０１７９】
なお出力音声環境に応じエコーパスは変化するので音声出力部にて出力された音声が音声入力部にて受ける際の時間差を考慮する必要がある。
【０１８０】
上述の実施態様では、送話側にノイズキャンセラを挿入しているが、受話側にノイズキャンセラをいれてもよい。送受話双方にノイズキャンセラをいれた場合は、送受話ともノイズキャンセルを行うことも可能であり、この場合は、ボイススイッチ機能をノイズキャンセラに取り込んで、ノイズキャンセラの音声信号減衰量の制御でボイススイッチのＬＯＳＳ挿入機能を兼ねることも可能である。
【０１８１】
すなわち送受話どちらかに必ずボイススイッチ機能に必要なＬＯＳＳ量を入れるように両者の信号減衰量を制御すれば良い。
【０１８２】
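The present invention is not limited to communication devices such as mobile phones; it can be used in any electronic device that uses audio processing (recording devices, portable electronic terminals, and so on).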
【０１８３】
なお、図２に示す各ブロックは機能説明を行うために便宜上区分して記載したものであり、各ブロックが個別の素子である必要はなく、１個またはそれ以上の機能、たとえばＣＰＵ，ＤＳＰ，モデム，音声符号化回路など、をまとめて１チップのＬＳＩとしても良いことは言うまでもない。
【０１８４】
【発明の効果】
以上説明したように本発明によれば、高音質の音声信号を供給することができる電子機器を提供することができ、産業上寄与するところ大なるものである。
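[Brief description of the drawings]
[FIG. 1] FIG. 1 is a circuit block diagram showing an embodiment of the present invention.
[FIG. 2] FIG. 2 is a block diagram showing the noise canceller of the embodiment.
[FIG. 3] FIG. 3 is a flowchart showing the processing procedure of the significance value calculation unit of the embodiment.
[FIG. 4] FIG. 4 is a flowchart showing the procedure for counting the significant interval used to decide a forced noise power update in the embodiment.
[FIG. 5] FIG. 5 is a flowchart showing the procedure for calculating the voice weight sp in the embodiment.
[FIG. 6] FIG. 6 is a flowchart showing the processing procedure of the noise minimum estimation unit of the embodiment.
[FIG. 7] FIG. 7 is a flowchart showing the processing procedure of the noise minimum estimation unit of the embodiment.
[FIG. 8] FIG. 8 is a diagram showing an example of the frequency-versus-power characteristic for a noise frame.
[FIG. 9] FIG. 9 is a diagram showing an example of the frequency-versus-power characteristic for a voice frame.
[FIG. 10] FIG. 10 is a diagram of the tmp-gain relationship in the embodiment.
[FIG. 11] FIG. 11 is a diagram of the tmp-gain relationship in the embodiment.
[FIG. 12] FIG. 12 is a diagram showing samples of the transmit and receive signals.
[FIG. 13] FIG. 13 is a diagram showing a sample of the power transitions of S_loss and lg.
[FIG. 14] FIG. 14 is a flowchart showing the processing procedure of the voice switch of the embodiment.
[FIG. 15] FIG. 15 is a block diagram of an embodiment of the present invention.
[Explanation of symbols]
NC: noise canceller; VS: voice switch
[0001]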
BACKGROUND OF THE INVENTION
The present invention relates to an electronic device such as a mobile phone that handles transmission and reception of audio signals.
[0002]
[Prior art]
In a communication system that transmits and receives audio such as a mobile phone, a voice switch is mounted on an electronic device (such as a mobile phone) that constitutes the communication system in order to prevent howling.
[0003]
The voice switch constantly inserts LOSS into a transmission / reception signal loop, and imposes a predetermined signal reduction amount on either the reception signal or the transmission signal.
[0004]
In general, LOSS is set to be inserted into a transmission signal when only receiving a call and to a reception signal when only transmitting. During so-called double talk or when there is no speech, for example, LOSS is set to be inserted on the transmission side.
[0005]
In addition, in an electronic device such as a mobile phone that performs voice communication, a voice coding method such as a CELP (Code Excited Linear Prediction) method is used.
[0006]
When such a device is used in an environment where there is a large amount of background noise, the background noise is captured and encoded, resulting in a decrease in the clarity of speech. For this reason, a technology (noise canceller) that removes or suppresses background noise and performs speech coding close to a speech-only signal has been studied and installed in electronic devices.
[0007]
[Problems to be solved by the invention]
As described above, since the voice switch and the noise canceller have different purposes, the electronic devices equipped with the voice switch and the noise canceller individually operate so as to perform their functions independently.
[0008]
Therefore, when both apply signal reduction to the same signal path (for example, the transmission path), the signal is reduced more than either function requires.
[0009]
The voice detection function and the signal reduction function are common functions. Therefore, a redundant area exists in terms of signal processing.
[0010]
The present invention has been made in view of the above points, and an object of the present invention is to provide an electronic apparatus with improved audio signal processing when a noise canceller and a voice switch are provided.
[0011]
[Means for Solving the Problems]
The present invention is an electronic device that acquires audio signals to be transmitted and received and applies signal reduction to at least one of the transmission signal and the reception signal, the electronic device comprising:
a voice switch that sets a first signal reduction amount for the transmission signal based on a voice detection result for the transmission signal or the reception signal; and
a noise canceller that acquires the transmission signal and applies signal reduction to the transmission signal with a final signal reduction amount equal to or less than the first signal reduction amount set by the voice switch.
[0012]
That is, the signal reduction amount (A) by the voice switch and the signal reduction amount (B) by the noise canceller are not simply added, but are compared to determine an appropriate signal attenuation amount.
[0013]
In general, the noise canceller performs finer-grained noise reduction (its signal attenuation varies more finely over time), so it is preferable to determine the signal reduction amount inserted into the transmission/reception system on the basis of the noise canceller's signal reduction amount (B).
[0014]
However, a certain amount of signal reduction needs to be included in any of the transmission / reception systems from the viewpoint of howling prevention.
[0015]
Accordingly, it is preferable that the signal reduction amount (B) of the noise canceller is adopted as a reference for the signal reduction amount, and the upper limit (lower limit as an absolute value) of the signal reduction amount is set as the signal reduction amount (A) of the voice switch.
[0016]
That is, it can be achieved by adjusting the signal reduction amount (B) of the noise canceller to be equal to or less than the signal reduction amount (A) of the voice switch.
[0017]
The LOSS of the voice switch is inserted into either the transmission system or the reception system, and the determination is based on whether the audio signal exists in the transmission system or the reception system. There are various methods for the determination, and the determination method is not particularly limited.
[0018]
However, since the noise canceller also determines the presence / absence of an audio signal, by using this result, it is possible to omit the process of detecting the presence / absence of an audio signal in the voice switch.
[0020]
As the noise canceller used in the present invention, various methods such as a noise suppression method in the time domain and a noise suppression method in the frequency domain can be employed.
[0021]
For example, Japanese Patent No. 2995737 discloses a method in which an input signal is divided into frames of a predetermined period, each frame is judged to be noise or voice, the per-band gain value is set to its minimum when the frame is judged to be a noise frame, a gain value exceeding that minimum is set when the frame is judged to be a voice frame, and noise suppression is performed accordingly; this method can also be used.
[0022]
It is also possible to employ a noise canceller based on a noise suppression method that divides an input signal into frames of a predetermined time unit, divides each frame into predetermined frequency bands, and performs noise suppression for each band, the method comprising: a voice frame determination step of determining whether the frame is a noise frame or a voice frame; a per-band gain determination step of setting the per-band gain values of each frame based on the result of the voice frame determination step; and a signal generation step of performing noise suppression for each band using the per-band gain values so determined and then reconstructing the frame to generate a noise-suppressed output signal, wherein the per-band gain determination step sets the per-band gain values so that the value used when the frame is determined to be a voice frame can be smaller than the value used when the frame is determined to be a noise frame.
[0023]
In this method, noise in the bands of a voice frame that do not contain a voice component is sufficiently suppressed, so good noise reduction is obtained. In other words, the gain value for each band used in noise suppression is not merely chosen by distinguishing voice frames from noise frames; the minimum gain value for each band of a voice frame is set smaller than the minimum gain value for each band of a noise frame, which improves the audibility of the audio signal after noise suppression.
[0024]
Even in a frame determined to be an audio frame, an audio component is not always present in all bands. For a band of such a frame that is estimated to contain no audio component (or only a weak one), a gain smaller than the gain value for each band of a noise frame is set, so that the bands containing audio components stand out and good audibility is obtained.
[0025]
That is, the gain value for a band that is estimated not to contain an audio component in a frame determined to be an audio frame is set to a value smaller than the gain value for each band used when the frame is determined to be a noise frame. This makes the bands containing audio components in the audio frame more prominent, so that a noise-suppressed output signal with good audibility can be obtained.
[0026]
In the noise frame, a constant gain value may be set for each band, or may be set so as to change based on the difference between the power for each band and the noise power.
[0027]
In addition, for voice frames, the gain value for each band is set to increase as the index based on the difference between the power for each band and the noise power increases. It is also possible to adopt a continuously decreasing function.
[0028]
In such a noise suppression method, there is a step of updating an estimated value of noise power as a step before determining the above-described gain for each band. This estimated noise value is updated under a predetermined condition, and for example, an update method disclosed in a noise suppression method disclosed in JP-T-10-513030 can be employed.
[0029]
This updating method updates the noise estimation value using a voice metric, which is the sum of weighted SNRs (logarithm of signal energy / noise energy) for each individual band of each frame, and a spectral deviation, which is the sum over the individual bands of the absolute values of the deviation (the difference between the logarithmic value of the signal energy and the logarithmic value of the past signal energy average). The estimated noise value is updated when, for example, the value stays below a threshold for a predetermined period (for example, 1 second).
[0030]
Also, instead of using the value of the spectral deviation directly for the determination, it is possible to normalize the total deviation of the difference between the band power and the noise power by its average value over past frames and to determine the noise section based on this normalized value. Compared with the above method, this approach can recognize noise that varies greatly between frames as noise.
[0031]
That is, for each band, a significant value (suby) is obtained by applying a predetermined weight to the difference between the power for each band and the noise power estimate for each band. The sum (sum) of the differences in suby between the current frame and the previous frame is normalized by its average value (sum_average), and whether or not the current frame is a noise frame is determined based on the resulting ratio (r).
[0032]
In this way, the deviation of the significant value (suby) for each band from the past frame is used, and the value obtained by normalizing this deviation by the average of the total deviation serves as the basis for determination. This reduces the frame-to-frame variation of the criterion, so stable noise frame determination can be performed, and noise that varies greatly between frames can be satisfactorily recognized as noise.
[0033]
More specifically, the method has: a frame dividing step of dividing the transmission input signal into frames of a predetermined time unit; a frequency band dividing step of dividing each frame into a plurality of frequency bands; a power calculation step for each band of calculating the power for each band (channel_power) for each frequency band; a significant value calculation step of calculating, for each frequency band, the difference (tmp) between the noise power estimation value for each band (noise_power) and the power for each band (channel_power), and calculating a significant value (y) by adding up, under a predetermined condition, the significant values for each band (suby) obtained by applying a predetermined weighting to the difference (tmp); a step of calculating, over the frequency bands, the sum of absolute values (sum) of the differences in the significant value for each band (suby) between the current frame and the previous frame, and calculating the average value (sum_average) of this absolute value sum; and a significant value normalization step of calculating the ratio (r) obtained by normalizing the absolute value sum (sum) by its average value (sum_average).
[0034]
The update of the noise power estimate has the following two steps.
[0035]
That is, there are: a first noise power estimation value update step in which, when the significant value (y) falls below a predetermined threshold, the current frame is determined to be a noise frame and the noise power estimation value for each band (noise_power) is updated; and a second noise power estimation value update step in which, when the ratio (r) remains below a predetermined threshold continuously for a predetermined period, the current frame is determined to be a noise frame and the noise power estimation value for each band (noise_power) is updated.
[0036]
The first noise power estimation value update step applies when noise estimation is working well and the frame is determined to be a noise frame by the significant value determination. The second noise power estimation value update step allows a forced update even when the significant value varies between frames and a noise frame cannot be reliably determined from the significant value.
[0037]
As the average value used for normalization, an estimated value using the leak integral of the absolute value sum (sum) can be used. It is also possible to use an estimated value of the average value (sum_average) obtained by using the leak integral of the standard deviation of the absolute value sum (sum).
[0038]
In the above-described gain setting for each band, the gain value for each band is determined using different functions for a voice frame and a noise frame, and is calculated based on the difference between the power for each band and the noise power for each band (logarithmic difference: SNR). That is, since a band having a high SNR in a voice frame is estimated to contain a voice component, the gain value of that band is set large, whereas a band having a low SNR is estimated to contain no voice component and its gain value is set small.
[0039]
By the way, although noise (background noise) is generally assumed to be steady, it may fluctuate outdoors. In particular, the energy of the noise generated when a vehicle passes by increases as the vehicle approaches. When the transmitted voice is input in this state, the energy difference between the voice and the noise is small, and thus the suppressed voice may be distorted. Further, when the noise spectrum shape is similar to the speech spectrum shape, suppression based on noise energy is likely to interfere with the speech spectrum, and thus the suppressed speech is distorted. Such influence can be suppressed, even if the noise energy fluctuates, by adjusting the SNR-based variable used to determine the gain value, so that stable noise suppression processing can be performed.
[0040]
Such adjustment can be made, when determining the gain value for each band, by: a noise power estimation step of obtaining the signal power for each frequency band and estimating the noise power for each band based on the band power; a minimum value detection step of detecting the minimum value of power over a plurality of frame periods for at least one of the band power and the noise power for each band; a step of obtaining, for each frequency band, the difference between the band power and the minimum value for each band detected in the minimum value detection step; and determining the noise suppression amount for each frequency band based on this difference.
[0041]
Further, using an adjustment value generation step that generates, for each frame, an adjustment value common to the different bands, it is also possible to obtain, for each of the frequency bands, a difference between the band power and a value obtained by adding the adjustment value and the band power, and to determine the noise suppression amount for each frequency band based on this difference.
[0042]
This adjustment value can be obtained by determining an adjustment value common to the bands based on the difference between the average of the minimum values for each band and the average of the noise power for each band in the noise period, or by determining an adjustment value common to the bands based on the difference between the minimum value of the plurality of band powers in the frame and the maximum value of the minimum values of the plurality of bands.
[0043]
For the determination of the voice frame and the noise frame, a determination method can be employed that includes: a noise power estimation step of obtaining the signal power for each frequency band and estimating the noise power for each band based on the band power; a comparison step of obtaining, for each frequency band, the difference between the noise power for each band and the band power, and comparing these band-specific differences with a predetermined threshold value; an addition step of applying a predetermined weighting to the band-specific differences and adding them to each other when the differences are determined to exceed the threshold value; and a determination step of determining, based on the added value, whether the input signal is a speech section or a noise section.
[0044]
In this addition step, each band-specific difference can be weighted so that the weight becomes smaller as the frequency becomes higher. In the determination step, it is possible to determine, based on the added value, whether the input signal is a voice section, a noise section, or a transition section that is an intermediate area between the two.
[0045]
The present invention as described above can be applied to an electronic device such as a mobile phone that employs a digital speech coding method such as ACELP, EVRC, EFR, or AMR. That is, the electronic device has an audio signal input unit (direct input means such as a microphone, signal transmission from an electronic file, etc.) and an audio encoding unit, and includes a noise canceller that receives the audio signal from the audio signal input unit and supplies a signal subjected to noise suppression by a noise suppression method to the audio encoding unit, together with the voice switch described above.
[0046]
The voice switch and the noise canceller can be executed by signal processing in the DSP, for example, in the same way as voice encoding.
[0047]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described.
[0048]
FIG. 1 is a schematic block diagram of an electronic apparatus showing an embodiment of the present invention.
[0049]
The reception signal (1) is supplied to a sound output device (3) such as a speaker through a reception signal amplifier (2). An input signal from a voice input device (4) such as a microphone is sent out as a transmission signal (9) via a noise canceller NC (5).
[0050]
The voice switch VS (6) detects the presence or absence of a voice signal in the received signal, and receives from the noise canceller NC (5) a signal sp (voice detection flag) indicating the presence or absence of a voice signal in the transmitted signal. Based on these results, the double talk determination unit DTD (7) determines which of the transmission / reception signals contains a voice signal.
[0051]
In response to the determination result, the LOSS determination unit (8) determines the LOSS to be inserted into the transmission system / reception system based on the setting, and notifies the amplifier (2) of R_loss and the noise canceller NC (5) of S_loss.
[0052]
As this setting, for example, a certain amount of VS_loss (for example, −12 dB) is inserted into one of transmission / reception.
[0053]
There are four cases depending on the presence or absence of voice, and the LOSS is set as follows.
(1) Reception (without voice) / Transmission (without voice): S_loss = VS_loss
(2) Reception (with voice) / Transmission (without voice): S_loss = VS_loss
(3) Reception (without voice) / Transmission (with voice): R_loss = VS_loss
(4) Reception (with voice) / Transmission (with voice): S_loss = VS_loss
The amount of attenuation on the side where VS_loss is not inserted is assumed to be “0”.
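As an illustration only, the four-case setting above can be written as a small decision function. The following Python sketch is not part of the embodiment; the function name `determine_loss` and the representation of each LOSS as a gain in dB (0 meaning no LOSS inserted) are assumptions made for the example.

```python
VS_LOSS_DB = -12.0  # example value quoted in the text


def determine_loss(rx_voice: bool, tx_voice: bool, vs_loss: float = VS_LOSS_DB):
    """Return (R_loss, S_loss) in dB for the four voice-presence cases."""
    if tx_voice and not rx_voice:
        # Case (3): only the transmit side carries voice,
        # so the LOSS goes into the receive path.
        return vs_loss, 0.0
    # Cases (1), (2) and (4): the LOSS goes into the transmit path.
    return 0.0, vs_loss


# Double talk (case 4) attenuates the transmit side in this setting.
r_loss, s_loss = determine_loss(rx_voice=True, tx_voice=True)
```

Only case (3) moves the LOSS to the receive path, so a single condition is enough to cover the table.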
[0054]
This setting can be changed as appropriate. For example, when the voice switch is switched (switching between VS_loss and “0”), the amount of attenuation changes abruptly, so the user perceives a so-called switching sensation. To reduce this, the attenuation amount may be changed gradually (ramped) at the time of switching.
[0055]
The signal attenuation amount of the transmission / reception signal is determined from the signal attenuation amount (A) determined by the voice switch and the signal attenuation amount (B) determined by the noise canceller.
[0056]
In this embodiment, the signal attenuation amount of the received signal is the signal attenuation amount (A) of the voice switch, and the signal attenuation amount of the transmission signal is the final signal attenuation amount (C) of the noise canceller, which is controlled by the signal attenuation amount (A) of the voice switch.
[0057]
In other words, if the signal attenuation calculated by the noise canceller for noise reduction is greater than the signal attenuation provided by the voice switch, the signal attenuation of the noise canceller is used, and if the signal attenuation of the noise canceller is smaller than the signal attenuation provided by the voice switch, the signal attenuation provided by the voice switch is adopted.
[0058]
Therefore, the final signal reduction amount of the transmission signal changes with the signal attenuation amount provided by the voice switch as an upper limit.
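The selection of the final signal attenuation amount (C) can be sketched as follows. This is a hypothetical illustration. It assumes that both (A) and (B) are expressed as gains in dB that are 0 or negative, so that the deeper attenuation is the smaller number; the function names are not taken from the embodiment. Under this sign convention the voice switch value acts as an upper bound on the transmit gain.

```python
def final_gain_db(a_vs_db: float, b_nc_db: float) -> float:
    """Combine voice-switch gain A and noise-canceller gain B (both <= 0 dB)."""
    # The deeper of the two attenuations is used; they are not added together.
    return min(a_vs_db, b_nc_db)


def final_band_gains_db(s_loss_db: float, band_gains_db):
    """Apply the same rule to every band-specific gain of the noise canceller."""
    return [final_gain_db(s_loss_db, g) for g in band_gains_db]


# With S_loss = -12 dB, a band the canceller would attenuate by only 3 dB
# is pulled down to -12 dB, while a band at -20 dB is left as it is.
gains = final_band_gains_db(-12.0, [-3.0, -20.0, 0.0])
```

Taking the deeper value rather than the sum is what avoids attenuating the transmit path more than either function requires.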
[0059]
The presence or absence of a voice signal on the transmission side could be determined by feeding the transmission signal to the voice switch and performing the same processing as for the received signal, but in this embodiment the presence or absence of voice in the transmission signal is determined by the noise canceller.
[0060]
Next, the operation of the noise canceller will be described.
[0061]
The noise canceller is realized by, for example, a DSP (Digital Signal Processor), and the processing program is stored in a memory in the noise canceller or a memory attached to the control circuit. FIG. 2 is a block diagram of a noise canceller showing a functional configuration realized by this processing program.
[0062]
A digital audio signal obtained by A / D converting an audio signal from an audio input unit such as a microphone is first input to the frame dividing unit 21. For example, the frame division unit outputs a frame arranged to 128 samples (frame division step). At this time, after the digital transmission signal is divided into frames of, for example, 80 samples, the frame ends may be overlapped by performing windowing. This digital transmission signal frame is input to a fast Fourier transform unit (FFT) 22.
[0063]
The output of the FFT 22 is noise-suppressed by the multiplier 23 based on the output from the final reduction amount determination unit 34, which determines the final signal reduction amount (C) of the noise canceller. The IFFT unit 24 then applies an inverse FFT, and the frame synthesis unit 25 restores the frames and outputs the result as the transmission signal.
[0064]
The output of the FFT 22 is input to the band power calculation unit 26, and the output is supplied to the significant value calculation unit 27 and the band-specific gain determination unit 33. The output is also supplied to the noise leak integrated value update unit 32 according to the determination result of the update determination unit 31.
[0065]
The output of the significant value calculation unit 27 is supplied to the update determination unit 31 and the voice weight calculation unit 28.
[0066]
The output of the noise leak integrated value update unit 32 is supplied to the significant value calculation unit 27, the voice weight calculation unit 28, and the band-specific gain determination unit 33.
[0067]
The output of the voice weight calculation unit 28 is supplied to the minimum noise value estimation unit 29 and the band-specific gain determination unit 33 and is also output as an SP value (voice detection flag) to the voice switch VS.
[0068]
The band-specific gain determination unit 33 receives outputs from the noise leak integrated value update unit 32, the minimum noise value estimation unit 29, and the speech weight calculation unit 28, and outputs a signal to the final reduction amount determination unit 34.
[0069]
The final reduction amount determining unit 34 receives the S_loss from the voice switch VS and the output from the band-specific gain determining unit 33 and outputs a gain to the multiplier 23.
[0070]
The operation in each block will be described below.
[0071]
The fast Fourier transform unit FFT 22 performs fast Fourier transform processing on the input digital transmission signal frame and obtains transform coefficients divided into 16 frequency bands (k = 0, 1, 2, ..., 15) in order from low to high. The number of transform coefficients need not be the same in each band. The band-divided transform coefficients are output to the band power calculation unit 26 (frequency band division step).
<Band power calculation>
The band power calculation unit 26 obtains the energy (root mean square value of the transform coefficients) for each band, takes its logarithm, and outputs the band power channel_power (m, k), where m is the frame number and k is the band number (0 to 15) (band-based power calculation step). This band power is output to the significant value calculation unit 27.
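A minimal sketch of the band power calculation is given below for illustration. The band boundaries (`band_edges`), the dB scaling (10·log10 of the mean squared magnitude, which equals 20·log10 of the RMS value), and the small floor that avoids log(0) are all assumptions; the text does not specify them.

```python
import math


def band_powers(coeffs, band_edges):
    """Log power of each band from the FFT coefficients of one frame.

    coeffs     : sequence of complex transform coefficients
    band_edges : hypothetical bin indices; band k covers
                 coeffs[band_edges[k]:band_edges[k + 1]]
    """
    powers = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mean_sq = sum(abs(c) ** 2 for c in coeffs[lo:hi]) / (hi - lo)
        # Floor the value so that an all-zero band does not produce log(0).
        powers.append(10.0 * math.log10(max(mean_sq, 1e-12)))
    return powers


# Two bands of four bins each: magnitude 1 in the first, 10 in the second.
example = band_powers([1 + 0j] * 4 + [10 + 0j] * 4, [0, 4, 8])
```

Because `band_edges` is a free parameter, bands of unequal width are handled in the same way.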
<Significant value calculation>
The significant value calculating unit 27 obtains the difference tmp between the band power channel_power (m, k) and the noise leak integrated value noise_power (m, k) output from the noise leak integrated value updating unit 32 (to be described later). The difference tmp is compared with a predetermined threshold value. When the band-specific differences tmp of a plurality of adjacent bands, among the band-specific differences tmp arranged in frequency order, are determined to exceed the threshold value, those band-specific differences tmp are subjected to predetermined weighting and then added to each other. This conditional sum of the weighted values suby (m, k) (taken when the differences tmp of adjacent bands exceed the threshold) is output as the significant value y (significant value calculation step).
[0072]
Further, an average value of significant values y (y_average: an estimated value by leak integration can be substituted, for example, calculated by the following formula) is also output.
[0073]
y (m): significant value, conditional sum of suby (m, k)
y_average (m) = y_average (m-1) × 0.9 + y (m) × 0.1
FIG. 3 is a flowchart showing the processing procedure of the significant value calculation unit 27. A flow for outputting the significant value y will be described with reference to FIG.
[0074]
After the frame number is reset / initialized to m = 0 in step 3a, the frame number m is incremented in step 3b, and the significant value y, the band number k, and the continuous number flag (a flag counting the consecutive band-specific differences tmp exceeding the threshold value) are initialized to “0”.
[0075]
Next, in step 3c, for band k = 0, the difference tmp between the band power and the noise leak integrated value and the value suby (m, k) obtained by weighting the band-specific difference tmp are calculated as follows.
[0076]
tmp = channel_power (m, k) − noise_power (m, k)
suby (m, k) = {200 − (k−1)²} / 100 × (tmp − 1)
Here, {200 − (k−1)²} is a weighting coefficient. In this case, the coefficient is set so as to decrease as the frequency of the band increases, but it can be changed as appropriate.
[0077]
When the band-specific difference tmp in the band k = 0 has been calculated, the significant value calculating unit 27 compares the band-specific difference tmp with a threshold value (for example, 1) in step 3d. If the threshold value is exceeded, it is determined that voice may be present, the process proceeds to step 3i through steps 3e and 3g, and the continuous number flag is set to 1. Next, after the band number k is incremented in step 3k to set k = 1, the process returns to step 3c and the same processing is executed for the band k = 1.
[0078]
Here, assume that in the band k = 1 the band-specific difference tmp also exceeds the threshold, following the band k = 0. Since the continuous number flag is already 1, the process moves from step 3e to step 3f, and the following operation is executed.
y = y + suby (m, k−1)
Then, the continuous number flag is set to 2, the process proceeds to step 3h through step 3g, and the following calculation is executed.
[0079]
y = y + suby (m, k)
Next, in step 3k, the band number k is further incremented to set k = 2, and the process returns to step 3c to execute processing for the band k = 2.
[0080]
Similarly, every time the differences tmp of adjacent bands exceed the threshold consecutively, suby (m, k) of that band is added to the significant value y obtained up to the previous band. In this way, the weighted addition value y of the band-specific differences tmp is obtained.
[0081]
In any band k = i, when the band-specific difference tmp is equal to or smaller than the threshold value, the significant value calculation unit 27 proceeds from step 3d to step 3j and resets the continuous number flag to 0.
[0082]
When the processing has been completed in this way for all 16 bands (k = 0 to 15) constituting one frame, the significant value calculation unit 27 proceeds from step 3m to step 3n and outputs the significant value y and the weighted band-specific differences suby (m, k) (k = 0 to 15).
[0083]
In this way, a significant value y is obtained for each frame, and is used to determine whether the frame is a speech frame or a noise frame.
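The FIG. 3 procedure described above can be sketched in Python as follows. This is an illustrative reading of the text, not the embodiment's program: the function name is hypothetical, and the continuous number flag is modelled as a run-length counter that keeps counting past 2, which the text does not state explicitly.

```python
def significant_value(channel_power, noise_power, threshold=1.0):
    """Return (y, suby) for one frame.

    Only bands belonging to a run of two or more adjacent bands whose
    difference tmp exceeds the threshold contribute to y.
    """
    y = 0.0
    run = 0      # plays the role of the continuous number flag
    suby = []
    for k, (cp, npw) in enumerate(zip(channel_power, noise_power)):
        tmp = cp - npw                                      # step 3c
        suby.append((200 - (k - 1) ** 2) / 100 * (tmp - 1))
        if tmp > threshold:                                 # step 3d
            if run == 1:
                y += suby[k - 1]   # step 3f: add the first band of the run
            run += 1
            if run >= 2:
                y += suby[k]       # step 3h: add the current band
        else:
            run = 0                # step 3j: the run is broken
    return y, suby


# Bands 0 and 1 form a run; band 3 exceeds the threshold alone and is ignored.
power = [3.0, 3.0, 0.0, 3.0] + [0.0] * 12
y, suby = significant_value(power, [0.0] * 16)
```

Requiring adjacent bands to exceed the threshold together keeps an isolated spike in a single band from being counted toward y.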
[0084]
In addition, the significant value calculation unit 27 also counts a significant section for determining noise power forced update. This process will be described based on the flowchart of FIG.
[0085]
First, an average value y_average (m) of significant values y (m) is obtained.
[0086]
In step 4a, initial values are set to frame number m = 0, sum_average (0) = 0.1, y_average (0) = 10, and counter (0) = 0. In step 4b, the frame number m is incremented and the significant value y and suby (m, k) are input.
[0087]
Next, in step 4c, an average value of the significant values y is calculated. The averaging method can be set as appropriate in view of memory capacity and calculation amount (for example, since an average over about 0.1 to 0.3 seconds is sufficient, the past 20 frames may be added). In general, it is estimated as follows using a leak integral. It goes without saying that a method other than leak integration may be used to obtain the average value.
[0088]
y_average (m) = y_average (m-1) × 0.9 + y (m) × 0.1
Next, in step 4d, the absolute value sum (sum) of the differences between suby (m, k) and suby (m−1, k) is obtained (significant value sum calculation step for each band), and in step 4e, the absolute value sum (sum) is divided by the average value sum_average to calculate the ratio r (significant value normalization step).
[0089]
sum (m) / sum_average (m-1)
This value may be used directly as r, but in order to remove isolated anomalous values, it is compared with the value obtained by multiplying r (m−1) by a predetermined attenuation factor (for example, 0.99), and the larger of the two is adopted as r (m).
[0090]
This ratio r serves as the criterion for incrementing the counter used to measure the significant section. An upper limit, for example 8, is set for r. Therefore, if it is determined in step 4f that r (m) exceeds 8, r (m) is reset to 8 in step 4g.
[0091]
Then, sum_average is updated in step 4h. This average value can also be set as appropriate based on the relationship between memory capacity and calculation amount (for example, it is sufficient to take an average of about 0.1 to 0.3 seconds, so add the past 20 frames). In general, it can be estimated and calculated as follows using leak integration. It goes without saying that a method other than leak integration may be used for obtaining the average value.
[0092]
sum_average (m) = sum_average (m−1) × 0.9 + sum (m) × 0.1
Note that sum_average may be an estimated value of standard deviation. Even in this case, an estimated value can be obtained by using the leak integral of the following formula, and this value is substituted.
[0093]
sum_average (m) = sqrt (sum_average (m−1)²× 0.9 + sum (m)²× 0.1)
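Steps 4d to 4h can be sketched as follows, using the plain leak integral rather than the standard-deviation variant. The function name and argument layout are assumptions made for the example; the constants 0.99, 8, 0.9 and 0.1 are the example values given in the text.

```python
def update_ratio(suby, prev_suby, prev_r, prev_sum_average,
                 decay=0.99, r_max=8.0):
    """Return (r, sum_average) for the current frame (steps 4d to 4h)."""
    # Step 4d: absolute value sum of the frame-to-frame differences of suby.
    total = sum(abs(a - b) for a, b in zip(suby, prev_suby))
    # Step 4e: normalize by the running average; the decayed previous ratio
    # is used when it is larger, so r cannot collapse within a single frame.
    r = max(total / prev_sum_average, prev_r * decay)
    # Steps 4f and 4g: clamp to the upper limit.
    r = min(r, r_max)
    # Step 4h: leak-integrate the absolute value sum.
    sum_average = prev_sum_average * 0.9 + total * 0.1
    return r, sum_average


r, sum_average = update_ratio([1.0, 2.0], [0.0, 0.0],
                              prev_r=0.0, prev_sum_average=1.0)
```

Note that the ratio is computed against sum_average of the previous frame, and the average is updated only afterwards, following the order of steps 4e and 4h.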
Subsequently, the counter counter (m) of the significant section is calculated.
[0094]
When y> 10 and counter (m) <100 and r (m) ≦ THR, 1 is added to counter (m). If this condition is not met, counter (m) = 0 is reset.
[0095]
THR may be a fixed value or may be changed by y_average. In this embodiment, THR that varies according to the following equation is employed.
[0096]
THR = 1.7 + (y_average − 40) / 200, where 1.7 ≦ THR ≦ 2.0
y_average > 100: THR = 2.0
40 < y_average ≦ 100: THR = 1.7 + (y_average − 40) / 200
y_average ≦ 40: THR = 1.7
Therefore, if it is determined in step 4i that y_average (m) exceeds 100, THR = 2.0 is set in step 4j. If it is determined in step 4k that y_average (m) exceeds 40, THR is set in step 4l to the variable value given by the above equation. In other cases, THR = 1.7 is set in step 4m.
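The piecewise definition of THR (steps 4i to 4m) can be written as a short function. This is a sketch for illustration; the function name is hypothetical.

```python
def threshold_thr(y_average: float) -> float:
    """Variable threshold THR as a function of y_average (steps 4i to 4m)."""
    if y_average > 100:                       # steps 4i, 4j
        return 2.0
    if y_average > 40:                        # steps 4k, 4l
        return 1.7 + (y_average - 40) / 200
    return 1.7                                # step 4m


thr_mid = threshold_thr(70.0)   # halfway between the two plateaus
```

The linear segment meets both plateaus at y_average = 40 and y_average = 100, so THR is continuous.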
[0097]
If it is determined in step 4n that the significant value y is greater than 10, in step 4o that the counter is less than 100, and in step 4p that the ratio r is less than or equal to THR, the counter is incremented in step 4q. Otherwise, the counter is reset to 0 in step 4r.
[0098]
Similarly, if it is determined in step 4n that the significant value y is 10 or less, counter (m) is reset to 0 in step 4s, and if it is determined in step 4o that the counter is 100 or more (i.e., 100), counter (m) = counter (m−1) is set.
[0099]
With the above processing, counter (m) and y_average (m) are output for each frame m (step 4u).
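The counter logic of steps 4n to 4s can be sketched as below. This follows the flowchart description, in which the counter is held once it reaches 100; the function name and the `limit` parameter are assumptions made for the example.

```python
def update_counter(prev_counter: int, y: float, r: float, thr: float,
                   limit: int = 100) -> int:
    """Return counter(m) from counter(m-1) (steps 4n to 4s)."""
    if y <= 10:
        return 0                 # step 4s: significant value too small
    if prev_counter >= limit:
        return prev_counter      # counter(m) = counter(m-1): hold at the limit
    if r <= thr:
        return prev_counter + 1  # step 4q: stable section continues
    return 0                     # step 4r: variation too large, start over


# A run of stable frames with y > 10 drives the counter up to the limit.
counter = 0
for _ in range(150):
    counter = update_counter(counter, y=20.0, r=1.0, thr=1.7)
```

The counter therefore reaches 100 only after 100 consecutive frames in which y stays above 10 and r stays at or below THR.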
<Update judgment>
In response to these outputs (counter (m), suby (m, k), y (m), y_average (m)), the update determination unit 31 determines whether or not to update the noise power value for each band noise_power (m, k), and the noise leak integrated value update unit 32 then updates the noise power value for each band accordingly.
[0100]
The significant value y is about 20 to 30 in the case of normal speech, and y <15 when noise estimation is performed well. Accordingly, when y <15, for example, the following equation is used (first noise power estimated value update step).
noise_power (m + 1, k) = noise_power (m, k) × 0.9 + channel_power (m, k) × 0.1
k = 0, 1,..., 15
Alternatively, the normal noise power update defined in IS127 [US standard variable rate speech coding system: "Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems" (TIA IS127)] may be performed.
[0101]
If y is not accurately calculated for some reason, a forced update is performed based on the counter value (counter) (second noise power estimate update step). For example, when counter (m) ≧ 100 and y <y_average (m) +5, updating is performed according to the above formula.
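The two update conditions can be combined in one sketch, shown below for illustration. The function name and signature are hypothetical; the thresholds 15, 100 and 5 and the leak coefficients 0.9 / 0.1 are the example values given in the text.

```python
def update_noise_power(noise_power, channel_power, y, y_average, counter):
    """Return noise_power(m+1, k) for all bands k."""
    normal = y < 15                                  # first update step
    forced = counter >= 100 and y < y_average + 5    # second (forced) update step
    if not (normal or forced):
        return list(noise_power)                     # keep the current estimate
    # Leak integration of the band power into the noise estimate.
    return [n * 0.9 + c * 0.1 for n, c in zip(noise_power, channel_power)]


# y is small, so the frame is treated as noise and the estimate moves
# one tenth of the way toward the current band power.
updated = update_noise_power([10.0, 20.0], [20.0, 10.0],
                             y=5.0, y_average=10.0, counter=0)
```

The forced branch lets the estimate recover when y has stayed high but stable for 100 frames, which is the situation the counter is designed to detect.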
[0102]
Subsequently, the band-specific gain determination unit 33 determines the band-specific gain. At this time, it is set for each band with reference to the significant value (y) calculated by the significant value calculation unit, the significant value (suby) for each band, and the like.
<Voice weight calculation>
The voice weight calculation unit receives the significant value y output from the significant value calculation unit and calculates the voice weight sp used for determining the noise suppression gain. The voice weight sp is a numerical value in the range 0 ≦ sp ≦ 6 representing the degree of voice included in one frame, where sp = 0 represents a noise section and sp = 6 represents a voice section. These numerical values and the division into stages can be set as appropriate.
[0103]
This sp value is supplied to the above-described voice switch, and is used to determine the presence or absence of voice in the transmission signal in the voice switch. For example, when sp = 0, it can be determined that there is no sound, and otherwise, it can be determined that there is sound.
[0104]
FIG. 5 is a flowchart showing the procedure for calculating the voice weight sp and its processing contents in the voice weight calculator 28.
[0105]
First, the frame number m is reset to 0 in step 5a, and then the frame number m is incremented in step 5b. Next, in step 5c, the weighted addition value y is compared with an arbitrarily chosen threshold value “13”. If y < 13, the frame is determined to be a noise frame, and the process proceeds to step 5d, where the voice weight sp (m) is set to
sp (m) = sp (m−1) −0.5
As will be described later, in the final step
sp (m) = MAX (sp (m), 0)
is applied, so the minimum value is 0 and sp (m) converges to 0 if noise frames continue.
[0106]
On the other hand, if y ≧ 13, the frame is a voice frame or a transitional frame, and the process goes to step 5e, where a temporary voice weight z is calculated as
z = (y−13) × 1.5 + 1
[0107]
Then, in step 5f, it is determined whether
sp (m−1) ≦ 0.5
That is, it is determined whether the voice weight sp (m−1) one frame before is sufficiently small (0.5 or less), in other words whether the previous frame was a noise frame.
[0108]
If the previous frame is determined to be a noise frame, that is, if sp (m−1) is 0.5 or less, the process proceeds to step 5g, where the voice weight sp (m) of the current frame is set to the above-mentioned temporary voice weight z.
[0109]
This case is the time of switching from noise to speech, and it is necessary to suppress the noise and raise the speech clearly so that the beginning of the word is not cut off. Accordingly, the voice weight is set to take a large value.
[0110]
On the other hand, when the previous frame is not a noise frame (sp (m−1) > 0.5), the process proceeds to step 5h, where it is determined whether
z > sp (m−1) +0.5
If z is larger than (sp (m−1) +0.5), the voice weight sp (m) of the current frame is set to (sp (m−1) +0.5) in step 5i.
[0111]
This case is a point in time at which the frame is determined to be a voice frame in a transitional period. Importance is placed on continuity, and the increase in the voice weight relative to the previous frame is limited to 0.5.
[0112]
On the other hand, if z is equal to or less than (sp (m−1) +0.5), the process proceeds to step 5j, where it is determined whether
z < sp (m−1) −0.5
If z is smaller than (sp (m−1) −0.5), the voice weight sp (m) of the current frame is set to (sp (m−1) −0.5) in step 5k.
[0113]
This case also corresponds to a frame judged to be in a transitional period of speech. Continuity is given priority, and the decrease from the preceding frame is limited to 0.5.
[0114]
If z is equal to or greater than (sp (m−1) − 0.5), the process proceeds to step 5m, where the voice weight sp (m) of the current frame is set to z.
[0115]
Through the above steps, sp (m) is set to one of three values, z or sp (m−1) ± 0.5, and finally
sp (m) = MIN (sp (m), 6)
sp (m) = MAX (sp (m), 0)
are applied, so that sp (m) takes a value from 0 to 6.
[0116]
That is, in steps 5f to 5m, the temporary voice weight z calculated for the current frame is corrected in consideration of the voice weight sp (m−1) set for the preceding frame, sp (m) is determined in step 5n, and the process returns to step 5b so that sp (m) is determined for all m.
[0117]
By using the voice weight sp (m) thus obtained, the voice, noise, and transitional intervals can be adjusted in consideration of continuity between frames.
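The per-frame update of steps 5c to 5n can be sketched as follows. This is only an illustrative reading of the flowchart description: the function name and parameter names are hypothetical, and the last branch assumes that step 5m is reached when z lies within ±0.5 of sp (m−1).

```python
def update_voice_weight(y, sp_prev, threshold=13.0, step=0.5,
                        onset_level=0.5, sp_max=6.0):
    """Return sp(m) from the weighted addition value y and sp(m-1).

    Hypothetical helper; names are not taken from the source text.
    """
    if y < threshold:
        # Step 5d: noise frame, let the weight decay.
        sp = sp_prev - step
    else:
        # Step 5e: temporary weight for a voice or transitional frame.
        z = (y - threshold) * 1.5 + 1.0
        if sp_prev <= onset_level:
            # Step 5g: switch from noise to speech, adopt z directly.
            sp = z
        elif z > sp_prev + step:
            # Step 5i: limit the increase to 0.5 per frame.
            sp = sp_prev + step
        elif z < sp_prev - step:
            # Step 5k: limit the decrease to 0.5 per frame.
            sp = sp_prev - step
        else:
            # Step 5m (assumed condition): z is within +/-0.5 of sp(m-1).
            sp = z
    # Final clamp to the range 0 to 6.
    return max(min(sp, sp_max), 0.0)
```

For example, a frame with y = 15 that follows a noise frame (sp (m−1) = 0) gives z = 4 and the weight jumps straight to 4, whereas the same y after sp (m−1) = 3 only raises the weight to 3.5.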
[0118]
The voice weight sp (m) obtained by the voice weight calculation unit 28 is input to the minimum noise value estimation unit 29 and the band-specific gain determination unit 33.
[0119]
This sp value is also supplied to the voice switch VS as a voice detection flag. Based on this value, the VS side determines, for example, sp = 0 as [no voice] and sp > 0 as [voice].
<Minimum noise estimation>
The noise minimum value estimation unit 29 checks the minimum value of the noise leakage integral value noise_power (m, k) in each band for every 100 frames in which the speech weight sp is sp = 0. Then, this minimum value is used as the minimum noise value noise_min (m, k) in the next 100 frames. At the same time, an average value min_all of the minimum noise values in each band is obtained.
[0120]
FIGS. 6 and 7 are flowcharts showing the procedure and contents of the minimum value estimation process executed in the noise minimum value estimation unit 29.
[0121]
In the figure, the noise minimum value estimation unit 29 first resets the frame number m to m = 0 in step 6a, sets the frame counter value to fc = 96, and sets the noise minimum value to noise_min (k) = 36. The bands are initially set to k = 0,.
[0122]
Further, noise_min_h (k) and the inter-band average min_all of the noise minimum values, which is the average of noise_min_h (n) over n = 0, 1, ..., 7, are each initialized as follows.
noise_min_h (k) = MAX (noise_power (m, 2k), noise_power (m, 2k + 1)), k = 0, ..., 7
min_all = Σ noise_min_h (n) / 8, n = 0 to 7
[0123]
That is, for each pair of adjacent bands the larger noise power is taken, and the average of these values is set as min_all.
[0124]
Next, the noise minimum value estimation unit 29 increments the frame number m in step 6b, and then determines in step 6c whether or not the speech weight is sp = 0, that is, whether or not it is a noise frame.
[0125]
If it is a noise frame, the process returns to step 6b, the frame number m is incremented, and the noise frame determination of step 6c is made again. When it is determined that sp = 0 is not satisfied, that is, when a speech frame or a transitional frame is detected, the noise minimum value estimation unit 29 proceeds to step 6d, increments the frame counter fc, and selects band k = 0.
[0126]
Then, in step 6e,
x = MAX (noise_power (m, 2k), noise_power (m, 2k + 1))
is set, and in step 6f it is determined whether
noise_min_h (k) > x
[0127]
If noise_min_h (k)> x, the process proceeds to step 6g where the minimum noise value is set to noise_min_h (k) = x. Then, the process proceeds to step 6h.
[0128]
On the other hand, if noise_min_h (k) ≦ x, the process proceeds directly to step 6h, where the next band k = 1 is selected. The process of setting the noise minimum value noise_min_h (k) in steps 6e to 6g is repeated until the band reaches k = 8.
[0129]
When the band k reaches 8, the noise minimum value estimation unit 29 determines whether or not the frame counter fc reaches 100 in step 6j. Until the 100th frame is reached, the process returns to step 6b to select the next frame, and the processes in steps 6c to 6i are repeated for the selected frame.
[0130]
On the other hand, when the processing for the 100 frames is completed, the noise minimum value estimation unit 29 proceeds to step 7a, where the inter-band average min_all of the noise minimum values is calculated as the average of noise_min_h (n), n = 0, 1, ..., 7, as follows.
[0131]
min_all = Σ noise_min_h (n) / 8, n = 0 to 7
Along with this, noise_min (0) and noise_min (1) are set as
noise_min (0) = noise_min_h (0)
noise_min (1) = 0.75 × noise_min_h (0) + 0.25 × noise_min_h (1)
and the band is set to k = 1.
[0132]
Further, the noise minimum value estimation unit 29 proceeds to step 7b, where the noise minimum values of the remaining bands are calculated from the eight noise minimum values noise_min_h (k) previously determined for k = 0 to 7, as follows.
noise_min (2k) = 0.75 × noise_min_h (k) + 0.25 × noise_min_h (k−1)
noise_min (2k + 1) = 0.75 × noise_min_h (k) + 0.25 × noise_min_h (k + 1)
[0133]
When the above calculation is completed, the noise minimum value estimation unit 29 proceeds from step 7d to step 7e, where the following are calculated.
noise_min (14) = 0.75 × noise_min_h (7) + 0.25 × noise_min_h (6)
noise_min (15) = noise_min_h (7)
[0134]
That is, in steps 7a to 7e the noise minimum value estimation unit 29 interpolates the 16 noise minimum values noise_min from the 8 values noise_min_h.
[0135]
After calculating the 16 noise minimum values, the noise minimum value estimation unit 29 resets the frame counter to fc = 0 in step 7f, and again sets the noise minimum value to noise_min_h (k) = 36 for the bands k = 0, ..., 7.
[0136]
Then, in step 7g, the previously calculated inter-band average min_all of the noise minimum values and the noise minimum values noise_min (m, k), k = 0, ..., 15, are output. The process then returns to step 6b, and the same processing for calculating the noise minimum values and their inter-band average is repeated for the next frame (m = m + 1).
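The interpolation of steps 7a to 7e, which expands the eight paired-band minima into sixteen per-band minima, can be sketched as below. The function name is hypothetical, and the loop range k = 1 to 6 is inferred from the explicit formulas given for bands 0, 1, 14, and 15.

```python
def interpolate_noise_min(noise_min_h):
    """Expand 8 paired-band minima into 16 per-band minima (steps 7a-7e).

    Returns (noise_min, min_all). Hypothetical helper for illustration.
    """
    n = len(noise_min_h)
    if n != 8:
        raise ValueError("expected 8 paired-band minima")
    h = noise_min_h
    noise_min = [0.0] * (2 * n)

    # Step 7a: inter-band average and the two lowest bands.
    min_all = sum(h) / n
    noise_min[0] = h[0]
    noise_min[1] = 0.75 * h[0] + 0.25 * h[1]

    # Steps 7b-7d: interior bands, weighting the nearer pair by 0.75.
    for k in range(1, n - 1):
        noise_min[2 * k] = 0.75 * h[k] + 0.25 * h[k - 1]
        noise_min[2 * k + 1] = 0.75 * h[k] + 0.25 * h[k + 1]

    # Step 7e: the two highest bands.
    noise_min[2 * n - 2] = 0.75 * h[n - 1] + 0.25 * h[n - 2]
    noise_min[2 * n - 1] = h[n - 1]
    return noise_min, min_all
```

A flat input such as eight values of 36 yields sixteen values of 36, so the interpolation leaves a flat noise floor unchanged.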
<Determination of gain by band>
The band-specific gain determination unit 33 determines the band gain gain (m, k) based on the band power channel_power (m, k) output from the band power calculation unit 26, the noise power noise_power (m, k) output from the noise leak integration value update unit 32, the speech weight sp (m) output from the voice weight calculation unit 28, and the noise minimum value noise_min (m, k) output from the noise minimum value estimation unit 29.
[0137]
First, the band average noise_all of the noise leak integral value noise_power (m, k) is obtained as the average of noise_power (m, k) over k = 0, 1, ..., 15:
noise_all = Σ noise_power (m, k) / 16, k = 0 to 15
[0138]
Subsequently, the band minimum value min_band of the band power channel_power (m, k) and the band maximum value max_band of the noise minimum value noise_min (m, k) are obtained as
min_band = MIN (channel_power (m, k), k = 2, ..., 11)
max_band = MAX (noise_power (m, k), k = 0, ..., 15)
[0139]
Next, the adjustment value md common to the bands is determined by
md = (noise_all − min_all) × (1 − sp / 6) + (min_band − max_band) × sp / 6
According to this formula:
when sp = 0, that is, in a noise interval, md = noise_all − min_all;
when sp = 6, that is, in a voice interval, md = min_band − max_band.
In a transitional interval, md takes an intermediate value between these.
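As a rough numerical check of how md blends the two regimes, the formula can be written out as below. The function name is hypothetical, the inputs are assumed to be 16-element lists of per-band values, and max_band is taken from noise_power exactly as the formula is printed.

```python
def adjustment_value(channel_power, noise_power, min_all, sp):
    """Band-common adjustment value md for one frame (illustrative sketch).

    channel_power, noise_power: 16 per-band values for the frame.
    min_all: inter-band average of the noise minimum values.
    sp: voice weight in the range 0 to 6.
    """
    noise_all = sum(noise_power) / 16.0       # band average of noise power
    min_band = min(channel_power[2:12])       # minimum over bands 2..11
    max_band = max(noise_power)               # maximum over bands 0..15
    w = sp / 6.0                              # 0 for noise, 1 for voice
    return (noise_all - min_all) * (1.0 - w) + (min_band - max_band) * w
```

With a noise power of 20 in every band, a band power of 30 in every band, and min_all = 18, this gives md = 2 for sp = 0, md = 10 for sp = 6, and md = 6 halfway between at sp = 3.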
[0140]
Examples of frequency versus power characteristics in the case of a noise frame and in the case of a speech frame are shown in FIGS. 8 and 9, respectively.
[0141]
In a noise frame, as shown in FIG. 8, the band power is close to the noise minimum value. Adding the adjustment value to the noise minimum value shifts its average to the noise average noise_all without changing the spectral shape of the noise minimum value.
[0142]
On the other hand, in a voice frame, as shown in FIG. 9, adding the adjustment value to the noise minimum value brings it to the level of the minimum band power while leaving the spectral shape of the noise minimum value unchanged.
[0143]
The gain by band gain (m, k) is determined from the band power channel_power (m, k), the minimum noise value noise_min (m, k), and the adjustment value as follows.
[0144]
First, tmp is set to
tmp = channel_power (m, k) − noise_min (m, k) − md − 1.625
[0145]
Next, the method of determining gain (m, k) (where gain (m, k) ≦ 0) is switched according to the voice weight sp.
(1) When sp > 0, that is, for a voice or transitional frame,
gain (m, k) = {sqrt (1.4 + (0.7 × tmp)²) + 0.7 × tmp − 10} × 2
(2) When sp = 0, that is, for a noise frame,
gain (m, k) = {sqrt (1.4 + (0.03125 × tmp)²) + 0.03125 × tmp − 10} × 2
The gain is obtained independently for each band k.
[0146]
The functional form used to determine the gain can be set as appropriate. It suffices that, in the region where tmp is small, the gain for a voice frame is lower than the gain for a noise frame.
[0147]
Then, the transform coefficients of each band are multiplied in the multiplier 23 by the band gain gain (m, k) obtained as described above, whereby noise cancellation is performed.
[0148]
FIG. 10 is a graph showing the relationship of tmp-gain. The solid line shows the case of the voice frame (sp> 0), and the dotted line shows the case of the noise frame (sp = 0).
[0149]
When tmp falls below 0, the gain for a voice frame is lower than the gain for a noise frame. The quantity tmp can be regarded as the SNR of the band minus md and a constant (1.625 in the above example). Therefore, although the adjustment value md varies, in a band with a small SNR the voice section takes a smaller gain value than the noise section.
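The crossover at tmp = 0 can be checked numerically with a small sketch of the two gain curves. The function name is hypothetical, and the final clamp to 0 is an assumption added here to honour the stated condition gain (m, k) ≦ 0, since the voice-frame expression becomes positive for large tmp.

```python
import math


def band_gain(tmp, sp):
    """Band gain from tmp and the voice weight sp (illustrative sketch).

    Uses slope 0.7 for voice/transitional frames (sp > 0) and
    0.03125 for noise frames (sp == 0), as in the two formulas above.
    """
    slope = 0.7 if sp > 0 else 0.03125
    a = slope * tmp
    gain = (math.sqrt(1.4 + a * a) + a - 10.0) * 2.0
    # Assumption: enforce gain <= 0; the source states the bound
    # but does not say how it is applied.
    return min(gain, 0.0)
```

Both curves meet at tmp = 0. Because sqrt (1.4 + a²) + a increases with a, the steeper voice-frame slope gives a lower gain than the noise-frame curve for tmp < 0 and a higher gain for tmp > 0.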
[0150]
By actively suppressing (assigning a small gain value to) bands that show a small SNR in a voice section, which can be presumed not to contain a voice component, the bands that do contain a voice component are made to stand out in the voice frame. This effect is obtained by setting the gain of such bands lower than the band gain of a noise frame.
[0151]
Such a gain value setting is not limited to the hyperbolic function and can be performed with various settings.
[0152]
For example, as shown in FIG.

It is also possible to set as follows.
[0153]
The noise-cancelled transform coefficients of each band are subjected to an inverse fast Fourier transform in the IFFT 24 and returned to a signal frame on the time axis, then frame-synthesized in the frame synthesizer 25 and supplied as a transmission signal to, for example, a speech coding circuit.
[0154]
As described above, according to this embodiment, even in a frame determined to be a voice frame, a band determined not to include a voice component is given a gain smaller than the band gain of a frame determined to be a noise frame. The voice components (bands) in the voice frame are thereby emphasized, and as a result an audibly good noise-suppressed output signal can be obtained.
[0155]
Further, the minimum noise value estimation circuit 29 obtains the minimum noise power value in each band, and the band-specific gain determination unit 33 uses the spectral shape of this minimum noise value to determine the band-specific gain. This makes it possible to realize noise canceling that is not affected by a short-term change in the noise spectrum, such as when an automobile passes by, and that hardly distorts the voice spectrum.
[0156]
In addition, when the significant value y of each frame is large (so that the frame would normally be judged as speech) but the change from the preceding frame in the deviation between bands, normalized by its average value, is small, and such frames continue (for example, for 100 frames), the frames are determined to be noise frames and the noise power estimate is forcibly updated. In judging this forced update, the continuous interval is counted using the value normalized by the average of the spectral deviation, so that even noise whose spectral deviation varies from frame to frame can be counted as a continuous interval. Therefore, even when the significant value changes in a way that prevents a correct noise frame determination, the forced update is carried out, the noise power estimate is updated properly, and good noise suppression is obtained.
[0157]
After the signal reduction amount (B) of the noise canceller has been determined as described above, the final reduction amount determining unit 34, which adjusts the final signal attenuation amount (C), determines the final signal attenuation lg of the transmission signal based on gain and on S_loss supplied from the voice switch.
[0158]
This signal attenuation is determined, for example, as follows.
[0159]
lg (m, k) = MIN (S_loss (m), gain (m, k))
The smaller of S_loss and gain (m, k), that is, the greater signal attenuation is employed.
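A minimal sketch of this selection is shown below, assuming that both S_loss and gain are expressed as non-positive values so that the smaller number means the greater attenuation. The function name and the sample values are hypothetical.

```python
def final_attenuation(s_loss, gains):
    """Per-band final attenuation lg(m, k) = MIN(S_loss(m), gain(m, k)).

    s_loss: voice-switch loss for the frame (assumed <= 0).
    gains:  per-band noise-canceller gains for the frame (assumed <= 0).
    """
    return [min(s_loss, g) for g in gains]
```

With S_loss = −12, bands whose noise-canceller gain is only −3 are pulled down to −12, while a band already at −17.6 is left as it is. With S_loss = 0 the noise-canceller gains pass through unchanged.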
[0160]
FIG. 12 shows a sample of a transmission signal and a reception signal, and FIG. 13 shows a sample of power transitions of S_loss and lg.
[0161]
Since lg is always less than or equal to S_loss, LOSS less than or equal to VS_loss is always inserted into the communication system (transmission-reception). In addition, since S_loss = 0 when only the transmission is performed, noise suppression inherent in the noise canceller is realized. In the above example, gain <S_loss at the time when the transmitted voice is finished, so that the switch feeling due to the voice switch (a phenomenon in which the sound is unnaturally reduced due to abrupt switching of the LOSS of the transmission / reception system) is reduced.
[0162]
Next, the operation of the voice switch will be described.
[0163]
Although the noise canceller described above may also be provided on the receiving side to determine the presence or absence of voice, in this embodiment the presence or absence of voice in the received signal is determined by the double talk determination unit DTD. Various methods can be adopted for this determination; an example will be described with reference to FIG.
[0164]
In this example, the frame power P of each frame of the received signal (for example, P = 10 log (mean square value of the samples)) is compared with a predetermined reference to determine the presence or absence of speech. This determination uses the minimum value min of the frame power P within the noise level update period INTVL (for example, 1 second, equivalent to 50 frames) and the long-term average avg of P (for example, a leak integral value avg = γ × avg + (1 − γ) × P, where γ is set as appropriate, for example to 0.99).
[0165]
In step (2a), the counter is set to 0 and the calculation is started. The initial values can be set as appropriate; for example, min = MAX_NOISE, noise = 5, and avg = 5 are set. MAX_NOISE is the highest noise value, for example, 36.
[0166]
In step (2b), the frame power P of the target frame is calculated / input.
[0167]
In step (2c), the counter is incremented by “1”, and in step (2d), the frame power P is compared with min. If P <min, rewrite as min = P (step 2e), otherwise maintain the current value of min and proceed to the next step (2f).
[0168]
In step (2f), the long-term average avg of the frame power P is obtained. For example, the leak integral value can be adopted as the long-term average value as shown in the following equation.
[0169]
avg = γ × avg + (1 − γ) × P, where γ is set as appropriate (for example, 0.99)
In step (2g), the counter value is compared with the INTVL value. If they do not match, the process proceeds to step (2m), and the noise value for the measurement target frame is output. If counter = INTVL, the process proceeds to the next step (2h) for updating the noise value.
[0170]
In step (2h), it is determined whether min − noise > −2. If Yes, the process proceeds to step (2i) and noise = min is set. If No, the process proceeds to step (2j), where it is determined whether avg < noise − 1. If Yes, the process proceeds to step (2k) and noise = avg is set.
[0171]
Next, the process proceeds to step (2l), resets counter = 0, sets min = MAX_NOISE, proceeds to the next step (2m), and outputs the updated noise value.
[0172]
The presence / absence of voice in the received signal is determined by inputting the frame power P and noise value for each frame.
[0173]
In step (2o), frame power P and noise are input, and in the next step (2p), P <noise + TH is determined. TH is a threshold and is set to 18, for example. If P <noise + TH, it is determined that the received voice is not present [no voice signal] (step (2q)), and otherwise, the received voice is present [voice signal present] (step (2r)).
[0174]
This result is output as the presence / absence of sound in the target frame (step (2s)).
[0175]
By repeating this for each frame, it is possible to detect the presence or absence of voice in the received signal.
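The two flows described above (noise level tracking in steps 2a to 2m and the voice decision in steps 2o to 2s) can be sketched together as a small class. The class and attribute names are hypothetical. Two points are assumptions rather than statements of the source: the leak integral is taken to include the frame power P, and the noise value is left unchanged when both tests in steps 2h and 2j answer No.

```python
MAX_NOISE = 36.0  # highest noise value used as the reset value of min


class ReceiveVoiceDetector:
    """Per-frame voice detection on the received signal (sketch)."""

    def __init__(self, intvl=50, gamma=0.99, th=18.0):
        self.intvl = intvl        # noise level update period in frames
        self.gamma = gamma        # leak-integration coefficient
        self.th = th              # decision threshold TH
        self.counter = 0          # step 2a
        self.min_p = MAX_NOISE
        self.noise = 5.0
        self.avg = 5.0

    def update_noise(self, p):
        """Steps 2c to 2m: update and return the noise value."""
        self.counter += 1                                  # 2c
        if p < self.min_p:                                 # 2d, 2e
            self.min_p = p
        # 2f: long-term average (assumed to include the P term).
        self.avg = self.gamma * self.avg + (1.0 - self.gamma) * p
        if self.counter == self.intvl:                     # 2g
            if self.min_p - self.noise > -2.0:             # 2h, 2i
                self.noise = self.min_p
            elif self.avg < self.noise - 1.0:              # 2j, 2k
                self.noise = self.avg
            self.counter = 0                               # 2l
            self.min_p = MAX_NOISE
        return self.noise                                  # 2m

    def has_voice(self, p):
        """Steps 2o to 2s: True when received voice is present."""
        noise = self.update_noise(p)
        return not (p < noise + self.th)
```

Calling has_voice once per frame with that frame's power P reproduces the per-frame repetition described above; the tracked noise value is available as the noise attribute.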
[0176]
In response to this result, in the presence of voice, LOSS is inserted on the receiver side, so that the signal reduction rate of R_loss is executed by the amplifier. On the other hand, the signal cancel amount (B) of the noise canceller and S_loss (signal attenuation amount (A)) are sent to the transmission side for comparison adjustment.
[0177]
The loss of the voice switch described above takes one of two values, 0 or VS_loss, but this setting can be changed as appropriate and can, for example, be set as follows.
(1) Reception (no voice) / Transmission (no voice):
R_loss = VS_loss − h; S_loss = h
(2) Reception (with voice) / Transmission (no voice):
R_loss = 0; S_loss = VS_loss
(3) Reception (no voice) / Transmission (with voice):
R_loss = VS_loss; S_loss = 0
(4) Reception (with voice) / Transmission (with voice):
R_loss = 0; S_loss = VS_loss
where h = (number of consecutive frames with sp = 0) × (−0.1),
VS_loss ≦ h ≦ 0, and
sp is the voice / noise judgment variable received from the noise canceller.
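The four cases can be summarised in a short sketch. The function name, the argument names, and the default VS_loss of −12 are hypothetical; the only assumption about VS_loss is that it is a non-positive value, as implied by VS_loss ≦ h ≦ 0.

```python
def voice_switch_losses(rx_voice, tx_voice, silent_frames, vs_loss=-12.0):
    """Return (R_loss, S_loss) for one frame following cases (1) to (4).

    rx_voice, tx_voice: voice present on the reception / transmission side.
    silent_frames: number of consecutive frames with sp = 0.
    vs_loss: total voice-switch loss (hypothetical default, <= 0).
    """
    if rx_voice:
        # Cases (2) and (4): the whole loss goes to the transmission side.
        return 0.0, vs_loss
    if tx_voice:
        # Case (3): the whole loss goes to the reception side.
        return vs_loss, 0.0
    # Case (1): no voice on either side. The loss moves gradually to the
    # transmission side as silence continues, limited to VS_loss <= h <= 0.
    h = max(vs_loss, min(0.0, silent_frames * -0.1))
    return vs_loss - h, h
```

In every case R_loss + S_loss equals VS_loss, so the full voice-switch loss always remains in the transmit-receive loop while its split between the two sides changes.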
In addition to the above embodiment, an echo canceller EC (10) may be added (FIG. 15). The echo canceller removes / reduces the signal when the output from the audio output unit is input from the audio input unit, and various methods can be adopted for the echo detection.
[0178]
Except for the echo canceller EC (10), the configuration is the same as that of FIG. The EC compares the input signal from the audio input unit, the received signal supplied to the audio output unit, and the signal obtained by subtracting the received signal from the input signal of the audio input unit, and thereby determines whether the received signal is superimposed on the input signal. If it is superimposed, the received signal is subtracted, and the result is used as the input signal to the noise canceller NC.
[0179]
Since the echo path changes according to the output sound environment, it is necessary to consider the time difference when the sound output from the sound output unit is received by the sound input unit.
[0180]
In the above-described embodiment the noise canceller is inserted on the transmission side, but a noise canceller may instead be inserted on the reception side. If noise cancellers are inserted on both the transmission side and the reception side, noise can be cancelled in both directions. In this case, the voice switch function can be incorporated into the noise cancellers, so that controlling the signal attenuation of the noise cancellers also serves as the LOSS insertion function of the voice switch.
[0181]
That is, it is only necessary to control the amount of signal attenuation so that the LOSS amount necessary for the voice switch function is always included in either of the transmission and reception.
[0182]
The present invention is not limited to a communication device such as a mobile phone, and can be used for any device as long as it is an electronic device using sound processing (recording device, portable electronic terminal, etc.).
[0183]
Each block shown in FIG. 2 is described for convenience in order to explain its function. The blocks need not be individual elements; needless to say, one or more of them may be integrated, together with functions such as a CPU, a DSP, a modem, and a voice encoding circuit, into a one-chip LSI.
[0184]
[Effect of the invention]
As described above, according to the present invention, it is possible to provide an electronic apparatus capable of supplying a high-quality sound signal, which greatly contributes industrially.
[Brief description of the drawings]
FIG. 1 is a circuit block diagram showing an embodiment of the present invention.
FIG. 2 is a block diagram showing a noise canceller according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a processing procedure of a significant value calculation unit according to the embodiment of this invention.
FIG. 4 is a flowchart showing a significant interval counting process procedure for determining noise power forced update according to the embodiment of the present invention.
FIG. 5 is a flowchart showing a processing procedure of a voice weight sp of the mobile phone embodying the present invention.
FIG. 6 is a flowchart showing a processing procedure of a noise minimum value estimation unit according to the embodiment of the present invention.
FIG. 7 is a flowchart showing a processing procedure of a noise minimum value estimation unit according to the embodiment of the present invention.
FIG. 8 is a diagram illustrating an example of frequency versus power characteristics in the case of a noise frame.
FIG. 9 is a diagram showing an example of frequency versus power characteristics in the case of a voice frame.
FIG. 10 is a relationship diagram of tmp-gain according to the embodiment of the present invention.
FIG. 11 is a relationship diagram of tmp-gain according to the embodiment of the present invention.
FIG. 12 is a diagram showing a sample of a transmission signal and a reception signal.
FIG. 13 is a diagram showing a sample of power transition of S_loss and lg.
FIG. 14 is a flowchart showing a processing procedure of the voice switch according to the embodiment of the present invention.
FIG. 15 is a block diagram of an embodiment of the present invention.
[Explanation of symbols]
NC: Noise canceller; VS: Voice switch

Claims

An electronic device that acquires an audio signal to be transmitted and received and performs signal reduction on at least one of a transmission signal and a reception signal, the electronic device comprising:
a voice switch that sets a first signal reduction amount for the transmission signal based on a voice detection result of the transmission signal or the reception signal; and
a noise canceller that acquires the transmission signal and performs signal reduction on the transmission signal with a final signal reduction amount equal to or less than the first signal reduction amount set by the voice switch.

An electronic device that acquires an audio signal to be transmitted and received and performs signal reduction on at least one of a transmission signal and a reception signal, the electronic device comprising:
a voice switch that sets a first signal reduction amount for the transmission signal based on a voice detection result of the transmission signal or the reception signal; and
a noise canceller that acquires the transmission signal, calculates a second signal reduction amount for suppressing noise included in the transmission signal, and performs signal reduction on the transmission signal with a final signal reduction amount adjusted to be equal to or less than the first signal reduction amount by comparing the first signal reduction amount set by the voice switch with the second signal reduction amount.

An electronic device that acquires an audio signal to be transmitted and received and performs signal reduction on at least one of a transmission signal and a reception signal, the electronic device comprising:
a voice switch that sets a first signal reduction amount for the transmission signal based on a voice detection result of the transmission signal or the reception signal; and
a noise canceller that acquires the transmission signal, calculates a second signal reduction amount for suppressing noise included in the transmission signal, and performs signal reduction on the transmission signal using whichever of the second signal reduction amount and the first signal reduction amount set by the voice switch gives the larger attenuation.

The electronic device according to any one of claims 1 to 3, wherein the voice switch sets the first signal reduction amount for the transmission signal and, when a signal reduction amount for the reception signal is set, signal reduction of the reception signal is performed using that signal reduction amount.

The electronic device according to any one of claims 2 to 4, wherein the noise canceller performs voice detection on the transmission signal or the reception signal and calculates the second signal reduction amount, and the voice switch receives the result of the voice detection by the noise canceller and sets the first signal reduction amount for the transmission signal.

The electronic device according to claim 2, wherein the noise canceller sets the second signal reduction amount for each frequency band of the transmission signal.

An electronic device that acquires an audio signal to be transmitted and received and performs signal reduction on at least one of a transmission signal and a reception signal, the electronic device comprising:
a voice switch that sets a predetermined signal reduction amount for at least one of the transmission signal and the reception signal based on a voice detection result;
a received signal reduction unit that acquires the reception signal and, when a signal reduction amount for the reception signal is set by the voice switch, performs signal reduction on the reception signal using that signal reduction amount; and
a noise canceller that acquires the transmission signal, performs voice detection on the transmission signal or the reception signal, calculates a signal reduction amount for noise suppression for each frequency band of the transmission signal based on the voice detection result, and, when a signal reduction amount for the transmission signal is set by the voice switch, performs signal reduction on the transmission signal with a final signal reduction amount adjusted so that the signal reduction amount for noise suppression is equal to or less than the signal reduction amount set by the voice switch, and, when a signal reduction amount for the reception signal is set by the voice switch, performs signal reduction on the transmission signal using the signal reduction amount for noise suppression.