JP3756828B2

JP3756828B2 - Reverberation elimination method, apparatus for implementing this method, program, and recording medium therefor

Info

Publication number: JP3756828B2
Application number: JP2002048553A
Authority: JP
Inventors: 暁江村; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-02-25
Filing date: 2002-02-25
Publication date: 2006-03-15
Anticipated expiration: 2022-02-25
Also published as: JP2003250193A

Description

【０００１】
【発明の属する技術分野】
この発明は、反響消去方法、装置、プログラムおよびその記録媒体に関し、特に、拡声通話装置の如き音響通信装置において通話の障害となり、時にはハウリングを引き起こす反響を消去する反響消去方法、装置、プログラムおよびその記録媒体に関する。
【０００２】
【従来の技術】
拡声通話装置においては、受話音声がスピーカから拡声されてマイクロホンに回り込み収音されて生じる反響が問題となる。通信回線を介して相互接続された拡声通話装置について閉ループのループゲインが１より大きい場合、ハウリングを引き起こして通話を不可能にする。また、ループゲインが１より小さい場合であっても、反響は通話の障害となると共に不快感を与える。より自然な通話環境を実現するために、スピーカからマイクロホンへの音響的回り込みにより生じる反響の消去が必要となる。
【０００３】
図１を参照するに、反響消去装置はＭチャネル再生系と１チャネル収音系に接続され、反響の消去を行う。ここで、受話端子１_m（ｍ＝１ないしＭ）から入力される受話信号は、スピーカ２_m（ｍ＝１ないしＭ）において音響信号として再生され、反響経路ｈ _m （ｍ＝１ないしＭ）を経てマイクロホン３に回り込む。
受話端子１_mと送話端子５の間に接続される反響消去部４により反響を消去する。この反響消去部４はＭ入力１出力適応フィルタより成る。マイクロホン３がＮ個ある場合は、図１に示されるＭ入力１出力適応フィルタをＮ個並列に並べた構成とする。
【０００４】
この反響消去部４の構成を図２を参照して説明する。各受話信号を予測反響生成部４１に入力して予測反響信号を生成し、この予測反響信号とマイクロホン３から入力する収音信号との間の差が減算器４２においてとられ、この残差信号ｅ(ｋ)が反響経路推定部４３にフィードバックされる。予測反響生成部４１への入力信号をｘ_m（ｋ）、マイクロホン３により収音された収音信号をｙ(ｋ)、スピーカ２_mからマイクロホン３に到る反響経路のインパルス応答をｈ_m、その長さをＬとすると、受話チャネル数Ｍ＝１のとき、入力信号ｘ_m(ｋ)と収音信号ｙ(ｋ)の間には、
【０００５】
【数６】

【０００６】
の様にベクトル化することで、入力信号ｘ(ｋ)と収音信号ｙ(ｋ)の関係を受話チャネル数Ｍ＝１のケースと同様に記述することができる。
反響消去部４の内部においては、予測反響生成部４１により予測反響信号が生成されて、実際の収音信号ｙ(ｋ)との間の差ｅ(ｋ)および過去の入力信号ｘ_m(ｋ)に基づいて収音信号ｙ(ｋ)と予測反響信号の差である残差信号ｅ(ｋ)が小さくなる様に予測反響信号生成用の適応フィルタの係数が逐次更新される。
ここにおいては、適応フィルタ係数の更新法をＮＬＭＳ法とした場合を説明する。実際の収音信号ｙ(ｋ)から適応フィルタにより予測された予測反響信号を差し引いて得られる残差信号ｅ(ｋ)は、
【０００７】
【数７】

【０００８】
により更新する。ただし、μは推定を安定にするため、０〜１の固定した値に設定されるステップサイズである。
この適応フィルタ更新方法において、収音信号ｙ(ｋ)は反響のみが収音されたものであることを前提としている。しかし、拡声通話装置が実際に使用されるときは、収音信号ｙ(ｋ)には送話および騒音の如き反響以外の信号が当然に含まれる。ここで、反響信号をｙ_E(ｋ)、送話および騒音の如き反響以外の信号を妨害信号ｙ_I(ｋ)とし、収音信号ｙ(ｋ)が
ｙ(ｋ)＝ｙ_E(ｋ）＋ｙ_I(ｋ)
で表されるものとする。このとき、ＮＬＭＳ法の適応フィルタ更新式は
【０００９】
【数８】

【００１０】
の方向に修正される。ただし、ε［・］は平均をとることを意味する。この第２項は理想的な修正方向からのズレを表し、送話および騒音が妨害信号として働くことがわかる。収音信号ｙ(ｋ)に妨害信号ｙ_I(ｋ)が含まれる状況においては、適応フィルタの係数がこの分だけ誤って更新されるので、ステップサイズμの値に応じたノイズが発生し、ときには適応フィルタを発散させる。発散を回避するには、ステップサイズμを充分に小さくする必要があるが、実際は不必要に小さいμを選択するか、或は発散しない程度の大きさのμで反響以外の音響の妨害による不正確な修正を或る確率で許容することになり、収束速度を低下させることにつながる。
【００１１】
文献 A.Mader,H.Puder,G.U.Schmidt,“Step-size control for acoustic echo cancellation filters-an overview,”Signal Processing,80,pp.1697-1719(2000)には、この様な状況において最適なステップサイズμを導く方法が示されている。これによれば、反響と予測反響の差である残留反響信号
【００１２】
【数９】

【００１３】
で求められる。
この式によれば、妨害信号パワーε［ｙ_I ²(ｋ)］が大きくなる程ステップサイズμが小さく設定されることにより、妨害信号ｙ_I(ｋ)が適応フィルタ推定に及ぼす影響を減少させている。
【００１４】
【発明が解決しようとする課題】
しかし、実際の環境でこの最適なステップサイズμをそのまま求めて適応フィルタを更新することはできなかった。それは、残留反響信号ｅ_E(ｋ）に妨害信号が重畳している残差信号から、残留反響信号ｅ_E(ｋ）だけを抽出することはできないからである。また、反響消去装置は、本来、スピーカ２_mからマイクロホン３までの未知の反響経路ｈを推定しながら反響を消去するに使用されるので、
ｅ_E(ｋ)＝（ｈ(ｋ)−ｈ ^{^}(ｋ)）^T ｘ(ｋ)
の関係式から残留反響信号を求めることもできないからである。
【００１５】
仮に、妨害信号ｙ_I(ｋ)のパワーε［ｙ_I ²(ｋ)］が一定で、そのレベルが予め分かっている場合、最適なステップサイズμを算出することはできる。しかし、通常は、騒音信号のレベルは一定とは限らないし、送話信号のレベルは時々刻々と変動している。
以上の状況において、最適なステップサイズμを使用して適応フィルタを更新するには、残差信号に占める反響成分の比率を推定する必要がある。
この発明の目的は、残差信号あるいは収音信号から残差信号に占める反響成分の比率を求め、この情報をもちいて適応フィルタ係数を更新することにより、多チャネル音響通信における上述の問題を解決する反響消去方法、装置、プログラムおよびその記録媒体を提供することにある。
【００１６】
【課題を解決するための手段】
この発明によれば、スピーカＭ個（Ｍは２以上の整数）とマイクロホンＮ個（Ｎは1以上の整数）が共通の音場に配置され、スピーカからＭチャネル信号を再生し、各マイクロホンに対応する各M入力１出力適応フィルタにＭチャネル再生信号を入力して反響信号を予測し、マイクロホンからの収音信号から適応フィルタ出力信号を差し引いて得られる残差信号を小さくするように適応フィルタ係数を更新する多チャネル音響通信システムにおいて、残差信号に占める反響成分の比率を使用して適応フィルタ係数を更新する反響消去方法を構成する。また残差信号の代わりに収音信号に占める反響成分の比率を使用して適応フィルタ係数を更新する反響消去方法を構成することもできる。これにより、収音信号に反響以外の信号が含まれる状況でも適応フィルタによる反響消去と反響経路推定が安定になる。
【００１７】
また、Ｍチャネル再生信号を短時間区間ごとに周波数領域に変換し、周波数領域の適応フィルタ係数に乗算し、時間領域に変換して反響信号を予測し、収音信号から予測した反響信号を差し引いて得られた残差信号を短時間区間ごとに周波数領域に変換し、再生信号と対象信号の短時間スペクトルから、周波数帯域ごとに対象信号に占める反響成分の比率を求める。周波数領域で周波数成分ごとに残差信号と再生信号を乗算して求めた修正ベクトルを、対象信号に占める反響成分の比率、および入力信号と修正用信号の情報に基づいて周波数帯域ごとに補正して、適応フィルタ係数を更新する反響消去方法を構成した。適応フィルタ係数を周波数領域で取り扱うことにより、収音信号に反響以外の信号が含まれる状況での反響消去と反響経路推定を安定にしつつ、トータルの演算量を大幅に削減することができる。
【００１８】
また、Ｍチャネル受話信号を処理して、チャネル間相関がほぼ無相関とみなせるＭチャネル付加信号を生成し受話信号に加算して再生信号とし、短時間区間ごとに周波数領域に変換して周波数領域の適応フィルタ係数に乗算したのち時間領域に変換して反響信号を予測し、収音信号と予測した反響信号との残差信号を短時間区間ごとに周波数領域に変換し、再生信号と対象信号の短時間スペクトルから周波数帯域ごとに対象信号に占める反響成分の比率を求め、Ｍチャネル付加信号にa倍（aは0〜1の値）したＭチャネル受話信号を加算して修正用信号を生成し、修正用信号を短時間区間ごとに周波数領域に変換し、周波数領域で周波数成分ごとに残差信号と修正用信号を乗算して求めた修正ベクトルを対象信号に占める反響成分の比率および入力信号と修正用信号の情報に基づいて周波数帯域ごとに補正し、補正された修正ベクトルで適応フィルタ係数を更新する反響消去方法を構成した。これにより、収音信号に反響以外の信号が含まれる状況での反響消去および反響経路推定を安定にし、トータルの演算量を大幅に削減しつつ、反響経路推定を高速化できる。
【００１９】
更に、第ｍチャネル再生信号より第１〜第m-1チャネル再生信号との相関成成分を除去した信号の短時間スペクトルを求め、対象信号より、第１〜第m-1チャネル再生信号との相関成分を除去した信号の短時間スペクトルを求め、これらの短時間スペクトルから求めたコヒーレンスをもちいて、対象信号に占める反響成分の比率を求める反響消去方法を構成する。このような推定法により、再生信号、収音信号に含まれる反響以外が時々刻々と変動する状況でも残差信号もしくは収音信号に占める反響成分の比率を確実に推定することが可能となる。
【００２０】
【発明の実施の形態】
残差信号もしくは収音信号を対象信号とするときに対象信号に占める反響成分の比率を推定する目的で、コヒーレンス即ち、クロススペクトルをパワースペクトルで正規化して得られる複素関数の振幅２乗値を使用することができる。以下、残差信号を対象信号とする場合について説明する。
入力チャネル数がＭ＝１のモノラルの反響消去装置について、適応フィルタへの入力信号ｘ(ｋ)と残差信号ｅ(ｋ)のパワースペクトルをＳ_xx(ｆ)、Ｓ_ee(ｆ)、クロススペクトルをＳ_xe(ｆ)とするとき、コヒーレンスは
【００２１】
【数１０】

【００２２】
で計算される。
通常、入力信号ｘ(ｋ)と妨害信号ｙ_I(ｋ)、および残留反響信号ｅ_E(ｋ）と妨害信号ｙ_I(ｋ)は無相関と見なせるので、
【００２３】
【数１１】

【００２４】
を満たしている。この式によれば、コヒーレンスγ²(ｆ)とは、入力信号スペクトルと相関のある成分が残差信号ｅ(ｋ)のパワースペクトルに占める割合である。即ち、入力信号ｘ(ｋ)と残差信号ｅ(ｋ)のコヒーレンスは、残差信号ｅ(ｋ)に占める反響成分即ち残留反響信号ｅ_E(ｋ）のパワー比を表わしている。なお、コヒーレンスについては、例えば日野著、朝倉書店発行『スペクトル解析』に詳説されており、コヒーレンスを使用する解析については、例えば森下、小畑著、計測自動制御学会発行『信号処理』に詳説されている。
【００２５】
各パワースペクトルとクロススペクトルは、入力信号ｘ(ｋ)、残留反響信号ｅ_E(ｋ）を２Ｌ点離散フーリエ変換して求めた短時間スペクトルＸ(ｆ)、Ｅ(ｆ)（ｆ＝１、・・・・・・、２Ｌ）および時間平均ε［・］から、
【００２６】
【数１２】

【００２７】
の様に求められる。残差信号ｅ(ｋ)から残留反響信号ｅ_E(ｋ）と妨害信号ｙ_I(ｋ)を分離することはできないが、このコヒーレンス解析を行うことにより、最適なステップサイズμを求めることが可能になる。
【００２８】
【数１３】

【００２９】
【数１４】

【００３０】
Ｘ_m・(m-1)!(ｆ)：信号ｘ_m(ｋ)から信号ｘ₁(ｋ)、・・・・・・、ｘ_(m-1)(ｋ)との相関成分を除去した信号の短時間スペクトル、および
Ｅ_・(m-1)!(ｆ)：信号ｅ(ｋ)から信号ｘ₁(ｋ)、・・・・・・、ｘ_(m-1)(ｋ)との相関成分を除去した信号の短時間スペクトルのコヒーレンスになっている。チャネル数Ｍ＝２のときと同様に、相関成分を除去した後の短時間スペクトルＸ_m・(m-1)!(ｆ)は
【００３１】
【数１５】

【００３２】
以上の相関成分除去演算は図９の第１の相関除去部４３２１_mと第２の相関除去部４３２２_mにより実行する。第１の相関除去部４３２１_mに入力信号の短時間スペクトルＸ_m(ｊ、ｆ)と相関が除去された信号のスペクトルを入力して相関成分を除去した後の短時間スペクトルＸ_m・(m-1)!(ｊ、ｆ)を得る。第２の相関除去部４３２２_mに反響信号Ｅ(ｊ、ｆ)と相関が除去された信号のスペクトルＸ_m・(m-1)!(ｊ、ｆ)を入力して相関成分を除去した後の短時間スペクトルＥ_・(m-1)!(ｆ)を得る。
【００３３】
残留反響信号ｅ_E(ｋ)の予測値と入力信号ｘ(ｋ)のコヒーレンスγ^{^2}(ｆ)をステップサイズ制御に使用することも考えられる。残留反響信号の予測法として、例えば反響信号ｙ_E(ｋ)の各周波数成分をｔ(ｆ)倍する方法が考えられる。一例として、ｔ(ｆ)＝０．１に設定する場合、残留反響の信号パワーを反響信号パワーの−２０ｄＢであるものと想定して、残差信号ｅ(ｋ)に占める残留反響信号ｅ_E(ｋ）の比率を求めることに対応する。
上述したＭチャネル入力信号と残差信号ｅ(ｋ)のコヒーレンス算出と同様にしてＭチャネル入力信号ｘ₁(ｋ)・・・・・ｘ_M(ｋ)と収音信号ｙ(ｋ)のコヒーレンスγ'(ｆ)が求められているとき、残差信号に占める反響信号成分の比率γ^{^2}(ｆ)は
【００３４】
【数１６】

【００３５】
の様に、γ'(ｆ)から算出することができる。
適応フィルタの更新方法としては、上述したＮＬＭＳ法の如く毎サンプルの処理を時間領域で行う仕方の他に、一定区間毎に処理を行うブロック処理方式がある。これは、文献 E.R.Ferrara,“Fast Implementation of LMS adaptive filters,”IEEE Trans.Acoust.,Speech Signal Processing,vol.ASSP-28,pp.474-475(1980)ですでに提案されている通り、ＦＦＴを利用して周波数領域の適応フィルタ係数を扱うことにより、トータルの計算量を大幅に削減することができる。この適応アルゴリズムでは、周波数領域の適応フィルタ係数ベクトルＨ ^{^}（ｊ）
が
【００３６】
【数１７】

【００３７】
以下、この発明の実施の形態を実施例を参照して説明する。
実施例１
実施例１においては、文献D.Mansour and A.H.Gray,“Unconstrained Frequency-Domain Adaptive Filter,”IEEE Trans.Acoust.,Speech,Signal Processing,vol.ASSP・30,No.5,pp.726-734(1982)で提案されたアルゴリズムをマルチチャネルに拡張し、コヒーレンスに基づくステップサイズ制御方法を適用した場合を説明する。この周波数領域適応アルゴリズムは、白色化処理により受話信号の如きスペクトルに偏りのある信号が入力されても適応フィルタの収束特性の劣化が防止される。
【００３８】
以下の説明は、残差信号を対象信号とし、適応フィルタ長をＬとし、Overlap-save方式を使用してＬ／Ｄサンプル毎に長さ２Ｌの信号ベクトルを処理する場合を取り扱っている。
（ステップ１）
入力信号ｘ_m(ｋ)（ｍ＝１、…、Ｍ）を、Ｌ／Ｄサンプル毎に長さ２Ｌの信号ベクトルにブロック化して、ＦＦＴにより周波数領域に変換する。
Ｘ _m（ｊ）＝ｄｉａｇ（ＦＦＴ（［ｘ_m（ｊＬ／Ｄ−２Ｌ＋１）、・・・・・、ｘ_m（ｊＬ／Ｄ）］^T）、ここで、（ｍ＝１、・・・・・、Ｍ）
ただし、関数ＦＦＴ（ｘ）はベクトルｘをＦＦＴ変換する関数であり、ベクトルｘは関数ｄｉａｇ（ｘ）によりその要素を対角成分とする行列に変換される。即ち、ｘ＝［ｘ（１）・・・・・・ｘ（２Ｌ）］^Tのとき
【００３９】
【数１８】

【００４０】
（ステップ２）
周波数領域でＸ _m（ｊ）と第ｍチャネルの周波数領域での適応フィルタ係数ベクトルＨ ^{^} _m（ｊ）を掛けることで、チャンネル毎に入力信号ベクトルをフィルタ処理する。計算結果を逆ＦＦＴ処理して、時間領域の信号ベクトルｙ ^{^} _m（ｊ）を得る。
ｙ ^{^} _m（ｊ）＝［０ _L Ｉ _L］ＩＦＦＴ（Ｘ _m（ｊ）Ｈ ^{^} _m（ｊ））ただし、Ｈ ^{^} _m（ｊ）は要素数２Ｌの複素数ベクトルであり、逆ＦＦＴ変換して前半Ｌ個を取り出すと、適応フィルタのインパルス応答になる。０ _LはＬ×Ｌの零行列、Ｉ _LはＬ×Ｌの単位行列である。
【００４１】
（ステップ３）
信号ベクトルｙ ^{^} _m（ｊ）を加算して、予測反響信号のベクトルｙ ^{^}（ｊ）を得る。
ｙ ^{^}（ｊ）＝Σ^M _m=1 ｙ ^{^} _m（ｊ）
（ステップ４）
時間領域にて収音信号ベクトルｙ（ｊ）と予測反響ベクトルｙ ^{^}（ｊ）から残差信号ベクトルＥ（ｊ）を求め、ＦＦＴにより周波数領域に変換する。
【００４２】
【数１９】

（ステップ５）
【００４３】
【数２０】

【００４４】
【数２１】

【００４５】
（ステップ６）
残差信号と入力信号を周波数領域で処理し、修正ベクトルｄＨ ^{^} _m（ｊ）を求める。
【００４６】
【数２２】

【００４７】
ただし、行列Ｘ ^* _m(ｋ)の各成分は行列Ｘ _m(ｋ)各成分の複素共役である。
（ステップ７）
行列Ｐ(ｋ)を、
【００４８】
【数２３】

【００４９】
により求めた入力信号のパワースペクトル総和である。ただし、Ｘ^*は複素数Ｘの複素共役であり、βは短時間平均をとるための平滑化定数で０＜β＜１の値をとる。
（ステップ８）
ステップ５において求められた残差信号に占める反響成分の比率γ²(ｆ)から
【００５０】
【数２４】

【００５１】
によりコヒーレンスγ²(ｆ)を対角要素とする行列Ｍ（ｊ）を求める。ただし、μ₀は０〜１の間の固定値に設定される。適応フィルタを次式で更新する。
Ｈ ^{^} _m（ｊ＋１）＝Ｈ ^{^} _m（ｊ）＋Ｍ（ｊ）Ｐ（ｊ）ｄＨ ^{^} _m（ｊ）
行列Ｍ（ｊ）を掛けることにより周波数帯域毎に残差信号に占める反響成分の比率γ²(ｆ)に基づいてステップサイズが最適に制御される。行列Ｐ（ｊ）を修正ベクトルｄＨ ^{^} _m（ｊ）に掛けることは入力信号の白色化処理に対応し、入力信号が音声の様に有色性信号のとき適応フィルタの収束特性を向上させることが知られている。
【００５２】
実施例１の方法は、図３の構成の反響消去部４により実施される。
入力信号ｘ₁(ｋ)_......ｘ_M(ｋ)はＴＦ変換部４１１₁〜４１１_Mにてステップ１の如くにブロック化され、周波数領域に変換される。そして、フィルタ処理部４１２₁〜４１２_MとＦＴ変換部４１３₁〜４１３_M、ベクトル加算部４１４にてステップ２、３の様に時間領域の予測反響信号のベクトルｙ ^{^}（ｊ）が算出される。収音信号ｙ(ｋ)は、入力信号ｘ(ｋ)と時間ズレが生じない様にブロック化部４５でブロック化され、そして、信号ベクトル減算部４２でステップ４の様に予測反響の信号ベクトルｙ ^{^}（ｊ）が差し引かれ、ＴＦ変換部４３１にて周波数領域の残差信号ベクトルＥ（ｊ）が求められる。
【００５３】
コヒーレンス推定部４３２は、周波数領域の残差信号ベクトルＥ（ｊ）と周波数領域の入力信号ベクトルＸ _m（ｊ）から、ステップ５に従ってコヒーレンスを算出する。コヒーレンス推定部４３２の具体的構成は図８および図９に示されている。各周波数帯域に対応する第１および第２の相関除去部４３２１_m、４３２２_mに残差信号ベクトルＥ（ｊ）と周波数領域の入力信号ベクトルＸ _m（ｊ）を入力し、相関の除去された短時間スペクトルからコヒーレンス算出部４３２３₁〜４３２３_Mによりコヒーレンスを算出し、反響成分比率算出部４３２４にて残差信号に占める反響成分の比率を求める。
【００５４】
フィルタ更新部４３３₁〜４３３_Mは周波数領域の入力信号ベクトルＸ _m（ｊ）と周波数領域の残差信号ベクトルＥ（ｊ）とからステップ６に従って周波数領域で修正ベクトルを求めると同時にステップ７に従って行列Ｐ（ｊ）を計算する。そして、ステップ８に従って修正ベクトルを補正して適応フィルタ係数を更新する。更新されたフィルタ係数は、フィルタ処理部４１２₁〜４１２_Mに渡される。
実施例２
実施例２は、コヒーレンスに基づくステップサイズ制御方法を、文献江村、羽田、“付加信号強調型の周波数領域ステレオ適応アルゴリズム”、日本音響学会２００１年秋季研究発表会、ｐｐ．５３７−５３８（２００１）で提案されているマルチチャネル適応アルゴリズムに適用し残差信号を対象信号とした場合について説明する。
【００５５】
この適応アルゴリズムは、入力信号ｘ_m(ｋ)の代わりに修正用信号ｚ_m(ｋ)から適応フィルタの修正ベクトルを求める。そのために、図４のＭチャネル反響消去部７にはＭチャネル受話信号ｕ_m(ｋ)の他に、相関変動処理部６₁〜６_Mにより生成されたＭチャネル付加信号ｇ_m（ｕ_m(ｋ)）も入力される。なお、相関変動処理部６₁〜６_Mは、マルチチャネル反響消去装置の反響経路推定性能向上に一般的に使われる装置である。
図４のＭチャネル反響消去部７は、以下のステップに従って適応フィルタの係数を更新する。
【００５６】
（ステップ１）
各チャネルの受話信号ｕ_m(ｋ)と受話信号ｕ_m(ｋ)を相関変動処理部６_Mに入力して得られた付加信号ｇ_m（ｕ_m(ｋ)）とから再生信号ｘ_m(ｋ)と修正用信号ｚ_m(ｋ)を
ｘ_m(ｋ)＝ｕ_m(ｋ)＋ｇ_m（ｕ_m(ｋ))
ｚ_m(ｋ)＝ａｕ_m(ｋ)＋ｇ_m（ｕ_m(ｋ))
（ただし、ｍ＝１、…、Ｍ、０＜ａ≦１）
により生成する。そして、Ｌ／Ｄサンプル毎に長さ２Ｌの信号ベクトルにブロック化し、ＦＦＴにより、
Ｘ _m(ｊ)＝ｄｉａｇ（ＦＦＴ（［ｘ_m（ｊＬ／Ｄ−２Ｌ＋１）、…、ｘ_m（ｊＬ／Ｄ）］^T））
Ｚ _m(ｊ)＝ｄｉａｇ（ＦＦＴ（［ｚ_m（ｊＬ／Ｄ−２Ｌ＋１）、…、ｚ_m（ｊＬ／Ｄ）］^T））
（ただし、ｍ＝１、…、Ｍ）
の様に周波数領域に変換する。
【００５７】
（ステップ２）
周波数領域でＸ _m（ｊ）とＨ ^{^} _m（ｊ）を掛けることで、チャネル毎に入力信号ベクトルをフィルタ処理する。計算結果を逆ＦＦＴ処理し、時間領域の信号ベクトルｙ ^{^} _m（ｊ）（ただし、ｍ＝１、…、Ｍ）を得る。
ｙ ^{^} _m（ｊ）＝［０ _L Ｉ _L］ＩＦＦＴ（ｘ _m（ｊ）Ｈ ^{^} _m（ｊ））
ただし、０ _LはＬ×Ｌの零行列、Ｉ _LはＬ×Ｌの単位行列である。
（ステップ３）
信号ベクトルｙ ^{^} _m（ｊ）（ｍ＝１、…、Ｍ）を加算して、予測反響信号のベクトルｙ ^{^}（ｊ）を得る。
【００５８】
ｙ ^{^}（ｊ）＝Σ^M _m=1 ｙ ^{^} _m（ｊ）
（ステップ４）
時間領域にて収音信号ベクトルｙ（ｊ）と予測反響信号のベクトルｙ ^{^}（ｊ）から残差信号ベクトルを求め、ＦＦＴにより周波数領域に変換する。
【００５９】
【数２５】

（ステップ５）
【００６０】
【数２６】

【００６１】
【数２７】

【００６２】
（ステップ６）
残差信号と修正用信号を周波数領域で処理し、修正ベクトルｄＨ ^{^} _m（ｊ）を求める。
【００６３】
【数２８】

【００６４】
により計算する。ただし、関数Ｘ_m（ｊ、ｆ）、Ｚ_m（ｊ、ｆ）は行列Ｘ _m（ｊ）および行列Ｚ _m（ｊ）の（ｆ、ｆ）番目の要素である。δは分母が０になることを防止するための微小な正定数である。行列Ｐ（ｊ）中のｐ（ｊ、ｆ）は、各チャネルの入力信号と修正用信号のクロススペクトルの総和になっている。
（ステップ８）
ステップ５において求められたコヒーレンスγ²(ｆ)から
【００６５】
【数２９】

【００６６】
によりコヒーレンスγ²(ｆ)を対角要素とする行列Ｍ（ｊ）を求める。ただし、μ₀は０〜１の間の固定値に設定される。適応フィルタを次式で更新する。
Ｈ ^{^} _m（ｊ＋１）＝Ｈ ^{^} _m（ｊ）＋Ｍ（ｊ）Ｐ（ｊ）ｄＨ ^{^} _m（ｊ）
行列Ｍ（ｊ）を掛けることにより周波数帯域毎に対象信号に占める反響成分の比率に基づいてステップサイズが最適に制御される。行列Ｐ（ｊ）を修正ベクトルｄＨ ^{^} _m（ｊ）に掛けることは入力信号の白色化処理に対応し、入力信号が音声の様に有色性信号のとき適応フィルタの収束特性を向上させることが知られている。
【００６７】
Ｍチャネル反響消去部７の内部は、図５の様な構成をとる。再生信号ｘ_m(ｋ)および修正用信号ｚ_m(ｋ)をＴＦ変換するＴＦ変換部７０２_m、７０５_mは、図３のＴＦ変換部４１１_mに対応している。
加算器７０１_mにより受話信号ｕ_m(ｋ)に付加信号ｇ_m(ｕ_m(ｋ))が加算されて再生信号ｘ_m(ｋ)が生成され、ＴＦ変換部７０２_mによって行列Ｘ _m（ｊ）に変換
される。また、受話信号をｕ_m(ｋ) は減衰器７０３_mによりａ倍され（ただし、ａは０から１の値）、加算器７０４_mにより付加信号ｇ_m(ｕ_m(ｋ))が加算されて修正用信号ｚ_m(ｋ)が生成される。そして、ＴＦ変換部７０５_mにより行列Ｚ _m(ｊ)に変換される。
【００６８】
行列Ｘ _m（ｊ）はフィルタ処理部７１２_mに渡され、行列Ｚ _m(ｊ)はフィルタ更新部７３３_mに渡される。フィルタ処理部７１２_m 、ＦＴ変換部７１３_m、ベクトル加算部７１４は、ステップ２およびステップ３の処理を経て予測反響信号が生成される。マイクロホン３から得られる収音信号ｙ(ｋ)は、ブロック化部７５でブロック化され、ステップ４に従ってベクトル減算部７２にて予測反響信号ベクトルとの差がとられ、ＴＦ変換部７３１で周波数領域へ変換される。
コヒーレンス推定部７３２は、周波数領域の残差信号ベクトルＥ（ｊ）と入力信号ベクトルＸ _m（ｊ）からステップ５に従ってコヒーレンスを推定する。
フィルタ更新部７３３_m（ｍ＝１、…、Ｍ）は、ステップ６、ステップ７、ステップ８に従って周波数領域でＨ ^{^} _m（ｊ）を更新する。
【００６９】
図７を参照して実施例２の数値シミュレーション結果を説明する。
この数値シミュレーションは、入力チャネル数をＭ＝２とし、サンプリング周波数を８ｋＨｚに設定し、反響経路として残響時間２００ｍｓの部屋で実測した室内伝達関数を７００タップに打ち切って反響を生成した。また、妨害信号としてはレベル変動するホス雑音と送話信号が重畳した信号を使用した。反響信号、妨害信号、収音信号＝反響信号＋妨害信号および本手法適用後の残差信号ｅ(ｋ)は、それぞれ図６の様になっている。この信号を使用し、ステップサイズ制御を行わない従来方法と提案するステップサイズ制御方法を比較した。
【００７０】
チャネル当りの適応フィルタタップ数をＬ＝５１２とし、適応フィルタが１２８サンプル即ち１６ｍｓ毎に更新される様にＤ＝４に設定した。また、μ₀＝０．２に設定した。適応フィルタの係数誤差の変化を図７に示す。このグラフによれば、妨害信号が若干大きくなっている区間（ｔ＝４〜６ｓ）において、従来方法（点線）では推定による係数誤差が悪化している。しかし、提案方法（実線）は、この区間の推定は安定である。また、妨害信号が急激に大きくなる区間（ｔ＝６ｓ）において、従来方法は係数誤差が０ｄＢから８ｄＢに拡大して反響経路推定が不安定になっている。一方、提案方法は、この区間の係数誤差の悪化は−６ｄＢから−５ｄＢの１ｄＢにとどまっている。
【００７１】
【発明の効果】
以上の通りであって、この発明によれば、周波数領域の適応フィルタ係数と直前フレームのフィルタ係数の間の修正量として、従来の修正ベクトルと入力信号パワーの逆数の積を、残差信号もしくは収音信号と入力信号との間のコヒーレンスを用いて補正することにより、送話、周囲騒音その他の反響以外の妨害信号の存在する状況下においても適応フィルタの反響経路推定を頑健にすることができる。
【図面の簡単な説明】
【図１】多チャネル音響通信装置全体の概略を説明する図。
【図２】従来例を説明する図。
【図３】実施例を説明する図。
【図４】実施例を含む多チャネル音響通信装置全体の概略を説明する図。
【図５】他の実施例を説明する図。
【図６】反響信号、妨害信号、収音信号を示す図。
【図７】実施例の数値シミュレーション結果を示す図。
【図８】コヒーレンスおよび反響成分比率の算出を説明する図。
【図９】相関成分除去演算を説明する図。
[0001]
[Technical field of the invention]
The present invention relates to an echo cancellation method, an apparatus, a program, and a recording medium therefor, and in particular to an echo cancellation method, apparatus, program, and recording medium that cancel the echo which, in an acoustic communication device such as a hands-free loudspeaker telephone, disturbs conversation and sometimes causes howling.
[0002]
[Prior art]
In a hands-free loudspeaker telephone, the problem is the echo produced when the received speech is emitted from the loudspeaker and picked up again by the microphone. When loudspeaker telephones are interconnected over a communication line and the closed-loop gain exceeds 1, howling occurs and conversation becomes impossible. Even when the loop gain is less than 1, the echo impairs conversation and is unpleasant. To realize a more natural conversation environment, the echo caused by acoustic coupling from the loudspeaker to the microphone must be cancelled.
[0003]
Referring to FIG. 1, the echo canceller is connected to an M-channel reproduction system and a one-channel sound pickup system and cancels the echo. The received signal input at receiving terminal 1_m (m = 1 to M) is reproduced as an acoustic signal by loudspeaker 2_m (m = 1 to M) and reaches microphone 3 through echo path h_m (m = 1 to M).
The echo is cancelled by echo cancelling unit 4, which is connected between receiving terminals 1_m and transmitting terminal 5. Echo cancelling unit 4 consists of an M-input, one-output adaptive filter. When there are N microphones 3, N of the M-input, one-output adaptive filters shown in FIG. 1 are arranged in parallel.
[0004]
The configuration of echo cancelling unit 4 is described with reference to FIG. 2. Each received signal is input to predicted echo generation unit 41, which generates a predicted echo signal. Subtractor 42 takes the difference between this predicted echo signal and the microphone signal input from microphone 3, and the resulting residual signal e(k) is fed back to echo path estimation unit 43. Let x_m(k) be the input signal to predicted echo generation unit 41, y(k) the signal picked up by microphone 3, h_m the impulse response of the echo path from loudspeaker 2_m to microphone 3, and L its length. When the number of receive channels is M = 1, the input signal x_m(k) and the microphone signal y(k) are related by
[0005]
[Equation 6]

[0006]
By vectorizing in this way, the relationship between the input signal x(k) and the microphone signal y(k) can be written in the same form as in the case of M = 1 receive channel.
Inside echo cancelling unit 4, predicted echo generation unit 41 generates the predicted echo signal. The coefficients of the adaptive filter that generates it are updated successively, based on the difference e(k) from the actual microphone signal y(k) and on the past input signals x_m(k), so that the residual signal e(k), the difference between the microphone signal y(k) and the predicted echo signal, becomes small.
The case where the adaptive filter coefficients are updated by the NLMS method is described here. The residual signal e(k), obtained by subtracting the echo predicted by the adaptive filter from the actual microphone signal y(k), is
[0007]
[Equation 7]

[0008]
and the adaptive filter is updated with it. Here μ is the step size, set to a fixed value between 0 and 1 to keep the estimation stable.
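The NLMS update described above can be illustrated with a short sketch. Equation 7 is not reproduced in this text, so the code assumes the textbook NLMS form (residual e(k) = y(k) − ĥ^T x(k), correction μ e(k) x(k) / (x^T x)); the function names, the small regularization constant `eps`, and the test signal are illustrative assumptions and do not come from the patent.

```python
import random

def nlms_step(h_hat, x_vec, y, mu=0.5, eps=1e-8):
    """One NLMS update (assumed textbook form, not copied from the patent)."""
    y_hat = sum(h * x for h, x in zip(h_hat, x_vec))   # predicted echo
    e = y - y_hat                                       # residual signal e(k)
    norm = sum(x * x for x in x_vec) + eps              # input power, regularized
    h_new = [h + mu * e * x / norm for h, x in zip(h_hat, x_vec)]
    return h_new, e

def identify_echo_path(h_true, n_samples=2000, mu=0.5, seed=0):
    """Estimate a known echo path from white noise, with no interference signal."""
    rng = random.Random(seed)
    taps = len(h_true)
    h_hat = [0.0] * taps
    x_vec = [0.0] * taps                                # x_vec[0] is the newest sample
    for _ in range(n_samples):
        x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]
        y = sum(h * x for h, x in zip(h_true, x_vec))   # microphone signal = echo only
        h_hat, _ = nlms_step(h_hat, x_vec, y, mu)
    return h_hat
```

In this echo-only setting the estimate converges to `h_true`. This is the idealized condition that the following paragraphs relax by adding an interference signal.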
This update method assumes that the microphone signal y(k) contains only the echo. In actual use of a loudspeaker telephone, however, y(k) naturally also contains signals other than the echo, such as near-end speech and noise. Let y_E(k) be the echo signal and y_I(k) the interference signal, that is, everything other than the echo, such as near-end speech and noise, so that the microphone signal is
y(k) = y_E(k) + y_I(k)
In this case the NLMS update corrects the adaptive filter in the direction of
[0009]
[Equation 8]

[0010]
where ε[·] denotes averaging. The second term is the deviation from the ideal correction direction, which shows that near-end speech and noise act as interference. When the microphone signal y(k) contains the interference signal y_I(k), the adaptive filter coefficients are updated erroneously by this amount, so noise that depends on the step size μ is generated and the adaptive filter sometimes diverges. Avoiding divergence requires a sufficiently small step size μ. In practice, however, one either chooses an unnecessarily small μ, or chooses a μ just small enough not to diverge and accepts, with some probability, inaccurate corrections caused by sounds other than the echo. Either choice reduces the convergence speed.
[0011]
A. Mader, H. Puder, and G. U. Schmidt, “Step-size control for acoustic echo cancellation filters - an overview,” Signal Processing, 80, pp. 1697-1719 (2000), describes a method for deriving the optimum step size μ in this situation. According to that paper, the optimum step size is obtained from the residual echo signal, the difference between the echo and the predicted echo, as
[0012]
[Equation 9]

[0013]
According to this equation, the larger the interference power ε[y_I²(k)], the smaller the step size μ is set, which reduces the influence of the interference signal y_I(k) on the adaptive filter estimation.
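The relationship between interference power and step size can be made concrete with a small sketch. Equation 9 is not reproduced in this text; the code assumes the form commonly quoted from the cited overview, μ_opt = ε[e_E²(k)] / ε[e²(k)], together with the assumption that residual echo and interference are uncorrelated, so that ε[e²] = ε[e_E²] + ε[y_I²]. The function name and arguments are illustrative.

```python
def optimal_step_size(residual_echo_power, interference_power):
    """Assumed form: mu_opt = E[e_E^2] / (E[e_E^2] + E[y_I^2])."""
    total = residual_echo_power + interference_power
    if total <= 0.0:
        return 0.0          # nothing to adapt on
    return residual_echo_power / total
```

With no interference the function returns 1. With interference power equal to the residual echo power it returns 0.5, and the value keeps shrinking as the interference grows, which is the behaviour the paragraph above describes.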
[0014]
[Problems to be solved by the invention]
In a real environment, however, it has not been possible to compute this optimum step size μ directly and update the adaptive filter with it. The reason is that the residual echo signal e_E(k) alone cannot be extracted from the residual signal, in which the interference signal is superimposed on e_E(k). In addition, because an echo canceller is by nature used to cancel the echo while estimating the unknown echo path h from loudspeaker 2_m to microphone 3, the residual echo signal cannot be obtained from the relation
e_E(k) = (h(k) − ĥ(k))^T x(k)
either.
[0015]
If the power ε[y_I²(k)] of the interference signal y_I(k) were constant and its level known in advance, the optimum step size μ could be calculated. Normally, however, the noise level is not necessarily constant, and the level of the near-end speech changes from moment to moment.
Under these conditions, updating the adaptive filter with the optimum step size μ requires estimating the ratio of the echo component in the residual signal.
An object of the present invention is to provide an echo cancellation method, apparatus, program, and recording medium therefor that solve the above problem in multichannel acoustic communication by obtaining, from the residual signal or the microphone signal, the ratio of the echo component in the residual signal and using this information to update the adaptive filter coefficients.
[0016]
[Means for Solving the Problems]
According to the present invention, an echo cancellation method is configured for a multichannel acoustic communication system in which M loudspeakers (M is an integer of 2 or more) and N microphones (N is an integer of 1 or more) are placed in a common sound field, M-channel signals are reproduced from the loudspeakers, the M-channel reproduction signals are input to the M-input, one-output adaptive filter corresponding to each microphone to predict the echo signal, and the adaptive filter coefficients are updated so as to reduce the residual signal obtained by subtracting the adaptive filter output from the microphone signal. In this method the adaptive filter coefficients are updated using the ratio of the echo component in the residual signal. An echo cancellation method can also be configured that updates the adaptive filter coefficients using the ratio of the echo component in the microphone signal instead of the residual signal. As a result, echo cancellation and echo path estimation by the adaptive filter remain stable even when the microphone signal contains signals other than the echo.
[0017]
Further, the M-channel reproduction signals are transformed into the frequency domain for each short-time block, multiplied by the frequency-domain adaptive filter coefficients, and transformed back into the time domain to predict the echo signal. The residual signal obtained by subtracting the predicted echo signal from the microphone signal is transformed into the frequency domain for each short-time block, and the ratio of the echo component in the target signal is obtained for each frequency band from the short-time spectra of the reproduction signals and the target signal. An echo cancellation method is configured in which the correction vector, obtained by multiplying the residual signal by the reproduction signal for each frequency component in the frequency domain, is corrected for each frequency band based on the ratio of the echo component in the target signal and on information about the input signal and the correction signal, and the adaptive filter coefficients are then updated. Handling the adaptive filter coefficients in the frequency domain stabilizes echo cancellation and echo path estimation when the microphone signal contains signals other than the echo, while greatly reducing the total amount of computation.
[0018]
Further, an echo cancellation method is configured as follows. The M-channel received signals are processed to generate M-channel additional signals whose inter-channel correlation can be regarded as almost zero, and these are added to the received signals to form the reproduction signals. The reproduction signals are transformed into the frequency domain for each short-time block, multiplied by the frequency-domain adaptive filter coefficients, and transformed back into the time domain to predict the echo signal. The residual signal between the microphone signal and the predicted echo signal is transformed into the frequency domain for each short-time block, and the ratio of the echo component in the target signal is obtained for each frequency band from the short-time spectra of the reproduction signals and the target signal. Correction signals are generated by adding the M-channel received signals multiplied by a (a is a value from 0 to 1) to the M-channel additional signals, and are transformed into the frequency domain for each short-time block. The correction vector, obtained by multiplying the residual signal by the correction signal for each frequency component in the frequency domain, is corrected for each frequency band based on the ratio of the echo component in the target signal and on information about the input signal and the correction signal, and the adaptive filter coefficients are updated with the corrected correction vector. This stabilizes echo cancellation and echo path estimation when the microphone signal contains signals other than the echo, and speeds up echo path estimation while greatly reducing the total amount of computation.
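The construction of the reproduction signal and the correction signal can be sketched as follows. The relations x_m = u_m + g_m(u_m) and z_m = a·u_m + g_m(u_m) follow the description. The patent does not specify the additional-signal generator g_m in this passage, so `additive_signal` below is a purely hypothetical stand-in (a scaled half-wave rectifier), and `alpha` and the function names are likewise assumptions.

```python
def additive_signal(u, alpha=0.5, positive=True):
    """Hypothetical stand-in for g_m: scaled half-wave rectification of u."""
    if positive:
        return [alpha * (s + abs(s)) / 2.0 for s in u]
    return [alpha * (s - abs(s)) / 2.0 for s in u]

def make_signals(u, a=0.5, positive=True):
    """Return (x, z) for one channel given the received signal u."""
    g = additive_signal(u, positive=positive)
    x = [s + gs for s, gs in zip(u, g)]        # reproduction signal x_m = u_m + g_m(u_m)
    z = [a * s + gs for s, gs in zip(u, g)]    # correction signal  z_m = a*u_m + g_m(u_m)
    return x, z
```

Because a is at most 1, the correction signal weights the additional signal more heavily, relative to the received signal, than the reproduction signal does.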
[0019]
Further, an echo cancellation method is configured in which the short-time spectrum is obtained of the signal that results from removing, from the m-th channel reproduction signal, the components correlated with the first to (m−1)-th channel reproduction signals; the short-time spectrum is obtained of the signal that results from removing, from the target signal, the components correlated with the first to (m−1)-th channel reproduction signals; and the ratio of the echo component in the target signal is obtained using the coherence computed from these short-time spectra. With this estimation method, the ratio of the echo component in the residual signal or the microphone signal can be estimated reliably even when the reproduction signals and the non-echo components of the microphone signal change from moment to moment.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
When the residual signal or the microphone signal is taken as the target signal, the ratio of the echo component in the target signal can be estimated using the coherence, that is, the squared magnitude of the complex function obtained by normalizing the cross spectrum by the power spectra. The case where the residual signal is the target signal is described below.
For a monaural echo canceller with M = 1 input channel, let S_xx(f) and S_ee(f) be the power spectra of the adaptive filter input signal x(k) and the residual signal e(k), and S_xe(f) their cross spectrum. The coherence is then calculated as
[0021]
[Equation 10]

[0022]
Since the input signal x(k) and the interference signal y_I(k), and likewise the residual echo signal e_E(k) and the interference signal y_I(k), can normally be regarded as uncorrelated, the coherence satisfies
[0023]
[Equation 11]

[0024]
According to this equation, the coherence γ²(f) is the fraction of the power spectrum of the residual signal e(k) that is accounted for by components correlated with the input signal spectrum. In other words, the coherence between the input signal x(k) and the residual signal e(k) represents the power ratio of the echo component, that is, the residual echo signal e_E(k), in the residual signal e(k). Coherence is explained in detail in, for example, Hino, "Spectral Analysis" (Asakura Shoten), and analysis using coherence in, for example, Morishita and Obata, "Signal Processing" (Society of Instrument and Control Engineers).
[0025]
The power spectra and the cross spectrum are obtained from the short-time spectra X(f) and E(f) (f = 1, ..., 2L), computed by a 2L-point discrete Fourier transform of the input signal x(k) and the residual echo signal e_E(k), and from the time average ε[·], as
[0026]
[Equation 12]

[0027]
Although the residual echo signal e_E(k) and the interference signal y_I(k) cannot be separated from the residual signal e(k), performing this coherence analysis makes it possible to obtain the optimum step size μ.
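A minimal sketch of this coherence analysis follows. It assumes the usual magnitude-squared coherence, γ²(f) = |S_xe(f)|² / (S_xx(f) S_ee(f)), with the spectra estimated by averaging over frames. Equations 10 to 12 are not reproduced in this text, so the exact averaging and normalization are assumptions, as are the helper names, the frame length, and the synthetic test signal. A plain DFT is used only to keep the example dependency-free.

```python
import cmath
import math
import random

def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); fine for short demo frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def coherence(x_frames, e_frames):
    """Magnitude-squared coherence per bin, spectra averaged over paired frames."""
    n = len(x_frames[0])
    s_xx, s_ee, s_xe = [0.0] * n, [0.0] * n, [0j] * n
    for xf, ef in zip(x_frames, e_frames):
        X, E = dft(xf), dft(ef)
        for f in range(n):
            s_xx[f] += abs(X[f]) ** 2
            s_ee[f] += abs(E[f]) ** 2
            s_xe[f] += X[f].conjugate() * E[f]
    return [abs(s_xe[f]) ** 2 / (s_xx[f] * s_ee[f] + 1e-12) for f in range(n)]

def mean_coherence(noise_gain, n_frames=200, n=16, seed=1):
    """Residual = 0.5 * input (stand-in residual echo) + noise_gain * interference."""
    rng = random.Random(seed)
    xs, es = [], []
    for _ in range(n_frames):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        e = [0.5 * xv + noise_gain * rng.gauss(0.0, 1.0) for xv in x]
        xs.append(x)
        es.append(e)
    g = coherence(xs, es)
    return sum(g) / len(g)
```

With `noise_gain = 0` the residual is pure echo and the coherence is essentially 1. With `noise_gain = 1` the echo-to-total power ratio is 0.25 / 1.25 = 0.2, and the estimated coherence settles near that value, although the residual echo itself is never separated from the interference.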
[0028]
[Equation 13]

[0029]
[Equation 14]

[0030]
This is the coherence between the following two short-time spectra:
X_{m·(m−1)!}(f): the short-time spectrum of the signal obtained by removing from x_m(k) the components correlated with x₁(k), ..., x_{m−1}(k), and
E_{·(m−1)!}(f): the short-time spectrum of the signal obtained by removing from e(k) the components correlated with x₁(k), ..., x_{m−1}(k).
As in the case of M = 2 channels, the short-time spectrum X_{m·(m−1)!}(f) after removal of the correlated components is
[0031]
[Equation 15]

[0032]
The correlation removal operations above are executed by the first correlation removal unit 4321_m and the second correlation removal unit 4322_m in FIG. 9. The short-time spectrum X_m(j, f) of the input signal and the spectra of the signals from which the correlation has already been removed are input to the first correlation removal unit 4321_m, which yields the short-time spectrum X_{m·(m−1)!}(j, f) after removal of the correlated components. The echo signal E(j, f) and the decorrelated spectrum X_{m·(m−1)!}(j, f) are input to the second correlation removal unit 4322_m, which yields the short-time spectrum E_{·(m−1)!}(f) after removal of the correlated components.
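The idea of removing correlated components can be sketched for a single reference channel. Equation 15 is not reproduced in this text, so the code uses the generic least-squares form found in conditioned (partial) spectral analysis: per frequency bin, subtract from the target the part that is linearly predictable from the reference. This is an assumption about the form, not the patent's exact expression, and the function name and data layout are illustrative.

```python
def remove_correlated(target_frames, ref_frames):
    """Per bin f, return target(j, f) - c(f) * ref(j, f), where
    c(f) = sum_j conj(ref) * target / sum_j |ref|^2 (assumed least-squares form).
    Each argument is a list over blocks j of lists of complex spectra."""
    n_bins = len(ref_frames[0])
    out = [list(t) for t in target_frames]
    for f in range(n_bins):
        s_rr = sum(abs(r[f]) ** 2 for r in ref_frames)
        s_rt = sum(r[f].conjugate() * t[f] for r, t in zip(ref_frames, target_frames))
        c = s_rt / s_rr if s_rr > 0.0 else 0.0
        for j, r in enumerate(ref_frames):
            out[j][f] = target_frames[j][f] - c * r[f]
    return out
```

For M = 2, applying this with channel 1 as the reference to both X₂ and E gives two conditioned spectra whose averaged cross spectrum with X₁ is zero. Their coherence then reflects only what channel 2 contributes beyond channel 1.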
[0033]
It is also conceivable to use, for step-size control, the coherence γ̂²(f) between a predicted value of the residual echo signal e_E(k) and the input signal x(k). One way to predict the residual echo signal is, for example, to multiply each frequency component of the echo signal y_E(k) by t(f). For example, setting t(f) = 0.1 corresponds to assuming that the residual echo power is −20 dB relative to the echo power and obtaining, on that assumption, the ratio of the residual echo signal e_E(k) in the residual signal e(k).
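The correspondence between t(f) = 0.1 and −20 dB is a one-line calculation: t(f) scales amplitude, so the power ratio is t² and the level is 10·log10(t²) = 20·log10(t). The helper name below is illustrative.

```python
import math

def amplitude_factor_to_db(t):
    """Level change in dB for an amplitude scale factor t (power ratio t**2)."""
    return 20.0 * math.log10(t)
```

For t = 0.1 this gives −20 dB, matching the figure quoted in the text.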
When the coherence γ'(f) between the M-channel input signals x₁(k), ..., x_M(k) and the microphone signal y(k) has been obtained in the same way as the coherence between the M-channel input signals and the residual signal e(k) described above, the ratio γ̂²(f) of the echo component in the residual signal can be calculated from γ'(f) as
[0034]
[Equation 16]

[0035]
Besides processing every sample in the time domain as in the NLMS method described above, the adaptive filter can also be updated by block processing, in which processing is performed once per fixed-length block. As already proposed in E. R. Ferrara, “Fast implementation of LMS adaptive filters,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 474-475 (1980), handling the adaptive filter coefficients in the frequency domain using the FFT greatly reduces the total amount of computation. In this adaptive algorithm, the frequency-domain adaptive filter coefficient vector Ĥ(j) is
[0036]
[Equation 17]

[0037]
Embodiments of the present invention will be described below with reference to examples.
Example 1
Example 1 extends the algorithm proposed in D. Mansour and A. H. Gray, “Unconstrained frequency-domain adaptive filter,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, no. 5, pp. 726-734 (1982), to multiple channels and applies the coherence-based step-size control method to it. In this frequency-domain adaptive algorithm, a whitening process prevents the convergence of the adaptive filter from degrading even when a signal with an uneven spectrum, such as received speech, is input.
[0038]
The following description takes the residual signal as the target signal, lets the adaptive filter length be L, and uses the overlap-save method to process a signal vector of length 2L every L/D samples.
(Step 1)
The input signals x_m(k) (m = 1, ..., M) are blocked into signal vectors of length 2L every L/D samples and transformed into the frequency domain by the FFT:
X_m(j) = diag(FFT([x_m(jL/D − 2L + 1), ..., x_m(jL/D)]^T)), where m = 1, ..., M
Here FFT(x) is the function that applies the FFT to the vector x, and diag(x) converts the vector x into a matrix whose diagonal elements are the elements of x. That is, when x = [x(1) ... x(2L)]^T,
[0039]
[Equation 18]

[0040]
(Step 2)
In the frequency domain, X_m(j) is multiplied by the frequency-domain adaptive filter coefficient vector Ĥ_m(j) of the m-th channel, which filters the input signal vector of each channel. The result is inverse-transformed by the IFFT to obtain the time-domain signal vector ŷ_m(j):
ŷ_m(j) = [0_L I_L] IFFT(X_m(j) Ĥ_m(j))
Here Ĥ_m(j) is a complex vector with 2L elements; taking its inverse FFT and extracting the first L elements gives the impulse response of the adaptive filter. 0_L is the L × L zero matrix and I_L is the L × L identity matrix.
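Steps 1 and 2 together amount to overlap-save filtering, which the following sketch illustrates for one channel. It forms the length-2L frame ending at sample jL/D, multiplies its spectrum by the 2L-point spectrum of an L-tap filter, and keeps the last L output samples, which is the effect of [0_L I_L]. A naive DFT replaces the FFT to avoid dependencies. Zero-based indexing, zero padding before the start of the signal, and all function names are assumptions made for the example.

```python
import cmath
import math

def dft(v, inverse=False):
    """Naive DFT / inverse DFT (O(n^2)), standing in for FFT / IFFT."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [c / n for c in out] if inverse else out

def overlap_save_block(x, h, j, L, D):
    """Last L samples of IFFT(X(j) * H) for block j (hop size L/D)."""
    end = j * (L // D)                          # one past the newest sample of the frame
    frame = [x[i] if 0 <= i < len(x) else 0.0 for i in range(end - 2 * L, end)]
    X = dft(frame)                              # diagonal of X_m(j)
    H = dft(list(h) + [0.0] * L)                # 2L-point spectrum of the L-tap filter
    y = dft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real for v in y[L:]]              # [0_L I_L]: discard the first L samples

def direct_convolution(x, h, n):
    """Reference: sample n of the linear convolution of x with h."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
```

The first L samples of each inverse transform are contaminated by circular wrap-around, which is why only the last L are kept. Those agree with ordinary linear convolution.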
[0041]
(Step 3)
The signal vectors ŷ_m(j) are added to obtain the predicted echo signal vector ŷ(j):
ŷ(j) = Σ^M_{m=1} ŷ_m(j)
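Steps 2 and 3 amount to overlap-save filtering followed by a sum over channels. The following NumPy sketch, with the hypothetical function name predicted_echo and random test data, checks that keeping the last L samples of the inverse FFT reproduces ordinary linear convolution when Ĥ_m(j) is the FFT of an L-tap impulse response padded with L zeros.

```python
import numpy as np

def predicted_echo(X_diags, H_hats, L):
    """Sum over channels of the last L samples of IFFT(X_m(j) H_m(j))."""
    y_hat = np.zeros(L)
    for X_diag, H_hat in zip(X_diags, H_hats):
        y_full = np.fft.ifft(X_diag * H_hat)   # diagonal matrix times vector
        y_hat += y_full[L:].real               # [0_L I_L] keeps the last L samples
    return y_hat

# Hypothetical data: M = 2 channels, L = 16 taps per channel
rng = np.random.default_rng(0)
L, M = 16, 2
x = rng.standard_normal((M, 2 * L))            # one 2L-sample block per channel
h = rng.standard_normal((M, L))                # time-domain filter coefficients
X_diags = [np.fft.fft(x[m]) for m in range(M)]
H_hats = [np.fft.fft(np.concatenate([h[m], np.zeros(L)])) for m in range(M)]

y_hat = predicted_echo(X_diags, H_hats, L)
# Reference: linear convolution, restricted to the samples free of wrap-around
y_ref = sum(np.convolve(x[m], h[m])[L:2 * L] for m in range(M))
```

The first L samples of the inverse FFT are discarded because they are corrupted by circular wrap-around.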
(Step 4)
The residual signal vector is obtained from the time-domain collected sound signal vector y(j) and the predicted echo vector ŷ(j), and is converted to the frequency domain by FFT to give E(j).
[0042]
[Equation 19]

(Step 5)
[0043]
[Expression 20]

[0044]
[Expression 21]

[0045]
(Step 6)
The residual signal and the input signal are processed in the frequency domain to obtain the correction vector dĤ_m(j).
[0046]
[Expression 22]

[0047]
Here, each component of the matrix X*_m(k) is the complex conjugate of the corresponding component of the matrix X_m(k).
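Expressions 19 and 22 are not reproduced in this text. The sketch below therefore assumes the usual unconstrained frequency-domain form: the residual block is preceded by L zeros before the FFT, and the correction vector is the element-wise product of the conjugated input spectrum and the residual spectrum. The function name correction_vector is hypothetical. The check at the end confirms that, under these assumptions, the first L samples of the inverse FFT of the correction vector equal the time-domain correlation between the residual and the input.

```python
import numpy as np

def correction_vector(X_diag, y, y_hat):
    """Return (dH, E) for one channel under the assumptions stated above."""
    L = len(y)
    e = y - y_hat                                        # time-domain residual block
    E = np.fft.fft(np.concatenate([np.zeros(L), e]))     # assumed zero-padding in front
    dH = np.conj(X_diag) * E                             # X_m^*(j) E(j), element-wise
    return dH, E

rng = np.random.default_rng(0)
L = 8
x = rng.standard_normal(2 * L)      # input block of length 2L
y = rng.standard_normal(L)          # collected sound block
y_hat = np.zeros(L)                 # predicted echo (all-zero filter for the demo)

dH, E = correction_vector(np.fft.fft(x), y, y_hat)
grad = np.fft.ifft(dH).real[:L]
# Reference: for tap n, sum over i of e(i) * x(L + i - n)
ref = np.array([sum(y[i] * x[L + i - n] for i in range(L)) for n in range(L)])
```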
(Step 7)
The matrix P(k) is obtained from the total power spectrum of the input signals, given by
[0048]
[Expression 23]

[0049]
Here, X* is the complex conjugate of the complex number X, and β is a smoothing constant for taking a short-time average, with 0 < β < 1.
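Because Expression 23 is not reproduced here, the following is only a sketch of one common way to form a short-time average of the total input power with a smoothing constant β: a first-order recursive average over the sum of X*_m X_m across channels. The function name update_power, the placement of β, and the small constant added before taking the reciprocal are assumptions.

```python
import numpy as np

def update_power(p_prev, X_diags, beta):
    """Recursive short-time average of the input power summed over all channels."""
    total = sum(np.abs(X_diag) ** 2 for X_diag in X_diags)   # sum over m of X_m^* X_m
    return beta * p_prev + (1.0 - beta) * total

beta = 0.9
p = np.zeros(4)
# Hypothetical stationary spectra for M = 2 channels and 4 frequency bins
X_diags = [np.array([1 + 1j, 2, 0, 1j]), np.array([1, 0, 0, 1], dtype=complex)]
for _ in range(200):
    p = update_power(p, X_diags, beta)

# Diagonal of P: reciprocal of the smoothed total power (small constant avoids 1/0)
P_diag = 1.0 / (p + 1e-12)
```

With stationary input, the average converges to the per-bin total power, here [3, 4, 0, 2].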
(Step 8)
From the ratio γ²(f) of the echo component in the residual signal obtained in step 5, the matrix M(j) having the coherence γ²(f) in its diagonal elements is obtained by
[0050]
[Expression 24]

[0051]
Here, μ₀ is set to a fixed value between 0 and 1. The adaptive filter is updated by:
Ĥ_m(j + 1) = Ĥ_m(j) + M(j) P(j) dĤ_m(j)
Multiplying by the matrix M(j) optimally controls the step size in each frequency band according to the ratio γ²(f) of the echo component in the residual signal. Applying the matrix P(j) to the correction vector dĤ_m(j) corresponds to whitening the input signal, which is known to improve the convergence characteristics of the adaptive filter when the input is a colored signal such as speech.
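The update of step 8 can be sketched as follows. Expression 24 is not reproduced in this text, so the sketch assumes that the diagonal of M(j) is μ₀ γ²(f) and that the diagonal of P(j) is the reciprocal of the smoothed total input power. The function name update_filters, the regularization constant delta, and the numbers are hypothetical.

```python
import numpy as np

def update_filters(H_hats, dH_hats, gamma2, p, mu0=0.2, delta=1e-8):
    """H_m(j+1) = H_m(j) + M(j) P(j) dH_m(j), with diagonal M(j) and P(j)."""
    step = mu0 * np.clip(gamma2, 0.0, 1.0)   # assumed diagonal of M(j)
    P_diag = 1.0 / (p + delta)               # assumed diagonal of P(j)
    return [H + step * P_diag * dH for H, dH in zip(H_hats, dH_hats)]

# One channel, four frequency bins
H_hats = [np.zeros(4, dtype=complex)]
dH_hats = [np.ones(4, dtype=complex)]
gamma2 = np.array([0.0, 0.5, 1.0, 1.0])   # echo ratio per bin
p = np.array([1.0, 1.0, 2.0, 4.0])        # smoothed total input power per bin

H_new = update_filters(H_hats, dH_hats, gamma2, p, mu0=0.2, delta=0.0)
```

In a bin where γ²(f) = 0, that is, where the residual contains no echo, the coefficient is left unchanged; in a bin where γ²(f) = 1 the full step μ₀ is applied.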
[0052]
The method of Example 1 is performed by the echo canceling unit 4 having the configuration shown in FIG.
The input signals x_1(k), ..., x_M(k) are blocked and converted to the frequency domain by the TF conversion units 411_1 to 411_M as in step 1. The filter processing units 412_1 to 412_M, the FT conversion units 413_1 to 413_M, and the vector addition unit 414 then calculate the time-domain predicted echo signal vector ŷ(j) as in steps 2 and 3. The collected sound signal y(k) is blocked by the blocking unit 45 so that it has no time lag relative to the input signal x(k); the signal vector subtracting unit 42 subtracts the predicted echo signal vector ŷ(j) from it as in step 4, and the TF conversion unit 431 yields the frequency-domain residual signal vector E(j).
[0053]
The coherence estimation unit 432 calculates the coherence from the frequency-domain residual signal vector E(j) and the frequency-domain input signal vectors X_m(j) according to step 5. A specific configuration of the coherence estimation unit 432 is illustrated in FIGS. 8 and 9. The residual signal vector E(j) and the frequency-domain input signal vectors X_m(j) are input to the first and second correlation removal units 4321_m and 4322_m provided for each frequency band; the coherence calculation units 4323_1 to 4323_M calculate the coherence from the short-time spectra from which the correlation has been removed, and the echo component ratio calculation unit 4324 obtains the ratio of the echo component in the residual signal.
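For M = 2, the chain of correlation removal, coherence calculation, and echo component ratio calculation can be sketched as below. The combination γ²(f) = 1 - (1 - γ²_1e(f))(1 - γ²_2e·1(f)) follows the form given in the claims; the spectra are estimated here by plain averaging over frames, which is an assumption (the patent's own estimator in Expressions 20 and 21 is not reproduced in this text). The function names and the synthetic data are hypothetical.

```python
import numpy as np

def cross_spectrum(A, B):
    """Average of conj(A) * B over frames; A and B have shape (frames, bins)."""
    return np.mean(np.conj(A) * B, axis=0)

def echo_ratio_two_channels(X1, X2, E, eps=1e-12):
    """Echo component ratio in the target signal E for two input channels."""
    S11 = cross_spectrum(X1, X1).real
    S22 = cross_spectrum(X2, X2).real
    See = cross_spectrum(E, E).real
    S12, S1e, S2e = cross_spectrum(X1, X2), cross_spectrum(X1, E), cross_spectrum(X2, E)
    g2_1e = np.abs(S1e) ** 2 / (S11 * See + eps)          # coherence of channel 1 and E
    # Remove the part correlated with channel 1 from channel 2 and from E
    S22_1 = S22 - np.abs(S12) ** 2 / (S11 + eps)
    See_1 = See - np.abs(S1e) ** 2 / (S11 + eps)
    S2e_1 = S2e - np.conj(S12) * S1e / (S11 + eps)
    g2_2e_1 = np.abs(S2e_1) ** 2 / (S22_1 * See_1 + eps)  # coherence after removal
    return np.clip(1.0 - (1.0 - g2_1e) * (1.0 - g2_2e_1), 0.0, 1.0)

rng = np.random.default_rng(1)
frames, bins = 400, 8

def complex_noise():
    return rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))

X1 = complex_noise()
X2 = 0.5 * X1 + complex_noise()        # channel 2 is partly correlated with channel 1
echo_only = 0.8 * X1 - 0.3j * X2       # target made up purely of echo
noise_only = complex_noise()           # target made up purely of interference

g_echo = echo_ratio_two_channels(X1, X2, echo_only)
g_noise = echo_ratio_two_channels(X1, X2, noise_only)
```

In this sketch the ratio is close to 1 when the target consists only of echo and close to 0 when it consists only of uncorrelated interference, which is the behaviour the step size control relies on.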
[0054]
The filter update units 433_1 to 433_M obtain the frequency-domain correction vector from the frequency-domain input signal vector X_m(j) and the frequency-domain residual signal vector E(j) according to step 6, and at the same time calculate the matrix P(j) according to step 7. The correction vector is then corrected according to step 8 to update the adaptive filter coefficients. The updated filter coefficients are passed to the filter processing units 412_1 to 412_M.
Example 2
In Example 2, the coherence-based step size control method is applied, with the residual signal as the target signal, to the multi-channel adaptive algorithm proposed in Emura and Haneda, "Additional signal enhancement type frequency domain stereo adaptive algorithm," Acoustical Society of Japan 2001 Fall Meeting, pp. 537-538 (2001).
[0055]
This adaptive algorithm obtains the correction vector of the adaptive filter from a correction signal z_m(k) instead of the input signal x_m(k). For this purpose, the M channel echo cancellation unit 7 in FIG. 4 receives, in addition to the received signal u_m(k), the M-channel additional signal g_m(u_m(k)) generated by the correlation fluctuation processing units 6_1 to 6_M. The correlation fluctuation processing units 6_1 to 6_M are devices generally used to improve the echo path estimation performance of a multi-channel echo canceller.
The M channel echo cancellation unit 7 of FIG. 4 updates the coefficient of the adaptive filter according to the following steps.
[0056]
(Step 1)
From the received signal u_m(k) of each channel and the additional signal g_m(u_m(k)) obtained by inputting u_m(k) to the correlation fluctuation processing unit 6_m, the reproduction signal x_m(k) and the correction signal z_m(k) are generated by
x_m(k) = u_m(k) + g_m(u_m(k))
z_m(k) = a u_m(k) + g_m(u_m(k))
(m = 1, ..., M, 0 < a ≦ 1).
Each signal is then blocked into a signal vector of length 2L every L/D samples and converted to the frequency domain by FFT as follows:
X_m(j) = diag(FFT([x_m(jL/D-2L+1), ..., x_m(jL/D)]^T))
Z_m(j) = diag(FFT([z_m(jL/D-2L+1), ..., z_m(jL/D)]^T))
(m = 1, ..., M)
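The generation of the reproduction signal and the correction signal can be illustrated as follows. The patent does not specify the function g_m in this passage; the scaled half-wave rectifier used below is a purely hypothetical stand-in, as are the function names and the values of alpha and a.

```python
import numpy as np

def additional_signal(u, alpha=0.3):
    """Hypothetical g_m: scaled half-wave rectifier (not specified in the source)."""
    return alpha * 0.5 * (u + np.abs(u))

def make_signals(u, a=0.5):
    """Return (x, z) with x = u + g(u) and z = a*u + g(u), where 0 < a <= 1."""
    g = additional_signal(u)
    x = u + g          # reproduction signal x_m(k)
    z = a * u + g      # correction signal z_m(k): the additional signal is emphasized
    return x, z

u = np.array([1.0, -2.0, 0.5, 0.0])   # hypothetical received signal samples
x, z = make_signals(u, a=0.5)
```

Since x - z = (1 - a) u, choosing a smaller a weights the additional signal more heavily relative to the received signal in the correction signal.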
[0057]
(Step 2)
The frequency-domain matrix X_m(j) is multiplied by Ĥ_m(j), so that the input signal vector is filtered for each channel. The result is inverse-FFT processed to obtain the time-domain signal vector ŷ_m(j) (m = 1, ..., M):
ŷ_m(j) = [0_L I_L] IFFT(X_m(j) Ĥ_m(j))
Here, 0_L is the L × L zero matrix and I_L is the L × L identity matrix.
(Step 3)
The signal vectors ŷ_m(j) (m = 1, ..., M) are added to obtain the predicted echo signal vector ŷ(j):
[0058]
ŷ(j) = Σ^M_{m=1} ŷ_m(j)
(Step 4)
The residual signal vector is obtained from the time-domain collected sound signal vector y(j) and the predicted echo signal vector ŷ(j), and is converted to the frequency domain by FFT.
[0059]
[Expression 25]

(Step 5)
[0060]
[Equation 26]

[0061]
[Expression 27]

[0062]
(Step 6)
The residual signal and the correction signal are processed in the frequency domain, and the correction vector dĤ_m(j) is calculated according to
[0063]
[Expression 28]

[0064]
Here, X_m(j, f) and Z_m(j, f) are the (f, f)-th elements of the matrices X_m(j) and Z_m(j), respectively, and δ is a small positive constant that prevents the denominator from becoming zero. P(j, f) in the matrix P(j) is the sum over the channels of the cross spectrum between the input signal and the correction signal.
(Step 8)
From the coherence γ²(f) found in step 5, the matrix M(j) having the coherence γ²(f) in its diagonal elements is obtained by
[0065]
[Expression 29]

[0066]
Here, μ₀ is set to a fixed value between 0 and 1. The adaptive filter is updated by:
Ĥ_m(j + 1) = Ĥ_m(j) + M(j) P(j) dĤ_m(j)
Multiplying by the matrix M(j) optimally controls the step size in each frequency band according to the ratio of the echo component in the target signal. Applying the matrix P(j) to the correction vector dĤ_m(j) corresponds to whitening the input signal, which is known to improve the convergence characteristics of the adaptive filter when the input is a colored signal such as speech.
[0067]
The inside of the M channel echo canceling unit 7 is configured as shown in FIG. The TF conversion units 702_m and 705_m, which TF-convert the reproduction signal x_m(k) and the correction signal z_m(k), correspond to the TF conversion unit 411_m in FIG.
The adder 701_m adds the additional signal g_m(u_m(k)) to the received signal u_m(k) to generate the reproduction signal x_m(k), which the TF conversion unit 702_m converts into the matrix X_m(j). The received signal u_m(k) is also multiplied by a (where a is a value from 0 to 1) in the attenuator 703_m, and the adder 704_m adds the additional signal g_m(u_m(k)) to generate the correction signal z_m(k), which the TF conversion unit 705_m converts into the matrix Z_m(j).
[0068]
The matrix X_m(j) is passed to the filter processing unit 712_m, and the matrix Z_m(j) is passed to the filter update unit 733_m. The filter processing unit 712_m, the FT conversion unit 713_m, and the vector adder 714 generate the predicted echo signal through the processing of steps 2 and 3. The collected sound signal y(k) obtained from the microphone 3 is blocked by the blocking unit 75; the vector subtracting unit 72 takes its difference from the predicted echo signal vector according to step 4, and the TF conversion unit 731 converts the result to the frequency domain.
The coherence estimation unit 732 estimates the coherence from the frequency-domain residual signal vector E(j) and the input signal vector X_m(j) according to step 5.
The filter update units 733_m (m = 1, ..., M) update Ĥ_m(j) in the frequency domain according to steps 6, 7, and 8.
[0069]
The numerical simulation result of Example 2 will be described with reference to FIG.
In this numerical simulation, the number of input channels was M = 2 and the sampling frequency was 8 kHz. Echo was generated using, as the echo path, a room transfer function measured in a room with a reverberation time of 200 ms and truncated to 700 taps. The interference signal was a superposition of level-varying Hoth noise and the transmission signal. FIG. 6 shows the echo signal, the interference signal, the collected sound signal (= echo signal + interference signal), and the residual signal e(k) after application of this method. Using these signals, the conventional method without step size control was compared with the proposed step size control method.
[0070]
The number of adaptive filter taps per channel was L = 512, and D = 4 was chosen so that the adaptive filter was updated every 128 samples, that is, every 16 ms. μ₀ was set to 0.2. FIG. 7 shows the change in the coefficient error of the adaptive filter. In the section where the interference signal increases slightly (t = 4 to 6 s), the coefficient error of the conventional method (dotted line) deteriorates, whereas the estimation of the proposed method (solid line) remains stable. In the section where the interference signal increases suddenly (t = 6 s), the coefficient error of the conventional method rises from 0 dB to 8 dB and the echo path estimation becomes unstable. In the proposed method, by contrast, the coefficient error in this section deteriorates by only 1 dB, from -6 dB to -5 dB.
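The update interval quoted above follows directly from L, D, and the sampling frequency, as the short sketch below shows. The coefficient error itself is not defined in this passage; the normalized squared error in dB used here, with the shorter vector zero-padded, is an assumed definition, and the function name coefficient_error_db and the toy vectors are hypothetical.

```python
import numpy as np

fs = 8000          # sampling frequency in Hz
L, D = 512, 4      # filter taps per channel and block-shift divisor
hop = L // D                       # samples between filter updates
interval_ms = 1000.0 * hop / fs    # update interval in milliseconds

def coefficient_error_db(h_true, h_est):
    """Assumed metric: 10*log10(||h - h_est||^2 / ||h||^2), shorter vector zero-padded."""
    n = max(len(h_true), len(h_est))
    ht = np.pad(np.asarray(h_true, dtype=float), (0, n - len(h_true)))
    he = np.pad(np.asarray(h_est, dtype=float), (0, n - len(h_est)))
    return 10.0 * np.log10(np.sum((ht - he) ** 2) / np.sum(ht ** 2))

h = np.array([1.0, 0.5, 0.25])                   # toy echo path
err_initial = coefficient_error_db(h, np.zeros(2))   # all-zero filter
err_partial = coefficient_error_db(h, h[:2])         # first two taps exactly right
```

Under this assumed definition an all-zero filter gives 0 dB, and the error cannot reach minus infinity when the filter is shorter than the echo path, as is the case for 512 taps against a 700-tap path.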
[0071]
[Effect of the Invention]
As described above, according to the present invention, the correction amount applied to the frequency-domain adaptive filter coefficients relative to the filter coefficients of the immediately preceding frame, which conventionally is the product of the correction vector and the reciprocal of the input signal power, is further corrected using the coherence between the residual signal or the collected sound signal and the input signal. This makes the echo path estimation of the adaptive filter robust even in the presence of interference signals other than echo, such as transmission speech and ambient noise.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the outline of an entire multi-channel acoustic communication apparatus.
FIG. 2 is a diagram illustrating a conventional example.
FIG. 3 is a diagram illustrating an example.
FIG. 4 is a diagram for explaining the outline of the entire multi-channel acoustic communication apparatus including an embodiment.
FIG. 5 is a diagram illustrating another embodiment.
FIG. 6 is a diagram showing an echo signal, an interference signal, and a collected sound signal.
FIG. 7 is a diagram showing a numerical simulation result of the example.
FIG. 8 is a diagram for explaining calculation of coherence and echo component ratio.
FIG. 9 is a diagram for explaining correlation component removal calculation.

Claims

A process of supplying M-channel reproduction signals to M speakers (M is an integer of 2 or more) arranged in a common sound field, and obtaining collected sound signals with N microphones (N is an integer of 1 or more) arranged in the common sound field,
Converting the M channel reproduction signal into a frequency domain for each short time interval to obtain an M channel frequency domain reproduction signal;
Multiplying the M-channel frequency domain reproduction signal and M adaptive filter coefficients for each microphone to obtain M frequency-domain predicted echo signals for each microphone;
Transforming the M frequency domain predicted echo signals for each microphone into the time domain to obtain M time domain predicted echo signals for each microphone;
Adding M time domain predicted echo signals for each microphone to obtain a time domain predicted echo signal for each microphone;
Subtracting the time-domain predicted echo signal for each microphone from each collected sound signal by the microphone to obtain a residual signal for each microphone;
Converting each of the residual signals for each microphone into a frequency domain, and obtaining a frequency domain residual signal for each microphone;
Obtaining a coherence γ²_1e(f) between the frequency domain reproduction signal of the first channel and the target signal in the frequency domain, using a collected sound signal or a residual signal as the target signal for each microphone;
Removing a correlation component with the first to (m-1) th channel frequency domain reproduction signals from the mth channel frequency domain reproduction signal (m is 2 or more);
Removing a correlation component with the first to (m-1) th channel frequency domain reproduction signal from the frequency domain target signal for each microphone;
For each microphone, obtaining the coherence γ²_me·(m-1)!(f) between the m-th channel frequency domain reproduction signal from which the correlation component has been removed and the frequency domain target signal from which the correlation component has been removed;
For each microphone, using the M coherences γ²_me·(m-1)!(f), calculating
γ²(f) = 1 - (1 - γ²_1e(f)) (1 - γ²_me·(m-1)!(f)) and thereby obtaining the ratio γ²(f) of the reverberation component in the target signal;
For each microphone, multiplying the frequency domain residual signal of the M channel and the frequency domain reproduction signal, respectively, to determine the correction amount of the adaptive filter coefficient of the M channel,
For each microphone, the reciprocal of the sum of all the channels of the reproduction signal power is multiplied by the ratio γ ² (f) of the reverberation component in the target signal and the correction amount of the adaptive filter coefficient, and the result is multiplied by the adaptive filter. Having a process of updating the adaptive filter coefficients by adding to the coefficients;
An echo canceling method characterized by the above.

Connected to M speakers (M is an integer of 2 or more), N microphones (N is an integer of 1 or more), M receiving terminals and N transmitting terminals arranged in a common sound field ;
A first TF converter connected to the M receiving terminals and converting the reproduction signal input to the receiving terminal into a frequency domain for each short time interval to obtain an M channel frequency domain reproduction signal; ,
Each of the microphones is connected to the M first TF converters, and the frequency domain reproduction signal of M channels and M adaptive filter coefficients are respectively multiplied by M to obtain M pieces of microphones for each microphone. A filter processing unit for obtaining a frequency domain predicted echo signal of
FT transform that is connected to the M filter processing units for each microphone, converts the M frequency domain predicted echo signals to the time domain, and obtains M time domain predicted echo signals for each microphone. And
A vector adder that is connected to the M FT converters for each microphone, adds the M time-domain predicted echo signals, and obtains a time-domain predicted echo signal for each microphone ;
A signal vector subtracting unit that is connected to the microphone and the vector adding unit for each microphone, and subtracts the time domain predicted echo signal from the collected sound signal for each microphone to obtain a residual signal for each microphone. ,
A second TF converter that is connected to the signal vector subtractor for each microphone, converts each of the residual signals for each microphone into a frequency domain, and obtains a frequency domain residual signal for each microphone;
For each of the microphones, a first coherence calculation unit that is connected to a conversion unit, which converts the target signal (the collected sound signal or the residual signal) into the frequency domain, and to the first TF conversion unit of the first channel, and that obtains the coherence γ²_1e(f) between the input first-channel frequency domain reproduction signal and the target signal in the frequency domain;
For each of the microphones, the mth channel is connected to the first TF conversion unit (m is an integer equal to or greater than 2) and the first TF conversion unit of the first to mth channels. A first correlation removing unit for removing a correlation component with the first to (m-1) th channel frequency domain reproduction signals;
For each of the microphones, the target signal conversion unit and the first to m-1st channel first TF conversion units are connected, and the first to m-1st channel frequency domain reproduction signals from the frequency domain target signal. A second correlation removal unit for removing a correlation component with
For each of the microphones, a second coherence calculation unit that is connected to the first correlation removing unit and the second correlation removing unit, and that obtains the coherence γ²_me·(m-1)!(f) between the m-th channel frequency domain reproduction signal from which the correlation component has been removed by the first correlation removing unit and the frequency domain target signal from which the correlation component has been removed by the second correlation removing unit;
For each of the microphones, an echo component ratio calculation unit that is connected to the first and second coherence calculation units and that, using the M coherences γ²_me·(m-1)!(f), calculates
γ²(f) = 1 - (1 - γ²_1e(f)) (1 - γ²_me·(m-1)!(f)) to obtain the ratio γ²(f) of the reverberation component in the target signal;
Each of the microphones is connected to the first and second TF conversion units and the reverberation component ratio calculation unit, and multiplies the M channel frequency domain residual signal and the frequency domain reproduction signal, respectively. A correction amount of the filter coefficient is obtained, and for each microphone, the reciprocal of the sum of all channels of the power of the reproduction signal, the ratio γ ² (f) of the reverberation component in the target signal, and the correction amount of the adaptive filter coefficient are obtained. A filter update unit that multiplies the result and adds the result to the adaptive filter coefficient to update the adaptive filter coefficient;
An echo canceling device comprising:

An echo cancellation program for executing each process of the echo cancellation method according to claim 1 by a computer.

A computer-readable recording medium on which the echo cancellation program according to claim 3 is recorded.