JP2004349806A

JP2004349806A - Multichannel acoustic echo canceling method, apparatus thereof, program thereof, and recording medium thereof

Info

Publication number: JP2004349806A
Application number: JP2003141910A
Authority: JP
Inventors: Akira Emura; 暁江村; Yoichi Haneda; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-20
Filing date: 2003-05-20
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: To provide a multichannel acoustic echo canceling method that speeds up estimation of the filter coefficients of the predicted echo paths even when the gain ratios among the echo paths differ considerably.
SOLUTION: The predicted echo paths 61_1 to 61_N of each channel are given $L_p$ non-causal component taps, corresponding to the echo path transfer function for future input signals, and $L-L_p$ causal component taps, corresponding to the transfer function for past input signals. For each channel, the sum $\sigma[\mathbf{w}_n]$ of the absolute values of the filter coefficients of the $L_p$ non-causal taps is obtained, e.g., every 2 to 3 seconds. The larger $\sigma[\mathbf{w}_n]$ is, the larger the update step size $\mu_n$ of that channel's filter coefficients $\mathbf{W}_n(k)$ is made.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

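[0001]
[Technical Field of the Invention]
The present invention applies to teleconferencing systems having a multichannel sound reproduction system. It relates to a multichannel acoustic echo canceling method, an apparatus, a program, and a recording medium for canceling acoustic echo, which disturbs conversation and can sometimes cause howling.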
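[0002]
[Prior Art]
In recent years, more efficient audio and video coding and advances in digital network technology have prompted research into multichannel hands-free (loudspeaker) communication systems. Such systems let several people join easily and offer a more natural conversational environment. Realizing them requires a multichannel acoustic echo canceller that cancels the acoustic leakage from multiple loudspeakers into the microphones.
A teleconferencing system consisting of an N-channel ($N \ge 2$) reproduction system and an M-channel ($M \ge 1$) sound pickup system cancels acoustic echo with the configuration shown in FIG. 5 (see, for example, Patent Document 1, FIG. 9). The received signal from each receiving terminal 1_1 to 1_N passes through correlation variation processing units 2_1 to 2_N and is reproduced as sound by loudspeakers 3_1 to 3_N. It then leaks through acoustic echo paths 34_{1,1} to 34_{N,M} into each microphone 4_m ($m = 1, \dots, M$). N-channel echo cancellers 6_1 to 6_M cancel the acoustic echo. They are connected between the outputs of correlation variation processing units 2_1 to 2_N for all N receiving channels and the M transmitting terminals 5_1 to 5_M.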
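[0003]
In an actual teleconference, most of the time is spent with a single talker's voice being picked up by several microphones and transmitted over multiple channels. The inter-channel cross-correlation of these received signals is very high. Without correlation variation processing, the estimated echo transfer characteristics therefore need not match the true ones, even while the echo is being canceled. As a result, the acoustic echo stops being canceled the moment the far-end talker changes and the inter-channel cross-correlation of the received signals shifts. Applying correlation variation processing to the received signals guarantees that the estimated transfer characteristics converge to their true values.
For each pickup channel, the N-channel echo canceller 6_m processes an N-input, 1-output time series between all N reproduction channels and one pickup channel. FIG. 6 shows the configuration of N-channel echo canceller 6_m with correlation variation processing units 2_1 to 2_N omitted. The received signals $x_1(k), \dots, x_N(k)$ are fed to predicted echo paths 61_1 to 61_N, which generate a predicted echo signal for each channel. Subtractor 62 then forms the error signal $e$ between the predicted echo signals and the signal picked up by microphone 4_m.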
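[0004]
The processing inside echo path estimation unit 63 is as follows. Let:
- $y(k)$ be the signal picked up by microphone 4_m;
- $h_n(k)$ be the impulse response of acoustic echo path 34_n, from loudspeaker 3_n of the n-th channel ($n = 1, \dots, N$) to microphone 4_m;
- $L$ be the length of that impulse response.

The N-channel received signals $x_1(k), \dots, x_N(k)$ and the picked-up signal $y(k)$ are related by
$$y(k)=\sum_{i=0}^{L-1}h_1(i)\,x_1(k-i)+\dots+\sum_{i=0}^{L-1}h_N(i)\,x_N(k-i) \qquad (1)$$
Write the impulse responses of echo paths 34_1 to 34_N and the received signals $x_n(k)$ as vectors:
$$\mathbf{h}_n=[h_n(0)\ \dots\ h_n(L-1)]^T,\qquad \mathbf{x}_n(k)=[x_n(k)\ \dots\ x_n(k-L+1)]^T \qquad (n=1,\dots,N) \qquad (2)$$
Here $[\,]^T$ denotes vector transposition. The relation between the N-channel received signals and the picked-up signal then becomes
$$y(k)=\mathbf{h}_1^T\mathbf{x}_1(k)+\dots+\mathbf{h}_N^T\mathbf{x}_N(k) \qquad (3)$$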
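[0005]
The following describes the case where the NLMS (Normalized Least Mean Square) algorithm updates the coefficients of the adaptive filters that make up predicted echo paths 61_n (see, for example, Patent Document 1, pages 3-4).
Let $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ be the adaptive filter coefficient vectors in predicted echo paths 61_1 to 61_N. Subtractor 62 computes the error signal $e(k)$ between the actually picked-up signal $y(k)$ and the signal predicted by the adaptive filters:
$$e(k)=y(k)-\big(\mathbf{w}_1^T(k)\mathbf{x}_1(k)+\dots+\mathbf{w}_N^T(k)\mathbf{x}_N(k)\big) \qquad (4)$$
Based on this error signal, echo path estimation unit 63 successively updates the filter coefficients to reduce the difference between the picked-up signal and the predicted echo signal. It then transfers the coefficients to predicted echo paths 61_1 to 61_N. Each adaptive filter coefficient vector is updated by
$$\mathbf{w}_n(k+1)=\mathbf{w}_n(k)+\mu\,e(k)\,\mathbf{x}_n(k) \qquad (n=1,\dots,N) \qquad (5)$$
where $\mu$ is the step size. Step size control unit 64 determines it from $\mu_0$ and the input signal power:
$$\mu=\frac{\mu_0}{\sum_{n=1}^{N}\mathbf{x}_n^T(k)\mathbf{x}_n(k)} \qquad (6)$$
Here $\mu_0$ is a parameter set in advance to a value between 0 and 1 to keep the estimation stable.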
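[0006]
The estimation performance of the adaptive filters can be evaluated by the convergence speed of the filter coefficients. That is, it is measured by how quickly the relative error (misalignment) of the coefficients decreases:
$$\frac{\|\mathbf{h}_1-\mathbf{w}_1(k)\|^2+\dots+\|\mathbf{h}_N-\mathbf{w}_N(k)\|^2}{\|\mathbf{h}_1\|^2+\dots+\|\mathbf{h}_N\|^2} \qquad (7)$$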
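The conventional multichannel NLMS loop of equations (3) to (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The received signals are independent white noise.
- The microphone signal is noise-free and generated by equation (3).
- A small regularizer `eps` is added to the denominator of equation (6) to avoid division by zero at start-up. This is an assumption, not part of the source.

The function names `nlms_update`, `misalignment_db` and `simulate` are hypothetical.

```python
import math
import random


def nlms_update(w, x, y_k, mu0=0.5, eps=1e-12):
    """One multichannel NLMS iteration, Eqs. (4)-(6); updates w in place.

    w[n] -- coefficient vector w_n(k) (list of L floats)
    x[n] -- regressor x_n(k) = [x_n(k), ..., x_n(k-L+1)]
    eps  -- hypothetical regularizer, not part of Eq. (6)
    """
    # Eq. (4): error between the picked-up sample and the summed channel predictions
    e = y_k - sum(sum(a * b for a, b in zip(wn, xn)) for wn, xn in zip(w, x))
    # Eq. (6): a single step size normalized by the total input power
    mu = mu0 / (sum(sum(v * v for v in xn) for xn in x) + eps)
    # Eq. (5): the same step size is applied to every channel
    for wn, xn in zip(w, x):
        for i in range(len(wn)):
            wn[i] += mu * e * xn[i]
    return e


def misalignment_db(h, w):
    """Eq. (7) in dB: sum_n ||h_n - w_n||^2 / sum_n ||h_n||^2."""
    num = sum((a - b) ** 2 for hn, wn in zip(h, w) for a, b in zip(hn, wn))
    den = sum(a * a for hn in h for a in hn)
    return 10.0 * math.log10(max(num, 1e-300) / den)


def simulate(gains=(1.0, 9.0), L=8, steps=2000, seed=0):
    """Identify N echo paths with per-channel gains from white-noise inputs."""
    rng = random.Random(seed)
    h = [[g * rng.gauss(0.0, 1.0) for _ in range(L)] for g in gains]
    w = [[0.0] * L for _ in gains]
    x = [[0.0] * L for _ in gains]
    for _ in range(steps):
        # shift a new white-noise sample into each regressor (newest first)
        x = [[rng.gauss(0.0, 1.0)] + xn[:-1] for xn in x]
        y_k = sum(sum(a * b for a, b in zip(hn, xn)) for hn, xn in zip(h, x))  # Eq. (3)
        nlms_update(w, x, y_k)
    return misalignment_db(h, w)
```

The sketch only shows the structure of the shared-step update. It makes no attempt to reproduce the behavior in FIG. 7, which concerns a frequency-domain algorithm.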
[0007]
[Patent Document 1]
JP-A-8-181639
[Patent Document 2]
JP-A-2002-223182, pages 9-13
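[0008]
[Problems to Be Solved by the Invention]
Ordinary hands-free teleconferencing uses directional microphones to keep the acoustic coupling between loudspeakers and microphones as small as possible. In a room with several loudspeakers, the loudspeakers may all be the same distance from a microphone. Even so, a coupling difference of just under 15 dB easily arises between a loudspeaker in the directional microphone's blind spot and one that is not.
Such an imbalance in the gains of the acoustic echo paths is known to strongly affect the accuracy of echo path estimation by the adaptive filters. When the gains are markedly unequal, the adaptive filter coefficient error converges more slowly than when the gains are nearly equal.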
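[0009]
FIG. 7 shows an example for a stereo acoustic echo canceller. It plots how the relative error of the adaptive filter coefficients decreases from the initial state for two-path gain ratios of 1, 3, and 9. The received signals $x_1(k)$ and $x_2(k)$ were set to equal power, and a multichannel frequency-domain adaptive algorithm was applied (see, for example, Patent Document 2). This algorithm transforms each signal into the frequency domain with the FFT (fast discrete Fourier transform) and performs NLMS-based processing in each frequency bin. Other adaptive algorithms, such as the projection algorithm (see, for example, Patent Document 1, pages 4-6), show the same tendency: the more unequal the echo path gains, the slower the convergence.
This degradation can be explained qualitatively by comparing the size of each channel's adaptive filter estimation error with the size of its update vector. As an example, consider the stereo case with an echo path gain ratio $\sqrt{\|\mathbf{h}_2\|^2/\|\mathbf{h}_1\|^2}\approx 9$ and roughly equal received signal power in each channel ($\|\mathbf{x}_1(k)\|\approx\|\mathbf{x}_2(k)\|$).
Suppose the adaptive filters are updated by the NLMS algorithm. By the NLMS update equation (5), the first and second channels use the same step size. Their correction vectors are $\mu e(k)\mathbf{x}_1(k)$ and $\mu e(k)\mathbf{x}_2(k)$, respectively. Since $\|\mathbf{x}_1(k)\|\approx\|\mathbf{x}_2(k)\|$, the corrections applied to the two channels are of nearly equal magnitude.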
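[0010]
Suppose the filter coefficients are initialized to 0. With an echo path gain ratio of about 9, the second channel's estimation error early in the estimation is about nine times that of the first channel. Yet the adaptive filter update is about the same size in both channels. With a large step size, the estimate for the first (low-gain) channel is easily disturbed and takes a long time to settle. With a small step size, the estimate for the second (high-gain) channel takes a long time.
The same holds after the coefficient errors of both channels' predicted echo paths have converged, if the microphone is then moved slightly, so little that the gain ratio hardly changes. The change in the second channel's echo path is then about nine times that in the first, matching the gain ratio, but the adaptive filter updates of the two channels remain about equal. Again, a large step size makes the first (low-gain) channel's estimate easily disturbed and slow, while a small step size makes the second (high-gain) channel's estimate slow.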
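[0011]
[Means for Solving the Problems]
The present invention infers the echo path estimation error of each adaptive filter and controls the step size of each channel according to the size of that error.
However, the adaptive filter of a predicted echo path exists in the first place to estimate the characteristics of an unknown echo path. Inferring the size of its estimation error is therefore fundamentally difficult. That error is the difference between the characteristics of the predicted echo path, partway through estimation, and those of the true echo path.
The current state of the acoustic echo is affected only by the past. Consider an acoustic echo path whose impulse response is as shown in FIG. 1A. Conventionally, L taps were used as causal component taps and made up all the taps of the adaptive filter. These L taps correspond to the time from the onset of the impulse response until its coefficients become sufficiently small. Causality is the property that a cause always precedes its effect in time: no effect (output signal) occurs before the cause (input signal) is applied. Every physical system is causal.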
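[0012]
In contrast, the present invention uses $L_p$ of the adaptive filter's $L$ taps as non-causal component taps. These correspond to the future, which does not affect the current state. The remaining $L-L_p$ taps are used for the causal component. Omitting the time index $k$, the filter coefficient vector $\mathbf{w}_n(k)$ of a predicted echo path is
$$\mathbf{w}_n=[w_n(-L_p)\ \dots\ w_n(-1)\ \ w_n(0)\ \dots\ w_n(L-L_p-1)]^T \qquad (n=1,\dots,N)$$
The first $L_p$ elements $w_n(-L_p), \dots, w_n(-1)$ of this vector are the non-causal component. The remaining $L-L_p$ elements $w_n(0), \dots, w_n(L-L_p-1)$ are the causal component.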
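[0013]
This filter coefficient vector generates the predicted echo signal $\hat{y}(k-L_p)$ as the sum of two predictions:
- one based on the input signals $x_n(k-L+1), \dots, x_n(k-L_p)$ at times $k-L+1$ to $k-L_p$;
- one based on the input signals $x_n(k-L_p+1), \dots, x_n(k)$ at times $k-L_p+1$ to $k$.

The result is:
(Equation 1)
The filter coefficient vectors $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ of the predicted echo paths are estimated so that the error between this predicted echo signal $\hat{y}(k-L_p)$ and the picked-up signal becomes small. The non-causal components $w_n(-L_p), \dots, w_n(-1)$ are echo path transfer coefficients that map input samples lying later in time than the output being predicted onto that output. For the output at time $k-L_p$, these are the inputs $x_n(k-L_p+1), \dots, x_n(k)$.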
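The image for (Equation 1) is not reproduced in the source. Assuming the tap indexing of $\mathbf{w}_n$ given in [0012], one form consistent with the description is shown below. It splits the prediction into a non-causal and a causal sum, and shows that the delayed prediction is simply $\mathbf{w}_n^T(k)\mathbf{x}_n(k)$ with the ordinary regressor of equation (2). This is a reconstruction, not the original equation.

$$\hat{y}(k-L_p)=\sum_{n=1}^{N}\Bigg[\underbrace{\sum_{i=-L_p}^{-1}w_n(i)\,x_n(k-L_p-i)}_{\text{inputs }x_n(k),\dots,x_n(k-L_p+1)}+\underbrace{\sum_{i=0}^{L-L_p-1}w_n(i)\,x_n(k-L_p-i)}_{\text{inputs }x_n(k-L_p),\dots,x_n(k-L+1)}\Bigg]=\sum_{n=1}^{N}\mathbf{w}_n^T(k)\,\mathbf{x}_n(k)$$

The last equality follows by pairing the $j$-th element of $\mathbf{w}_n$, namely $w_n(j-L_p)$, with the $j$-th element of $\mathbf{x}_n(k)$, namely $x_n(k-j)$, and substituting $i=j-L_p$.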
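[0014]
Suppose estimation by the adaptive filters is complete and each predicted echo path has converged to the true echo path. Then all non-causal components of the predicted echo paths have converged to 0, and the causal components have converged to the true echo path characteristic (impulse response). Partway through estimation, the difference between a predicted echo path's causal component and the true echo path moves together with the magnitude of that path's non-causal component.
The present invention therefore uses the magnitude of the non-causal component of each predicted echo path, for example the sum of its absolute values, as an estimate of that path's estimation error. It controls the step size of each predicted echo path based on this estimate. The adaptive filters of channels with large estimation errors are corrected strongly, and those of channels with small estimation errors are corrected weakly. This control is expected to improve the convergence speed.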
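[0015]
[Embodiments of the Invention]
Embodiment 1
Embodiment 1 of the present invention is described first with reference to FIG. 2. Parts in FIG. 2 that correspond to parts in FIG. 6 carry the same reference numerals.
The first to N-th predicted echo paths 61_1 to 61_N each comprise two sections:
- a causal component section 61_1a to 61_Na, which relates the output to the past that should affect the current state;
- a non-causal component section 61_1b to 61_Nb, which relates it to the future that should not affect the current state.

The signal picked up by microphone 4_m reaches subtractor 62 through delay unit 65, which delays it by the number of taps $L_p$ of non-causal component section 61_nb. This embodiment also differs from the conventional configuration in two further respects. Echo path estimation unit 63 includes an estimation error inference unit 66, and step size control unit 64 performs different processing.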
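[0016]
The adaptive filter coefficients $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ in predicted echo paths 61_1 to 61_N have a filter length of $L$ taps. Of these, $L_p$ taps are allocated to the non-causal component. $L_p$ ranges from 1 to $L/2$, preferably from $L/8$ to $L/4$. A larger $L_p$ lets the estimation error of the predicted echo path be inferred more accurately but increases the computation. $L_p$ is therefore chosen to balance inference accuracy against computational cost. At time $k$, the predicted echo $\hat{y}(k-L_p)$ is generated by equation (8).
(Equation 2)
Subtractor 62 computes the error signal $e(k-L_p)$ between this predicted echo and the picked-up signal delayed by delay unit 65:
$$e(k-L_p)=y(k-L_p)-\hat{y}(k-L_p)$$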
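[0017]
At fixed intervals, for example every 0.8 to 6 seconds, estimation error inference unit 66 in echo path estimation unit 63 computes the sum of the absolute values of the non-causal components of the filter coefficients $\mathbf{w}_n(k)$ of predicted echo path 61_n:
$$\sigma[\mathbf{w}_n]=|w_n(k,-L_p)|+\dots+|w_n(k,-1)|$$
This sum is computed at those fixed intervals rather than at every time $k$, so a comma is written after $k$ to distinguish it. The sum $\sigma[\mathbf{w}_n]$ serves as the inferred estimation error of the filter coefficients $\mathbf{w}_n(k)$ of predicted echo path 61_n. Step size control unit 64 takes these inferred values $\sigma[\mathbf{w}_n]$ and the received signals $x_n(k)$ ($n = 1, \dots, N$) as inputs and determines the step sizes as
$$\mu_n=\frac{\mu_0\,\sigma^q[\mathbf{w}_n]}{\sum_{j=1}^{N}\sigma^q[\mathbf{w}_j]\,\mathbf{x}_j^T(k)\mathbf{x}_j(k)} \qquad (9)$$
where $\mu_0$ is a parameter set to a value between 0 and 1. $\sigma^q[\mathbf{w}_j]$ denotes $\sigma[\mathbf{w}_j]$ raised to the power $q$, where $q$ is a natural number. In terms of computation, $q = 1$ is most preferable, followed by $q = 2$; $q = 3$ or 4 may also be used.
In other words, step size control unit 64 proceeds as follows:
- It weights the power of each channel's received signal $x_n(k)$ by the inferred value $\sigma^q[\mathbf{w}_n]$ and sums the results.
- For each channel $n$, it divides $\mu_0\sigma^q[\mathbf{w}_n]$ by this sum to obtain the step size $\mu_n$.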
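Equation (9) can be sketched as a small function. This is an illustrative reading of the formula, not the patent's code. The name `step_sizes_eq9` is hypothetical, and `eps` is an added guard against a zero denominator that does not appear in equation (9).

```python
def step_sizes_eq9(sigmas, x, mu0=0.5, q=1, eps=1e-12):
    """Per-channel step sizes mu_n of Eq. (9).

    sigmas[n] -- inferred estimation error sigma[w_n] (sum of |non-causal taps|)
    x[n]      -- regressor x_n(k); its energy x_n^T x_n is the channel power term
    eps       -- hypothetical regularizer, not part of Eq. (9)
    """
    weights = [s ** q for s in sigmas]                       # sigma^q[w_n]
    denom = sum(wt * sum(v * v for v in xn) for wt, xn in zip(weights, x))
    return [mu0 * wt / (denom + eps) for wt in weights]
```

The numerator and denominator scale together, so multiplying every $\sigma[\mathbf{w}_n]$ by the same constant leaves every $\mu_n$ unchanged. With equal $\sigma$ values the rule reduces to the shared step size of equation (6). A channel whose non-causal taps are larger receives a proportionally larger share of the update.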
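[0018]
Using the step sizes $\mu_n$ determined by step size control unit 64, echo path estimation unit 63 updates each adaptive filter coefficient vector to reduce the error signal $e(k-L_p)$ between the picked-up signal and the predicted echo signal:
$$\mathbf{w}_n(k+1)=\mathbf{w}_n(k)+\mu_n\,e(k-L_p)\,\mathbf{x}_n(k) \qquad (n=1,\dots,N)$$
It then transfers the result to predicted echo paths 61_1 to 61_N.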
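The pieces of Embodiment 1 can be combined into a toy sketch: the delayed error, the non-causal sum $\sigma[\mathbf{w}_n]$, equation (9), and the update above. The following are all assumptions, not taken from the patent:
- The simulation is noise-free, with white-noise inputs and randomly generated echo paths.
- $\sigma$ is refreshed every 400 samples instead of every 0.8 to 6 s.
- Equal weights are used until the first refresh.
- $\sigma^q$ is normalized to unit sum before use. Equation (9) is unchanged by this scaling; the normalization only avoids underflow as $\sigma$ shrinks.

The function names are hypothetical. The sketch illustrates the data flow only. It does not reproduce the patent's speech-based results or demonstrate a speed-up.

```python
import math
import random


def noncausal_sum(w_n, Lp):
    """sigma[w_n]: sum of |w_n(-Lp)|, ..., |w_n(-1)|, stored as the first Lp taps."""
    return sum(abs(c) for c in w_n[:Lp])


def run_embodiment1(gains=(1.0, 3.0), Lh=16, Lp=8, steps=8000,
                    period=400, mu0=0.5, q=1, seed=0):
    """Delayed-error identification with sigma-controlled per-channel steps.

    Returns the final misalignment (Eq. (7)) in dB against the ideal
    coefficients [0]*Lp + h_n (non-causal part zero, causal part = h_n).
    """
    rng = random.Random(seed)
    N, L = len(gains), Lh + Lp
    h = [[g * rng.gauss(0.0, 1.0) for _ in range(Lh)] for g in gains]
    w = [[0.0] * L for _ in range(N)]
    x = [[0.0] * L for _ in range(N)]        # x_n(k) = [x_n(k), ..., x_n(k-L+1)]
    weights = [1.0 / N] * N                  # equal weights before the first sigma update
    for k in range(1, steps + 1):
        x = [[rng.gauss(0.0, 1.0)] + xn[:-1] for xn in x]
        # microphone sample delayed by Lp: y(k-Lp) = sum_i h_n(i) x_n(k-Lp-i)
        y_del = sum(sum(hn[i] * xn[Lp + i] for i in range(Lh)) for hn, xn in zip(h, x))
        # predicted echo y^(k-Lp) = sum_n w_n^T x_n(k), giving e(k-Lp)
        e = y_del - sum(sum(a * b for a, b in zip(wn, xn)) for wn, xn in zip(w, x))
        if k % period == 0:
            s = [noncausal_sum(wn, Lp) ** q for wn in w]
            total = sum(s)
            if total > 0.0:
                weights = [v / total for v in s]   # scale does not change Eq. (9)
        denom = sum(wt * sum(v * v for v in xn) for wt, xn in zip(weights, x))
        if denom == 0.0:
            continue
        for wn, xn, wt in zip(w, x, weights):
            mu_n = mu0 * wt / denom              # Eq. (9)
            for i in range(L):
                wn[i] += mu_n * e * xn[i]        # update with e(k-Lp) and x_n(k)
    target = [[0.0] * Lp + hn for hn in h]
    num = sum((a - b) ** 2 for tn, wn in zip(target, w) for a, b in zip(tn, wn))
    den = sum(a * a for tn in target for a in tn)
    return 10.0 * math.log10(max(num, 1e-300) / den)
```

Because the per-channel weights sum to the denominator in the normalized form, this update behaves like an NLMS with a diagonal, channel-wise preconditioner. The a posteriori error is $(1-\mu_0)e$, so it stays stable for $0<\mu_0<2$.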
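The step size $\mu_n$ may also be determined by the following equation, which additionally takes the power of each channel's received signal into account.
(Equation 3)
$\|\mathbf{x}_n(k)\|$ is the norm of the vector $\mathbf{x}_n(k)$. For $\mathbf{x}_n(k)=[x_n(k)\ \dots\ x_n(k-L+1)]$, the norm is generally defined by
(Equation 4)
where $p$ is a natural number. $p = 1$ is best in terms of computation, but larger values may be used.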
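Equation (10) works as follows. When summing the received signal powers weighted by the inferred estimation errors of equation (9), each channel is normalized by its norm. This sum is multiplied by the norm of channel $n$'s received signal, and $\mu_0\sigma^q[\mathbf{w}_n]$ is divided by the product to obtain the step size for channel $n$.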
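The images for (Equation 3) and (Equation 4) are not reproduced in the source. The sketch below reads equation (10) from the verbal description in the preceding paragraph:
$$\mu_n=\frac{\mu_0\sigma^q[\mathbf{w}_n]}{\|\mathbf{x}_n\|\sum_j \sigma^q[\mathbf{w}_j]\,\mathbf{x}_j^T\mathbf{x}_j/\|\mathbf{x}_j\|}$$
It also assumes that (Equation 4) is the ordinary $p$-norm $(\sum_i |x_n(k-i)|^p)^{1/p}$. Both forms, and the hypothetical names `pnorm` and `step_sizes_eq10`, are assumptions.

```python
def pnorm(v, p=1):
    """Assumed form of (Equation 4): (sum_i |v_i|^p)^(1/p)."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def step_sizes_eq10(sigmas, x, mu0=0.5, q=1, p=1):
    """Reading of Eq. (10) from the prose; channels with zero norm get a zero step.

    mu_n = mu0 sigma^q[w_n] / (||x_n|| * sum_j sigma^q[w_j] x_j^T x_j / ||x_j||)
    """
    norms = [pnorm(xn, p) for xn in x]
    # weighted channel powers, each normalized by that channel's own norm
    total = sum(s ** q * sum(c * c for c in xn) / nv
                for s, xn, nv in zip(sigmas, x, norms) if nv > 0.0)
    return [mu0 * s ** q / (nv * total) if nv > 0.0 and total > 0.0 else 0.0
            for s, nv in zip(sigmas, norms)]
```

Compared with equation (9), the extra factor $\|\mathbf{x}_n\|$ in the denominator makes a channel with a louder received signal take a relatively smaller step.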
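[0019]
Embodiment 2
Next, the invention is applied to the case where the received signals $x_n(k)$ undergo correlation variation processing, as in FIG. 5. The N-channel echo canceller is assumed to receive the received signals and the nonlinearly processed received signals as separate inputs.
The adaptive filter coefficients $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ in predicted echo paths 61_1 to 61_N have a filter length of $L$ taps, including $L_p$ ($>1$) taps for the non-causal component. The following describes, as an example, the case where the predicted echo is generated in the frequency domain. FIG. 3 shows an example functional configuration for channel $n$ ($n = 1, \dots, N$). It covers generation of the predicted echo and part of the filter coefficient update.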
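[0020]
Step 1
Block forming units 71_n and 72_n take the received signal $u_n(k)$ of each channel and the additional signal $g_n(u_n(k))$ ($n = 1, \dots, N$) used for correlation variation processing. Every $L$ samples, they form each into a signal vector of length $2L$. FFT units 73_n and 74_n apply a fast discrete Fourier transform (FFT) to each block to obtain the frequency-domain signals $\mathbf{U}_n(k)$ and $\mathbf{G}_n(k)$:
$$\mathbf{U}_n(k)=\mathrm{diag}\big(\mathrm{FFT}([u_n(k-2L+1),\dots,u_n(k)]^T)\big)$$
$$\mathbf{G}_n(k)=\mathrm{diag}\big(\mathrm{FFT}([g_n(u_n(k-2L+1)),\dots,g_n(u_n(k))]^T)\big) \qquad (n=1,\dots,N)$$
Here $\mathrm{diag}(\cdot)$ takes a vector and returns the diagonal matrix whose diagonal entries are the elements of that vector. It is used for convenience of description.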
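[0021]
Adder 75_n adds the two matrices $\mathbf{U}_n(k)$ and $\mathbf{G}_n(k)$. The result is the matrix $\mathbf{X}_n(k)$, whose diagonal holds the frequency-domain representation of the input signal vector $\mathbf{x}_n(k)$:
$$\mathbf{X}_n(k)=\mathbf{U}_n(k)+\mathbf{G}_n(k)$$
Correction vector generation unit 76_n generates the matrix $\mathbf{Z}_n(k)$, whose diagonal holds the frequency-domain correction signal vector:
$$\mathbf{Z}_n(k)=\beta\,\mathbf{U}_n(k)+\mathbf{G}_n(k)$$
Here $\beta$ is set to a value between 0 and 1 and is used to accelerate echo path estimation by the adaptive filter.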
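[0022]
Step 2
For each channel, multiplier 77_n multiplies the frequency-domain input signal $\mathbf{X}_n(k)$ by the filter coefficients $\mathbf{W}_n(k)$, thereby filtering the input signal.
Inverse FFT unit 78_n converts the result back into a time-domain signal vector by inverse fast Fourier transform. Block shaping unit 79_n then shapes it into the $L$-element signal vector $\hat{\mathbf{y}}_n(k-L_p)$. That is,
$$\hat{\mathbf{y}}_n(k-L_p)=[\mathbf{0}_L\ \mathbf{I}_L]\,\mathrm{IFFT}\big(\mathbf{X}_n(k)\mathbf{W}_n(k)\big)$$
where $\mathbf{0}_L$ is the $L\times L$ zero matrix and $\mathbf{I}_L$ is the $L\times L$ identity matrix.
Step 3
The time-domain signal vectors $\hat{\mathbf{y}}_n(k-L_p)$ computed for each channel are summed to obtain the predicted echo signal vector $\hat{\mathbf{y}}(k-L_p)$:
$$\hat{\mathbf{y}}(k-L_p)=\sum_{n=1}^{N}\hat{\mathbf{y}}_n(k-L_p)$$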
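Steps 1 to 3 amount to overlap-save block convolution. The sketch below assumes that $\mathbf{W}_n(k)$ is the FFT of the $L$ time-domain taps padded with $L$ zeros, as implied by the $[\mathbf{I}_L\ \mathbf{0}_L]$ constraints in Step 5 and [0027]. With the taps ordered $w_n(-L_p), \dots$, the last $L$ outputs are then the delayed predictions $\hat{y}_n(k-L_p)$. Further assumptions:
- A naive DFT stands in for the FFT to keep the example dependency-free.
- `x_block` is simply the $2L$-sample block of $x_n = u_n + g_n(u_n)$.
- All function names are hypothetical.

```python
import cmath
import random


def dft(v, inverse=False):
    """Naive O(n^2) DFT standing in for FFT/IFFT (the inverse includes 1/n)."""
    n = len(v)
    sign = 1.0 if inverse else -1.0
    out = [sum(v[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [c / n for c in out] if inverse else out


def predict_block(x_block, W):
    """Step 2 for one channel: [0_L I_L] IFFT(X_n(k) W_n(k)).

    x_block -- 2L input samples, oldest first (the FFT input of Step 1)
    W       -- 2L frequency-domain coefficients, FFT of [w_n; 0_L]
    """
    L = len(x_block) // 2
    X = dft(x_block)                                   # diagonal of X_n(k)
    y = dft([a * b for a, b in zip(X, W)], inverse=True)
    return [c.real for c in y[L:]]                     # keep the last L samples


def check_overlap_save(L=4, N=2, seed=1):
    """Max deviation of the summed frequency-domain prediction from direct convolution."""
    rng = random.Random(seed)
    fd_sum = [0.0] * L
    td_sum = [0.0] * L
    for _ in range(N):
        w = [rng.uniform(-1.0, 1.0) for _ in range(L)]          # time-domain taps
        xb = [rng.uniform(-1.0, 1.0) for _ in range(2 * L)]     # 2L-sample input block
        fd = predict_block(xb, dft(w + [0.0] * L))              # Steps 1-2
        td = [sum(w[t] * xb[m - t] for t in range(L)) for m in range(L, 2 * L)]
        fd_sum = [a + b for a, b in zip(fd_sum, fd)]            # Step 3: sum channels
        td_sum = [a + b for a, b in zip(td_sum, td)]
    return max(abs(a - b) for a, b in zip(fd_sum, td_sum))
```

The first $L$ outputs of the circular convolution are corrupted by wrap-around, which is why $[\mathbf{0}_L\ \mathbf{I}_L]$ discards them.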
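[0023]
Step 4
Delay unit 65_n delays the signal $y(k)$ picked up by microphone 4 by $L_p$ samples. Block forming unit 81_n forms the delayed signal into blocks of $L$ samples, giving the picked-up signal vector $\mathbf{y}(k-L_p)$. Subtractor 62 generates the time-domain error signal vector $\mathbf{e}(k)$ between this vector and the predicted echo vector $\hat{\mathbf{y}}(k-L_p)$. FFT unit 81 then converts $\mathbf{e}(k)$ into the frequency-domain signal $\mathbf{E}(k)$:
$$\mathbf{E}(k)=\mathrm{FFT}\big([0,\dots,0,\ \mathbf{y}^T(k-L_p)-\hat{\mathbf{y}}^T(k-L_p)]^T\big)$$
This transform is applied after prepending $L$ zeros to the error vector $\mathbf{e}(k)$. The picked-up signal vector is
$$\mathbf{y}(k-L_p)=[y(k-L_p-L+1)\ \dots\ y(k-L_p)]^T$$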
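[0024]
Step 5
Conjugate generation unit 83_n generates the complex conjugate $\mathbf{Z}^*_n(k)$ of the correction signal matrix $\mathbf{Z}_n(k)$. For each channel, multiplier (correction vector generation unit) 84_n multiplies it by the error signal vector $\mathbf{E}(k)$, which carries out the convolution via frequency-domain processing. The result is the correction vector $d\mathbf{W}_n(k)$:
$$\mathbf{v}_n(k)=[\mathbf{I}_L\ \mathbf{0}_L]\,\mathrm{IFFT}\big(\mathbf{Z}^*_n(k)\mathbf{E}(k)\big)$$
$$d\mathbf{W}_n(k)=\mathrm{FFT}\big([\mathbf{v}_n^T(k),0,\dots,0]^T\big)$$
Each element of $\mathbf{Z}^*_n(k)$ is the complex conjugate of the corresponding element of $\mathbf{Z}_n(k)$.
The FFT of $\mathbf{v}_n(k)$ is taken after appending $L$ zeros.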
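Steps 4 and 5 can be checked in the time domain. Taking the first $L$ samples of $\mathrm{IFFT}(\mathbf{Z}^*_n\mathbf{E})$ with the zero-prepended error yields a block cross-correlation: $v_n[t]=\sum_s e(s)\,z_n(s-t)$ over the $L$ error samples of the block. This is the block-LMS gradient for tap $t$. Zero-padding before the final FFT keeps the filter at $L$ taps. The sketch below repeats the naive DFT so that it is self-contained. It assumes the correction signal is real, so $\mathbf{Z}_n$ is simply the DFT of its $2L$-sample block. Function names are hypothetical.

```python
import cmath
import random


def dft(v, inverse=False):
    """Naive O(n^2) DFT standing in for FFT/IFFT (the inverse includes 1/n)."""
    n = len(v)
    sign = 1.0 if inverse else -1.0
    out = [sum(v[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [c / n for c in out] if inverse else out


def correction_vector(Z, e_block):
    """Steps 4-5: E(k) from L zeros + e(k), then v_n(k) and dW_n(k).

    Z       -- 2L frequency-domain correction signal (diagonal of Z_n(k))
    e_block -- L time-domain error samples, oldest first
    """
    L = len(e_block)
    E = dft([0.0] * L + list(e_block))                       # prepend L zeros
    v = dft([z.conjugate() * ev for z, ev in zip(Z, E)], inverse=True)[:L]  # [I_L 0_L]
    v = [c.real for c in v]
    dW = dft(v + [0.0] * L)                                  # append L zeros
    return v, dW


def check_gradient(L=4, seed=2):
    """Max deviation of v_n from the direct block cross-correlation."""
    rng = random.Random(seed)
    z_block = [rng.uniform(-1.0, 1.0) for _ in range(2 * L)]   # 2L correction samples
    e_block = [rng.uniform(-1.0, 1.0) for _ in range(L)]       # L error samples
    v, _ = correction_vector(dft(z_block), e_block)
    direct = [sum(e_block[s] * z_block[L + s - t] for s in range(L)) for t in range(L)]
    return max(abs(a - b) for a, b in zip(v, direct))
```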
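[0025]
Step 6
Multiplier (correction vector adjustment unit) 86_n multiplies the correction vector $d\mathbf{W}_n(k)$ by the matrix-form step size $\mathbf{S}_n(k)$ generated by step size matrix generation unit 85_n. Filter update unit 87_n uses the product $\mathbf{S}_n(k)\,d\mathbf{W}_n(k)$ to update its filter coefficients $\mathbf{W}_n(k)$:
$$\mathbf{W}_n(k+L)=\mathbf{W}_n(k)+\mathbf{S}_n(k)\,d\mathbf{W}_n(k)$$
The matrix-form step size $\mathbf{S}_n(k)$ is
$$\mathbf{S}_n(k)=\mu_0\,\sigma^q[\mathbf{w}_n]\,\mathrm{diag}\big([1/d(k,1)\ \dots\ 1/d(k,2L)]\big) \qquad (11)$$
where $\mu_0$ is a parameter set to a value between 0 and 1 to stabilize the estimation.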
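[0026]
$$d(k,i)=t_c\,d(k-L,i)+(1-t_c)\sum_{n=1}^{N}\sigma^q[\mathbf{w}_n]\,\big|T(\mathbf{X}_n(k),i)\,T(\mathbf{Z}_n(k),i)\big| \qquad (i=1,\dots,2L) \qquad (12)$$
$t_c$ is a time constant set to a value between 0 and 1. The function $T(\mathbf{X}_n(k),i)$ extracts the $(i,i)$-th element of the matrix $\mathbf{X}_n(k)$. In other words, $d(k,i)$ in $\mathbf{S}_n(k)$ is a short-time average, formed in two stages:
- The per-frequency products of the correction signal $\mathbf{Z}_n(k)$ and the input signal $\mathbf{X}_n(k)$ are weighted by the inferred estimation errors $\sigma^q[\mathbf{w}_n]$ and summed over channels.
- This current sum is combined with the previous average using the weight $t_c$.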
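Equations (11) and (12) and the Step 6 update act bin by bin, so a sketch needs only lists of complex diagonals. The names are hypothetical. `eps` is an added guard against $d(k,i)=0$ and is not part of equation (11).

```python
def update_d(d_prev, X, Z, sig_q, tc=0.9):
    """Eq. (12): d(k,i) = tc d(k-L,i) + (1-tc) sum_n sig_q[n] |X_n[i] Z_n[i]|.

    X[n], Z[n] -- diagonals of X_n(k) and Z_n(k) (length 2L, complex)
    sig_q[n]   -- sigma^q[w_n]
    """
    return [tc * d_prev[i] + (1.0 - tc) *
            sum(s * abs(Xn[i] * Zn[i]) for s, Xn, Zn in zip(sig_q, X, Z))
            for i in range(len(d_prev))]


def step_matrix(d, sig_q_n, mu0=0.5, eps=1e-12):
    """Eq. (11): diagonal of S_n(k) = mu0 sigma^q[w_n] diag(1/d(k,i))."""
    return [mu0 * sig_q_n / (di + eps) for di in d]


def apply_update(W, S, dW):
    """Step 6: W_n(k+L) = W_n(k) + S_n(k) dW_n(k), with S_n(k) diagonal."""
    return [w + s * dw for w, s, dw in zip(W, S, dW)]
```

The same $d(k,i)$ is shared by all channels, while the numerator $\sigma^q[\mathbf{w}_n]$ differs per channel. The relative step of channel $n$ therefore grows with its inferred estimation error, as in equation (9).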
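[0027]
The impulse response (filter coefficients) $\mathbf{w}_n(k)$ of each predicted echo path is obtained from the frequency-domain filter coefficients $\mathbf{W}_n(k)$ of filter update unit 87_n. Inverse FFT unit 88_n applies the inverse FFT and block shaping:
$$\mathbf{w}_n(k)=[\mathbf{I}_L\ \mathbf{0}_L]\,\mathrm{FFT}^{-1}\big(\mathbf{W}_n(k)\big)$$
Estimation error inference unit 89_n computes the inferred estimation error $\sigma[\mathbf{w}_n]$. This is the sum of the absolute values of the non-causal components $w_n(k,-L_p), \dots, w_n(k,-1)$ of this impulse response.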
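A sketch of [0027] follows. It recovers the $L$ time-domain taps from the $2L$ frequency-domain coefficients and sums the magnitudes of the first $L_p$. That assumes the non-causal taps come first, matching the ordering of $\mathbf{w}_n$ in [0012]. The naive DFT and the name `noncausal_sigma_from_W` are illustrative assumptions.

```python
import cmath


def dft(v, inverse=False):
    """Naive O(n^2) DFT standing in for FFT/IFFT (the inverse includes 1/n)."""
    n = len(v)
    sign = 1.0 if inverse else -1.0
    out = [sum(v[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [c / n for c in out] if inverse else out


def noncausal_sigma_from_W(W, Lp):
    """w_n = [I_L 0_L] IFFT(W_n); sigma = |w_n(k,-Lp)| + ... + |w_n(k,-1)|."""
    L = len(W) // 2
    w = [c.real for c in dft(W, inverse=True)[:L]]   # first L samples only
    return w, sum(abs(c) for c in w[:Lp])
```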
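[0028]
The following matrix $\mathbf{S}'_n(k)$ may be used instead as the matrix-form step size. It also takes the input signal power of each channel into account.
(Equation 5)
(Equation 6)
(Equation 7)
Through $r_n(k)$, the input signal power of each channel is reflected in the step size control. Here $b$ is a natural number; in terms of computation, $b = 1$ is most preferable. $|\mathrm{Re}[T(\mathbf{Z}_n(k),i)]|$ and $|\mathrm{Im}[T(\mathbf{Z}_n(k),i)]|$ are the magnitudes of the real and imaginary parts of the complex number $T(\mathbf{Z}_n(k),i)$.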
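[0029]
Experimental Example
A numerical simulation was performed for an acoustic echo canceller connected to a 2-channel reproduction system and a 1-channel pickup system. The sampling frequency was 8 kHz. The acoustic echo paths were room transfer functions measured in a room with a reverberation time of 200 ms, truncated to 700 taps, and these generated the acoustic echo. The highly cross-correlated 2-channel received signals $u_1(k)$ and $u_2(k)$ were generated by simulating a single talker's voice picked up by two microphones. They were adjusted to equal signal power.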
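[0030]
The conventional method applied the frequency-domain adaptive algorithm proposed in Emura and Haneda, "付加信号強調型の周波数領域ステレオ適応アルゴリズム" (A frequency-domain stereo adaptive algorithm with additional-signal emphasis), Proceedings of the Acoustical Society of Japan Autumn Meeting, 2001, pp. 537-538. It used adaptive filters with only 640 causal taps per channel. The method of the present invention applied the adaptive algorithm of Embodiment 2 to adaptive filters with 640 taps per channel, including 128 non-causal taps. In both methods the adaptive filters were updated every 80 ms. In the method of the present invention, the step sizes were controlled by equations (11) and (12) at 2.4 s intervals, with $\mu_0 = 0.5$. The parameter that emphasizes the correlation variation component in the adaptive filters' correction signal vector $\mathbf{Z}_n(k)$ was set to $\beta = 0.2$.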
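[0031]
Correlation variation processing used the half-wave rectification method of P. Eneroth, T. Gaensler, S. Gay and J. Benesty, "Studies of a wideband stereophonic acoustic echo canceler," Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 207-210 (1999):
$$g_1(u(k))=\alpha\,\frac{u(k)+|u(k)|}{2},\qquad g_2(u(k))=\alpha\,\frac{u(k)-|u(k)|}{2}$$
It was applied with $\alpha = 0.3$, a value that causes almost no audible distortion. The loudspeaker signals were
$$x_1(k)=u_1(k)+g_1(u_1(k)),\qquad x_2(k)=u_2(k)+g_2(u_2(k))$$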
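The half-wave rectification above is simple to sketch. Channel 1 adds a scaled copy of the positive half-wave and channel 2 a scaled copy of the negative half-wave, so identical inputs become non-identical loudspeaker signals. The function names are hypothetical, and $\alpha = 0.3$ is taken from the experiment.

```python
def g1(u, alpha=0.3):
    """Positive half-wave component: alpha (u + |u|) / 2."""
    return alpha * (u + abs(u)) / 2.0


def g2(u, alpha=0.3):
    """Negative half-wave component: alpha (u - |u|) / 2."""
    return alpha * (u - abs(u)) / 2.0


def loudspeaker_signals(u1, u2, alpha=0.3):
    """x_1 = u_1 + g_1(u_1) and x_2 = u_2 + g_2(u_2), sample by sample."""
    x1 = [u + g1(u, alpha) for u in u1]
    x2 = [u + g2(u, alpha) for u in u2]
    return x1, x2
```

For the same input `[1.0, -1.0]` on both channels, channel 1 becomes about `[1.3, -1.0]` and channel 2 about `[1.0, -1.3]`. The nonlinearity breaks the exact linear relation between the channels, which is what lets the estimated echo paths converge to the true ones ([0003]).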
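[0032]
The estimation performance of the present method and the conventional method was evaluated with equation (7). This is the relative estimation error (misalignment) between the impulse responses of the acoustic echo paths and those of the adaptive filters.
FIGS. 4A and 4B show the relative estimation error from the initial state for the present method and the conventional method, respectively. The figures of interest are the times needed for the relative error to reach -15 dB at echo path gain ratios of 1 and 9: $T'_1$ and $T'_9$ for the present method, and $T_1$ and $T_9$ for the conventional method. At a gain ratio of 9, the present method improves the time from $T_9$ to $T'_9$ by 30%. At a gain ratio of 1, the improvement from $T_1$ to $T'_1$ is 21%. The present method thus speeds up echo path estimation and also reduces the effect of differences in echo path gain ratio on the estimation.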
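[0033]
Embodiment 1 may also be applied when the correlation variation processing shown in FIG. 5 is performed. In that case, the signals are defined as follows, with $0<\beta<1$:
- The reproduction signal supplied to the loudspeaker is the sum of the received signal $u_n(k)$ and its correlation-variation signal: $x_n(k)=u_n(k)+g_n(u_n(k))$.
- The correction signal is $z_n(k)=\beta u_n(k)+g_n(u_n(k))$.

The update then proceeds in three steps:
1. The correction vector $d\mathbf{w}_n$ is generated from the vector $\mathbf{z}_n(k)$ of this correction signal and the error signal vector $\mathbf{e}(k)$.
2. The step size $\mu_n$ of the n-th channel is obtained by substituting $\mathbf{z}_j(k)$ for $\mathbf{x}_j(k)$ in equation (9) or (10), and $d\mathbf{w}_n$ is multiplied by $\mu_n$.
3. The result is added to the filter coefficients $\mathbf{w}_n(k)$ to update them.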
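A per-sample sketch of this variant follows. It assumes the half-wave $g_n$ and the values $\alpha = 0.3$, $\beta = 0.2$ from the experiment, and uses equation (9) with $\mathbf{z}_j$ in place of $\mathbf{x}_j$. The predicted echo is still computed from $\mathbf{x}_n$; only the correction and the step-size normalization use $\mathbf{z}_n$. The function names are hypothetical.

```python
def correction_signal(u, beta=0.2, alpha=0.3, positive=True):
    """z_n(k) = beta u_n(k) + g_n(u_n(k)) for one sample (half-wave g_n assumed)."""
    g = alpha * (u + abs(u)) / 2.0 if positive else alpha * (u - abs(u)) / 2.0
    return beta * u + g


def update_with_z(w, z, e, sigmas, mu0=0.5, q=1):
    """w_n <- w_n + mu_n e z_n, with mu_n from Eq. (9) using z_j in place of x_j.

    w[n] -- coefficients w_n(k); z[n] -- correction-signal regressor z_n(k)
    e    -- error sample; sigmas[n] -- inferred estimation error sigma[w_n]
    """
    weights = [s ** q for s in sigmas]
    denom = sum(wt * sum(c * c for c in zn) for wt, zn in zip(weights, z))
    if denom == 0.0:
        return w
    return [[a + mu0 * wt / denom * e * c for a, c in zip(wn, zn)]
            for wn, zn, wt in zip(w, z, weights)]
```

With $\beta<1$ the correction signal emphasizes the nonlinear, decorrelating component $g_n$ relative to the original received signal. [0021] describes $\beta$ as accelerating echo path estimation.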
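[0034]
The frequency-domain echo prediction and filter update of Embodiment 2 can also be applied when no correlation variation processing is performed on the received signals. In that case, two changes are made:
- The frequency-domain correction vector $d\mathbf{W}_n(k)$ is obtained from the frequency-domain received signal $\mathbf{X}_n(k)$ and the error signal $\mathbf{E}(k)$, by the same processing as Step 5 of Embodiment 2.
- The matrix-form step size $\mathbf{S}_n(k)$ is obtained by using $\mathbf{X}_n(k)$ instead of $\mathbf{Z}_n(k)$ in equations (11) and (12), or in equations (13), (14), and (15).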
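[0035]
Furthermore, by using M of the acoustic echo cancellers described above, the invention can also be applied to the system of FIG. 5, with an N-channel reproduction system and an M-channel pickup system.
The multichannel acoustic echo cancellers shown in FIGS. 2 and 3 may also be implemented on a computer. In that case, a program is downloaded to the computer from a recording medium such as a CD-ROM or magnetic disk, or over a communication line, and executed. The program causes the computer to execute each step of the multichannel acoustic echo canceling method of the invention described above.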
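[0036]
[Effects of the Invention]
According to the present invention, echo path estimation becomes faster even when the echo path gains are unequal. This is achieved in three steps:
1. Non-causal components are included in the adaptive filters.
2. The estimation error of each adaptive filter of the multichannel predicted echo paths is inferred from the size of its non-causal component.
3. The step size of each channel is controlled accordingly.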
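[Brief Description of the Drawings]
FIG. 1: A diagram showing an example of the relationship between the impulse response of an acoustic echo path and the filter taps: the causal component taps of the conventional predicted echo path, and the non-causal and causal component taps of the present method.
FIG. 2: A diagram showing an example functional configuration of Embodiment 1 of the invention.
FIG. 3: A diagram showing an example functional configuration of the n-th channel portion in Embodiment 2 of the invention.
FIG. 4: A diagram showing simulation results for the convergence of the relative estimation error with the present method and the conventional method.
FIG. 5: A diagram showing an example configuration of a conventional teleconferencing system with an N-channel reproduction system and an M-channel pickup system.
FIG. 6: A diagram showing an example functional configuration of a conventional N-channel acoustic echo canceller.
FIG. 7: A diagram showing the convergence of the relative prediction error of the adaptive prediction filter coefficients, with the echo path gain ratio as a parameter.
[0001]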
TECHNICAL FIELD OF THE INVENTION
The present invention applies to a teleconferencing system having a multichannel sound reproduction system. It relates to a multichannel acoustic echo canceling method for canceling acoustic echo, which disturbs conversation and sometimes causes howling, and to an apparatus, a program, and a recording medium therefor.
[0002]
[Prior art]
In recent years, more efficient audio and video coding and the development of digital network technology have led to research on multichannel hands-free communication systems. These let several people participate easily and provide a more natural communication environment. Realizing them requires a multichannel acoustic echo canceller that cancels the acoustic leakage from multiple loudspeakers into the microphones.
A teleconferencing system with an N-channel ($N \ge 2$) reproduction system and an M-channel ($M \ge 1$) sound pickup system cancels acoustic echo with the configuration shown in FIG. 5 (see, for example, Patent Document 1, FIG. 9). The received signal from each receiving terminal 1_1 to 1_N passes through correlation variation processing units 2_1 to 2_N and is reproduced as sound by loudspeakers 3_1 to 3_N. It then reaches each microphone 4_m ($m = 1, \dots, M$) through acoustic echo paths 34_{1,1} to 34_{N,M}. N-channel echo cancellers 6_1 to 6_M cancel the acoustic echo. They are connected between the outputs of correlation variation processing units 2_1 to 2_N for all N receiving channels and the M transmitting terminals 5_1 to 5_M.
[0003]
In an actual teleconference, most of the time is spent with one talker's voice being picked up by several microphones and transmitted over multiple channels. The inter-channel cross-correlation of the received signals is very high. Without correlation variation processing, the estimated echo transfer characteristics therefore need not match the true ones, even while the echo is being canceled. As a result, the acoustic echo is no longer canceled the moment the far-end talker changes and the inter-channel cross-correlation of the received signals shifts. Applying correlation variation processing to the received signals guarantees that the estimated transfer characteristics converge to their true values.
For each pickup channel, the N-channel echo canceller 6_m processes an N-input, 1-output time series between all N reproduction channels and one pickup channel. FIG. 6 shows this N-channel echo canceller 6_m with correlation variation processing units 2_1 to 2_N omitted. The received signals $x_1(k), \dots, x_N(k)$ are fed to predicted echo paths 61_1 to 61_N, which generate a predicted echo signal for each channel. Subtractor 62 then generates the error signal $e$ between the predicted echo signals and the signal picked up by microphone 4_m.
[0004]
The processing inside echo path estimation unit 63 is described here. Let $y(k)$ be the signal picked up by microphone 4_m. Let $h_n(k)$ be the impulse response of acoustic echo path 34_n from loudspeaker 3_n of the n-th channel ($n = 1, \dots, N$) to microphone 4_m, and let $L$ be its length. The N-channel received signals $x_1(k), \dots, x_N(k)$ and the picked-up signal $y(k)$ are related by
$$y(k)=\sum_{i=0}^{L-1}h_1(i)\,x_1(k-i)+\dots+\sum_{i=0}^{L-1}h_N(i)\,x_N(k-i) \qquad (1)$$
Vectorize the impulse responses of echo paths 34_1 to 34_N and the received signals $x_n(k)$ as
$$\mathbf{h}_n=[h_n(0)\ \dots\ h_n(L-1)]^T,\qquad \mathbf{x}_n(k)=[x_n(k)\ \dots\ x_n(k-L+1)]^T \qquad (n=1,\dots,N) \qquad (2)$$
where $[\,]^T$ denotes vector transposition. The relation between the N-channel received signals and the picked-up signal is then
$$y(k)=\mathbf{h}_1^T\mathbf{x}_1(k)+\dots+\mathbf{h}_N^T\mathbf{x}_N(k) \qquad (3)$$
[0005]
The following describes the case where the NLMS (Normalized Least Mean Square) algorithm updates the coefficients of the adaptive filters that make up predicted echo paths 61_n (see, for example, Patent Document 1, pages 3-4).
Let $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ be the adaptive filter coefficient vectors in predicted echo paths 61_1 to 61_N. Subtractor 62 computes the error signal $e(k)$ between the actually picked-up signal $y(k)$ and the signal predicted by the adaptive filters:
$$e(k)=y(k)-\big(\mathbf{w}_1^T(k)\mathbf{x}_1(k)+\dots+\mathbf{w}_N^T(k)\mathbf{x}_N(k)\big) \qquad (4)$$
Based on $e(k)$, echo path estimation unit 63 successively updates the filter coefficients to reduce the difference between the picked-up signal and the predicted echo signal. It then forwards them to predicted echo paths 61_1 to 61_N. Each adaptive filter coefficient vector is updated by
$$\mathbf{w}_n(k+1)=\mathbf{w}_n(k)+\mu\,e(k)\,\mathbf{x}_n(k) \qquad (n=1,\dots,N) \qquad (5)$$
Here $\mu$ is the step size. Step size control unit 64 determines it from $\mu_0$ and the power of the input signal:
$$\mu=\frac{\mu_0}{\sum_{n=1}^{N}\mathbf{x}_n^T(k)\mathbf{x}_n(k)} \qquad (6)$$
$\mu_0$ is a parameter set in advance to a value between 0 and 1 to stabilize the estimation.
[0006]
The estimation performance of the adaptive filter can be evaluated by the convergence speed of the filter coefficients. That is, it is judged by the rate at which their relative error (misalignment) decreases:
$$\frac{\|\mathbf{h}_1-\mathbf{w}_1(k)\|^2+\dots+\|\mathbf{h}_N-\mathbf{w}_N(k)\|^2}{\|\mathbf{h}_1\|^2+\dots+\|\mathbf{h}_N\|^2} \qquad (7)$$
[0007]
[Patent Document 1]
JP-A-8-181639
[Patent Document 2]
JP-A-2002-223182, pages 9 to 13
[0008]
[Problems to be solved by the invention]
Ordinary hands-free conferencing uses directional microphones to keep the acoustic coupling between loudspeakers and microphones as small as possible. In a room with several loudspeakers, the loudspeakers may be equally distant from the microphone. Even so, a coupling difference of just under 15 dB easily arises between a loudspeaker in the directional microphone's blind spot and one that is not.
Such an imbalance in the gains of the acoustic echo paths is known to strongly affect the accuracy of echo path estimation by the adaptive filter. When the gains are markedly unequal, the adaptive filter coefficient error converges more slowly than when they are nearly equal.
[0009]
FIG. 7 shows an example for a stereo acoustic echo canceller. It plots how the relative error of the adaptive filter coefficients decreases from the initial state for two-path gain ratios of 1, 3, and 9. The received signals $x_1(k)$ and $x_2(k)$ were set to equal power, and a multichannel frequency-domain adaptive algorithm was applied (see, for example, Patent Document 2). This algorithm converts each signal into the frequency domain with the FFT (fast discrete Fourier transform) and performs NLMS-based processing in each frequency bin. Adaptive algorithms such as the projection algorithm (see, for example, Patent Document 1, pages 4-6) behave like this graph: the more unequal the echo path gains, the slower the convergence.
This degradation can be explained qualitatively by comparing the size of each channel's adaptive filter estimation error with the size of its update vector. As an example, take the stereo case with an echo path gain ratio $\sqrt{\|\mathbf{h}_2\|^2/\|\mathbf{h}_1\|^2}\approx 9$ and nearly equal received signal power in each channel ($\|\mathbf{x}_1(k)\|\approx\|\mathbf{x}_2(k)\|$).
Assume the adaptive filter is updated by the NLMS algorithm. By update equation (5), the first and second channels use the same step size. Their correction vectors are $\mu e(k)\mathbf{x}_1(k)$ and $\mu e(k)\mathbf{x}_2(k)$, respectively. Because $\|\mathbf{x}_1(k)\|\approx\|\mathbf{x}_2(k)\|$, the corrections of the first and second channels are of nearly equal magnitude.
[0010]
Suppose the filter coefficients are initialized to 0. With an echo path gain ratio of about 9, the second channel's estimation error early in the estimation is about nine times the first channel's. Yet the adaptive filter update is about the same size in both channels. With a large step size, the estimate of the first (low-gain) channel is easily disturbed and slow. With a small step size, the estimate of the second (high-gain) channel is slow.
The same applies after the coefficient errors of both channels' predicted echo paths have converged, if the microphone then moves slightly, so little that the echo path gain ratio hardly changes. The change in the second channel's echo path is then about nine times that in the first, matching the gain ratio, but the adaptive filter updates of the two channels remain equivalent. Again, a large step size makes the first (low-gain) channel's estimate easily disturbed and slow, while a small step size makes the second (high-gain) channel's estimate slow.
[0011]
[Means for Solving the Problems]
The present invention infers the echo path estimation error of each adaptive filter and controls the step size of each channel according to that error.
However, the adaptive filter of the predicted echo path exists in the first place to estimate the characteristics of an unknown echo path. Inferring the size of its estimation error is therefore fundamentally difficult. That error is the difference between the predicted echo path partway through estimation and the true echo path.
The current state of the acoustic echo is affected only by the past. Consider an acoustic echo path whose impulse response is as shown in FIG. 1A. Conventionally, L taps were used as causal component taps and made up all the taps of the adaptive filter. These L taps correspond to the time from the onset of the impulse response until its coefficients become sufficiently small. Causality means that a cause always precedes its effect in time: the effect (output signal) does not occur before the cause (input signal) is applied. Every physical system is causal.
[0012]
In contrast, the present invention uses $L_p$ of the adaptive filter's $L$ taps as non-causal component taps, corresponding to the future, which does not affect the current state. The remaining $L-L_p$ taps are used for the causal component. Omitting the time index $k$, the filter coefficient vector $\mathbf{w}_n(k)$ of the predicted echo path is
$$\mathbf{w}_n=[w_n(-L_p)\ \dots\ w_n(-1)\ \ w_n(0)\ \dots\ w_n(L-L_p-1)]^T \qquad (n=1,\dots,N)$$
The first $L_p$ elements $w_n(-L_p), \dots, w_n(-1)$ of this vector are the non-causal component. The remaining $L-L_p$ elements $w_n(0), \dots, w_n(L-L_p-1)$ are the causal component.
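[0013]
Using this filter coefficient vector, the predicted echo signal $\hat{y}(k-L_p)$ is generated as the sum of two predictions:
- one based on the input signals $x_n(k-L+1), \dots, x_n(k-L_p)$ at times $k-L+1$ to $k-L_p$;
- one based on the input signals $x_n(k-L_p+1), \dots, x_n(k)$ at times $k-L_p+1$ to $k$.

The result is:
(Equation 1)
The filter coefficient vectors $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ of the predicted echo paths are estimated so that the error between this predicted echo signal $\hat{y}(k-L_p)$ and the picked-up signal becomes small. The non-causal components $w_n(-L_p), \dots, w_n(-1)$ are echo path transfer coefficients that map input samples later than the output being predicted onto that output. For the output at time $k-L_p$, these are the inputs $x_n(k-L_p+1), \dots, x_n(k)$.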
[0014]
Once estimation by the adaptive filters is complete, each predicted echo path has converged to the true echo path. All its non-causal components have then converged to 0, and its causal components have converged to the true echo path characteristic (impulse response). Partway through estimation, the difference between a predicted echo path's causal component and the true echo path moves together with the magnitude of that path's non-causal component.
The present invention therefore uses the magnitude of the non-causal component in each predicted echo path, for example the sum of its absolute values, as an estimate of that path's estimation error. It controls the step size of each predicted echo path based on this estimate. The adaptive filters of channels with large estimation errors are corrected strongly, and those of channels with small estimation errors are corrected weakly. This control is expected to improve the convergence speed.
[0015]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiment 1
Embodiment 1 of the present invention is described first with reference to FIG. 2. Parts in FIG. 2 that correspond to parts in FIG. 6 carry the same reference numerals.
The first to N-th predicted echo paths 61_1 to 61_N each comprise two sections:
- a causal component section 61_1a to 61_Na, which relates the output to the past that should affect the current state;
- a non-causal component section 61_1b to 61_Nb, which relates it to the future that should not affect the current state.

The signal from microphone 4_m reaches subtractor 62 through delay unit 65, which delays it by the number of taps $L_p$ of non-causal component section 61_nb. This embodiment also differs from the conventional one in two further respects. Echo path estimation unit 63 includes an estimation error inference unit 66, and step size control unit 64 performs different processing.
[0016]
The adaptive filter coefficients $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ in predicted echo paths 61_1 to 61_N have a filter length of $L$ taps. Of these, $L_p$ taps are assigned to the non-causal component. $L_p$ ranges from 1 to $L/2$, preferably from $L/8$ to $L/4$. Increasing $L_p$ lets the estimation error of the predicted echo path be inferred more accurately but increases the computation. $L_p$ is therefore chosen to balance inference accuracy against computational cost. The predicted echo $\hat{y}(k-L_p)$ at time $k$ is generated by equation (8).
(Equation 2)
Subtractor 62 obtains the error signal $e(k-L_p)$ between the predicted echo and the picked-up signal delayed by delay unit 65:
$$e(k-L_p)=y(k-L_p)-\hat{y}(k-L_p)$$
[0017]
At fixed intervals, for example every 0.8 to 6 seconds, estimation error inference unit 66 in echo path estimation unit 63 computes the sum of the absolute values of the non-causal components of the filter coefficients $\mathbf{w}_n(k)$ of predicted echo path 61_n:
$$\sigma[\mathbf{w}_n]=|w_n(k,-L_p)|+\dots+|w_n(k,-1)|$$
This sum is calculated at those fixed intervals rather than at every time $k$, so a comma is written after $k$ to distinguish it. The sum $\sigma[\mathbf{w}_n]$ is used as the inferred estimation error of the filter coefficients $\mathbf{w}_n(k)$ of predicted echo path 61_n. Step size control unit 64 takes these inferred values $\sigma[\mathbf{w}_n]$ and the received signals $x_n(k)$ ($n = 1, \dots, N$) as inputs and determines the step sizes by
$$\mu_n=\frac{\mu_0\,\sigma^q[\mathbf{w}_n]}{\sum_{j=1}^{N}\sigma^q[\mathbf{w}_j]\,\mathbf{x}_j^T(k)\mathbf{x}_j(k)} \qquad (9)$$
where $\mu_0$ is a parameter set to a value between 0 and 1. $\sigma^q[\mathbf{w}_j]$ denotes $\sigma[\mathbf{w}_j]$ raised to the power $q$, where $q$ is a natural number. In terms of computation, $q = 1$ is most preferable, followed by $q = 2$; $q = 3$ or 4 may also be used.
Step size control unit 64 thus proceeds as follows:
- It weights the power of each channel's received signal $x_n(k)$ by the inferred value $\sigma^q[\mathbf{w}_n]$ and sums the results.
- For each channel $n$, it divides $\mu_0\sigma^q[\mathbf{w}_n]$ by this sum to obtain the step size $\mu_n$.
[0018]
Using the step sizes $\mu_n$ determined by step size control unit 64, echo path estimation unit 63 updates each adaptive filter coefficient vector to reduce the error signal $e(k-L_p)$ between the picked-up signal and the predicted echo signal:
$$\mathbf{w}_n(k+1)=\mathbf{w}_n(k)+\mu_n\,e(k-L_p)\,\mathbf{x}_n(k) \qquad (n=1,\dots,N)$$
It then transfers the result to predicted echo paths 61_1 to 61_N.
The step size $\mu_n$ may also be determined by the following equation, which also takes into account the power of each channel's received signal.
(Equation 3)
$\|\mathbf{x}_n(k)\|$ is the norm of the vector $\mathbf{x}_n(k)$. For $\mathbf{x}_n(k)=[x_n(k)\ \dots\ x_n(k-L+1)]$, the norm is generally defined by
(Equation 4)
where $p$ is a natural number. $p = 1$ is best in terms of computation, but larger values may be used.
Equation (10) works as follows. When summing the received signal powers weighted by the inferred estimation errors of equation (9), each channel is normalized by its norm. This sum is multiplied by the norm of channel $n$'s received signal, and $\mu_0\sigma^q[\mathbf{w}_n]$ is divided by the product to obtain the step size for channel $n$.
[0019]
Embodiment 2
Next, the present invention is applied to the case where the received signals $x_n(k)$ undergo correlation variation processing, as shown in FIG. 5. The N-channel echo canceller is assumed to receive the received signals and the nonlinearly processed received signals as separate inputs.
The adaptive filter coefficients $\mathbf{w}_1(k), \dots, \mathbf{w}_N(k)$ in predicted echo paths 61_1 to 61_N have a filter length of $L$ taps, including $L_p$ ($>1$) taps for the non-causal component. The following describes, as an example, the case where the predicted echo is generated in the frequency domain. FIG. 3 shows an example functional configuration for channel $n$ ($n = 1, \dots, N$). It covers predicted echo generation and part of the filter coefficient update.
[0020]
Step 1
Block forming units 71_n and 72_n take the received signal $u_n(k)$ of each channel and the additional signal $g_n(u_n(k))$ ($n = 1, \dots, N$) for correlation variation processing. Every $L$ samples, they form each into a signal vector of length $2L$. FFT units 73_n and 74_n apply a fast discrete Fourier transform (FFT) to each block to obtain the frequency-domain signals $\mathbf{U}_n(k)$ and $\mathbf{G}_n(k)$:
$$\mathbf{U}_n(k)=\mathrm{diag}\big(\mathrm{FFT}([u_n(k-2L+1),\dots,u_n(k)]^T)\big)$$
$$\mathbf{G}_n(k)=\mathrm{diag}\big(\mathrm{FFT}([g_n(u_n(k-2L+1)),\dots,g_n(u_n(k))]^T)\big) \qquad (n=1,\dots,N)$$
Here $\mathrm{diag}(\cdot)$ takes a vector as input and outputs the diagonal matrix whose diagonal entries are that vector's elements. It is used for convenience of description.
[0021]
Adder 75_n adds the two matrices U_n(k) and G_n(k) to obtain X_n(k), the matrix whose diagonal elements are the frequency-domain input signal vector x_n(k):
$$\mathbf{X}_n(k) = \mathbf{U}_n(k) + \mathbf{G}_n(k)$$
Correction vector generation unit 76_n generates Z_n(k), the matrix whose diagonal elements are the frequency-domain correction signal vector, by the following equation:
$$\mathbf{Z}_n(k) = \beta\,\mathbf{U}_n(k) + \mathbf{G}_n(k)$$
Here β, set to a value between 0 and 1, is used to accelerate echo path estimation by the adaptive filter.
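Step 1 can be sketched for one channel with NumPy's FFT. Storing each diagonal matrix diag(·) as the 1-D array of its diagonal is an implementation choice assumed here, as are the function name `freq_inputs` and the default β = 0.2 (the value used in the experimental example).

```python
import numpy as np

def freq_inputs(u_block, g_block, beta=0.2):
    """Step 1 for one channel: X_n(k) and Z_n(k), stored as diagonals.

    u_block: the last 2L samples [u_n(k-2L+1), ..., u_n(k)]
    g_block: the same 2L samples after correlation variation, g_n(u_n(.))
    """
    U = np.fft.fft(u_block)      # diagonal of U_n(k)
    G = np.fft.fft(g_block)      # diagonal of G_n(k)
    X = U + G                    # X_n(k) = U_n(k) + G_n(k)
    Z = beta * U + G             # Z_n(k) = beta U_n(k) + G_n(k)
    return X, Z

# Usage with a half-wave-rectified additional signal (alpha = 0.3)
u = np.arange(8.0) - 3.5         # 2L = 8 samples
g = 0.3 * (u + np.abs(u)) / 2
X, Z = freq_inputs(u, g)
```

By linearity of the FFT, X_n(k) is simply the spectrum of the reproduced block u_n + g_n(u_n), while Z_n(k) scales down the received-signal part so the additional (decorrelating) component carries more weight in the correction.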
[0022]
Step 2
Multiplier 77_n multiplies the frequency-domain input signal X_n(k) of each channel by the filter coefficients W_n(k), thereby filtering the input signal vector.
Inverse FFT unit 78_n applies an inverse fast Fourier transform to convert the result into a time-domain signal vector, from which the last L elements are taken as the signal vector ŷ_n(k−L_p). That is, the following processing is performed:
$$\hat{\mathbf{y}}_n(k-L_p) = [\mathbf{0}_L \;\; \mathbf{I}_L]\,\mathrm{IFFT}\left(\mathbf{X}_n(k)\,\mathbf{W}_n(k)\right)$$
where 0_L is the L × L zero matrix and I_L is the L × L identity matrix.
Step 3
The time-domain signal vectors ŷ_n(k−L_p) calculated for the channels are summed to obtain the predicted echo vector ŷ(k−L_p):
$$\hat{\mathbf{y}}(k-L_p) = \sum_{n=1}^{N} \hat{\mathbf{y}}_n(k-L_p)$$
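Steps 2 and 3 amount to overlap-save block convolution summed over channels. In this sketch (function name hypothetical), each W_n(k) is assumed to be the FFT of L time-domain taps followed by L zeros; under that assumption, keeping the last L samples of the IFFT yields exact linear-convolution outputs.

```python
import numpy as np

def predicted_echo(X_list, W_list, L):
    """Steps 2-3: y_hat(k-L_p) = sum_n [0_L I_L] IFFT(X_n(k) W_n(k)).

    X_list, W_list: per-channel length-2L spectra (diagonals / coefficient vectors)
    Returns the L-sample predicted echo block.
    """
    y_hat = np.zeros(L)
    for X, W in zip(X_list, W_list):
        # Real-valued signals give a real IFFT up to rounding; keep the real part.
        y_hat += np.fft.ifft(X * W).real[L:]   # last L samples (overlap-save)
    return y_hat

# Usage: one channel, compared against direct time-domain convolution
rng = np.random.default_rng(1)
L = 4
x = rng.standard_normal(2 * L)
w = rng.standard_normal(L)
W = np.fft.fft(np.concatenate([w, np.zeros(L)]))
y_hat = predicted_echo([np.fft.fft(x)], [W], L)
```

The first L outputs of the circular convolution are discarded because they wrap around the block boundary; the remaining L equal Σ_i w_i x(m−i) for the newest L sample positions.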
[0023]
Step 4
The picked-up signal y(k) from microphone 4 is delayed by L_p samples in delay unit 65_n, and the delayed signal is formed into the picked-up signal vector y(k−L_p). Subtraction unit 62 subtracts the predicted echo vector ŷ(k−L_p) from y(k−L_p) to generate the time-domain error signal vector e(k), which FFT unit 81 converts into the frequency-domain signal E(k):
$$\mathbf{E}(k) = \mathrm{FFT}\left([0, \ldots, 0,\ \mathbf{y}^T(k-L_p) - \hat{\mathbf{y}}^T(k-L_p)]^T\right)$$
That is, the transform is applied after L zeros are placed before the error vector e(k). Here
$$\mathbf{y}(k-L_p) = [y(k-L_p-L+1), \ldots, y(k-L_p)]^T.$$
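Step 4 in NumPy, with hypothetical helper names: `delayed_block` forms y(k−L_p) from a sample buffer (treating k as an absolute index into that buffer is an assumption of this sketch), and `error_spectrum` prepends L zeros before the FFT.

```python
import numpy as np

def delayed_block(y, k, L, Lp):
    """y(k-L_p) = [y(k-L_p-L+1), ..., y(k-L_p)]^T, k an absolute sample index."""
    return np.asarray(y[k - Lp - L + 1 : k - Lp + 1], dtype=float)

def error_spectrum(y_blk, y_hat):
    """e(k) = y(k-L_p) - y_hat(k-L_p); E(k) = FFT of [L zeros, e(k)]."""
    e = y_blk - y_hat
    return e, np.fft.fft(np.concatenate([np.zeros(len(e)), e]))

# Usage: a ramp buffer makes the selected samples easy to see
y = np.arange(20.0)
blk = delayed_block(y, k=15, L=4, Lp=2)    # samples 10..13
e, E = error_spectrum(blk, np.zeros(4))
```

Placing the zeros first aligns e(k) with the second half of the 2L-sample input block, which is where the valid overlap-save outputs of step 2 sit.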
[0024]
Step 5
Conjugate generation unit 83_n forms Z_n^*(k), the complex conjugate of the correction signal matrix Z_n(k). Multiplication unit (correction vector generation unit) 84_n multiplies it by the error signal vector E(k), which computes in the frequency domain the correlation between the correction signal and the error, and generates the correction vector dW_n(k) for each channel as follows:
$$\mathbf{v}_n(k) = [\mathbf{I}_L \;\; \mathbf{0}_L]\,\mathrm{IFFT}\left(\mathbf{Z}_n^*(k)\,\mathbf{E}(k)\right)$$
$$d\mathbf{W}_n(k) = \mathrm{FFT}\left([\mathbf{v}_n^T(k), 0, \ldots, 0]^T\right)$$
Each element of Z_n^*(k) is the complex conjugate of the corresponding element of Z_n(k).
The FFT of v_n(k) is computed after L zeros are appended to it.
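Step 5 as a sketch (function name hypothetical). Multiplying by the conjugate spectrum and keeping the first L IFFT samples gives the block cross-correlation between the error and the correction signal, i.e. one gradient value per tap; appending L zeros before the FFT confines the correction to L taps.

```python
import numpy as np

def correction_vector(Z, E, L):
    """Step 5: v_n = [I_L 0_L] IFFT(Z_n^* E), dW_n = FFT([v_n^T, 0, ..., 0]^T)."""
    v = np.fft.ifft(np.conj(Z) * E)[:L]                  # first L samples
    return np.fft.fft(np.concatenate([v, np.zeros(L)]))  # append L zeros, back to frequency

# Usage: z is a 2L-sample correction block, e an L-sample error block
rng = np.random.default_rng(2)
L = 4
z = rng.standard_normal(2 * L)
e = rng.standard_normal(L)
E = np.fft.fft(np.concatenate([np.zeros(L), e]))
dW = correction_vector(np.fft.fft(z), E, L)
```

In the time domain, tap i of the result equals Σ_j e_j z_{j+L−i}, the sum of each error sample times the correction-signal sample i steps earlier, which is the block LMS gradient.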
[0025]
Step 6
Multiplication unit (correction vector correction unit) 86_n multiplies the correction vector dW_n(k) by the matrix-form step size S_n(k) generated by step size matrix generator 85_n, and filter updating unit 87_n uses the product S_n(k)dW_n(k) to update the filter coefficients W_n(k) by the following equation:
$$\mathbf{W}_n(k+L) = \mathbf{W}_n(k) + \mathbf{S}_n(k)\,d\mathbf{W}_n(k)$$
The matrix-form step size S_n(k) is given by
$$\mathbf{S}_n(k) = \mu_0\,\sigma^q[\mathbf{w}_n]\,\mathrm{diag}\left(\left[\frac{1}{d(k,1)}, \ldots, \frac{1}{d(k,2L)}\right]\right) \qquad (11)$$
where μ₀ is a parameter set to a value between 0 and 1 to stabilize the estimation.
[0026]
$$d(k,i) = t_c\, d(k-L,i) + (1-t_c) \sum_{n=1}^{N} \sigma^q[\mathbf{w}_n]\, \left| T(\mathbf{X}_n(k),i)\, T(\mathbf{Z}_n(k),i) \right| \quad (i = 1, \ldots, 2L) \qquad (12)$$
t_c is a time constant set to a value between 0 and 1. The function T(X_n(k), i) extracts the (i, i)-th element of the matrix X_n(k). That is, d(k, i) in the matrix S_n(k) is obtained as follows: for each frequency component, the product of the correction signal Z_n(k) and the input signal X_n(k) is weighted by the estimation-error estimate σ^q[w_n] and summed over the channels, and the previous short-time average and this current sum are combined by weighted addition with time constant t_c to give the current short-time average.
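Step 6 for all channels at once, following equations (11) and (12). The diagonals of S_n(k), X_n(k) and Z_n(k) are stored as arrays; the default t_c = 0.9 and the regularizer `eps` are assumptions (the text only requires t_c between 0 and 1), and the function name is hypothetical.

```python
import numpy as np

def update_filters(W, dW, X, Z, sigma, d_prev, mu0=0.5, q=1, t_c=0.9, eps=1e-12):
    """Step 6: W_n(k+L) = W_n(k) + S_n(k) dW_n(k) with S_n from eqs. (11), (12).

    W, dW, X, Z: (N, 2L) complex arrays (spectra / diagonals per channel)
    sigma:       (N,) estimation-error estimates sigma[w_n]
    d_prev:      (2L,) smoothed normalizer d(k-L, i) from the previous block
    """
    weights = sigma ** q                                        # sigma^q[w_n]
    inst = np.sum(weights[:, None] * np.abs(X * Z), axis=0)     # sum_n sigma^q |T(X_n,i) T(Z_n,i)|
    d = t_c * d_prev + (1.0 - t_c) * inst                       # equation (12)
    S = mu0 * weights[:, None] / (d + eps)                      # diagonal of S_n(k), eq. (11)
    return W + S * dW, d                                        # updated W_n and d(k, i)

# Usage: two channels, 2L = 8 bins, no smoothing (t_c = 0) for clarity
N, L2 = 2, 8
X = np.ones((N, L2), complex); Z = np.ones((N, L2), complex)
W = np.zeros((N, L2), complex); dW = np.ones((N, L2), complex)
W1, d = update_filters(W, dW, X, Z, np.array([1.0, 2.0]), np.zeros(L2), t_c=0.0)
```

The normalizer d(k, i) is shared by all channels, so the ratio of step sizes between channels is set purely by σ^q[w_n], while the per-bin division acts like a power normalization across frequency.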
[0027]
The impulse response (filter coefficients) w_n(k) of each predicted echo path is obtained by inverse FFT unit 88_n from the frequency-domain filter coefficients W_n(k) of filter updating unit 87_n, by applying the inverse FFT and block shaping as follows:
$$\mathbf{w}_n(k) = [\mathbf{I}_L \;\; \mathbf{0}_L]\,\mathrm{FFT}^{-1}\left(\mathbf{W}_n(k)\right)$$
Estimation error estimating unit 89_n calculates the sum σ[w_n] of the absolute values of the non-causal components w_n(k−L_p), …, w_n(k−1) of this impulse response w_n(k) as the estimated value of the estimation error.
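A sketch of the inverse FFT, block shaping and σ[w_n] computation. It assumes the L_p non-causal taps occupy the first L_p positions of w_n(k), which follows from the picked-up signal being delayed by L_p samples: the true echo path is causal, so ideally those taps are zero and their magnitude indicates how far the estimate still is from convergence. The function name is hypothetical.

```python
import numpy as np

def estimation_error_estimate(W, L, Lp):
    """w_n(k) = [I_L 0_L] FFT^{-1}(W_n(k)); sigma[w_n] = sum of |non-causal taps|.

    ASSUMPTION: the L_p non-causal taps are the first L_p of the L taps.
    """
    w = np.fft.ifft(W).real[:L]              # block shaping: keep the first L samples
    return w, float(np.sum(np.abs(w[:Lp])))  # sum of absolute non-causal taps

# Usage: three small non-causal taps ahead of the main peak
L, Lp = 8, 3
w_true = np.array([0.1, -0.2, 0.05, 1.0, 0.5, 0.25, 0.1, 0.0])
W = np.fft.fft(np.concatenate([w_true, np.zeros(L)]))
w, s = estimation_error_estimate(W, L, Lp)   # s = 0.1 + 0.2 + 0.05
```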
[0028]
The following matrix S′_n(k), which also takes into account the input signal power of each channel, may also be used as the matrix-form step size.
(Equation 5)

(Equation 6)

(Equation 7)

Through r_n(k), the input signal power of each channel is reflected in the step size control. Here b is a natural number; b = 1 is preferable in terms of computational cost. |Re[T(Z_n(k), i)]| and |Im[T(Z_n(k), i)]| are the magnitudes of the real and imaginary parts of the complex number T(Z_n(k), i).
[0029]
Experimental example
Numerical simulations were performed for an acoustic echo canceller connected to a two-channel reproduction system and a one-channel sound collection system. The sampling frequency was 8 kHz, and the acoustic echo was generated using as the acoustic echo path a room transfer function measured in a room with a reverberation time of 200 ms and truncated to 700 taps. Two-channel received signals u₁(k) and u₂(k) with high cross-correlation were generated by simulating a situation in which a single talker's speech is picked up by two microphones, and their signal powers were adjusted to be equal.
[0030]
As the conventional method, the frequency-domain adaptive algorithm proposed in Emura and Haneda, "Additional signal emphasis type frequency domain stereo adaptation algorithm" (Proceedings of the 2001 Autumn Meeting of the Acoustical Society of Japan, 2001, pp. 537-538) was applied to an adaptive filter of 640 taps per channel, all corresponding to the causal component. As the method of the present invention, the adaptive algorithm of the second embodiment was applied to an adaptive filter of 640 taps per channel, including 128 non-causal taps. In both cases the adaptive filter was updated every 80 ms. In the method of the present invention, the step size was controlled at intervals of 2.4 s according to equations (11) and (12), with μ₀ = 0.5. In the correction signal Z_n(k) used for adaptive filter updating, the parameter that emphasizes the correlation variation component was set to β = 0.2.
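The timing figures in the experimental example are mutually consistent under the assumption that each frequency-domain update advances by one block of L = 640 samples (step 6 updates W_n(k) to W_n(k+L)). A quick arithmetic check:

```python
fs = 8000                 # sampling frequency [Hz]
L = 640                   # taps per channel; assumed block length of each update
Lp = 128                  # non-causal taps (method of the invention)
control_interval_s = 2.4  # step-size control interval [s]

block_ms = 1000 * L / fs                                 # duration of one update block
noncausal_ms = 1000 * Lp / fs                            # look-ahead covered by non-causal taps
blocks_per_control = round(control_interval_s * fs / L)  # filter updates per step-size change

print(block_ms, noncausal_ms, blocks_per_control)        # prints: 80.0 16.0 30
```

So the step size is re-evaluated once every 30 filter updates, and the 128 non-causal taps span 16 ms.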
[0031]
For the correlation variation processing, the half-wave rectification method described in P. Eneroth, T. Gaensler, S. Gay, and J. Benesty, "Studies of a wideband stereophonic acoustic echo canceller," Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 207-210 (1999),
$$g_1(u(k)) = \alpha\,\frac{u(k) + |u(k)|}{2}$$
$$g_2(u(k)) = \alpha\,\frac{u(k) - |u(k)|}{2}$$
was applied with α = 0.3, a setting that causes almost no perceptible degradation, and the loudspeaker reproduction signals
$$x_1(k) = u_1(k) + g_1(u_1(k))$$
$$x_2(k) = u_2(k) + g_2(u_2(k))$$
were used.
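The half-wave rectification can be written directly (function name hypothetical): channel 1 receives an added positive half-wave and channel 2 a negative half-wave, which varies the inter-channel correlation in a signal-dependent way.

```python
import numpy as np

def half_wave_pair(u1, u2, alpha=0.3):
    """Correlation variation by half-wave rectification.

    Returns the reproduction signals x1, x2 and the additional signals g1, g2.
    """
    g1 = alpha * (u1 + np.abs(u1)) / 2   # positive half-wave of u1
    g2 = alpha * (u2 - np.abs(u2)) / 2   # negative half-wave of u2
    return u1 + g1, u2 + g2, g1, g2

# Usage: the same input on both channels diverges only where the sign differs
u = np.array([-1.0, 0.5, 2.0])
x1, x2, g1, g2 = half_wave_pair(u, u)
```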
[0032]
The estimation performance of the method of the present invention and of the conventional method was evaluated by the relative estimation error (misalignment) of equation (7) between the impulse response of the acoustic echo path and that of the adaptive filter.
FIGS. 4A and 4B show the behavior of the relative estimation error from the initial state for the method of the present invention and the conventional method, respectively. For echo path gain ratios of 1 and 9, attention is paid to the time required for the relative error to reach −15 dB: T′₁ and T′₉ for the method of the present invention, and T₁ and T₉ for the conventional method. With the method of the present invention, this time improves by 30% (T₉ → T′₉) when the echo path gain ratio is 9, and by 21% (T₁ → T′₁) when the echo path gain ratio is 1. Thus the method of the present invention speeds up echo path estimation and reduces the influence of differences in the echo path gain ratio on that estimation.
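Equation (7) itself is not reproduced in this section. The sketch below uses the standard normalized misalignment 10 log10(‖h − w‖²/‖h‖²), assumed to match it, together with hypothetical helpers for the time to reach −15 dB and the relative improvement (T − T′)/T, presumably the form in which the 30% and 21% figures are expressed. It assumes h and w are already aligned and of equal length.

```python
import numpy as np

def misalignment_db(h, w):
    """Normalized misalignment 10 log10(||h - w||^2 / ||h||^2) in dB (assumed eq. (7))."""
    return 10 * np.log10(np.sum((h - w) ** 2) / np.sum(h ** 2))

def time_to_reach(curve_db, times, threshold_db=-15.0):
    """First time at which a misalignment curve reaches threshold_db (None if never)."""
    idx = np.flatnonzero(np.asarray(curve_db) <= threshold_db)
    return times[idx[0]] if idx.size else None

def relative_improvement(t_conv, t_new):
    """Fractional reduction of convergence time, e.g. T_9 -> T'_9."""
    return (t_conv - t_new) / t_conv
```

No convergence times are given in the text, so no numbers from the experiment are reproduced here.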
[0033]
Embodiment 1 may also be applied to the case where the correlation variation processing shown in FIG. 5 is performed. In this case, the reproduction signal supplied to the loudspeaker is x_n(k) = u_n(k) + g_n(u_n(k)), the sum of the received signal u_n(k) and the signal g_n(u_n(k)) obtained by correlation variation processing, and the correction signal is z_n(k) = βu_n(k) + g_n(u_n(k)) (0 < β < 1). The correction vector dw_n is obtained from the correction signal vector z_n(k) and the error signal e(k); dw_n is multiplied by the step size μ_n of the n-th channel, obtained by substituting z_j(k) for x_j(k) in equation (9) or (10); and the result is added to the filter coefficients w_n(k) to perform the update.
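A time-domain sketch of this variant, reusing an assumed form of equation (9) (μ_n = μ₀σ^q[w_n] / Σ_j σ^q[w_j]‖z_j(k)‖², squared Euclidean norm) with z_j(k) substituted for x_j(k); all names and the regularizer `eps` are hypothetical. The error e(k) is still formed with the reproduction signals x_n = u_n + g_n(u_n), while the correction uses z_n.

```python
import numpy as np

def embodiment1_cv_update(w, u_vecs, g_vecs, e, sigma, beta=0.2, mu0=0.5, q=1, eps=1e-8):
    """One update of all channels with correlation variation (assumed eq. (9) form).

    u_vecs, g_vecs: (N, L) arrays of u_n and g_n(u_n) over the last L samples
    e:              scalar error e(k) = y(k) - sum_n w_n . x_n, with x_n = u_n + g_n
    """
    z = beta * u_vecs + g_vecs                              # z_n(k) = beta u_n(k) + g_n(u_n(k))
    weights = sigma ** q                                    # sigma^q[w_n]
    denom = np.sum(weights * np.sum(z ** 2, axis=1)) + eps  # z_j substituted for x_j
    mu = mu0 * weights / denom                              # step size of each channel
    dw = e * z                                              # correction vectors dw_n
    return w + mu[:, None] * dw                             # w_n + mu_n dw_n
```

With β = 1 and g_n = 0 this reduces to the plain multichannel update of Embodiment 1.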
[0034]
The frequency-domain echo prediction and filter update processing of the second embodiment can also be applied when the received signal does not undergo correlation variation processing. In this case, the frequency-domain correction vector dW_n(k) is obtained from the frequency-domain received signal X_n(k) and the error signal E(k) by the same processing as in step 5 of the second embodiment, and the matrix-form step size S_n(k) is obtained using X_n(k) in place of Z_n(k) in equations (11) and (12), or in equations (13), (14), and (15).
[0035]
Further, the present invention can be applied to an N-channel reproduction system and an M-channel sound collection system shown in FIG. 5 using M acoustic echo cancellers.
The multi-channel acoustic echo cancellers shown in FIGS. 2 and 3 may also be implemented by a computer. In this case, a program for causing the computer to execute each step of the multi-channel acoustic echo cancellation method of the present invention described above is installed from a recording medium such as a CD-ROM or magnetic disk, or downloaded via a communication line, and the computer executes the program.
[0036]
【Effect of the Invention】
According to the present invention, each adaptive filter includes non-causal components, the estimation error of the adaptive filter coefficients of each channel's predicted echo path is estimated from the magnitude of those non-causal components, and the step size is controlled for each channel accordingly. As a result, echo path estimation can be made faster even when the echo path gains are uneven.
【Brief Description of the Drawings】
FIG. 1 is a diagram illustrating an example of the relationship between the impulse response of an acoustic echo path and the taps of the predicted echo path: causal-component taps only in the conventional method, and non-causal and causal-component taps in the method of the present invention.
FIG. 2 is a diagram showing an example of the functional configuration of the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of the functional configuration of the n-th channel portion in the second embodiment of the present invention.
FIG. 4 is a diagram showing simulation results on the convergence of the relative estimation error with the method of the present invention and the conventional method.
FIG. 5 is a diagram showing a configuration example of a conventional communication conference system including an N-channel reproduction system and an M-channel sound collection system.
FIG. 6 is a diagram showing a functional configuration example of a conventional N-channel acoustic echo canceller.
FIG. 7 is a diagram showing the convergence of the relative prediction error of the adaptive prediction filter coefficients, with the echo path gain ratio as a parameter.

Claims

1. A multi-channel acoustic echo cancellation method in which received signals of N channels (N is an integer of 2 or more) are input to N predicted echo paths to generate predicted echoes,
the predicted echoes are subtracted from the picked-up signal to cancel the acoustic echo,
N correction vectors are obtained from the N-channel received signals and from the error signal between the picked-up signal and the predicted echoes, and
the impulse responses of the N predicted echo paths are sequentially corrected using the N correction vectors, the method being characterized in that:
non-causal components are included in the predicted echo paths;
the echo path estimation error is estimated from the non-causal components of the N predicted echo paths; and
for each predicted echo path, the step size of the impulse response correction is increased as the estimated echo path estimation error becomes larger.

2. A multi-channel acoustic echo cancellation method in which, for each of N channels (N is an integer of 2 or more), an additional signal is generated from the received signal and added to the received signal,
the resulting signals are input to N predicted echo paths to generate predicted echoes,
the predicted echoes are subtracted from the picked-up signal to cancel the acoustic echo,
N correction vectors are obtained from the N-channel received signals and their additional signals and from the error signal between the picked-up signal and the predicted echoes, and
the impulse responses of the N predicted echo paths are sequentially corrected using the N correction vectors, the method being characterized in that:
non-causal components are included in the predicted echo paths;
the echo path estimation error is estimated from the non-causal components of the N predicted echo paths; and
for each predicted echo path, the step size of the impulse response correction is increased as the estimated echo path estimation error becomes larger.

3. The multi-channel acoustic echo cancellation method according to claim 1, wherein, in addition to the control according to the estimated value, the step size is controlled according to the received signal power of the channel so that the step size decreases as the received signal power increases.

4. A multi-channel acoustic echo canceller comprising: N predicted echo paths, each of which receives the received signal of one of N channels (N is an integer of 2 or more), includes non-causal components, and adaptively filters the input received signal to generate a predicted echo;
a subtraction unit that subtracts the predicted echoes of the N predicted echo paths from the picked-up signal to cancel the acoustic echo;
N correction vector generation units that generate correction vectors from the error signal between the acoustic echo and the predicted echo and from the received signal of each channel;
N estimation error estimating units, each of which receives the non-causal components of a predicted echo path and obtains an estimated value of the filter coefficient estimation error of that predicted echo path;
N step size control units, each of which receives the estimated value of its channel and controls the step size of that channel;
N correction vector correction units, each of which corrects the correction vector of the corresponding channel according to the controlled step size of that channel; and
N filter updating units that sequentially update the filter coefficients of the corresponding predicted echo paths with the corrected correction vectors.

5. A multi-channel acoustic echo canceller comprising: N correlation variation processing units, each of which receives the received signal of one of N channels (N is an integer of 2 or more), generates an additional signal from the received signal, and adds it to the received signal to generate a reproduction signal;
N predicted echo paths, each of which receives one of the N-channel reproduction signals, includes non-causal components, and adaptively filters the input reproduction signal to generate a predicted echo;
a subtraction unit that subtracts the predicted echoes of the N predicted echo paths from the picked-up signal to cancel the acoustic echo;
N correction vector generation units that generate correction vectors from the error signal between the acoustic echo and the predicted echo, the received signal of each channel, and its additional signal;
N estimation error estimating units, each of which receives the non-causal components of a predicted echo path and obtains an estimated value of the filter coefficient estimation error of that predicted echo path;
N step size control units, each of which receives the estimated value of its channel and controls the step size of that channel;
N correction vector correction units, each of which corrects the correction vector of the corresponding channel according to the controlled step size of that channel; and
N filter updating units that sequentially update the filter coefficients of the corresponding predicted echo paths with the corrected correction vectors.

6. A program for causing a computer to execute each step of the multi-channel acoustic echo cancellation method according to claim 1.

7. A computer-readable recording medium on which the program according to claim 6 is recorded.