JP3607625B2

JP3607625B2 - Multi-channel echo suppression method, apparatus thereof, program thereof and recording medium thereof

Info

Publication number: JP3607625B2
Application number: JP2001032422A
Authority: JP
Inventors: 澄宇阪内; 雅史田中; 陽一羽田; 和彦山森
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-02-08
Filing date: 2001-02-08
Publication date: 2005-01-05
Anticipated expiration: 2021-02-08
Also published as: JP2002237769A

Description

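[0001]
[Technical Field of the Invention]
The present invention relates to a multi-channel echo suppression method, an echo suppression apparatus, an echo suppression program, and a recording medium therefor. They suppress echo signals in loudspeaker communication systems, such as audio conferencing and video conferencing, in which a plurality of loudspeakers and at least one microphone are arranged in a single acoustic space.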
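[0002]
[Prior Art]
FIG. 4 shows an example of a loudspeaker communication system. Speech uttered by talker 110 reaches listener 111 via the transmitting microphone 101, transmit-signal amplifier 105, transmission path 109, receive-signal amplifier 108, and receiving loudspeaker 104. Likewise, speech uttered by talker 111 reaches listener 110 via the transmitting microphone 103, transmit-signal amplifier 107, transmission path 109, receive-signal amplifier 106, and receiving loudspeaker 102.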
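[0003]
Unlike a conventional telephone system, this loudspeaker communication system does not require the user to hold a handset. Users can therefore talk while working, and the system supports natural face-to-face conversation. For these reasons it is being widely adopted in teleconferencing, videophones, speakerphones, and similar applications.
A drawback of such a system, however, is echo. In FIG. 4, the sound that loudspeaker 104 delivers to listener 111 is also picked up by microphone 103. It then passes through transmit-signal amplifier 107, transmission path 109, receive-signal amplifier 106, and loudspeaker 102, and is reproduced on the originating side. Talker 110 therefore hears his or her own voice played back from loudspeaker 102. This phenomenon is called acoustic echo, and it impairs conversation and causes discomfort in loudspeaker communication systems.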
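[0004]
Furthermore, microphone 101 picks up the sound reproduced by loudspeaker 102, which forms a closed signal loop. If the loop gain exceeds 1, howling occurs and conversation becomes impossible.
Echo cancellers are used to solve these problems. An echo canceller consists of an adaptive filter section, a nonlinear echo suppression section, or a combination of the two.
Here, nonlinear echo suppression means any echo suppression processing other than adaptive filtering (linear processing), such as voice switches and center clippers. Adaptive filters and nonlinear echo suppression are described in detail in, for example, "Echo Canceller Technology" (supervised by Shigeo Tsujii, Japan Industrial Technology Center, 1986). ITU Recommendations P201, P204, G165, G167, and others also present configurations and required performance. With these techniques, conventional teleconferencing systems, videophones, speakerphones, and similar equipment used in relatively quiet environments could guarantee sufficient call quality.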
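[0005]
Recently, however, loudspeaker communication has spread to more demanding settings. Examples include hands-free calls in noisy cars, desktop video conferences over packet networks with large transmission delay, and distance lectures held in highly reverberant auditoriums. In such environments, conventional adaptive filters and nonlinear echo suppression have difficulty guaranteeing sufficient call quality.
Under high noise, an adaptive filter generally cannot estimate the impulse response of the echo path accurately enough. The number of filter coefficients (tap length) of an adaptive filter is set according to the reverberation time of the echo path. A long reverberation time therefore requires more taps, which slows convergence and increases hardware size. Moreover, when the transmission delay is large, echo becomes more audible, and residual echo that the adaptive filter alone cannot cancel degrades call quality.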
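[0006]
Nonlinear echo suppression, by contrast, can suppress echo strongly and robustly, for example by controlling insertion loss. However, when near-end speech is present at the same time as the echo, both are suppressed indiscriminately, which distorts the near-end speech or causes dropouts. In other words, nonlinear echo suppression degrades call quality during two-way simultaneous talk (double talk).
Japanese Patent Application Laid-Open No. H11-331046 proposes a frequency-domain echo suppression method to address these problems. It is briefly described here with reference to FIG. 5. Fast Fourier transform units 201 and 202 transform the received signal x(k) and the transmit signal (echo-superimposed signal) y(k) from microphone 103, respectively, into their short-time spectra X(ω) and Y(ω). The received signal x(k) and the transmit signal y(k) are also input to a single-talk detector 204, which detects whether only the received signal x(k) is present. In that single-talk state, the transmit-signal spectrum is the echo spectrum. The echo path coupling calculation circuit 205 then computes the estimated echo path coupling amount PHe(ω) = PY(ω)/PX(ω) from the powers PX(ω) and PY(ω) (= PE(ω)) of the received-signal spectrum and the transmit-signal spectrum.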
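[0007]
The echo signal power calculator 206 multiplies the received-signal power PX(ω) by the estimated echo path coupling amount PHe(ω) to obtain the predicted echo power PEe(ω). The echo suppression gain determination unit 207 then uses PEe(ω) and the short-time spectra X(ω) and Y(ω) to compute an echo suppression gain G(ω), based on the proportion of the echo signal b(k) in the transmit signal y(k). The echo suppression unit 208 multiplies the transmit-signal (echo-superimposed) spectrum Y(ω) by G(ω), giving the echo-suppressed spectrum Se(ω). Finally, the inverse fast Fourier transform unit 203 transforms Se(ω) back into a time signal se(k), in which the echo b(k) is suppressed and the near-end speech s(k) is emphasized.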
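The prior art's dependence on a single-talk decision can be sketched on per-bin power lists. The function names and the boolean `far_end_single_talk`, which stands in for detector 204, are hypothetical. The gain rule of unit 207 is omitted because the text above does not give it.

```python
EPS = 1e-12  # floor that avoids division by zero in silent bins (assumption)


def update_coupling_prior_art(p_x, p_y, coupling, far_end_single_talk):
    """Per-bin coupling update gated by a single-talk decision.

    p_x, p_y: lists of powers PX(w), PY(w) for the current frame.
    coupling: current estimate PHe(w) for each bin.
    far_end_single_talk: hypothetical boolean standing in for detector 204.
    """
    if not far_end_single_talk:
        return list(coupling)  # keep the old estimate outside single talk
    return [py / max(px, EPS) for px, py in zip(p_x, p_y)]


def predicted_echo_power(p_x, coupling):
    """PEe(w) = PX(w) * PHe(w), as computed by unit 206."""
    return [px * h for px, h in zip(p_x, coupling)]
```

Suppose the detector wrongly reports single talk while the near-end talker is speaking. PY(ω) then contains near-end speech, the coupling estimate is inflated, and the resulting gain over-suppresses.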
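[0008]
With this method, only the echo b(k) is removed from the echo-laden transmit signal y(k), so the near-end speech s(k) is emphasized and sent to the far end. In other words, although the method is a form of nonlinear echo suppression, it suppresses the echo b(k) alone and does not interrupt s(k), even during double talk.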
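[0009]
[Problems to Be Solved by the Invention]
As described above, the conventional method can compute the echo path coupling amount only after a single-talk detector has found a state in which only the far-end signal is being received. If the single-talk state is detected incorrectly, the echo suppression gain G(ω) becomes inaccurate and call quality may degrade. Furthermore, the conventional method does not address echo cancellation in multi-channel systems. Examples are a system with multiple loudspeakers and one microphone in one acoustic space, and a system with multiple loudspeakers and multiple microphones in one acoustic space, as in stereo audio conferencing.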
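[0010]
An object of the present invention is to provide a multi-channel echo suppression method, apparatus, program, and recording medium for multi-channel systems that carry no risk of degrading call quality through erroneous single-talk detection.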
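[0011]
[Means for Solving the Problems]
The first invention handles received signals of N channels (N an integer of 2 or more) as follows:
- At predetermined time intervals, the power ratio of the picked-up signal to the received signal of each channel is computed.
- For each channel, the minimum of the successively computed ratios is taken as the echo path coupling amount of that channel's received signal.
- Each echo path coupling amount is multiplied by the received signal of the corresponding channel to estimate N echo signals.
- The power of each of these N echo signals is subtracted from the power of the picked-up signal. Each of the N resulting powers is normalized by the power of the picked-up signal, giving N echo suppression gains.
- The picked-up signal is multiplied by these N echo suppression gains to suppress the echo.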
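The per-frame steps of the first invention can be sketched under a few stated assumptions. The signals are already delay-compensated and transformed. The coupling amount (a power ratio) is applied to the received power P_Xn(ω) to obtain an echo power. Negative gains are clipped to zero. The function name, the `EPS` floor, and the rule of skipping silent bins are illustrative choices, not part of the claim.

```python
EPS = 1e-12  # hypothetical floor to avoid division by zero
INF = float("inf")


def suppress_frame_first_invention(x_pows, y_spec, min_ratios):
    """Process one frame with N per-channel coupling estimates.

    x_pows:     N lists of received-signal powers P_Xn(w), one list per channel
    y_spec:     complex spectrum Y(w) of the picked-up signal
    min_ratios: N lists holding the running minima of P_Y/P_Xn (updated in place)
    Returns the spectrum after all N gains have been applied.
    """
    p_y = [abs(v) ** 2 for v in y_spec]
    out = list(y_spec)
    for n, p_x in enumerate(x_pows):
        for w, (px, py) in enumerate(zip(p_x, p_y)):
            if px > EPS:  # update the held minimum only where channel n has energy
                min_ratios[n][w] = min(min_ratios[n][w], py / px)
            coupling = min_ratios[n][w]
            # Echo power estimate: coupling amount times received power.
            p_echo = coupling * px if coupling != INF else 0.0
            gain = (py - p_echo) / max(py, EPS)  # normalize by P_Y
            out[w] *= max(gain, 0.0)  # clipping negative gains is an assumption
    return out
```

On the very first frame, the running minimum equals the current ratio. The estimated echo power therefore equals P_Y(ω), and the gain is zero. Suppression relaxes only after later frames show a larger P_Y/P_Xn ratio, for example because near-end speech is present.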
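[0012]
The second invention also handles received signals of N channels (N an integer of 2 or more), but uses a single summed reference:
- The received signals of the N channels are summed into a single summed received signal.
- At predetermined time intervals, the power ratio of the picked-up signal to the summed received signal is computed.
- The minimum of the successively computed ratios is taken as the echo path coupling amount.
- The summed received signal is multiplied by this coupling amount to estimate the echo signal.
- The echo power is subtracted from the power of the picked-up signal, and the result is normalized by the power of the picked-up signal to give an echo suppression gain.
- The picked-up signal is multiplied by this gain to suppress the echo.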
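The second invention can be sketched the same way. Here a single minimum tracker serves the summed reference, so the cost of coupling estimation no longer grows with N. As before, the names, the `EPS` floor, the silent-bin rule, and the gain clipping are assumptions. For brevity the sketch sums spectra rather than time signals; both give the same X_A(ω), because the transform is linear.

```python
EPS = 1e-12  # hypothetical floor to avoid division by zero
INF = float("inf")


def suppress_frame_second_invention(x_specs, y_spec, min_ratio):
    """Process one frame with a single coupling estimate for the summed reference.

    x_specs:   N complex spectra X_n(w), already delay-compensated
    y_spec:    complex spectrum Y(w) of the picked-up signal
    min_ratio: running minimum of P_Y/P_XA per bin (updated in place)
    """
    # Adding spectra equals transforming the added time signal (the DFT is
    # linear); a real implementation would add first and transform once.
    x_a = [sum(bins) for bins in zip(*x_specs)]
    out = []
    for w, (xa, y) in enumerate(zip(x_a, y_spec)):
        p_xa, p_y = abs(xa) ** 2, abs(y) ** 2
        if p_xa > EPS:  # skip bins where the summed reference is silent (assumption)
            min_ratio[w] = min(min_ratio[w], p_y / p_xa)
        p_echo = min_ratio[w] * p_xa if min_ratio[w] != INF else 0.0
        gain = max((p_y - p_echo) / max(p_y, EPS), 0.0)  # clipping is an assumption
        out.append(gain * y)
    return out
```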
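[0013]
In both the first and second inventions, the echo path coupling amount of a multi-channel system is obtained without detecting the single-talk state. There is therefore no risk that erroneous single-talk detection will degrade call quality.
Obtaining the coupling amount by holding a minimum value poses no practical problem. It has been found that, in actual use of the apparatus, double talk almost never lasts long enough for the echo path coupling amount to change significantly while it continues. Because the echo path is nonstationary over time, however, the held minimum should be cleared and reset to its initial value after a certain time. This reset interval is set according to the usage environment, that is, according to how often the echo path changes.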
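Minimum holding with a periodic reset can be sketched as a small per-bin tracker. Two details are assumptions. The parameter `reset_frames` is hypothetical, since the text only ties the reset interval to how often the echo path changes. The "initial value" is taken to be positive infinity.

```python
class MinHoldCoupling:
    """Running minimum of P_Y/P_X per frequency bin with periodic reset.

    reset_frames is a hypothetical parameter; +infinity as the initial
    value is also an assumption of this sketch.
    """

    def __init__(self, num_bins, reset_frames, eps=1e-12):
        self.num_bins = num_bins
        self.reset_frames = reset_frames
        self.eps = eps
        self.frame = 0
        self.value = [float("inf")] * num_bins

    def update(self, p_x, p_y):
        # Clear the held minimum every reset_frames frames.
        if self.frame > 0 and self.frame % self.reset_frames == 0:
            self.value = [float("inf")] * self.num_bins
        self.frame += 1
        for w in range(self.num_bins):
            if p_x[w] > self.eps:  # no update when the reference is silent
                self.value[w] = min(self.value[w], p_y[w] / p_x[w])
        return list(self.value)
```

With `reset_frames=3`, the successive updates `([1.0], [0.5])`, `([1.0], [2.0])`, `([1.0], [0.8])`, and `([1.0], [0.9])` return `[0.5]`, `[0.5]`, `[0.5]`, and `[0.9]`. The second frame mimics double talk: P_Y rises, but the held minimum does not. The trade-off is that a genuine increase in coupling is tracked only after the next reset.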
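[0014]
[Embodiments of the Invention]
Embodiments of the invention are described below with reference to the drawings. FIG. 1 shows a first embodiment, in which a plurality of loudspeakers and one microphone are arranged in the same acoustic space. The example in FIG. 1 uses two loudspeakers, 104_1 and 104_2.
Loudspeaker 104_1 reproduces the first-channel received signal x_1(k), and loudspeaker 104_2 reproduces the second-channel received signal x_2(k). These sounds propagate through echo paths 601_1 and 601_2, respectively, and microphone 103 picks them up as echo signals b_1(k) and b_2(k). In b_1(k), b_2(k), and so on, k is an integer denoting discrete time. Microphone 103 also picks up the near-end talker's voice as the transmit signal s(k). It therefore outputs an echo-superimposed signal (picked-up signal) y(k) in which b_1(k), b_2(k), and s(k) are superimposed.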
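[0015]
The echo-superimposed signal y(k) is input to echo path delay estimation units 302_1 and 302_2, which also receive the received signals x_1(k) and x_2(k), respectively. Each unit computes the correlation coefficient (cross-correlation function) between its received signal and y(k). From the lag at which the correlation is maximum, it estimates the propagation delay of its echo path: ΔT_1 for path 601_1 and ΔT_2 for path 601_2.
The estimated delays ΔT_1 and ΔT_2 are set in delay units 303_1 and 303_2, which delay x_1(k) and x_2(k) by those amounts. Each delay unit can be implemented, for example, as an FIR filter whose coefficient at the tap corresponding to the delay is 1 and whose other tap weights are 0.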
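Delay estimation and the single-tap FIR delay line can be sketched on plain Python lists. The sketch makes three assumptions:
- it uses the raw (unnormalized) cross-correlation rather than a normalized correlation coefficient;
- it searches only non-negative lags;
- the search range is bounded by a hypothetical `max_lag`.

```python
def estimate_delay(x, y, max_lag):
    """Lag d in [0, max_lag] that maximizes r(d) = sum_k x(k) * y(k + d).

    Only non-negative lags are searched, on the assumption that the echo
    cannot reach the microphone before the loudspeaker plays the signal.
    """
    best_lag, best_val = 0, float("-inf")
    for d in range(max_lag + 1):
        r = sum(x[k] * y[k + d] for k in range(min(len(x), len(y) - d)))
        if r > best_val:
            best_lag, best_val = d, r
    return best_lag


def fir_delay(x, delay, num_taps):
    """Delay x with an FIR filter whose tap at index `delay` is 1 and all others 0."""
    h = [0.0] * num_taps
    h[delay] = 1.0
    return [sum(h[j] * x[k - j] for j in range(num_taps) if k - j >= 0)
            for k in range(len(x))]
```

For example, with `x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0, 0.0, 0.0, 0.0]`, `estimate_delay(x, fir_delay(x, 3, 5), 5)` returns 3.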
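[0016]
This embodiment processes the signals in the frequency domain. Frequency domain transform units 201_1 and 201_2 convert the delayed received signals x_1(k) and x_2(k), for example by a fast Fourier transform (FFT), into short-time spectra (frequency-domain signals) X_1(ω) and X_2(ω). Likewise, frequency domain transform unit 202 converts the echo-superimposed signal y(k) into its short-time spectrum Y(ω). Here ω denotes angular frequency. The short-time frame is preferably about 32 ms to 64 ms long, which corresponds to 256 to 512 samples at 8 kHz sampling. Subjective evaluation experiments confirmed that frame lengths in this range give good processing quality.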
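The frame-length arithmetic can be checked directly. The bin-spacing helper assumes that the FFT length equals the frame length, with no zero-padding; that is an assumption, not something the text specifies.

```python
def frame_length_samples(frame_ms, fs_hz):
    """Samples in one analysis frame of frame_ms milliseconds at fs_hz."""
    return round(frame_ms * fs_hz / 1000)


def bin_spacing_hz(frame_ms, fs_hz):
    """FFT bin spacing, assuming the FFT length equals the frame length."""
    return fs_hz / frame_length_samples(frame_ms, fs_hz)


for ms in (32, 64):
    print(ms, "ms ->", frame_length_samples(ms, 8000), "samples,",
          bin_spacing_hz(ms, 8000), "Hz per bin")
# 32 ms -> 256 samples, 31.25 Hz per bin
# 64 ms -> 512 samples, 15.625 Hz per bin
```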
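[0017]
Echo coupling amount calculation unit 304_1 receives X_1(ω) from frequency domain transform unit 201_1 and Y(ω) from frequency domain transform unit 202. It tracks the minimum of the power ratio of Y(ω) to X_1(ω), and outputs that minimum as the echo path coupling amount of echo path 601_1.
The echo path coupling calculation performed by unit 304_1 is described in detail below.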
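[0018]
The echo path coupling amount is the power ratio (Py_1/Px_1) between the signal entering echo path 601_1 and the signal that has propagated through it. Taking the signal delay into account, it is the power ratio of the echo signal b_1(k), which is the path's output, to the received signal x_1(k) delayed by delay unit 303_1. Similarly, the echo path coupling amount of echo path 601_2 is Py_2/Px_2.
The echo signal b_1(k) cannot be extracted on its own, however. The coupling amount is therefore computed from the delay-compensated received signal x_1(k) and the echo-superimposed signal y(k); in this example, from X_1(ω) and Y(ω). At each predetermined period, for example each FFT frame, the unit computes the ratio of the echo-superimposed signal power P_Y(ω) to the received-signal power P_X1(ω). It compares this ratio with the previously held ratio and keeps the smaller of the two as the echo path coupling amount. In other words, unit 304_1 holds the smallest value of P_Y(ω)/P_X1(ω) observed so far. Unit 304_2 likewise obtains the echo path coupling amount of echo path 601_2 from the power ratio P_Y(ω)/P_X2(ω) of Y(ω) to X_2(ω).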
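[0019]
Echo estimation unit 310_1 multiplies the received signal X_1(ω) by the echo path coupling amount P_Y(ω)/P_X1(ω) from unit 304_1 to estimate the echo signal B_e1(ω) arriving via echo path 601_1. Likewise, echo estimation unit 310_2 multiplies X_2(ω) by the coupling amount P_Y(ω)/P_X2(ω) from unit 304_2 to estimate the echo signal B_e2(ω) arriving via echo path 601_2.
Echo suppression gain calculation unit 305_1 computes an echo suppression gain G_1(ω) from the echo-superimposed (picked-up) signal Y(ω) and the estimated echo signal B_e1(ω). In this example, G_1(ω) is chosen so that the estimated echo B_e1(ω) is suppressed below the masking threshold formed by the other audible signals.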
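[0020]
The echo suppression gain calculation performed by unit 305_1 is described below.
Unit 305_1 computes an echo suppression gain G_1 that satisfies the following expression (1).

P_d(ω) denotes the masking threshold (level) formed by audible signals other than the echo (whose power is P_B1(ω)), taken in the frequency domain. Examples are the received signal x(k) on the sender side, the transmit signal s(k) on the receiver side, ambient noise, and line noise. The masking threshold P_d(ω) can be calculated from the masking threshold due to noise. The masking threshold of the transmit signal s(k), however, cannot be estimated at every predetermined period. Its long-term characteristic is therefore estimated in advance from subjective evaluations of the desired echo suppression gain.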
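[0021]
There are various ways to compute a gain G_1 that satisfies this relation; this embodiment uses a solution based on Wiener filtering.
Expression (1) can be rewritten as expression (2):
$$P_{se}(\omega) + P_d(\omega) = G_1(\omega)\,\bigl(P_s(\omega) + P_{B1}(\omega)\bigr) \qquad (2)$$
To solve expression (2) for the echo suppression gain G_1, substitute P_se(ω) = P_Y(ω) − P_Be1(ω) and use the approximation P_s(ω) + P_B1(ω) ≈ P_Y(ω). This gives
$$G_1(\omega) = \frac{P_Y(\omega) - P_{Be1}(\omega) + P_d(\omega)}{P_Y(\omega)} \qquad (3)$$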
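[0022]
In expression (3), P_Be1(ω) is the power of the estimated echo signal B_e1(ω). P_se(ω) is the power of the estimated speech signal Se(ω), obtained by subtracting the power of B_e1(ω) from the power of the echo-superimposed signal Y(ω). In other words, the gain G_1(ω) is computed in three steps:
1. Subtract the masking threshold P_d(ω) from the estimated echo power P_Be1(ω).
2. Subtract this difference from the picked-up signal power P_Y(ω).
3. Normalize the result by P_Y(ω).
Similarly, echo suppression gain calculation unit 305_2 computes G_2(ω) by expression (4), from the estimated echo signal B_e2(ω) and the echo-superimposed (picked-up) signal Y(ω).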
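[0023]
$$G_2(\omega) = \frac{P_Y(\omega) - P_{Be2}(\omega) + P_d(\omega)}{P_Y(\omega)} \qquad (4)$$
In this example, the gains G_1(ω) and G_2(ω) computed by units 305_1 and 305_2 are applied in cascade. Echo suppression unit 208_1 first multiplies Y(ω) by G_1(ω) to suppress the echo B_1(ω). Echo suppression unit 208_2 then multiplies the output of 208_1 by G_2(ω) to suppress the echo B_2(ω). Time domain transform unit 203 converts the resulting signal Se(ω) back to a time-domain signal, for example by an inverse fast Fourier transform (IFFT), and outputs it. In this output signal se(k), the echo components b_1(k) and b_2(k) of y(k) are suppressed, so se(k) is as close as possible to the transmit signal s(k) picked up by microphone 103. The two echo suppression units 208_1 and 208_2 may be placed in either order, with either one nearer to microphone 103.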
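Expressions (3) and (4) and the two-stage cascade can be sketched per frequency bin. Clipping the gain to [0, 1] is an assumption of this sketch; the text does not say what happens when the echo estimate exceeds P_Y(ω) + P_d(ω), or when P_d(ω) exceeds P_Be(ω).

```python
def masked_gain(p_y, p_be, p_d, eps=1e-12):
    """Expressions (3)/(4) for one bin: G = (P_Y - P_Be + P_d) / P_Y.

    Clipping to [0, 1] is an assumption of this sketch.
    """
    g = (p_y - p_be + p_d) / max(p_y, eps)
    return min(max(g, 0.0), 1.0)


def apply_cascade(y_bin, gains):
    """Multiply one spectral bin by G_1, G_2, ... in turn (units 208_1, 208_2)."""
    for g in gains:
        y_bin *= g
    return y_bin
```

For example, with P_Y = 16 and P_Be = 4, the gain is 0.75 without masking and 0.875 with P_d = 2. The nonzero P_d(ω) leaves residual echo below the masking threshold instead of removing it completely. Because multiplication commutes, the order of the cascade does not affect the result, which is consistent with placing 208_1 and 208_2 in either order.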
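[0024]
FIG. 2 shows an embodiment of the second invention; parts corresponding to FIG. 1 carry the same reference numerals. In this case, a plurality of loudspeakers 104_1 to 104_N and a plurality of microphones 103_1 to 103_M are provided in the same acoustic space. The configuration of FIG. 1 needs a separate set of processing units for each channel: a frequency domain transform unit, echo coupling amount calculation unit, echo estimation unit, echo suppression gain calculation unit, and echo suppression unit. As the number of channels grows, the hardware therefore becomes larger, or, in a software implementation, the processing time becomes longer. The embodiment of FIG. 2 addresses this problem. The figure shows the configuration that suppresses echo in the echo-superimposed (picked-up) signal y(k) from microphone 103_1.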
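[0025]
Echo path delay estimation units 302_1, …, 302_N use the received signals x_1(k), …, x_N(k) and the echo-superimposed (picked-up) signal y(k) to estimate the delay of each echo path from loudspeakers 104_1, …, 104_N to microphone 103_1. The estimated delays are set in delay units 303_1, …, 303_N, which delay x_1(k), …, x_N(k), respectively.
Adder 401 sums the delayed received signals, and frequency domain transform unit 201 converts the summed signal x_A(k) into the frequency-domain signal X_A(ω). Frequency domain transform unit 202 likewise converts the echo-superimposed signal from microphone 103_1 into Y(ω). Echo coupling amount calculation unit 304 computes the short-time power ratio P_Y(ω)/P_XA(ω), updates its minimum, and outputs the minimum as the echo path coupling amount. Echo estimation unit 310 multiplies the summed received signal X_A(ω) by this coupling amount to estimate the echo signal. From the estimated echo and Y(ω), echo suppression gain calculation unit 305 computes an echo suppression gain G(ω), for example by a calculation similar to expression (3). Echo suppression unit 208 multiplies Y(ω) by G(ω) to suppress the superimposed echo. Time domain transform unit 203 then converts the product Se(ω) into the time-domain signal se_1(k), which is output.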
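[0026]
Echo suppression apparatus 400_1 processes the picked-up signal from microphone 103_1. Apparatuses 400_2, …, 400_M, configured in the same way, are also provided. Apparatus 400_2 receives x_1(k), …, x_N(k) and the echo-superimposed signal from microphone 103_2, and outputs the echo-suppressed signal se_2(k). Apparatus 400_M receives x_1(k), …, x_N(k) and the echo-superimposed signal from microphone 103_M, and outputs the echo-suppressed signal se_M(k).
The echo-path propagation delays from loudspeakers 104_1, …, 104_N to microphone 103_1 may be close to those to the other microphones 103_2, …, 103_M. In that case, apparatuses 400_2, …, 400_M can omit their echo path delay estimation units 302 and delay units 303. As shown by the broken lines in the figure, apparatus 400_1 then supplies them with either the summed received signal x_A(k) from its adder 401 or the frequency-domain signal X_A(ω) from its frequency domain transform unit 201, and x_1(k), …, x_N(k) need not be supplied to them.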
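[0027]
In FIGS. 1 and 2, the echo-path propagation delay may be small enough not to significantly affect the echo coupling calculation, echo estimation, and suppression. For example, in frequency-domain processing the delay may fall within one transform frame, which is the case when the reverberation time is relatively short. The echo path delay estimation units 302 and delay units 303 may then be omitted. In that case, as shown by the broken lines, one apparatus, for example 400_1, can supply the summed received signal X_A(ω) from its frequency domain transform unit 201 to the other apparatuses 400_2, …, 400_M. Those apparatuses can then omit adder 401 and frequency domain transform unit 201, which simplifies them further.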
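[0028]
In the examples of FIGS. 1 and 2, echo suppression gain calculation units 305_1, 305_2, and 305 may omit the masking threshold Pd(ω). This works if the echo signal is estimated correctly. If the estimate contains errors, however, the resulting echo suppression gain is incorrect, and the transmitted signal se(k) may be distorted. Using Pd(ω) to tolerate residual echo below the masking threshold therefore improves call quality.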
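[0029]
FIGS. 1 and 2 process the signals in the frequency domain. Alternatively, frequency domain transform units 201_1, 201_2, 201, and 202 and time domain transform unit 203 may be omitted and bypassed, as shown by the broken lines, so that the signals are processed in the time domain. In that case, processing for the received signals x_1(k), …, x_N(k) and the echo-superimposed signal y(k) proceeds as follows:
- Short-time power ratios, for example the per-sample ratios P_y(k)/P_x1(k) and P_y(k)/P_x2(k), are computed, and their minima are updated to obtain the echo path coupling amounts.
- Each coupling amount is multiplied by x_1(k) or x_2(k) to estimate the echo signals be_1(k) and be_2(k).
- The echo suppression gains G_1 and G_2 are computed from the estimated echo powers P_be1 and P_be2 and the power P_y of y(k), using an expression similar to (3).
- y(k) is multiplied by G_1 and G_2 to obtain the echo-suppressed signal se(k).
In the configuration of FIG. 2, x_1(k), …, x_N(k) may likewise be summed, and the summed received signal x_A(k) and y(k) processed in the same way in the time domain. Frequency-domain processing naturally requires more computation or larger hardware, but it allows more accurate processing and gives better call quality.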
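The time-domain variant can be sketched sample by sample for one received channel. The text leaves the short-time power estimator open. This sketch assumes first-order recursive averaging with a hypothetical smoothing factor `alpha`, and updates the held minimum only while the reference carries energy. It also folds the "coupling times received signal" step into power terms, multiplying the coupling amount by the smoothed received power.

```python
def time_domain_suppress(x, y, alpha=0.9, eps=1e-12):
    """Sample-by-sample suppression for one received channel, no FFT.

    Short-time powers use first-order recursive averaging with a hypothetical
    smoothing factor alpha; the text does not fix the power estimator.
    Returns se(k), the echo-suppressed version of y(k).
    """
    p_x = p_y = 0.0
    coupling = float("inf")  # initial value before any minimum is held
    se = []
    for xk, yk in zip(x, y):
        p_x = alpha * p_x + (1 - alpha) * xk * xk
        p_y = alpha * p_y + (1 - alpha) * yk * yk
        if p_x > eps:  # update the held minimum only while x carries energy
            coupling = min(coupling, p_y / p_x)
        p_be = coupling * p_x if coupling != float("inf") else 0.0
        gain = max((p_y - p_be) / max(p_y, eps), 0.0)
        se.append(gain * yk)
    return se
```

Consider x = [1.0] * 6 and y = [0.5] * 5 + [2.0] with alpha = 0.5. The first five samples are pure echo with a constant coupling, so they are driven to zero. The final sample, which models a near-end burst, passes with a gain of about 0.88.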
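[0030]
As noted above, in every case the echo coupling amount calculation unit periodically clears the held minimum, at intervals matched to how often the acoustic path changes, and starts computing the echo path coupling amount afresh.
The echo suppression apparatus of the present invention can also be combined with a conventional multi-channel echo canceller based on adaptive filtering (linear processing). For example, as shown in FIG. 3, the echo-superimposed signal y(k) from microphone 103_1 can first be processed by a multi-channel adaptive-filter echo canceller 500. This canceller uses the received signals x_1(k), …, x_N(k) to remove part of the echo. Its output, a transmit signal that still contains residual echo, is then input to the echo suppression apparatus 600 of FIG. 1 or FIG. 2, which suppresses the residual echo further.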
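The adaptive stage of FIG. 3 can be sketched with a multi-channel normalized LMS (NLMS) filter. NLMS is swapped in for illustration only, since the text names no adaptive algorithm, and `taps`, `mu`, and `eps` are hypothetical parameters. The returned error e(k) is the signal that would then go to echo suppression apparatus 600.

```python
def nlms_multichannel(xs, y, taps=8, mu=0.5, eps=1e-6):
    """Multi-channel NLMS echo canceller standing in for canceller 500.

    xs: list of N received signals x_n(k); y: microphone signal y(k).
    Returns e(k) = y(k) - sum_n w_n . x_n(k), the canceller output that
    still contains residual echo. NLMS, taps, mu, and eps are assumptions:
    the text only says "adaptive filter".
    """
    weights = [[0.0] * taps for _ in xs]
    e_out = []
    for k in range(len(y)):
        # Tap-delay-line vectors [x_n(k), x_n(k-1), ...], zero before time 0.
        vecs = [[x[k - j] if k - j >= 0 else 0.0 for j in range(taps)] for x in xs]
        y_hat = sum(wi * vi for wn, vn in zip(weights, vecs)
                    for wi, vi in zip(wn, vn))
        e = y[k] - y_hat
        norm = eps + sum(vi * vi for vn in vecs for vi in vn)
        for wn, vn in zip(weights, vecs):
            for j in range(taps):
                wn[j] += mu * e * vn[j] / norm  # normalized LMS update
        e_out.append(e)
    return e_out
```

Echo that the filter cannot model, such as a tail longer than `taps` samples, remains in e(k) and is left for the suppressor to handle.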
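[0031]
The embodiments of FIGS. 1 and 2 may also be implemented as a program executed by a computer. The program may be recorded on a computer-readable recording medium such as a CD-ROM, floppy disk, or magnetic disk, or transmitted over a communication line, and is downloaded into the computer's RAM for use.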
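[0032]
[Effects of the Invention]
As described above, the present invention obtains the echo path coupling amount in a multi-channel system without detecting the single-talk state, so erroneous single-talk detection cannot degrade call quality. The echo path coupling amount is computed continuously. Double talk does not last long enough for the coupling amount to change significantly in the meantime, so good call quality is maintained.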
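[Brief Description of the Drawings]
[FIG. 1] A diagram showing the functional configuration of an embodiment of the first invention.
[FIG. 2] A diagram showing the functional configuration of an embodiment of the second invention.
[FIG. 3] A diagram showing an example in which the echo suppression apparatus of the invention is combined with an adaptive-filter echo canceller.
[FIG. 4] A diagram showing an example of a typical loudspeaker communication system.
[FIG. 5] A diagram showing the functional configuration of a conventional echo suppression apparatus that processes signals in the frequency domain.
[0001]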
BACKGROUND OF THE INVENTION
The present invention relates to a multi-channel echo suppression method, an echo suppression device, an echo suppression program, and a recording medium therefor. They suppress an echo signal in a loudspeaker communication system, such as an audio conference or TV conference, in which a plurality of speakers and at least one microphone are arranged in one acoustic space.
[0002]
[Prior art]
FIG. 4 shows an example of a loudspeaker communication system. The voice uttered by talker 110 reaches listener 111 via the transmission microphone 101, the transmission signal amplifier 105, the transmission path 109, the reception signal amplifier 108, and the reception speaker 104. Similarly, the voice uttered by talker 111 reaches listener 110 via the transmission microphone 103, the transmission signal amplifier 107, the transmission path 109, the reception signal amplifier 106, and the reception speaker 102.
[0003]
Unlike a conventional telephone system, this loudspeaker communication system does not require a handset to be held in the hand. Users can therefore talk while working, and the system supports natural face-to-face conversation. It is widely used for teleconferences, videophones, speakerphones, and the like.
However, a drawback of this system is the presence of echo. In FIG. 4, the sound transmitted from speaker 104 to listener 111 is also received by microphone 103. It then passes through transmission signal amplifier 107, transmission path 109, reception signal amplifier 106, and speaker 102, and is reproduced on the transmitting side. Talker 110 therefore hears his or her own voice reproduced from speaker 102; this is called acoustic echo. In a loudspeaker communication system, echo impairs conversation and causes discomfort.
[0004]
Further, microphone 101 receives the sound reproduced from speaker 102, which forms a closed signal loop. When the loop gain exceeds 1, howling occurs and conversation becomes impossible.
Echo cancellers are used to solve these problems of loudspeaker communication systems. An echo canceller consists of an adaptive filter unit, a nonlinear echo suppression processing unit, or a combination of the two.
Here, nonlinear echo suppression processing means any echo suppression processing other than the adaptive filter (linear processing), such as a voice switch or a center clipper. For details on adaptive filters and nonlinear echo suppression processing, see "Echo Canceller Technology" (supervised by Shigeo Tsujii, Japan Industrial Technology Center, 1986). ITU Recommendations P201, P204, G165, G167, and others also present configurations and required performance. With these processes, conventional teleconference systems, videophones, speakerphones, and the like used in relatively quiet environments could guarantee sufficient call quality.
[0005]
In recent years, however, loudspeaker communication has spread to new settings. Examples are hands-free calls in noisy cars, desktop video conferences over packet networks with large transmission delays, and remote lectures in highly reverberant lecture halls. In such environments, it is difficult to guarantee sufficient call quality with the conventional adaptive filter and nonlinear echo suppression processing.
Under high noise, it is generally difficult for an adaptive filter to estimate the impulse response of the echo path accurately enough. The filter coefficient length (tap length) of the adaptive filter is set based on the reverberation time of the echo path. A long reverberation time therefore requires a longer tap length, which slows convergence and increases the hardware scale of the apparatus. Further, when the transmission delay is large, the echo becomes more audible, and residual echo that the adaptive filter alone cannot cancel degrades call quality.
[0006]
Nonlinear echo suppression processing, on the other hand, can suppress echo strongly and robustly, for example by controlling the insertion loss. However, if transmitted voice is present at the same time as the echo, both are suppressed without distinction, so the transmitted voice is distorted or interrupted. In other words, nonlinear echo suppression processing degrades call quality during two-way simultaneous talk (double talk).
In response to these problems, Japanese Patent Laid-Open No. 11-331046 proposes a frequency domain echo suppression method, briefly described here with reference to FIG. 5. Fast Fourier transform units 201 and 202 transform the received signal x(k) and the transmission signal (echo superimposed signal) y(k) from microphone 103, respectively, to obtain the short-time spectra X(ω) and Y(ω). The received signal x(k) and the transmission signal y(k) are input to the one-side utterance state detection unit 204, which detects whether only the received signal x(k) is present. In that one-side utterance state, the transmission signal spectrum is the echo signal spectrum. The echo path coupling amount calculation circuit 205 then calculates the estimated echo path coupling amount PHe(ω) = PY(ω)/PX(ω) from the powers PX(ω) and PY(ω) (= PE(ω)) of the received signal spectrum and the transmission signal spectrum.
[0007]
The echo signal power calculator 206 multiplies the received signal power PX(ω) by the estimated echo path coupling amount PHe(ω) to calculate the predicted echo signal power PEe(ω). Using PEe(ω) and the short-time spectra X(ω) and Y(ω) of the received and transmission signals, the echo suppression gain determination unit 207 calculates the echo suppression gain G(ω), based on the proportion of the echo signal b(k) superimposed on the transmission signal y(k). The echo suppression unit 208 multiplies the transmission signal (echo superimposed signal) spectrum Y(ω) by G(ω) to obtain the echo-suppressed processed signal spectrum Se(ω). The inverse fast Fourier transform unit 203 then inverse-transforms Se(ω) into a time signal se(k), in which the echo signal b(k) is suppressed and the transmitted signal s(k) is enhanced.
[0008]
With this method, only the echo signal b(k) is suppressed in the transmission signal y(k), so the transmitted signal s(k) alone is emphasized and sent to the other party. In other words, although this is nonlinear echo suppression processing, it can suppress only the echo signal b(k) without interrupting the transmitted signal s(k), even during double talk.
[0009]
[Problems to be solved by the invention]
As described above, the conventional method can calculate the echo path coupling amount only after the one-side utterance state detection unit has found the state in which only the far-end signal is being received. If this state is erroneously detected, the echo suppression gain G(ω) becomes inaccurate, and call quality may deteriorate. Moreover, the conventional method does not address echo cancellation in multi-channel systems. Examples are a system with a plurality of speakers and one microphone in one acoustic space, and a system with a plurality of speakers and a plurality of microphones in one acoustic space, as in a stereo audio conference.
[0010]
An object of the present invention is to provide a multi-channel echo suppression method, apparatus, program, and recording medium for multi-channel systems that carry no risk of call quality deteriorating due to erroneous detection of the one-side utterance state.
[0011]
[Means for Solving the Problems]
The first aspect of the present invention handles received signals of N channels (N is an integer of 2 or more) as follows:
- At each predetermined time, the power ratio of the collected sound signal to the received signal of each channel is calculated.
- For each channel, the minimum of the sequentially calculated power ratios is taken as the echo path coupling amount of that channel's received signal.
- Each echo path coupling amount is multiplied by the received signal of the corresponding channel to estimate N echo signals.
- The power of each of these N echo signals is subtracted from the power of the collected sound signal. The N resulting powers are each normalized by the power of the collected sound signal, giving N echo suppression gains.
- The collected sound signal is multiplied by these N echo suppression gains to suppress the echo.
[0012]
The second aspect of the invention also handles received signals of N channels (N is an integer of 2 or more), but uses one added received signal:
- The received signals of the N channels are added into one added received signal.
- At each predetermined time, the power ratio of the collected sound signal to the added received signal is calculated.
- The minimum of these sequentially calculated ratios is taken as the echo path coupling amount.
- The added received signal is multiplied by the echo path coupling amount to estimate the echo signal.
- The echo signal power is subtracted from the power of the collected sound signal, and the result is normalized by the power of the collected sound signal to give an echo suppression gain.
- The collected sound signal is multiplied by this gain to suppress the echo.
[0013]
In both the first and second inventions, the echo path coupling amount of a multi-channel system is obtained without detecting the one-side utterance state. There is therefore no risk that erroneous detection of that state will degrade call quality.
Obtaining the echo path coupling amount by holding the minimum value causes no practical problem. It has been found that, in actual use of the device, the double-talk state virtually never lasts long enough for the echo path coupling amount to change significantly while it continues. Because the echo path is non-stationary in time, the held minimum need only be cleared and returned to its initial value after a certain time. This reset interval is set according to the usage environment, that is, according to how often the echo path changes.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a first embodiment, in which a plurality of speakers and one microphone are arranged in the same acoustic space. The example in FIG. 1 uses two speakers, 104_1 and 104_2.
Speaker 104_1 reproduces the received signal x_1(k) of the first channel, and speaker 104_2 reproduces the received signal x_2(k) of the second channel. The reproduced sounds propagate through echo paths 601_1 and 601_2, respectively, and microphone 103 receives them as echo signals b_1(k) and b_2(k). In b_1(k), b_2(k), and so on, k is an integer representing discrete time. Microphone 103 also receives the voice of the near-end talker as the transmission signal s(k). It therefore outputs an echo superimposed signal (collected sound signal) y(k) in which the echo signals b_1(k), b_2(k) and the transmission signal s(k) are superimposed.
[0015]
The echo superimposed signal y(k) is input to echo path delay estimation units 302_1 and 302_2, which also receive the received signals x_1(k) and x_2(k), respectively. Each unit calculates the correlation coefficient (cross-correlation function) between its received signal and y(k). From the delay at which the correlation is maximum, it estimates the propagation delay of its echo path: ΔT_1 for echo path 601_1 and ΔT_2 for echo path 601_2.
The estimated delay amounts ΔT_1 and ΔT_2 are set in delay units 303_1 and 303_2, which delay the received signals x_1(k) and x_2(k) by the set amounts. Each delay unit can be configured, for example, as an FIR filter in which the filter coefficient corresponding to the delay time (amount) is set to 1 and the weights of the other taps are set to 0.
[0016]
In this embodiment, the signals are converted into the frequency domain and processed. Frequency domain conversion units 201_1 and 201_2 convert the delayed received signals x_1(k) and x_2(k), for example by fast Fourier transform (FFT), into short-time spectra (frequency domain signals) X_1(ω) and X_2(ω). Similarly, frequency domain conversion unit 202 converts the echo superimposed signal y(k) into the short-time spectrum Y(ω). Here ω represents angular frequency. The short time is preferably about 32 ms to 64 ms, corresponding to 256 to 512 taps at 8 kHz sampling. Subjective evaluation experiments confirmed that this range gives good processing quality.
[0017]
Echo coupling amount calculation unit 304_1 receives the received signal X_1(ω) from frequency domain conversion unit 201_1 and the echo superimposed signal Y(ω) from frequency domain conversion unit 202. It updates the minimum of the power ratio of Y(ω) to X_1(ω), and outputs that minimum as the echo path coupling amount of echo path 601_1.
The echo path coupling amount calculation performed by unit 304_1 is described in detail below.
[0018]
The echo path coupling amount is the power ratio (Py_1/Px_1) between the signal input to echo path 601_1 and the signal that has propagated through it. Taking the signal delay into account, it is the power ratio of the echo signal b_1(k), which is the output signal, to the received signal x_1(k) delayed by delay unit 303_1. Similarly, the echo path coupling amount of echo path 601_2 is Py_2/Px_2.
However, the echo signal b_1(k) cannot be extracted independently. The echo path coupling amount is therefore calculated from the delay-compensated received signal x_1(k) and the echo superimposed signal y(k); in this example, from X_1(ω) and Y(ω). At every predetermined period, for example every FFT frame, the unit calculates the ratio of the echo superimposed signal power P_Y(ω) to the received signal power P_X1(ω). It compares this ratio with the previously held ratio and keeps the smaller one as the echo path coupling amount. In other words, unit 304_1 holds the smallest value of P_Y(ω)/P_X1(ω) acquired so far. Unit 304_2 similarly obtains the echo path coupling amount of echo path 601_2 from the power ratio P_Y(ω)/P_X2(ω) of Y(ω) to the received signal X_2(ω).
[0019]
Echo estimation unit 310_1 multiplies the received signal X_1(ω) by the echo path coupling amount P_Y(ω)/P_X1(ω) from unit 304_1 to estimate the echo signal B_e1(ω) from echo path 601_1. Similarly, echo estimation unit 310_2 multiplies the received signal X_2(ω) by the echo path coupling amount P_Y(ω)/P_X2(ω) from unit 304_2 to estimate the echo signal B_e2(ω) from echo path 601_2.
Echo suppression gain calculation unit 305_1 calculates the echo suppression gain G_1(ω) from the echo superimposed signal (collected sound signal) Y(ω) and the estimated echo signal B_e1(ω). In this example, G_1(ω) is chosen so that the estimated echo signal B_e1(ω) is suppressed below the masking threshold formed by the other audible signals.
[0020]
The echo suppression gain calculation performed by echo suppression gain calculation unit 305_1 is described below.
Unit 305_1 calculates an echo suppression gain G_1 that satisfies the following expression (1).

P_d(ω) indicates the masking threshold (level) formed by audible signals other than the echo signal (whose power is P_B1(ω)), taken in the frequency domain. Examples are the received signal x(k) on the sender side, the transmission signal s(k) on the receiver side, ambient noise, and line noise. The masking threshold P_d(ω) can be calculated from the masking threshold due to noise. The masking threshold of the transmission signal s(k), however, cannot be estimated every predetermined period. Its long-term characteristic is therefore estimated in advance from a subjective evaluation of the desired echo suppression gain.
[0021]
There are various methods of calculating an echo suppression gain G_1 that satisfies this relation; this embodiment uses a solution based on Wiener filtering.
Expression (1) can be written as expression (2):
$$P_{se}(\omega) + P_d(\omega) = G_1(\omega)\,\bigl(P_s(\omega) + P_{B1}(\omega)\bigr) \qquad (2)$$
To solve expression (2) for the echo suppression gain G_1, substitute P_se(ω) = P_Y(ω) − P_Be1(ω) and use the approximation P_s(ω) + P_B1(ω) ≈ P_Y(ω). This gives
$$G_1(\omega) = \frac{P_Y(\omega) - P_{Be1}(\omega) + P_d(\omega)}{P_Y(\omega)} \qquad (3)$$
[0022]
In expression (3), P_Be1(ω) is the power of the estimated echo signal B_e1(ω). P_se(ω) is the power of the estimated speech signal Se(ω), obtained by subtracting the power of B_e1(ω) from the power of the echo superimposed signal Y(ω). In other words, the echo suppression gain G_1(ω) is calculated in three steps:
1. Subtract the masking threshold P_d(ω) from the estimated echo power P_Be1(ω).
2. Subtract this value from the power P_Y(ω) of the echo superimposed signal (collected sound signal).
3. Normalize the result by P_Y(ω).
Similarly, echo suppression gain calculation unit 305_2 calculates the echo suppression gain G_2(ω) by expression (4), from the estimated echo signal B_e2(ω) and the echo superimposed signal (collected sound signal) Y(ω).
[0023]
$$G_2(\omega) = \frac{P_Y(\omega) - P_{Be2}(\omega) + P_d(\omega)}{P_Y(\omega)} \qquad (4)$$
In this example, the echo suppression gains G_1(ω) and G_2(ω) calculated by units 305_1 and 305_2 are applied in cascade. Echo suppression unit 208_1 first multiplies the echo superimposed signal Y(ω) by G_1(ω) to suppress the echo signal B_1(ω). Echo suppression unit 208_2 then multiplies the output of unit 208_1 by G_2(ω) to suppress the echo signal B_2(ω). Time domain conversion unit 203 converts the resulting signal Se(ω) into a time domain signal, for example by inverse fast Fourier transform (IFFT), and outputs it. In this output signal se(k), the echo signals b_1(k) and b_2(k) of y(k) are suppressed, so se(k) is as close as possible to the transmission signal s(k) input to microphone 103. Echo suppression units 208_1 and 208_2 may be arranged in either order, with either one nearer to microphone 103.
[0024]
FIG. 2 shows an embodiment of the second invention; parts corresponding to those in FIG. 1 have the same reference numerals. Here, a plurality of speakers 104_1 to 104_N and a plurality of microphones 103_1 to 103_M are provided in the same acoustic space. The configuration of FIG. 1 needs a separate set of processing units for each channel: a frequency domain conversion unit, echo coupling amount calculation unit, echo estimation unit, echo suppression gain calculation unit, and echo suppression unit. As the number of channels increases, the hardware therefore becomes larger, and even a software implementation takes longer to process. The embodiment shown in FIG. 2 improves on this point. The figure shows the configuration that suppresses echo in the echo superimposed signal (collected sound signal) y(k) from microphone 103_1.
[0025]
Using the received signals x₁(k) to x_N(k) of the channels and the echo superimposed signal (sound pickup signal) y(k), echo path delay estimation units 302₁ to 302_N estimate the delay (time) of each echo path from speakers 104₁ to 104_N to microphone 103₁, and delay units 303₁ to 303_N delay the received signals x₁(k) to x_N(k), respectively, by the estimated delay times.
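The paragraph does not say how units 302₁ to 302_N estimate the delay; claim 5 states that it is obtained from the cross-correlation between each received signal and the collected sound signal. A minimal sketch under that reading picks the lag with the largest cross-correlation within a hypothetical search range `max_lag`, and models a delay unit 303 as zero padding. Both function names are illustrative.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag d in [0, max_lag] maximizing sum_k x[k - d] * y[k]."""
    best_lag, best_val = 0, float("-inf")
    for d in range(max_lag + 1):
        # Cross-correlation of received signal x with pickup signal y at lag d
        val = sum(x[k - d] * y[k] for k in range(d, len(y)) if k - d < len(x))
        if val > best_val:
            best_lag, best_val = d, val
    return best_lag


def delay_signal(x, d):
    """Delay x by d samples, zero-filling the start and keeping the original length."""
    return ([0.0] * d + list(x))[: len(x)]
```

If y(k) is x(k) delayed by three samples and scaled by 0.5, `estimate_delay(x, y, 5)` returns 3. A real echo path spreads energy over many lags, so the peak lag approximates only the dominant (usually direct-path) delay.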
The received signals delayed by delay units 303₁ to 303_N are summed by adder 401, and the summed signal x_A(k) is converted into the frequency-domain signal X_A(ω) by frequency domain conversion unit 201. The signal from microphone 103₁ is likewise converted into the frequency-domain signal Y(ω) by frequency domain conversion unit 202. Echo coupling amount calculation unit 304 calculates the short-time power ratio P_Y(ω)/P_XA(ω) of these frequency-domain signals X_A(ω) and Y(ω), updates the held minimum, and outputs the minimum as the echo path coupling amount. Echo estimation unit 310 multiplies the added received signal X_A(ω) by this echo path coupling amount to estimate the echo signal. Using the estimated echo signal and the echo superimposed signal Y(ω), echo suppression gain calculation unit 305 calculates the echo suppression gain G(ω), for example by a calculation similar to equation (3). Echo suppression unit 208 multiplies the echo superimposed signal Y(ω) by G(ω) to suppress the superimposed echo signal, and the product Se(ω) is converted by time domain conversion unit 203 into the time-domain signal se₁(k) and output.
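A sketch of the FIG. 2 signal path, under stated assumptions: adder 401 sums the delayed received signals sample by sample, and unit 304 keeps, per frequency bin, the running minimum of the short-time power ratio P_Y(ω)/P_XA(ω). The text says unit 310 "multiplies" X_A(ω) by the coupling amount; here that is interpreted as acting on powers, so the estimated echo power is H(ω)·P_XA(ω). The class and method names and the `eps` guard are hypothetical.

```python
INF = float("inf")


def add_channels(signals):
    """Adder 401: sample-wise sum of the delayed received signals x_n(k)."""
    return [sum(samples) for samples in zip(*signals)]


class EchoCouplingTracker:
    """Per-bin running minimum of P_Y(w) / P_XA(w), i.e. the echo path coupling amount."""

    def __init__(self, n_bins, eps=1e-12):
        self.coupling = [INF] * n_bins
        self.eps = eps  # hypothetical guard: bins without far-end energy are skipped

    def update(self, p_xa, p_y):
        """Fold one short-time frame of powers into the held minimum."""
        for i, (px, py) in enumerate(zip(p_xa, p_y)):
            if px > self.eps:
                self.coupling[i] = min(self.coupling[i], py / px)
        return list(self.coupling)

    def estimate_echo_power(self, p_xa):
        """Echo estimate on powers (assumption): P_Be(w) = H(w) * P_XA(w)."""
        return [0.0 if h == INF else h * px for h, px in zip(self.coupling, p_xa)]
```

With a two-bin tracker, a far-end-only frame (P_XA = [1.0, 2.0], P_Y = [0.5, 1.0]) sets the coupling to [0.5, 0.5]; a double-talk frame (P_Y = [3.0, 4.0]) raises the ratio but leaves the held minimum unchanged; a quieter-echo frame (P_XA = [2.0, 1.0], P_Y = [0.5, 0.25]) lowers it to [0.25, 0.25], after which `estimate_echo_power([4.0, 2.0])` gives [1.0, 0.5]. The minimum is what makes the estimate insensitive to near-end speech without a talk-state detector.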
[0026]
Echo suppression devices 400₂ to 400_M, each with the same configuration as echo suppression device 400₁ for the echo superimposed signal (sound pickup signal) from microphone 103₁, are also provided. Echo suppression device 400₂ receives the received signals x₁(k) to x_N(k) and the signal from microphone 103₂ and outputs the echo-suppressed signal se₂(k); likewise, echo suppression device 400_M receives x₁(k) to x_N(k) and the signal from microphone 103_M and outputs the echo-suppressed signal se_M(k).
If the propagation delay of each echo path from speakers 104₁ to 104_N to microphone 103₁ can be taken as an approximation of the propagation delay of the echo paths from those speakers to the other microphones 103₂ to 103_M, the echo path delay estimation unit 302 and delay unit 303 can be omitted from echo suppression devices 400₂ to 400_M. In that case, as shown by the broken line in the figure, the added received signal x_A(k) of adder 401, or the frequency-domain signal X_A(ω) of frequency domain conversion unit 201, in echo suppression device 400₁ is supplied to echo suppression devices 400₂ to 400_M, and the received signals x₁(k) to x_N(k) need not be supplied to them.
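When the delays can be shared in this way, devices 400₂ to 400_M need only the shared X_A(ω). A minimal sketch, assuming per-bin power lists and one running-minimum list per microphone, computes P_XA(ω) once and reuses it for every microphone; `process_microphones` is a hypothetical name, the masking threshold is omitted, and the clamp to [0, 1] is an assumption.

```python
INF = float("inf")


def process_microphones(p_xa, p_y_per_mic, coupling_per_mic):
    """One frame for M microphones sharing a single added received spectrum.

    p_xa:             power spectrum P_XA(w), computed once (adder 401 + unit 201 of 400_1)
    p_y_per_mic:      M power spectra P_Y(w), one per microphone 103_m
    coupling_per_mic: M per-bin running minima, updated in place
    Returns one gain vector G(w) per microphone.
    """
    gains_per_mic = []
    for p_y, coupling in zip(p_y_per_mic, coupling_per_mic):
        gains = []
        for i, (px, py) in enumerate(zip(p_xa, p_y)):
            if px > 0.0:
                coupling[i] = min(coupling[i], py / px)  # per-microphone minimum
            p_be = 0.0 if coupling[i] == INF else coupling[i] * px
            g = (py - p_be) / py if py > 0.0 else 1.0
            gains.append(min(max(g, 0.0), 1.0))  # clamp: an assumption
        gains_per_mic.append(gains)
    return gains_per_mic
```

Only the adder and the forward transform are shared; each microphone still keeps its own coupling amount, since the echo path coupling to each microphone differs.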
[0027]
In FIGS. 1 and 2, echo path delay estimation unit 302 and delay unit 303 may be omitted when the propagation delay of the echo path does not significantly affect the echo suppression processing in the echo coupling amount calculation and echo estimation, for example when the reverberation time is relatively short and falls within the frame length of the frequency-domain signal. When they are omitted, as shown by the broken line, the added received signal X_A(ω) from frequency domain conversion unit 201 of one echo suppression device, for example 400₁, is supplied to the other echo suppression devices 400₂ to 400_M, and these devices can be simplified by omitting their adder 401 and frequency domain conversion unit 201.
[0028]
In the examples of FIGS. 1 and 2, echo suppression gain calculation units 305₁, 305₂, and 305 may omit the masking threshold P_d(ω). In that case, however, the echo signal must be estimated correctly: if the estimate contains an error, the resulting echo suppression gain is incorrect, and the transmitted signal se(k) may be distorted. Using the masking threshold P_d(ω) tolerates residual echo below the masking threshold, which improves call quality.
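A worked example with hypothetical numbers illustrates the point. Assume near-end speech power 1.0 and true echo power 1.0 in one bin, with the powers adding (uncorrelated signals), so P_Y = 2.0. Suppose the echo power is overestimated as P_Be = 1.5, and take P_d = 0.5 as an illustrative masking threshold:

```latex
\begin{aligned}
G_{\text{no mask}} &= \frac{P_Y - P_{Be}}{P_Y} = \frac{2.0 - 1.5}{2.0} = 0.25 \\
G_{\text{mask}}    &= \frac{P_Y - P_{Be} + P_d}{P_Y} = \frac{2.0 - 1.5 + 0.5}{2.0} = 0.5 \\
G_{\text{exact}}   &= \frac{P_Y - P_B}{P_Y} = \frac{2.0 - 1.0}{2.0} = 0.5
\end{aligned}
```

In this example the overestimate halves the gain from 0.5 to 0.25 without masking, attenuating near-end speech in that bin, while the masking term brings the gain back to the value an exact estimate would give. The numbers are chosen for illustration and are not from the source.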
[0029]
In FIGS. 1 and 2, the signals are converted into the frequency domain and processed. Alternatively, frequency domain conversion units 201₁, 201₂, 201, and 202 and time domain conversion unit 203 may be omitted, bypassed as shown by the broken lines in the drawings, and the processing carried out on the time-domain signals directly. That is, for the received signals x₁(k) to x_N(k) and the echo superimposed signal y(k), the power ratios P_y(k)/P_x1(k), P_y(k)/P_x2(k), and so on are calculated every predetermined time, for example every sample; their minimum values are updated to give each echo path coupling amount; each echo path coupling amount is multiplied by the corresponding received signal x₁(k), x₂(k) to estimate the echo signals be₁(k), be₂(k); the echo suppression gains G₁, G₂ are calculated from the powers P_be1, P_be2 of the estimated echo signals and the power P_y of the echo superimposed signal y(k) by an equation similar to equation (3); and y(k) is multiplied by G₁, G₂ to obtain the echo-suppressed signal se(k). In the case of FIG. 2 as well, the received signals x₁(k) to x_N(k) may be summed and the added received signal x_A(k) and the echo superimposed signal y(k) processed in the time domain in the same way. Naturally, processing with conversion to the frequency domain increases the amount of processing or the hardware scale, but it allows more accurate processing and improves call quality.
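A sample-by-sample sketch of the time-domain variant follows. Smoothing the powers with factor `alpha`, the `eps` guard, and clamping each gain to [0, 1] are assumptions not found in the text, and the masking threshold is omitted. The per-channel gains are multiplied together, mirroring the cascade of echo suppression units in FIG. 1.

```python
INF = float("inf")


def time_domain_suppress(x_channels, y, alpha=0.9, eps=1e-12):
    """Echo suppression on time-domain samples, without any frequency transform.

    x_channels: N received signals x_n(k), each as long as y
    y:          echo superimposed signal y(k)
    Powers are exponentially smoothed with factor alpha (an assumption; the
    text only says the ratio is formed every predetermined time, e.g. per sample).
    """
    n_ch = len(x_channels)
    p_x = [0.0] * n_ch
    coupling = [INF] * n_ch
    p_y = 0.0
    out = []
    for k, yk in enumerate(y):
        p_y = alpha * p_y + (1.0 - alpha) * yk * yk
        total_gain = 1.0
        for c in range(n_ch):
            xk = x_channels[c][k]
            p_x[c] = alpha * p_x[c] + (1.0 - alpha) * xk * xk
            if p_x[c] > eps:
                coupling[c] = min(coupling[c], p_y / p_x[c])  # running minimum
            p_be = 0.0 if coupling[c] == INF else coupling[c] * p_x[c]
            g = (p_y - p_be) / p_y if p_y > eps else 1.0
            total_gain *= min(max(g, 0.0), 1.0)  # cascade G_1 * G_2 * ...
        out.append(total_gain * yk)
    return out
```

With no far-end signal the gains stay at 1 and y(k) passes unchanged; with a pure echo y(k) = 0.5·x(k), the held coupling becomes 0.25 and the output is driven to zero.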
[0030]
In any of the above cases, the echo coupling amount calculation unit periodically clears the held minimum value, at intervals matched to how often the acoustic path changes, and calculates the echo path coupling amount anew.
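The periodic clearing can be sketched as a running minimum that forgets its value every `reset_interval` frames. The interval and the class name are hypothetical; the text only says the clearing period follows how often the acoustic path changes.

```python
class ResettingMinimum:
    """Running minimum of a per-frame power ratio, cleared every reset_interval frames."""

    def __init__(self, reset_interval):
        self.reset_interval = reset_interval  # hypothetical: tied to how often the path changes
        self.frames_since_reset = 0
        self.value = float("inf")

    def update(self, ratio):
        """Fold one frame's power ratio in, clearing first if the period has elapsed."""
        if self.frames_since_reset == self.reset_interval:
            self.value = float("inf")  # forget the coupling of the old acoustic path
            self.frames_since_reset = 0
        self.value = min(self.value, ratio)
        self.frames_since_reset += 1
        return self.value
```

For per-frame ratios [0.2, 0.2, 0.2, 0.5, 0.5, 0.5, 0.5] with `reset_interval=3`, the held value is 0.2 for the first three frames and 0.5 afterwards, so a louder echo path is picked up after one reset; without clearing it would stay at 0.2 and the echo would be underestimated. The trade-off is that double talk right after a reset can briefly raise the held value.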
The echo suppression apparatus of this invention can be used in combination with a conventional adaptive filter (linear processing) type multi-channel echo canceller. For example, as shown in FIG. 3, the echo superimposed signal y(k) from microphone 103₁ and the received signals x₁(k) to x_N(k) are processed by multi-channel adaptive filter type echo canceller 500 to cancel the echo signal to some extent, and the transmission signal containing the residual echo signal is input to echo suppression device 600, shown in FIG. 1 or FIG. 2, which further suppresses the residual echo.
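The FIG. 3 arrangement can be sketched with a normalized LMS (NLMS) adaptive filter standing in for canceller 500. NLMS is a swapped-in choice: the text does not specify the adaptation algorithm. The residual it returns is what would be fed to device 600. Filter length, step size, and the synthetic two-channel test signal in `demo_residual_ratio` are hypothetical.

```python
import random


def nlms_multichannel(x_channels, y, taps=8, mu=0.5, eps=1e-6):
    """Multi-channel NLMS echo canceller standing in for canceller 500.

    x_channels: list of N received-signal sample lists, each as long as y
    y:          picked-up signal y(k) containing the echo
    Returns the residual e(k) = y(k) - sum_n w_n . x_n(k), i.e. the
    transmission signal that still contains residual echo.
    """
    n_ch = len(x_channels)
    w = [[0.0] * taps for _ in range(n_ch)]    # one FIR filter per channel
    buf = [[0.0] * taps for _ in range(n_ch)]  # newest sample first
    residual = []
    for k, yk in enumerate(y):
        for c in range(n_ch):
            buf[c] = [x_channels[c][k]] + buf[c][:-1]
        y_hat = sum(wi * xi for c in range(n_ch) for wi, xi in zip(w[c], buf[c]))
        e = yk - y_hat
        # Normalize the step by the total input energy across all channels
        norm = eps + sum(xi * xi for c in range(n_ch) for xi in buf[c])
        for c in range(n_ch):
            w[c] = [wi + mu * e * xi / norm for wi, xi in zip(w[c], buf[c])]
        residual.append(e)
    return residual


def demo_residual_ratio(n=2000, seed=0):
    """Residual-to-echo energy ratio over the last 200 samples for a
    synthetic two-channel echo path (hypothetical test signal)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Echo: 0.5 * x1 delayed 1 sample + 0.3 * x2 delayed 2 samples
    y = [0.5 * (x1[k - 1] if k >= 1 else 0.0) + 0.3 * (x2[k - 2] if k >= 2 else 0.0)
         for k in range(n)]
    e = nlms_multichannel([x1, x2], y)
    return sum(v * v for v in e[-200:]) / sum(v * v for v in y[-200:])
```

In this idealized, noise-free setup the residual becomes very small; in a real room, the part the linear filter cannot model is the residual echo that device 600 is meant to suppress.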
[0031]
The embodiments shown in FIGS. 1 and 2 may also be implemented by having a computer execute a program. In this case, the program is recorded on a computer-readable recording medium such as a CD-ROM, floppy disk, or magnetic disk, or is downloaded via a communication line into the RAM of the computer.
[0032]
[Effect of the Invention]
As described above, according to this invention, in a multi-channel system the echo path coupling amount can be obtained without detecting the one-side utterance (single-talk) state, so call quality does not deteriorate because of false detection of that state. Although the echo path coupling amount is calculated continuously, it is held as the minimum of the power ratio, so it does not change greatly even when both-side utterances (double talk) continue for a long time, and good speech quality is maintained.
[Brief description of the drawings]
FIG. 1 is a diagram showing a functional configuration of an embodiment of the first invention.
FIG. 2 is a diagram showing a functional configuration of an embodiment of a second invention.
FIG. 3 is a diagram showing an example in which the echo suppression device according to the present invention is used in combination with an adaptive filter type echo canceller.
FIG. 4 is a diagram showing an example of a general loudspeaker (hands-free) communication system.
FIG. 5 is a diagram showing the functional configuration of a conventional echo suppression device that performs processing in the frequency domain.

Claims

1. A multi-channel echo suppression method comprising:
calculating, every predetermined time, the power ratio of a one-channel collected sound signal to each of the received signals of N channels (N is an integer of 2 or more), and holding, for each channel, the minimum value of the sequentially calculated power ratio as the echo path coupling amount of the received signal of that channel;
multiplying the received signal of each channel by the echo path coupling amount of that channel to estimate N echo signals;
normalizing, by the power of the collected sound signal, the power obtained by subtracting the power of each of these N echo signals from the power of the collected sound signal, to calculate N echo suppression gains; and
multiplying the collected sound signal by these N echo suppression gains.

2. The multi-channel echo suppression method according to claim 1, wherein the received signal of each channel and the collected sound signal are each converted into a short-time spectrum;
the echo path coupling amount, the N echo signals, and the echo suppression gains are each those of the short-time spectrum; the multiplication of the collected sound signal by the echo suppression gain is a multiplication of the short-time spectrum of the collected sound signal by the short-time-spectrum echo suppression gain; and the spectrum resulting from the multiplication is converted into a time-domain signal and output.

3. A multi-channel echo suppression method comprising:
adding the received signals of N channels (N is an integer of 2 or more) across the channels;
calculating, every predetermined time, the power ratio between the added received signal and a one-channel collected sound signal, and holding the minimum value of the sequentially calculated power ratio as the echo path coupling amount;
estimating the echo signal by multiplying the added received signal by the echo path coupling amount;
normalizing, by the power of the collected sound signal, the power value obtained by subtracting the power of the echo signal from the power of the collected sound signal, to calculate an echo suppression gain; and
multiplying the collected sound signal by the echo suppression gain.

4. The multi-channel echo suppression method according to claim 3, wherein the added received signal and the collected sound signal are each converted into a short-time spectrum;
the echo path coupling amount, the echo signal, and the echo suppression gain are each those of the short-time spectrum, and the multiplication of the echo suppression gain and the collected sound signal is a multiplication of the short-time-spectrum echo suppression gain and the short-time spectrum of the collected sound signal; and
the spectrum resulting from the multiplication is converted into a time-domain signal.

5. The multi-channel echo suppression method, wherein the echo path propagation delay amount of each channel is estimated by obtaining the cross-correlation between the received signal of that channel and the collected sound signal;
the received signal of each channel is delayed according to its estimated echo path propagation delay amount, and the delayed received signal is used as the received signal.

6. The multi-channel echo suppression method according to claim 1, wherein, in calculating the echo suppression gain, the power value obtained by subtracting a masking threshold from the power of the echo signal is used as the power of the echo signal.

7. A multi-channel echo suppression device that receives the received signals of N channels (N is an integer of 2 or more) and one collected sound signal and outputs a signal in which the echo component due to the received signals is suppressed from the collected sound signal, the device comprising:
N echo coupling amount calculation units, each of which receives the received signal of one channel and the collected sound signal, calculates the power ratio of the collected sound signal to that received signal every predetermined time, holds the minimum value of the power ratio sequentially calculated for that channel, and outputs it as the echo path coupling amount of that channel;
N echo estimation units, each of which receives the echo path coupling amount and the received signal of one channel, multiplies them, and outputs an echo signal;
N echo suppression gain calculation units, each of which receives an echo signal from the N echo estimation units and the collected sound signal, normalizes by the power of the collected sound signal the power value obtained by subtracting the power of that echo signal from the power of the collected sound signal, and outputs an echo suppression gain; and
an echo suppression unit that receives the echo suppression gains from the N echo suppression gain calculation units and the collected sound signal, multiplies the collected sound signal by the echo suppression gains, and outputs the signal in which the echo component is suppressed.

8. A multi-channel echo suppression device that receives the received signals of N channels (N is an integer of 2 or more) and one collected sound signal and outputs a signal in which the echo component due to the received signals is suppressed from the collected sound signal, the device comprising:
an adder that receives the N-channel received signals, adds them across the channels, and outputs the sum as one added received signal;
an echo coupling amount calculation unit that receives the added received signal and the collected sound signal, calculates their power ratio every predetermined time, holds the minimum value of the sequentially calculated power ratio, and outputs it as the echo path coupling amount;
an echo estimation unit that receives the echo path coupling amount and the added received signal, multiplies them, and outputs an echo signal;
an echo suppression gain calculation unit that receives the echo signal and the collected sound signal, normalizes by the power of the collected sound signal the power value obtained by subtracting the power of the echo signal from the power of the collected sound signal, and outputs an echo suppression gain; and
an echo suppression unit that receives the echo suppression gain and the collected sound signal, multiplies them, and outputs a signal in which the echo component is suppressed.

9. A program for causing a computer to execute the method according to claim 1.

10. A computer-readable recording medium on which the program according to claim 9 is recorded.