JP3573241B2 - Echo canceling method and apparatus

Info

Publication number: JP3573241B2
Application number: JP5954997A
Authority: JP
Inventors: 澄宇阪内; 昭二牧野; 陽一羽田; 順治小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-09-20
Filing date: 1997-03-13
Publication date: 2004-10-06
Anticipated expiration: 2017-03-13
Also published as: JPH10150343A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えば２線４線変換系や拡声通話系などにおいてハウリングの原因および聴覚上の障害となる反響信号を消去または抑圧する反響消去方法および装置に関する。
【０００２】
【従来の技術】
まず、このようなハウリングの原因および聴覚上の障害となる反響信号について図８に示す拡声通話系を参照して説明する。
【０００３】
図８において、１，３は送話用マイクロホン、２，４は受話スピーカ、５，７は送話信号増幅器、６，８は受話信号増幅器、９は伝送路、１０は送話者、１１は受話者をそれぞれ表す。送話者１０の発声した送話音声は、送話用マイクロホン１、送話信号増幅器５、伝送路９、受話信号増幅器８、受話スピーカ４を経て受話者１１に伝わる。この拡声通話系は、従来の電話通話系のように送受話器を手に持つ必要がないため、作業をしながらの通話が可能であったり、また自然な対面通話が実現できるという長所を持ち、通信会議やテレビ電話、拡声電話機などに広く利用が進められている。
【０００４】
しかしながら、この通話系の欠点として、反響の存在が問題となっている。すなわち、図８において、スピーカ４から受話側に伝わった音声が、マイクロホン３で受音され、送話信号増幅器７、伝送路９、受話信号増幅器６、スピーカ２を経て送話側に再生される。送話者１０にとって、この現象は、自分の発声した音声が、スピーカ２から再生されるという反響現象であり、音響エコーなどと呼ばれている。この反響現象は、拡声通話系において通話の障害や不快感などの悪影響を生じる。更に、スピーカ２から再生された音は、マイクロホン１で受音されて信号の閉ループを形成する。そして、ループゲインが１より大きい場合にはハウリング現象が発生して、通話は不能となる。
【０００５】
このような拡声通話系の問題点を克服するために、反響消去装置が利用されている。反響消去装置の代表的な構成法としては、フルバンド方式とサブバンド方式が知られている。
【０００６】
図９は従来のフルバンド方式の反響消去装置の一例を示すブロック図を表している。この図において、２１は反響消去装置、２２は疑似反響路、２３は反響路推定回路、２４は減算器を表している。また、ｘ（ｎ）２５は受話信号、ｈ（ｎ）２６は受話スピーカ４と送話用マイクロホン３の間の反響路伝達特性（インパルス応答）、ｙ（ｎ）２７は反響信号、ｙ＾（ｎ）２８は疑似反響信号、ｈ＾（ｎ）２９は反響路インパルス応答の推定値、ｅ（ｎ）３０は誤差信号、ｓ（ｎ）３１は近端話者の送話信号、ｚ（ｎ）３２はマイクロホン出力信号を表している。
【０００７】
反響消去装置２１では、まず反響路推定回路２３において反響路のインパルス応答を推定し、その推定値ｈ＾（ｎ）２９を疑似反響路２２に転送する。次に疑似反響路２２において、ｈ＾（ｎ）２９と受話信号ｘ（ｎ）２５との畳み込み演算を実行して、疑似反響信号ｙ＾（ｎ）２８を合成する。そして減算器２４において、マイクロホン３の出力信号ｚ（ｎ）から疑似反響信号ｙ＾（ｎ）２８を差し引く。反響路インパルス応答ｈ（ｎ）２６の推定が良好に行われていれば、反響信号ｙ（ｎ）２７と疑似反響信号ｙ＾（ｎ）２８はほぼ等しいものとなっており、この減算の結果、マイクロホン出力に含まれる反響信号ｙ（ｎ）２７は消去される。
【０００８】
ここで、疑似反響路２２は、反響路インパルス応答ｈ（ｎ）２６の経時変動に追従する必要がある。そのため反響路推定回路２３では、適応アルゴリズムを用いて反響路インパルス応答の推定を行う。この推定動作は受話状態、すなわちｓ（ｎ）≒０であり、ｚ（ｎ）≒ｙ（ｎ）と見なせる時に実行される。受話状態において、誤差信号ｅ（ｎ）３０は反響信号の消去残差ｙ（ｎ）−ｙ＾（ｎ）と見なすことができる。以下の説明では、この受話状態を仮定する。適応アルゴリズムとは受話信号ｘ（ｎ）２５と誤差信号ｅ（ｎ）３０を用いて、誤差信号のパワーが最小になるようにインパルス応答の推定値ｈ＾（ｎ）２９を定めるアルゴリズムであって、ＬＭＳ法、学習同定法、ＥＳ法などが知られている。ここで、疑似反響路２２の値が真の反響路の値に近く、疑似反響信号ｙ＾（ｎ）２８が反響信号ｙ（ｎ）２７にほぼ等しくなった状態を、収束したと呼ぶ。また、疑似反響路２２と反響路推定回路２３を合わせて、ここでは適応フィルタと呼ぶ。
【０００９】
図１０は従来のサブバンド方式の反響消去装置の一例を示すブロック図であり、図８，９と共通な部分には同一の番号を付した。受話信号ｘ（ｎ）２５およびマイクロホン出力信号ｚ（ｎ）３２はそれぞれＮ個の周波数帯域に分割される。４１および４２は周波数帯域分割回路、４３は周波数帯域合成回路、４４−１から４４−Ｎは適応フィルタ、４５−１ｘ_１（ｍ）から４５−Ｎｘ_Ｎ（ｍ）は周波数分割された後の受話信号、４６−１ｚ_１（ｍ）から４６−Ｎｚ_Ｎ（ｍ）は周波数分割された後のマイクロホン出力信号、４７−１ｅ_１（ｍ）から４７−Ｎｅ_Ｎ（ｍ）は周波数分割された後の誤差信号である。また、ｍは周波数分割回路で間引かれた後の信号の離散時間を示し、間引き率をＲとしたとき、ｎ＝Ｒ×ｍの関係がある。サブバンド方式の反響消去装置は、このように、各周波数帯域毎の適応フィルタにおいて、反響信号の抑圧を行う。
【００１０】
一般に適応フィルタにおいて、そのフィルタ係数の数を示すタップ長は、反響路の（真の）インパルス応答長分だけ用意された場合には、完全な反響消去が実現できる。しかし、一般に反響信号（音響エコー）は、残響時間にして数百ｍｓ程度のインパルス応答継続時間を有する。従って、この反響信号を消去しようとする場合、タップ数が非常に長大となり、ハードウェア規模の増大を招く。このため、人間のエコーに対する許容限に基づいて所要エコー抑圧量を決定し、反響路のインパルス応答長とから、要求される抑圧量に見合ったタップ長を用意する方式が考えられる。これまでのフルバンド方式の反響消去装置においては、このタップ数を室内の平均残響時間を元に次のように決めていた。タップ長Ｌ_ＦｕｌｌＢａｎｄは所要エコー抑圧量（ＤｅｓｉｒｅｄＬｏｓｓ）ＤＬ、平均残響時間Ｔ_Ｒおよびサンプリング間隔Ｔ_Ｓ（ｓ）として次式で与えられる。
【００１１】
【数１】

この式（１）からわかるように、これまでタップ長は全周波数帯域において求められた所要エコー抑圧量と、同じく全周波数帯域において求められた室内の残響時間によって決定されていた。
【００１２】
しかし、人間の聴覚特性を考えた場合、周波数帯域毎に可聴レベルが異なることから、所要エコー抑圧量も周波数帯域毎に異なる。更に、室内の残響時間は、全周波数帯域を平均した値に比べ、低域はより長く、高域になるに従いより短くなることが知られている。このように、従来のタップ長の決定は、所要エコー抑圧量が周波数帯域毎に異なる点を考慮していない。そのため、信号を複数の帯域に分割し、それぞれの帯域毎にエコー抑圧処理を行うサブバンド方式の反響消去装置に、全周波数帯域で求められた値をそのまま適用すれば、装置のタップ長の割り当てという点において無駄が生じてしまう。
【００１３】
加えて、戻ってくるエコーは伝送遅延が増加するにつれて、より検知しやすくなる。つまり、伝送遅延が小さい場合には、エコーは自分の発声にマスキングされ、側音と同様に聞こえにくい。しかし、伝送遅延が増加するにつれ、エコーは時間軸上で自分の発声のマスキングする範囲を超えてしまうため、より聞こえ易くなってしまう。従って、所要エコー消去量は伝送遅延の大小によって、変化してくるという問題がある。更に、この遅延による所要エコー抑圧量の変化について、周波数軸上での検討はこれまでなされていない。
【００１４】
このように、従来のタップ長の決め方は、周波数軸上での聴覚特性を考慮せず、全帯域での所要エコー抑圧量および残響時間によって決定している点に加え、伝送遅延の大小による所要エコー抑圧量の変化にも対応していない。
【００１５】
【発明が解決しようとする課題】
以上説明したように、従来の反響消去装置においては、適応フィルタのタップ長を全帯域での所要エコー抑圧量の値と、全帯域で平均された残響時間によって決定していた。このため、サブバンド方式の反響消去装置においては、タップ長の各帯域毎の割り当てという点において無駄が生じるという問題がある。
【００１６】
また、伝送遅延の大小により、所要エコー抑圧量の値は変化する。従って、想定された値と大幅に異なる伝送遅延が存在する回線において、反響消去装置を使用した場合、エコーを十分に消去することができず、通話品質の劣化を招くという問題がある。
【００１７】
本発明は、上記に鑑みてなされたもので、その目的とするところは、伝送遅延の大小による通話品質の劣化を生じることなく、反響信号を十分に消去し得る反響消去方法および装置を提供することにある。
【００１８】
【課題を解決するための手段】
上記目的を達成するため、請求項１記載の本発明は、反響路への送出信号を複数の周波数帯域に分割し、前記送出信号が反響路を経由した後の反響信号を複数の周波数帯域に分割して、各周波数帯域の疑似反響路を生成し、複数の周波数帯域の送出信号を複数の周波数帯域毎の疑似反響路に入力して得られる複数の周波数帯域の疑似反響信号を複数の周波数帯域の反響信号から差し引いて反響信号を消去する反響消去方法であって、前記各周波数帯域の疑似反響路をそれぞれ適応フィルタで構成し、反響信号の消去誤差を最小とするように動作するアルゴリズムにより各適応フィルタのフィルタ係数を逐次的に修正し、各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび各周波数帯域毎の人間の音に対する可聴レベルから決められる必要十分なエコーの抑圧量に基づいて各適応フィルタのフィルタ係数の数を示すタップ長を決定することを要旨とする。
【００１９】
請求項１記載の本発明にあっては、各周波数帯域の疑似反響路を構成する各適応フィルタのフィルタ係数を逐次的に修正して反響信号の消去誤差を最小とし、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび人間の音に対する可聴レベルから決められる必要十分なエコーの抑圧量に基づいて各適応フィルタのフィルタ係数の数を示すタップ長を決定している。
【００２０】
また、請求項２記載の本発明は、請求項１記載の発明において、前記タップ長を決定するステップが伝送遅延の大小を測定し、この測定された伝送遅延、各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベル、および各周波数帯域毎の人間の音に対する可聴レベルから必要十分なエコーの抑圧量を決定し、各周波数帯域毎の反響路である室内の残響時間および前記決定された所要エコー抑圧量からタップ長の計算を行うことを要旨とする。
【００２１】
請求項２記載の本発明にあっては、伝送遅延、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび人間の音に対する可聴レベルから必要十分なエコーの抑圧量を決定し、この決定した所要エコー抑圧量および残響時間からタップ長を計算している。
【００２２】
更に、請求項３記載の本発明は、請求項２記載の発明において、前記所要エコー抑圧量を決定するステップが、前記測定された伝送遅延が所定の値以下のときは、各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルと発話音声の平均マスキングレベルとの差を用い、この差と各周波数帯域毎の人間の音に対する可聴レベルとを用いて、所要のエコー抑圧量を決定することを要旨とする。
【００２３】
請求項３記載の本発明にあっては、伝送遅延が所定の値以下のときは、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルと発話音声の平均マスキングレベルとの差を用い、この差と人間の音に対する可聴レベルを用いて所要のエコー抑圧量を決定している。
【００２４】
請求項４記載の本発明は、請求項２記載の発明において、前記所要エコー抑圧量を決定するステップが、前記測定された伝送遅延が所定の値以上のときは、各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルを用い、この平均パワーレベルと各周波数帯域毎の人間の音に対する可聴レベルとを用いて、所要エコー抑圧量を決定することを要旨とする。
【００２５】
請求項４記載の本発明にあっては、伝送遅延が所定の値以上のときは、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルを用い、この平均パワーレベルと人間の音に対する可聴レベルを用いて所要エコー抑圧量を決定している。
【００２６】
また、請求項５記載の本発明は、請求項３または４記載の発明において、前記伝送遅延の所定の値が６０ｍｓである。
【００２７】
更に、請求項６記載の本発明は、反響路への送出信号を複数の周波数帯域に分割する第１の周波数帯域分割回路と、前記送出信号が反響路を経由した後の反響信号を複数の周波数帯域に分割する第２の周波数帯域分割回路と、前記周波数帯域分割回路により分割されたそれぞれの周波数帯域の疑似反響路を生成し、前記複数の周波数帯域の送出信号を前記複数の周波数帯域毎の疑似反響路の入力とすることにより得られる複数の周波数帯域の疑似反響信号を前記複数の周波数帯域の反響信号から差し引くことにより前記反響信号を消去する反響消去装置であって、前記各周波数帯域の疑似反響路をそれぞれ構成し、前記反響信号の消去誤差を最小とするように動作するアルゴリズムにより逐次的に修正されるフィルタ係数を有する適応フィルタと、各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび各周波数帯域毎の人間の音に対する可聴レベルから決められる必要十分なエコーの抑圧量に基づいて前記それぞれの周波数帯域の適応フィルタのフィルタ係数の数を示すタップ長を決定するタップ長割り当て手段とを有することを要旨とする。
【００２８】
請求項６記載の本発明にあっては、各周波数帯域の疑似反響路を構成する各適応フィルタのフィルタ係数を逐次的に修正して反響信号の消去誤差を最小とし、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび人間の音に対する可聴レベルから決められる必要十分なエコーの抑圧量に基づいて各適応フィルタのフィルタ係数の数を示すタップ長を決定している。
【００２９】
請求項７記載の本発明は、請求項６記載の発明において、前記タップ長割り当て手段が、伝送遅延の大小を測定する伝送遅延判定手段と、前記測定された伝送遅延、前記各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベル、および前記各周波数帯域毎の人間の音に対する可聴レベルから必要十分なエコーの抑圧量を決定する所要エコー抑圧量決定手段と、反響路である室内の残響時間を各周波数帯域毎に記憶する残響時間記憶手段と、前記記憶された各周波数帯域毎の反響路である残響時間および前記決定された所要エコー抑圧量からタップ長の計算を行うタップ長計算手段とを有することを要旨とする。
【００３０】
請求項７記載の本発明にあっては、伝送遅延、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルおよび人間の音に対する可聴レベルから必要十分なエコーの抑圧量を決定し、この決定した所要エコー抑圧量および残響時間からタップ長を計算している。
【００３１】
また、請求項８記載の本発明は、請求項７記載の発明において、前記所要エコー抑圧量決定手段が、前記伝送遅延判定手段において前記測定された伝送遅延が所定の値以下のときは、前記各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルと発話音声の平均マスキングレベルとの差を用い、この差と前記各周波数帯域毎の人間の音に対する可聴レベルとを用いて、所要のエコー抑圧量を決定することを要旨とする。
【００３２】
請求項８記載の本発明にあっては、伝送遅延が所定の値以下のときは、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルと発話音声の平均マスキングレベルとの差を用い、この差と人間の音に対する可聴レベルとを用いて所要のエコー抑圧量を決定している。
【００３３】
更に、請求項９記載の本発明は、請求項７記載の発明において、前記所要エコー抑圧量決定手段が、前記伝送遅延判定手段において前記測定された伝送遅延が所定の値以上のときは、前記各周波数帯域毎の発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルを用い、この平均パワーレベルと前記各周波数帯域毎の人間の音に対する可聴レベルとを用いて、所要エコー抑圧量を決定することを要旨とする。
【００３４】
請求項９記載の本発明にあっては、伝送遅延が所定の値以上のときは、発話音声によるマスキング効果を考慮したエコー音声の平均パワーレベルとして音声の平均パワーレベルを用い、この平均パワーレベルと人間の音に対する可聴レベルとを用いて所要エコー抑圧量を決定している。
【００３５】
請求項１０記載の本発明は、請求項８または９記載の発明において、前記伝送遅延の所定の値が６０ｍｓであることを要旨とする。
【００４０】
【発明の実施の形態】
本発明の実施形態を説明する前に、まず伝送遅延の大小による所要エコー抑圧量の変化について実験を用いて説明する。
【００４１】
従来、タップ長を決定する基になる所要エコー抑圧量は、全周波数帯域一律にしか求められていない。また、伝送遅延が大きくなると、エコーの検知限が増加（所要エコー抑圧量が増加）することは知られていたが、その際、周波数帯域毎にはどのように変化していくのかは、明らかにされてはいない。
【００４２】
そのため、所要エコー抑圧量が伝送遅延の大小によって、周波数帯域毎にどのように変化するのかを模擬実験システムを用いた主観評価により調べた。
【００４３】
模擬実験システムは、７ｋＨｚ帯域を有する４線回線構成の対向拡声通信システムを想定した。図３にシステムの概略図を示す。
【００４４】
この評価実験は、評価側で発声した音声が、相手側の室内伝達特性を模擬した実時間畳み込み装置を通り、再び戻ってくるエコーに対して所要エコー抑圧量を決定するものである。ここで、所要量の評価に用いられるエコー信号は、サブバンド方式の反響消去装置で使用されるものと同様の帯域通過フィルタを通り、拡声の際にはすでに各帯域に制限されている。この各帯域毎のエコーに対し、可変抵抗器により損失を挿入し、模擬的にエコー抑圧を行い所要エコー抑圧量を決定する。なお、送受話感度はＩＴＵ−Ｔ勧告Ｐ．３４に従い設定しており、挿入した損失量が０ｄＢの状態を所要量０ｄＢとした。
【００４５】
所要エコー抑圧量を決定する評価カテゴリは、実装置の設計にあたり最も妥当な所要エコー抑圧量を決定することを念頭に置き、「エコーを聞き分けようと注意深く聴くと、幾分残留エコー感がある。」程度とした。このカテゴリは従来の評価における、検知限と許容限の中間範囲にあたる。
【００４６】
評価パラメータは、設計するサブバンド方式の反響消去装置に対応した３２分割の各帯域（帯域幅２５０Ｈｚ）、および伝送遅延時間である。なお、実時間畳み込み装置内の音響結合は−２ｄＢとした。
【００４７】
こうして、各帯域の所要量が、伝送遅延の大小により、どのように影響を受けるかを統計的に有意な評定者数において調べた。
【００４８】
評価パラメータである遅延時間については、伝送遅延なしと伝送遅延あり（約２００ｍｓ）を模擬した。ここで伝送遅延なしの場合においても、サブバンド方式の反響消去装置の分割合成フィルタの一巡処理遅延（約２８ｍｓ）は加味した。なお、評価側および相手側のエコー経路インパルス応答継続時間は、２００ｍｓとした。こうして求めた周波数帯域別の所要エコー抑圧量を図４に示す。
【００４９】
図４の横軸は周波数（Ｈｚ）、縦軸は所要エコー抑圧量（ｄＢ）を表し、実線は伝送遅延が小さい場合、破線は大きい場合の結果を示している。
【００５０】
この実験結果より、伝送遅延が大きくなるに従い低域部の所要エコー抑圧量が増加するということがわかる。伝送遅延の大小による所要エコー抑圧量の周波数特性の変化は、定性的には次のように理解できる。
【００５１】
伝送遅延が小さい時には発話音声とそのエコーがほぼ同時に聞こえるため、エコーは周波数特性のほぼ等しい発話音声に周波数軸上で均等にマスクされる。つまり、エコー音声パワーの大きな低域部では、発話音声によるマスキングレベルも大きく、エコー音声パワーの小さな高域部になるに従い、発話音声によるマスキングレベルも小さくなる。そのため、エコー音声の平均パワーレベルは周波数軸上で直線的になる。このため可聴感度のよい中域部（２〜４ｋＨｚ）で所要エコー抑圧量が極大値をとる。
【００５２】
これに対し伝送遅延が大きくなると、発話音声とそのエコーに時間的なズレが生じ、周波数軸上でエコーは均等にマスクされる割合が小さくなる。場合によっては、無声区間に戻ってきたエコーは全くマスキングされない。そのため、エコー音声の平均パワーレベルは、音声の平均パワーレベルの周波数分布に従うため、所要エコー抑圧量は低域部でより大きくなる。これらの解釈の妥当性を定量的な分析により調べた。
【００５３】
図５は、伝送遅延が大きい場合の所要エコー抑圧量の決定の一例を示した図であり、Ｘ_ｆ６１はＩＴＵ勧告Ｐ．３４の送受話感度に従った音声の平均パワースペクトルであり、Ｚ_ｆ６２は前記評価実験を行った周囲騒音３０ｄＢＡ以下の評価室内における可聴レベルを示した図である。図６は、伝送遅延が小さい場合の所要エコー抑圧量の決定の一例を示した図であり、Ｙ_ｆ６３はエコー音声の平均パワーレベルと発話音声のマスキングレベルの差、つまり発話音声にマスキングされた後のエコー音声の平均パワースペクトルである。
【００５４】
始めに、伝送遅延が大きい場合の解釈によると、所要エコー抑圧量ＤＬは図５の音声の平均パワースペクトルＸ_ｆ６１と、可聴レベルＺ_ｆ６２から次式のように決定することができる。
【００５５】
ＤＬ＝Ａ・（Ｘ_ｆ−Ｚ_ｆ）　　（２）
ここで、式（２）中の重み係数Ａは、前述の評価実験条件および図５の音声の平均パワースペクトルＸ_ｆ６１と前記可聴レベルＺ_ｆ６２のときに、Ａ＝１と規格化している。実際に、所要エコー抑圧量の計算した結果を図７のＤＬ_ｌｏｎｇ７１に示す。この結果は、主観評価の結果である図４の伝送遅延が大きい場合の結果と、大変よく一致しており、上記解釈が正しかったこと、および所要エコー抑圧量が式（２）によって求め得ることを示している。
【００５６】
次に、伝送遅延が小さい場合の解釈によると、所要エコー抑圧量ＤＬは図６の発話音声にマスキングされたエコー音声の平均パワースペクトルＹ_ｆ６３と、前記可聴レベルＺ_ｆ６２から次式のように決定することができる。
【００５７】
ＤＬ＝Ａ・（Ｙ_ｆ−Ｚ_ｆ）　　（３）
遅延の大きい場合と同様に、実際に所要エコー抑圧量の計算した結果を図７のＤＬ_{ｓｈｏｒｔ}７２に示す。この結果は、主観評価の結果である図４の伝送遅延が小さい場合の結果と、大変よく一致しており、上記解釈が正しかったこと、および所要エコー抑圧量が式（３）によって求め得ることを示している。
【００５８】
以上の結果は、次のようにまとめることができる。
【００５９】
（１）伝送遅延が小さい場合；所要エコー抑圧量は、周波数軸に対する発話音声にマスキングされたエコー音声の平均パワーレベルと可聴レベルによって近似される。
【００６０】
（２）伝送遅延が大きい場合；所要エコー抑圧量は、周波数軸に対する音声の平均パワーレベルおよび可聴レベルの特性で近似される。
【００６１】
こうして得られた所要エコー抑圧量から、サブバンド方式の反響消去装置の適応フィルタのタップ長を周波数帯域別に求める方法を以下に示す。各帯域毎のタップ長Ｌ_{ＳｕｂＢａｎｄ}は、各帯域毎の所要エコー抑圧量ＤＬ_ｆおよび室内の各帯域毎のインパルス応答継続時間Ｔ_Ｒｆ（ｓ）との関係より、
【数２】

として求めることができる。ここで、Ｍは間引き数である。
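Equation (4) is not reproduced in this text. The following is a minimal sketch of a per-band tap-length calculation, assuming that the impulse response in each band decays exponentially by 60 dB over T_Rf, so that DL_f dB of suppression requires the first (DL_f/60)·T_Rf seconds of the response, sampled at the decimated interval T_S·M. The assumed form L_SubBand = (DL_f/60)·T_Rf/(T_S·M), the function name, and the clipping of negative values to zero are illustrative assumptions, not the patent's stated formula.

```python
import math


def subband_tap_lengths(dl_f, t_rf, t_s, m):
    """Per-band tap lengths under the ASSUMED form (DL_f / 60) * T_Rf / (T_S * M).

    dl_f : required echo suppression per band, in dB
    t_rf : impulse response duration (reverberation time) per band, in seconds
    t_s  : full-band sampling interval, in seconds
    m    : decimation factor of the sub-band filter bank
    """
    taps = []
    for dl, t_r in zip(dl_f, t_rf):
        raw = (dl / 60.0) * t_r / (t_s * m)
        # A small tolerance keeps floating-point noise from adding a spurious tap;
        # a band that needs no suppression gets zero taps (an assumption).
        taps.append(max(0, math.ceil(raw - 1e-9)))
    return taps
```

For example, with hypothetical values `dl_f=[60, 30]`, `t_rf=[0.4, 0.2]`, a 16 kHz sampling rate, and `m=16`, the first band receives four times as many taps as the second, because both its required suppression and its reverberation time are twice as large.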
【００６２】
以上説明したように、伝送遅延の大小により、聴覚特性に基づいた所要エコー抑圧量の周波数特性の場合分けを行うこと、および各帯域の残響時間を考慮することにより、フルバンド方式では不可能であった最適なタップ長の割り当てを決定できることがわかる。
【００６３】
本発明では、このようにして所要エコー抑圧量を聴覚特性に基づき求め、周波数帯域毎のタップ長の割り当てを決定する。そのために、全周波数帯域に一律に決定していた従来の方法に比べ、より聴感上反響信号が聞こえにくい。一方、伝送遅延の大小により所要エコー抑圧量の周波数特性が変化することをもとに、帯域毎のタップ割り当ての切り替えを行う。従って、伝送遅延の異なる使用条件においても、通話品質の劣化を防ぐことが可能となり、本発明の目的であるタップ長の効率的な割り当てを行うことができるようになる。
【００６４】
次に、図１を参照して、本発明の一実施形態に係る反響消去装置について説明する。なお、図１において、図１０と同様な部分には同一符号を付与し、その説明を省略する。
【００６５】
図１において、５１はタップ長割り当て回路、５２はタップ長計算回路、５３は所要エコー抑圧量決定回路、５４は伝送遅延判定回路、５５は残響時間記憶回路である。
【００６６】
適応フィルタ４４−１〜４４−Ｎの各周波数帯域毎のタップ長は、タップ長割り当て回路５１によって決定され転送される。このタップ長の割り当ては、所要エコー抑圧量決定回路５３で求められた帯域毎の所要エコー抑圧量ＤＬと、残響時間記憶回路５５に記憶されている帯域毎の残響時間Ｔ_Ｒｆから、式（４）に基づきタップ長計算回路５２で計算される。
【００６７】
はじめに、所要エコー抑圧量の決定について説明する。
【００６８】
前述の実験結果より、所要エコー抑圧量は伝送遅延の大小で、周波数特性を切り替える必要がある。そのため、伝送遅延判定回路５４において、伝送遅延がしきい値ＤＴ_ｔｈ以上か以下かを判定し、その伝送遅延の大小によって所要エコー抑圧量決定回路５３において、所要エコー抑圧量ＤＬを決定する。
【００６９】
ここで、伝送遅延のしきい値ＤＴ_ｔｈの値は、継時マスキングの最大値が約６０ｍｓであることから、ＤＴ_ｔｈ＝６０ｍｓとする。なお、反響消去装置の使用される回線における伝送遅延の値が既知の場合は、所要エコー抑圧量を決定するエコー音声の平均パワースペクトルを始めからＸ_ｆ６１またはＹ_ｆ６３に指定しておくこととする。
【００７０】
タップ長割り当て回路５１中の所要エコー抑圧量決定回路５３において、式（４）中の所要エコー抑圧量ＤＬは、エコー音声の平均パワースペクトルＸ_ｆ，Ｙ_ｆおよび可聴レベルＺ_ｆによって計算される。なお、所要エコー抑圧量決定回路５３には、エコー音声パワースペクトルＸ_ｆ，Ｙ_ｆおよび可聴限Ｚ_ｆの値が記憶されている。
【００７１】
まず、前記伝送遅延判定回路５４において、伝送遅延がしきい値より大きいと判定された場合は、所要エコー抑圧量決定回路５３において、所要エコー抑圧量ＤＬ_ｆを図５の音声の平均パワースペクトルＸ_ｆ６１と、可聴レベルＺ_ｆ６２から式（２）を用いて求める。こうして、伝送遅延が大きい場合は、図７のＤＬ_ｌｏｎｇ７１のような周波数特性を持つ所要エコー抑圧量ＤＬ_ｆが求められる。
【００７２】
なお、式（２）中の重み係数Ａは、前記送受話感度を基準値としている。そのため、使用する通話系の送受話感度に合わせ、ｄＢ単位で線形に重み付けをすることにより、様々な反響消去装置の使用条件に対応できる。
【００７３】
次に、前記伝送遅延判定回路５４において、伝送遅延がしきい値より小さいと判定された場合は、前記所要エコー抑圧量決定回路５３において、所要エコー抑圧量ＤＬ_ｆを図６の発話音声にマスキングされたエコー音声の平均パワースペクトルＹ_ｆ６３と、可聴レベルＺ_ｆ６２から式（３）を用いて求める。こうして、伝送遅延が小さい場合は、図７中のＤＬ_{ｓｈｏｒｔ}７２のような周波数特性を持つ所要エコー抑圧量ＤＬ_ｆが求められる。ここまでが、所要エコー抑圧量ＤＬ_ｆの決定手順である。
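The decision procedure for the required echo suppression amount DL_f can be sketched as follows. The sketch applies equation (2), DL = A·(X_f − Z_f), when the transmission delay exceeds the threshold DT_th = 60 ms, and equation (3), DL = A·(Y_f − Z_f), otherwise. The function name, the treatment of a delay exactly equal to the threshold, and any level values passed in are illustrative assumptions; the stored spectra X_f, Y_f, and Z_f of FIGS. 5 and 6 are not reproduced here.

```python
DELAY_THRESHOLD_S = 0.060  # DT_th: upper limit of temporal masking, about 60 ms


def required_echo_suppression(delay_s, x_f, y_f, z_f, a=1.0):
    """Per-band required echo suppression DL_f in dB.

    delay_s : measured transmission delay, in seconds
    x_f     : average speech power level per band (used for large delay, eq. 2)
    y_f     : masked echo power level per band (used for small delay, eq. 3)
    z_f     : audible level per band
    a       : weighting coefficient A (1.0 for the reference sensitivity)
    """
    # Assumption: a delay exactly at the threshold is treated as "small".
    level = x_f if delay_s > DELAY_THRESHOLD_S else y_f
    return [a * (p - z) for p, z in zip(level, z_f)]
```

Keeping the two stored spectra separate and selecting one per call mirrors the role of the transmission delay determination circuit 54 feeding the required echo suppression amount determination circuit 53.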
【００７４】
一方、式（４）中の各帯域毎の残響時間Ｔ_Ｒｆは、残響時間記憶回路５５からタップ長計算回路５２に転送される。室内の残響時間の周波数帯域毎の値は、はじめから残響時間記憶回路５５に記憶されていることとする。
【００７５】
得られた帯域毎の所要エコー抑圧量ＤＬ_ｆと、帯域毎の残響時間Ｔ_Ｒｆを用い、タップ長計算回路５２において、式（４）に従い帯域毎のタップ長を計算する。
【００７６】
このような手順で求めた各帯域の必要タップ長の一例を図２に示す。ここで式（４）中の各帯域のインパルス応答継続時間には、室容積８７ｍ^３、残響時間３００ｍｓの実験室での実測値を用いた。黒塗りが伝送遅延が小さい場合、白抜きが大きい場合のタップ長の割り当てをそれぞれ示している。
【００７７】
図２よりタップ数は、伝送遅延が小さい場合、可聴感度のよい中域に多く割り当て、低域および高域になるに従い少なくしてよい。また、伝送遅延が大きい場合には、音声音圧の大きな低域部により多く割り当てる必要があり、中域部で幾分増加するものの、高域部になるに従い割り当てを少なくしてよい。
【００７８】
尚、図２は２数値をコラム形式で表示した図であり、そのため黒と白の２本のコラムの対の周波数をｘ軸上に示す。つまりｘ軸の周波数値はステップ状であり、例えばｘ＝４５００（Ｈｚ）のとき、ｙのタップ数は、小さい伝送遅延と大きい伝送遅延でそれぞれｙ_Ｌ≒３３．３とｙ_Ｈ≒５６．３の２通りあることになる。
【００７９】
求めた帯域毎のタップ長は、前述の通りタップ長割り当て回路５１より各帯域毎の適応フィルタ４４−１〜４４−Ｎに転送される。
【００８０】
上述したように、本反響消去装置では、周波数軸上での聴覚特性を考慮し所要エコー抑圧量を求め、周波数帯域毎に必要十分なタップ長を決定し、その効率的な割り当てを行うこと、および伝送遅延の大小によって変化する所要エコー抑圧量に対応して、タップ長の割り当てを切り替え、必要最小限の演算量によって、通話品質の劣化を防いでいる。
【００８１】
図１１は、本発明の他の実施形態に係る反響消去装置の構成を示すブロック図である。図１１に示す反響消去装置は、図１に示した反響消去装置における所要エコー抑圧量決定回路５３を所要エコー抑圧量計算回路５６と残響成分計算回路５７で構成したものであり、これにより伝送遅延が小さい場合においても残響時間の大小により所要エコー抑圧量の周波数特性を切り替えるように構成した点が図１の反響消去装置と異なるものであり、その他の構成は図１の反響消去装置と同じであり、同じ構成要素には同じ符号が付されている。
【００８２】
すなわち、図１１において、所要エコー抑圧量決定回路５３を構成している残響成分計算回路５７は、残響時間記憶回路５５から供給される反響路の残響時間から音声の残響成分Ｗ_ｆを計算し、この値と可聴レベルＺ_ｆを用いて所要エコー抑圧量計算回路５６において所要エコー抑圧量ＤＬを計算し、この計算した所要エコー抑圧量ＤＬをタップ長計算回路５２に供給している。
【００８３】
更に詳しくは、所要エコー抑圧量に対する伝送遅延の影響に加えて、反響路である相手側室内の残響時間の影響を調べるために、主観評価実験を行い、その実験結果について分析し、より効率的なタップ長の割り当てを行う。
【００８４】
この主観評価実験の実験システムは、伝送遅延の影響を調べるために用いた前述の実験システムと同様なものを用いた。評価パラメータは、前述の３２分割した各周波数帯域、および反響路（相手側室内）の残響時間である。
【００８５】
評価パラメータである残響時間は、小さい場合（約１１０ｍｓ）と大きい場合（約４５０ｍｓ）の２種類を模擬した。これらの値には実測値を用いている。そして、各々の残響時間に対して伝送遅延が小さい場合（２８ｍｓ）と大きい場合（３００ｍｓ）のそれぞれについて所要エコー抑圧量を求めた実験結果を図１２に示す。
【００８６】
図１２において、黒塗りの四角形と三角形は、残響時間が小さい場合（約１１０ｍｓ）、白抜きの丸と逆三角形（○，▽）は、残響時間が大きい場合（約４５０ｍｓ）に、それぞれ遅延が２８ｍｓ，３００ｍｓにおける結果を示している。
【００８７】
図１２の結果から、まず全般的に残響時間の大小は伝送遅延の大小に比べて影響が小さいということがわかる。特に伝送遅延が大きい場合には、残響時間は所要エコー抑圧量にほとんど影響を与えていない。この理由としては、遅延時間が３００ｍｓ付近では、すでに発話音声によるマスキングの効果が完全になくなり、所要エコー抑圧量が音声の平均パワーレベルと可聴レベルによってのみ決まり、残響の影響があまり目立たなくなるためと考えられる。
【００８８】
一方、伝送遅延が小さい場合でも、残響時間が大きくなると低域部の所要エコー抑圧量が増加している。これは遅延が小さくとも、評価側に戻ってくるエコーに残響が付加されている場合、その残響成分が発話音声の継時マスキングの範囲を越えていればその分が耳障りなエコーとして検知されてしまうためと考えられる。また、その時の所要エコー抑圧量の周波数特性は、伝送遅延が継時マスキングを越えた場合の所要エコー抑圧量の周波数特性に類似すると解釈できる。なお、その所要エコー抑圧量が全体的に小さくなっている理由としては、エコーの残響成分がその直接音成分に比べて小さいためと考えられる。
【００８９】
ここで、伝送遅延が小さくとも残響時間が大きい場合の、所要エコー抑圧量に対する解釈の妥当性を定量的な分析より調べた。
【００９０】
図１３は、伝送遅延が小さく残響時間が大きい場合の所要エコー抑圧量の決定の一例を示した図である。図中のＷ_ｆ６４は前述の音声Ｘ_ｆ６１の残響成分の平均パワースペクトルを示す。また、Ｚ_ｆ６２は前述の可聴レベルを示している。
【００９１】
伝送遅延が小さくとも、残響時間が大きい場合の所要エコー抑圧量ＤＬは、図１３の残響成分の平均パワースペクトルＷ_ｆ６４と、可聴レベルＺ_ｆ６２から次式のように決定することができる。
【００９２】
ＤＬ＝Ａ・（Ｗ_ｆ−Ｚ_ｆ）　　（５）
Ｗ_ｆの値は、一般に音声の残響成分が指数関数的に減衰する性質を用い、前述の評価実験の室内の残響特性より決定しており、前述の音声Ｘ_ｆ６１と次のような関係がある。
【００９３】
【数３】

ここで、Ｔ_Ｒｆは周波数帯域毎の反響路（相手側室内）の残響時間であり、Ｔ_ｔｈｆは周波数帯域毎の継時マスキングの影響範囲（時間）である。
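Equation (6) is not reproduced in this text. The sketch that follows is one plausible reading, assuming that reverberant energy decays exponentially by 60 dB over T_Rf, so that the part of the echo arriving after the temporal masking window T_thf lies 60·T_thf/T_Rf dB below the speech level X_f. The assumed relation W_f = X_f − 60·T_thf/T_Rf and both function names are illustrative, not the patent's stated formula; the second function applies equation (5) to the result.

```python
def reverberant_tail_level(x_f, t_rf, t_thf):
    """ASSUMED form of W_f: speech level minus the decay accumulated over T_thf."""
    return [x - 60.0 * t_th / t_r for x, t_r, t_th in zip(x_f, t_rf, t_thf)]


def suppression_from_tail(w_f, z_f, a=1.0):
    """Equation (5): DL = A * (W_f - Z_f), evaluated per band in dB."""
    return [a * (w - z) for w, z in zip(w_f, z_f)]
```

Under this assumption a longer reverberation time leaves more energy outside the masking window, which is consistent with the observation that the required suppression grows with reverberation time even when the transmission delay is small.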
【００９４】
実際に、所要エコー抑圧量を計算した結果を図１４のＤＬ_ＲＴ７３に示す。この結果は、主観評価の結果である図１２の伝送遅延が小さく、残響時間が大きい場合の結果とよく一致しており、上記解釈が正しかったこと、および所要エコー抑圧量が式（５）、式（６）によって求め得ることを示している。
【００９５】
以上の結果から、伝送遅延が小さく残響時間が大きい場合所要エコー抑圧量は、周波数軸に対する音声の残響成分の平均パワースペクトルおよび可聴レベルの特性で近似される。
【００９６】
このようにして得られた所要エコー抑圧量から、タップ長は前述の式（４）より求めることができる。本発明ではこのようにして、伝送遅延が小さい場合にも、反響路（相手側室内）の残響時間の大小によって、所要エコー抑圧量を場合分けをして求め、その結果より周波数帯域毎のタップ長の割り当ての切り替えを行う。従って、残響時間の異なる使用条件においても、通話品質の劣化を防ぐことが可能となり、本発明の目的であるタップ長のさらに効率的な割り当てを行うことができるようになる。
【００９７】
【発明の効果】
以上説明したように、本発明によれば、サブバンド方式の反響消去装置において、聴覚特性より所要エコー抑圧量を求め、その値から適応フィルタのタップ長の効率的な割り当てを行うことが可能となる。このため、従来の周波数帯域において、一律に割り当てられていた方式に比べ演算量の削減が可能となり、更に搭載するタップ長の無駄を省くことにより、装置の経済化を行うことも可能となる。また、伝送遅延の大小により、周波数帯域毎のタップ長の割り当てを可変とすることにより、回線状況によらず、反響信号を抑圧することができ、通話品質の向上が実現できる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る反響消去装置の構成を示す図である。
【図２】図１に示す反響消去装置において伝送時間により異なるタップ長割り当ての一例を示す図である。
【図３】所要エコー抑圧量を決定する模擬実験システムの構成を示す図である。
【図４】伝送遅延時間の所要エコー抑圧量に対する影響の一例を示す実験結果のグラフである。
【図５】伝送遅延が大きい場合の所要エコー抑圧量の決定の一例を示すグラフである。
【図６】伝送遅延の小さい場合の所要エコー抑圧量の決定の一例を示すグラフである。
【図７】伝送遅延の大小による所要エコー抑圧量の相違の一例を示すグラフである。
【図８】拡声通話系の構成を示す図である。
【図９】従来の反響消去装置の一例を示すブロック図である。
【図１０】従来のサブバンド方式の反響消去装置の一例を示すブロック図である。
【図１１】本発明の他の実施形態に係る反響消去装置の構成を示すブロック図である。
【図１２】残響時間の所要エコー抑圧量に対する影響の一例を示す実験結果のグラフである。
【図１３】遅延時間が小さく残響時間が大きい場合の所要エコー抑圧量の決定の一例を示すグラフである。
【図１４】遅延時間が小さく残響時間が大きい場合の所要エコー抑圧量の一例を示すグラフである。
【符号の説明】
３マイクロホン
４スピーカ
４１，４２周波数帯域分割回路
４３周波数帯域合成回路
４４−１〜４４−Ｎ適応フィルタ
５１タップ長割り当て回路
５３所要エコー抑圧量決定回路
５４伝送遅延判定回路
５５残響時間記憶回路
５６所要エコー抑圧量計算回路
５７残響成分計算回路

[0001]
[Technical field of the invention]
The present invention relates to an echo canceling method and apparatus for canceling or suppressing echo signals that cause howling and audible disturbance in, for example, two-wire/four-wire conversion systems and loudspeaker communication systems.
[0002]
[Prior art]
First, the echo signal that causes such howling and audible disturbance is described with reference to the loudspeaker communication system shown in FIG. 8.
[0003]
In FIG. 8, reference numerals 1 and 3 denote transmitting microphones, 2 and 4 denote receiving loudspeakers, 5 and 7 denote transmission signal amplifiers, 6 and 8 denote reception signal amplifiers, 9 denotes a transmission line, 10 denotes a talker, and 11 denotes a listener. Speech uttered by the talker 10 reaches the listener 11 via the transmitting microphone 1, the transmission signal amplifier 5, the transmission line 9, the reception signal amplifier 8, and the receiving loudspeaker 4. Unlike a conventional telephone, this loudspeaker communication system does not require holding a handset, so it allows calls while working and natural face-to-face conversation. It is therefore widely used for teleconferencing, videophones, loudspeaker telephones, and the like.
[0004]
However, echo is a drawback of this communication system. In FIG. 8, the speech emitted from the loudspeaker 4 on the receiving side is picked up by the microphone 3 and reproduced on the transmitting side via the transmission signal amplifier 7, the transmission line 9, the reception signal amplifier 6, and the loudspeaker 2. For the talker 10, this is an echo phenomenon in which the talker's own voice is reproduced from the loudspeaker 2; it is called acoustic echo. This echo impedes conversation and causes discomfort in loudspeaker communication systems. Furthermore, the sound reproduced from the loudspeaker 2 is picked up by the microphone 1, forming a closed signal loop. If the loop gain exceeds 1, howling occurs and conversation becomes impossible.
[0005]
In order to overcome such a problem of the loudspeaker system, an echo canceller is used. As a typical configuration of the echo canceller, a full-band system and a sub-band system are known.
[0006]
FIG. 9 is a block diagram showing an example of a conventional full-band echo canceller. In this figure, 21 is the echo canceller, 22 is a pseudo echo path, 23 is an echo path estimation circuit, and 24 is a subtractor. Further, x(n) 25 is the received signal, h(n) 26 is the echo path transfer characteristic (impulse response) between the receiving loudspeaker 4 and the transmitting microphone 3, y(n) 27 is the echo signal, ŷ(n) 28 is the pseudo echo signal, ĥ(n) 29 is the estimate of the echo path impulse response, e(n) 30 is the error signal, s(n) 31 is the near-end talker's transmitted signal, and z(n) 32 is the microphone output signal.
[0007]
In the echo canceller 21, the echo path estimation circuit 23 first estimates the impulse response of the echo path and transfers the estimate ĥ(n) 29 to the pseudo echo path 22. Next, the pseudo echo path 22 convolves ĥ(n) 29 with the received signal x(n) 25 to synthesize the pseudo echo signal ŷ(n) 28. The subtractor 24 then subtracts the pseudo echo signal ŷ(n) 28 from the output signal z(n) of the microphone 3. If the echo path impulse response h(n) 26 is estimated well, the echo signal y(n) 27 and the pseudo echo signal ŷ(n) 28 are nearly equal, and the subtraction cancels the echo signal y(n) 27 contained in the microphone output.
[0008]
Here, the pseudo echo path 22 must track temporal variations of the echo path impulse response h(n) 26. The echo path estimation circuit 23 therefore estimates the echo path impulse response using an adaptive algorithm. This estimation is performed in the receiving state, that is, when s(n) ≈ 0 and z(n) ≈ y(n) can be assumed. In the receiving state, the error signal e(n) 30 can be regarded as the cancellation residual y(n) − ŷ(n) of the echo signal. The following description assumes this receiving state. An adaptive algorithm uses the received signal x(n) 25 and the error signal e(n) 30 to determine the impulse response estimate ĥ(n) 29 so that the power of the error signal is minimized; known examples include the LMS method, the learning identification method (normalized LMS), and the ES method. The state in which the pseudo echo path 22 is close to the true echo path and the pseudo echo signal ŷ(n) 28 is nearly equal to the echo signal y(n) 27 is referred to as convergence. The pseudo echo path 22 and the echo path estimation circuit 23 together are referred to here as an adaptive filter.
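The operation of the pseudo echo path, the subtractor, and the echo path estimation circuit can be sketched as follows. This is a minimal illustration rather than the circuit of FIG. 9: it assumes the learning identification method in its usual normalized-LMS form and a noise-free receiving state (s(n) = 0), and the names `nlms_echo_canceller`, `simulate_echo`, `mu`, and `eps` are hypothetical and do not appear in the original.

```python
import random


def nlms_echo_canceller(x, z, num_taps, mu=0.5, eps=1e-8):
    """Return (error signal e(n), final estimate h_hat) for received x and microphone z."""
    h_hat = [0.0] * num_taps   # estimated echo path impulse response
    x_buf = [0.0] * num_taps   # most recent received samples, newest first
    errors = []
    for x_n, z_n in zip(x, z):
        x_buf = [x_n] + x_buf[:-1]
        # Pseudo echo path: convolve the estimate with the received signal.
        y_hat = sum(h * xv for h, xv in zip(h_hat, x_buf))
        # Subtractor: remove the pseudo echo from the microphone output.
        e_n = z_n - y_hat
        # Echo path estimation: normalized-LMS coefficient update (assumed form).
        power = sum(xv * xv for xv in x_buf) + eps
        h_hat = [h + mu * e_n * xv / power for h, xv in zip(h_hat, x_buf)]
        errors.append(e_n)
    return errors, h_hat


def simulate_echo(x, h_true):
    """Echo y(n) produced by passing x(n) through a known impulse response."""
    return [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    random.seed(0)
    h_true = [0.5, -0.3, 0.2, 0.1]
    x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    e, h_hat = nlms_echo_canceller(x, simulate_echo(x, h_true), num_taps=4)
    print([round(h, 4) for h in h_hat])
```

When the number of taps covers the whole impulse response and no near-end speech is present, the estimate approaches the true response and the error signal shrinks toward zero, which corresponds to the converged state described above.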
[0009]
FIG. 10 is a block diagram showing an example of a conventional sub-band echo canceller; parts common to FIGS. 8 and 9 are denoted by the same reference numerals. The received signal x(n) 25 and the microphone output signal z(n) 32 are each divided into N frequency bands. Reference numerals 41 and 42 denote frequency band division circuits, 43 denotes a frequency band synthesis circuit, 44-1 to 44-N denote adaptive filters, 45-1 x_1(m) to 45-N x_N(m) denote the received signals after frequency division, 46-1 z_1(m) to 46-N z_N(m) denote the microphone output signals after frequency division, and 47-1 e_1(m) to 47-N e_N(m) denote the error signals after frequency division. Here, m is the discrete time index of the signal after decimation by the frequency division circuit; with a decimation rate R, n = R × m. In this way, the sub-band echo canceller suppresses the echo signal with an adaptive filter for each frequency band.
[0010]
In general, an adaptive filter can achieve complete echo cancellation if its tap length, that is, the number of filter coefficients, covers the full (true) impulse response length of the echo path. However, an echo signal (acoustic echo) generally has an impulse response duration of several hundred milliseconds in terms of reverberation time. Canceling such an echo signal completely therefore requires a very large number of taps, which increases the hardware scale. For this reason, one approach is to determine the required echo suppression amount from the human tolerance limit for echo and to provide a tap length that matches this suppression amount, given the impulse response length of the echo path. In conventional full-band echo cancellers, the number of taps has been determined from the average reverberation time of the room as follows. The tap length L_FullBand is given by the following equation in terms of the required echo suppression amount (desired loss) DL, the average reverberation time T_R, and the sampling interval T_S (s).
[0011]
(Equation 1)

As can be seen from the equation (1), the tap length has been determined by the required echo suppression amount obtained in the entire frequency band and the reverberation time in the room similarly obtained in the entire frequency band.
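Equation (1) is not reproduced in this text. As a minimal sketch, assume the common model in which the room impulse response decays exponentially by 60 dB over the reverberation time T_R, so that DL dB of suppression requires modeling the first (DL/60)·T_R seconds of the response. The assumed form L_FullBand = (DL/60)·T_R/T_S and the function name are illustrative assumptions, not the patent's stated formula.

```python
import math


def fullband_tap_length(dl_db, t_r, t_s):
    """Tap length under the ASSUMED form (DL / 60) * T_R / T_S.

    dl_db : required echo suppression amount DL, in dB
    t_r   : average reverberation time T_R, in seconds
    t_s   : sampling interval T_S, in seconds
    """
    raw = (dl_db / 60.0) * t_r / t_s
    # A small tolerance keeps floating-point noise from adding a spurious tap.
    return math.ceil(raw - 1e-9)
```

Under this assumption the tap count scales linearly with both the required suppression and the reverberation time, which is why a single full-band value over-provisions bands that need less suppression or decay faster.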
[0012]
However, in view of human auditory characteristics, the audible level differs from one frequency band to another, so the required echo suppression amount also differs for each band. Furthermore, the reverberation time of a room is known to be longer at low frequencies and shorter at high frequencies than the value averaged over the entire band. The conventional determination of the tap length does not take into account that the required echo suppression amount differs for each frequency band. Consequently, if values obtained for the entire band are applied as they are to a sub-band echo canceller, which divides the signal into a plurality of bands and performs echo suppression in each band, the allocation of taps in the apparatus is wasteful.
[0013]
In addition, a returning echo becomes easier to detect as the transmission delay increases. When the transmission delay is small, the echo is masked by the talker's own speech and, like sidetone, is hard to hear. As the transmission delay increases, however, the echo falls outside the range on the time axis that the talker's own speech masks, and it becomes easier to hear. The required echo suppression amount therefore changes with the magnitude of the transmission delay. Moreover, how this delay-dependent change in the required echo suppression amount behaves along the frequency axis has not been studied so far.
[0014]
As described above, the conventional method determines the tap length from the required echo suppression amount and the reverberation time for the entire band, without considering auditory characteristics along the frequency axis. In addition, it does not accommodate changes in the required echo suppression amount caused by the magnitude of the transmission delay.
[0015]
[Problems to be solved by the invention]
As described above, in the conventional echo canceller, the tap length of the adaptive filter is determined by the value of the required echo suppression amount in the entire band and the reverberation time averaged in the entire band. For this reason, in the sub-band echo canceller, there is a problem that the tap length is wastefully allocated to each band.
[0016]
Also, the required echo suppression amount changes with the magnitude of the transmission delay. Therefore, when an echo canceller is used on a line whose transmission delay differs greatly from the assumed value, the echo cannot be sufficiently canceled and speech quality deteriorates.
[0017]
The present invention has been made in view of the above, and its object is to provide an echo canceling method and apparatus capable of sufficiently canceling an echo signal without degradation of speech quality caused by the magnitude of the transmission delay.
[0018]
[Means for Solving the Problems]
To achieve the above object, the invention according to claim 1 is an echo canceling method that divides a transmission signal sent to an echo path into a plurality of frequency bands, divides the echo signal obtained after the transmission signal has passed through the echo path into a plurality of frequency bands, generates a pseudo echo path for each frequency band, and cancels the echo signal by subtracting, from the echo signals in the plurality of frequency bands, the pseudo echo signals obtained by inputting the transmission signals in the plurality of frequency bands to the pseudo echo paths for the respective frequency bands. The gist of the method is that the pseudo echo path of each frequency band is formed by an adaptive filter; the filter coefficients of each adaptive filter are successively corrected by an algorithm that operates to minimize the cancellation error of the echo signal; and the tap length, which indicates the number of filter coefficients of each adaptive filter, is determined based on the necessary and sufficient amount of echo suppression, which is determined from the average power level of the echo speech for each frequency band, taking into account the masking effect of the talker's own speech, and from the human audible level for each frequency band.
[0019]
In the invention according to claim 1, the filter coefficients of the adaptive filters forming the pseudo echo paths of the respective frequency bands are successively corrected to minimize the cancellation error of the echo signal, and the tap length indicating the number of filter coefficients of each adaptive filter is determined based on the necessary and sufficient amount of echo suppression, which is determined from the average power level of the echo speech, taking into account the masking effect of the talker's own speech, and from the human audible level.
[0020]
The invention according to claim 2 is the invention according to claim 1, wherein the step of determining the tap length measures the magnitude of the transmission delay; determines the necessary and sufficient amount of echo suppression from the measured transmission delay, the average power level of the echo speech for each frequency band, taking into account the masking effect of the talker's own speech, and the human audible level for each frequency band; and calculates the tap length from the reverberation time, for each frequency band, of the room constituting the echo path and from the determined required echo suppression amount.
[0021]
In the invention according to claim 2, the necessary and sufficient amount of echo suppression is determined from the transmission delay, the average power level of the echo speech taking into account the masking effect of the talker's own speech, and the human audible level, and the tap length is calculated from this required echo suppression amount and the reverberation time.
[0022]
Further, the invention according to claim 3 is the invention according to claim 2, wherein, when the measured transmission delay is equal to or less than a predetermined value, the step of determining the required echo suppression amount uses the difference between the average power level of speech and the average masking level of the talker's own speech as the average power level of the echo speech for each frequency band, taking into account the masking effect of the talker's own speech, and determines the required echo suppression amount from this difference and the human audible level for each frequency band.
[0023]
In the invention according to claim 3, when the transmission delay is equal to or less than the predetermined value, the difference between the average power level of speech and the average masking level of the talker's own speech is used as the average power level of the echo speech taking into account the masking effect of the talker's own speech, and the required echo suppression amount is determined from this difference and the human audible level.
[0024]
The invention according to claim 4 is the invention according to claim 2, wherein, when the measured transmission delay is equal to or greater than a predetermined value, the step of determining the required echo suppression amount uses the average power level of speech as the average power level of the echo speech for each frequency band, taking into account the masking effect of the talker's own speech, and determines the required echo suppression amount from this average power level and the human audible level for each frequency band.
[0025]
In the invention according to claim 4, when the transmission delay is equal to or greater than the predetermined value, the average power level of speech is used as the average power level of the echo speech taking into account the masking effect of the talker's own speech, and the required echo suppression amount is determined from this average power level and the human audible level.
[0026]
According to a fifth aspect of the present invention, in the third or fourth aspect, the predetermined value of the transmission delay is 60 ms.
[0027]
Furthermore, the invention according to claim 6 is an echo canceling apparatus comprising a first frequency band dividing circuit that divides a transmission signal sent to an echo path into a plurality of frequency bands, and a second frequency band dividing circuit that divides the echo signal obtained after the transmission signal has passed through the echo path into a plurality of frequency bands. The apparatus generates a pseudo echo path for each of the divided frequency bands and cancels the echo signal by subtracting, from the echo signals of the plurality of frequency bands, the pseudo echo signals obtained by inputting the transmission signals of the respective frequency bands to the pseudo echo paths of those bands. The gist is that the apparatus has adaptive filters, each of which constitutes the pseudo echo path of one frequency band and has filter coefficients that are sequentially corrected by an algorithm operating to minimize the cancellation error of the echo signal, and tap length allocating means for determining the tap length, which indicates the number of filter coefficients of each adaptive filter, based on the necessary and sufficient echo suppression amount determined from the average power level of the echo sound taking the masking effect of the uttered voice in each frequency band into account and from the audible level of human hearing for each frequency band.
[0028]
According to the sixth aspect of the present invention, the filter coefficients of each adaptive filter constituting the pseudo echo path of each frequency band are successively corrected so as to minimize the cancellation error of the echo signal, and the tap length indicating the number of filter coefficients of each adaptive filter is determined based on the necessary and sufficient echo suppression amount determined from the average power level of the echo sound taking the masking effect of the uttered voice into account and from the audible level of human hearing.
[0029]
According to a seventh aspect of the present invention, in the invention according to the sixth aspect, the gist is that the tap length allocating means includes: transmission delay determining means for measuring the magnitude of the transmission delay; required echo suppression amount determining means for determining the necessary and sufficient echo suppression amount from the measured transmission delay, the average power level of the echo sound taking the masking effect of the uttered voice into account, and the audible level of human hearing for each frequency band; reverberation time storage means for storing, for each frequency band, the reverberation time of the room that is the echo path; and tap length calculating means for calculating the tap length from the reverberation time of the room that is the echo path and the determined required echo suppression amount.
[0030]
According to the present invention, the necessary and sufficient echo suppression amount is determined from the transmission delay, the average power level of the echo sound taking the masking effect of the uttered voice into account, and the audible level of human hearing, and the tap length is calculated from the determined required echo suppression amount and the reverberation time.
[0031]
The invention according to claim 8 is the invention according to claim 7, wherein the gist is that, when the transmission delay determined by the transmission delay determining means is equal to or less than a predetermined value, the required echo suppression amount determining means uses, as the average power level of the echo sound taking the masking effect of the uttered voice in each frequency band into account, the difference between the average power level of the echo sound and the average masking level of the uttered voice, and determines the required echo suppression amount using this difference and the audible level of human hearing for each frequency band.
[0032]
According to the present invention, when the transmission delay is equal to or less than a predetermined value, the difference between the average power level of the voice and the average masking level of the uttered voice is used as the average power level of the echo sound taking the masking effect of the uttered voice into account, and the required echo suppression amount is determined using this difference and the audible level of human hearing.
[0033]
Further, according to the ninth aspect of the present invention, in the invention according to the seventh aspect, the gist is that, when the transmission delay determined by the transmission delay determining means is equal to or greater than a predetermined value, the required echo suppression amount determining means uses the average power level of the voice as the average power level of the echo sound taking the masking effect of the uttered voice in each frequency band into account, and determines the required echo suppression amount using this average power level and the audible level of human hearing for each frequency band.
[0034]
According to the ninth aspect of the present invention, when the transmission delay is equal to or greater than a predetermined value, the average power level of the voice is used as the average power level of the echo sound taking the masking effect of the uttered voice into account, and the required echo suppression amount is determined using this average power level and the audible level of human hearing.
[0035]
A tenth aspect of the present invention is based on the eighth or ninth aspect, wherein the predetermined value of the transmission delay is 60 ms.
[0040]
BEST MODE FOR CARRYING OUT THE INVENTION
Before describing the embodiments of the present invention, first, the change in the required echo suppression amount due to the magnitude of the transmission delay will be described using experiments.
[0041]
Conventionally, the required echo suppression amount, which is the basis for determining the tap length, has been obtained only for the entire frequency band. It has also been known that the detection limit of the echo rises (the required echo suppression amount increases) as the transmission delay increases, but it has not been clear how the required amount changes in each frequency band.
[0042]
Therefore, how the required amount of echo suppression changes for each frequency band according to the magnitude of the transmission delay was examined by a subjective evaluation using a simulation experiment system.
[0043]
The simulation experiment system assumed a two-way (opposing) loudspeaker communication system with a four-wire line configuration and a 7 kHz band. FIG. 3 shows a schematic diagram of the system.
[0044]
In this evaluation experiment, the voice uttered on the evaluation side passes through a real-time convolution device that simulates the room transfer characteristic of the partner side, and the required echo suppression amount is determined for the echo that returns. The echo signal used to evaluate the required amount passes through band-pass filters similar to those used in the sub-band echo canceller, so it is already limited to each band when it is reproduced from the loudspeaker. A loss is inserted into the echo of each band by a variable resistor, which simulates suppression of the echo, and the required echo suppression amount is determined. Note that the transmission/reception sensitivity conforms to ITU-T Recommendation P.34, and the state in which the inserted loss is 0 dB is defined as a required amount of 0 dB.
[0045]
The evaluation category used to determine the required echo suppression amount was chosen so as to give the most appropriate required amount for designing an actual device. This category corresponds to an intermediate range between the detection limit and the allowable limit in the conventional evaluation.
[0046]
The evaluation parameters are each of 32 divided bands (bandwidth 250 Hz) corresponding to the sub-band type echo canceller to be designed, and the transmission delay time. The acoustic coupling in the real-time convolution device was -2 dB.
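As an illustration only, the band layout used as an evaluation parameter can be sketched in Python as follows. The sketch assumes that the 32 bands of 250 Hz are contiguous and start at 0 Hz, which the text does not state explicitly; the names `band_edges` and `band_index` are hypothetical.

```python
# Illustrative band layout: 32 contiguous bands of 250 Hz.
# Assumption (not stated in the text): band 0 starts at 0 Hz.
NUM_BANDS = 32
BANDWIDTH_HZ = 250


def band_edges(index):
    """Return the (low, high) edges in Hz of band `index`, counted from 0."""
    if not 0 <= index < NUM_BANDS:
        raise ValueError("band index out of range")
    return index * BANDWIDTH_HZ, (index + 1) * BANDWIDTH_HZ


def band_index(freq_hz):
    """Return the index of the band that contains `freq_hz`."""
    index = int(freq_hz // BANDWIDTH_HZ)
    if not 0 <= index < NUM_BANDS:
        raise ValueError("frequency outside the analysed range")
    return index
```

Under this assumption the 32 bands span 0 Hz to 8000 Hz, so each per-band quantity in the following description (required echo suppression amount, reverberation time, tap length) can be held in a list of 32 values indexed by band.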
[0047]
In this way, how the required amount of each band is affected by the magnitude of the transmission delay was examined for a statistically significant number of raters.
[0048]
For the delay time, which is an evaluation parameter, two cases were simulated: no transmission delay, and a transmission delay of about 200 ms. Even in the case with no transmission delay, the round-trip processing delay (about 28 ms) of the band-division and synthesis filters of the sub-band echo canceller is added. The duration of the echo path impulse response on both the evaluation side and the partner side was 200 ms. FIG. 4 shows the required echo suppression amount for each frequency band obtained in this way.
[0049]
The horizontal axis in FIG. 4 represents the frequency (Hz), the vertical axis represents the required echo suppression amount (dB), the solid line shows the result when the transmission delay is small, and the broken line shows the result when it is large.
[0050]
From this experimental result, it can be seen that the required echo suppression amount in the low band increases as the transmission delay increases. The change in the frequency characteristic of the required echo suppression amount depending on the magnitude of the transmission delay can be qualitatively understood as follows.
[0051]
When the transmission delay is small, the uttered voice and its echo are heard almost simultaneously, so the echo is masked uniformly on the frequency axis by the uttered voice, which has substantially the same frequency characteristics. In other words, the masking level of the uttered voice is large in the low-frequency range, where the echo power is large, and small in the high-frequency range, where the echo power is small. The average power level of the echo sound after masking therefore becomes approximately linear along the frequency axis, and the required echo suppression amount takes its maximum in the middle range (2 to 4 kHz), where hearing is most sensitive.
[0052]
On the other hand, when the transmission delay increases, a time difference arises between the uttered voice and its echo, and the proportion of the echo that is masked uniformly on the frequency axis decreases. In some cases, an echo that returns during a silent interval is not masked at all. The average power level of the echo sound therefore follows the frequency distribution of the average power level of the voice, so the required echo suppression amount becomes larger in the low band. The validity of these interpretations was examined by quantitative analysis.
[0053]
FIG. 5 is a diagram illustrating an example of determining the required echo suppression amount when the transmission delay is large. X_f 61 is the average power spectrum of the voice at the transmission/reception sensitivity of ITU-T Recommendation P.34, and Z_f 62 is the audible level in the evaluation room, in which the ambient noise was 30 dBA or less during the evaluation experiment. FIG. 6 is a diagram illustrating an example of determining the required echo suppression amount when the transmission delay is small. Y_f 63 is the difference between the average power level of the echo sound and the masking level of the uttered voice, that is, the average power spectrum of the echo sound after it has been masked by the uttered voice.
[0054]
First, according to the interpretation for the case where the transmission delay is large, the required echo suppression amount DL can be determined from the average power spectrum X_f 61 of the voice and the audible level Z_f 62 shown in FIG. 5 as follows:
[0055]
$$DL = A \cdot (X_f - Z_f) \tag{2}$$
Here, the weighting factor A in equation (2) is normalized to A = 1 for the above-described evaluation experiment conditions and for the average power spectrum X_f 61 of the voice and the audible level Z_f 62 shown in FIG. 5. The result of actually calculating the required echo suppression amount is shown as DL_long 71 in FIG. 7. This result agrees very well with the subjective evaluation result of FIG. 4 for the large transmission delay, which shows that the interpretation is correct and that the required echo suppression amount can be obtained by equation (2).
[0056]
Next, according to the interpretation for the case where the transmission delay is small, the required echo suppression amount DL can be determined from the average power spectrum Y_f 63 of the echo sound masked by the uttered voice and the audible level Z_f 62 shown in FIG. 6 as follows:
[0057]
$$DL = A \cdot (Y_f - Z_f) \tag{3}$$
As in the case where the delay is large, the result of actually calculating the required echo suppression amount is shown as DL_short 72 in FIG. 7. This result agrees very well with the subjective evaluation result of FIG. 4 for the small transmission delay, which shows that the interpretation is correct and that the required echo suppression amount can be obtained by equation (3).
[0058]
The above results can be summarized as follows.
[0059]
(1) When the transmission delay is small: the required echo suppression amount is approximated by the frequency characteristics of the average power level of the echo sound masked by the uttered voice and of the audible level.
[0060]
(2) When the transmission delay is large: the required echo suppression amount is approximated by the frequency characteristics of the average power level of the voice and of the audible level.
[0061]
A method for obtaining the tap length of the adaptive filter of the sub-band echo canceller for each frequency band from the required echo suppression amount obtained in this manner is described below. The tap length L_SubBand for each band can be obtained from the required echo suppression amount DL_f for each band and the impulse response duration T_Rf (s) of the room for each band by the following relationship, equation (4):
(Equation 2)

Here, M is the decimation (thinning) factor.
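The formula image for this relationship is not reproduced in the text, so the following Python sketch is only an illustration under stated assumptions, not the equation of the specification. It assumes that T_Rf is the time over which the band impulse response decays by 60 dB, so that a decay of DL_f dB takes T_Rf · DL_f / 60 seconds, and that each sub-band runs at the decimated rate f_s / M. The function name `tap_length`, the sampling frequency argument `fs_hz`, and the rule that a band needing no suppression receives no taps are all hypothetical.

```python
import math


def tap_length(dl_f_db, t_rf_s, fs_hz, m):
    """Taps for one band under an assumed 60 dB decay relation (illustrative only)."""
    if dl_f_db <= 0:
        return 0  # assumption: a band that needs no suppression gets no taps
    # Time for the band impulse response to decay by DL_f dB,
    # assuming T_Rf is the 60 dB decay time.
    decay_time_s = t_rf_s * dl_f_db / 60.0
    # Convert to samples at the decimated sub-band rate fs / M and round up.
    return math.ceil(decay_time_s * fs_hz / m)


# Example with hypothetical values: DL_f = 30 dB, T_Rf = 0.5 s,
# fs = 16 kHz, decimation factor M = 32.
taps = tap_length(30.0, 0.5, 16000, 32)
```

With this form, a band that needs half the suppression of another band with the same reverberation time receives half the taps, which is the kind of per-band saving the description aims at.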
[0062]
As described above, by classifying the frequency characteristics of the required echo suppression amount, which are based on auditory characteristics, according to the magnitude of the transmission delay, and by taking the reverberation time of each band into account, it can be seen that an optimal tap length assignment, which is not possible with the full-band method, can be determined.
[0063]
In the present invention, the required echo suppression amount is obtained on the basis of auditory characteristics in this way, and the tap length assignment for each frequency band is determined from it. The echo signal therefore becomes harder to perceive than with the conventional method, in which the tap length is set uniformly over all frequency bands. In addition, the tap assignment for each band is switched on the basis of the fact that the frequency characteristic of the required echo suppression amount changes with the magnitude of the transmission delay. Degradation of speech quality can therefore be prevented even under usage conditions with different transmission delays, and the tap length can be allocated efficiently, which is the object of the present invention.
[0064]
Next, an echo canceling apparatus according to an embodiment of the present invention will be described with reference to FIG. 1. In FIG. 1, the same parts as those in FIG. 10 are denoted by the same reference numerals, and their description is omitted.
[0065]
In FIG. 1, reference numeral 51 denotes a tap length assignment circuit, 52 denotes a tap length calculation circuit, 53 denotes a required echo suppression amount determination circuit, 54 denotes a transmission delay determination circuit, and 55 denotes a reverberation time storage circuit.
[0066]
The tap length for each frequency band of the adaptive filters 44-1 to 44-N is determined and transferred by the tap length allocating circuit 51. The tap length allocation is calculated by the tap length calculation circuit 52, based on equation (4), from the required echo suppression amount DL_f for each band obtained by the required echo suppression amount determination circuit 53 and the reverberation time T_Rf for each band stored in the reverberation time storage circuit 55.
[0067]
First, the determination of the required echo suppression amount will be described.
[0068]
According to the above experimental results, the required echo suppression amount depends on the transmission delay, and its frequency characteristic needs to be switched. Therefore, the transmission delay determination circuit 54 judges the transmission delay against a threshold DT_th, and the required echo suppression amount determination circuit 53 determines the required echo suppression amount DL according to the magnitude of the transmission delay.
[0069]
Here, the threshold DT_th of the transmission delay is set to DT_th = 60 ms, since the maximum range of successive (temporal) masking is about 60 ms. If the transmission delay of the line on which the echo canceller is used is known in advance, the average power spectrum of the echo sound used to determine the required echo suppression amount may be fixed to X_f 61 or Y_f 63 from the beginning.
[0070]
In the required echo suppression amount determination circuit 53 in the tap length assignment circuit 51, the required echo suppression amount DL used in equation (4) is calculated from the average power spectra X_f and Y_f of the echo sound and the audible level Z_f. The required echo suppression amount determination circuit 53 stores the echo sound power spectra X_f and Y_f and the audible level Z_f.
[0071]
First, when the transmission delay determination circuit 54 determines that the transmission delay is larger than the threshold, the required echo suppression amount determination circuit 53 determines the required echo suppression amount DL_f from the average power spectrum X_f 61 of the voice and the audible level Z_f 62 of FIG. 5 using equation (2). Thus, when the transmission delay is large, a required echo suppression amount DL_f having a frequency characteristic like DL_long 71 in FIG. 7 is obtained.
[0072]
The weight coefficient A in the equation (2) uses the transmission / reception sensitivity as a reference value. Therefore, by linearly weighting in dB in accordance with the transmission / reception sensitivity of the communication system to be used, it is possible to cope with various use conditions of the echo canceller.
[0073]
Next, when the transmission delay determination circuit 54 determines that the transmission delay is smaller than the threshold, the required echo suppression amount DL_f is determined from the average power spectrum Y_f 63 of the echo sound masked by the uttered voice and the audible level Z_f 62 of FIG. 6 using equation (3). Thus, when the transmission delay is small, a required echo suppression amount DL_f having a frequency characteristic like DL_short 72 in FIG. 7 is obtained. This completes the procedure for determining the required echo suppression amount DL_f.
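The determination procedure of circuits 54 and 53 can be sketched in Python as follows. This is a minimal illustration, not the circuit implementation: the function name `required_suppression` and the per-band lists are hypothetical, and the numeric levels in the example are invented placeholders rather than data from FIG. 5 or FIG. 6. A delay exactly equal to the threshold is treated here as the small-delay case, which is an arbitrary choice, since the claims use "equal to or less" and "equal to or greater" for the two cases.

```python
DT_TH_MS = 60.0  # threshold DT_th given in the text


def required_suppression(delay_ms, x_f, y_f, z_f, a=1.0):
    """Per-band required echo suppression amount DL_f in dB.

    x_f: average power spectrum of the voice (used when the delay is large)
    y_f: average power spectrum of the echo masked by the uttered voice
         (used when the delay is small)
    z_f: audible level per band
    a:   weighting factor A (A = 1 under the reference conditions)
    """
    # Circuit 54: compare the transmission delay with the threshold.
    level = x_f if delay_ms > DT_TH_MS else y_f
    # Circuit 53: equation (2) for the large delay, equation (3) for the small delay.
    return [a * (lv - z) for lv, z in zip(level, z_f)]


# Hypothetical three-band example (levels in dB, placeholders only).
x_f = [60, 50, 40]
y_f = [30, 35, 20]
z_f = [10, 5, 15]
dl_long = required_suppression(200.0, x_f, y_f, z_f)
dl_short = required_suppression(28.0, x_f, y_f, z_f)
```

Keeping the threshold comparison separate from the level difference mirrors the split between the transmission delay determination circuit 54 and the required echo suppression amount determination circuit 53.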
[0074]
On the other hand, the reverberation time T_Rf for each band in equation (4) is transferred from the reverberation time storage circuit 55 to the tap length calculation circuit 52. The reverberation time of the room for each frequency band is assumed to be stored in the reverberation time storage circuit 55 in advance.
[0075]
The tap length calculation circuit 52 calculates the tap length for each band according to equation (4) from the obtained required echo suppression amount DL_f for each band and the reverberation time T_Rf for each band.
[0076]
FIG. 2 shows an example of the required tap length of each band obtained by this procedure. Here, as the impulse response duration of each band in equation (4), values actually measured in a laboratory with a room volume of 87 m³ and a reverberation time of 300 ms were used. The black bars indicate the tap length allocation when the transmission delay is small, and the white bars indicate the allocation when it is large.
[0077]
As shown in FIG. 2, when the transmission delay is small, more taps should be assigned to the middle band, where hearing is most sensitive, and fewer taps toward the lower and higher bands. When the transmission delay is large, more taps need to be assigned to the low band, where the sound pressure of the voice is large.
[0078]
FIG. 2 displays two values side by side as bars. Each pair of black and white bars therefore shares one frequency on the x-axis; in other words, the frequency values on the x-axis are stepped. For example, at x = 4500 (Hz), the number of taps y takes two values, y_L ≈ 33.3 for the small transmission delay and y_H ≈ 56.3 for the large transmission delay.
[0079]
The obtained tap length for each band is transferred from the tap length allocating circuit 51 to the adaptive filters 44-1 to 44-N of the respective bands, as described above.
[0080]
As described above, in the present echo canceller, the required echo suppression amount is determined in consideration of the auditory characteristics on the frequency axis, and a necessary and sufficient tap length is determined for each frequency band, so that taps are allocated efficiently. In addition, the tap length assignment is switched in accordance with the required echo suppression amount, which changes with the magnitude of the transmission delay, so that degradation of speech quality is prevented with the minimum necessary amount of computation.
[0081]
FIG. 11 is a block diagram showing the configuration of an echo canceller according to another embodiment of the present invention. In the echo canceller shown in FIG. 11, the required echo suppression amount determination circuit 53 of the echo canceller shown in FIG. 1 is composed of a required echo suppression amount calculation circuit 56 and a reverberation component calculation circuit 57. It differs from the echo canceller of FIG. 1 in that the frequency characteristic of the required echo suppression amount is switched according to the length of the reverberation time even when the transmission delay is small. The same components are given the same reference numerals.
[0082]
That is, in FIG. 11, the reverberation component calculation circuit 57, which forms part of the required echo suppression amount determination circuit 53, calculates the reverberation component W_f of the voice from the reverberation time of the echo path supplied from the reverberation time storage circuit 55. Using this value and the audible level Z_f, the required echo suppression amount calculation circuit 56 calculates the required echo suppression amount DL by equation (5), and the calculated required echo suppression amount DL is supplied to the tap length calculation circuit 52.
[0083]
More specifically, a subjective evaluation experiment was conducted to examine, in addition to the effect of the transmission delay, the effect of the reverberation time of the partner's room, which is the echo path, on the required echo suppression amount, and the tap length is assigned appropriately on the basis of an analysis of the results.
[0084]
As an experimental system of this subjective evaluation experiment, the same system as the above-mentioned experimental system used for examining the influence of the transmission delay was used. The evaluation parameters are the above-described 32 divided frequency bands and the reverberation time of the reverberation path (the room on the other side).
[0085]
Two cases of reverberation time, which is an evaluation parameter, were simulated: short (about 110 ms) and long (about 450 ms). Measured values were used for both. FIG. 12 shows the experimental results for the required echo suppression amount when the transmission delay is small (28 ms) and large (300 ms) for each reverberation time.
[0086]
In FIG. 12, the black squares and triangles show the results for delays of 28 ms and 300 ms, respectively, when the reverberation time is short (about 110 ms), and the white circles and inverted triangles (○, Δ) show the results for delays of 28 ms and 300 ms, respectively, when the reverberation time is long (about 450 ms).
[0087]
From the results of FIG. 12, it can be seen that the influence of the reverberation time is generally smaller than that of the transmission delay. In particular, when the transmission delay is large, the reverberation time hardly affects the required echo suppression amount. The reason is considered to be that, at a delay of around 300 ms, the masking effect of the uttered voice has already disappeared completely, so the required echo suppression amount is determined only by the average power level of the voice and the audible level, and the effect of reverberation hardly appears.
[0088]
On the other hand, even when the transmission delay is small, the required echo suppression amount in the low band increases as the reverberation time increases. This is considered to be because, even with a small delay, when reverberation is added to the echo returning to the evaluation side and the reverberation component extends beyond the range of successive masking by the uttered voice, that portion is perceived as an annoying echo. The frequency characteristic of the required echo suppression amount in that case can be interpreted as being similar to the one obtained when the transmission delay exceeds the range of successive masking. The reason the required echo suppression amount is smaller overall is considered to be that the reverberation component of the echo is smaller than its direct sound component.
[0089]
Here, the validity of the interpretation for the required echo suppression when the reverberation time is large even if the transmission delay is small was examined by quantitative analysis.
[0090]
FIG. 13 is a diagram illustrating an example of determining the required echo suppression amount when the transmission delay is small and the reverberation time is large. W_f 64 in the figure is the average power spectrum of the reverberation component of the aforementioned voice X_f 61, and Z_f 62 is the audible level.
[0091]
Even if the transmission delay is small, the required echo suppression amount DL when the reverberation time is long can be determined from the average power spectrum W_f 64 of the reverberation component and the audible level Z_f 62 shown in FIG. 13 by the following equation.
[0092]
$$DL = A \cdot (W_f - Z_f) \tag{5}$$
W_f is determined from the reverberation characteristics of the room in the above-described evaluation experiment, using the property that the reverberation component of the voice generally decays exponentially, by the following relationship with X_f 61.
[0093]
(Equation 3)

Here, T_Rf is the reverberation time of the echo path (the partner's room) for each frequency band, and T_thf is the range (time) over which successive masking has an effect for each frequency band.
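The formula image relating W_f to X_f is not reproduced in the text. Purely as an illustration of the exponential-decay property mentioned above, the following Python sketch assumes that the reverberant energy decays by 60 dB over T_Rf, so that the part extending beyond the masking range T_thf lies 60 · T_thf / T_Rf dB below X_f. This relation, the function names `reverberant_level_db` and `required_suppression_rt`, and the numeric values are assumptions, not the equation of the specification.

```python
def reverberant_level_db(x_f_db, t_rf_s, t_thf_s):
    """Level W_f of the reverberation that outlasts successive masking (assumed model)."""
    if t_rf_s <= 0:
        raise ValueError("reverberation time must be positive")
    # Assumed exponential decay: 60 dB over T_Rf, evaluated at T_thf.
    return x_f_db - 60.0 * t_thf_s / t_rf_s


def required_suppression_rt(x_f_db, z_f_db, t_rf_s, t_thf_s, a=1.0):
    """Equation (5), DL = A * (W_f - Z_f), using the assumed W_f above."""
    w_f_db = reverberant_level_db(x_f_db, t_rf_s, t_thf_s)
    return a * (w_f_db - z_f_db)


# Hypothetical single-band example: X_f = 50 dB, Z_f = 20 dB,
# masking range 60 ms, long (450 ms) and short (110 ms) reverberation.
dl_long_rt = required_suppression_rt(50.0, 20.0, 0.45, 0.06)
dl_short_rt = required_suppression_rt(50.0, 20.0, 0.11, 0.06)
```

Under this assumed model a short reverberation time pushes W_f far below X_f, which is in line with the observation that reverberation raises the required echo suppression amount only when the reverberation time is long.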
[0094]
The result of actually calculating the required echo suppression amount is shown as DL_RT 73 in FIG. 14. This result agrees well with the subjective evaluation result of FIG. 12 for the small transmission delay and large reverberation time, which shows that the interpretation is correct and that the required echo suppression amount can be obtained by equations (5) and (6).
[0095]
From the above results, when the transmission delay is small and the reverberation time is large, the required amount of echo suppression is approximated by the characteristics of the average power spectrum and the audible level of the reverberation component of the sound with respect to the frequency axis.
[0096]
From the required echo suppression amount obtained in this manner, the tap length can be obtained by the above-described equation (4). In this way, according to the present invention, even when the transmission delay is small, the required echo suppression amount is obtained separately for each case according to the length of the reverberation time of the echo path (the partner's room), and the tap length assignment for each frequency band is switched on the basis of the result. Degradation of speech quality can therefore be prevented even under usage conditions with different reverberation times, and the tap length can be allocated more efficiently, which is the object of the present invention.
[0097]
[Effects of the Invention]
As described above, according to the present invention, in a sub-band echo canceller, the required echo suppression amount can be obtained from auditory characteristics, and the tap length of each adaptive filter can be allocated efficiently from the obtained value. Compared with the conventional scheme, in which tap lengths are assigned uniformly to the frequency bands, the amount of computation can therefore be reduced, and the apparatus can be made more economical by eliminating wasted taps. In addition, by making the tap length assignment for each frequency band variable according to the magnitude of the transmission delay, the echo signal can be suppressed regardless of the line conditions, and speech quality can be improved.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of an echo canceller according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of tap length assignment that varies depending on transmission time in the echo canceller illustrated in FIG. 1;
FIG. 3 is a diagram illustrating a configuration of a simulation experiment system that determines a required echo suppression amount.
FIG. 4 is a graph of an experimental result showing an example of an influence of a transmission delay time on a required echo suppression amount.
FIG. 5 is a graph showing an example of determining a required echo suppression amount when a transmission delay is large.
FIG. 6 is a graph showing an example of determining a required echo suppression amount when a transmission delay is small.
FIG. 7 is a graph illustrating an example of a difference in a required echo suppression amount depending on a magnitude of a transmission delay.
FIG. 8 is a diagram showing a configuration of a loudspeaker system.
FIG. 9 is a block diagram showing an example of a conventional echo canceller.
FIG. 10 is a block diagram showing an example of a conventional sub-band echo canceller.
FIG. 11 is a block diagram showing a configuration of an echo canceller according to another embodiment of the present invention.
FIG. 12 is a graph of an experimental result showing an example of an influence of a reverberation time on a required echo suppression amount.
FIG. 13 is a graph showing an example of determining a required echo suppression amount when a delay time is short and a reverberation time is long.
FIG. 14 is a graph showing an example of a required echo suppression amount when a delay time is small and a reverberation time is large.
[Explanation of symbols]
3 Microphone
4 Speaker
41, 42 Frequency band division circuit
43 Frequency band synthesis circuit
44-1 to 44-N Adaptive filter
51 Tap length assignment circuit
53 Required echo suppression amount determination circuit
54 Transmission delay judgment circuit
55 Reverberation time storage circuit
56 Required echo suppression calculation circuit
57 Reverberation component calculation circuit

Claims

1. An echo canceling method in which a transmission signal to an echo path is divided into a plurality of frequency bands, an echo signal obtained after the transmission signal has passed through the echo path is divided into a plurality of frequency bands, a pseudo echo path is generated for each frequency band, and the echo signal is canceled by subtracting, from the echo signals of the plurality of frequency bands, pseudo echo signals of the plurality of frequency bands obtained by inputting the transmission signals of the respective frequency bands into the pseudo echo paths of the plurality of frequency bands, wherein
the pseudo echo path of each frequency band is configured by an adaptive filter, and the filter coefficients of each adaptive filter are successively updated by an algorithm that operates to minimize the cancellation error of the echo signal, and
a tap length indicating the number of filter coefficients of each adaptive filter is determined based on a necessary and sufficient echo suppression amount, which is determined from the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band and from the audible level of sound for humans in each frequency band.
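
As an illustration of the per-band adaptive filter recited above, the following is a minimal sketch, not the claimed implementation. It assumes real-valued band signals and uses the normalized LMS (learning identification) update as one example of an algorithm that minimizes the cancellation error. The function name `nlms_band` and the parameters `mu` and `eps` are hypothetical and do not appear in the specification.

```python
import numpy as np

def nlms_band(x, d, taps, mu=0.5, eps=1e-8):
    """Adapt the pseudo echo path of one frequency band with NLMS.

    x    : transmission signal of this band (input to the echo path)
    d    : echo signal of this band (output of the echo path)
    taps : tap length assigned to this band (number of filter coefficients)
    mu, eps : step size and regularization constant (assumed values)
    Returns the cancellation error signal and the estimated coefficients.
    """
    h_hat = np.zeros(taps)   # estimated echo path coefficients
    buf = np.zeros(taps)     # most recent `taps` input samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y_hat = h_hat @ buf                  # pseudo echo signal
        e[n] = d[n] - y_hat                  # cancellation error
        # Normalized LMS update: step scaled by the input power in the buffer.
        h_hat += mu * e[n] * buf / (buf @ buf + eps)
    return e, h_hat
```

In a sub-band arrangement, one such filter would run per band, each with its own `taps` value, so that bands requiring less echo suppression consume fewer coefficients.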

2. The echo canceling method according to claim 1, wherein the step of determining the tap length comprises: measuring the magnitude of the transmission delay; determining a necessary and sufficient echo suppression amount from the measured transmission delay, the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, and the audible level of sound for humans in each frequency band; and calculating the tap length from the reverberation time, in each frequency band, of the room that forms the echo path and from the determined required echo suppression amount.
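
The calculation of a tap length from a reverberation time and a required echo suppression amount can be sketched as follows. The claim does not state the formula; this sketch assumes the common exponential-decay model in which the room response decays by 60 dB over the reverberation time, so that covering a decay of E dB takes a fraction E/60 of that time. The function name `tap_length`, its parameters, and the `min_taps` floor are hypothetical.

```python
import math

def tap_length(required_suppression_db, reverberation_time_s,
               band_sample_rate_hz, min_taps=1):
    """Taps needed for one band under an assumed exponential decay model.

    required_suppression_db : required echo suppression amount for the band
    reverberation_time_s    : time for the room response to decay by 60 dB
    band_sample_rate_hz     : sampling rate of the (decimated) band signal
    min_taps                : assumed lower bound on the tap length
    """
    if required_suppression_db <= 0:
        return min_taps  # assumption: the echo is already inaudible in this band
    # Time for the response to decay by the required amount (assumed model).
    decay_time_s = reverberation_time_s * required_suppression_db / 60.0
    # Round before ceil so floating-point noise does not add a spurious tap.
    taps = math.ceil(round(decay_time_s * band_sample_rate_hz, 9))
    return max(min_taps, taps)
```

For example, under these assumptions a band sampled at 1000 Hz with a 0.5 s reverberation time and a 30 dB requirement needs `tap_length(30.0, 0.5, 1000.0)` = 250 taps.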

3. The echo canceling method according to claim 2, wherein, when the measured transmission delay is equal to or less than a predetermined value, the step of determining the required echo suppression amount uses, as the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, the difference between the average power level of the speech and the average masking level of the uttered speech, and determines the required echo suppression amount from that difference and the audible level of sound for humans in each frequency band.

4. The echo canceling method according to claim 2, wherein, when the measured transmission delay is equal to or greater than a predetermined value, the step of determining the required echo suppression amount uses the average power level of the speech as the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, and determines the required echo suppression amount from that average power level and the audible level of sound for humans in each frequency band.

5. The echo canceling method according to claim 3, wherein the predetermined value of the transmission delay is 60 ms.
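
The delay-dependent determination of the required echo suppression amount in the method claims can be sketched as below. This is one possible reading, not the claimed procedure: it assumes all levels are in dB on a common reference and that the required amount is the excess of the effective echo level over the audible level, floored at zero. Only the 60 ms threshold and the choice of effective level come from the claims; the function names and the subtraction of the audible level are assumptions.

```python
DELAY_THRESHOLD_MS = 60.0  # predetermined value of the transmission delay

def effective_echo_level_db(delay_ms, speech_level_db, masking_level_db):
    """Echo level after accounting for masking by the talker's own speech."""
    # The claims use "equal to or less" and "equal to or greater", so a delay
    # of exactly 60 ms fits both cases; this sketch treats it as the short case.
    if delay_ms <= DELAY_THRESHOLD_MS:
        # Short delay: the echo is partly masked by the uttered speech.
        return speech_level_db - masking_level_db
    # Long delay: no masking is credited, so the full speech level is used.
    return speech_level_db

def required_suppression_db(delay_ms, speech_level_db,
                            masking_level_db, audible_level_db):
    """Required echo suppression for one band (assumed combination rule)."""
    level = effective_echo_level_db(delay_ms, speech_level_db, masking_level_db)
    # Assumption: suppress only the part of the echo above the audible level.
    return max(0.0, level - audible_level_db)
```

Under these assumptions a short delay yields a smaller required amount than a long delay for the same band, which in turn permits a shorter tap length for that band.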

6. An echo canceller comprising a first frequency band division circuit that divides a transmission signal to an echo path into a plurality of frequency bands and a second frequency band division circuit that divides an echo signal obtained after the transmission signal has passed through the echo path into a plurality of frequency bands, the echo canceller generating a pseudo echo path for each frequency band divided by the frequency band division circuits and canceling the echo signal by subtracting, from the echo signals of the plurality of frequency bands, pseudo echo signals of the plurality of frequency bands obtained by inputting the transmission signals of the plurality of frequency bands into the pseudo echo paths of the respective frequency bands, the echo canceller further comprising:
an adaptive filter having filter coefficients that are successively updated by an algorithm that operates to minimize the cancellation error of the echo signal; and
tap length allocating means for determining a tap length indicating the number of filter coefficients of the adaptive filter based on a necessary and sufficient echo suppression amount, which is determined from the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band and from the audible level of sound for humans in each frequency band.

7. The echo canceller according to claim 6, wherein the tap length allocating means comprises: transmission delay determining means for measuring the magnitude of the transmission delay; required echo suppression amount determining means for determining a necessary and sufficient echo suppression amount from the measured transmission delay, the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, and the audible level of sound for humans in each frequency band; reverberation time storage means for storing the reverberation time, in each frequency band, of the room that forms the echo path; and tap length calculation means for calculating the tap length from the stored reverberation time of the room in each frequency band and from the determined required echo suppression amount.

8. The echo canceller according to claim 7, wherein, when the transmission delay measured by the transmission delay determining means is equal to or less than a predetermined value, the required echo suppression amount determining means uses, as the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, the difference between the average power level of the speech and the average masking level of the uttered speech, and determines the required echo suppression amount from that difference and the audible level of sound for humans in each frequency band.

9. The echo canceller according to claim 7, wherein, when the transmission delay measured by the transmission delay determining means is equal to or greater than a predetermined value, the required echo suppression amount determining means uses the average power level of the speech as the average power level of the echo speech taking into account the masking effect of the uttered speech in each frequency band, and determines the required echo suppression amount from that average power level and the audible level of sound for humans in each frequency band.

10. The echo canceller according to claim 8, wherein the predetermined value of the transmission delay is 60 ms.