JP2004289242A

JP2004289242A - Echo canceling apparatus, echo canceling method, program and recording medium

Info

Publication number: JP2004289242A
Application number: JP2003075683A
Authority: JP
Inventors: Hideaki Sasaki; 秀昭佐々木; Naoto Kawasaki; 直人川▲崎▼; Kenichi Taniguchi; 賢一谷口; Junichi Koga; 淳一古賀; Kensuke Yamashita; 賢祐山下; Noriyoshi Nagase; 徳美永瀬
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-19
Filing date: 2003-03-19
Publication date: 2004-10-14

Abstract

PROBLEM TO BE SOLVED: To provide an echo canceling apparatus in which an increase in echo and the occurrence of howling are suppressed even when the transfer function changes suddenly.

SOLUTION: A central processing unit includes: transfer function estimating means for estimating the echo component that leaks from the loudspeaker into the microphone; echo filter means for removing the estimated echo component; and talker detection means 16 for detecting speech by the far-end talker, speech by the near-end talker, and double talk. The talker detection means 16 includes: leak integration means 21, 22 for calculating the power of the far-end talker voice signal (the received voice signal) and of the near-end talker voice signal (the transmitted voice signal) and applying leak integration to the calculated power; and detection means 25 for detecting the speech state of the far-end talker and the near-end talker on the basis of the calculation results held in the leak integration means.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置、および、そのエコーキャンセル装置におけるエコーキャンセル方法、ならびに、そのエコーキャンセル方法を実行するためのプログラム、ならびに、そのプログラムを実行するための記録媒体に関するものである。
【０００２】
【従来の技術】
スピーカフォン方式電話等の音声ハンズフリー装置において、ハウリングやエコーを防止するためにエコーキャンセル技術がある。このエコーキャンセル技術によれば、スピーカから出力された音声が部屋等の空間を通ってマイクロフォンに入力された音声（エコー）から、その空間を擬似的に模擬した伝達関数とスピーカへ出力した音声とを畳み込んだ信号を差し引くことにより、あたかもエコーがないようにすることができる。
【０００３】
以下に、従来のエコーキャンセル技術について（特許文献１）を用いて説明する。図８は従来のエコーキャンセル装置を示す機能ブロック図である。
【０００４】
図８において、１はスピーカフォン方式電話等における受話音声（遠端話者からの音声）を再生するスピーカ、２は送話音声（近端話者からの音声）を拾うマイクロフォン、３は直接伝搬経路を経たエコーを消去する第１のエコーキャンセル部、４は第１のエコーキャンセル部３の出力信号を用いてダブルトーク状態を検出するダブルトーク検出部、５は間接伝搬経路を経たエコーを消去する第２のエコーキャンセル部である。
【０００５】
第２のエコーキャンセル部５においては、伝搬経路の変化に追従するために、適応型制御が行われており、その方法として正規化ＬＭＳ（ＬｅａｓｔＭｅａｎＳｑｕａｒｅ）など数種が知られている。また、ダブルトーク検出部４は、送信信号のレベルと受信信号のレベルの比を演算し、所定の閾値と比較することで、話者判定を行い、第２のエコーキャンセル部５の上記適応制御動作の有無をコントロールする。
【０００６】
【特許文献１】
特開平５−４８５４７号公報
【０００７】
【発明が解決しようとする課題】
しかしながら、上記従来のエコーキャンセル装置では、近端話者の空間移動などに伴って伝達関数が急激に変化した場合に、話者検出の誤判定を招き、結果としてエコーの増大やハウリング発生を招来するという問題点を有していた。
【０００８】
このエコーキャンセル装置、エコーキャンセル方法、プログラムおよび記録媒体では、近端話者の空間移動などに伴って伝達関数が急激に変化した場合でも、エコーの増大やハウリング発生が抑制されることが要求されている。
【０００９】
本発明は、この要求を満たすため、伝達関数が急激に変化した場合でもエコーの増大やハウリング発生を抑制することができるエコーキャンセル装置、および、伝達関数が急激に変化した場合でもエコーの増大やハウリング発生を抑制するエコーキャンセル方法、ならびに、そのエコーキャンセル方法を実行するためのプログラム、ならびに、そのプログラムを実行するための記録媒体を提供することを目的とする。
【００１０】
【課題を解決するための手段】
上記課題を解決するために本発明のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定手段と、推定したエコー成分を除去するエコーフィルタ手段と、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出手段とを有し、話者検出手段は、受話音声信号としての遠端話者音声信号および送話音声信号としての近端話者音声信号のパワーを演算してリーク積分するリーク積分手段と、リーク積分手段の保持する演算結果をもとに遠端話者および近端話者の発話状態を検出する検出手段とを有する構成を備えている。
【００１１】
これにより、伝達関数が急激に変化した場合でもエコーの増大やハウリング発生を抑制することができるエコーキャンセル装置が得られる。
【００１２】
上記課題を解決するために本発明のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定ステップと、推定したエコー成分を除去するエコーフィルタステップと、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出ステップとを有し、話者検出ステップは、受話音声信号としての遠端話者音声信号および送話音声信号としての近端話者音声信号のパワーを演算してリーク積分するリーク積分ステップと、リーク積分ステップにおける演算結果をもとに遠端話者および近端話者の発話状態を検出する検出ステップとを有する構成を備えている。
【００１３】
これにより、伝達関数が急激に変化した場合でもエコーの増大やハウリング発生を抑制するエコーキャンセル方法が得られる。
【００１４】
上記課題を解決するために本発明のプログラムは、上記エコーキャンセル方法の各ステップを実行するためのプログラムである構成を備えている。
【００１５】
これにより、上記エコーキャンセル方法を実行するためのプログラムが得られる。
【００１６】
上記課題を解決するために本発明の記録媒体は、上記プログラムを実行するためのコンピュータで読み取り可能な記録媒体である構成を備えている。
【００１７】
これにより、上記プログラムを実行するための記録媒体が得られる。
【００１８】
【発明の実施の形態】
本発明の請求項１に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定手段と、推定したエコー成分を除去するエコーフィルタ手段と、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出手段とを有し、話者検出手段は、受話音声信号としての遠端話者音声信号および送話音声信号としての近端話者音声信号のパワーを演算してリーク積分するリーク積分手段と、リーク積分手段の保持する演算結果をもとに遠端話者および近端話者の発話状態を検出する検出手段とを有することとしたものである。
【００１９】
この構成により、話者検出手段が前回検出動作までに利用した遠端話者音声信号および近端話者音声信号のパワーの平均値を閾値として用いることができるので、近端話者の空間移動などに伴って伝達関数の急激な変化が生じた場合にも、変化に追従することができ、エコーの増大やハウリング発生を抑制することができるという作用を有する。
【００２０】
請求項２に記載のエコーキャンセル装置は、請求項１に記載のエコーキャンセル装置において、話者検出手段は、受話音声信号のリーク積分値の平均値を計算して保持する受話平均値保持手段と、送話音声信号のリーク積分値の平均値を計算して保持する送話平均値保持手段とを有することとしたものである。
【００２１】
この構成により、リーク積分値の平均値すなわち背景騒音のレベルに自動適応可能な話者検出を行うことができるという作用を有する。
【００２２】
請求項３に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定手段と、推定したエコー成分を除去するエコーフィルタ手段と、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出手段とを有し、話者検出手段は、遠端話者、近端話者およびダブルトークの発話期間を計算すると共に計算した発話期間の長短に応じて伝達関数推定手段における学習係数項の値を制御することとしたものである。
【００２３】
この構成により、発話期間の長い場合には伝達関数推定手段における学習係数項の値を増加させ、発話期間の短い場合には伝達関数推定手段における学習係数項の値を減少させることができるので、突発性のノイズに対してエコーキャンセル処理の耐性を高めることができるという作用を有する。
【００２４】
請求項４に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定手段と、推定したエコー成分を除去するエコーフィルタ手段と、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出手段とを有し、話者検出手段は、無音期間を計算すると共に計算した無音期間の長短に応じて伝達関数推定手段における正規化ＬＭＳ方式のオフセット項の値を制御することとしたものである。
【００２５】
この構成により、無音期間の長い場合には伝達関数推定手段における正規化ＬＭＳ方式のオフセット項の値を増加させ、無音期間の短い場合には伝達関数推定手段における正規化ＬＭＳ方式のオフセット項の値を減少させることができるので、突発性のノイズに対してエコーキャンセル処理の耐性を高めることができるという作用を有する。
【００２６】
請求項５に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定ステップと、推定したエコー成分を除去するエコーフィルタステップと、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出ステップとを有し、話者検出ステップは、受話音声信号としての遠端話者音声信号および送話音声信号としての近端話者音声信号のパワーを演算してリーク積分するリーク積分ステップと、リーク積分ステップにおける演算結果をもとに遠端話者および近端話者の発話状態を検出する検出ステップとを有することとしたものである。
【００２７】
この構成により、話者検出手段が前回検出動作までに利用した遠端話者音声信号および近端話者音声信号のパワーの平均値を閾値として用いることができるので、近端話者の空間移動などに伴って伝達関数の急激な変化が生じた場合にも、変化に追従することができ、エコーの増大やハウリング発生を抑制するができるという作用を有する。
【００２８】
請求項６に記載のエコーキャンセル方法は、請求項５に記載のエコーキャンセル方法であって、話者検出ステップは、受話音声信号のリーク積分値の平均値を計算して保持する受話平均値保持ステップと、送話音声信号のリーク積分値の平均値を計算して保持する送話平均値保持ステップとを有することとしたものである。
【００２９】
この構成により、リーク積分値の平均値すなわち背景騒音のレベルに自動適応可能な話者検出を行うことができるという作用を有する。
【００３０】
請求項７に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定ステップと、推定したエコー成分を除去するエコーフィルタステップと、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出ステップとを有し、話者検出ステップでは、遠端話者、近端話者およびダブルトークの発話期間を計算すると共に計算した発話期間の長短に応じて伝達関数推定ステップにおける学習係数項の値を制御することとしたものである。
【００３１】
この構成により、発話期間の長い場合には伝達関数推定手段における学習係数項の値を増加させ、発話期間の短い場合には伝達関数推定手段における学習係数項の値を減少させることができるので、突発性のノイズに対してエコーキャンセル処理の耐性を高めることができるという作用を有する。
【００３２】
請求項８に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、スピーカからマイクロフォンへと回り込むエコー成分を推定する伝達関数推定ステップと、推定したエコー成分を除去するエコーフィルタステップと、遠端話者の発話、近端話者の発話およびダブルトークを検出する話者検出ステップとを有し、話者検出ステップでは、無音期間を計算すると共に計算した無音期間の長短に応じて伝達関数推定ステップにおける正規化ＬＭＳ方式のオフセット項の値を制御することとしたものである。
【００３３】
この構成により、無音期間の長い場合には伝達関数推定手段における正規化ＬＭＳ方式のオフセット項の値を増加させ、無音期間の短い場合には伝達関数推定手段における正規化ＬＭＳ方式のオフセット項の値を減少させることができるので、突発性のノイズに対してエコーキャンセル処理の耐性を高めることができるという作用を有する。
【００３４】
請求項９に記載のプログラムは、請求項５乃至８のいずれか１に記載されたエコーキャンセル方法の各ステップを実行するためのプログラムであることとしたものである。
【００３５】
この構成により、上記プログラムを実行するコンピュータを用いることにより、請求項５乃至８のいずれか１に記載されたエコーキャンセル方法を任意の場所で任意の時間に実行することができるという作用を有する。
【００３６】
請求項１０に記載の記録媒体は、請求項９に記載されたプログラムを実行するためのコンピュータで読み取り可能な記録媒体であることとしたものである。
【００３７】
この構成により、コンピュータで読み取り可能な記録媒体からプログラムを読み取ることにより、請求項９に記載されたプログラムを任意の場所で任意の時間に実行することができるという作用を有する。
【００３８】
以下、本発明の実施の形態について、図１〜図７を用いて説明する。
【００３９】
（実施の形態１）
図１は、本発明の実施の形態１によるエコーキャンセル装置の基本構成を示すブロック図である。
【００４０】
図１において、６は電話回線とのインタフェースを有する電話回路装置、７はアナログ電気信号である受話音声電気信号をデジタル電気信号に変換する第１のＡ／Ｄ変換装置、８はデジタル電気信号をアナログ電気信号へ変換する第１のＤ／Ａ変換装置、９はＤ／Ａ変換装置８からのアナログ電気信号を音声に変換するスピーカ、１０は音声をアナログ電気信号に変換するマイクロフォン、１１はマイクロフォンからのアナログ電気信号をデジタル電気信号に変換する第２のＡ／Ｄ変換装置、１２はデジタル電気信号をアナログ電気信号（送話音声電気信号）に変換する第２のＤ／Ａ変換装置、１３はＡ／Ｄ変換装置７およびＡ／Ｄ変換装置１１から得られたデジタル電気信号に対してデジタル信号処理を行い、その演算結果をＤ／Ａ変換装置８およびＤ／Ａ変換装置１２に出力する中央演算処理装置、１４は中央演算処理装置１３を動作させるためのプログラムが記憶されているＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、１５は前記ＲＯＭに記憶されているプログラムに従って中央演算処理装置１３が動作する際に使用するＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）である。
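[0041]
FIG. 2 is a functional block diagram showing the function-implementing means (means whose functions are implemented by a program) in the central processing unit 13 of FIG. 1, and illustrates the echo canceling method in a speakerphone telephone. These functions outline the program recorded in the ROM 14.
[0042]
In FIG. 2, 16 is speaker detection means that, in a speakerphone telephone or the like, detects far-end speech, near-end speech, and double talk (simultaneous speech by the far-end and near-end speakers) in order to control the operation of the echo canceling device; 17 is transfer function estimating means that estimates the transfer function of the space between the speaker 9 and the microphone 10 by a steepest-descent method, typified by the normalized LMS (Least Mean Square) method; 18 is direct echo filter means that convolves the transfer function of the direct echo component with the received voice; 19 is indirect echo filter means that convolves the transfer function of the indirect echo component with the received voice; and 20 is subtraction means.
[0043]
The general operation of the echo canceling device configured in this way is as follows. Sound radiated from the speaker 9 enters the microphone 10 through the space as an echo, forming a closed loop; without echo cancellation, howling occurs in the worst case. The sound radiated from the speaker 9 can be divided into a direct echo component, which enters the microphone 10 directly, and an indirect echo component, which enters the microphone 10 after being reflected by objects in the space such as walls, the floor, and the ceiling.
[0044]
FIG. 3 is a flowchart showing the operation of the central processing unit 13 of FIG. 2, and illustrates the echo canceling method in a speakerphone telephone.
[0045]
In FIG. 3, when echo cancellation starts (S1), the speaker detection means 16 determines whether the state is far-end speech, near-end speech, or double talk (S2). If it is far-end speech, the transfer function estimating means 17 estimates the direct-wave component transfer function (S3) and the indirect-wave component transfer function (S4) using an algorithm such as NLMS. The direct echo filter means 18 convolves its estimate with the received voice (S5), and the indirect echo filter means 19 convolves its estimate with the received voice (S6). The subtraction means 20 then subtracts the convolution results from the transmitted voice from the microphone 10, removing the direct echo component and the indirect echo component (S7).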
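The flow of FIG. 3 (S2 to S7) can be sketched per sample as below. This is only an illustrative sketch, not the patented implementation: it assumes that the direct echo filter covers the first few taps of the impulse response and the indirect echo filter the remaining taps, and that both are updated with one shared NLMS step. The names `process_sample`, `run_far_end_only`, `mu`, and `delta` are hypothetical.

```python
import random

FAR_END, NEAR_END, DOUBLE_TALK = "far_end", "near_end", "double_talk"

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def process_sample(state, mic, x_hist, w_direct, w_indirect, mu=0.5, delta=1e-6):
    """One pass through S2..S7 for a single sample.

    x_hist[0] is the newest received (far-end) sample.
    Assumption: direct taps come first, indirect taps follow.
    """
    n_d = len(w_direct)
    x_d = x_hist[:n_d]
    x_i = x_hist[n_d:n_d + len(w_indirect)]
    if state == FAR_END:
        # S3, S4: adapt both estimates only during far-end speech (NLMS update).
        err = mic - dot(w_direct, x_d) - dot(w_indirect, x_i)
        gain = mu * err / (delta + dot(x_d, x_d) + dot(x_i, x_i))
        w_direct = [w + gain * x for w, x in zip(w_direct, x_d)]
        w_indirect = [w + gain * x for w, x in zip(w_indirect, x_i)]
    direct_echo = dot(w_direct, x_d)        # S5: direct echo filter
    indirect_echo = dot(w_indirect, x_i)    # S6: indirect echo filter
    out = mic - direct_echo - indirect_echo # S7: subtraction
    return out, w_direct, w_indirect

def run_far_end_only(h_true, n_direct, n_samples=2000, seed=1):
    """Drive the sketch with far-end speech only through a hypothetical room h_true."""
    rng = random.Random(seed)
    w_d = [0.0] * n_direct
    w_i = [0.0] * (len(h_true) - n_direct)
    x_hist = [0.0] * len(h_true)
    out = 0.0
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        mic = dot(h_true, x_hist)  # microphone picks up echo only
        out, w_d, w_i = process_sample(FAR_END, mic, x_hist, w_d, w_i)
    return out, w_d, w_i
```

When the state is anything other than far-end speech, the coefficients are left untouched and only the filtering and subtraction steps run, which is how the detection result in S2 gates the estimation in S3 and S4.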
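[0046]
This enables echo cancellation that achieves both faster and more accurate transfer function estimation.
[0047]
As described above, according to this embodiment, the direct echo filter means 18 convolves its estimate with the received voice, the indirect echo filter means 19 convolves its estimate with the received voice, and the subtraction means 20 subtracts the convolution results from the transmitted voice from the microphone 10 to remove the direct echo component and the indirect echo component. Double talk can therefore be determined accurately even when the speaker volume is raised, and detected accurately even when the power ratio between the received voice and the transmitted voice is the same.
[0048]
(Embodiment 2)
FIG. 4 is a functional block diagram showing the speaker detection means of the central processing unit of the echo canceling device according to Embodiment 2 of the present invention, and illustrates the speaker detection method in the echo canceling scheme. The basic configuration of the echo canceling device of this embodiment is that shown in FIG. 1, and the configuration of its central processing unit is that shown in FIG. 2. The functions of FIG. 4 outline the program recorded in the ROM 14.
[0049]
In FIG. 4, 21 is leak integration means that performs leak integration of the received voice (far-end speaker voice), where leak integration means integrating the input data in the form "integral value = integral value × constant + input data"; 22 is leak integration means that performs leak integration of the transmitted voice (near-end speaker voice); 23 is reception average value holding means that holds the average of the results of the leak integration means 21; 24 is transmission average value holding means that holds the average of the results of the leak integration means 22; and 25 is detection means that determines far-end speech, near-end speech, and double talk using the values held in the reception average value holding means 23 and the transmission average value holding means 24 as thresholds.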
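The leak integration defined in this paragraph (integral value = integral value × constant + input data) can be written in a few lines. In the sketch below the input data is taken to be the instantaneous power (squared sample), following the statement elsewhere in the description that the power of the voice signal is leak-integrated; the function name `leak_integrate` and the default constant `leak=0.9` are assumptions for illustration.

```python
def leak_integrate(samples, leak=0.9):
    """Leaky integration of signal power.

    Implements: integral = integral * leak + input, with input = sample ** 2.
    `leak` is the constant from the description; 0 < leak < 1 is assumed so
    that old contributions decay instead of accumulating forever.
    """
    integral = 0.0
    trace = []
    for s in samples:
        integral = integral * leak + s * s
        trace.append(integral)
    return trace
```

For a constant input power p and 0 < leak < 1, the integral settles at p / (1 - leak), so the constant sets both how much history is remembered and the scale of the result.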
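[0050]
The operation of the speaker detection means 16 configured in this way is described with reference to FIG. 5. FIG. 5 is a flowchart showing the operation of the speaker detection means 16 of FIG. 4.
[0051]
In FIG. 5, when echo cancellation starts (S11), the leak integration means 21 computes the leak integral of the received voice (S12), and the reception average value holding means 23 stores its average (S13). The leak integration means 22 computes the leak integral of the transmitted voice (S14), and the transmission average value holding means 24 stores its average (S15). The detection means 25 then determines far-end speech, near-end speech, and double talk using the values held in the reception average value holding means 23 and the transmission average value holding means 24 as thresholds.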
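The description does not say exactly how the held averages are turned into a decision, so the following is only one possible reading of S12 to S15: each side keeps a leak integral of its power and a slowly updated average of that integral, and a side counts as "speaking" when its current leak integral exceeds a multiple of the average held from earlier samples. The class `LevelTracker`, the `margin` factor, the `avg_rate` smoothing, and the `SILENCE` state are all hypothetical.

```python
FAR_END, NEAR_END, DOUBLE_TALK, SILENCE = "far_end", "near_end", "double_talk", "silence"

class LevelTracker:
    """Leak integral of power plus a running average used as an adaptive threshold."""

    def __init__(self, leak=0.9, avg_rate=0.01, margin=2.0):
        self.leak = leak          # constant of the leak integration
        self.avg_rate = avg_rate  # assumed smoothing rate of the held average
        self.margin = margin      # assumed factor applied to the average
        self.level = 0.0
        self.average = 0.0

    def update(self, sample):
        # S12 / S14: leak-integrate the power of the new sample.
        self.level = self.level * self.leak + sample * sample
        # Compare against the average held from previous samples.
        active = self.level > self.margin * self.average
        # S13 / S15: update and hold the average for the next decision.
        self.average += self.avg_rate * (self.level - self.average)
        return active

def classify(recv_active, send_active):
    if recv_active and send_active:
        return DOUBLE_TALK
    if recv_active:
        return FAR_END
    if send_active:
        return NEAR_END
    return SILENCE

def detect(recv_samples, send_samples):
    recv, send = LevelTracker(), LevelTracker()
    return [classify(recv.update(r), send.update(s))
            for r, s in zip(recv_samples, send_samples)]
```

Because the threshold follows the average of the leak integral, a steady background level stops being classified as speech after a while, which matches the stated aim of adapting to the background noise level. Note that the averages start at zero in this sketch, so the first samples are classified as active until the averages have caught up.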
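[0052]
As described above, according to this embodiment, the central processing unit 13 has the transfer function estimating means 17 for estimating the echo component that leaks from the speaker 9 into the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting far-end speech, near-end speech, and double talk. The speaker detection means 16 has the leak integration means 21 and 22, which compute the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrate it, and the detection means 25, which detects the speech state of the far-end and near-end speakers based on the results held by the leak integration means 21 and 22. As a result, the average power of the far-end and near-end speaker voice signals used by the speaker detection means 16 up to the previous detection operation can be used as the threshold, so the device can follow even an abrupt change in the transfer function caused by, for example, the near-end speaker moving within the space, and an increase in echo and the occurrence of howling can be suppressed.
[0053]
In addition, because the speaker detection means 16 has the reception average value holding means 23, which calculates and holds the average of the leak integral of the received voice signal, and the transmission average value holding means 24, which calculates and holds the average of the leak integral of the transmitted voice signal, speaker detection can adapt automatically to the average of the leak integral, that is, to the background noise level.
[0054]
(Embodiment 3)
The basic configuration of the echo canceling device according to Embodiment 3 of the present invention is that shown in FIG. 1, and the configuration of its central processing unit is that shown in FIG. 2. The configuration of the speaker detection means 16 of the central processing unit 13 of the echo canceling device according to Embodiment 3 is that shown in FIG. 4.
[0055]
The operation of the speaker detection means 16 configured in this way is described with reference to FIG. 6. FIG. 6 is a flowchart showing the operation of the speaker detection means 16 of FIG. 4.
[0056]
In FIG. 6, when echo cancellation starts (S21), the detection means 25 calculates the speech period (S22). If the speech period is longer than a fixed period (S23), the value of the learning coefficient term in the transfer function estimating means 17 is increased (S24); if the speech period is shorter than the fixed period, the value of the learning coefficient term in the transfer function estimating means 17 is decreased (S25).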
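Steps S22 to S25 can be sketched as a run-length counter feeding a step-size choice. The description only says that the learning coefficient is made larger for long speech periods and smaller for short ones, so the two fixed values `mu_small` and `mu_large`, the sample-count threshold `min_run`, and the treatment of a run exactly equal to the threshold as "short" are assumptions.

```python
def update_run_length(run, active):
    """S22: count consecutive samples (or frames) in which speech is detected."""
    return run + 1 if active else 0

def step_size_for(run, min_run=160, mu_small=0.05, mu_large=0.5):
    """S23 to S25: larger learning coefficient for long speech, smaller for short bursts."""
    return mu_large if run > min_run else mu_small

def step_size_trace(activity, min_run=160, mu_small=0.05, mu_large=0.5):
    """Apply the rule to a sequence of per-sample speech/no-speech flags."""
    run = 0
    trace = []
    for active in activity:
        run = update_run_length(run, active)
        trace.append(step_size_for(run, min_run, mu_small, mu_large))
    return trace
```

With this rule a short click or door slam never lasts long enough to unlock the large step size, so it can disturb the estimated transfer function only slightly, while sustained speech is tracked quickly.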
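[0057]
As described above, according to this embodiment, the central processing unit 13 has the transfer function estimating means 17 for estimating the echo component that leaks from the speaker 9 into the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting far-end speech, near-end speech, and double talk. The speaker detection means 16 calculates the speech periods of the far-end speaker, the near-end speaker, and double talk, and controls the value of the learning coefficient term in the transfer function estimating means according to the length of the calculated speech period. The learning coefficient term can thus be increased when the speech period is long and decreased when it is short, which makes the echo cancellation processing more robust against sudden noise.
[0058]
(Embodiment 4)
The basic configuration of the echo canceling device according to Embodiment 4 of the present invention is that shown in FIG. 1, and the configuration of its central processing unit is that shown in FIG. 2. The configuration of the speaker detection means 16 of the central processing unit 13 of the echo canceling device according to Embodiment 4 is that shown in FIG. 4.
[0059]
The operation of the speaker detection means 16 configured in this way is described with reference to FIG. 7. FIG. 7 is a flowchart showing the operation of the speaker detection means 16 of FIG. 4.
[0060]
In FIG. 7, when echo cancellation starts (S31), the detection means 25 calculates the silent period (S32). If the silent period is longer than a fixed period (S33), the value of the offset term of the normalized LMS method in the transfer function estimating means 17 is increased (S34); if the silent period is shorter than the fixed period, the value of the offset term of the normalized LMS method in the transfer function estimating means 17 is decreased (S35).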
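A small sketch of S32 to S35 follows. It assumes that the "offset term" is the constant added to the input energy in the denominator of the normalized LMS update, which is the usual place for such a term but is not spelled out in the description. The two values `delta_small` and `delta_large` and the threshold `min_run` are hypothetical.

```python
def offset_for(silence_run, min_run=800, delta_small=1e-4, delta_large=1e-1):
    """S33 to S35: larger offset after a long silent period, smaller otherwise."""
    return delta_large if silence_run > min_run else delta_small

def nlms_gain(x_hist, mu, delta):
    """Effective NLMS gain mu / (delta + ||x||^2) for the current input history."""
    return mu / (delta + sum(x * x for x in x_hist))

# Low-level input such as residual noise after a long silence.
quiet = [0.01, -0.01, 0.01, -0.01]
gain_after_long_silence = nlms_gain(quiet, 0.5, offset_for(1000))
gain_after_short_silence = nlms_gain(quiet, 0.5, offset_for(10))
```

When the input energy is tiny, the normalization would otherwise produce a very large gain, so noise alone could move the coefficients a long way. A larger offset after a long silence caps that gain, while during ongoing conversation the smaller offset keeps adaptation fast.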
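[0061]
As described above, according to this embodiment, the central processing unit 13 has the transfer function estimating means 17 for estimating the echo component that leaks from the speaker 9 into the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting far-end speech, near-end speech, and double talk. The speaker detection means 16 calculates the silent period and controls the value of the offset term of the normalized LMS method in the transfer function estimating means according to the length of the calculated silent period. The offset term can thus be increased when the silent period is long and decreased when it is short, which makes the echo cancellation processing more robust against sudden noise.
[0062]
[Effects of the Invention]
As described above, the echo canceling device according to claim 1 of the present invention is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means has leak integration means for computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and detection means for detecting the speech state of the far-end and near-end speakers based on the results held by the leak integration means. As a result, the average power of the far-end and near-end speaker voice signals used by the speaker detection means up to the previous detection operation can be used as the threshold, so the device can follow even an abrupt change in the transfer function caused by, for example, the near-end speaker moving within the space, and the advantageous effect of suppressing an increase in echo and the occurrence of howling is obtained.
[0063]
According to the echo canceling device of claim 2, in the echo canceling device of claim 1, the speaker detection means has reception average value holding means for calculating and holding the average of the leak integral of the received voice signal and transmission average value holding means for calculating and holding the average of the leak integral of the transmitted voice signal. This gives the advantageous effect that speaker detection can adapt automatically to the average of the leak integral, that is, to the background noise level.
[0064]
The echo canceling device according to claim 3 is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means calculates the speech periods of the far-end speaker, the near-end speaker, and double talk, and controls the value of the learning coefficient term in the transfer function estimating means according to the length of the calculated speech period. The learning coefficient term can thus be increased when the speech period is long and decreased when it is short, which gives the advantageous effect of making the echo cancellation processing more robust against sudden noise.
[0065]
The echo canceling device according to claim 4 is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means calculates the silent period and controls the value of the offset term of the normalized LMS method in the transfer function estimating means according to the length of the calculated silent period. The offset term can thus be increased when the silent period is long and decreased when it is short, which gives the advantageous effect of making the echo cancellation processing more robust against sudden noise.
[0066]
The echo canceling method according to claim 5 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. The speaker detection step has a leak integration step of computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and a detection step of detecting the speech state of the far-end and near-end speakers based on the results of the leak integration step. As a result, the average power of the far-end and near-end speaker voice signals used up to the previous detection operation can be used as the threshold, so the method can follow even an abrupt change in the transfer function caused by, for example, the near-end speaker moving within the space, and the advantageous effect of suppressing an increase in echo and the occurrence of howling is obtained.
[0067]
According to the echo canceling method of claim 6, in the echo canceling method of claim 5, the speaker detection step has a reception average value holding step of calculating and holding the average of the leak integral of the received voice signal and a transmission average value holding step of calculating and holding the average of the leak integral of the transmitted voice signal. This gives the advantageous effect that speaker detection can adapt automatically to the average of the leak integral, that is, to the background noise level.
[0068]
The echo canceling method according to claim 7 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. In the speaker detection step, the speech periods of the far-end speaker, the near-end speaker, and double talk are calculated, and the value of the learning coefficient term in the transfer function estimating step is controlled according to the length of the calculated speech period. The learning coefficient term can thus be increased when the speech period is long and decreased when it is short, which gives the advantageous effect of making the echo cancellation processing more robust against sudden noise.
[0069]
The echo canceling method according to claim 8 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. In the speaker detection step, the silent period is calculated, and the value of the offset term of the normalized LMS method in the transfer function estimating step is controlled according to the length of the calculated silent period. The offset term can thus be increased when the silent period is long and decreased when it is short, which gives the advantageous effect of making the echo cancellation processing more robust against sudden noise.
[0070]
The program according to claim 9 is a program for executing each step of the echo canceling method according to any one of claims 5 to 8. By using a computer that executes this program, the advantageous effect is obtained that the echo canceling method according to any one of claims 5 to 8 can be executed at any place and at any time.
[0071]
The recording medium according to claim 10 is a computer-readable recording medium for executing the program according to claim 9. By reading the program from the computer-readable recording medium, the advantageous effect is obtained that the program according to claim 9 can be executed at any place and at any time.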
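[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the basic configuration of the echo canceling device according to Embodiments 1, 2, 3, and 4 of the present invention
[FIG. 2] Functional block diagram showing the function-implementing means in the central processing unit of FIG. 1
[FIG. 3] Flowchart showing the operation of the central processing unit of FIG. 2
[FIG. 4] Functional block diagram showing the speaker detection means of the central processing unit of the echo canceling device according to Embodiment 2 of the present invention
[FIG. 5] Flowchart showing the operation of the speaker detection means of FIG. 4
[FIG. 6] Flowchart showing the operation of the speaker detection means of FIG. 4
[FIG. 7] Flowchart showing the operation of the speaker detection means of FIG. 4
[FIG. 8] Functional block diagram showing a conventional echo canceling device
[Explanation of Reference Numerals]
6 Telephone circuit device
7, 11 A/D converter
8, 12 D/A converter
9 Speaker
10 Microphone
13 Central processing unit
14 ROM
15 RAM
16 Speaker detection means
17 Transfer function estimating means
18 Direct echo filter means
19 Indirect echo filter means
20 Subtraction means
21, 22 Leak integration means
23 Reception average value holding means
24 Transmission average value holding means
25 Detection means
[0001]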
[Technical field of the invention]
The present invention is directed to an echo canceling device including a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The present invention also relates to an echo canceling method in the echo canceling device, a program for executing the echo canceling method, and a recording medium for executing the program.
[0002]
[Prior art]
Voice hands-free devices such as speakerphone telephones use echo canceling technology to prevent howling and echo. Sound output from the speaker travels through a space such as a room and enters the microphone as an echo. With this technology, a signal obtained by convolving a transfer function that models that space with the sound output to the speaker is subtracted from the microphone signal, so that the result is as if there were no echo.
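The subtraction described here can be illustrated with a toy example. The sketch below assumes the room is modeled as a short FIR impulse response and that this response is already known exactly; the names `simulate_echo`, `cancel_echo`, and the response `h_room` are made up for illustration and do not come from the patent.

```python
def simulate_echo(far, h):
    """Echo at the microphone: the speaker signal `far` convolved with response `h`."""
    return [sum(h[k] * far[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(far))]

def cancel_echo(mic, far, h_est):
    """Subtract the estimated echo (far convolved with h_est) from the mic signal."""
    estimate = simulate_echo(far, h_est)
    return [m - e for m, e in zip(mic, estimate)]

h_room = [0.5, 0.25]                  # hypothetical room response
far = [1.0, 0.0, -1.0, 2.0]           # received (far-end) voice sent to the speaker
near = [0.0, 0.1, 0.0, -0.2]          # near-end voice
mic = [e + s for e, s in zip(simulate_echo(far, h_room), near)]
sent = cancel_echo(mic, far, h_room)  # what is transmitted after cancellation
```

With a perfect model of the room, `sent` equals the near-end voice alone. In practice the response is unknown and drifts over time, which is why the rest of the description is concerned with estimating it adaptively and deciding when it is safe to do so.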
[0003]
A conventional echo canceling technique is described below with reference to Patent Document 1. FIG. 8 is a functional block diagram showing a conventional echo canceling device.
[0004]
In FIG. 8, 1 is a speaker that reproduces the received voice (voice from the far-end speaker) in a speakerphone telephone or the like, 2 is a microphone that picks up the transmitted voice (voice from the near-end speaker), 3 is a first echo canceling unit that cancels the echo arriving over the direct propagation path, 4 is a double talk detecting unit that detects the double talk state using the output signal of the first echo canceling unit 3, and 5 is a second echo canceling unit that cancels the echo arriving over indirect propagation paths.
[0005]
The second echo canceling unit 5 performs adaptive control in order to follow changes in the propagation path; several methods are known for this, such as normalized LMS (Least Mean Square). The double talk detecting unit 4 calculates the ratio of the level of the transmission signal to the level of the reception signal, compares it with a predetermined threshold to determine who is speaking, and enables or disables the adaptive control operation of the second echo canceling unit 5.
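As a rough illustration of this conventional scheme, the sketch below pairs a textbook normalized LMS update with a level-ratio gate. It is not taken from Patent Document 1: the direction of the comparison (adapt only while the transmit level is small relative to the receive level), the threshold value, and the names `nlms_step`, `adaptation_allowed`, and `identify_path` are assumptions.

```python
import random

def nlms_step(w, x_hist, mic_sample, mu=0.5, delta=1e-6):
    """One normalized LMS update; x_hist[0] is the newest far-end sample."""
    y = sum(wk * xk for wk, xk in zip(w, x_hist))   # echo estimate
    e = mic_sample - y                              # residual after cancellation
    norm = delta + sum(xk * xk for xk in x_hist)    # input energy plus small offset
    w_new = [wk + mu * e * xk / norm for wk, xk in zip(w, x_hist)]
    return w_new, e

def adaptation_allowed(send_level, recv_level, ratio_threshold=0.5):
    """Fixed-threshold gate on the transmit/receive level ratio (assumed direction)."""
    if recv_level <= 0.0:
        return False
    return send_level / recv_level < ratio_threshold

def identify_path(h_true, n_samples=500, seed=0):
    """Adapt against a hypothetical echo path with far-end speech only."""
    rng = random.Random(seed)
    w = [0.0] * len(h_true)
    x_hist = [0.0] * len(h_true)
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        mic = sum(hk * xk for hk, xk in zip(h_true, x_hist))
        w, _ = nlms_step(w, x_hist, mic)
    return w
```

The weakness the patent points at is visible in `adaptation_allowed`: the threshold is fixed, so when the echo path changes abruptly the level ratio changes with it and the gate can misjudge who is speaking.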
[0006]
[Patent Document 1]
JP-A-5-48547
[0007]
[Problems to be solved by the invention]
However, the conventional echo canceling device described above has the problem that, when the transfer function changes abruptly, for example because the near-end speaker moves within the space, speaker detection makes erroneous determinations, resulting in increased echo and the occurrence of howling.
[0008]
The echo canceling device, echo canceling method, program, and recording medium are therefore required to suppress an increase in echo and the occurrence of howling even when the transfer function changes abruptly, for example because the near-end speaker moves within the space.
[0009]
To satisfy this requirement, an object of the present invention is to provide an echo canceling device that can suppress an increase in echo and the occurrence of howling even when the transfer function changes abruptly, an echo canceling method that suppresses an increase in echo and the occurrence of howling even when the transfer function changes abruptly, a program for executing the echo canceling method, and a recording medium for executing the program.
[0010]
[Means for Solving the Problems]
To solve the above problem, the echo canceling device of the present invention is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means has leak integration means for computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and detection means for detecting the speech state of the far-end and near-end speakers based on the results held by the leak integration means.
[0011]
As a result, an echo canceling device that can suppress the increase in echo and the occurrence of howling even when the transfer function changes abruptly is obtained.
[0012]
To solve the above problem, the echo canceling method of the present invention is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. The speaker detection step has a leak integration step of computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and a detection step of detecting the speech state of the far-end and near-end speakers based on the results of the leak integration step.
[0013]
As a result, an echo canceling method that suppresses the increase in echo and the occurrence of howling even when the transfer function changes rapidly can be obtained.
[0014]
In order to solve the above-mentioned problems, a program according to the present invention has a configuration that is a program for executing each step of the echo canceling method.
[0015]
As a result, a program for executing the echo canceling method is obtained.
[0016]
In order to solve the above problems, a recording medium of the present invention has a configuration that is a computer-readable recording medium for executing the above-mentioned program.
[0017]
Thereby, a recording medium for executing the program is obtained.
[0018]
[Embodiments of the invention]
The echo canceling device according to claim 1 of the present invention is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means has leak integration means for computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and detection means for detecting the speech state of the far-end and near-end speakers based on the results held by the leak integration means.
[0019]
With this configuration, the average power of the far-end and near-end speaker voice signals used by the speaker detection means up to the previous detection operation can be used as the threshold. The device can therefore follow even an abrupt change in the transfer function caused by, for example, the near-end speaker moving within the space, and an increase in echo and the occurrence of howling can be suppressed.
[0020]
The echo canceling device according to claim 2 is the echo canceling device according to claim 1 in which the speaker detection means has reception average value holding means for calculating and holding the average of the leak integral of the received voice signal, and transmission average value holding means for calculating and holding the average of the leak integral of the transmitted voice signal.
[0021]
With this configuration, speaker detection can adapt automatically to the average of the leak integral, that is, to the background noise level.
[0022]
The echo canceling device according to claim 3 is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means calculates the speech periods of the far-end speaker, the near-end speaker, and double talk, and controls the value of the learning coefficient term in the transfer function estimating means according to the length of the calculated speech period.
[0023]
With this configuration, the value of the learning coefficient term in the transfer function estimating means can be increased when the speech period is long, and the value of the learning coefficient term in the transfer function estimating means can be decreased when the speech period is short. This has the effect of increasing the resistance of the echo cancellation processing to sudden noise.
[0024]
The echo canceling device according to claim 4 is an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The central processing unit has transfer function estimating means for estimating the echo component that leaks from the speaker into the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting far-end speech, near-end speech, and double talk. The speaker detection means calculates the silent period and controls the value of the offset term of the normalized LMS method in the transfer function estimating means according to the length of the calculated silent period.
[0025]
With this configuration, the value of the offset term of the normalized LMS method in the transfer function estimating means can be increased when the silent period is long and decreased when the silent period is short. This has the effect of increasing the resistance of the echo cancellation processing to sudden noise.
[0026]
The echo canceling method according to claim 5 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. The speaker detection step has a leak integration step of computing the power of the far-end speaker voice signal (the received voice signal) and of the near-end speaker voice signal (the transmitted voice signal) and leak-integrating it, and a detection step of detecting the speech state of the far-end and near-end speakers based on the results of the leak integration step.
[0027]
With this configuration, the average power of the far-end and near-end speaker voice signals used up to the previous detection operation can be used as the threshold. The method can therefore follow even an abrupt change in the transfer function caused by, for example, the near-end speaker moving within the space, and an increase in echo and the occurrence of howling can be suppressed.
[0028]
The echo canceling method according to claim 6 is the echo canceling method according to claim 5 in which the speaker detection step has a reception average value holding step of calculating and holding the average of the leak integral of the received voice signal, and a transmission average value holding step of calculating and holding the average of the leak integral of the transmitted voice signal.
[0029]
With this configuration, speaker detection can adapt automatically to the average of the leak integral, that is, to the background noise level.
[0030]
The echo canceling method according to claim 7 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. In the speaker detection step, the speech periods of the far-end speaker, the near-end speaker, and double talk are calculated, and the value of the learning coefficient term in the transfer function estimating step is controlled according to the length of the calculated speech period.
[0031]
With this configuration, the value of the learning coefficient term in the transfer function estimating means can be increased when the speech period is long, and the value of the learning coefficient term in the transfer function estimating means can be decreased when the speech period is short. This has the effect of increasing the resistance of the echo cancellation processing to sudden noise.
[0032]
The echo canceling method according to claim 8 is an echo canceling method for an echo canceling device having a speaker that outputs sound such as the received voice from a far-end speaker, a microphone that receives sound such as the voice of a near-end speaker, and a central processing unit that controls the whole device. The method has a transfer function estimating step of estimating the echo component that leaks from the speaker into the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting far-end speech, near-end speech, and double talk. In the speaker detection step, the silent period is calculated, and the value of the offset term of the normalized LMS method in the transfer function estimating step is controlled according to the length of the calculated silent period.
[0033]
With this configuration, the value of the offset term of the normalized LMS method in the transfer function estimating means can be increased when the silent period is long and decreased when the silent period is short. This has the effect of increasing the resistance of the echo cancellation processing to sudden noise.
[0034]
A program according to a ninth aspect is a program for executing each step of the echo canceling method according to any one of the fifth to eighth aspects.
[0035]
With this configuration, using a computer that executes the program has the effect that the echo cancellation method according to any one of claims 5 to 8 can be executed at an arbitrary place and at an arbitrary time.
[0036]
A recording medium according to a tenth aspect is a computer-readable recording medium for executing the program according to the ninth aspect.
[0037]
With this configuration, by reading the program from a computer-readable recording medium, the program described in claim 9 can be executed at an arbitrary place and at an arbitrary time.
[0038]
Hereinafter, an embodiment of the present invention will be described with reference to FIGS.
[0039]
(Embodiment 1)
FIG. 1 is a block diagram showing a basic configuration of the echo canceling device according to the first embodiment of the present invention.
[0040]
In FIG. 1, 6 is a telephone circuit device having an interface with a telephone line; 7 is a first A / D converter for converting a received voice electric signal, which is an analog electric signal, into a digital electric signal; 8 is a first D / A converter for converting a digital electric signal into an analog electric signal; 9 is a speaker for converting the analog electric signal from the D / A converter 8 into sound; 10 is a microphone for converting sound into an analog electric signal; 11 is a second A / D converter for converting the analog electric signal from the microphone 10 into a digital electric signal; 12 is a second D / A converter for converting a digital electric signal into an analog electric signal (transmitted voice electric signal); 13 is a central processing unit that performs digital signal processing on the digital electric signals obtained from the A / D converter 7 and the A / D converter 11 and outputs the operation results to the D / A converter 8 and the D / A converter 12; 14 is a ROM (Read Only Memory) in which a program for operating the central processing unit 13 is stored; and 15 is a RAM (Random Access Memory) used when the central processing unit 13 operates according to the program stored in the ROM 14.
[0041]
FIG. 2 is a functional block diagram showing function realizing means (means for realizing a function by a program) in the central processing unit 13 of FIG. 1, and shows an echo canceling method in a speakerphone telephone. This function shows an outline of a program recorded in the ROM 14.
[0042]
In FIG. 2, reference numeral 16 denotes speaker detecting means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk (simultaneous utterance of the far-end speaker and the near-end speaker) in order to control the operation of the echo canceling device in a speakerphone telephone or the like; 17 denotes transfer function estimating means for estimating the transfer function of the space between the speaker 9 and the microphone 10 by a steepest descent method typified by the normalized LMS (Least Mean Square) method; 18 denotes direct echo filter means for performing a convolution operation between the transfer function of the direct echo component and the received voice; 19 denotes indirect echo filter means for performing a convolution operation between the transfer function of the indirect echo component and the received voice; and 20 denotes subtraction means.
[0043]
The general operation of the echo canceling device configured in this way will now be described. The sound radiated from the speaker 9 is input to the microphone 10 via the space as an echo, so that a closed loop is formed. If the echo cancellation processing is not performed, howling occurs in the worst case. The sound radiated from the speaker 9 can be classified into a direct echo component, which is input directly to the microphone 10, and an indirect echo component, which enters the microphone 10 after being reflected by an object such as a wall, floor, or ceiling in the space.
[0044]
FIG. 3 is a flowchart showing the operation of the central processing unit 13 of FIG. 2, and shows an echo canceling method in a speakerphone telephone.
[0045]
In FIG. 3, when the echo canceling process is started (S1), the speaker detecting means 16 determines far-end speaker utterance, near-end speaker utterance, and double talk (S2). The transfer function estimating means 17 performs direct wave component transfer function estimation (S3) and indirect wave component transfer function estimation (S4) using an algorithm such as NLMS. The direct echo filter means 18 performs a convolution operation between its estimation result and the received voice (S5), and the indirect echo filter means 19 performs a convolution operation between its estimation result and the received voice (S6). The subtraction means 20 then subtracts the convolution operation results from the transmitted voice from the microphone 10, so that the direct echo component and the indirect echo component are removed (S7).
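The estimate, convolve, and subtract loop of steps S3 to S7 can be illustrated with a minimal normalized LMS sketch. This is not the patented implementation: for brevity it folds the direct and indirect echo paths into a single FIR filter, and the function name `nlms_echo_cancel`, the tap count, the learning coefficient `mu`, and the offset term `delta` are all assumed values chosen for illustration.

```python
from collections import deque

def nlms_echo_cancel(received, mic, taps=16, mu=0.5, delta=1e-6):
    """Subtract the estimated echo of `received` from `mic`, sample by sample.

    received: samples sent to the speaker (far-end voice)
    mic:      samples picked up by the microphone (echo plus near-end voice)
    mu:       learning coefficient (assumed value)
    delta:    offset term of the normalized LMS method (assumed value)
    """
    w = [0.0] * taps                        # estimated transfer function (FIR taps)
    x = deque([0.0] * taps, maxlen=taps)    # recent received samples, newest first
    residual = []
    for r, m in zip(received, mic):
        x.appendleft(r)
        echo_est = sum(wi * xi for wi, xi in zip(w, x))      # convolution (S5, S6)
        e = m - echo_est                                      # subtraction (S7)
        gain = mu * e / (sum(xi * xi for xi in x) + delta)    # normalized step
        w = [wi + gain * xi for wi, xi in zip(w, x)]          # estimation (S3, S4)
        residual.append(e)
    return residual, w
```

In a real device the update would presumably be gated by the speaker detection result of S2 (for example, frozen during double talk); that gating is omitted here.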
[0046]
As a result, it is possible to perform an echo cancellation process that realizes both high-speed and high-accuracy transfer function estimation.
[0047]
As described above, according to the present embodiment, the direct echo filter means 18 convolves its estimation result with the received voice, the indirect echo filter means 19 convolves its estimation result with the received voice, and the subtraction means 20 subtracts the convolution results from the transmitted voice from the microphone 10, thereby removing the direct echo component and the indirect echo component. As a result, a drop in the accuracy of the double-talk determination can be avoided even when the volume from the speaker is increased, and the double-talk detection accuracy can be raised even when the received voice and the transmitted voice have comparable voice power.
[0048]
(Embodiment 2)
FIG. 4 is a functional block diagram showing speaker detection means of the central processing unit of the echo cancellation apparatus according to the second embodiment of the present invention, and shows a speaker detection method in the echo cancellation system. The basic configuration of the echo canceling apparatus according to the present embodiment is the configuration shown in FIG. 1, and the configuration of the central processing unit of the echo canceling apparatus according to the present embodiment is the configuration shown in FIG. The function of FIG. 4 shows the outline of the program recorded in the ROM 14.
[0049]
In FIG. 4, reference numeral 21 denotes leak integration means for performing leak integration of the received voice (far-end speaker voice), where leak integration is integration of input data performed in the form integration value = integration value × constant + input data; 22 denotes leak integration means for performing leak integration of the transmitted voice (near-end speaker voice); 23 denotes reception average value holding means for holding the average value of the calculation result of the leak integration means 21; 24 denotes transmission average value holding means for holding the average value of the calculation result of the leak integration means 22; and 25 denotes detecting means for determining far-end speaker utterance, near-end speaker utterance, and double talk using the values held in the reception average value holding means 23 and the transmission average value holding means 24 as thresholds.
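The leak integration rule stated above (integration value = integration value × constant + input data) can be sketched in a few lines. The function name `leak_integrate` and the default constant `leak=0.99` are assumptions for illustration; the input data is taken to be the instantaneous power (squared sample), in line with the claims, which apply leak integration to the calculated power.

```python
def leak_integrate(samples, leak=0.99):
    """Return the leak-integrated power after each input sample.

    leak: the constant multiplying the previous integration value
          (assumed value; it must be below 1 for the integral to stay bounded).
    """
    integ = 0.0
    history = []
    for s in samples:
        # integration value = integration value x constant + input data (power)
        integ = integ * leak + s * s
        history.append(integ)
    return history
```

For an input of constant power p, the integration value settles toward p / (1 - leak), so a constant closer to 1 gives a smoother but slower-reacting power estimate.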
[0050]
The operation of the thus configured speaker detecting means 16 will be described with reference to FIG. FIG. 5 is a flowchart showing the operation of the speaker detecting means 16 of FIG.
[0051]
In FIG. 5, when the echo canceling process is started (S11), the leak integrating means 21 calculates the leak integral of the received voice (S12), and its average value is stored in the reception average value holding means 23 (S13). The leak integrating means 22 calculates the leak integral of the transmitted voice (S14), and its average value is stored in the transmission average value holding means 24 (S15). The detecting means 25 then determines far-end speaker utterance, near-end speaker utterance, and double talk by using the values held in the reception average value holding means 23 and the transmission average value holding means 24 as thresholds.
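The flow of FIG. 5 can be sketched as a small state-holding class. The text does not give the exact decision rule, so the comparison used here (leak-integrated power above a fixed multiple of the held average) is a hypothetical choice, as are the class name `SpeakerDetector`, the cumulative-mean averaging, and the `leak` and `margin` values.

```python
from dataclasses import dataclass

@dataclass
class SpeakerDetector:
    leak: float = 0.99     # leak constant (assumed value)
    margin: float = 2.0    # hypothetical factor applied to the held averages
    rx_integ: float = 0.0  # leak integral of received (far-end) power
    tx_integ: float = 0.0  # leak integral of transmitted (near-end) power
    rx_avg: float = 0.0    # held average of rx_integ, used as threshold
    tx_avg: float = 0.0    # held average of tx_integ, used as threshold
    count: int = 0

    def step(self, rx_sample, tx_sample):
        # Leak integration of both signals (S12, S14)
        self.rx_integ = self.rx_integ * self.leak + rx_sample ** 2
        self.tx_integ = self.tx_integ * self.leak + tx_sample ** 2
        # Compare against averages held up to the previous detection
        far = self.count > 0 and self.rx_integ > self.margin * self.rx_avg
        near = self.count > 0 and self.tx_integ > self.margin * self.tx_avg
        # Update the held averages as cumulative means (S13, S15)
        self.count += 1
        self.rx_avg += (self.rx_integ - self.rx_avg) / self.count
        self.tx_avg += (self.tx_integ - self.tx_avg) / self.count
        if far and near:
            return "double_talk"
        if far:
            return "far_end"
        if near:
            return "near_end"
        return "silence"
```

Because the thresholds are derived from the signal's own history rather than fixed constants, the same code works at different background noise levels, which is the property the embodiment emphasizes.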
[0052]
As described above, according to the present embodiment, the central processing unit 13 includes the transfer function estimating means 17 for estimating the echo component wrapping around from the speaker 9 to the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means 16 includes the leak integration means 21 and 22, which calculate the power of the far-end speaker voice signal as the received voice signal and the power of the near-end speaker voice signal as the transmission voice signal and perform leak integration, and the detecting means 25, which detects the utterance states of the far-end speaker and the near-end speaker based on the calculation results held by the leak integration means 21 and 22. Because the speaker detecting means 16 can use, as thresholds, the average values of the power of the far-end speaker voice signal and of the near-end speaker voice signal up to the previous detection operation, the device can follow the change even when the transfer function changes suddenly due to spatial movement of the near-end speaker, and the increase in echo and the occurrence of howling can be suppressed.
[0053]
In addition, because the speaker detecting means 16 has the reception average value holding means 23, which calculates and holds the average value of the leak integrated value of the received voice signal, and the transmission average value holding means 24, which calculates and holds the average value of the leak integrated value of the transmitted voice signal, it can perform speaker detection that adapts automatically to the average value of the leak integrated value, that is, to the background noise level.
[0054]
(Embodiment 3)
The basic configuration of the echo canceling apparatus according to the third embodiment of the present invention is the configuration shown in FIG. 1, and the configuration of the central processing unit of the echo canceling apparatus according to the present embodiment is the configuration shown in FIG. Further, the configuration of the speaker detection means 16 of the central processing unit 13 of the echo canceling apparatus according to the third embodiment of the present invention is the configuration shown in FIG.
[0055]
The operation of the thus configured speaker detecting means 16 will be described with reference to FIG. FIG. 6 is a flowchart showing the operation of the speaker detecting means 16 of FIG.
[0056]
In FIG. 6, when the echo canceling process is started (S21), the detecting means 25 calculates an utterance period (S22). If the utterance period is longer than a certain period (S23), the value of the learning coefficient term in the transfer function estimating means 17 is increased (S24), and if the utterance period is shorter than the certain period, the value of the learning coefficient term in the transfer function estimating means 17 is decreased (S25).
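The branch of steps S23 to S25 can be sketched as follows. The text only states the direction of the change, so the multiplicative step, the clamping limits, the threshold of 800 samples, and the function name `update_learning_coefficient` are hypothetical values for illustration.

```python
def update_learning_coefficient(mu, utterance_samples, min_samples=800,
                                step=1.25, mu_min=0.01, mu_max=1.0):
    """Raise mu after a long utterance, lower it after a short one.

    utterance_samples: length of the detected utterance period, in samples
    min_samples:       the "certain period" of S23 (assumed value)
    step, mu_min, mu_max: hypothetical adjustment factor and limits
    """
    if utterance_samples > min_samples:
        return min(mu * step, mu_max)   # S24: long utterance, learn faster
    # S25: short utterance, learn more cautiously. The text does not say
    # what happens at exact equality; it is treated as "short" here.
    return max(mu / step, mu_min)
```

The rationale follows from the embodiment: a short burst is more likely to be sudden noise than speech, so keeping the learning coefficient small limits how far such a burst can pull the estimated transfer function.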
[0057]
As described above, according to the present embodiment, the central processing unit 13 includes the transfer function estimating means 17 for estimating the echo component wrapping around from the speaker 9 to the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means 16 calculates the utterance periods of the far-end speaker, the near-end speaker, and the double talk, and controls the value of the learning coefficient term in the transfer function estimating means 17 according to the length of the calculated utterance period. The value of the learning coefficient term can therefore be increased when the utterance period is long and decreased when the utterance period is short, so that the resistance of the echo cancellation processing to sudden noise can be increased.
[0058]
(Embodiment 4)
The basic configuration of the echo canceling apparatus according to the fourth embodiment of the present invention is the configuration shown in FIG. 1, and the configuration of the central processing unit of the echo canceling apparatus according to the present embodiment is the configuration shown in FIG. Further, the configuration of the speaker detection means 16 of the central processing unit 13 of the echo canceling device according to the fourth embodiment of the present invention is the configuration shown in FIG.
[0059]
The operation of the thus configured speaker detecting means 16 will be described with reference to FIG. FIG. 7 is a flowchart showing the operation of the speaker detecting means 16 of FIG.
[0060]
In FIG. 7, when the echo canceling process is started (S31), the detecting means 25 calculates a silent period (S32). If the silent period is longer than a certain period (S33), the value of the offset term of the normalized LMS method in the transfer function estimating means 17 is increased (S34), and if the silent period is shorter than the certain period, the value of the offset term of the normalized LMS method in the transfer function estimating means 17 is decreased (S35).
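The effect of the offset term can be made concrete with a short sketch. In the normalized LMS method the update is scaled by 1 / (input power + offset), so after a long silence, when the input power is close to zero, a larger offset keeps the effective step from becoming very large. The function names, the threshold, the factor of 2, and the limits below are hypothetical; only the direction of the adjustment comes from the text.

```python
def update_offset(delta, silent_samples, min_samples=800,
                  step=2.0, delta_min=1e-6, delta_max=1e-1):
    """Raise the NLMS offset term after a long silence, lower it otherwise."""
    if silent_samples > min_samples:
        return min(delta * step, delta_max)   # S34
    return max(delta / step, delta_min)       # S35 (equality treated as short)

def nlms_gain(x_buf, mu, delta):
    """Effective per-sample step of normalized LMS: mu / (||x||^2 + delta)."""
    return mu / (sum(v * v for v in x_buf) + delta)

# Near-silent input buffer: the gain is dominated by the offset term.
quiet = [1e-4] * 16
small = nlms_gain(quiet, mu=0.5, delta=1e-6)
large = nlms_gain(quiet, mu=0.5, delta=update_offset(1e-3, silent_samples=2000))
```

Here `large` is computed with the raised offset and is much smaller than `small`, which is how a bigger offset term damps the reaction to a sudden noise burst arriving after silence.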
[0061]
As described above, according to the present embodiment, the central processing unit 13 includes the transfer function estimating means 17 for estimating the echo component wrapping around from the speaker 9 to the microphone 10, the echo filter means 18 and 19 for removing the estimated echo component, and the speaker detection means 16 for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means 16 calculates the silent period and controls the value of the offset term of the normalized LMS method in the transfer function estimating means 17 according to the length of the calculated silent period. The value of the offset term can therefore be increased when the silent period is long and decreased when the silent period is short, so that the resistance of the echo cancellation processing to sudden noise can be increased.
[0062]
[Effects of the invention]
As described above, the echo cancellation apparatus of the first aspect of the present invention has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means includes leak integration means for calculating the power of the far-end speaker voice signal as the received voice signal and the power of the near-end speaker voice signal as the transmission voice signal and performing leak integration, and detecting means for detecting the utterance states of the far-end speaker and the near-end speaker based on the calculation results held by the leak integration means. Because the speaker detecting means can use, as thresholds, the average values of the power of the far-end speaker voice signal and of the near-end speaker voice signal up to the previous detection operation, the apparatus can follow the change even when the transfer function changes suddenly due to spatial movement of the near-end speaker, which gives the advantageous effect that the increase in echo and the occurrence of howling can be suppressed.
[0063]
According to the echo canceling device of the second aspect, in the echo canceling device of the first aspect, the speaker detecting means has reception average value holding means for calculating and holding the average value of the leak integrated value of the received voice signal, and transmission average value holding means for calculating and holding the average value of the leak integrated value of the transmitted voice signal. This gives the advantageous effect that speaker detection can adapt automatically to the average value of the leak integrated value, that is, to the background noise level.
[0064]
The echo canceling device of the third aspect has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means calculates the utterance periods of the far-end speaker, the near-end speaker, and the double talk, and controls the value of the learning coefficient term in the transfer function estimating means according to the length of the calculated utterance period. The value of the learning coefficient term can therefore be increased when the utterance period is long and decreased when the utterance period is short, which gives the advantageous effect that the resistance of the echo cancellation processing to sudden noise can be increased.
[0065]
The echo canceling device of the fourth aspect has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection means calculates a silent period and controls the value of the offset term of the normalized LMS method in the transfer function estimating means according to the length of the calculated silent period. The value of the offset term can therefore be increased when the silent period is long and decreased when the silent period is short, which gives the advantageous effect that the resistance of the echo cancellation processing to sudden noise can be increased.
[0066]
The echo canceling method of the fifth aspect is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The method has a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. The speaker detection step includes a leak integration step of calculating the power of the far-end speaker voice signal as the received voice signal and the power of the near-end speaker voice signal as the transmission voice signal and performing leak integration, and a detection step of detecting the utterance states of the far-end speaker and the near-end speaker based on the calculation results of the leak integration step. Because the average values of the power of the far-end speaker voice signal and of the near-end speaker voice signal up to the previous detection operation can be used as thresholds, the method can follow the change even when the transfer function changes suddenly due to spatial movement of the near-end speaker, which gives the advantageous effect that the increase in echo and the occurrence of howling can be suppressed.
[0067]
According to the echo canceling method of the sixth aspect, in the echo canceling method of the fifth aspect, the speaker detection step has a reception average value holding step of calculating and holding the average value of the leak integrated value of the received voice signal, and a transmission average value holding step of calculating and holding the average value of the leak integrated value of the transmitted voice signal. This gives the advantageous effect that speaker detection can adapt automatically to the average value of the leak integrated value, that is, to the background noise level.
[0068]
The echo canceling method of the seventh aspect is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The method has a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. In the speaker detection step, the utterance periods of the far-end speaker, the near-end speaker, and the double talk are calculated, and the value of the learning coefficient term in the transfer function estimation step is controlled according to the length of the calculated utterance period. The value of the learning coefficient term can therefore be increased when the utterance period is long and decreased when the utterance period is short, which gives the advantageous effect that the resistance of the echo cancellation processing to sudden noise can be increased.
[0069]
The echo canceling method of the eighth aspect is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The method has a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone, an echo filter step of removing the estimated echo component, and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk. In the speaker detection step, a silent period is calculated, and the value of the offset term of the normalized LMS method in the transfer function estimation step is controlled according to the length of the calculated silent period. The value of the offset term can therefore be increased when the silent period is long and decreased when the silent period is short, which gives the advantageous effect that the resistance of the echo cancellation processing to sudden noise can be increased.
[0070]
According to the program of the ninth aspect, because it is a program for executing each step of the echo canceling method according to any one of the fifth to eighth aspects, using a computer that executes the program gives the advantageous effect that the echo canceling method described in any one of claims 5 to 8 can be executed at an arbitrary place and at an arbitrary time.
[0071]
According to the recording medium of the tenth aspect, because it is a computer-readable recording medium for executing the program of the ninth aspect, reading the program from the computer-readable recording medium gives the advantageous effect that the program described in claim 9 can be executed at an arbitrary place and at an arbitrary time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a basic configuration of an echo canceling apparatus according to Embodiments 1, 2, 3, and 4 of the present invention.
FIG. 2 is a functional block diagram showing function realizing means in the central processing unit of FIG. 1;
FIG. 3 is a flowchart showing the operation of the central processing unit of FIG. 2;
FIG. 4 is a functional block diagram showing speaker detection means of the central processing unit of the echo canceling device according to the second embodiment of the present invention.
FIG. 5 is a flowchart showing the operation of the speaker detecting means of FIG. 4;
FIG. 6 is a flowchart showing the operation of the speaker detecting means of FIG. 4;
FIG. 7 is a flowchart showing the operation of the speaker detecting means of FIG. 4;
FIG. 8 is a functional block diagram showing a conventional echo cancellation device.
[Explanation of symbols]
6 Telephone circuit device
7, 11 A / D converter
8, 12 D / A converter
9 Speaker
10 Microphone
13 Central processing unit
14 ROM
15 RAM
16 Speaker detection means
17 Transfer function estimation means
18 Direct echo filter means
19 Indirect echo filter means
20 Subtraction means
21, 22 Leak integrating means
23 Receiving average value holding means
24 Transmission average value holding means
25 Detecting means

Claims

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
wherein the central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk, and
the speaker detection means includes leak integration means for calculating the power of a far-end speaker voice signal as a reception voice signal and the power of a near-end speaker voice signal as a transmission voice signal and performing leak integration, and detecting means for detecting the utterance states of the far-end speaker and the near-end speaker based on the calculation results held by the leak integration means.

The echo canceling apparatus according to claim 1, wherein the speaker detecting means includes reception average value holding means for calculating and holding an average value of the leak integrated value of the received voice signal, and transmission average value holding means for calculating and holding an average value of the leak integrated value of the transmitted voice signal.

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
wherein the central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk, and
the speaker detecting means calculates the utterance periods of the far-end speaker, the near-end speaker, and the double talk, and controls the value of a learning coefficient term in the transfer function estimating means according to the length of the calculated utterance period.

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
wherein the central processing unit includes transfer function estimating means for estimating an echo component wrapping around from the speaker to the microphone, echo filter means for removing the estimated echo component, and speaker detection means for detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk, and
the speaker detecting means calculates a silent period and controls the value of an offset term of a normalized LMS method in the transfer function estimating means according to the length of the calculated silent period.

An echo canceling method in an echo canceling apparatus having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, the method comprising:
a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone; an echo filter step of removing the estimated echo component; and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk,
wherein the speaker detection step includes a leak integration step of calculating the power of a far-end speaker voice signal as a reception voice signal and the power of a near-end speaker voice signal as a transmission voice signal and performing leak integration, and a detection step of detecting the utterance states of the far-end speaker and the near-end speaker based on the calculation result of the leak integration step.

The echo canceling method according to claim 5, wherein the speaker detection step includes a reception average value holding step of calculating and holding an average value of the leak integration value of the reception voice signal, and a transmission average value holding step of calculating and holding an average value of the leak integration value of the transmission voice signal.

An echo canceling method in an echo canceling apparatus having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, the method comprising:
a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone; an echo filter step of removing the estimated echo component; and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk,
wherein, in the speaker detection step, the utterance periods of the far-end speaker, the near-end speaker, and the double talk are calculated, and the value of a learning coefficient term in the transfer function estimation step is controlled according to the length of the calculated utterance period.

An echo canceling method in an echo canceling apparatus having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, the method comprising:
a transfer function estimating step of estimating an echo component wrapping around from the speaker to the microphone; an echo filter step of removing the estimated echo component; and a speaker detection step of detecting the utterance of the far-end speaker, the utterance of the near-end speaker, and double talk,
wherein, in the speaker detection step, a silent period is calculated, and the value of an offset term of a normalized LMS method in the transfer function estimation step is controlled according to the length of the calculated silent period.

A program for executing each step of the echo canceling method according to any one of claims 5 to 8.

A computer-readable recording medium for executing the program according to claim 9.