JP2004274683A - Echo canceler, echo canceling method, program, and recording medium

Info

Publication number: JP2004274683A
Application number: JP2003066482A
Authority: JP
Inventors: Junichi Koga; 淳一古賀; Kenichi Taniguchi; 賢一谷口; Naoto Kawasaki; 直人川▲崎▼; Hideaki Sasaki; 秀昭佐々木; Kensuke Yamashita; 賢祐山下
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-12
Filing date: 2003-03-12
Publication date: 2004-09-30

Abstract

PROBLEM TO BE SOLVED: To provide an echo canceler that can accurately detect the speaker even when the surrounding acoustic noise or the electrical noise of the device is large, together with an echo canceling method, a program, and a recording medium.

SOLUTION: The echo canceler is provided with first filter means for filtering voice such as the received voice from a far-end speaker, second filter means for filtering the input voice from a microphone, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, and voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power exists.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置、および、そのエコーキャンセル装置におけるエコーキャンセル方法、ならびに、そのエコーキャンセル方法を実行するためのプログラム、ならびに、そのプログラムを実行するための記録媒体に関するものである。
【０００２】
【従来の技術】
スピーカフォン方式電話等の音声ハンズフリー装置において、ハウリングやエコーを防止するためにエコーキャンセル技術がある。このエコーキャンセル技術によれば、スピーカから出力された音声が部屋等の空間を通ってマイクロフォンに入力された音声（エコー）から、その空間を擬似的に模擬した伝達関数とスピーカへ出力した音声とを畳み込んだ信号を差し引くことにより、あたかもエコーがないようにすることができる。
【０００３】
以下に、従来のエコーキャンセル技術について（特許文献１）を用いて説明する。図９は従来のエコーキャンセル装置を示す機能ブロック図である。
【０００４】
図９において、１はスピーカフォン方式電話等における受話音声（遠端話者からの音声）を再生するスピーカ、２は送話音声（近端話者からの音声）を拾うマイクロフォン、３は直接伝搬経路を経たエコーを消去する第一のエコーキャンセル部、４は第一のエコーキャンセル部３の出力信号を用いてダブルトーク状態を検出するダブルトーク検出部、５は間接伝搬経路を経たエコーを消去する第二のエコーキャンセル部である。
【０００５】
【特許文献１】
特開平５−４８５４７号公報
【０００６】
【発明が解決しようとする課題】
しかしながら、上記従来のエコーキャンセル装置では、装置を設置した場所の周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合に、遠端話者若しくは近端話者若しくはダブルトークの検出に失敗し、エコーが発生する為会話に支障が生じるという問題点を有していた。
【０００７】
このエコーキャンセル装置、エコーキャンセル方法、プログラムおよび記録媒体では、送話音声を低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を中域部分に限定してこの帯域に応じた積分出力を取得することにより、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができることが要求されている。
【０００８】
本発明は、この要求を満たすため、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるエコーキャンセル装置、および、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うエコーキャンセル方法、ならびに、そのエコーキャンセル方法を実行するためのプログラムおよび記録媒体を提供することを目的とする。
【０００９】
【課題を解決するための手段】
上記課題を解決するために本発明のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段とを有する構成を備えている。
【００１０】
これにより、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるエコーキャンセル装置が得られる。
【００１１】
上記課題を解決するために本発明のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、周囲の定常ノイズを測定する定常ノイズフロア測定手段と、定常ノイズフロア測定手段の結果を用いて音声パワー比較手段の係数を制御する係数制御手段とを有する構成を備えている。
【００１２】
これにより、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うエコーキャンセル方法が得られる。
【００１３】
上記課題を解決するために本発明のプログラムは、上記エコーキャンセル方法の各ステップを実行するためのプログラムである構成を備えている。
【００１４】
これにより、上記エコーキャンセル方法を実行するためのプログラムが得られる。
【００１５】
上記課題を解決するために本発明の記録媒体は、上記プログラムを実行するためのコンピュータで読み取り可能な記録媒体である構成を備えている。
【００１６】
これにより、上記プログラムを実行するための記録媒体が得られる。
【００１７】
【発明の実施の形態】
本発明の請求項１に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段とを有することとしたものである。
【００１８】
この構成により、送話音声を例えば低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を例えば中域部分に限定してこの帯域に応じた積分出力を取得することができるので、エコーキャンセラの性能に依存すること無くそれぞれの話者音声を正確に検出することができ、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるという作用を有する。
【００１９】
請求項２に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、周囲の定常ノイズを測定する定常ノイズフロア測定手段と、定常ノイズフロア測定手段の結果を用いて音声パワー比較手段の係数を制御する係数制御手段とを有することとしたものである。
【００２０】
この構成により、周囲の音響的な雑音若しくは装置の電気的な雑音の大きさに応じて音声パワー比較手段の係数を制御するようにしたので、周囲の音響的な雑音若しくは装置の電気的な雑音が大きくなっても話者検出の精度が劣化しないという作用を有する。
【００２１】
請求項３に記載のエコーキャンセル装置は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、遠端話者が発話していない無音区間を計測する第一の無音区間計測手段と、近端話者が発話していない無音区間を計測する第二の無音区間計測手段と、第一の無音区間計測手段の結果を用いて第一の積分手段の積分窓長を制御する第一の積分窓長制御手段と、第二の無音区間計測手段の結果を用いて第二の積分手段の積分窓長を制御する第二の積分窓長制御手段とを有することとしたものである。
【００２２】
この構成により、無音区間に応じて第一、第二の積分窓長を制御するようにしたので、送話音声を例えば低域部分および高域部分に分割した場合の積分出力と、受話音声を例えば中域部分に限定した場合の積分出力とを更に正確に取得することができるという作用を有する。
【００２３】
請求項４に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップとを有することとしたものである。
【００２４】
この構成により、送話音声を例えば低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を例えば中域部分に限定してこの帯域に応じた積分出力を取得することができるので、それぞれの話者音声を正確に検出することができ、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるという作用を有する。
【００２５】
請求項５に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップと、周囲の定常ノイズを測定する定常ノイズフロア測定ステップと、定常ノイズフロア測定ステップにおける測定ノイズを用いて音声パワー比較ステップの係数を制御する係数制御ステップとを有することとしたものである。
【００２６】
この構成により、周囲の音響的な雑音若しくは装置の電気的な雑音の大きさに応じて音声パワー比較手段の係数を制御するようにしたので、周囲の音響的な雑音若しくは装置の電気的な雑音が大きくなっても話者検出の精度が劣化しないという作用を有する。
【００２７】
請求項６に記載のエコーキャンセル方法は、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップと、遠端話者が発話していない無音区間を計測する第一の話者発話時間計測ステップと、近端話者が発話していない無音区間を計測する第二の話者発話時間計測ステップと、第一の話者発話時間計測ステップにおける計測無音区間を用いて第一の積分実行ステップにおける積分窓長を制御する第一の積分窓長制御ステップと、第二の話者発話時間計測ステップにおける計測無音区間を用いて第二の積分ステップの積分窓長を制御する第二の積分窓長制御ステップとを有することとしたものである。
【００２８】
この構成により、無音区間に応じて第一、第二の積分窓長を制御するようにしたので、送話音声を例えば低域部分および高域部分に分割した場合の積分出力と、受話音声を例えば中域部分に限定した場合の積分出力とを更に正確に取得することができるという作用を有する。
【００２９】
請求項７に記載のプログラムは、請求項４乃至６のいずれか１に記載されたエコーキャンセル方法の各ステップを実行するためのプログラムであることとしたものである。
【００３０】
この構成により、上記プログラムを実行するコンピュータを用いることにより、請求項４乃至６のいずれか１に記載されたエコーキャンセル方法を任意の場所で任意の時間に実行することができるという作用を有する。
【００３１】
請求項８に記載の記録媒体は、請求項７に記載されたプログラムを実行するためのコンピュータで読み取り可能な記録媒体であることとしたものである。
【００３２】
この構成により、コンピュータで読み取り可能な記録媒体からプログラムを読み取ることにより、請求項７に記載されたプログラムを任意の場所で任意の時間に実行することができるという作用を有する。
【００３３】
以下、本発明の実施の形態について、図１〜図８を用いて説明する。
【００３４】
（実施の形態１）
図１は、本発明の実施の形態１、２、３、４によるエコーキャンセル装置の基本構成を示すブロック図である。
【００３５】
図１において、６は電話回線とのインタフェースを有する電話回路装置、７はアナログ電気信号である受話音声電気信号をデジタル電気信号に変換するＡ／Ｄ変換装置、８はデジタル電気信号をアナログ電気信号へ変換するＤ／Ａ変換装置、９はＤ／Ａ変換装置８からのアナログ電気信号を音声に変換するスピーカ、１０は音声をアナログ電気信号に変換するマイクロフォン、１１はマイクロフォン１０からのアナログ電気信号をデジタル電気信号に変換するＡ／Ｄ変換装置、１２はデジタル電気信号をアナログ電気信号（送話音声電気信号）に変換するＤ／Ａ変換装置、１３はＡ／Ｄ変換装置７およびＡ／Ｄ変換装置１１から得られたデジタル電気信号に対してデジタル信号処理を行い、その演算結果をＤ／Ａ変換装置８およびＤ／Ａ変換装置１２に出力する中央演算処理装置、１４は中央演算処理装置１３を動作させるためのプログラムが記憶されているＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、１５はＲＯＭ１４に記憶されているプログラムに従って中央演算処理装置１３が動作する際に使用するＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）である。
【００３６】
図２は、図１の中央演算処理装置１３における機能実現手段（プログラムによって機能が実現される手段）を示す機能ブロック図であり、スピーカフォン方式電話におけるエコーキャンセル方法を示す。本機能はＲＯＭ１４に記録されているプログラムの概略を示している。
【００３７】
図２において、１６はスピーカフォン方式電話等において、エコーキャンセル装置の動作を制御するために遠端話者の発話、近端話者の発話およびダブルトーク（遠端話者と近端話者の同時発話）を検出する話者検出手段、１７は正規化ＬＭＳ（ＬｅａｓｔＭｅａｎＳｑｕａｒｅ）方式等に代表される最急降下法によりスピーカ９とマイクロフォン１０との間の空間の伝達関数を推定する伝達関数推定手段、１８は直接エコー成分の伝達関数と受話音声との畳み込み演算を行う直接エコーフィルタ手段、１９は間接エコー成分の伝達関数と受話音声との畳み込み演算を行う間接エコーフィルタ手段、２０は減算手段である。
【００３８】
このように構成されたエコーキャンセル装置について、その概略動作を説明する。スピーカ９から放射された音声は空間を介してマイクロフォン１０にエコーとして入力され、閉ループが構成され、エコーキャンセル処理を行わなければ最悪ハウリングが発生してしまう。また、スピーカ９から放射された音声は、直接マイクロフォン１０へ入力される直接エコー成分と、空間内の壁、床、天井等の物体によって反射された後にマイクロフォン１０に入る間接エコー成分に分類できる。
【００３９】
図３は、図２の中央演算処理装置１３の動作を示すフローチャートであり、スピーカフォン方式電話におけるエコーキャンセル方法を示す。
【００４０】
図３において、エコーキャンセル処理を開始すると（Ｓ１）、話者検出手段１６が遠端話者発話、近端話者発話、ダブルトークを判定し（Ｓ２）、遠端話者発話ならば伝達関数推定手段１７がＮＬＭＳ等のアルゴリズムを用いて直接波成分伝達関数推定（Ｓ３）および間接波成分伝達関数推定（Ｓ４）を行い、直接エコーフィルタ手段１８は推定結果と受話音声との畳み込み演算を行い（Ｓ５）、間接エコーフィルタ手段１９は推定結果と受話音声との畳み込み演算を行い（Ｓ６）、マイクロフォン１０からの送話音声と畳み込み演算結果とを減算手段２０を用いて減算して直接エコー成分と間接エコー成分を除去する（Ｓ７）。
【００４１】
これにより、伝達関数推定の高速化と高精度化を両方実現したエコーキャンセル処理が可能である。
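The flow in FIG. 3 (S1 to S7) relies on a normalized LMS update to estimate the transfer function between the speaker 9 and the microphone 10. The following is a minimal sketch of such an update, assuming a single short FIR echo path rather than the separate direct and indirect filters 18 and 19 of the embodiment. The function names, the tap count, the step size `mu`, and the synthetic test signal are all illustrative assumptions and do not come from the patent.

```python
import random


def nlms_identify(far, mic, taps=4, mu=0.5, eps=1e-8):
    """Estimate an FIR echo path with a normalized LMS update.

    far: samples sent to the speaker (received voice)
    mic: samples picked up by the microphone
    Returns (estimated taps, error signal after echo removal).
    """
    w = [0.0] * taps
    err = []
    for n in range(len(mic)):
        # Most recent far-end samples, newest first, zero-padded at the start.
        x = [far[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))   # echo estimate (convolution)
        e = mic[n] - y                             # residual after subtraction
        norm = sum(xi * xi for xi in x) + eps      # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        err.append(e)
    return w, err


def demo(n=2000, seed=0):
    """Identify a made-up 4-tap echo path from noise-free synthetic data."""
    rng = random.Random(seed)
    h_true = [0.6, -0.3, 0.1, 0.0]  # hypothetical room response
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [sum(h_true[k] * far[i - k] for k in range(4) if i - k >= 0)
           for i in range(n)]
    w, err = nlms_identify(far, mic, taps=4)
    return h_true, w, err
```

In the embodiment the update would run only while the speaker detection means 16 reports far-end speech. The sketch omits that gating and adapts on every sample.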
【００４２】
以上のように本実施の形態によれば、直接エコーフィルタ手段１８により推定結果と受話音声との畳み込み演算を行い、間接エコーフィルタ手段１９により推定結果と受話音声との畳み込み演算を行い、マイクロフォン１０からの送話音声と畳み込み演算結果とを減算手段２０を用いて減算して直接エコー成分と間接エコー成分を除去するようにしたので、スピーカからの音量を大きくした場合でもダブルトークの判定精度を高くすることができ、受話音声と送話音声の音声パワー比が同じ場合でもダブルトーク検出精度を高くすることができる。
【００４３】
（実施の形態２）
図４は、本発明の実施の形態２、３、４によるエコーキャンセル装置の中央演算処理装置１３を示す機能ブロック図であり、スピーカフォン方式電話におけるエコーキャンセル方法を示す。なお、本実施の形態によるエコーキャンセル装置の基本構成は図１に示す構成である。また、本機能は、ＲＯＭ１４に記録されているプログラムの概略を示している。
【００４４】
図４において、３４は電話回線等の通信回線となるチャネル、３５はスピーカフォン方式電話においてエコーキャンセラの動作を制御するために遠端話者の発話、近端話者の発話およびダブルトーク（遠端話者と近端話者の同時発話）を検出する話者検出手段、３６は正規化ＬＭＳ（ＬｅａｓｔＭｅａｎＳｑｕａｒｅ）方式等に代表される最急降下法により空間の伝達関数を推定する伝達関数推定手段、３７は推定した伝達関数と受話音声との畳み込み演算を行うフィルタ手段、３８は減算手段である。
【００４５】
図５は、チャネル３４を通過した音声Ａ１とチャネル３４を通過していない音声Ａ２の周波数帯域でのパワー分布比較図である。図５において、チャネル３４として電話回線を例に挙げると、電話回線の周波数帯域は３００Ｈｚ〜３４００Ｈｚに制限されており、通常の人の音声帯域である１０Ｈｚ〜１０ｋＨｚの帯域よりも狭くなっている。
【００４６】
このように構成された中央演算処理装置１３について、その動作を図６を用いて説明する。図６は、図４の中央演算処理装置１３の動作を示すフローチャートである。
【００４７】
図６において、エコーキャンセル処理を開始すると（Ｓ１１）、話者検出手段３５においてチャネル３４の帯域に着目し、マイクロフォン１０からの送話音声をチャネル３４の帯域のみの音声を通過させる第一のフィルタ手段（話者検出手段３５内の図示しないフィルタ手段）とチャネルの帯域以外の音声を通過させる第二のフィルタ手段（話者検出手段３５内の図示しないフィルタ手段）とに入力させ（Ｓ１２）、第一のフィルタ手段の出力の積分を第一の積分手段（話者検出手段３５内の図示しない積分手段）により行い、第二のフィルタ手段の出力の積分を第二の積分手段（話者検出手段３５内の図示しない積分手段）により行い、各々のフィルタ手段の積分出力の音声パワーを測定し、話者検出手段３５内の図示しない音声パワー比較手段は、各々のフィルタ手段の積分出力を比較して音声パワーが存在する周波数帯域を検出する。検出の結果、測定した音声パワーにおいて、チャネル３４の帯域のみに音声パワーが集中していれば、遠端話者の発話として検知し、伝達関数推定手段３６が伝達関数の推定を行う（Ｓ１３、Ｓ１４）。また、チャネル３４の帯域とチャネル３４の帯域以外の帯域との両方に音声パワーが存在するならば、近端話者発話もしくはダブルトークと判定して、伝達関数推定手段３６の推定動作を停止させる（Ｓ１３、Ｓ１５）。
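The decision in FIG. 6 (S12 to S15) can be illustrated with a short sketch: the microphone signal is split into the channel band (300 Hz to 3400 Hz for a telephone line) and everything outside it, and the two powers are compared. For brevity the sketch swaps in a naive DFT band split and per-frame energy in place of the first and second filter means and integration means of the embodiment. The sampling rate `FS`, the `ratio` and `floor` parameters, and the function names are assumptions made for illustration only.

```python
import cmath
import math

FS = 16000               # assumed sampling rate in Hz (not stated in the source)
BAND = (300.0, 3400.0)   # telephone channel band quoted in the description


def band_powers(frame, fs=FS, band=BAND):
    """Return (in_band, out_of_band) power of one frame using a naive DFT."""
    n = len(frame)
    in_band = 0.0
    out_band = 0.0
    for k in range(1, n // 2):  # skip DC, positive frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2 / n
        freq = k * fs / n
        if band[0] <= freq <= band[1]:
            in_band += power
        else:
            out_band += power
    return in_band, out_band


def classify(in_band, out_band, ratio=0.1, floor=1e-6):
    """Map the two band powers to a talker state.

    Power only inside the channel band suggests far-end speech (echo),
    so transfer function estimation may run. Power in both regions
    suggests near-end speech or double talk, so estimation should stop.
    """
    if in_band + out_band < floor:
        return "silence"
    if out_band <= ratio * in_band:
        return "far_end"
    return "near_end_or_double_talk"
```

For example, a 160-sample frame holding only a 1 kHz tone is classified as `"far_end"`, while the same tone mixed with a 6 kHz component is classified as `"near_end_or_double_talk"`.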
【００４８】
以上のように本実施の形態によれば、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォン１０からの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段とを有することにより、送話音声を例えば低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を例えば中域部分に限定してこの帯域に応じた積分出力を取得することができるので、エコーキャンセラの性能に依存すること無くそれぞれの話者音声を正確に検出することができ、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができる。
【００４９】
（実施の形態３）
本発明の実施の形態３によるエコーキャンセル装置の基本構成および中央演算処理装置１３の構成は、図１および図４に示す構成である。
【００５０】
このように構成された中央演算処理装置１３について、その動作を図７を用いて説明する。図７は、図４の中央演算処理装置の動作を示すフローチャートである。
【００５１】
図７において、エコーキャンセル処理を開始すると（Ｓ２１）、話者検出手段３５が遠端発話、近端発話、ダブルトークを検出し、無音状態であれば、話者検出手段３５内の図示しない定常ノイズフロア測定手段はノイズフロアレベル（周囲の定常ノイズのレベル）を測定する（Ｓ２２）。測定したノイズフロアレベルが高ければ話者検出手段３５内の図示しない係数制御手段は、話者検出手段３５の検出閾値（前述した音声パワー比較手段における周波数帯域検出のための閾値、音声パワー比較手段の係数）を大きくし（Ｓ２３、Ｓ２４）、測定したノイズフロアレベルが低ければ話者検出手段３５の検出閾値を小さくする（Ｓ２３、Ｓ２５）。
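The threshold control in FIG. 7 (S22 to S25) can be sketched as follows: the noise floor is measured while nobody is speaking, and the detection threshold of the voice power comparing means is raised when the floor is high and lowered when it is low. The reference level, the step factor, and the clamping limits below are hypothetical values chosen for illustration; the patent does not specify them.

```python
def measure_noise_floor(silent_frame):
    """Mean power of a frame captured while neither party is speaking."""
    return sum(x * x for x in silent_frame) / len(silent_frame)


def control_threshold(threshold, noise_floor, reference_level=1e-3,
                      step=1.25, lower=0.05, upper=1.0):
    """Raise or lower the detection threshold according to the noise floor.

    A floor above reference_level raises the threshold (S24); otherwise the
    threshold is lowered (S25). The result is clamped to [lower, upper],
    which is an added safeguard not described in the source.
    """
    if noise_floor > reference_level:
        threshold *= step
    else:
        threshold /= step
    return min(max(threshold, lower), upper)
```

Calling `control_threshold` once per silent frame lets the threshold follow slow changes in ambient or electrical noise without reacting to speech.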
【００５２】
以上のように本実施の形態によれば、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォン１０からの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、周囲の定常ノイズを測定する定常ノイズフロア測定手段と、定常ノイズフロア測定手段の結果を用いて音声パワー比較手段の係数を制御する係数制御手段とを有することにより、周囲の音響的な雑音若しくは装置の電気的な雑音の大きさに応じて音声パワー比較手段の係数を制御するようにしたので、周囲の音響的な雑音若しくは装置の電気的な雑音が大きくなっても話者検出の精度が劣化しない。
【００５３】
（実施の形態４）
本発明の実施の形態４によるエコーキャンセル装置の基本構成および中央演算処理装置１３の構成は、図１および図４に示す構成である。
【００５４】
このように構成された中央演算処理装置１３について、その動作を図８を用いて説明する。図８は、図４の中央演算処理装置の動作を示すフローチャートである。
【００５５】
図８において、エコーキャンセル処理を開始すると（Ｓ３１）、話者検出手段３５が遠端発話、近端発話、ダブルトークを検出し、無音状態であれば無音期間の長さを測定する（Ｓ３２）。無音期間が長ければ（Ｓ３３）、音声パワーのリーク積分窓長を短くし（Ｓ３４）、無音期間が短ければ音声パワーのリーク積分窓長を長くする（Ｓ３５）。すなわち、話者検出手段３５内の図示しない第一の無音区間計測手段は遠端話者が発話していない無音区間を計測し、話者検出手段３５内の図示しない第二の無音区間計測手段は近端話者が発話していない無音区間を計測する。また、話者検出手段３５内の図示しない第一の積分窓長制御手段は第一の無音区間計測手段の結果を用いて第一の積分手段の積分窓長を制御し、話者検出手段３５内の図示しない第二の積分窓長制御手段は第二の無音区間計測手段の結果を用いて第二の積分手段の積分窓長を制御する。
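The window length control in FIG. 8 (S32 to S35) can be sketched with a leaky integrator whose effective window is chosen from the measured silent period: a long silence selects a short window and a short silence selects a long window. The silence limit and the two window lengths are hypothetical values, and the class and function names are illustrative only.

```python
def choose_window(silence_len, long_silence=8000, short_win=64, long_win=512):
    """Pick the leaky-integration window from the silent period in samples.

    A long silent period selects the short window (S34); a short silent
    period selects the long window (S35).
    """
    return short_win if silence_len >= long_silence else long_win


class LeakyIntegrator:
    """Leaky integration of signal power with an adjustable window length."""

    def __init__(self, window):
        self.window = window
        self.value = 0.0

    def update(self, sample):
        # Leak factor 1/window: a shorter window reacts faster to new power.
        alpha = 1.0 / self.window
        self.value += alpha * (sample * sample - self.value)
        return self.value
```

One integrator per filter output would be used, with `window` reassigned from `choose_window` whenever a new silent period has been measured for the corresponding talker.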
【００５６】
以上のように本実施の形態によれば、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォン１０からの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、遠端話者が発話していない無音区間を計測する第一の無音区間計測手段と、近端話者が発話していない無音区間を計測する第二の無音区間計測手段と、第一の無音区間計測手段の結果を用いて第一の積分手段の積分窓長を制御する第一の積分窓長制御手段と、第二の無音区間計測手段の結果を用いて第二の積分手段の積分窓長を制御する第二の積分窓長制御手段とを有することにより、無音区間に応じて第一、第二の積分窓長を制御するようにしたので、送話音声を例えば低域部分および高域部分に分割した場合の積分出力と、受話音声を例えば中域部分に限定した場合の積分出力とを更に正確に取得することができる。
【００５７】
【発明の効果】
以上説明したように本発明の請求項１に記載のエコーキャンセル装置によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段とを有することにより、送話音声を例えば低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を例えば中域部分に限定してこの帯域に応じた積分出力を取得することができるので、エコーキャンセラの性能に依存すること無くそれぞれの話者音声を正確に検出することができ、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるという有利な効果が得られる。
【００５８】
請求項２に記載のエコーキャンセル装置によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、周囲の定常ノイズを測定する定常ノイズフロア測定手段と、定常ノイズフロア測定手段の結果を用いて音声パワー比較手段の係数を制御する係数制御手段とを有することにより、周囲の音響的な雑音若しくは装置の電気的な雑音の大きさに応じて音声パワー比較手段の係数を制御するようにしたので、周囲の音響的な雑音若しくは装置の電気的な雑音が大きくなっても話者検出の精度が劣化しないという有利な効果が得られる。
【００５９】
請求項３に記載のエコーキャンセル装置によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置であって、中央演算処理装置は、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタ手段と、マイクロフォンからの入力音声をフィルタリングする第二のフィルタ手段と、第一のフィルタ手段の出力の積分を行う第一の積分手段と、第二のフィルタ手段の出力の積分を行う第二の積分手段と、第一及び第二の積分手段の出力結果を比較して音声パワーが存在する周波数帯域を検出する音声パワー比較手段と、遠端話者が発話していない無音区間を計測する第一の無音区間計測手段と、近端話者が発話していない無音区間を計測する第二の無音区間計測手段と、第一の無音区間計測手段の結果を用いて第一の積分手段の積分窓長を制御する第一の積分窓長制御手段と、第二の無音区間計測手段の結果を用いて第二の積分手段の積分窓長を制御する第二の積分窓長制御手段とを有することにより、無音区間に応じて第一、第二の積分窓長を制御するようにしたので、送話音声を例えば低域部分および高域部分に分割した場合の積分出力と、受話音声を例えば中域部分に限定した場合の積分出力とを更に正確に取得することができるという有利な効果が得られる。
【００６０】
請求項４に記載のエコーキャンセル方法によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップとを有することにより、送話音声を例えば低域部分および高域部分に分割してそれぞれの帯域に応じた積分出力を取得し、且つ受話音声を例えば中域部分に限定してこの帯域に応じた積分出力を取得することができるので、それぞれの話者音声を正確に検出することができ、周囲の音響的な雑音若しくは装置の電気的な雑音が大きい場合でも的確な話者検出を行うことができるという有利な効果が得られる。
【００６１】
請求項５に記載のエコーキャンセル方法によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップと、周囲の定常ノイズを測定する定常ノイズフロア測定ステップと、定常ノイズフロア測定ステップにおける測定ノイズを用いて音声パワー比較ステップの係数を制御する係数制御ステップとを有することにより、周囲の音響的な雑音若しくは装置の電気的な雑音の大きさに応じて音声パワー比較手段の係数を制御するようにしたので、周囲の音響的な雑音若しくは装置の電気的な雑音が大きくなっても話者検出の精度が劣化しないという有利な効果が得られる。
【００６２】
請求項６に記載のエコーキャンセル方法によれば、遠端話者からの受話音声等の音声を出力するスピーカと、近端話者等の音声が入力されるマイクロフォンと、全体を制御する中央演算処理装置とを有するエコーキャンセル装置におけるエコーキャンセル方法であって、遠端話者からの受話音声等の音声をフィルタリングする第一のフィルタリング実行ステップと、マイクロフォンからの入力音声をフィルタリングする第二のフィルタリング実行ステップと、第一のフィルタリング実行ステップにおける出力の積分を行う第一の積分実行ステップと、第二のフィルタリング実行ステップにおける出力の積分を行う第二の積分実行ステップと、第一の積分実行ステップにおける積分値と第二の積分実行ステップにおける積分値とを比較して音声パワーが存在する周波数帯域を検出する音声パワー比較ステップと、遠端話者が発話していない無音区間を計測する第一の話者発話時間計測ステップと、近端話者が発話していない無音区間を計測する第二の話者発話時間計測ステップと、第一の話者発話時間計測ステップにおける計測無音区間を用いて第一の積分実行ステップにおける積分窓長を制御する第一の積分窓長制御ステップと、第二の話者発話時間計測ステップにおける計測無音区間を用いて第二の積分ステップの積分窓長を制御する第二の積分窓長制御ステップとを有することにより、無音区間に応じて第一、第二の積分窓長を制御するようにしたので、送話音声を例えば低域部分および高域部分に分割した場合の積分出力と、受話音声を例えば中域部分に限定した場合の積分出力とを更に正確に取得することができるという有利な効果が得られる。
【００６３】
請求項７に記載のプログラムによれば、請求項４乃至６のいずれか１に記載されたエコーキャンセル方法の各ステップを実行するためのプログラムであることにより、上記プログラムを実行するコンピュータを用いることにより、請求項４乃至６のいずれか１に記載されたエコーキャンセル方法を任意の場所で任意の時間に実行することができるという有利な効果が得られる。
【００６４】
請求項８に記載の記録媒体によれば、請求項７に記載されたプログラムを実行するためのコンピュータで読み取り可能な記録媒体であることにより、コンピュータで読み取り可能な記録媒体からプログラムを読み取ることにより、請求項７に記載されたプログラムを任意の場所で任意の時間に実行することができるという有利な効果が得られる。
【図面の簡単な説明】
【図１】本発明の実施の形態１、２、３、４によるエコーキャンセル装置の基本構成を示すブロック図
【図２】図１の中央演算処理装置における機能実現手段を示す機能ブロック図
【図３】図２の中央演算処理装置の動作を示すフローチャート
【図４】本発明の実施の形態２、３、４によるエコーキャンセル装置の中央演算処理装置を示す機能ブロック図
【図５】チャネルを通過した音声とチャネルを通過していない音声の周波数帯域でのパワー分布比較図
【図６】図４の中央演算処理装置の動作を示すフローチャート
【図７】図４の中央演算処理装置の動作を示すフローチャート
【図８】図４の中央演算処理装置の動作を示すフローチャート
【図９】従来のエコーキャンセル装置を示す機能ブロック図
【符号の説明】
６電話回路装置
７、１１Ａ／Ｄ変換装置
８、１２Ｄ／Ａ変換装置
９スピーカ
１０マイクロフォン
１３中央演算処理装置
１４ＲＯＭ
１５ＲＡＭ
３４チャネル
３５話者検出手段
３６伝達関数推定手段
３７フィルタ手段
３８減算手段
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention is directed to an echo canceling device including a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole. The present invention also relates to an echo canceling method in the echo canceling device, a program for executing the echo canceling method, and a recording medium for executing the program.
[0002]
[Prior art]
In hands-free voice devices such as speakerphone telephones, echo canceling techniques are used to prevent howling and echo. With such a technique, the audio output to the speaker is convolved with a transfer function that simulates the space, and the resulting signal is subtracted from the sound (echo) that leaves the speaker, travels through a space such as a room, and enters the microphone, so that the echo appears not to exist.
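The subtraction described in this paragraph can be written out as a minimal sketch: the echo estimate at each sample is the convolution of the far-end signal with a simulated transfer function, and that estimate is subtracted from the microphone signal. The impulse response `h`, the signal values, and the function names are made up for illustration and are not taken from the patent.

```python
def echo_estimate(h, far, n):
    """Echo estimate at sample n: sum over k of h[k] * far[n - k]."""
    return sum(h[k] * far[n - k] for k in range(len(h)) if n - k >= 0)


def cancel_echo(mic, far, h):
    """Subtract the convolved far-end signal from the microphone signal."""
    return [mic[n] - echo_estimate(h, far, n) for n in range(len(mic))]


# Hypothetical data: a two-tap room response and a short far-end burst.
h = [0.5, 0.25]
far = [1.0, 0.0, 0.0, 2.0, 0.0]
near = [0.0, 0.0, 1.0, 0.0, 0.0]
mic = [near[n] + echo_estimate(h, far, n) for n in range(len(far))]
residual = cancel_echo(mic, far, h)  # equals near when h matches the room
```

When `h` matches the real acoustic path exactly, only the near-end voice remains. In practice `h` has to be estimated, which is why reliable speaker detection matters.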
[0003]
A conventional echo canceling technique is described below with reference to Patent Document 1. FIG. 9 is a functional block diagram showing a conventional echo canceling device.
[0004]
In FIG. 9, reference numeral 1 denotes a speaker that reproduces the received voice (voice from the far-end speaker) in a speakerphone telephone or the like; 2 denotes a microphone that picks up the transmitted voice (voice from the near-end speaker); 3 denotes a first echo canceling unit that cancels the echo arriving via the direct propagation path; 4 denotes a double-talk detecting unit that detects a double-talk state using the output signal of the first echo canceling unit 3; and 5 denotes a second echo canceling unit that cancels the echo arriving via the indirect propagation path.
[0005]
[Patent Document 1]
JP-A-5-48547
[0006]
[Problems to be solved by the invention]
However, in the conventional echo canceling device described above, when the acoustic noise around the installation site or the electrical noise of the device is large, detection of the far-end speaker, the near-end speaker, or double talk fails and an echo occurs, which hinders conversation.
[0007]
The echo canceling device, echo canceling method, program, and recording medium are therefore required to perform accurate speaker detection even when the surrounding acoustic noise or the electrical noise of the device is large, by dividing the transmitted voice into a low-frequency part and a high-frequency part to obtain an integrated output for each band, and by limiting the received voice to a mid-frequency part to obtain an integrated output for that band.
[0008]
To satisfy this requirement, an object of the present invention is to provide an echo canceling device capable of accurate speaker detection even when the surrounding acoustic noise or the electrical noise of the device is large, an echo canceling method that performs accurate speaker detection under the same conditions, and a program and a recording medium for executing the echo canceling method.
[0009]
[Means for Solving the Problems]
To solve the above problem, the echo canceling device of the present invention is an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; and voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power is present.
[0010]
As a result, an echo cancellation device that can perform accurate speaker detection even when the surrounding acoustic noise or the electrical noise of the device is large is obtained.
[0011]
To solve the above problem, the echo canceling method of the present invention is based on an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power is present; stationary noise floor measuring means for measuring the ambient stationary noise; and coefficient control means for controlling the coefficient of the voice power comparing means using the result of the stationary noise floor measuring means.
[0012]
As a result, an echo canceling method that performs accurate speaker detection even when the surrounding acoustic noise or the electrical noise of the device is large is obtained.
[0013]
In order to solve the above-mentioned problems, a program according to the present invention has a configuration that is a program for executing each step of the echo canceling method.
[0014]
As a result, a program for executing the echo canceling method is obtained.
[0015]
In order to solve the above problems, a recording medium of the present invention has a configuration that is a computer-readable recording medium for executing the above-mentioned program.
[0016]
Thereby, a recording medium for executing the program is obtained.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
The echo canceling device according to claim 1 of the present invention is an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; and voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power is present.
[0018]
With this configuration, the transmitted voice can be divided into, for example, a low-frequency part and a high-frequency part to obtain an integrated output for each band, and the received voice can be limited to, for example, a mid-frequency part to obtain an integrated output for that band. Each speaker's voice can therefore be detected accurately without depending on the performance of the echo canceller, and accurate speaker detection is possible even when the surrounding acoustic noise or the electrical noise of the device is large.
[0019]
The echo canceling device according to claim 2 is an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power is present; stationary noise floor measuring means for measuring the ambient stationary noise; and coefficient control means for controlling the coefficient of the voice power comparing means using the result of the stationary noise floor measuring means.
[0020]
With this configuration, the coefficient of the voice power comparing means is controlled according to the magnitude of the surrounding acoustic noise or the electrical noise of the device, so the accuracy of speaker detection does not degrade even when that noise becomes large.
[0021]
The echo canceling device according to claim 3 is an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; voice power comparing means for comparing the outputs of the first and second integration means to detect the frequency band in which voice power is present; first silent section measuring means for measuring a silent section in which the far-end speaker is not speaking; second silent section measuring means for measuring a silent section in which the near-end speaker is not speaking; first integration window length control means for controlling the integration window length of the first integration means using the result of the first silent section measuring means; and second integration window length control means for controlling the integration window length of the second integration means using the result of the second silent section measuring means.
[0022]
With this configuration, the first and second integration window lengths are controlled according to the silent sections, so the integrated output obtained when the transmitted voice is divided into, for example, a low-frequency part and a high-frequency part, and the integrated output obtained when the received voice is limited to, for example, a mid-frequency part, can both be acquired more accurately.
[0023]
The echo canceling method according to claim 4 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, the method having: a first filtering execution step of filtering voice such as the received voice from the far-end speaker; a second filtering execution step of filtering the input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; and a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect the frequency band in which voice power is present.
[0024]
With this configuration, the transmitted voice can be divided into, for example, a low-frequency part and a high-frequency part to obtain an integrated output for each band, and the received voice can be limited to, for example, a mid-frequency part to obtain an integrated output for that band. Each speaker's voice can therefore be detected accurately, and accurate speaker detection is possible even when the surrounding acoustic noise or the electrical noise of the device is large.
[0025]
The echo canceling method according to claim 5 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, the method having: a first filtering execution step of filtering voice such as the received voice from the far-end speaker; a second filtering execution step of filtering the input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect the frequency band in which voice power is present; a stationary noise floor measuring step of measuring the ambient stationary noise; and a coefficient control step of controlling the coefficient of the voice power comparing step using the noise measured in the stationary noise floor measuring step.
[0026]
With this configuration, the coefficient of the voice power comparing means is controlled according to the magnitude of the surrounding acoustic noise or the electrical noise of the device, so the accuracy of speaker detection does not degrade even when that noise becomes large.
[0027]
The echo canceling method according to claim 6 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as the received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole device, the method having: a first filtering execution step of filtering voice such as the received voice from the far-end speaker; a second filtering execution step of filtering the input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect the frequency band in which voice power is present; a first speaker utterance time measuring step of measuring a silent section in which the far-end speaker is not speaking; a second speaker utterance time measuring step of measuring a silent section in which the near-end speaker is not speaking; a first integration window length control step of controlling the integration window length of the first integration execution step using the silent section measured in the first speaker utterance time measuring step; and a second integration window length control step of controlling the integration window length of the second integration step using the silent section measured in the second speaker utterance time measuring step.
[0028]
With this configuration, the first and second integration window lengths are controlled in accordance with the silent sections. Therefore, the integrated output obtained when the transmitted voice is divided into, for example, a low-frequency part and a high-frequency part, and the integrated output obtained when the received voice is limited to, for example, the mid-frequency part, can both be acquired more accurately.
[0029]
A program according to claim 7 is a program for executing each step of the echo canceling method according to any one of claims 4 to 6.
[0030]
With this configuration, by using a computer that executes the program, the echo canceling method according to any one of claims 4 to 6 can be executed at any place and at any time.
[0031]
A recording medium according to claim 8 is a computer-readable recording medium for executing the program according to claim 7.
[0032]
With this configuration, by reading the program from a computer-readable recording medium, the program described in claim 7 can be executed at an arbitrary place and at an arbitrary time.
[0033]
Hereinafter, embodiments of the present invention will be described with reference to FIGS.
[0034]
(Embodiment 1)
FIG. 1 is a block diagram showing a basic configuration of an echo canceling apparatus according to Embodiments 1, 2, 3, and 4 of the present invention.
[0035]
In FIG. 1, 6 is a telephone circuit device having an interface with a telephone line; 7 is an analog/digital (A/D) converter that converts the received voice electric signal, which is an analog electric signal, into a digital electric signal; 8 is a digital/analog (D/A) converter that converts a digital electric signal into an analog electric signal; 9 is a speaker that converts the analog electric signal from the D/A converter 8 into voice; 10 is a microphone that converts voice into an analog electric signal; 11 is an A/D converter that converts the analog electric signal from the microphone 10 into a digital electric signal; 12 is a D/A converter that converts a digital electric signal into an analog electric signal (the transmitted voice electric signal); 13 is a central processing unit that performs digital signal processing on the digital electric signals obtained from the A/D converter 7 and the A/D converter 11 and outputs the results to the D/A converter 8 and the D/A converter 12; 14 is a ROM (Read Only Memory) in which a program for operating the central processing unit 13 is stored; and 15 is a RAM (Random Access Memory) used when the central processing unit 13 operates in accordance with the program stored in the ROM 14.
[0036]
FIG. 2 is a functional block diagram showing the function realizing means (means for realizing functions by a program) in the central processing unit 13 of FIG. 1 for an echo canceling method in a speakerphone telephone. These functions outline the program recorded in the ROM 14.
[0037]
In FIG. 2, 16 is a speaker detecting means that, in order to control the operation of the echo canceling device in a speakerphone telephone or the like, detects utterance of the far-end speaker, utterance of the near-end speaker, and double talk (simultaneous utterance of the far-end speaker and the near-end speaker); 17 is a transfer function estimating means that estimates the transfer function of the space between the speaker 9 and the microphone 10 by a steepest descent method typified by the normalized LMS (Least Mean Square) method; 18 is a direct echo filter means that performs a convolution operation between the transfer function of the direct echo component and the received voice; 19 is an indirect echo filter means that performs a convolution operation between the transfer function of the indirect echo component and the received voice; and 20 is a subtraction means.
[0038]
The general operation of the echo canceling device configured in this way will now be described. The sound radiated from the speaker 9 enters the microphone 10 through the space as an echo, forming a closed loop. If echo cancellation is not performed, howling occurs in the worst case. The sound radiated from the speaker 9 can be classified into a direct echo component, which enters the microphone 10 directly, and an indirect echo component, which enters the microphone 10 after being reflected by objects in the space such as walls, the floor, or the ceiling.
[0039]
FIG. 3 is a flowchart showing the operation of the central processing unit 13 in FIG. 2, and shows an echo canceling method in a speakerphone telephone.
[0040]
In FIG. 3, when the echo canceling process is started (S1), the speaker detecting means 16 determines far-end speaker utterance, near-end speaker utterance, and double talk (S2). The transfer function estimating means 17 estimates the transfer function of the direct wave component (S3) and the transfer function of the indirect wave component (S4) using an algorithm such as NLMS. The direct echo filter means 18 performs a convolution operation between its estimation result and the received voice (S5), and the indirect echo filter means 19 performs a convolution operation between its estimation result and the received voice (S6). The subtraction means 20 then subtracts the convolution results from the transmitted voice from the microphone 10, thereby removing the direct echo component and the indirect echo component (S7).
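The estimate, convolve, and subtract cycle of S3 to S7 can be illustrated with a minimal sketch. It is not the implementation described in this specification: it uses a single adaptive FIR filter instead of separate direct and indirect echo filters, and the names nlms_echo_cancel, simulate_echo, and make_far_end, the tap count, the step size mu, and the example room response are all assumptions chosen for illustration.

```python
import random


def make_far_end(n, seed=1):
    """Hypothetical far-end (received) signal: white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def simulate_echo(far, room):
    """Echo at the microphone: far-end signal convolved with a room response."""
    return [sum(h * far[n - k] for k, h in enumerate(room) if n - k >= 0)
            for n in range(len(far))]


def nlms_echo_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Return (residual, w) where residual[n] = mic[n] - estimated echo."""
    w = [0.0] * taps   # estimated transfer function (FIR coefficients)
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for r, m in zip(far, mic):
        x = [r] + x[:-1]
        # Convolution of the estimate with the received voice (cf. S5, S6).
        echo = sum(wi * xi for wi, xi in zip(w, x))
        # Subtraction from the microphone signal (cf. S7).
        e = m - echo
        # Normalized LMS coefficient update (cf. S3, S4).
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w
```

In this sketch the update runs on every sample. In the device described here, the speaker detecting means would additionally gate the update so that adaptation is stopped during near-end utterance or double talk.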
[0041]
As a result, it is possible to perform an echo cancellation process that realizes both high-speed and high-accuracy transfer function estimation.
[0042]
As described above, according to the present embodiment, the direct echo filter means 18 convolves its estimation result with the received voice, the indirect echo filter means 19 convolves its estimation result with the received voice, and the subtraction means 20 subtracts the convolution results from the transmitted voice from the microphone 10 to remove the direct echo component and the indirect echo component. Consequently, the loss of double-talk determination accuracy can be reduced even if the volume from the speaker is increased, and double-talk detection accuracy can be raised even when the received voice and the transmitted voice have comparable voice power.
[0043]
(Embodiment 2)
FIG. 4 is a functional block diagram showing a central processing unit 13 of an echo canceling apparatus according to Embodiments 2, 3, and 4 of the present invention, and shows an echo canceling method in a speakerphone telephone. The basic configuration of the echo canceling apparatus according to the present embodiment is the configuration shown in FIG. This function shows an outline of a program recorded in the ROM 14.
[0044]
In FIG. 4, 34 is a channel serving as a communication line such as a telephone line; 35 is a speaker detecting means that, in order to control the operation of the echo canceling device in a speakerphone telephone or the like, detects utterance of the far-end speaker, utterance of the near-end speaker, and double talk (simultaneous utterance of the far-end speaker and the near-end speaker); 36 is a transfer function estimating means that estimates the transfer function of the space by a steepest descent method typified by the normalized LMS (Least Mean Square) method; 37 is a filter means that performs a convolution operation between the estimated transfer function and the received voice; and 38 is a subtraction means.
[0045]
FIG. 5 compares, over frequency, the power distribution of voice A1 that has passed through the channel 34 with that of voice A2 that has not passed through the channel 34. Taking a telephone line as an example of the channel 34, the frequency band of the telephone line is limited to 300 Hz to 3400 Hz, which is narrower than the 10 Hz to 10 kHz band of ordinary human voice.
[0046]
The operation of the central processing unit 13 thus configured will be described with reference to FIG. FIG. 6 is a flowchart showing the operation of the central processing unit 13 of FIG.
[0047]
In FIG. 6, when the echo canceling process is started (S11), the speaker detecting means 35 focuses on the band of the channel 34 and passes the transmitted voice from the microphone 10 through a first filter means (a filter means, not shown, in the speaker detecting means 35) that passes only voice within the band of the channel 34, and through a second filter means (a filter means, not shown, in the speaker detecting means 35) that passes voice outside the band of the channel (S12). The output of the first filter means is integrated by the first integration means (an integration means, not shown, in the speaker detecting means 35), and the output of the second filter means is integrated by the second integration means (an integration means, not shown, in the speaker detecting means 35), so that the voice power of the integrated output of each filter means is measured. The voice power comparing means (not shown) in the speaker detecting means 35 then compares the integrated outputs of the filter means to detect the frequency band in which voice power exists. If the measured voice power is concentrated only in the band of the channel 34, the signal is detected as utterance of the far-end speaker, and the transfer function estimating means 36 estimates the transfer function (S13, S14). If voice power exists both in the band of the channel 34 and outside that band, the signal is determined to be utterance of the near-end speaker or double talk, and the estimation operation of the transfer function estimating means 36 is stopped (S13, S15).
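The band-split decision of S12 to S15 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the filter design of this specification: the sampling rate FS, the 101-tap Hamming-windowed sinc band-pass filter, the complementary out-of-band filter, the leaky integrator constant alpha, the threshold value, and the names bandpass_taps, fir, leaky_power, tone, and classify are all hypothetical. Only the 300 Hz to 3400 Hz channel band is taken from the description.

```python
import math

FS = 16000               # assumed sampling rate in Hz
BAND = (300.0, 3400.0)   # telephone channel band given in the description


def bandpass_taps(lo, hi, fs=FS, n=101):
    """Hamming-windowed sinc band-pass FIR; n must be odd."""
    m = n // 2

    def lowpass(fc, k):
        # Ideal low-pass impulse response sample for cutoff fc.
        x = k - m
        if x == 0:
            return 2.0 * fc / fs
        return math.sin(2.0 * math.pi * fc * x / fs) / (math.pi * x)

    return [(lowpass(hi, k) - lowpass(lo, k))
            * (0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)))
            for k in range(n)]


def fir(taps, x):
    """Direct-form FIR filtering of the sequence x."""
    return [sum(t * x[i - k] for k, t in enumerate(taps) if i - k >= 0)
            for i in range(len(x))]


def leaky_power(y, alpha=0.01):
    """Leaky integration of the squared signal; returns the final value."""
    p = 0.0
    for v in y:
        p += alpha * (v * v - p)
    return p


def tone(freq, n, fs=FS):
    """Test signal: a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


def classify(mic, threshold=0.1):
    """Compare in-band and out-of-band power of the microphone signal."""
    in_band = bandpass_taps(*BAND)            # first filter: channel band only
    out_band = [-t for t in in_band]          # second filter: everything else
    out_band[len(in_band) // 2] += 1.0        # delayed impulse minus band-pass
    p_in = leaky_power(fir(in_band, mic))     # first integration
    p_out = leaky_power(fir(out_band, mic))   # second integration
    if p_out > threshold * p_in:              # power outside the channel band
        return "near-end or double talk"      # stop estimation (cf. S15)
    return "far-end"                          # run estimation (cf. S14)
```

For example, a 1 kHz tone alone falls inside the channel band and is classified as far-end, whereas adding a 6 kHz component, which a telephone channel would not carry, leads to the near-end or double talk decision.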
[0048]
As described above, according to the present embodiment, the device has first filter means for filtering voice such as a received voice from a far-end speaker, second filter means for filtering the input voice from the microphone 10, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, and voice power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which voice power exists. The transmitted voice can therefore be divided into, for example, a low-frequency part and a high-frequency part to obtain an integrated output for each band, and the received voice can be limited to, for example, the mid-frequency part to obtain an integrated output for that band. As a result, each speaker's voice can be detected accurately without depending on the performance of the echo canceller, and accurate speaker detection can be performed even when the surrounding acoustic noise or the electrical noise of the device is large.
[0049]
(Embodiment 3)
The basic configuration of the echo cancellation device and the configuration of the central processing unit 13 according to the third embodiment of the present invention are the configurations shown in FIGS.
[0050]
The operation of the central processing unit 13 thus configured will be described with reference to FIG. FIG. 7 is a flowchart showing the operation of the central processing unit of FIG.
[0051]
In FIG. 7, when the echo canceling process is started (S21), the speaker detecting means 35 detects far-end utterance, near-end utterance, and double talk, and the noise floor measuring means measures the noise floor level (the level of surrounding stationary noise) (S22). If the measured noise floor level is high, the coefficient control means (not shown) in the speaker detecting means 35 raises the detection threshold of the speaker detecting means 35 (the threshold used by the voice power comparing means to detect the frequency band, that is, the coefficient of the voice power comparing means) (S23, S24). If the measured noise floor level is low, the detection threshold of the speaker detecting means 35 is lowered (S23, S25).
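The threshold control of S22 to S25 can be sketched as below. The specification does not state how the noise floor is measured or by how much the threshold moves, so the minimum-of-recent-frame-powers estimator, the multiplicative step, the limits, and the names frame_power, noise_floor, and update_threshold are assumptions made only for illustration.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(v * v for v in frame) / len(frame)


def noise_floor(recent_frame_powers):
    """Assumed estimator: the minimum short-term power over recent frames,
    which tends to track stationary noise rather than speech bursts."""
    return min(recent_frame_powers)


def update_threshold(threshold, floor, floor_limit=1e-3,
                     step=1.25, lowest=0.05, highest=1.0):
    """Raise the detection threshold when the noise floor is high (cf. S24),
    lower it when the noise floor is low (cf. S25)."""
    if floor > floor_limit:
        threshold *= step
    else:
        threshold /= step
    # Keep the coefficient inside an assumed working range.
    return max(lowest, min(highest, threshold))
```

The returned value would play the role of the coefficient used by the voice power comparing means, for example the ratio against which out-of-band power is compared with in-band power.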
[0052]
As described above, according to the present embodiment, the device has first filter means for filtering voice such as a received voice from a far-end speaker, second filter means for filtering the input voice from the microphone 10, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, audio power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which audio power exists, stationary noise floor measuring means for measuring ambient stationary noise, and coefficient control means for controlling a coefficient of the audio power comparing means using the result of the stationary noise floor measuring means. The coefficient of the audio power comparing means is thus controlled in accordance with the magnitude of the surrounding acoustic noise or the electrical noise of the device, so that the accuracy of speaker detection does not deteriorate even if the surrounding acoustic noise or the electrical noise of the device increases.
[0053]
(Embodiment 4)
The basic configuration of the echo canceling apparatus according to the fourth embodiment of the present invention and the configuration of the central processing unit 13 are the configurations shown in FIGS.
[0054]
The operation of the central processing unit 13 thus configured will be described with reference to FIG. FIG. 8 is a flowchart showing the operation of the central processing unit of FIG.
[0055]
In FIG. 8, when the echo canceling process is started (S31), the speaker detecting means 35 detects far-end utterance, near-end utterance, and double talk, and, if there is no voice, measures the length of the silent period (S32). If the silent period is long (S33), the leaky integration window length for the voice power is shortened (S34); if the silent period is short, the leaky integration window length for the voice power is lengthened (S35). Specifically, the first silent section measuring means (not shown) in the speaker detecting means 35 measures a silent section in which the far-end speaker is not speaking, and the second silent section measuring means (not shown) in the speaker detecting means 35 measures a silent section in which the near-end speaker is not speaking. The first integration window length control means (not shown) in the speaker detecting means 35 controls the integration window length of the first integration means using the result of the first silent section measuring means, and the second integration window length control means (not shown) in the speaker detecting means 35 controls the integration window length of the second integration means using the result of the second silent section measuring means.
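The window length control of S32 to S35 can be sketched as below. The specification gives no numeric values, so the silence level, the boundary between a long and a short silent period, the halving and doubling rule, the window limits, and the names SilenceTimer, LeakyIntegrator, and update_window are assumptions for illustration. One timer and one integrator would be kept per speaker.

```python
class SilenceTimer:
    """Counts consecutive samples whose power stays below an assumed level."""

    def __init__(self, level=1e-4):
        self.level = level
        self.silent_samples = 0

    def push(self, sample):
        if sample * sample < self.level:
            self.silent_samples += 1
        else:
            self.silent_samples = 0
        return self.silent_samples


class LeakyIntegrator:
    """Leaky integration of signal power with an adjustable window length."""

    def __init__(self, window=512):
        self.window = window
        self.power = 0.0

    def push(self, sample):
        alpha = 1.0 / self.window   # a shorter window reacts faster
        self.power += alpha * (sample * sample - self.power)
        return self.power


def update_window(window, silent_samples, long_silence=8000,
                  shortest=64, longest=2048):
    """Shorten the window after a long silent period (cf. S34),
    lengthen it after a short one (cf. S35)."""
    if silent_samples >= long_silence:
        return max(shortest, window // 2)
    return min(longest, window * 2)
```

A caller would run, for each speaker, something like integrator.window = update_window(integrator.window, timer.silent_samples) at the moment voice resumes, so that the first and second integration means are tuned independently.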
[0056]
As described above, according to the present embodiment, the device has first filter means for filtering voice such as a received voice from a far-end speaker, second filter means for filtering the input voice from the microphone 10, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, voice power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which voice power exists, first silent section measuring means for measuring a silent section in which the far-end speaker is not speaking, second silent section measuring means for measuring a silent section in which the near-end speaker is not speaking, first integration window length control means for controlling the integration window length of the first integration means using the result of the first silent section measuring means, and second integration window length control means for controlling the integration window length of the second integration means using the result of the second silent section measuring means. Because the first and second integration window lengths are controlled in accordance with the silent sections, the integrated output obtained when the transmitted voice is divided into, for example, a low-frequency part and a high-frequency part, and the integrated output obtained when the received voice is limited to, for example, the mid-frequency part, can both be acquired more accurately.
[0057]
[Effects of the invention]
As described above, the echo canceling device according to claim 1 has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, wherein the central processing unit has first filter means for filtering voice such as the received voice from the far-end speaker, second filter means for filtering the input voice from the microphone, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, and audio power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which audio power exists. The transmitted voice can thus be divided into, for example, a low-frequency part and a high-frequency part to obtain an integrated output for each band, and the received voice can be limited to, for example, the mid-frequency part to obtain an integrated output for that band. This yields the advantageous effect that each speaker's voice can be detected accurately without depending on the performance of the echo canceller, and that accurate speaker detection can be performed even when the surrounding acoustic noise or the electrical noise of the device is large.
[0058]
The echo canceling device according to claim 2 has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, wherein the central processing unit has first filter means for filtering voice such as the received voice from the far-end speaker, second filter means for filtering the input voice from the microphone, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, audio power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which audio power exists, stationary noise floor measuring means for measuring ambient stationary noise, and coefficient control means for controlling a coefficient of the audio power comparing means using the result of the stationary noise floor measuring means. Because the coefficient of the audio power comparing means is controlled in accordance with the magnitude of the surrounding acoustic noise or the electrical noise of the device, the advantageous effect is obtained that the accuracy of speaker detection does not deteriorate even if the surrounding acoustic noise or the electrical noise of the device increases.
[0059]
The echo canceling device according to claim 3 has a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, wherein the central processing unit has first filter means for filtering voice such as the received voice from the far-end speaker, second filter means for filtering the input voice from the microphone, first integration means for integrating the output of the first filter means, second integration means for integrating the output of the second filter means, voice power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which voice power exists, first silent section measuring means for measuring a silent section in which the far-end speaker is not speaking, second silent section measuring means for measuring a silent section in which the near-end speaker is not speaking, first integration window length control means for controlling the integration window length of the first integration means using the result of the first silent section measuring means, and second integration window length control means for controlling the integration window length of the second integration means using the result of the second silent section measuring means. Because the first and second integration window lengths are controlled in accordance with the silent sections, the advantageous effect is obtained that the integrated output obtained when the transmitted voice is divided into, for example, a low-frequency part and a high-frequency part, and the integrated output obtained when the received voice is limited to, for example, the mid-frequency part, can both be acquired more accurately.
[0060]
The echo canceling method according to claim 4 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, and has a first filtering execution step of filtering voice such as the received voice from the far-end speaker, a second filtering execution step of filtering the input voice from the microphone, a first integration execution step of integrating the output of the first filtering execution step, a second integration execution step of integrating the output of the second filtering execution step, and a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which voice power exists. The transmitted voice can thus be divided into, for example, a low-frequency part and a high-frequency part to obtain an integrated output for each band, and the received voice can be limited to, for example, the mid-frequency part to obtain an integrated output for that band. This yields the advantageous effect that each speaker's voice can be detected accurately, and that accurate speaker detection can be performed even when the surrounding acoustic noise or the electrical noise of the device is large.
[0061]
The echo canceling method according to claim 5 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, and has a first filtering execution step of filtering voice such as the received voice from the far-end speaker, a second filtering execution step of filtering the input voice from the microphone, a first integration execution step of integrating the output of the first filtering execution step, a second integration execution step of integrating the output of the second filtering execution step, a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which voice power exists, a stationary noise floor measuring step of measuring ambient stationary noise, and a coefficient control step of controlling a coefficient of the voice power comparing step using the noise measured in the stationary noise floor measuring step. Because the coefficient used in the voice power comparison is controlled in accordance with the magnitude of the surrounding acoustic noise or the electrical noise of the device, the advantageous effect is obtained that the accuracy of speaker detection does not deteriorate even if the surrounding acoustic noise or the electrical noise of the device increases.
[0062]
The echo canceling method according to claim 6 is an echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, and has a first filtering execution step of filtering voice such as the received voice from the far-end speaker, a second filtering execution step of filtering the input voice from the microphone, a first integration execution step of integrating the output of the first filtering execution step, a second integration execution step of integrating the output of the second filtering execution step, a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which voice power exists, a first speaker utterance time measuring step of measuring a silent section in which the far-end speaker is not speaking, a second speaker utterance time measuring step of measuring a silent section in which the near-end speaker is not speaking, a first integration window length control step of controlling the integration window length of the first integration execution step using the silent section measured in the first speaker utterance time measuring step, and a second integration window length control step of controlling the integration window length of the second integration execution step using the silent section measured in the second speaker utterance time measuring step. Because the first and second integration window lengths are controlled in accordance with the silent sections, the advantageous effect is obtained that the integrated output obtained when the transmitted voice is divided into, for example, a low-frequency part and a high-frequency part, and the integrated output obtained when the received voice is limited to, for example, the mid-frequency part, can both be acquired more accurately.
[0063]
The program according to claim 7 is a program for executing each step of the echo canceling method according to any one of claims 4 to 6. By using a computer that executes the program, the advantageous effect is obtained that the echo canceling method according to any one of claims 4 to 6 can be executed at any place and at any time.
[0064]
The recording medium according to claim 8 is a computer-readable recording medium for executing the program according to claim 7. By reading the program from the computer-readable recording medium, the advantageous effect is obtained that the program according to claim 7 can be executed at any place and at any time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a basic configuration of an echo canceling apparatus according to Embodiments 1, 2, 3, and 4 of the present invention.
FIG. 2 is a functional block diagram showing function realizing means in the central processing unit of FIG. 1;
FIG. 3 is a flowchart showing the operation of the central processing unit of FIG. 2;
FIG. 4 is a functional block diagram showing a central processing unit of an echo canceling apparatus according to Embodiments 2, 3, and 4 of the present invention;
FIG. 5 is a power distribution comparison diagram in a frequency band between voice passing through a channel and voice not passing through a channel.
FIG. 6 is a flowchart showing the operation of the central processing unit in FIG. 4;
FIG. 7 is a flowchart showing the operation of the central processing unit of FIG. 4;
FIG. 8 is a flowchart showing the operation of the central processing unit of FIG. 4;
FIG. 9 is a functional block diagram showing a conventional echo canceling device.
[Explanation of symbols]
6 Telephone circuit device
7, 11 A/D converter
8, 12 D/A converter
9 Speaker
10 Microphone
13 Central processing unit
14 ROM
15 RAM
34 Channel
35 Speaker detection means
36 Transfer function estimation means
37 Filter means
38 Subtraction means

Claims

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
The central processing unit includes a first filter unit that filters voice such as a received voice from the far-end speaker, a second filter unit that filters input voice from the microphone, and the first filter. First integrating means for integrating the output of the means, second integrating means for integrating the output of the second filter means, and comparing the output results of the first and second integrating means, An echo canceling device comprising: audio power comparing means for detecting a frequency band in which power exists.

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
The central processing unit includes a first filter unit that filters voice such as a received voice from the far-end speaker, a second filter unit that filters input voice from the microphone, and the first filter. First integrating means for integrating the output of the means, second integrating means for integrating the output of the second filter means, and comparing the output results of the first and second integrating means, Audio power comparing means for detecting a frequency band in which power exists, stationary noise floor measuring means for measuring ambient stationary noise, and controlling a coefficient of the audio power comparing means using the result of the stationary noise floor measuring means. An echo canceller comprising: a coefficient control unit.

An echo cancellation device having a speaker that outputs voice such as a reception voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
wherein the central processing unit has: first filter means for filtering voice such as the received voice from the far-end speaker; second filter means for filtering the input voice from the microphone; first integration means for integrating the output of the first filter means; second integration means for integrating the output of the second filter means; voice power comparing means for comparing the outputs of the first and second integration means to detect a frequency band in which voice power exists; first silent section measuring means for measuring a silent section in which the far-end speaker is not speaking; second silent section measuring means for measuring a silent section in which the near-end speaker is not speaking; first integration window length control means for controlling the integration window length of the first integration means using the result of the first silent section measuring means; and second integration window length control means for controlling the integration window length of the second integration means using the result of the second silent section measuring means.

An echo canceling method in an echo canceling device having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which the voice of a near-end speaker or the like is input, and a central processing unit that controls the whole,
the method comprising: a first filtering execution step of filtering voice such as the received voice from the far-end speaker; a second filtering execution step of filtering the input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; and an audio power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which audio power exists.

An echo canceling method in an echo canceling apparatus having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, the method comprising:
a first filtering execution step of filtering voice such as a received voice from the far-end speaker; a second filtering execution step of filtering input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which voice power is present; a stationary noise floor measuring step of measuring ambient stationary noise; and a control step of controlling a coefficient of the voice power comparing step using the noise measured in the stationary noise floor measuring step.
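How a measured noise floor might control the comparison coefficient can be sketched as follows. The claim does not state how the noise is measured or how it maps to the coefficient, so everything here is an assumption: the noise floor is taken as the minimum of recent frame powers (a simple minimum-tracking estimate, not necessarily the patent's method), and the coefficient grows linearly with the noise floor up to a cap. The names `noise_floor`, `comparison_coefficient`, `base`, `ref`, and `max_coeff` are hypothetical.

```python
def noise_floor(frame_powers):
    """Estimate stationary noise as the lowest recent frame power
    (assumed estimator: speech raises frame power, noise sets the minimum)."""
    return min(frame_powers)

def comparison_coefficient(noise, base=2.0, ref=1e-3, max_coeff=8.0):
    """Hypothetical mapping: a higher noise floor demands a larger power
    margin before a band is declared to contain voice."""
    return min(max_coeff, base * (1.0 + noise / ref))

# Frame powers with speech bursts over a quiet background.
powers = [0.5, 0.002, 0.3, 0.001]
coeff = comparison_coefficient(noise_floor(powers))
print(coeff)
```

The resulting value would take the place of a fixed coefficient in the power comparison, so that louder surroundings make the detector more conservative under this assumed mapping.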

An echo canceling method in an echo canceling apparatus having a speaker that outputs voice such as a received voice from a far-end speaker, a microphone into which voice of a near-end speaker or the like is input, and a central processing unit that controls the whole, the method comprising:
a first filtering execution step of filtering voice such as a received voice from the far-end speaker; a second filtering execution step of filtering input voice from the microphone; a first integration execution step of integrating the output of the first filtering execution step; a second integration execution step of integrating the output of the second filtering execution step; a voice power comparing step of comparing the integrated value of the first integration execution step with the integrated value of the second integration execution step to detect a frequency band in which voice power is present; a first speaker utterance time measuring step of measuring a silent section in which the far-end speaker is not speaking; a second speaker utterance time measuring step of measuring a silent section in which the near-end speaker is not speaking; a first integration window length control step of controlling an integration window length of the first integration execution step using the silent section measured in the first speaker utterance time measuring step; and a second integration window length control step of controlling an integration window length of the second integration execution step using the silent section measured in the second speaker utterance time measuring step.
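The silent-section measurement and window length control of this claim can be sketched under stated assumptions. The claim does not say how silence is detected or in which direction the window length changes, so the sketch assumes that silence is a run of frames whose power is below a threshold and that a longer silence shortens the integration window (to react quickly when speech resumes). The names `silent_run_length`, `window_length`, `threshold`, `min_len`, `max_len`, and `step` are hypothetical.

```python
def silent_run_length(frame_powers, threshold):
    """Count the most recent consecutive frames whose power is below
    `threshold` (assumed definition of a silent section)."""
    count = 0
    for p in reversed(frame_powers):
        if p >= threshold:
            break
        count += 1
    return count

def window_length(silent_frames, min_len=64, max_len=512, step=32):
    """Hypothetical rule: shrink the integration window by `step` samples
    per silent frame, but never below `min_len`."""
    return max(min_len, max_len - step * silent_frames)

# Each speaker gets an independent window, as in the claim.
far_powers = [0.5, 0.0001, 0.0002, 0.4, 0.0001, 0.0001]
near_powers = [0.3, 0.2, 0.4]
far_window = window_length(silent_run_length(far_powers, 0.001))
near_window = window_length(silent_run_length(near_powers, 0.001))
print(far_window, near_window)
```

The two window lengths would then be used when integrating the far-end and microphone filter outputs, respectively.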

A program for executing each step of the echo canceling method according to any one of claims 4 to 6.

A computer-readable recording medium for executing the program according to claim 7.