JP2006270709A

JP2006270709A - Echo remover, electronic conference apparatus, and echo removing program

Info

Publication number: JP2006270709A
Application number: JP2005087987A
Authority: JP
Inventors: Yasuhiro Kodama; 康広小玉; Yasuhiko Kato; 靖彦加藤; Takao Fukui; 隆郎福井; Jo Matsui; 丈松井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-03-25
Filing date: 2005-03-25
Publication date: 2006-10-05

Abstract

PROBLEM TO BE SOLVED: To remove an echo component from a collected sound signal more reliably, regardless of whether double talk occurs.
SOLUTION: An echo component removal unit 512 estimates the echo component with a time-domain adaptive filter from the collected sound signal and a reference signal corresponding to the sound to be reproduced and output, and removes the echo component from the collected sound signal. A parameter update unit 513 updates an adaptive filter parameter 501 according to an update amount μ designated by a μ calculation unit 520. The μ calculation unit 520 designates the update amount μ according to the coherence between the reference signal and an error signal obtained by removing the echo component from the collected sound signal. The update amount can therefore be reliably controlled so that it is optimized when double talk occurs, which improves the quality of the error signal.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an echo removal apparatus that removes, from a collected sound signal, an echo component caused by reproduced and output sound wrapping around into a sound collection unit, and to an electronic conference apparatus equipped with the echo removal apparatus, an echo removal method, and an echo removal program. In particular, the present invention relates to an echo removal apparatus, an electronic conference apparatus, an echo removal method, and an echo removal program that can remove the echo component well when double talk occurs.

In a system in which sound collected by microphones is communicated bidirectionally, such as an electronic conference system, the audio signal collected at the far side is output from a loudspeaker at the near side. It is known that when this reproduced sound wraps around into the near-side microphone and is collected, an echo occurs and the quality of the transmitted and received audio degrades. For this reason, such conventional systems have generally used an echo canceller that removes the echo component from the signal collected by the microphone.

Known conventional echo cancellers include those that estimate the echo component with an adaptive filter that processes the audio signal in the time domain, and those that use an adaptive filter that converts the audio signal from the time domain to the frequency domain before processing. As an example of the latter, there is a method that obtains the power spectrum of the collected sound signal and the cross spectrum between that signal and the reproduced signal, obtains the coherence between the reproduced signal and the collected sound signals of one or more channels, estimates from these the proportion of the echo component in the collected sound signal for each frequency band, calculates an echo suppression gain from that proportion, and suppresses the echo in the collected sound signal (see, for example, Patent Document 1).
Patent Document 1: JP 2003-309493 A (paragraphs [0017] to [0020], FIG. 1)

Among the adaptive filter processing methods described above, for the method that processes the audio signal in the time domain, it is known that in a double-talk state, in which the near side and the far side speak at the same time, slowing the learning speed of the adaptive filter removes the echo component more reliably and improves audio quality. For the method that converts the audio signal to the frequency domain, on the other hand, techniques have been proposed that use a measure called coherence to improve audio quality after echo removal, including in the double-talk state. However, because coherence is a frequency-domain measure, a method that processes the signal in the time domain cannot handle it directly, and it has been difficult to detect double talk accurately and improve audio quality.

The present invention has been made in view of these points, and an object of the present invention is to provide an echo removal apparatus that can remove the echo component of a collected sound signal more reliably regardless of whether double talk occurs.

Another object of the present invention is to provide an electronic conference apparatus that can remove the echo component of a collected sound signal more reliably regardless of whether double talk occurs.
A further object of the present invention is to provide an echo removal method that can remove the echo component of a collected sound signal more reliably regardless of whether double talk occurs.

Another object of the present invention is to provide an echo removal program that can remove the echo component of a collected sound signal more reliably regardless of whether double talk occurs.

To solve the above problems, the present invention provides an echo removal apparatus that removes, from a collected sound signal, an echo component caused by reproduced and output sound wrapping around into a sound collection unit, the apparatus comprising: echo component removing means for estimating the echo component with a time-domain adaptive filter from the collected sound signal and a reference signal corresponding to the sound to be reproduced and output, and removing the echo component from the collected sound signal; parameter updating means for updating a parameter of the adaptive filter; and update amount specifying means for specifying the amount by which the parameter updating means updates the parameter, according to the coherence between the reference signal and an error signal obtained by removing the echo component from the collected sound signal.

Here, the echo component removing means estimates the echo component with a time-domain adaptive filter from the collected sound signal and the reference signal corresponding to the sound to be reproduced and output, and removes the echo component from the collected sound signal. The parameter updating means updates the parameter of the adaptive filter according to the update amount specified by the update amount specifying means. The update amount specifying means specifies the update amount according to the coherence between the reference signal and the error signal obtained by removing the echo component from the collected sound signal. Because coherence is usually low when double talk occurs, the update amount can be reliably controlled so that it is optimized when double talk occurs.

According to the echo removal apparatus of the present invention, the echo component to be removed is estimated by a time-domain adaptive filter, so that, compared with frequency-domain processing, the processing delay is reduced and the echo estimation tracks changes more closely. In addition, by changing the parameter of the adaptive filter according to the coherence between the error signal and the reference signal, the update amount can be reliably controlled so that it is optimized when double talk occurs. Therefore, the echo component of the collected sound signal can be reliably removed regardless of whether double talk occurs.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings, taking as an example the case where the present invention is applied to a terminal device of an electronic conference system.
FIG. 1 is a diagram illustrating a configuration example of an electronic conference system according to the embodiment.

As shown in FIG. 1, the electronic conference system of this embodiment has a configuration in which electronic conference terminals 10 and 20 are connected to a network 30. The electronic conference terminals 10 and 20 are terminals for holding an electronic conference between conference rooms at remote locations, and can transmit and receive image signals and audio signals through the network 30.

Here, the schematic configuration of the electronic conference terminal 10 will be described as an example. The electronic conference terminal 10 includes a network interface (I/F) 11 that transmits and receives image and audio data through the network 30, an image CODEC (COder/DECoder) 12 that encodes and decodes image signals, an image I/F 13 that performs input/output processing of display image signals and captured image signals, an audio CODEC 14 that encodes and decodes audio data, an echo canceller 15 that removes the echo component from the collected sound signal, and an audio I/F 16 that performs input/output processing of output audio signals and collected sound signals. A camera 13a, a monitor 13b, a microphone 16a, and a speaker 16b are connected externally to the electronic conference terminal 10.

In the electronic conference terminal 10, the image signal captured by the camera 13a is converted into a digital signal by the image I/F 13 and encoded by the image CODEC 12 using a predetermined encoding method. The audio signal collected by the microphone 16a is converted into a digital signal by the audio I/F 16, has its echo component removed by the echo canceller 15, and is then encoded by the audio CODEC 14 using a predetermined encoding method. The encoded image and audio data are multiplexed into packets by the network I/F 11 and sent out onto the network 30.

Image and audio data received through the network 30 are separated by the network I/F 11 and input to the image CODEC 12 and the audio CODEC 14, respectively. The separated image data is decoded by the image CODEC 12, converted into a display image signal by the image I/F 13, and output to the monitor 13b, where the image is reproduced and displayed. The separated audio data is decoded by the audio CODEC 14, converted into an analog signal by the audio I/F 16, and output to the speaker 16b, where the sound is reproduced and output. The audio signal decoded by the audio CODEC 14 is also supplied to the echo canceller 15, where it is used as the reference signal for echo removal processing.

In such an electronic conference system, for example, sound collected at the electronic conference terminal 10 may be reproduced on the electronic conference terminal 20 side, and that reproduced sound may wrap around and be collected again. A similar wraparound of sound may occur on the electronic conference terminal 10 side. The electronic conference terminal 10 therefore uses the echo canceller 15 to remove the echo component that arises in such cases, improving the quality of the transmitted and received audio.

Note that the function of the echo canceller 15 need not reside in the electronic conference terminal 10; it may, for example, be built into the microphone 16a. In that case, in addition to outputting the collected sound signal, the microphone 16a receives the reference signal (the audio signal from the far-side electronic conference terminal 20) from the electronic conference terminal 10 through, for example, a serial communication I/F.

FIG. 2 is a diagram illustrating an example of the internal configuration of the echo canceller 15.
As shown in FIG. 2, the echo canceller 15 includes an echo cancellation processing unit 510 that removes the echo component using an adaptive filter, and a μ calculation unit 520 that calculates the update amount μ of the adaptive filter parameter. The echo cancellation processing unit 510 includes a reference signal buffer 511, an echo component removal unit 512, a parameter update unit 513, a parameter storage unit 514, and a list storage unit 515. The μ calculation unit 520 includes a coherence calculation unit 521 and a coherence/μ conversion unit 522.

The reference signal buffer 511 temporarily stores the reference signal, which is received from the far-side electronic conference terminal 20 through the network 30 and decoded by the audio CODEC 14, and outputs it to the echo component removal unit 512, the parameter update unit 513, and the list storage unit 515.

The echo component removal unit 512 receives the signal collected by the microphone 16a through the audio I/F 16, removes the echo component from it, and outputs the result as an error signal to the audio CODEC 14, the parameter update unit 513, and the list storage unit 515. Using the adaptive filter parameter 501 (hereinafter simply called the parameter) stored in the parameter storage unit 514, the echo component removal unit 512 estimates the echo component from the input collected sound signal and the reference signal from the reference signal buffer 511 with an adaptive filter that operates in the time domain.

The parameter update unit 513 is the block that performs the learning process for the adaptive filter parameter 501. It receives the update amount μ for the parameter 501 from the μ calculation unit 520 and, according to this update amount μ, updates the parameter 501 used to estimate the echo component. The parameter storage unit 514 stores the parameter 501 updated by the parameter update unit 513 and outputs it to the echo component removal unit 512.

The list storage unit 515 stores the error signal from the echo component removal unit 512 and the reference signal from the reference signal buffer 511 in an error signal list 502 and a reference signal list 503, respectively, accumulating both sequentially over the same fixed time interval. The reference signal from the audio CODEC 14 and the collected sound signal from the audio I/F 16 are output to the echo canceller 15 with matching timing, so the error signal list 502 and the reference signal list 503 in the list storage unit 515 always hold the same number of audio samples.

Meanwhile, the coherence calculation unit 521 of the μ calculation unit 520 calculates the coherence value between the error signal and the reference signal from the error signal list 502 and the reference signal list 503 stored in the list storage unit 515. The coherence/μ conversion unit 522 converts the coherence value calculated by the coherence calculation unit 521 into an update amount μ and designates that update amount μ to the parameter update unit 513.

Next, processing by the echo cancellation processing unit 510 will be described.
The echo component removal processing in the echo cancellation processing unit 510 estimates the echo component using an adaptive filter that processes the audio signal in the time domain, and a conventionally used procedure can be applied as the processing procedure itself. The echo component in the n-th echo cancellation process can be estimated by the following equation (1).
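The image of equation (1) is not reproduced in this text. Judging from the symbol definitions given for it in the description, it is presumably the inner product of the parameter vector and the reference signal vector. The following is a reconstruction under that assumption, not the original figure:

```latex
y(t) = w(n)\,x(n) = \sum_{i=0}^{k-1} w_i \, x_{t-i} \tag{1}
```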

Here, $y(t)$ is the estimated value of the echo component at time $t$, $w$ is the adaptive filter parameter 501, $x$ is the time-domain data of the reference signal, and $k$ is the number of elements in each vector. Further, $w(n)$ is the vector of parameters 501 arranged in the order $\{w_0, w_1, w_2, \ldots, w_{k-1}\}$, $x(n)$ is the vector of time-domain reference signal data arranged in the order $\{x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-(k-1)}\}$, and $w(n)x(n)$ denotes the inner product of the two vectors $w(n)$ and $x(n)$.

Based on the parameter 501 in the parameter storage unit 514 and the reference signal, the echo component removal unit 512 estimates the echo component according to equation (1) above, subtracts that component from the collected sound signal, and outputs the result as the error signal.
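The estimate-and-subtract step can be sketched as follows. This is a minimal illustration assuming equation (1) is the plain inner product described in the text; the function names `reference_vector` and `remove_echo`, the 3-tap echo path, and the sample values are hypothetical and do not come from the patent.

```python
import numpy as np

def reference_vector(ref, t, k):
    """Return x(n) = [x_t, x_{t-1}, ..., x_{t-(k-1)}], zero-padded before the start."""
    return np.array([ref[t - i] if t - i >= 0 else 0.0 for i in range(k)])

def remove_echo(mic_sample, w, x_vec):
    """Estimate the echo as the inner product of w and x, then subtract it."""
    echo_estimate = float(np.dot(w, x_vec))
    error = mic_sample - echo_estimate
    return echo_estimate, error

# Hypothetical case: the filter parameters already match a 3-tap echo path.
w = np.array([0.5, -0.3, 0.2])
ref = np.array([1.0, 2.0, 3.0, 4.0])
t = 3
x_vec = reference_vector(ref, t, k=3)        # newest reference sample first
mic = 0.5 * 4.0 - 0.3 * 3.0 + 0.2 * 2.0      # pure echo, no near-end speech
echo, err = remove_echo(mic, w, x_vec)
```

When the parameters match the echo path and nobody is speaking at the near side, the error signal is essentially zero; any near-end speech would pass through into `err` unchanged.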

Meanwhile, the parameter update unit 513 updates the adaptive filter parameter 501 using a predetermined adaptive algorithm. When the projection method is used as an example of the adaptive algorithm, the n-th update of the parameter 501 is calculated by the following equation (2). Equation (3) is the matrix equation for obtaining a1 and a2 in equation (2), and e denotes the time-domain data of the error signal.

In equation (2) above, increasing the update amount μ raises the learning speed (update speed) of the parameter 501. When double talk is not occurring, the correlation between the echo component in the collected sound signal and the reference signal is high, so increasing the update amount μ to raise the learning speed removes the echo component more reliably and improves the quality of the output audio. When double talk is occurring, however, the sound quality is better if the update amount μ is reduced to slow the learning speed.
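Equations (2) and (3) are not reproduced in this text, so the sketch below does not attempt to restore them. Instead it uses NLMS (normalized least mean squares), which is the projection method of order one, to illustrate how the update amount μ sets the learning speed. The function names, the 3-tap echo path `h`, and the two μ values are assumptions made for illustration only.

```python
import numpy as np

def nlms_update(w, x_vec, e, mu, eps=1e-8):
    """One NLMS step: move w along x_vec, scaled by the update amount mu."""
    return w + mu * e * x_vec / (np.dot(x_vec, x_vec) + eps)

def adapt(mic, ref, k, mu):
    """Run the canceller over a whole signal and return the final parameters."""
    w = np.zeros(k)
    for t in range(len(mic)):
        # x(n) = [x_t, x_{t-1}, ..., x_{t-(k-1)}], zero-padded before the start
        x_vec = np.array([ref[t - i] if t - i >= 0 else 0.0 for i in range(k)])
        e = mic[t] - np.dot(w, x_vec)      # error signal after echo removal
        w = nlms_update(w, x_vec, e, mu)
    return w

# Hypothetical 3-tap echo path, white-noise reference, no near-end speech.
rng = np.random.default_rng(0)
ref = rng.standard_normal(200)
h = np.array([0.5, -0.3, 0.2])
mic = np.convolve(ref, h)[:200]

w_fast = adapt(mic, ref, k=3, mu=0.5)      # large update amount
w_slow = adapt(mic, ref, k=3, mu=0.01)     # small update amount
miss_fast = np.linalg.norm(w_fast - h)     # distance from the true echo path
miss_slow = np.linalg.norm(w_slow - h)
```

In this echo-only case the larger μ brings the parameters much closer to the true echo path within the same 200 samples. With near-end speech mixed into `mic`, the same large μ would let that speech disturb the parameters, which is the motivation for lowering μ during double talk.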

The echo canceller 15 therefore obtains the coherence between the reference signal and the error signal obtained by removing the echo component from the collected sound signal, and adjusts the parameter update amount μ according to that value, improving the quality of the error signal regardless of whether double talk is occurring. When the coherence between the error signal and the reference signal is high, the correlation between the echo component to be removed and the collected sound signal is high, so raising the update amount μ removes the echo component more reliably. Conversely, when the coherence is low, it is likely that the parameter 501 has already converged or that double talk is occurring, and lowering the update amount μ improves the quality of the error signal.
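The relationship between coherence and update amount can be sketched as a monotonic mapping. This passage does not give the actual conversion rule used by the coherence/μ conversion unit 522, so the linear mapping, the averaging of per-frequency coherence into one number, and the bounds `mu_min` and `mu_max` below are all assumptions.

```python
def mean_coherence(coherence_per_bin):
    """Collapse per-frequency coherence values into one number (assumed: plain mean)."""
    values = list(coherence_per_bin)
    return sum(values) / len(values)

def coherence_to_mu(coherence, mu_min=0.01, mu_max=1.0):
    """Map coherence in [0, 1] to an update amount: high coherence gives a large mu."""
    c = min(max(coherence, 0.0), 1.0)   # clamp against numerical overshoot
    return mu_min + (mu_max - mu_min) * c

mu_residual_echo = coherence_to_mu(mean_coherence([0.9, 0.8, 0.95]))  # echo remains
mu_double_talk = coherence_to_mu(mean_coherence([0.1, 0.05, 0.2]))    # low coherence
```

Any monotonically increasing mapping would follow the same reasoning; a thresholded or table-based rule would be an equally valid reading of the text.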

FIG. 3 is a flowchart showing the flow of processing in the echo cancellation processing unit 510.
[Step S101] When the echo cancellation processing unit 510 receives an audio signal, the following processing is executed as interrupt processing.

[Step S102] The echo component removal unit 512 receives the collected sound signal, and receives the reference signal via the reference signal buffer 511.
[Step S103] The echo component removal unit 512 reads the adaptive filter parameter 501 from the parameter storage unit 514 and, based on this parameter 501, estimates the echo component using equation (1) above.

[Step S104] The echo component removal unit 512 subtracts the estimated echo component from the collected sound signal and outputs the error signal.
[Step S105] The error signal output from the echo component removal unit 512 is output to the parameter update unit 513 and the audio CODEC 14 and is also supplied to the list storage unit 515, which stores it in the error signal list 502. The list storage unit 515 also stores the latest reference signal held in the reference signal buffer 511 in the reference signal list 503.

[Step S106] The parameter update unit 513 calculates the updated value of the parameter 501 using equation (2), based on the update amount μ from the μ calculation unit 520, the error signal, the reference signal, and the parameter 501 stored in the parameter storage unit 514, and updates the data stored in the parameter storage unit 514.
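The order of steps S101 to S106 can be sketched as a per-sample handler. This is an illustration of the control flow only: the class name `EchoCancelProcessor`, the single-sample granularity, and the pluggable `update_fn` (standing in for the parameter update of equation (2)) are assumptions, not the patent's implementation.

```python
from collections import deque
import numpy as np

class EchoCancelProcessor:
    """Sketch of the per-sample flow of the echo cancellation processing unit 510."""

    def __init__(self, k, update_fn, mu=0.5):
        self.w = np.zeros(k)                        # parameter storage unit 514
        self.ref_buf = deque([0.0] * k, maxlen=k)   # reference signal buffer 511
        self.error_list = []                        # error signal list 502
        self.ref_list = []                          # reference signal list 503
        self.update_fn = update_fn                  # hypothetical update rule
        self.mu = mu                                # set by the mu calculation unit

    def on_sample(self, mic_sample, ref_sample):    # S101: called per received sample
        self.ref_buf.appendleft(ref_sample)         # S102: newest reference first
        x_vec = np.array(list(self.ref_buf))
        echo = float(np.dot(self.w, x_vec))         # S103: estimate the echo
        error = mic_sample - echo                   # S104: subtract it
        self.error_list.append(error)               # S105: store error and reference
        self.ref_list.append(ref_sample)
        self.w = self.update_fn(self.w, x_vec, error, self.mu)  # S106: update
        return error

# Usage with a do-nothing update rule, just to exercise the flow.
proc = EchoCancelProcessor(k=2, update_fn=lambda w, x, e, mu: w)
first_error = proc.on_sample(1.0, 3.0)
proc.on_sample(2.0, 4.0)
```

Because the parameters start at zero and the placeholder update rule leaves them unchanged, each error sample here simply equals the microphone sample.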

The processing by the μ calculation unit 520, on the other hand, is executed in the time remaining after the echo cancellation processing unit 510 has received an audio signal, removed the echo component from the collected sound signal, and updated the adaptive filter parameter 501, and before the next audio signal is received. This is because the processing of the echo cancellation processing unit 510 must be executed every time an audio signal is received. Therefore, if the processing of the μ calculation unit 520 has not completed by the time the next audio signal is received, the interrupt processing by the echo cancellation processing unit 510 is executed when that signal is received, and the processing of the μ calculation unit 520 resumes after it finishes.

Next, the processing of the μ calculation unit 520 will be described in detail. First, examples of the error signal list 502 and the reference signal list 503 used in the μ calculation process will be described.
FIG. 4 is a diagram illustrating a configuration example of the list storage unit 515.

As shown in FIG. 4, the list storage unit 515 includes error signal lists 502a and 502b and reference signal lists 503a and 503b, each pair being one list divided into two storage areas, and a list management unit 516 for managing these storage areas. Error signals from the echo component removal unit 512 are accumulated sequentially in the error signal lists 502a and 502b so that when one is full, the other begins to be filled. Reference signals are likewise accumulated sequentially so as to fill the reference signal lists 503a and 503b alternately.

The error signal lists 502a and 502b and the reference signal lists 503a and 503b all store the same number of audio samples, and that number is the number of points of the DFT (Discrete Fourier Transform) performed by the μ calculation unit 520 described later.

The list management unit 516 holds a list selection flag FL1 and a μ calculation permission flag FL2. The list selection flag FL1 indicates which of the error signal lists 502a and 502b is currently receiving the error signal from the echo component removal unit 512. For example, it is set to "1" while data is being accumulated in the error signal list 502a, and is inverted to "0" when that list is full and accumulation into the error signal list 502b begins. In that case, "1" also indicates that data is being accumulated in the reference signal list 503a, and "0" that data is being accumulated in the reference signal list 503b.

The μ calculation permission flag FL2 is set to "1" when either of the error signal lists 502a and 502b (or either of the reference signal lists 503a and 503b) has accumulated enough new audio samples to fill it. When the coherence calculation unit 521 of the μ calculation unit 520 reads out that audio data, the flag is returned to "0". The calculation of the update amount μ by the μ calculation unit 520 is assumed to complete within the time it takes to fill the pair of lists on which accumulation has newly started, that is, either the error signal list 502a and the reference signal list 503a, or the error signal list 502b and the reference signal list 503b.

To calculate the coherence, the coherence calculation unit 521 of the μ calculation unit 520 reads either the error signal list 502a and the reference signal list 503a, or the error signal list 502b and the reference signal list 503b. It can determine which pair to read by referring to the list selection flag FL1: it reads the audio data from the pair other than the one that FL1 indicates is currently selected. By referring to the μ calculation permission flag FL2, it can also determine whether new audio data is available to be read.
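The two-area lists and the flags FL1 and FL2 behave like a double (ping-pong) buffer, which can be sketched as below. The class name `ListStorage`, its method names, and the use of Python lists are hypothetical; only the flag semantics follow the description.

```python
class ListStorage:
    """Sketch of list storage unit 515 with flags FL1 and FL2."""

    def __init__(self, n_points):
        self.n = n_points                        # samples per list = DFT points
        self.error_lists = {"a": [], "b": []}    # 502a / 502b
        self.ref_lists = {"a": [], "b": []}      # 503a / 503b
        self.fl1 = 1   # list selection flag: 1 = filling "a", 0 = filling "b"
        self.fl2 = 0   # mu calculation permission flag

    def _active(self):
        return "a" if self.fl1 == 1 else "b"

    def store(self, error, ref):
        """Append one error/reference sample pair to the currently selected lists."""
        key = self._active()
        self.error_lists[key].append(error)
        self.ref_lists[key].append(ref)
        if len(self.error_lists[key]) == self.n:
            self.fl1 ^= 1                        # switch to the other pair
            other = self._active()
            self.error_lists[other] = []         # start refilling it from empty
            self.ref_lists[other] = []
            self.fl2 = 1                         # a full pair is ready to read

    def read_completed(self):
        """Return the full pair that is NOT being filled, or None if none is ready."""
        if self.fl2 != 1:
            return None
        key = "b" if self.fl1 == 1 else "a"
        self.fl2 = 0
        return list(self.error_lists[key]), list(self.ref_lists[key])

storage = ListStorage(n_points=2)
storage.store(1.0, 10.0)
storage.store(2.0, 20.0)              # lists "a" are now full
first = storage.read_completed()      # reads pair "a" while "b" is being filled
second = storage.read_completed()     # nothing new yet
```

The reader never touches the pair being written, which is why the description requires the μ calculation to finish before the other pair fills up.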

FIG. 5 is a flowchart showing the flow of processing in the μ calculation unit 520. If the echo cancellation processing unit 510 receives the next audio signal while the series of processes in FIG. 5 is being executed, the processing of the echo cancellation processing unit 510 is executed as an interrupt, and the remainder of the processing in FIG. 5 is executed after it finishes.

[Step S201] The coherence calculation unit 521 refers to the μ calculation permission flag FL2 of the list management unit 516 and executes the processing of step S202 when its value becomes "1".

〔ステップＳ２０２〕コヒーレンス算出部５２１は、リスト選択フラグＦＬ１に基づき、所定サンプル数の音声データが新たに蓄積されたエラー信号リスト５０２ａまたは５０２ｂと、参照信号リスト５０３ａまたは５０３ｂからそれぞれ音声データを読み込む。 [Step S202] Based on the list selection flag FL1, the coherence calculation unit 521 reads audio data from the error signal list 502a or 502b in which audio data of a predetermined number of samples is newly accumulated and the reference signal list 503a or 503b, respectively.

[Step S203] The coherence calculation unit 521 converts the error signal and the reference signal read from the lists into frequency-domain values by DFT and calculates the coherence. Specifically, based on the DFT result, it calculates, for each frequency component f, the power spectra W_xx(f) and W_yy(f) of the error signal and the reference signal, and the cross spectrum W_xy(f) of the two signals. It then calculates the coherence C(f) corresponding to the frequency f using the following equation (4).

Equation (4) uses the averages of the error signal and reference signal spectra over the past M frames (M is a natural number), including the latest W_xx(f), W_yy(f), and W_xy(f). The smaller M is, the more quickly the calculation responds to changes in the sound pickup conditions, but the less stable the calculated coherence becomes. Conversely, a larger M improves the stability of the coherence but slows the response, so M is preferably chosen to balance responsiveness against stability.
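Equation (4) itself does not appear in this text. A form consistent with the description, assuming the usual magnitude-squared coherence with averaging over M frames (the frame index m and the overbar notation are introduced here for illustration only), would be:

```latex
C(f) = \frac{\left|\overline{W_{xy}}(f)\right|^{2}}
            {\overline{W_{xx}}(f)\,\overline{W_{yy}}(f)},
\qquad
\overline{W_{\bullet}}(f) = \frac{1}{M}\sum_{m=0}^{M-1} W_{\bullet}^{(m)}(f)
```

Under this assumed form, C(f) lies between 0 and 1, which matches the 0 to 1 range of the average coherence value used in the conversion graphs.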

なお、このステップＳ２０３では、ＤＦＴの代わりにＦＦＴ（Fast Fourier Transform）などの各種フーリエ変換により、エラー信号および参照信号のスペクトルを算出してもよい。 In step S203, the spectrum of the error signal and the reference signal may be calculated by various Fourier transforms such as FFT (Fast Fourier Transform) instead of DFT.

〔ステップＳ２０４〕コヒーレンス算出部５２１は、コヒーレンスの値を基に平均コヒーレンス値Ｃ＿ａｖｇを算出する。この平均コヒーレンス値Ｃ＿ａｖｇは、ステップＳ２０３で算出した周波数ｆごとのコヒーレンスＣ（ｆ）をすべて加算し、その加算値を、ＤＦＴにより算出された周波数成分の数で除算することで算出する。 [Step S204] The coherence calculator 521 calculates an average coherence value C_avg based on the coherence value. The average coherence value C_avg is calculated by adding all the coherence C (f) for each frequency f calculated in step S203 and dividing the added value by the number of frequency components calculated by DFT.
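Steps S203 and S204 can be sketched as follows. This is an illustration under assumptions, not the specification's implementation: it uses a naive DFT, assumes the magnitude-squared form of coherence with M-frame averaging, and adds a small constant `eps` to avoid division by zero; all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def coherence(err_frames, ref_frames, eps=1e-12):
    """Per-frequency coherence from M error frames and M reference frames."""
    m = len(err_frames)
    n = len(err_frames[0])
    wxx = [0.0] * n   # averaged power spectrum of the error signal
    wyy = [0.0] * n   # averaged power spectrum of the reference signal
    wxy = [0j] * n    # averaged cross spectrum
    for e, r in zip(err_frames, ref_frames):
        big_e, big_r = dft(e), dft(r)
        for k in range(n):
            wxx[k] += abs(big_e[k]) ** 2 / m
            wyy[k] += abs(big_r[k]) ** 2 / m
            wxy[k] += big_e[k] * big_r[k].conjugate() / m
    return [abs(wxy[k]) ** 2 / (wxx[k] * wyy[k] + eps) for k in range(n)]


def average_coherence(coh):
    """Step S204: sum C(f) over all components and divide by their number."""
    return sum(coh) / len(coh)
```

With M = 1 the coherence of any two nonzero spectra is 1 at every frequency, which is one way to see why averaging over several frames is needed for the value to be informative.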

〔ステップＳ２０５〕コヒーレンス／μ変換部５２２は、算出された平均コヒーレンス値Ｃ＿ａｖｇを、パラメータ５０１の更新量μの値に変換する。上述したように、エラー信号と参照信号との相関が高いほど、すなわち平均コヒーレンス値Ｃ＿ａｖｇが高いほど、更新量μが高くなるように設定することで、エラー信号の品質を向上させることができる。 [Step S205] The coherence / μ conversion unit 522 converts the calculated average coherence value C_avg into a value of the update amount μ of the parameter 501. As described above, the error signal quality can be improved by setting the update amount μ to be higher as the correlation between the error signal and the reference signal is higher, that is, as the average coherence value C_avg is higher.

The coherence/μ conversion unit 522 may calculate the update amount μ by, for example, multiplying the average coherence value C_avg by a constant. With that alone, however, the parameter 501 converges slowly overall, and it takes a long time from the start of processing until the parameter 501 has converged far enough for the error signal quality to become good. For this reason, the conversion is performed using, for example, a conversion graph such as the one shown in FIG. 6.

〔ステップＳ２０６〕コヒーレンス／μ変換部５２２は、変換した更新量μを、パラメータ更新部５１３に設定する。この後、μ算出部５２０は、例えばユーザの操作入力などに応じて処理が終了されるまで、図５の処理を繰り返す。 [Step S206] The coherence / μ conversion unit 522 sets the converted update amount μ in the parameter update unit 513. Thereafter, the μ calculation unit 520 repeats the process of FIG. 5 until the process is terminated in accordance with, for example, a user operation input.

FIG. 6 is a diagram illustrating an example of a conversion graph used in the coherence/μ conversion process.
In this conversion graph, the conversion keeps the minimum value of the update amount μ greater than 0, which speeds up the convergence of the adaptive filter parameter 501 so that the sound quality improvement is obtained in a shorter time. In the example of FIG. 6, the update amount μ is held at the minimum value c1 while the average coherence value C_avg is between 0 and a1, increases at a constant rate while C_avg is between a1 and b1, and is held at the maximum value d1 while C_avg is between b1 and 1.
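The shape of the FIG. 6 graph corresponds to a clamped linear ramp. The sketch below uses the symbols a1, b1, c1, and d1 from the text; the function name and the numeric values in the usage note are hypothetical.

```python
def coherence_to_mu(c_avg, a1, b1, c1, d1):
    """Map an average coherence value in [0, 1] to an update amount mu.

    mu is c1 up to a1, rises linearly between a1 and b1, and is d1 from b1 on.
    """
    if c_avg <= a1:
        return c1
    if c_avg >= b1:
        return d1
    return c1 + (d1 - c1) * (c_avg - a1) / (b1 - a1)
```

For instance, with a1 = 0.2, b1 = 0.8, c1 = 0.01, and d1 = 0.5 (illustrative values only), an average coherence of 0.5 falls halfway along the ramp.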

As described above, the echo canceller 15 of the present embodiment obtains the coherence from the lists in which the most recent error signal and reference signal are accumulated and, based on that coherence, changes the update amount μ of the adaptive filter parameter 501 according to how strongly the error signal and the reference signal are correlated. This allows the echo component to be removed from the sound pickup signal more reliably, regardless of whether double talk occurs. In addition, the echo component is estimated by an adaptive filter that processes the audio signal in the time domain, while the frequency-domain processing of the μ calculation unit 520 is performed in the gaps between echo estimation operations, at a period equal to or longer than the estimation interval, so the processing is efficient.

[Other configuration examples of list storage unit]
FIG. 7 is a diagram illustrating another configuration example of the list storage unit.
The list storage unit 515a shown in FIG. 7 is provided in place of the list storage unit 515 shown in FIG. 4, and includes ring buffers 517 and 518 that store the error signal list 502 and the reference signal list 503, respectively. Each of the ring buffers 517 and 518 has a capacity of twice the number of audio data samples that the coherence calculation unit 521 needs to perform the DFT. All of the ring buffers 517 and 518 are initialized with data "0" when the echo canceller 15 starts operating.

The list storage unit 515a further includes a list management unit 518. The list management unit 518 includes a ring counter 519, which is a counter indicating the readable position in the ring buffers 517 and 518, and holds a μ calculation permission flag FL3. The ring counter 519 updates its count value each time a further 1/4 of the storage area of each of the ring buffers 517 and 518 is filled with audio data. When the count value is updated, the μ calculation permission flag FL3 is set to "1", and when the coherence calculation unit 521 has read the audio data from the corresponding read position, the flag FL3 is returned to "0".

Here, it is assumed that the calculation of the update amount μ by the μ calculation unit 520 can be completed within the time it takes for audio data amounting to 1/4 of the capacity of each of the ring buffers 517 and 518 to be input. When the μ calculation permission flag FL3 becomes "1", the coherence calculation unit 521 reads, from the positions in the ring buffers 517 and 518 given by the count value of the ring counter 519, an amount of error signal and reference signal equal to 1/2 of each buffer's capacity, and performs the coherence calculation.

For example, suppose the storage area of the ring buffer 517 consists of four equal areas 517a to 517d in order of data accumulation. When the error signal has been filled up to the area 517b, the error signals stored in the areas 517a and 517b are read by the coherence calculation unit 521. Next, when the error signal has been filled up to the area 517c, the error signals in the areas 517b and 517c are read.
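The overlapping read pattern can be sketched with a single ring buffer. The class `RingList`, its field names, and the choice to make the ring counter hold the index of the most recently completed quarter are assumptions made for illustration; the specification does not state how the count value encodes the read position.

```python
class RingList:
    """Ring buffer of 4 quarters; each read returns the 2 most recent quarters."""

    def __init__(self, dft_size):
        assert dft_size % 2 == 0
        self.quarter = dft_size // 2            # one quarter of the buffer
        self.buf = [0.0] * (4 * self.quarter)   # capacity = 2 x DFT size, zero-filled
        self.write = 0                          # next write position
        self.counter = 0                        # ring counter: last completed quarter (0..3)
        self.fl3 = 0                            # mu calculation permission flag

    def push(self, sample):
        """Store one sample; raise FL3 whenever a quarter has just been completed."""
        self.buf[self.write] = sample
        self.write = (self.write + 1) % len(self.buf)
        if self.write % self.quarter == 0:
            self.counter = (self.write // self.quarter - 1) % 4
            self.fl3 = 1

    def read(self):
        """Return half the buffer ending at the last completed quarter, or None."""
        if not self.fl3:
            return None
        self.fl3 = 0
        end = (self.counter + 1) * self.quarter
        start = end - 2 * self.quarter
        n = len(self.buf)
        return [self.buf[i % n] for i in range(start, end)]
```

Because the buffer starts zero-filled, the very first read in this sketch returns a block of zeros followed by the first quarter of real data.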

With this operation, the output period of the update amount μ is halved compared with, for example, the list storage unit 515 shown in FIG. 4, so the adaptive filter parameter 501 can be updated more appropriately. Moreover, this is achieved without increasing the buffer capacity for the error signal list 502 and the reference signal list 503 and without changing the number of audio data samples used for the coherence calculation, and the calculation always uses the latest audio data, which further improves the quality of the error signal.

なお、この図７の例では、各リングバッファ５１７および５１８の記憶領域の１／４ずつ読み出すようにしたが、これに限らずコヒーレンスの計算を行うタイミングは自在に変更することが可能である。 In the example of FIG. 7, 1/4 of the storage areas of the ring buffers 517 and 518 are read. However, the present invention is not limited to this, and the timing for calculating the coherence can be freely changed.

[Other processing examples of coherence calculation]
When the coherence calculation unit 521 obtains the average coherence value C_avg, using more coherence values C(f), that is, values covering a longer period, makes C_avg more stable. On the other hand, the computational load increases, and so does the amount of internal memory required.

Therefore, the average coherence value C_avg may be obtained using only some of the frequency components produced by the DFT. For example, every other frequency component produced by the DFT is discarded, the coherence C(f) is obtained for the remaining components, and their average C_avg is calculated. This allows the calculation to cover audio data from a longer period with the same amount of memory and processing capability. The frequency components used for the calculation may also be chosen in other ways, for example by selecting a predetermined number in ascending order of frequency, or by selecting them at random.

さらに、平均コヒーレンス値Ｃ＿ａｖｇの演算時に、周波数成分ごとに重み付けを行ってもよい。例えば、マイクロフォンやスピーカ、ＣＯＤＥＣの特性などに応じて重み付けの割合を設定する。あるいは、電子会議装置を設置する部屋の大きさや、マイクロフォン、スピーカの数などに応じて、重み付けの割合を設定変更可能にしてもよい。これにより、定常ノイズを含む周波数成分をあまり考慮しないようにするなど、状況に応じて更新量μのとる値を変更してエラー信号の品質低下を抑制することが可能となる。 Furthermore, weighting may be performed for each frequency component when calculating the average coherence value C_avg. For example, the weighting ratio is set according to the characteristics of the microphone, speaker, CODEC, and the like. Alternatively, the weighting ratio may be set and changed according to the size of the room in which the electronic conference apparatus is installed, the number of microphones and speakers, and the like. As a result, it is possible to change the value taken by the update amount μ according to the situation, for example, so as not to take into account frequency components including stationary noise so much, and to suppress degradation of the quality of the error signal.
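Both variations, thinning out frequency components and weighting them, can be sketched in one helper. The function name, the `step` parameter, and the choice to normalize by the sum of the weights are assumptions for illustration; the specification does not say how the weighted value is normalized.

```python
def average_coherence(coh, weights=None, step=1):
    """Average per-frequency coherence values.

    step > 1 keeps only every step-th frequency component (thinning).
    weights, if given, holds one non-negative weight per frequency component.
    """
    picked = coh[::step]
    if weights is None:
        return sum(picked) / len(picked)
    w = weights[::step]
    return sum(c * x for c, x in zip(picked, w)) / sum(w)
```

Setting a weight to 0 removes that component from the average entirely, which is one way to discount a band dominated by stationary noise.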

On the other hand, when the number of audio data samples (the sample period) used for the coherence calculation is reduced in order to limit the internal memory of the μ calculation unit 520 or the storage capacity of the list storage unit 515, a time constant D (where 0 < D < 1) may be used in calculating the average coherence value C_avg to improve the stability of the result. In this case, the average coherence value C_avg(N) calculated in the N-th turn is given by, for example, the following equation (5), where C_small_ave(N) is the value obtained in the N-th turn by adding the coherence C(f) calculated for every frequency component and dividing the sum by the number of frequency components.
C_avg(N) = {D × C_avg(N−1)} + {(1−D) × C_small_ave(N)} ……(5)
This calculation stabilizes the average coherence value C_avg even when the number of audio data samples is small, which improves the quality of the error signal. A larger time constant D makes the result more stable but reduces its ability to follow changes in the components of the audio signal, so D is preferably set with this trade-off in mind.
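Equation (5) is a first-order recursive average. A minimal sketch, with a hypothetical function name and an assumed initial value of 0 for C_avg in the usage note:

```python
def smooth_average(prev_c_avg, c_small_ave, d):
    """Equation (5): blend the previous C_avg with the current short-term average."""
    assert 0.0 < d < 1.0
    return d * prev_c_avg + (1.0 - d) * c_small_ave
```

Starting from 0 with D = 0.5 and a constant short-term average of 1, successive turns give 0.5, 0.75, and 0.875, showing how the smoothed value approaches the input gradually rather than jumping to it.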

[Other processing examples of average coherence/μ conversion]
The conversion graph used by the coherence/μ conversion unit 522 to convert the average coherence value C_avg into the update amount μ is not limited to the one shown in FIG. 6; various patterns can be used, as described below.

FIGS. 8 to 10 are diagrams illustrating other examples of conversion graphs used in the coherence/μ conversion process.
In FIG. 8A, the update amount μ increases at a constant rate from the minimum value a2 to the maximum value b2 as the average coherence value C_avg rises from 0 to 1. In FIG. 8B, the conversion is simplified: the update amount μ is the minimum value b3 when C_avg is smaller than the threshold value a3, and the maximum value c3 when C_avg is equal to or greater than a3.

In FIG. 9A, the rate of increase of the update amount μ (which is always 0 or more) changes at the boundaries where the average coherence value C_avg equals a4, b4, c4, and d4, allowing finer control of the update amount. In FIG. 9B, the update amount μ rises along a smooth curve from the minimum value a5 to the maximum value b5 as the average coherence value C_avg increases.

さらに図１０のように、平均コヒーレンス値Ｃ＿ａｖｇと更新量μとの対応を示す２つの曲線５２２ａおよび５２２ｂを用意し、状況に応じて使い分けるようにしてもよい。例えば、平均コヒーレンス値Ｃ＿ａｖｇの履歴を保持しておき、最近の一定期間に算出された平均コヒーレンス値Ｃ＿ａｖｇが高い傾向にある場合（例えばそれらの平均値がしきい値以上の場合）には、エラー信号と参照信号との関連度が高く、エラー信号中にエコー成分が多く残存していると考えられるので、曲線５２２ａを用いて変換し、適応フィルタのパラメータ５０１の収束速度を高めるようにする。逆に最近の平均コヒーレンス値Ｃ＿ａｖｇが低い傾向にある場合には、曲線５２２ｂを用いて変換することで、パラメータ５０１を安定化して音質劣化を防止するなどといった使い分けを行う。また、使用する変換グラフを３つ以上用意して、より細かい条件に応じて使い分けるようにしてもよい。 Furthermore, as shown in FIG. 10, two curves 522a and 522b indicating the correspondence between the average coherence value C_avg and the update amount μ may be prepared and used depending on the situation. For example, when the history of the average coherence value C_avg is held and the average coherence value C_avg calculated in a recent fixed period tends to be high (for example, when the average value is equal to or greater than a threshold value), an error occurs. Since the degree of relevance between the signal and the reference signal is high and it is considered that many echo components remain in the error signal, conversion is performed using the curve 522a to increase the convergence speed of the parameter 501 of the adaptive filter. Conversely, when the recent average coherence value C_avg tends to be low, the conversion is performed using the curve 522b to stabilize the parameter 501 and prevent deterioration in sound quality. Also, three or more conversion graphs to be used may be prepared and used according to more detailed conditions.

さらに、平均コヒーレンス値Ｃ＿ａｖｇがあるしきい値より低くなったときには、適応フィルタのパラメータ５０１の更新を停止させてもよい。更新を停止させるためには、更新量μを０にする手法、コヒーレンス／μ変換部５２２が更新停止信号を出力し、パラメータ更新部５１３がこの更新停止信号を受信した場合には動作を停止する手法などが適用できる。さらに、更新停止時あるいは更新停止からの復帰時においては、例えば以下の図１１あるいは図１２で用いられる条件に基づいて処理を実行してもよい。 Further, when the average coherence value C_avg becomes lower than a certain threshold value, the update of the adaptive filter parameter 501 may be stopped. In order to stop the update, a method of setting the update amount μ to 0, the coherence / μ conversion unit 522 outputs an update stop signal, and when the parameter update unit 513 receives this update stop signal, the operation is stopped. Techniques can be applied. Further, at the time of stopping the update or at the time of returning from the update stop, for example, the process may be executed based on the conditions used in FIG. 11 or FIG.

なお、上記の各例ではいずれも、平均コヒーレンス値Ｃ＿ａｖｇの増加に伴う更新量μの変化量が０以上となっている。しかし通常、平均コヒーレンス値Ｃ＿ａｖｇが１に近すぎる場合には不正確な検出が行われている可能性があるため、例えば平均コヒーレンス値Ｃ＿ａｖｇがあるしきい値を超えたときには、更新量μを低下させるような変換グラフを用いてもよい。 In each of the above examples, the change amount of the update amount μ accompanying the increase in the average coherence value C_avg is 0 or more. However, in general, when the average coherence value C_avg is too close to 1, there is a possibility that inaccurate detection is performed. For example, when the average coherence value C_avg exceeds a certain threshold value, the update amount μ is decreased. Such a conversion graph may be used.

図１１は、平均コヒーレンス値Ｃ＿ａｖｇを更新量μに変換する他の処理例（その１）を示すフローチャートである。なお、この図１１の処理は、コヒーレンス／μ変換部５２２によって実行される図５のステップＳ２０５の処理に対応する。 FIG. 11 is a flowchart showing another processing example (part 1) for converting the average coherence value C_avg into the update amount μ. The process of FIG. 11 corresponds to the process of step S205 of FIG. 5 executed by the coherence / μ conversion unit 522.

[Step S301] It is determined whether the calculated average coherence value C_avg is less than the threshold value Th1. If it is less than Th1, the process proceeds to step S302; if it is equal to or greater than Th1, the process proceeds to step S305.

[Step S302] "1" is added to the variable p.
[Step S303] It is determined whether the variable p is equal to or greater than the threshold value Th2. If it is, the process proceeds to step S304; if it is less than Th2, the process proceeds to step S307.

[Step S304] The update stop signal for the parameter update unit 513 is set to H level.
[Step S305] The variable p is initialized to "0".

[Step S306] The update stop signal is set to L level.
[Step S307] Based on a predetermined conversion graph, the average coherence value C_avg is converted into the update amount μ, which is set in the parameter update unit 513.

以上の処理では、平均コヒーレンス値Ｃ＿ａｖｇが、しきい値Ｔｈ２の示す回数以上連続してしきい値Ｔｈ１より小さくなった場合には、パラメータ更新部５１３におけるパラメータ５０１の更新処理を停止させる（ステップＳ３０１〜Ｓ３０２）。そしてその後、平均コヒーレンス値Ｃ＿ａｖｇがしきい値Ｔｈ１以上となったときに、変換グラフに基づいて変換した更新量μをパラメータ更新部５１３に設定して、更新処理を再開させる。従って、平均コヒーレンス値Ｃ＿ａｖｇが低い傾向にある場合には、適応フィルタのパラメータ５０１を安定化して音質劣化を防止することができる。 In the above process, when the average coherence value C_avg is continuously smaller than the threshold value Th1 for the number of times indicated by the threshold value Th2, the parameter update unit 513 stops the update process of the parameter 501 (step S301). ~ S302). After that, when the average coherence value C_avg is equal to or greater than the threshold value Th1, the update amount μ converted based on the conversion graph is set in the parameter update unit 513, and the update process is resumed. Therefore, when the average coherence value C_avg tends to be low, the adaptive filter parameter 501 can be stabilized to prevent deterioration in sound quality.
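The flow of FIG. 11 can be sketched as a small state machine. The class name, the `convert` callback standing in for the conversion graph, and the convention of returning `None` when no update amount is set are all hypothetical; the step numbers in the comments follow the flowchart described above.

```python
class StopAfterLowRun:
    """FIG. 11 sketch: stop updating after Th2 consecutive values below Th1."""

    def __init__(self, th1, th2, convert):
        self.th1 = th1
        self.th2 = th2
        self.convert = convert   # stands in for the conversion graph (assumed)
        self.p = 0               # count of consecutive low values
        self.stop = False        # update stop signal: True = H level

    def step(self, c_avg):
        """Process one C_avg; return the mu to set, or None while stopped."""
        if c_avg < self.th1:            # S301
            self.p += 1                 # S302
            if self.p >= self.th2:      # S303
                self.stop = True        # S304
                return None
        else:
            self.p = 0                  # S305
            self.stop = False           # S306
        return self.convert(c_avg)      # S307
```

A single low value therefore does not stop the update; only a run of Th2 low values does, and the first value at or above Th1 resumes it.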

図１２は、平均コヒーレンス値Ｃ＿ａｖｇを更新量μに変換する他の処理例（その２）を示すフローチャートである。なお、この図１２の処理も同様に、コヒーレンス／μ変換部５２２によって実行される図５のステップＳ２０５の処理に対応する。 FIG. 12 is a flowchart showing another processing example (part 2) for converting the average coherence value C_avg into the update amount μ. The processing in FIG. 12 also corresponds to the processing in step S205 in FIG. 5 executed by the coherence / μ conversion unit 522.

[Step S401] It is determined whether the calculated average coherence value C_avg is less than the threshold value Th3. If it is less than Th3, the process proceeds to step S402; if it is equal to or greater than Th3, the process proceeds to step S404.

[Step S402] The variable q is initialized to "0".
[Step S403] The update stop signal is set to H level.
[Step S404] "1" is added to the variable q.

〔ステップＳ４０５〕変数ｑがしきい値Ｔｈ４以上であるか否かを判定する。しきい値Ｔｈ４以上である場合はステップＳ４０６へ、しきい値Ｔｈ４未満である場合はステップＳ４０３に進む。 [Step S405] It is determined whether or not the variable q is greater than or equal to a threshold value Th4. If it is greater than or equal to the threshold value Th4, the process proceeds to step S406. If it is less than the threshold value Th4, the process proceeds to step S403.

[Step S406] The update stop signal is set to L level.
[Step S407] Based on a predetermined conversion graph, the average coherence value C_avg is converted into the update amount μ, which is set in the parameter update unit 513.

以上の処理では、平均コヒーレンス値Ｃ＿ａｖｇが１回でもしきい値Ｔｈ３未満になれば、パラメータ更新部５１３の更新処理が停止される。そしてその後は、平均コヒーレンス値Ｃ＿ａｖｇが、しきい値Ｔｈ４の示す回数以上連続してしきい値Ｔｈ３以上になるまで、パラメータ５０１の更新処理が再開されない。従って、平均コヒーレンス値Ｃ＿ａｖｇがほぼしきい値Ｔｈ３以下で変動している場合に、適応フィルタのパラメータ５０１を安定化し、音質の変化が不自然にならないようにすることができる。 In the above process, if the average coherence value C_avg is less than the threshold value Th3 even once, the update process of the parameter update unit 513 is stopped. Thereafter, the update process of the parameter 501 is not resumed until the average coherence value C_avg is continuously equal to or greater than the threshold value Th3 for the number of times indicated by the threshold value Th4. Therefore, when the average coherence value C_avg fluctuates substantially below the threshold value Th3, the adaptive filter parameter 501 can be stabilized so that the change in sound quality does not become unnatural.
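The flow of FIG. 12 mirrors FIG. 11 with the asymmetry reversed: stopping is immediate and resuming is delayed. In the sketch below the class name, the `convert` callback, the `None` return convention, and the initial value q = 0 are assumptions; the step numbers in the comments follow the flowchart described above.

```python
class ResumeAfterHighRun:
    """FIG. 12 sketch: stop on one low value, resume after Th4 consecutive high values."""

    def __init__(self, th3, th4, convert):
        self.th3 = th3
        self.th4 = th4
        self.convert = convert   # stands in for the conversion graph (assumed)
        self.q = 0               # count of consecutive values >= Th3 (initial value assumed)
        self.stop = False        # update stop signal: True = H level

    def step(self, c_avg):
        """Process one C_avg; return the mu to set, or None while stopped."""
        if c_avg < self.th3:                  # S401
            self.q = 0                        # S402
        else:
            self.q += 1                       # S404
            if self.q >= self.th4:            # S405
                self.stop = False             # S406
                return self.convert(c_avg)    # S407
        self.stop = True                      # S403
        return None
```

Compared with the FIG. 11 sketch, this variant favors stability: an average coherence that hovers around Th3 keeps the parameters frozen instead of toggling the update on and off.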

In the above embodiment, the case where the present invention is applied to an electronic conference terminal has been described. However, the invention is not limited to this; an echo canceller to which the present invention is applied can also be mounted in, for example, a microphone used in the electronic conference system, or in a terminal that transmits and receives audio bidirectionally, such as a telephone. It can also be applied to both a microphone and a terminal for audio transmission and reception, such as an electronic conference terminal.

また、上記の処理機能は、コンピュータによって実現することができる。その場合、上記実施の形態で示したエコーキャンセラが有すべき機能の処理内容を記述したプログラムが提供され、そのプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等がある。磁気記録装置には、ハードディスク装置（ＨＤＤ）、フレキシブルディスク（ＦＤ）、磁気テープなどがある。光ディスクには、ＤＶＤ（Digital Versatile Disk）、ＤＶＤ−ＲＡＭ（Random Access Memory）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ−Ｒ（Recordable）／ＲＷ（ReWritable）などがある。光磁気記録媒体には、ＭＯ（Magneto-Optical disk）などがある。 Further, the above processing functions can be realized by a computer. In that case, a program describing the processing contents of the functions that the echo canceller described in the above embodiment should have is provided, and the processing functions are realized on the computer by executing the program on the computer. The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of the computer-readable recording medium include a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory. Examples of the magnetic recording device include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape. Examples of the optical disk include a DVD (Digital Versatile Disk), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable) / RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

プログラムを流通させる場合には、例えば、そのプログラムが記録されたＤＶＤ、ＣＤ−ＲＯＭなどの可搬型記録媒体が販売される。また、プログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することもできる。 When distributing the program, for example, a portable recording medium such as a DVD or a CD-ROM in which the program is recorded is sold. It is also possible to store the program in a storage device of a server computer and transfer the program from the server computer to another computer via a network.

プログラムを実行するコンピュータは、例えば、可搬型記録媒体に記録されたプログラムまたはサーバコンピュータから転送されたプログラムを、自己の記憶装置に格納する。そして、コンピュータは、自己の記憶装置からプログラムを読み取り、プログラムに従った処理を実行する。なお、コンピュータは、可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することもできる。また、コンピュータは、サーバコンピュータからプログラムが転送されるごとに、逐次、受け取ったプログラムに従った処理を実行することもできる。 The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. Then, the computer reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to the program. Further, each time the program is transferred from the server computer, the computer can sequentially execute processing according to the received program.

FIG. 1 is a diagram showing a configuration example of the electronic conference system according to the embodiment.
FIG. 2 is a diagram showing an example of the internal configuration of the echo canceller.
FIG. 3 is a flowchart showing the flow of processing in the echo cancellation processing unit.
FIG. 4 is a diagram showing a configuration example of the list storage unit.
FIG. 5 is a flowchart showing the flow of processing in the μ calculation unit.
FIG. 6 is a diagram showing an example of a conversion graph used in the coherence/μ conversion process.
FIG. 7 is a diagram showing another configuration example of the list storage unit.
FIG. 8 is a diagram showing another example (part 1) of a conversion graph used in the coherence/μ conversion process.
FIG. 9 is a diagram showing another example (part 2) of a conversion graph used in the coherence/μ conversion process.
FIG. 10 is a diagram showing another example (part 3) of a conversion graph used in the coherence/μ conversion process.
FIG. 11 is a flowchart showing another processing example (part 1) for converting the average coherence value into the update amount.
FIG. 12 is a flowchart showing another processing example (part 2) for converting the average coherence value into the update amount.

Explanation of symbols

10 ... electronic conference terminal, 11 ... network I/F, 12 ... image CODEC, 13 ... image I/F, 13a ... camera, 13b ... monitor, 14 ... audio CODEC, 15 ... echo canceller, 16 ... audio I/F, 16a ... microphone, 16b ... speaker, 20 ... electronic conference terminal, 30 ... network, 501 ... adaptive filter parameter, 502 ... error signal list, 503 ... reference signal list, 510 ... echo cancellation processing unit, 511 ... reference signal buffer, 512 ... echo component removal unit, 513 ... parameter update unit, 514 ... parameter storage unit, 515 ... list storage unit, 520 ... μ calculation unit, 521 ... coherence calculation unit, 522 ... coherence/μ conversion unit

Claims

An echo removing apparatus that removes, from a sound pickup signal, an echo component generated by reproduced and output sound wrapping around into a sound pickup unit, the echo removing apparatus comprising:
echo component removal means for estimating the echo component with a time-domain adaptive filter from the sound pickup signal and a reference signal corresponding to the sound to be reproduced and output, and removing the echo component from the sound pickup signal;
parameter updating means for updating parameters of the adaptive filter; and
update amount designation means for designating the update amount of the parameters by the parameter updating means according to the coherence between the reference signal and an error signal obtained by removing the echo component from the sound pickup signal.

The echo removing apparatus according to claim 1, wherein the update amount designation means includes:
average value calculating means for calculating an average coherence value by summing, across the frequency components, the coherence calculated for each frequency component based on the collected sound signal and the reference signal input over a certain period, and dividing the sum by the number of the frequency components; and
data conversion means for converting the average coherence value into the update amount of the parameters.
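The averaging recited above (sum the per-frequency coherence, then divide by the number of frequency components) can be sketched as follows. The claim does not say how the per-frequency coherence is estimated; this sketch assumes magnitude-squared coherence accumulated over non-overlapping frames of the input period, and the frame length of 16 samples and the function names are hypothetical.

```python
import cmath


def dft(frame):
    """Plain DFT of a real frame, returning bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def coherence_per_bin(x, y, frame_len):
    """Magnitude-squared coherence per frequency bin, accumulated over frames."""
    num_bins = frame_len // 2 + 1
    sxx = [0.0] * num_bins  # auto power spectrum of x
    syy = [0.0] * num_bins  # auto power spectrum of y
    sxy = [0j] * num_bins   # cross power spectrum of x and y
    for start in range(0, min(len(x), len(y)) - frame_len + 1, frame_len):
        fx = dft(x[start:start + frame_len])
        fy = dft(y[start:start + frame_len])
        for k in range(num_bins):
            sxx[k] += abs(fx[k]) ** 2
            syy[k] += abs(fy[k]) ** 2
            sxy[k] += fx[k] * fy[k].conjugate()
    # The small constant avoids division by zero for silent bins.
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k] + 1e-12) for k in range(num_bins)]


def average_coherence(x, y, frame_len=16):
    """Sum the coherence over all bins and divide by the number of bins."""
    coherence = coherence_per_bin(x, y, frame_len)
    return sum(coherence) / len(coherence)
```

More than one frame per period is needed here, because a coherence estimate formed from a single frame is identically 1 in every bin.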

The echo removing apparatus according to claim 2, wherein the data conversion means converts the average coherence value into the update amount of the parameters such that the change in the update amount is 0 or more for any increase in the average coherence value.
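One conversion that satisfies this condition is a clamped linear ramp, sketched below. The claim only requires that the update amount never decreases as the average coherence value increases; the ramp shape, the breakpoints `low` and `high`, and the limits `mu_min` and `mu_max` are hypothetical values chosen for the example.

```python
def coherence_to_mu(avg_coherence, mu_min=0.0, mu_max=0.5, low=0.2, high=0.8):
    """Map an average coherence value to an update amount (monotone non-decreasing)."""
    if avg_coherence <= low:
        return mu_min   # flat region: low coherence, smallest update amount
    if avg_coherence >= high:
        return mu_max   # flat region: high coherence, largest update amount
    # Linear ramp between the two breakpoints.
    fraction = (avg_coherence - low) / (high - low)
    return mu_min + fraction * (mu_max - mu_min)
```

The flat regions are allowed because the condition is "0 or more", not strictly positive.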

The echo removing apparatus according to claim 2, wherein the data conversion means changes the conversion pattern used to convert the average coherence value into the update amount according to the trend of the average coherence value over a preceding predetermined period.

The echo removing apparatus according to claim 2, wherein the average value calculating means calculates the average coherence value using only the coherence corresponding to a part of the frequency components, among the coherence calculated based on the collected sound signal and the reference signal.

The echo removing apparatus, wherein the average value calculating means calculates the average coherence value using values obtained by weighting, for each frequency component, the coherence calculated based on the collected sound signal and the reference signal.
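The use of only some frequency components and the per-frequency weighting described in the two preceding claims can both be expressed as a weighted average, where a weight of 0 excludes a component. The sketch below is illustrative: normalizing by the sum of the weights is an assumption, and `band_weights` and `weighted_average_coherence` are hypothetical names.

```python
def band_weights(num_bins, first_bin, last_bin):
    """Weights that keep bins first_bin..last_bin (inclusive) and drop the rest."""
    return [1.0 if first_bin <= k <= last_bin else 0.0 for k in range(num_bins)]


def weighted_average_coherence(coherence, weights):
    """Average the per-frequency coherence with one weight per frequency component."""
    if len(coherence) != len(weights):
        raise ValueError("one weight is needed per frequency component")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one frequency component needs a positive weight")
    return sum(w * c for w, c in zip(weights, coherence)) / total_weight
```

With all weights equal, the result reduces to the plain average over the frequency components.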

The echo removing apparatus according to claim 2, wherein the average value calculating means outputs an addition value obtained by adding the current average coherence value and the average coherence value calculated in the previous period at a predetermined ratio based on a time constant, and
the data conversion means converts the addition value into the update amount of the parameters.
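A common way to combine the current and previous values at a ratio set by a time constant is first-order smoothing, sketched below. The claim does not give the formula; the exponential relation between the ratio and the time constant, and the function names, are assumptions made for this example.

```python
import math


def smoothing_ratio(period_s, time_constant_s):
    """Share given to the previous value, derived from a time constant (assumed form)."""
    return math.exp(-period_s / time_constant_s)


def smooth_coherence(previous, current, ratio):
    """Add the previous and current average coherence values at the given ratio."""
    return ratio * previous + (1.0 - ratio) * current
```

For instance, `smooth_coherence(0.2, 0.8, smoothing_ratio(0.02, 0.2))` moves only part of the way from 0.2 toward 0.8, so a brief jump in coherence changes the update amount gradually.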

The echo removing apparatus, wherein the parameter updating means temporarily stops updating the parameters when the average coherence value calculated by the average value calculating means is lower than a predetermined threshold value.
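The threshold behavior can be sketched as a guard around the parameter update. The threshold of 0.1, the `gradient` argument (the per-parameter correction direction) and the name `gated_update` are hypothetical and serve only to illustrate the stop condition.

```python
def gated_update(weights, gradient, mu, avg_coherence, threshold=0.1):
    """Apply w += mu * gradient unless the average coherence is below threshold."""
    if avg_coherence < threshold:
        # Update temporarily suspended: the parameters are returned unchanged.
        return list(weights)
    return [w + mu * g for w, g in zip(weights, gradient)]
```

Updating resumes as soon as the average coherence value returns to the threshold or above.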

An electronic conference apparatus that realizes a conference with a remote location by exchanging audio signals and video signals over a network, the apparatus comprising an echo removing unit having:
echo component removal means for estimating an echo component with a time domain adaptive filter from a collected sound signal and a reference signal corresponding to the sound to be reproduced and output, and for removing the echo component from the collected sound signal;
parameter updating means for updating parameters of the adaptive filter; and
update amount designation means for designating the update amount applied to the parameters by the parameter updating means according to the coherence between the reference signal and an error signal obtained by removing the echo component from the collected sound signal.

An echo removal method for removing, from a collected sound signal, an echo component caused by wraparound of reproduced and output sound, the method comprising:
a step in which echo component removal means estimates the echo component with a time domain adaptive filter from the collected sound signal and a reference signal corresponding to the sound to be reproduced and output, and removes the echo component from the collected sound signal;
a step in which parameter updating means updates parameters of the adaptive filter; and
a step in which update amount designation means designates the update amount applied to the parameters by the parameter updating means according to the coherence between the reference signal and an error signal obtained by removing the echo component from the collected sound signal.

An echo removal program for causing a computer to execute processing for removing an echo component caused by reproduced and output sound wrapping around into a sound collection unit, the program causing the computer to function as:
echo component removal means for estimating the echo component with a time domain adaptive filter from a collected sound signal and a reference signal corresponding to the sound to be reproduced and output, and for removing the echo component from the collected sound signal;
parameter updating means for updating parameters of the adaptive filter; and
update amount designation means for designating the update amount applied to the parameters by the parameter updating means according to the coherence between the reference signal and an error signal obtained by removing the echo component from the collected sound signal.