JP2013012841A

JP2013012841A - Echo canceller, and method and program therefor

Info

Publication number: JP2013012841A
Application number: JP2011143121A
Authority: JP
Inventors: Shoichiro Saito; 翔一郎齊藤; Suehiro Shimauchi; 末廣島内; Sumitaka Sakauchi; 澄宇阪内
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-06-28
Filing date: 2011-06-28
Publication date: 2013-01-17
Anticipated expiration: 2031-06-28
Also published as: JP5235226B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for estimating a delay amount of a reproduction signal using an echo signal, to cancel the echo signal. SOLUTION: An echo canceller calculates an index of similarity between a frequency-domain reproduction signal and each of a plurality of frequency-domain sound collection signals. It obtains, as a delay value, the time difference between the time corresponding to the frequency-domain reproduction signal and the time corresponding to the frequency-domain sound collection signal for which the calculated index indicates the largest similarity. Based on the delay value, the echo canceller delays the reproduction signal and cancels the echo signal from the sound collection signal using the delayed reproduction signal.

Description

The present invention relates to a technique for estimating the delay of a reproduction signal using an echo signal contained in a collected sound signal, and for cancelling the echo signal.

An echo canceller is normally used for hands-free two-way calls. The echo canceller uses the reproduction signal output to the speaker as a reference signal, filters it with a filter that simulates the echo characteristics of the room to generate a pseudo echo signal, and cancels the echo by subtracting the pseudo echo signal from the signal collected by the microphone.

The Normalized Least Mean Square (NLMS) algorithm is known as one of the update algorithms for the adaptive filter used in this filtering (see Non-Patent Document 1). It is also one of the algorithms most frequently used in echo cancellers.
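As a rough illustration of the NLMS update mentioned above, the following is a minimal sketch in plain Python. It is a textbook form of the algorithm, not code taken from this patent; the function names `nlms_step` and `run_nlms`, the step size `mu`, and the regularization constant `eps` are assumptions chosen for the example.

```python
def nlms_step(w, x_buf, y, mu=0.5, eps=1e-8):
    """One NLMS update (illustrative names, not from the patent).

    w     : current filter taps
    x_buf : most recent reference samples, newest first, same length as w
    y     : current microphone sample
    Returns (error sample, updated taps).
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x_buf))   # pseudo echo
    e = y - y_hat                                      # echo-cancelled output
    norm = sum(xi * xi for xi in x_buf) + eps          # input power (regularized)
    w_new = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
    return e, w_new


def run_nlms(x, y, taps, mu=0.5):
    """Run NLMS over whole signals x (reference) and y (microphone)."""
    w = [0.0] * taps
    x_buf = [0.0] * taps
    errors = []
    for xn, yn in zip(x, y):
        x_buf = [xn] + x_buf[:-1]      # shift in the newest reference sample
        e, w = nlms_step(w, x_buf, yn, mu)
        errors.append(e)
    return w, errors
```

With a noise-free echo path shorter than `taps`, the taps returned by `run_nlms` approach the echo path and the error decays toward zero. If the echo arrives later than `taps` samples, no tap can model it.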

Simon Haykin, "Adaptive Filter Theory", Prentice Hall International Inc, 1996, third edition, p.432-437.

Normally, this NLMS algorithm is sufficient to cancel the echo. However, when the delay from the reproduction signal of the speaker to the signal collected by the microphone is very long, the arrival time of the echo signal exceeds the tap length of the adaptive filter, the adaptive filter can no longer simulate the echo path, and the amount of echo cancellation may drop significantly. The tap length of the adaptive filter could be made very long to cope with the long delay, but the amount of computation in the adaptive filter then becomes very large.
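The trade-off can be made concrete with a short calculation. The sketch below assumes a 16 kHz sampling rate, a 200 ms bulk delay, and a 64 ms room response; the 64 ms figure and the function name `taps_needed` are illustrative assumptions, not values stated for this filter in the patent.

```python
def taps_needed(fs_hz, bulk_delay_s, room_response_s):
    """Number of FIR taps needed to span a bulk delay plus the room response."""
    return round(fs_hz * (bulk_delay_s + room_response_s))


fs = 16000
room_only = taps_needed(fs, 0.0, 0.064)     # filter sized for the room alone
with_delay = taps_needed(fs, 0.200, 0.064)  # filter that also spans the delay
print(room_only, with_delay)
```

Under these assumptions the filter grows from 1024 to 4224 taps, and most of the added taps model nothing but a pure delay.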

One example of a very long delay is a video conference system built around a home digital TV. A home digital TV must synchronize the input video and audio, so the audio output may be delayed by the time needed to display the video. When an echo canceller is connected to such a device for hands-free calls, the delay between the reproduction signal of the speaker and the echo signal contained in the signal collected by the microphone becomes considerably large.

A general echo canceller, which has only the small memory needed to cover the reverberation of the room, therefore cannot cancel the echo at all or cancels it insufficiently. Increasing the memory size makes it possible to cancel the echo, but a very long filter must then be computed, which requires a very large amount of computation and markedly slows the estimation of the filter. In addition, the delay differs from product to product, so a fixed value cannot be specified in advance.

An object of the present invention is to provide a technique for estimating the delay of a reproduction signal using an echo signal and cancelling the echo signal.

To solve the above problem, according to a first aspect of the present invention, a sequence of r consecutive samples (r being plural) starting from a discrete time t of a time-domain digital reproduction signal is obtained as a frame reproduction signal; sequences of r consecutive samples of a time-domain digital collected sound signal, each starting from one of a plurality of mutually different times including the discrete time t, are obtained as frame collected sound signals; the frame reproduction signal is converted into a frequency-domain signal to obtain a frequency-domain reproduction signal; each of the frame collected sound signals is converted into a frequency-domain signal to obtain a plurality of frequency-domain collected sound signals; an index of similarity between the frequency-domain reproduction signal and each of the frequency-domain collected sound signals is calculated; the difference between the times corresponding to the frequency-domain reproduction signal and the frequency-domain collected sound signal for which the calculated index indicates the highest similarity is obtained as a delay value; the reproduction signal is delayed based on the delay value; and the echo signal is cancelled from the collected sound signal using the delayed reproduction signal.

To solve the above problem, according to a second aspect of the present invention, a sequence of r consecutive samples (r being plural) starting from a discrete time t of a time-domain digital collected sound signal is obtained as a frame collected sound signal; sequences of r consecutive samples of a time-domain digital reproduction signal, each starting from one of a plurality of mutually different times including the discrete time t, are obtained as frame reproduction signals; the frame collected sound signal is converted into a frequency-domain signal to obtain a frequency-domain collected sound signal; each of the frame reproduction signals is converted into a frequency-domain signal to obtain a plurality of frequency-domain reproduction signals; an index of similarity between the frequency-domain collected sound signal and each of the frequency-domain reproduction signals is calculated; the difference between the times corresponding to the frequency-domain collected sound signal and the frequency-domain reproduction signal for which the calculated index indicates the highest similarity is obtained as a delay value; the reproduction signal is delayed based on the delay value; and the echo signal is cancelled from the collected sound signal using the delayed reproduction signal.

To solve the above problem, according to a third aspect of the present invention, a sequence of r consecutive samples (r being plural) starting from a discrete time t of a time-domain digital reproduction signal is obtained as a frame reproduction signal; sequences of r consecutive samples of a time-domain digital collected sound signal, each starting from one of a plurality of mutually different times including the discrete time t, are obtained as frame collected sound signals; an index of similarity between the frame reproduction signal and each of the frame collected sound signals is calculated; the difference between the times corresponding to the frame reproduction signal and the frame collected sound signal for which the calculated index indicates the highest similarity is obtained as a delay value; the reproduction signal is delayed based on the delay value; and the echo signal is cancelled from the collected sound signal using the delayed reproduction signal.

To solve the above problem, according to a fourth aspect of the present invention, the delay of the reproduction signal is estimated using the echo signal contained in the collected sound signal. The correlation value between the time-domain reproduction signal and the time-domain collected sound signal is obtained for each sample of each frame while the frame number and the sample number of the collected sound signal are varied; a delay value is calculated from the frame number and the sample number of the collected sound signal at which the correlation value is largest; the reproduction signal is delayed based on the delay value; and the echo signal is cancelled from the collected sound signal using the delayed reproduction signal.

To solve the above problem, according to a fifth aspect of the present invention, the delay of the reproduction signal is estimated using the echo signal contained in the collected sound signal. Using the frequency-domain reproduction signal and the frequency-domain collected sound signal, the correlation value is obtained for each sample of each frame while the frame number of the collected sound signal is varied; a delay value is calculated from the frame number and the sample number of the collected sound signal at which the correlation value is largest; the reproduction signal is delayed based on the delay value; and the echo signal is cancelled from the collected sound signal using the delayed reproduction signal.

The present invention can estimate the delay of the reproduction signal using the echo signal, and can cancel the echo signal without increasing the memory size or the amount of computation.

FIG. 1: Functional block diagram of the delay estimation devices of the first, second, sixth, seventh, eighth, and tenth embodiments.
FIG. 2: Processing flow diagram of the delay estimation devices of the first, second, sixth, seventh, eighth, and tenth embodiments.
FIG. 3: Functional block diagram of the delay estimation units of the first, second, and sixth embodiments.
FIG. 4: Processing flow diagram of the delay estimation units of the first and second embodiments.
FIG. 5: Processing flow diagram of the correlation value calculation unit 115.
FIG. 6: Diagram for explaining the processing of the correlation value calculation unit.
FIG. 7: Diagram for explaining how the frame number and sample number of the collected sound signal at which the correlation value is largest are obtained.
FIG. 8: Functional block diagram of the signal storage unit 180.
FIG. 9: Processing flow diagram of the signal storage unit 180.
FIG. 10: Functional block diagram of the echo cancellation unit 90.
FIG. 11: Processing flow diagram of the correlation value calculation unit 215.
FIG. 12: Processing flow diagram of the summation processing and area correlation value calculation processing of the correlation value calculation unit 215.
FIG. 13: Functional block diagram of the delay estimation devices of the third and ninth embodiments.
FIG. 14: Processing flow diagram of the delay estimation devices of the third and ninth embodiments.
FIG. 15: Functional block diagram of the delay estimation units of the third and ninth embodiments.
FIG. 16: Processing flow diagram of the delay estimation unit 310.
FIG. 17: Functional block diagram of the echo cancellation unit 94.
FIG. 18: Functional block diagram of the delay estimation devices of the fourth and fifth embodiments.
FIG. 19: Processing flow diagram of the delay estimation devices of the fourth and fifth embodiments.
FIG. 20: Functional block diagram of the signal storage unit 480.
FIG. 21: Functional block diagram of the delay estimation devices of the fourth and fifth embodiments.
FIG. 22: Functional block diagram of the delay estimation device of the fourth embodiment.
FIG. 23: Processing flow diagram of the correlation value calculation unit 415.
FIG. 24: Diagram for explaining how the correlation value calculation unit 415 obtains correlation values.
FIG. 25: Processing flow diagram of the correlation value calculation unit 515.
FIG. 26: Diagram for explaining how the correlation value calculation unit 415 obtains correlation values.
FIG. 27: Processing flow diagram of the delay estimation unit 610.
FIG. 28: Functional block diagram of the delay estimation unit 710.
FIG. 29: Processing flow diagram of the delay estimation unit 710.
FIG. 30: Functional block diagram of the delay estimation unit 810.
FIG. 31: Processing flow diagram of the delay estimation devices of the eighth and tenth embodiments.
FIG. 32: Processing flow diagram of the delay estimation device of the ninth embodiment.
FIG. 33: Simulation result for γ = 1.0 in the tenth embodiment.
FIG. 34: Current delay value calculated from the correlation that is largest at each time in FIG. 33.
FIG. 35: Simulation result for γ = 5.0 in the tenth embodiment.
FIG. 36: Current delay value calculated from the correlation that is largest at each time in FIG. 35.

Embodiments of the present invention are described below.

<Delay estimation device 100 according to the first embodiment>
The echo canceller according to the first embodiment includes a delay estimation device 100 and an echo cancellation unit 90. Since the echo cancellation unit 90 can cancel the echo using conventional techniques, the description below focuses on the delay estimation device 100 according to the first embodiment, with reference to FIGS. 1 and 2. The delay estimation device 100 includes a delay estimation unit 110 and a signal storage unit 180.

The delay estimation unit 110 receives the time-domain digital collected sound signal (hereinafter simply the "collected sound signal") y(n) and the time-domain digital reproduction signal (hereinafter simply the "reproduction signal" or "received speech signal") x(n), and estimates the delay of the reproduction signal x(n) using the echo signal contained in the collected sound signal y(n) (s110). Here, n is the sample number of the digital signal; for example, for a signal sampled at 48000 Hz, n increases by 1 every 1/48000 of a second.

The signal storage unit 180 delays the reproduction signal x(n) according to the estimated delay (hereinafter the "delay estimate") d_est and outputs it (s180).

The echo cancellation unit 90 uses the delayed reproduction signal to cancel the echo signal from the collected sound signal y(n) (s90), and outputs the transmission signal e(n) to the transmitting end 4.

Here, the collected sound signal y(n) is the digital signal collected by the microphone 3, and the reproduction signal x(n) is the digital signal reproduced by the speaker 22. n denotes the sample number or the time corresponding to that sample.

When the speaker 22 and the microphone 3 are in the same space, an echo path h(n), which is an acoustic transmission path, arises between them. The reproduced sound is collected by the microphone 3 via this echo path h(n). Of the sound collected by the microphone 3, the sound originating from the sound reproduced by the speaker 22 is called the echo sound, and the signal originating from the echo sound is called the echo signal. The collected sound signal therefore contains the echo signal. The delay estimation device 100 uses this echo signal to estimate the delay.

The delay estimation device 100 receives the reproduction signal x(n) via the receiving end 1. The reproduction device 2 also receives the reproduction signal x(n). The reproduction device 2 is, for example, a home digital TV, and also receives video data (not shown). The delay unit 21 synchronizes the reproduction signal with the video data by delaying the output of the reproduction signal by the time needed to display the video data. The speaker 22 receives and reproduces the synchronized reproduction signal. The reproduced sound is collected by the microphone 3 via the echo path h(n). The microphone 3 outputs the collected sound signal y(n) to the delay estimation device 100 and the echo cancellation unit 90. The synchronized video data is displayed on a display unit (not shown).

Each unit is described in detail below.

<Delay estimation unit 110>
The delay estimation unit 110 is described with reference to FIGS. 3 and 4. The delay estimation unit 110 includes a framing unit 111, a vectorization unit 112, a silent section determination unit 113, a correlation value calculation unit 115, a delay value calculation unit 117, and a delay output unit 119.

(Framing unit 111)
The framing unit 111 receives the time-domain digital reproduction signal x(n), frames a sequence of r consecutive samples (r being plural) starting from a discrete time t (s111), and outputs the frame-based reproduction signal x_m to the vectorization unit 112. The description below assumes r = 2L (L is a positive integer). Here, m denotes the frame number and the time corresponding to that frame number (hereinafter the "frame time").

Similarly, the framing unit 111 receives the time-domain digital collected sound signal y(n), frames sequences of r consecutive samples, each starting from one of a plurality of mutually different times including the discrete time t, and outputs the frame-based collected sound signal y_m to the silent section determination unit 113. The description below assumes that sequences of 2L consecutive samples, starting from times shifted from one another by L samples, are framed. For example, framing is performed as follows.
x_m=[x(mL-2L+1),x(mL-2L+2),…,x(mL)]^T
y_m=[y(mL-2L+1),y(mL-2L+2),…,y(mL)]^T
Here, ·^T denotes the transpose of the matrix ·.
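The framing rule above (2L samples per frame, consecutive frames shifted by L samples) can be sketched as follows. The function name `frame` and the convention of storing the 1-based sample k at list index k-1 are assumptions made for this example.

```python
def frame(signal, m, L):
    """Return frame m: the 2L samples signal(mL-2L+1) .. signal(mL).

    Sample numbers are 1-based as in the text; the Python list stores
    sample k at index k-1.
    """
    start = m * L - 2 * L + 1          # 1-based number of the first sample
    if start < 1 or m * L > len(signal):
        raise IndexError("frame outside the available samples")
    return signal[start - 1 : m * L]   # convert to a 0-based slice


samples = list(range(1, 21))           # sample k has value k, for readability
print(frame(samples, 2, 4))            # samples 1..8
print(frame(samples, 3, 4))            # samples 5..12, overlapping frame 2 by L
```

Consecutive frames share L samples, so the second half of frame m is the first half of frame m+1.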

(Vectorization unit 112)
The vectorization unit 112 receives the frame-based reproduction signal x_m, cuts out the first L samples of x_m to generate the vector
x'_m^T=[x(mL-2L+1),x(mL-2L+2),…,x(mL-L)]
(s112), and outputs it to the silent section determination unit 113 and the correlation value calculation unit 115.

(Silent section determination unit 113)
The silent section determination unit 113 uses the reproduction signal x_m to determine whether x_m is in a silent section (s113a). For example, the silent section determination unit 113 receives the vector x'_m obtained from the reproduction signal x_m, calculates the power ||x'_m||^2 of the vector x'_m, and determines whether it is equal to or greater than a threshold T_x. Here, ||·|| denotes the L2 norm of ·. If the power is equal to or greater than the threshold T_x, the frame is determined not to be a silent section; if it is less than T_x, the frame is determined to be a silent section. When the power ||x'_m||^2 is equal to or greater than the threshold T_x, the silent section determination unit 113 outputs the value of m at that time to the correlation value calculation unit 115 as m_0 (s113b). The threshold T_x is used to reduce the influence of noise contained in the reproduction signal. It is set so that silence or a quiet voice falls below T_x and speech at normal volume exceeds T_x.
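The silence test described above amounts to comparing the power of the first L samples of a reproduction frame with T_x. A minimal sketch follows; the function names `first_half` and `is_active` are illustrative, and the threshold value in the usage line is arbitrary.

```python
def first_half(x_frame):
    """x'_m: the first L samples of a 2L-sample frame x_m."""
    return x_frame[: len(x_frame) // 2]


def is_active(x_frame, threshold):
    """True when ||x'_m||^2 >= threshold, i.e. the frame is not silent."""
    power = sum(v * v for v in first_half(x_frame))
    return power >= threshold


# Only the first half of the frame contributes to the decision.
print(is_active([0.5, 0.5, 0.0, 0.0], 0.5), is_active([0.01, 0.01, 1.0, 1.0], 0.5))
```

In practice T_x would be tuned to the noise level of the reproduction signal, as the text notes.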

If the power is less than the threshold T_x, the next reproduction signal x(n) and collected sound signal y(n) are received, and the framing (s111), vectorization (s112), and silent section determination (s113a) are repeated.

The processing from the correlation value calculation unit 115 onward could be performed on every received reproduction signal x(n) and collected sound signal y(n). However, echo normally arises when the reproduced sound is reasonably loud, so estimating the delay only in such cases is sufficiently effective. Performing the subsequent processing only on frames that the silent section determination unit 113 has determined not to be silent therefore reduces the amount of computation.

(Correlation value calculation unit 115)
The correlation value calculation unit 115 receives the reproduction signal x'_m^T and the collected sound signal y_m, and calculates their correlation value c_f(n) for each sample n of each frame m while varying the frame number and the sample number of the collected sound signal y_m (s115).

The processing of the correlation value calculation unit 115 is described in more detail with reference to FIG. 5. For example, the correlation value calculation unit 115 receives the frame number m_0 of a frame determined not to be a silent section, and defines the following vectors x^_m and y^_m (equations (1) and (2)) (s115a).

Here, 0_n denotes a vector of n zeros. Further, the correlation value c_f(n) of the n-th sample of frame m is calculated by the following equation (3) (s115c).

Here, D_F denotes the assumed maximum delay expressed as a number of frames, m_0 ≤ m ≤ m_0+D_F-1, and f = m-m_0 (s115b). Hence 0 ≤ f ≤ D_F-1. x^_m(i) and y^_m(i) denote the i-th elements of the vectors x^_m and y^_m, respectively, and the symbol ^ is to be read as placed above the immediately preceding character. As shown in FIG. 6, in equation (3) the value of n is varied from 0 to L-1 (s115b, s115d, s115e), and the correlation value between the vector x^_m=[x'_m0^T] (where the subscript m0 denotes m_0) and the collected sound signal y^_m=[y_m(1+n),…,y_m(L+n)] is calculated (s115c). The frame number m is further varied from m_0 to m_0+D_F-1 (s113b in FIG. 4; s115f and s115g in FIG. 5), and the correlation value c_f(n) is calculated for each sample n of each frame m. In other words, each time the frame time advances by one frame, that is, each time m increases by 1, x^_m keeps a constant value (see equation (1) and FIG. 6; x^_m does not change from its value at m_0), whereas y^_m changes (see equation (2) and FIG. 6; y^_m changes with the frame time m and also with the sample number n), so correlations with signals at successively different time differences are taken in turn.
When the assumed maximum delay is D_s samples (for example, D_s = 3200 for a sampling frequency of 16 kHz and an assumed maximum delay of 200 ms), c_f is calculated up to m = m_1 = m_0+D_F-1, at which (m-m_0)L > D_s (that is, m_0 ≤ m ≤ m_1 = m_0+D_F-1).

Using equation (3), the correlation value calculation unit 115 calculates D_F×L correlation values c_f(n), and outputs to the delay value calculation unit 117 the frame number f_max and the sample number n_max at which the correlation value is largest (see FIG. 7).
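The search over frames and sample offsets can be sketched as below. Equation (3) is not reproduced in this text, so the sketch assumes the correlation value is a plain inner product between x'_m0 and the L-sample window of y_m starting at offset n; any zero padding in equations (1) and (2) is ignored. The function name `correlation_search` and the list layout (1-based sample k at index k-1) are also assumptions.

```python
def correlation_search(x, y, m0, L, D_F):
    """Return (f_max, n_max, c_max) over 0 <= f < D_F and 0 <= n < L.

    Assumes c_f(n) is the inner product of x'_{m0} with
    [y_m(1+n), ..., y_m(L+n)], where m = m0 + f.
    """
    x_start = m0 * L - 2 * L                 # 0-based index of x(m0*L-2L+1)
    x_ref = x[x_start : x_start + L]         # x'_{m0}, held fixed
    best = (0, 0, float("-inf"))
    for f in range(D_F):
        y_start = (m0 + f) * L - 2 * L       # 0-based index of y_m(1)
        for n in range(L):
            seg = y[y_start + n : y_start + n + L]
            if len(seg) < L:                 # ran out of collected samples
                break
            c = sum(a * b for a, b in zip(x_ref, seg))
            if c > best[2]:
                best = (f, n, c)
    return best


# Toy check: y is x delayed by 10 samples.
x = [1.0, -1.0, 1.0, 1.0] + [0.0] * 36
y = [0.0] * 10 + x[:30]
print(correlation_search(x, y, m0=2, L=4, D_F=5))
```

In the toy check the peak falls at f = 2, n = 2, which corresponds to an offset of 2·4 + 2 = 10 samples. A normalized correlation or another similarity index could be substituted for the inner product without changing the search structure.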

The above description used the correlation value of equation (3), but the measure is not limited to a correlation value; any index of similarity between the sample sequence of the reproduction signal and the sample sequence of the collected sound signal may be used. From this viewpoint, the correlation value calculation unit may also be called a similarity calculation unit.

(Delay value calculation unit 117)
The delay value calculation unit 117 receives the frame number f_max and the sample number n_max of the collected sound signal at which the correlation value is largest, uses them to calculate the delay value d_max, for example by the following formula, and outputs it to the delay output unit 119 (s117).

In other words, the delay value calculation unit 117 obtains, as the delay value, the difference between the times corresponding to the sample sequence of the reproduction signal and the sample sequence of the collected sound signal for which the similarity index calculated by the correlation value calculation unit 115 indicates the highest similarity.
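The formula for d_max is not reproduced in this text. As a hedged reconstruction from the framing definitions, assume the fixed reference x'_{m_0} starts at sample x(m_0 L - 2L + 1) and the compared window of frame m at offset n starts at sample y(mL - 2L + 1 + n). The time difference between the two windows is then:

```latex
\begin{align}
d &= \bigl(mL - 2L + 1 + n\bigr) - \bigl(m_0 L - 2L + 1\bigr)
   = (m - m_0)\,L + n = fL + n, \\
d_{\max} &= f_{\max}\,L + n_{\max}.
\end{align}
```

For example, with L = 4, f_max = 2, and n_max = 2, this gives d_max = 10 samples. This form follows from the stated assumptions and may differ in detail from the formula in the original publication.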

(Delay output unit 119)
The delay output unit 119 receives a predetermined number of delay values and outputs the most frequent delay value as the delay estimate (s119e).

For example, the delay output unit 119 prepares an array d_h of length D_s and initializes it to 0 (s119a). On receiving a delay value d_max, the delay output unit 119 increments by 1 the element of d_h whose index is d_max (s119b). The above processing (s111 to s117, s119b) is repeated until T_sum delay values d_max have been obtained (s119c, s119d). Through this processing, the array d_h becomes a histogram of candidate delay estimates. When T_sum delay values d_max have been obtained (in other words, when the elements of d_h sum to T_sum), the largest element of d_h is searched for and its index is output as the delay estimate d_est (s119e). T_sum is the number of calculations needed for the mode of the histogram to coincide reliably with the true value, and is set to between several and several tens of calculations depending on how widely the estimates vary. This configuration greatly reduces the variation in the delay estimate caused by errors.
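The histogram vote can be sketched in a few lines. The function name `most_frequent_delay` is illustrative, and the tie-breaking rule (the smallest index wins) is an assumption of this sketch that the text does not specify.

```python
def most_frequent_delay(delay_values, D_s):
    """Histogram the per-frame delay values d_max and return the mode d_est."""
    d_h = [0] * D_s                   # histogram indexed by delay in samples
    for d_max in delay_values:        # T_sum values in total
        d_h[d_max] += 1
    # max() returns the first maximal index, so ties go to the smallest delay
    return max(range(D_s), key=lambda i: d_h[i])


# One outlier (37) and small jitter (9, 11) do not change the result.
print(most_frequent_delay([10, 10, 11, 10, 9, 10, 37], D_s=64))
```

Taking the mode rather than the mean keeps a single badly wrong estimate from shifting the output.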

<Signal Storage Unit 180>
The signal storage unit 180 delays the reproduction signal x(n) according to the delay estimate d_est and outputs the delayed reproduction signal x(n'). For example, the signal storage unit 180 includes a signal storing unit 181, a signal buffer 183, and a first signal output unit 185 (see FIGS. 8 and 9).

The signal buffer 183 is a buffer that can hold D samples (D ≥ D_s suffices, and normally D = D_s). The signal storing unit 181 receives the reproduction signal x(n) and saves it in the signal buffer 183 by overwriting the oldest samples first (s181). The first signal output unit 185 receives the delay estimate d_est and, based on it, outputs a total of 2L samples, from the sample that is d_est + 2L - 1 samples older than the current sample x(n) to the sample that is d_est samples older (s185). That is, it outputs the 2L delayed reproduction signals x(n') (where n - d_est - 2L + 1 ≤ n' ≤ n - d_est).
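The overwrite-oldest storage (s181) and the delayed read-out (s185) can be sketched as a ring buffer. The class and method names below are hypothetical, and two points are assumptions of this sketch rather than statements from the text: samples before time 0 are read as zero, and the buffer capacity must be at least d_est plus the block length.

```python
from typing import List


class DelayedSignalBuffer:
    """Ring buffer sketch of signal buffer 183 (class and method names are hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.buf: List[float] = [0.0] * capacity
        self.capacity = capacity
        self.n = -1                     # index of the newest stored sample x(n)

    def store(self, sample: float) -> None:
        """s181: overwrite the oldest slot with the newest sample."""
        self.n += 1
        self.buf[self.n % self.capacity] = sample

    def delayed_block(self, d_est: int, block_len: int) -> List[float]:
        """s185: return x(n') for n - d_est - block_len + 1 <= n' <= n - d_est."""
        if d_est < 0 or d_est + block_len > self.capacity:
            raise ValueError("requested span does not fit in the buffer")
        start = self.n - d_est - block_len + 1
        # Assumption: samples before time 0 are treated as zero.
        return [self.buf[k % self.capacity] if k >= 0 else 0.0
                for k in range(start, start + block_len)]


buffer = DelayedSignalBuffer(capacity=16)
for value in range(10):                 # x(0) = 0.0, ..., x(9) = 9.0
    buffer.store(float(value))
print(buffer.delayed_block(d_est=3, block_len=4))  # [3.0, 4.0, 5.0, 6.0]
```

With block_len set to 2L, the returned list corresponds to the 2L delayed reproduction signals x(n') described above.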

<Echo Cancellation Unit 90>
The echo cancellation unit 90 may cancel the echo using, for example, a conventional technique. Using the delayed reproduction signal x(n'), the echo cancellation unit 90 cancels the echo signal from the collected sound signal y(n) and outputs the transmission signal e(n) to the transmitting end 4. It differs from the conventional technique in that it uses the delayed reproduction signal x(n') rather than the reproduction signal x(n), and is otherwise the same. Examples include the method of using an echo estimation unit as in FIG. 10 and cancelling the echo by subtracting, with the adaptive filter described in Non-Patent Document 1, a pseudo echo signal from the collected sound signal, and the method of suppressing the echo by applying an echo suppression gain to the collected sound signal as in Japanese Patent No. 3420705.

For example, as shown in FIG. 10, the echo cancellation unit 90 may include an echo estimation unit 91 and a subtraction unit 93. The echo estimation unit 91 generates a pseudo echo signal y'(n) from the delayed reproduction signal x(n') using the adaptive filter described in Non-Patent Document 1. The subtraction unit 93 then subtracts the pseudo echo signal y'(n) from the collected sound signal y(n) to obtain the echo-cancelled transmission signal e(n), and outputs it. The echo estimation unit 91 also receives the transmission signal e(n) and uses it when updating the filter coefficients of the adaptive filter.
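The estimate-subtract-update loop of FIG. 10 can be illustrated with a normalized LMS (NLMS) filter. NLMS is substituted here purely as a familiar example: the text only refers to the adaptive filter of Non-Patent Document 1 and does not state which algorithm it is. The function name, the step size `mu`, the regularizer `eps`, and the two-tap echo path in the demonstration are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(x_delayed: Sequence[float], y: Sequence[float],
                     taps: int, mu: float = 0.5, eps: float = 1e-8
                     ) -> Tuple[List[float], List[float]]:
    """Return (e, h): the transmission signal e(n) and the final filter coefficients."""
    h = [0.0] * taps                    # adaptive filter coefficients (echo path model)
    x_hist = [0.0] * taps               # recent delayed reference samples, newest first
    e: List[float] = []
    for x_n, y_n in zip(x_delayed, y):
        x_hist = [x_n] + x_hist[:-1]
        y_hat = sum(hk * xk for hk, xk in zip(h, x_hist))   # pseudo echo y'(n)
        e_n = y_n - y_hat                                   # role of subtraction unit 93
        norm = sum(xk * xk for xk in x_hist) + eps
        # Coefficient update driven by the fed-back transmission signal e(n)
        h = [hk + mu * e_n * xk / norm for hk, xk in zip(h, x_hist)]
        e.append(e_n)
    return e, h


# Demonstration with a made-up two-tap echo path [0.5, -0.3] and no near-end speech.
rng = random.Random(0)
x_ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y_mic = [0.5 * x_ref[n] - (0.3 * x_ref[n - 1] if n > 0 else 0.0) for n in range(2000)]
e_out, h_final = nlms_echo_cancel(x_ref, y_mic, taps=2)
print(h_final)
```

Because the reference passed in is already delayed by d_est, the filter only has to model the echo path itself, which is why the tap count can stay small.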

<Effect>
In this embodiment, the delay estimation unit can estimate the delay amount of the reproduction signal using the echo signal. In addition, the signal storage unit can delay the reproduction signal based on the estimated delay amount and output it.

Because the echo cancellation unit cancels the echo using the delayed reproduction signal, degradation of its performance due to the delay can be prevented without increasing the filter tap length. Since the filter tap length is not increased, an increase in the amount of computation is also avoided. In addition, because the delay amount can be estimated for each product, an appropriate delay amount can be estimated product by product and the echo cancelled accordingly. Furthermore, because processing is performed frame by frame, the technique is easy to apply to applications that process data in packets, such as IP telephony.

Note that the delay estimation apparatus described above may be incorporated inside the echo cancellation apparatus so that, instead of outputting a delayed reproduction signal, the start position of the adaptive filter is adjusted. As when the delayed reproduction signal is output, echo cancellation performance can be maintained without increasing the required amount of computation. In this embodiment, the correlation between the loudspeaker reproduction signal and the microphone collected sound signal is calculated in units of fixed-length frames, and the delay amount can be determined flexibly according to the magnitude of the correlation value of each frame.

<Other Variations>
When the reproduction signal and the collected sound signal received by the delay estimation apparatus 100 are analog signals, an AD conversion unit (not shown) may sample the analog reproduction signal x(t) and the analog collected sound signal y(t) (t denotes time) at a predetermined sampling frequency (for example, 16 kHz), quantize each sample, and convert them into the digital received signal samples x(n) and the digital collected sound signal y(n).

The delay estimation apparatus 100 need not include the vectorization unit 112. In that case, the silent section determination processing (s113a) and the correlation value calculation processing (s115) are performed using x_m in place of x'_m.

The delay estimation apparatus 100 may omit the delay output unit 119 and output d_max, the output value of the delay value calculation unit 117, directly as the delay estimate d_est of the delay estimation unit 110. The delay estimate becomes less stable, but estimation becomes faster. The same applies to the embodiments described below.

<Delay Estimation Apparatus 200 According to Second Embodiment>
Only the parts that differ from the first embodiment are described. The delay estimation apparatus 200 according to the second embodiment is described with reference to FIGS. 1 and 2.

The delay estimation apparatus 200 includes a delay estimation unit 210 and a signal storage unit 180. The configuration and processing of the delay estimation unit 210 differ from those of the first embodiment. The delay estimation unit 210 receives the collected sound signal y(n) and the reproduction signal x(n), and estimates the delay amount of the reproduction signal x(n) using the echo signal contained in the collected sound signal y(n) (s210). Within the delay estimation unit 210, the configuration and processing (s215) of the correlation value calculation unit 215 differ from those of the first embodiment (see FIGS. 3 and 4). Details are described below with reference to FIGS. 11 and 12.

<Correlation Value Calculation Unit 215>
The correlation value calculation unit 215 sums, over each predetermined range I, the reproduction signal x'_m^T that the silent section determination unit 113 has determined not to be a silent section, and likewise sums the collected sound signal y_m over each predetermined range I (s215b). For example, the sums are computed by the following equations (see equations (1) and (2) for x^_m and y^_m).

min{·} is a function that returns the minimum value of the set ·. That is, the reproduction signal x^_m and the collected sound signal y^_m are each divided into L'_I or L'_I + 1 areas and summed area by area (s215b-1 to s215b-4).

Furthermore, the correlation value calculation unit 215 obtains, for each predetermined range of each frame, the area correlation value between the summed reproduction signal x̄_m(n) and the summed collected sound signal ȳ_m(n) (s215c). Here the bar is written over the immediately preceding character. For example, the area correlation value c'_f is obtained by the following equation.

That is, the area correlation values are calculated between the summed reproduction signal x̄_m = [x̄_m0(1), ..., x̄_m0(L_I)] and the summed collected sound signal ȳ_m = [ȳ_m0(1 + n), ..., ȳ_m0(L_I + n)] (where n varies over 0 ≤ n ≤ L_I - 1; in addition, by equation (2), the collected sound signal changes as the frame time m changes) (s215c-1 to s215c-4).

Using equation (10), the correlation value calculation unit 215 calculates D_F × L_I correlation values c'_f(n) and, for each frame, obtains as n'_max the sample number at which the correlation value is largest among those calculated for that frame. Because n'_max is obtained for each frame, D_F sample numbers n'_max are obtained.

Next, the correlation value calculation unit 215 obtains the correlation value c_f between the reproduction signal x_m and the collected sound signal y_m (s215e). In doing so, the frame number m of the collected sound signal is varied. In addition, the sample number is varied within a range of several samples before and after the predetermined range at which the area correlation value c'_f is largest (in this example, the I samples starting from sample number n'_max), and the correlation value c_f is obtained for each sample of each frame. For example, the correlation value between the reproduction signal x^_m and the collected sound signal y^_m is obtained over the range from n_low = n'_max - M (with n_low = 1 when n_low < 1) to n_high = n'_max + M (with n_high = L when n_high > L) (s215d to s215g). For example, it is obtained by the following equation.

M denotes the range around n'_max within which the maximum of the correlation is expected to lie. That is, an approximate value of the delay is calculated using the area correlation value c'_f, and the exact delay value is then obtained from the correlation value c_f.
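The two-stage search described above (a coarse area correlation followed by a fine search within ±M samples) can be sketched for a single frame pair as follows. The exact summation and correlation formulas are given by equations that are not reproduced in this text, so several details are assumptions of this sketch: indices are 0-based, the coarse lag is converted to a sample offset by multiplying by the block size I, y holds at least twice as many samples as x, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def block_sums(signal: Sequence[float], block: int) -> List[float]:
    """Sum consecutive runs of `block` samples (the last run may be shorter)."""
    return [sum(signal[k:k + block]) for k in range(0, len(signal), block)]


def correlate_at(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Correlation of x with y shifted by `lag`; y must hold len(x) + lag samples."""
    return sum(xk * y[k + lag] for k, xk in enumerate(x))


def coarse_to_fine_lag(x: Sequence[float], y: Sequence[float],
                       block: int, margin: int) -> Tuple[int, int]:
    """Return (coarse_area_lag, fine_sample_lag) for one frame pair."""
    frame_len = len(x)
    if len(y) < 2 * frame_len:
        raise ValueError("y must hold at least 2 * len(x) samples")
    x_bar = block_sums(x, block)
    y_bar = block_sums(y, block)
    # Stage 1: area correlation over lags 0 .. L_I - 1 (counterpart of c'_f)
    coarse = max(range(len(x_bar)), key=lambda n: correlate_at(x_bar, y_bar, n))
    # Stage 2: full-resolution correlation within +/- margin of the coarse peak,
    # clamped to the valid lag range (counterpart of n_low and n_high)
    center = coarse * block
    n_low = max(center - margin, 0)
    n_high = min(center + margin, frame_len - 1)
    fine = max(range(n_low, n_high + 1), key=lambda n: correlate_at(x, y, n))
    return coarse, fine


# Deterministic check: a single impulse delayed by 137 samples.
x_frame = [0.0] * 320
x_frame[5] = 1.0
y_frame = [0.0] * 640
y_frame[5 + 137] = 1.0
print(coarse_to_fine_lag(x_frame, y_frame, block=10, margin=50))  # (14, 137)
```

The coarse stage only has to land within M samples of the true delay; the fine stage then recovers the exact value.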

<Effect>
This configuration provides the same effects as the first embodiment. In addition, whereas the first embodiment requires L correlation calculations of L taps per frame, this embodiment requires only L_I correlation calculations of L_I taps (see equation (10)) and 2M + 1 correlation calculations of L taps (see equation (3)). For example, when L = 320, I = 10, and M = 50, the amount of computation is roughly 1/3.
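The "roughly 1/3" figure can be checked with a short calculation. The sketch below counts one multiply-accumulate per tap and assumes L_I = L / I, which matches the description of dividing a frame into areas of I samples but is not stated explicitly here; the cost of forming the block sums themselves is ignored.

```python
# Worked check of the per-frame operation count (assumption: L_I = L / I).
L, I, M = 320, 10, 50
L_I = L // I                                   # 32 areas per frame

first_embodiment = L * L                       # L correlations of L taps
second_embodiment = L_I * L_I + L * (2 * M + 1)  # coarse stage + fine stage

ratio = second_embodiment / first_embodiment
print(first_embodiment, second_embodiment, ratio)  # 102400 33344 0.325625
```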

<Delay Estimation Apparatus 300 According to Third Embodiment>
Only the parts that differ from the first embodiment are described. The echo cancellation apparatus according to the third embodiment includes a delay estimation apparatus 100 and an echo cancellation unit 94. Because the echo cancellation unit 94 may cancel the echo using a conventional technique, the description below focuses on the delay estimation apparatus 300 according to the third embodiment, with reference to FIGS. 13 and 14. The delay estimation apparatus 300 includes a delay estimation unit 310 and a signal storage unit 380. The delay estimation apparatus 300 is assumed to be incorporated inside an echo cancellation apparatus that comprises frequency domain conversion units 81 and 82, the echo cancellation unit 94, and a time domain conversion unit 83.

The frequency domain conversion units 81 and 82 convert the time-domain reproduction signal x(n) and collected sound signal y(n) into the frequency-domain reproduction signal X_m and collected sound signal Y_m, respectively (s81, s82). The reproduction signal X_m is output to the delay estimation unit 310 and the signal storage unit 380, and the collected sound signal Y_m is output to the delay estimation unit 310 and the echo cancellation unit 94. For example, the conversion is performed by the following equations.

w is, for example, a Hamming window of length 2L.
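A windowed transform of this kind can be sketched as follows. The conversion equations themselves are not reproduced in this text, so the frame layout (frame m covering samples m·L to m·L + 2L - 1) is an assumption, as are all function names. A naive DFT is used only to keep the sketch dependency-free; a practical implementation would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Symmetric Hamming window w of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (length - 1))
            for k in range(length)]


def dft(values: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
            for k in range(n)]


def to_frequency_domain(samples: Sequence[float], m: int, frame_len: int) -> List[complex]:
    """Window 2L samples of frame m and transform them (frame layout is assumed)."""
    start = m * frame_len
    frame = samples[start:start + 2 * frame_len]
    if len(frame) != 2 * frame_len:
        raise ValueError("not enough samples for frame m")
    w = hamming(2 * frame_len)
    return dft([wk * xk for wk, xk in zip(w, frame)])


X_m = to_frequency_domain([1.0] * 16, m=1, frame_len=4)
print(len(X_m), abs(X_m[0]))
```

For a constant input the zero-frequency bin equals the sum of the window coefficients, which gives a quick sanity check on the windowing.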

The delay estimation unit 310 receives the frequency-domain reproduction signal X_m and collected sound signal Y_m, and estimates the delay amount of the reproduction signal x(n) using the echo signal contained in the collected sound signal y(n) (s310).

The signal storage unit 380 delays the reproduction signal X_m according to the delay estimate d_est and outputs it (s380).

Using the delayed reproduction signal, the echo cancellation unit 94 cancels the echo signal from the collected sound signal Y_m (s94) and outputs the transmission signal E_m to the time domain conversion unit 83.

The time domain conversion unit 83 converts the frequency-domain transmission signal E_m into the time-domain transmission signal e(n) and outputs it to the transmitting end 4. For example, the conversion is performed by the following equation.

<Delay Estimation Unit 310>
The delay estimation unit 310 is described with reference to FIGS. 15 and 16. The delay estimation unit 310 includes a silent section determination unit 313, a correlation value calculation unit 315, a delay value calculation unit 117, and a delay output unit 319.

(Silent section determination unit 313)
The silent section determination unit 313 receives the reproduction signal X_m and determines whether it is a silent section (s313a). For example, the silent section determination unit 313 calculates the power ||X_m|| of the reproduction signal X_m and determines whether it is equal to or greater than a threshold T_x. When the power ||X_m|| is equal to or greater than the threshold T_x, the silent section determination unit 313 sets the frame number m at that time as m_0 and outputs the reproduction signal X_m to the correlation value calculation unit 315 as X_m0 (s313b).

(Correlation value calculation unit 315)
The correlation value calculation unit 315 receives the reproduction signal X_m0, which the silent section determination unit has determined not to be a silent section, and the collected sound signal Y_m, and obtains correlation values from them (s315). By obtaining the correlation values while varying the frame number of the collected sound signal, a correlation value is obtained for each sample of each frame. For example, the correlation values are obtained by the following equation.

where * denotes the complex conjugate and m_0 ≤ m ≤ m_0 + D_F - 1.
The first L elements of c~_f (where the tilde is written over the immediately preceding character) are defined as c_f (equation (15)).

Using equation (14), the correlation value calculation unit 315 calculates D_F × 2L correlation values c~_f(n), and obtains D_F × L correlation values c_f(n) by equation (15) (s315a to s315c). Among the obtained correlation values c_f(n), the frame number at which the correlation value is largest is output to the delay value calculation unit 117 as f_max, and the corresponding sample number as n_max.

Per frame, the first embodiment requires L correlation calculations of L taps in equation (3), whereas this embodiment requires only 2L calculations, one per element, in equation (14).
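The saving comes from computing all lags at once through an element-wise product of spectra. Equations (14) and (15) are not reproduced in this text, so the sketch below shows one standard way to do this (conjugate cross-spectrum followed by an inverse transform, keeping the first half) and should be read as an assumption about their form. Windowing is omitted for clarity, the function names are hypothetical, and a naive DFT stands in for an FFT.

```python
import cmath
import math
from typing import List, Sequence


def dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT or inverse DFT."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * i / n) for i, v in enumerate(values))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def correlation_first_half(x_spec: Sequence[complex], y_spec: Sequence[complex]) -> List[float]:
    """Cross-correlate via conj(X) * Y, then keep the first L of the 2L lags."""
    # One multiplication per frequency bin (2L in total), as in the text above
    c_tilde = dft([xs.conjugate() * ys for xs, ys in zip(x_spec, y_spec)], inverse=True)
    half = len(c_tilde) // 2
    return [value.real for value in c_tilde[:half]]


# 2L = 16 samples per frame; y is x delayed by 3 samples.
x_frame = [1.0, 2.0, 3.0] + [0.0] * 13
y_frame = [0.0] * 3 + [1.0, 2.0, 3.0] + [0.0] * 10
c_f = correlation_first_half(dft(x_frame), dft(y_frame))
n_max = max(range(len(c_f)), key=lambda n: c_f[n])
print(n_max)  # 3
```

When the spectra X_m and Y_m are already available from the echo canceller, only the per-bin product and one inverse transform remain per frame pair.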

Although the description above uses the correlation values of equations (14) and (15), any quantity that serves as an index of the similarity between the frequency-domain reproduction signal and the collected sound signal may be used instead, as in the first embodiment.

(Delay output unit 319)
The delay output unit 319 receives a predetermined number of delay values from the delay value calculation unit 117 and, by the same method as the delay output unit 119, obtains the most frequent delay value as the delay estimate d_est (s119a to s119e).

The delay output unit 319 further obtains the following d'_est.

The delay output unit 319 then takes d'_est as the new d_est (that is, assigns d'_est to d_est) and outputs it to the signal storage unit 380 (s319f). The signal storage unit 380 stores the frequency-domain reproduction signal once every L samples, and this configuration makes it possible to reproduce delays that are multiples of L.

<Signal Storage Unit 380>
The signal storage unit 380 delays the reproduction signal X_m according to the delay estimate d_est and outputs the frequency-domain delayed reproduction signal X_{m'} (where m' = m - d_est / L) (s380). It is the same as the signal storage unit 180 except that the reproduction signal X_m is used in place of the reproduction signal x(n).

<Echo Cancellation Unit 94>
Using the delayed reproduction signal, the echo cancellation unit 94 cancels the echo signal from the collected sound signal Y_m (s94) and outputs the transmission signal E_m to the time domain conversion unit 83. For example, as shown in FIG. 17, the echo cancellation unit 94 may include an echo suppression gain calculation unit 95 and a multiplication unit 97. The echo suppression gain calculation unit 95 obtains an echo suppression gain G_m from the delayed reproduction signal X_{m'} and the collected sound signal Y_m using the conventional technique described in Japanese Patent No. 3420705. The multiplication unit 97 then multiplies the collected sound signal Y_m by the echo suppression gain G_m to suppress the echo, and outputs the suppressed transmission signal E_m.

<Effect>
This configuration provides the same effects as the first embodiment. In addition, by reusing the frequency-domain reproduction signal and collected sound signal that the echo cancellation apparatus already uses, the amount of computation for the correlation calculation during delay estimation can be kept low.

<Other Variations>
The frequency-domain reproduction signal and collected sound signal may instead be those obtained in the frequency domain conversion units 81 and 82 by the following equations, using equations (1) and (2).

Even with these signals, the delay estimation apparatus 300 provides the same effects. Moreover, only L calculations are needed in equation (14), and c_f no longer needs to be redefined. The same applies to the fourth embodiment described below.

In this embodiment the delay estimation apparatus is incorporated inside the echo cancellation apparatus, but it need not be. In that case, the delay estimation apparatus itself may include the frequency domain conversion units and the time domain conversion unit.

<Delay Estimation Apparatus 400 According to Fourth Embodiment>
Only the parts that differ from the third embodiment are described. The delay estimation apparatus 400 according to the fourth embodiment is described with reference to FIGS. 18 and 19.

The delay estimation apparatus 400 includes a delay estimation unit 410 and a signal storage unit 480. The configurations and processing of the delay estimation unit 410 and the signal storage unit 480 each differ from those of the third embodiment. The delay estimation unit 410 receives the collected sound signal Y_m and the D_F reproduction signals X_m, X_{m-1}, ..., X_{m-D_F+1}, and estimates the delay amount of the reproduction signal X_m using the echo signal contained in the collected sound signal Y_m (s410).

<Signal Storage Unit 480>
The signal storage unit 480 delays the reproduction signal X_m according to the delay estimate d_est and outputs it (s480). The signal storage unit 480 includes, for example, a signal storing unit 481, a signal buffer 483, a first signal output unit 485, and a second signal output unit 487 (see FIG. 20).

The signal buffer 483 is a buffer that can hold D frequency-domain reproduction signals (D ≥ D_F suffices, and normally D = D_F). The signal storing unit 481 receives the reproduction signal X_m and saves it in the signal buffer 483 by overwriting the oldest reproduction signal first.

The second signal output unit 487 acquires from the signal buffer 483 the D_F reproduction signals X_m, X_{m-1}, ..., X_{m-D_F+1}, including that of the current frame m, and outputs them to the delay estimation unit 410.

The first signal output unit 485 of the signal storage unit 480 delays the reproduction signal X_m according to the delay estimate d_est and outputs the frequency-domain delayed reproduction signal X_{m'} (where m' = m - d_est / L).

<Delay Estimation Unit 410>
The delay estimation unit 410 includes a correlation value calculation unit 415, a delay value calculation unit 117, and a delay output unit 319. The configuration and processing of the correlation value calculation unit 415 (s415 in FIG. 22) differ from those of the third embodiment.

(Correlation value calculation unit 415)
Using the past D_F reproduction signals X_m, X_{m-1}, ..., X_{m-D_F+1} and the collected sound signal Y_m, the correlation value calculation unit 415 obtains a correlation value for each sample of each combination of reproduction signal and collected sound signal, while varying the frame numbers of the D_F reproduction signals and the frame number of the collected sound signal Y_m (s415). The correlation value calculation unit 415 performs the processes shown in FIG. 23.

The correlation value calculation unit 415 sets an initial value for each variable (s415a, s415b).

It receives the collected sound signal Y_m and the D_F reproduction signals X_m, X_{m-1}, ..., X_{m-D_F+1}. When m is less than D_F (s415c), however, it receives only the reproduction signals that are available.

When m is less than D_F (s415c), the correlation values between the acquired reproduction signals and the collected sound signal Y_m are calculated (s415d-1 to s415d-3). The correlation values are calculated in the same way as in the third embodiment.

where 0 ≤ f ≤ m - 1.

The same number of correlation values as acquired reproduction signals are calculated, and the following processing is performed (s415e, s415f, s415g-1 to s415g-5).
for f = 0 to m-1
  i = m - f
  if c_f(n_f) > c_tmp(i)
    c_tmp(i) = c_f(n_f)
    n_temp(i) = n_f
    f_temp(i) = f
  end
end
The above processing is repeated until m becomes equal to or greater than D_F (s415p).

When m is equal to or greater than D_F (s415c), the correlation values c_f between the D_F reproduction signals X_m, X_{m-1}, ..., X_{m-D_F+1} and the collected sound signal Y_m are calculated (s415h-1 to s415h-3).

where 0 ≤ f ≤ D_F - 1.

When m is equal to or greater than D_F (s415f), the remainder of m divided by D_F is taken as r (s415i), and i is set as follows (s415j-1 to s415j-3).

Furthermore, the following processing is performed (s415j-1 to s415j-3, s415k-1 to s415k-4).
for f = 0 to D_F-1
  if c_f(n_f) > c_tmp(i)
    c_tmp(i) = c_f(n_f)
    n_temp(i) = n_f
    f_temp(i) = f
  end
end
Using the index r + 1, for which the past D_F correlation calculations and comparisons have been completed,
n_max = n_temp(r+1)
f_max = f_temp(r+1) (19)
are output to the delay value calculation unit 117 (s415m). FIG. 24 shows the state of the storage unit that holds c_f, c_tmp(i), n_temp(i), and f_temp(i) when m = D_F (r = 0). At this point, c_0 is compared with c_tmp(0), c_1 with c_tmp(D_F - 1), c_2 with c_tmp(D_F - 2), ..., and c_{D_F-1} with c_tmp(1). Whenever c_f is the larger, c_tmp is updated. Once all comparisons and updates are finished, the n_temp(i) and f_temp(i) corresponding to r + 1 are output. In this example r = 0, so n_temp(1) and f_temp(1) are output as n_max and f_max. Because this embodiment aims to output the f_max and n_max corresponding to a c_tmp that has undergone D_F comparisons, the n_temp(i) and f_temp(i) corresponding to c_tmp(i) at i = r + 1 are output. When the next frame is received, the above processing is performed again and n_temp(2) and f_temp(2) are output as n_max and f_max.
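The rotating use of c_tmp, n_temp, and f_temp can be sketched as follows. The equation that sets the slot index i is not reproduced in this text; the rule i = (m - f) mod D_F used below is inferred from the FIG. 24 walkthrough (c_0 with c_tmp(0), c_1 with c_tmp(D_F - 1), and so on) and should be treated as an assumption. Frame numbers are 0-based here, so the finished slot is (m + 1) mod D_F rather than r + 1. The class name and the `correlate` callback are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]


class ParallelLagTracker:
    """Each reproduction frame owns one slot that collects its best match over D_F frames."""

    def __init__(self, d_f: int,
                 correlate: Callable[[Frame, Frame], Tuple[float, int]]) -> None:
        self.d_f = d_f
        self.correlate = correlate       # returns (peak value c_f(n_f), peak sample n_f)
        self.c_tmp = [0.0] * d_f
        self.n_temp = [0] * d_f
        self.f_temp = [0] * d_f
        self.refs: List[Frame] = []      # recent reproduction frames, newest last
        self.m = -1

    def push(self, x_m: Frame, y_m: Frame) -> Optional[Tuple[int, int]]:
        """Process one frame pair; return (n_max, f_max) once a slot is complete."""
        self.m += 1
        self.refs = (self.refs + [x_m])[-self.d_f:]
        for f in range(len(self.refs)):          # f = 0 is the current frame X_m
            i = (self.m - f) % self.d_f          # assumed slot of reproduction frame m - f
            c, n = self.correlate(self.refs[-1 - f], y_m)
            if c > self.c_tmp[i]:
                self.c_tmp[i], self.n_temp[i], self.f_temp[i] = c, n, f
        if len(self.refs) < self.d_f:
            return None                          # fewer than D_F frames seen so far
        done = (self.m + 1) % self.d_f           # slot of the oldest frame, now complete
        result = (self.n_temp[done], self.f_temp[done])
        self.c_tmp[done], self.n_temp[done], self.f_temp[done] = 0.0, 0, 0   # s415n
        return result


# Toy correlation: frames carry an id; a match yields peak value 1.0 at sample 7.
def toy_correlate(x: Frame, y: Frame) -> Tuple[float, int]:
    return (1.0, 7) if x[0] == y[0] else (0.0, 0)


tracker = ParallelLagTracker(d_f=4, correlate=toy_correlate)
# The microphone frame at time m contains the echo of reproduction frame m - 2.
outputs = [tracker.push([float(m)], [float(m - 2)]) for m in range(6)]
print(outputs)
```

After the first D_F frames, one (n_max, f_max) pair is produced per incoming frame, which is the source of the speed-up over processing one reproduction frame at a time.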

The c_tmp(i), n_temp(i), and f_temp(i) for which the calculation has finished are initialized to 0 (s415n), and c_tmp(i) is then used to store the correlation values between the new reproduction signal X_{m+1} input in the next frame and each collected sound signal from Y_{m+1} onward. The correlation value calculation unit 415 repeats the processing of s415b to s415p (s415p).

言い換えると、相関値算出部４１５では、周波数領域収音信号Ｙ_ｍと複数の周波数領域再生信号Ｘ_ｍ，Ｘ_ｍ-1，…，Ｘ_{ｍ-ＤＦ＋１}それぞれとの類似性の指標を算出する。 In other words, the correlation value calculation unit 415 calculates an index of similarity between the frequency-domain sound collection signal Y_m and each of the plurality of frequency-domain reproduction signals X_m, X_{m-1}, ..., X_{m-DF+1}.

＜効果＞
このような構成とすることで、第三実施形態と同様の効果を得ることができる。ｃ_ｔｍｐの各要素は一つのＸ_ｍに対応しており、あるＸ_ｍ”を固定したままＹ_ｍ”，Ｙ_ｍ”＋１，…，Ｙ_{ｍ”＋ＤＦ-１}との相関を計算する、という第三実施形態の演算を同時並行でＤ_Ｆ回行うことができる。よって、第三実施形態よりも高速に遅延推定値が得られる。 <Effect>
With this configuration, the same effects as in the third embodiment can be obtained. Each element of c_tmp corresponds to one X_m, so the third embodiment's operation of calculating the correlation with Y_{m″}, Y_{m″+1}, ..., Y_{m″+DF-1} while a given X_{m″} is held fixed can be performed D_F times in parallel. The delay estimate is therefore obtained faster than in the third embodiment.

＜その他の変形例＞
第四実施形態において、遅延推定装置４００は、無音区間判定部４１３（図２１において破線で示す）を含んでもよい。無音区間判定部４１３は、Ｄ_Ｆ個の再生信号Ｘ_ｍ，Ｘ_ｍ-1，…，Ｘ_{ｍ-ＤＦ＋１}を受け取り、再生信号Ｘ_ｉのパワーが閾値以下か否かを判定し、閾値以上の再生信号のみ遅延推定部４１０に出力する（ｓ４１３、図２２において破線で示す）。再生信号Ｘ_ｉのパワーが小さい、つまり再生信号が無音もしくはある閾値以下のパワーしかない場合に、対応する相関値ｃ_ｆの計算を行わない構成となる。Ｘ_ｉのパワーが小さい場合は相関値ｃ_ｆがノイズの影響を受けやすくなるが、このような構成とすることで、頑強な推定が可能となる。閾値は例えば信号の定格レベルの−１０ｄＢなどと設定する。 <Other variations>
In the fourth embodiment, the delay estimation apparatus 400 may include a silent section determination unit 413 (indicated by a broken line in FIG. 21). The silent section determination unit 413 receives the D_F reproduction signals X_m, X_{m-1}, ..., X_{m-DF+1}, determines whether the power of each reproduction signal X_i is at or below a threshold, and outputs only the reproduction signals at or above the threshold to the delay estimation unit 410 (s413, indicated by a broken line in FIG. 22). When the power of a reproduction signal X_i is small, that is, when the reproduction signal is silent or has power no greater than the threshold, the corresponding correlation value c_f is not calculated. When the power of X_i is small, the correlation value c_f is easily affected by noise, so this configuration enables robust estimation. The threshold is set to, for example, -10 dB relative to the rated level of the signal.
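
The power gate described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `frame_power_db` and `non_silent_frames` are hypothetical, power is measured on time-domain samples as a mean square (the patent applies the test to the reproduction signals X_i without specifying the computation), and the rated level is taken as an amplitude of 1.0.

```python
import math


def frame_power_db(frame, rated_level=1.0):
    """Mean-square power of one frame in dB relative to the rated level."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")  # a digitally silent frame
    return 10.0 * math.log10(mean_square / (rated_level ** 2))


def non_silent_frames(frames, threshold_db=-10.0, rated_level=1.0):
    """Return (index, frame) pairs whose power is at or above the threshold.

    Frames below the threshold are dropped, so no correlation value is
    computed for them.
    """
    return [
        (i, frame)
        for i, frame in enumerate(frames)
        if frame_power_db(frame, rated_level) >= threshold_db
    ]
```

Returning the index along with each surviving frame keeps the frame lag f recoverable after silent frames have been removed.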

＜第五実施形態に係る遅延推定装置５００＞
第四実施形態と異なる部分についてのみ説明する。第五実施形態に係る遅延推定装置５００を説明する。遅延推定部５１０内の相関値算出部５１５の処理内容が遅延推定装置４００とは異なる（ｓ５１０、ｓ５１５、図１８、図１９、図２１、図２２参照）。図２５のｓ５１５ｈ−２、ｓ５１５ｄ−２に示すように、相関を計算するＸ_ｉをＡ（Ａは２以上の整数）フレーム毎にしか用いない。例えばＡ＝３の時、ｍ番目のフレームの時刻においてＸ_ｍとＹ_ｍの相関、Ｘ_ｍ-ＡとＹ_ｍの相関、Ｘ_ｍ-２ＡとＹ_ｍの相関というように計算し、ｍ＋１番目のフレームの時刻においてはＸ_ｍとＹ_ｍ＋１の相関、Ｘ_ｍ-ＡとＹ_ｍ＋１の相関、Ｘ_ｍ-２ＡとＹ_ｍ＋１の相関というように計算する。このようにしても、相関計算に用いられるＸ_ｉは間引かれるが、同一のＸ_ｉに対する異なる遅延に対応する相関値は間引かれない（図２６参照）。 <Delay Estimation Device 500 According to Fifth Embodiment>
Only parts different from the fourth embodiment will be described. A delay estimation apparatus 500 according to the fifth embodiment will be described. The processing content of the correlation value calculation unit 515 in the delay estimation unit 510 differs from that of the delay estimation apparatus 400 (see s510, s515, FIG. 18, FIG. 19, FIG. 21, and FIG. 22). As shown in s515h-2 and s515d-2 in FIG. 25, only one X_i in every A frames (A is an integer of 2 or more) is used for the correlation calculation. For example, when A = 3, at the time of the m-th frame the correlations between X_m and Y_m, between X_{m-A} and Y_m, and between X_{m-2A} and Y_m are calculated, and at the time of the (m+1)-th frame the correlations between X_m and Y_{m+1}, between X_{m-A} and Y_{m+1}, and between X_{m-2A} and Y_{m+1} are calculated. In this way, the X_i used for the correlation calculation are thinned out, but the correlation values corresponding to different delays for the same X_i are not thinned out (see FIG. 26).
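
The thinning in the A = 3 example can be made concrete with a small Python sketch. The function name `reference_frames` is hypothetical, and the sketch assumes that the retained reproduction frames are those whose index is a multiple of A; the patent fixes only the spacing, not the phase.

```python
def reference_frames(m, d_f, a):
    """Indices k of the reproduction frames X_k correlated with Y_m.

    Only frames inside the last d_f frames (m - d_f + 1 .. m) whose index is
    a multiple of a are kept (the choice of phase is an assumption).
    """
    oldest = max(0, m - d_f + 1)
    return [k for k in range(m, oldest - 1, -1) if k % a == 0]
```

For D_F = 9 and A = 3, frames 9 and 10 both use X_9, X_6, and X_3: the set of reference frames is thinned, while each retained X_k is still correlated against every consecutive Y.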

＜効果＞
このような構成とすることで、第四実施形態と同様の効果を得ることができる。なお、Ａフレームに１回しかｄ_ｍａｘの計算がされないため、第四実施形態に比べてＴ_ｓｕｍ個の遅延値を推定するためにＡ倍の時間がかかるが（言い換えると、遅延推定値ｄ_ｅｓｔの推定速度が１／Ａに減少する）、その分演算量も相関計算部分に関しては１／Ａに減少する。遅延推定装置の処理能力に応じて適宜設定すればよい。 <Effect>
With this configuration, the same effects as in the fourth embodiment can be obtained. Because d_max is calculated only once every A frames, estimating T_sum delay values takes A times as long as in the fourth embodiment (in other words, the estimation speed of the delay estimate d_est falls to 1/A), but the amount of computation for the correlation calculation is correspondingly reduced to 1/A. A may be set as appropriate according to the processing capacity of the delay estimation apparatus.

＜第六実施形態に係る遅延推定装置６００＞
第一実施形態と異なる部分についてのみ説明する。図１、図２、図３、図２７を用いて第六実施形態に係る遅延推定装置６００を説明する。遅延推定装置６００内の遅延推定部６１０の構成及び処理内容（ｓ６１０）が第一実施形態とは異なる。さらに詳しくいうと、遅延推定部６１０内の遅延出力部６１９の処理内容（ｓ６１９）が異なる。 <Delay Estimation Device 600 According to Sixth Embodiment>
Only parts different from the first embodiment will be described. A delay estimation apparatus 600 according to the sixth embodiment will be described with reference to FIGS. 1, 2, 3, and 27. The configuration and processing contents (s610) of the delay estimation unit 610 in the delay estimation apparatus 600 are different from those in the first embodiment. More specifically, the processing content (s619) of the delay output unit 619 in the delay estimation unit 610 is different.

遅延出力部６１９は入力されたｄ_ｍａｘを用いて
d_est=(1-α)d_max+αd’_est (20)
として出力する（図２７のｓ６１９）。なお、ｄ’_ｅｓｔは前回推定したｄ_ｅｓｔの値である。αは減衰係数で、０．９程度の値を用いる。 The delay output unit 619 uses the input d_max to output
d_est = (1-α)d_max + αd'_est (20)
(s619 in FIG. 27). Here, d'_est is the previously estimated value of d_est. α is an attenuation coefficient, for which a value of about 0.9 is used.
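
Equation (20) is a first-order recursive average of the per-frame delay values. A minimal Python sketch follows; the function name `smooth_delay` is hypothetical, and the default α = 0.9 is the value suggested in the text.

```python
def smooth_delay(d_max, d_est_prev, alpha=0.9):
    """Equation (20): blend the new delay value with the previous estimate."""
    return (1.0 - alpha) * d_max + alpha * d_est_prev
```

Starting from d'_est = 0 with a constant d_max, each call closes 10% of the remaining gap, so the estimate approaches d_max gradually instead of waiting for a batch of T_sum values.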

＜効果＞
このような構成により第一実施形態と同様の効果を得ることができる。なお、この場合、遅延出力部１１９のようにＴ_ｓｕｍ回の推定が行われるまで待たずに、真値に近い値を維持することができる。第二〜五実施形態の遅延出力部を同様の構成としてもよい。 <Effect>
With this configuration, the same effects as in the first embodiment can be obtained. In this case, a value close to the true value can be maintained without waiting for T_sum estimations to be performed, as is required with the delay output unit 119. The delay output units of the second to fifth embodiments may have the same configuration.

＜第七実施形態に係る遅延推定装置７００＞
第一実施形態と異なる部分についてのみ説明する。図１、図２、図２８、図２９を用いて第七実施形態に係る遅延推定装置７００を説明する。遅延推定装置７００内の遅延推定部７１０の構成及び処理内容（ｓ７１０）が第一実施形態とは異なる。さらに詳しくいうと、遅延推定部７１０内の遅延値算出部７１７と遅延出力部７１９の処理内容（図２９のｓ７１７、ｓ７１９ａ、ｓ７１９ｅ）が異なる。 <Delay Estimation Device 700 According to Seventh Embodiment>
Only parts different from the first embodiment will be described. A delay estimation apparatus 700 according to the seventh embodiment will be described with reference to FIGS. 1, 2, 28, and 29. The configuration and processing contents (s710) of the delay estimation unit 710 in the delay estimation apparatus 700 are different from those in the first embodiment. More specifically, the processing contents (s717, s719a, and s719e in FIG. 29) of the delay value calculation unit 717 and the delay output unit 719 in the delay estimation unit 710 are different.

遅延値算出部７１７はｄ_ｍａｘの代わりにｆ_ｍａｘを遅延値として出力する。 The delay value calculation unit 717 outputs f_max as the delay value instead of d_max.

遅延出力部７１９は、Ｄ_Ｆの長さを持つ配列ｄ_ｈを用意し、０で初期化する（ｓ７１９ａ）。遅延出力部１１９は、遅延値ｆ_ｍａｘを受け取ると、配列ｄ_ｈのインデックスがｆ_ｍａｘ番目の要素の数を１増やす（ｓ７１９ｂ）。Ｔ_ｓｕｍ個の遅延値ｄ_ｍａｘを取得するまで、処理を繰り返す。Ｔ_ｓｕｍ回の推定を終了したところで、全要素の中で一番大きな値をとるインデックスｉ_ｍａｘ（０≦ｉ_ｍａｘ≦Ｄ_Ｆ−１）に対し、
d_est=i_maxL (21)
を出力する。 The delay output unit 719 prepares an array d_h of length D_F and initializes it to 0 (s719a). On receiving a delay value f_max, it increments by 1 the element of the array d_h whose index is f_max (s719b). The processing is repeated until T_sum delay values have been obtained. When T_sum estimations have finished, for the index i_max (0 ≦ i_max ≦ D_F-1) of the element with the largest value among all elements,
d_est = i_max L (21)
is output.
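
The voting step can be sketched in Python as follows. The function name `vote_delay` is hypothetical, and the tie-breaking rule (the lowest index wins) is an assumption, since the text does not say how ties are resolved.

```python
def vote_delay(f_max_values, d_f, frame_len):
    """Histogram the frame-level delay values and return d_est = i_max * L.

    f_max_values holds T_sum frame indices, each in the range 0 .. d_f - 1.
    """
    d_h = [0] * d_f  # one counter per candidate frame delay (s719a)
    for f_max in f_max_values:
        d_h[f_max] += 1  # s719b
    # The first index with the largest count wins (tie-break is an assumption).
    i_max = max(range(d_f), key=d_h.__getitem__)
    return i_max * frame_len
```

With D_F = 20 and L = 160, six votes of [1, 1, 6, 1, 11, 1] give d_est = 160 samples, so the two outliers do not disturb the result.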

＜効果＞
このような構成とすることで第一実施形態と同様の効果を得ることができる。なお、遅延推定値の正確な値は求まらないが、フレーム内の細かい誤差を無視してフレームごとに集約することで、推定が安定するメリットがある。第二〜六実施形態の遅延値算出部、遅延出力部を同様の構成としてもよい。 <Effect>
With this configuration, the same effects as in the first embodiment can be obtained. Although an exact delay estimate is not obtained, ignoring fine errors within a frame and aggregating the estimates per frame has the advantage of stabilizing the estimation. The delay value calculation units and delay output units of the second to sixth embodiments may have the same configuration.

＜第八実施形態に係る遅延推定装置８００＞
第一実施形態と異なる部分についてのみ説明する。図１、図２、図３０、図３１を用いて第八実施形態に係る遅延推定装置８００を説明する。遅延推定装置８００内の遅延推定部８１０の構成及び処理内容（ｓ８１０）が第一実施形態とは異なる。さらに詳しくいうと、遅延推定部８１０は相関蓄積部８１６をさらに含み、遅延値算出部８１７の処理内容（図３１のｓ８１７）が異なる。相関蓄積部８１６は、前回算出した相関値ｃ⁻ _{ｆｏｌｄ}を蓄積する。 <Delay Estimation Device 800 According to Eighth Embodiment>
Only parts different from the first embodiment will be described. A delay estimation apparatus 800 according to the eighth embodiment will be described with reference to FIGS. 1, 2, 30, and 31. The configuration and processing contents (s810) of the delay estimation unit 810 in the delay estimation apparatus 800 are different from those in the first embodiment. More specifically, the delay estimation unit 810 further includes a correlation accumulation unit 816, and the processing content of the delay value calculation unit 817 (s817 in FIG. 31) is different. The correlation accumulation unit 816 stores the previously calculated correlation value c⁻_{f old}.

遅延値算出部８１７は、求めた相関値ｃ_ｆからそのままｄ_ｍａｘを計算するのではなく、定数β（０≦β≦１）を用いてｃ_ｆの時間変化を平滑化したｃ⁻ _ｆを用いてｆ_ｍａｘおよびｄ_ｍａｘを計算する。具体的には、遅延値算出部８１７は、相関蓄積部８１６から蓄積された（前回計算された）平滑化した相関値ｃ⁻ _{ｆｏｌｄ}を取得し、これを用いて、以下の式により平滑化した相関値ｃ⁻ _ｆを求める（ｓ８１７）。 The delay value calculation unit 817 does not calculate d_max directly from the obtained correlation value c_f. Instead, it calculates f_max and d_max using c⁻_f, which is c_f with its time variation smoothed using a constant β (0 ≦ β ≦ 1). Specifically, the delay value calculation unit 817 acquires the stored (previously calculated) smoothed correlation value c⁻_{f old} from the correlation accumulation unit 816 and uses it to obtain the smoothed correlation value c⁻_f by the following equation (s817).
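
The smoothing equation itself is not shown at this point in the text. One form consistent with the description, offered here only as an assumption, is a first-order recursive average in which β weights the stored value, so that a smaller β weakens the smoothing (as the tenth embodiment states). The overbar \bar{c}_f stands for c⁻_f; the exact form used in the patent may differ.

```latex
\bar{c}_f = \beta\,\bar{c}_{f\,\mathrm{old}} + (1-\beta)\,c_f
```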

但し、０≦ｆ≦Ｄ_Ｆ−１とする。 Here, 0 ≦ f ≦ D_F−1.

さらに、遅延値算出部８１７は、平滑化した相関値ｃ⁻ _ｆを用いて、以下の式により、ｆ_ｍａｘを計算する。さらにｆ_ｍａｘを用いてｄ_ｍａｘを計算し、ｄ_ｍａｘを出力する。 Further, the delay value calculation unit 817 calculates f_max from the smoothed correlation value c⁻_f by the following equation. It then calculates d_max using f_max and outputs d_max.
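
The effect of smoothing before the peak search can be illustrated with a Python sketch. Both formulas are assumptions here, because neither equation appears in the text: the smoothing is taken to be a first-order recursive average with β weighting the stored value, and f_max is taken to be the index of the largest smoothed value. The function name `smooth_and_pick` is hypothetical.

```python
def smooth_and_pick(c, c_old, beta=0.95):
    """Smooth the per-lag correlations over time and pick the peak lag.

    c[f] is the new correlation value for frame lag f and c_old[f] is the
    stored smoothed value. The recursive-average form and the argmax are
    assumptions, not formulas taken from the patent.
    """
    smoothed = [beta * old + (1.0 - beta) * new for new, old in zip(c, c_old)]
    f_max = max(range(len(smoothed)), key=smoothed.__getitem__)
    return smoothed, f_max
```

With a stored peak at f = 1, a single-frame burst at f = 0 does not move f_max when β = 0.95, whereas with β = 0 (no smoothing) the peak jumps to f = 0 immediately.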

＜効果＞
このような構成とすることによって、第一実施形態と同様の効果を得ることができる。さらに、時間平滑化によって外乱音声等によるｃ_ｆの値の一時的な乱れを防ぐことができる。第二〜七実施形態の遅延値算出部を同様の構成としてもよい。なお、第四実施形態のように相関値を毎時刻計算する場合には、平滑化した相関値ｃ⁻ _ｆoldは１フレーム前の値であるが、そうでない場合は、数フレーム前の値となることもある。 <Effect>
With this configuration, the same effects as in the first embodiment can be obtained. Furthermore, temporal smoothing prevents temporary disturbances in the value of c_f caused by interfering speech and the like. The delay value calculation units of the second to seventh embodiments may have the same configuration. Note that when the correlation value is calculated at every time step as in the fourth embodiment, the smoothed correlation value c⁻_{f old} is the value from one frame earlier; otherwise, it may be a value from several frames earlier.

＜第九実施形態に係る遅延推定装置９００＞
第三実施形態に係る遅延推定装置３００と異なる部分についてのみ説明する。遅延推定部９１０の相関値算出部９１５の処理内容が異なる（図１３、図１４のｓ９１０、図１５及び図３２参照）。 <Delay Estimation Device 900 According to Ninth Embodiment>
Only parts different from the delay estimation apparatus 300 according to the third embodiment will be described. The processing contents of the correlation value calculation unit 915 of the delay estimation unit 910 are different (see s910 in FIG. 13 and FIG. 14, FIG. 15 and FIG. 32).

相関値算出部９１５は、再生信号Ｘ_ｍ０と収音信号Ｙを受け取り、再生信号Ｘ_ｍ０の大きさに応じてゲインＧ_ｍ０を求める（ｓ９１５ａ）。例えば、以下のようにして求める。 The correlation value calculation unit 915 receives the reproduction signal X_{m0} and the sound collection signal Y, and obtains a gain G_{m0} according to the magnitude of the reproduction signal X_{m0} (s915a). For example, the gain is obtained as follows.

但し、閾値Ｔ_ｇ１＞Ｔ_ｇ２の正の値であり、０≦γ＜１である。Ｔ_ｇ１は通常会話において最も大きな周波数成分の値付近に設定し、Ｔ_ｇ２は通常会話においてスペクトルの谷に当たる部分の値付近に設定する。 Here, the thresholds T_g1 and T_g2 are positive values with T_g1 > T_g2, and 0 ≦ γ < 1. T_g1 is set near the value of the largest frequency component in normal conversation, and T_g2 is set near the value at the spectral valleys in normal conversation.

相関値算出部９１５は、受け取った再生信号Ｘ_ｍ０と収音信号Ｙと、求めたゲインＧ_ｍ０を用いて、相関を以下のように求める（ｓ９１５ｂ）。 Using the received reproduction signal X_{m0} and sound collection signal Y and the obtained gain G_{m0}, the correlation value calculation unit 915 obtains the correlation as follows (s915b).

＜効果＞
このような構成とすることで第三実施形態と同様の効果を得ることができる。さらに、あまり大きすぎる再生信号の周波数成分に関しては、時間領域へ戻したときの相関値に影響が大きすぎるため低減し、小さい再生信号の周波数成分に関してもＳＮ比が悪く外乱の影響を受けやすいため、寄与を低くすることができ、より精度の高い推定が可能となる。第四実施形態の遅延値算出部を同様の構成としてもよい。 <Effect>
With this configuration, the same effects as in the third embodiment can be obtained. Furthermore, frequency components of the reproduction signal that are too large are attenuated because they would influence the correlation value too strongly when returned to the time domain, and the contribution of small frequency components of the reproduction signal can also be lowered because they have a poor SN ratio and are easily affected by disturbances. More accurate estimation therefore becomes possible. The delay value calculation unit of the fourth embodiment may have the same configuration.

＜その他の変形例＞
なお、時間領域で相関を計算する場合でも、ｘ（ｎ）の周波数領域の値を求め、ゲインＧ_ｍ０を設計した後、同様の特性を持つ時間領域のフィルタを求めてｘをフィルタリングすることで同様の効果が得られる。 <Other variations>
Even when the correlation is calculated in the time domain, the same effect can be obtained by obtaining the frequency-domain values of x(n), designing the gain G_{m0}, then obtaining a time-domain filter with the same characteristics and filtering x with it.

＜第十実施形態に係る遅延推定装置＞
第八実施形態に係る遅延推定装置８００と異なる部分についてのみ説明する。本実施形態では、第八実施形態の遅延値算出部で用いていたβの値を可変とする。図１、図２、図３０、図３１を用いて第十実施形態に係る遅延推定装置１０００を説明する。遅延推定装置１０００内の遅延推定部１０１０の構成及び処理内容（ｓ１０１０）が第八実施形態とは異なる。さらに詳しくいうと、遅延推定部１０１０内部の遅延値算出部１０１７の処理内容（ｓ１０１７）が第八実施形態と異なる。遅延値算出部１０１７は図示しない相関差分計算部と平滑係数切替部とを含む。 <Delay Estimation Device According to Tenth Embodiment>
Only parts different from the delay estimation apparatus 800 according to the eighth embodiment will be described. In the present embodiment, the β value used in the delay value calculation unit of the eighth embodiment is variable. A delay estimation apparatus 1000 according to the tenth embodiment will be described with reference to FIGS. 1, 2, 30, and 31. The configuration and processing content (s1010) of the delay estimation unit 1010 in the delay estimation apparatus 1000 are different from those in the eighth embodiment. More specifically, the processing content (s1017) of the delay value calculation unit 1017 in the delay estimation unit 1010 is different from that in the eighth embodiment. The delay value calculation unit 1017 includes a correlation difference calculation unit and a smoothing coefficient switching unit (not shown).

ｃ⁻ _ｆのあるフレームｍでの値をｃ⁻ _ｆ（ｍ）とし、相関差分計算部は、相関蓄積部からｃ⁻ _ｆ（ｍ）とｃ⁻ _ｆ（ｍ−１）を受け取り、 Let c⁻_f(m) denote the value of c⁻_f at a frame m. The correlation difference calculation unit receives c⁻_f(m) and c⁻_f(m−1) from the correlation accumulation unit and calculates Δc⁻_f(m).

を計算する。Δｃ⁻ _ｆ（ｍ）は遅延が変動していない場合は、それぞれの遅延において（各ｆにおいて）おおよそ同じ挙動をする。それに対し、遅延が変動した場合、今まで遅延の真値に近いｆに対応するｃ⁻ _ｆは急激に値が減少し、新しい遅延の真値に近いｆに対応するｃ⁻ _ｆは急激に値が上昇する。つまり、Δｃ⁻ _ｆ（ｍ）の正負がｆによって、異なり、かつ、大きさが大きくなる。 When the delay is not fluctuating, Δc⁻_f(m) behaves in roughly the same way at every delay (at every f). When the delay changes, in contrast, the c⁻_f corresponding to the f close to the previous true delay drops sharply, and the c⁻_f corresponding to the f close to the new true delay rises sharply. That is, the sign of Δc⁻_f(m) differs depending on f, and its magnitude becomes large.

また、相関差分計算部は、細やかな時間変動の影響を除くため、以下の式を計算し、Δｃ⁻ _ｆ（ｍ）を定義しなおす。 In addition, to remove the influence of small temporal fluctuations, the correlation difference calculation unit calculates the following equation and redefines Δc⁻_f(m).

なお、Ｉ_ｗは正の整数でｃ⁻ _ｆを加算するフレーム幅である。例えばＩ_ｗは１０程度の値とする。相関差分計算部は、Δｃ⁻ _ｆ（ｍ）を平滑係数切替部に送信する。 Here, I_w is a positive integer giving the number of frames over which c⁻_f is summed. For example, I_w is set to a value of about 10. The correlation difference calculation unit sends Δc⁻_f(m) to the smoothing coefficient switching unit.

平滑係数切替部は、 The smoothing coefficient switching unit obtains the value S_Δ.

という値を求める。なお、ｓｇｎ（・）は・の符号（１もしくは−１）を表す。そして、 Here, sgn(·) denotes the sign (1 or −1) of its argument. The unit then performs a condition determination.

という条件判定を行う。Ｔ_ｃは相関が大きく変動していることを判定する閾値、Ｔ_ｓは相関の時間差分の正負がそろっていないことを判定する閾値である。例えば、Ｉ_ｗ＝１０、Ｄ_Ｆ＝２０程度のときにＴ_ｃ＝１０程度の値とする。また、−Ｄ_Ｆ≦Ｓ_Δ≦Ｄ_Ｆであり、Ｄ_Ｆ＝２０のときに、Ｔ_ｓ＝１０程度とする。 T_c is a threshold for determining that the correlation is fluctuating greatly, and T_s is a threshold for determining that the signs of the time differences of the correlation are not aligned. For example, when I_w = 10 and D_F is about 20, T_c is set to a value of about 10. In addition, −D_F ≦ S_Δ ≦ D_F, and when D_F = 20, T_s is set to about 10.

平滑係数切替部は、式（３５）の条件を満たしたときのみ、第八実施形態のβを以下の式によりβ_２に置き換える。
β₂=1-γ(1-β) (36)
γは１以上の実数で、βの値が小さくなることで平滑化の効果が小さくなり、遅延変動への追随が速くなる。例えば、γ＝５．０とする。なお、平滑係数切替部は、βをβ_２に置き換えた後に、上記条件を満たさなくなった場合には、β_２をβに戻す。遅延値算出部８１７は、βまたはβ_２を用いて、式（２２）を計算し、ｃ⁻ _ｆを求める。他の処理は第八実施形態と同様である。 Only when the condition of Equation (35) is satisfied does the smoothing coefficient switching unit replace β of the eighth embodiment with β₂ given by the following equation.
β₂ = 1-γ(1-β) (36)
γ is a real number of 1 or more. As the value of β becomes smaller, the smoothing effect weakens and tracking of delay fluctuations becomes faster. For example, γ = 5.0. If the above condition is no longer satisfied after β has been replaced with β₂, the smoothing coefficient switching unit returns β₂ to β. The delay value calculation unit 817 calculates Equation (22) using β or β₂ to obtain c⁻_f. The other processing is the same as in the eighth embodiment.
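
As a worked example of Equation (36), β = 0.95 and γ = 5.0 give β₂ = 1 − 5.0 × (1 − 0.95) = 0.75, while γ = 1.0 leaves β unchanged. The switching logic can be sketched in Python, with the caveat that the condition of Equation (35) is not shown in this text. The form used below (a summed magnitude of the differences compared with T_c, and |S_Δ| compared with T_s, where S_Δ is the sum of the signs) is an assumption based on the stated roles of the two thresholds. The names `sgn` and `select_beta` are hypothetical, and sgn(0) is taken as 1.

```python
def sgn(x):
    """Sign of x as 1 or -1 (zero is treated as positive by assumption)."""
    return 1 if x >= 0 else -1


def select_beta(delta_c, beta=0.95, gamma=5.0, t_c=10.0, t_s=10):
    """Return beta_2 when a delay change is detected, otherwise beta.

    delta_c[f] is the correlation difference for frame lag f.
    """
    s_delta = sum(sgn(d) for d in delta_c)  # lies between -D_F and D_F
    total_change = sum(abs(d) for d in delta_c)  # assumed magnitude measure
    # Assumed form of condition (35): large change with mixed signs across f.
    delay_changed = total_change >= t_c and abs(s_delta) <= t_s
    return 1.0 - gamma * (1.0 - beta) if delay_changed else beta
```

Large differences that all share one sign leave β unchanged in this sketch, which matches the description that a real delay change makes some lags rise while others fall.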

＜効果＞
このような構成とすることで第八実施形態と同様の効果を得ることができる。なお、第八実施形態において、ｃ_ｆの時間変化を平滑化すると述べているが、平滑化をかければかけるほど遅延変動に対して追随が遅くなるというデメリットがあるが、本実施形態であれば、遅延が変動した際に追随を速くし、遅延が変動していない場合は平滑化を強めにして外乱に強くするという処理を遅延値算出部に追加している。 <Effect>
With this configuration, the same effects as in the eighth embodiment can be obtained. The eighth embodiment smooths the time variation of c_f, but the stronger the smoothing, the slower the tracking of delay fluctuations. The present embodiment adds to the delay value calculation unit processing that speeds up tracking when the delay changes and, when the delay is not changing, strengthens the smoothing to increase robustness against disturbances.

［シミュレーション結果］
図３３、図３４に第五、七、八、十実施形態を組み合わせた構成の遅延推定装置（但し、γ＝１．０とし、第四実施形態の変形例で説明した無音区間判定部４１３を備える）の計算機上のシミュレーション結果を示す。再生信号は１６ｋＨｚサンプリングの音声データで、Ｌ＝１６０（＝１０ｍｓ）、Ｄ_Ｆ＝２０、Ａ＝５（第五実施形態の間引き）、Ｔ_ｓｕｍ＝６、β＝０．９５（平滑係数）とした。遅延を１２．５秒と４２秒の位置で変動させ、相関の変化と推定遅延の推移をプロットした。図３４の推定遅延のグラフは、図３３の各時刻で最大である相関から現在の遅延値を計算したものである。図３３は３通りの遅延に対応する相関値の変動を表し、ｃ_１、ｃ_６、ｃ_１1、はそれぞれ１０ｍｓ、６０ｍｓ、１１０ｍｓの遅延に対応する相関値の変動を表す。０秒から１２．５秒までは遅延は１０ｍｓ程度であり、ｃ１の値（太線）が最大になれば正しい遅延が推定されることになる。図３３のプロットもそのようになっている。また図３４プロットも遅延真値と推定遅延値が一致している。同様に、１２．５秒から４２秒は遅延が１１０ｍｓ程度、４２秒から６０秒までは遅延が６０ｍｓ程度であり、それぞれ正しい遅延（極太線、太点線）が推定されている。ただし、推定遅延値が遅延の推定値になるには１０秒程度の推定時間がかかっている。 [simulation result]
FIGS. 33 and 34 show computer simulation results for a delay estimation apparatus that combines the fifth, seventh, eighth, and tenth embodiments (with γ = 1.0, and including the silent section determination unit 413 described in the modification of the fourth embodiment). The reproduction signal was speech data sampled at 16 kHz, with L = 160 (= 10 ms), D_F = 20, A = 5 (the thinning of the fifth embodiment), T_sum = 6, and β = 0.95 (smoothing coefficient). The delay was changed at the 12.5-second and 42-second points, and the change in correlation and the transition of the estimated delay were plotted. The estimated delay graph in FIG. 34 is the current delay value calculated from the correlation that is largest at each time in FIG. 33. FIG. 33 shows the variation of the correlation values for three delays: c_1, c_6, and c_11 are the correlation values corresponding to delays of 10 ms, 60 ms, and 110 ms, respectively. From 0 to 12.5 seconds the delay is about 10 ms, so the correct delay is estimated if the value of c_1 (thick line) is the largest, and the plot in FIG. 33 shows this. In the plot of FIG. 34 as well, the true delay and the estimated delay agree. Similarly, the delay is about 110 ms from 12.5 to 42 seconds and about 60 ms from 42 to 60 seconds, and the correct delay (extra-thick line, thick dotted line) is estimated in each case. However, it takes about 10 seconds of estimation time for the estimated delay to reach the new delay value.
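
The correspondence between the frame indices and the delays quoted above follows from the frame length: L = 160 samples at 16 kHz is 10 ms, so frame index f corresponds to f × 10 ms. A small Python check, with a hypothetical helper name:

```python
def frame_index_to_ms(f, frame_len=160, sample_rate_hz=16000):
    """Delay in milliseconds represented by frame index f."""
    return 1000.0 * f * frame_len / sample_rate_hz
```

For f = 1, 6, and 11 this gives 10 ms, 60 ms, and 110 ms, matching c_1, c_6, and c_11.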

同様の実験を、第十実施形態のγ＝５．０として実験を行った。図３５、図３６に結果を示す。遅延が変動した際の相関値の増加・減少の傾斜が大きくなっており、遅延の変動にすばやく追従している。そのため、図３６の推定遅延値も、実際の遅延変動から２秒程度で推定が行えている。追従を大きくするには、βの値をもともと小さくしておけばよいが、そうすると遅延変動が起きていない部分の推定値の変動まで大きくなってしまう。この実験では、２回の遅延変動の周辺以外は安定した相関の計算が行われているため、推定速度と安定性の両立が行えている。 The same experiment was run with γ = 5.0 for the tenth embodiment. The results are shown in FIGS. 35 and 36. The slopes of the increase and decrease in the correlation values when the delay changes are steeper, and the changes in delay are tracked quickly. As a result, the estimated delay in FIG. 36 is also obtained within about 2 seconds of the actual delay change. Tracking could be made faster by using a smaller β from the start, but that would also increase the fluctuation of the estimate in sections where the delay is not changing. In this experiment, the correlation is calculated stably except around the two delay changes, so estimation speed and stability are both achieved.

＜プログラム及び記録媒体＞
上述した遅延推定装置は、コンピュータにより機能させることもできる。この場合はコンピュータに、目的とする装置（各種実施例で図に示した機能構成をもつ装置）として機能させるためのプログラム、またはその処理手順（各実施例で示したもの）の各過程をコンピュータに実行させるためのプログラムを、ＣＤ−ＲＯＭ、磁気ディスク、半導体記憶装置などの記録媒体から、あるいは通信回線を介してそのコンピュータ内にダウンロードし、そのプログラムを実行させればよい。 <Program and recording medium>
The delay estimation apparatus described above can also be implemented by a computer. In this case, a program for causing the computer to function as the target apparatus (an apparatus having the functional configuration shown in the drawings of the embodiments), or a program for causing the computer to execute each step of its processing procedure (as shown in the embodiments), may be downloaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage device, or via a communication line, and then executed.

＜その他の変形例＞
本発明は上記の実施形態及び変形例に限定されるものではない。例えば、上述の各種の処理は、記載に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。その他、本発明の趣旨を逸脱しない範囲で適宜変更が可能である。 <Other variations>
The present invention is not limited to the above-described embodiments and modifications. For example, the various processes described above are not only executed in time series according to the description, but may also be executed in parallel or individually as required by the processing capability of the apparatus that executes the processes. In addition, it can change suitably in the range which does not deviate from the meaning of this invention.

Claims

An echo cancellation method comprising:
obtaining, as a frame reproduction signal, a sequence of r consecutive samples (where r is plural) of a time-domain digital reproduction signal starting from a discrete time t;
obtaining, as frame sound collection signals, sequences of r consecutive samples of a time-domain digital sound collection signal, each starting from one of a plurality of different times including the discrete time t;
converting the frame reproduction signal into a frequency-domain signal to obtain a frequency-domain reproduction signal;
converting each of the plurality of frame sound collection signals into a frequency-domain signal to obtain a plurality of frequency-domain sound collection signals;
calculating an index of similarity between the frequency-domain reproduction signal and each of the plurality of frequency-domain sound collection signals;
obtaining, as a delay value, the difference between the times corresponding to the frequency-domain reproduction signal and the frequency-domain sound collection signal for which the calculated index indicates the highest similarity;
delaying the reproduction signal based on the delay value; and
erasing the echo signal from the sound collection signal using the delayed reproduction signal.

An echo cancellation method comprising:
obtaining, as frame reproduction signals, sequences of r consecutive samples (where r is plural) of a time-domain digital reproduction signal, each starting from one of a plurality of different times including a discrete time t;
obtaining, as a frame sound collection signal, a sequence of r consecutive samples of a time-domain digital sound collection signal starting from the discrete time t;
converting each of the plurality of frame reproduction signals into a frequency-domain signal to obtain a plurality of frequency-domain reproduction signals;
converting the frame sound collection signal into a frequency-domain signal to obtain a frequency-domain sound collection signal;
calculating an index of similarity between the frequency-domain sound collection signal and each of the plurality of frequency-domain reproduction signals;
obtaining, as a delay value, the difference between the times corresponding to the frequency-domain sound collection signal and the frequency-domain reproduction signal for which the calculated index indicates the highest similarity;
delaying the reproduction signal based on the delay value; and
erasing the echo signal from the reproduction signal using the delayed reproduction signal.

An echo cancellation method comprising:
obtaining, as a frame reproduction signal, a sequence of r consecutive samples (where r is plural) of a time-domain digital reproduction signal starting from a discrete time t;
obtaining, as frame sound collection signals, sequences of r consecutive samples of a time-domain digital sound collection signal, each starting from one of a plurality of different times including the discrete time t;
calculating an index of similarity between the frame reproduction signal and each of the plurality of frame sound collection signals;
obtaining, as a delay value, the difference between the times corresponding to the frame reproduction signal and the frame sound collection signal for which the calculated index indicates the highest similarity;
delaying the reproduction signal based on the delay value; and
erasing the echo signal from the sound collection signal using the delayed reproduction signal.

An echo canceller that estimates a delay amount of a reproduction signal using an echo signal included in a sound collection signal, comprising:
a correlation value calculation unit that obtains a correlation value between the time-domain reproduction signal and the time-domain sound collection signal for each sample of each frame while changing the frame number and the sample number of the sound collection signal;
a delay value calculation unit that calculates a delay value using the frame number and the sample number of the sound collection signal at which the correlation value is maximized;
a signal accumulation unit that delays the reproduction signal based on the delay value; and
an echo cancellation unit that cancels the echo signal from the sound collection signal using the delayed reproduction signal.

The echo canceller according to claim 4, wherein
the correlation value calculation unit
sums the time-domain reproduction signal over each predetermined range and sums the time-domain sound collection signal over each predetermined range,
obtains an area correlation value between the summed reproduction signal and the summed sound collection signal for each predetermined range of each frame, and
obtains the correlation value between the time-domain reproduction signal and the time-domain sound collection signal for each sample of each frame while changing the frame number of the sound collection signal and changing the sample number within a range of several samples before and after the predetermined range at which the area correlation value is maximized.

An echo canceller that estimates a delay amount of a reproduction signal using an echo signal included in a sound collection signal, comprising:
a correlation value calculation unit that, using the frequency-domain reproduction signal and the frequency-domain sound collection signal, obtains a correlation value for each sample of each frame while changing the frame number of the sound collection signal;
a delay value calculation unit that calculates a delay value using the frame number and the sample number of the sound collection signal at which the correlation value is maximized;
a signal accumulation unit that delays the reproduction signal based on the delay value; and
an echo cancellation unit that cancels the echo signal from the sound collection signal using the delayed reproduction signal.

The echo canceller according to claim 6, further comprising
a second signal output unit that outputs the past several frames, including the current frame, of the reproduction signals accumulated in the signal buffer, wherein
the correlation value calculation unit uses the reproduction signals of the past several frames and the sound collection signal to obtain a correlation value for each sample of each combination of reproduction signal and sound collection signal while changing the frame number of the reproduction signals of the past several frames and the frame number of the sound collection signal.

The echo canceller according to any one of claims 4 to 7, further comprising
a correlation accumulation unit that stores the correlation value, wherein
β is a real number of 0 or more and 1 or less, and the delay value calculation unit uses the sample number n_max of the sound collection signal at which the correlation value is maximized and the stored correlation value c⁻_{f old} to smooth the correlation value c_f as

and calculates the delay value using the smoothed correlation value c⁻_f and the frame number of the sound collection signal at which the correlation value c⁻_f is maximized.

The echo canceller according to claim 6 or 7, wherein
the correlation value calculation unit generates a gain according to the magnitude of the frequency-domain reproduction signal X_m and, using the gain, the reproduction signal X_m, and the sound collection signal, obtains the correlation value for each sample of each frame while changing the frame number of the sound collection signal.

The echo canceller according to claim 7, wherein,
when outputting the past several frames, including the current frame, of the reproduction signals accumulated in the signal buffer, the second signal output unit outputs one past frame in every A frames.

The echo canceller according to any one of claims 4 to 10, further comprising
a delay output unit that receives a predetermined number of the delay values and outputs the most frequent delay value as a delay estimate, wherein
the signal accumulation unit delays the reproduction signal according to the delay estimate obtained based on the delay values.

The echo canceller according to any one of claims 4 to 10, further comprising
a delay output unit that, where α is an attenuation coefficient, uses the delay value d_max and the previously estimated delay estimate d'_est to output the current delay estimate d_est as
d_est = (1-α)d_max + αd'_est (20)

The echo canceller according to claim 11, wherein
the delay value calculation unit calculates, as the delay value, the frame number of the sound collection signal at which the correlation value is maximized.

A program for causing a computer to execute the echo cancellation method according to any one of claims 1 to 3.