JP6502307B2 - Echo cancellation apparatus, method and program therefor

Info

Publication number: JP6502307B2
Application number: JP2016219916A
Authority: JP
Inventors: 島内 末廣
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-11-10
Filing date: 2016-11-10
Publication date: 2019-04-17
Anticipated expiration: 2036-11-10
Also published as: JP2018078490A

Description

The present invention relates to a method of canceling echo caused by acoustic coupling between a speaker and a microphone.

When a speaker and a microphone are placed in the same space, the sound reproduced by the speaker leaks into the microphone, so an echo is mixed into the collected sound signal picked up by the microphone. Echo cancellation methods for removing the echo from the collected sound signal have long been devised (see Non-Patent Document 1).

A conventional echo cancellation method is described with reference to FIG. 1. FIG. 1 is a functional block diagram of an echo canceller that estimates the transfer characteristic of the acoustic transfer path in the frequency domain. Here, the collected sound signal y(n) is divided into signal frames of N samples each, echo cancellation is performed with one frame as the processing unit, and the effective length of the impulse response of the acoustic transfer path is assumed to be L samples.

First, the frequency domain transform unit 10 acquires the input signals x(n-N-L+1), x(n-N-L+2), …, x(n) to the speaker 3, from (N+L-1) samples in the past up to the current time n, and applies a discrete Fourier transform to obtain the frequency-domain input signals X(0), X(1), …, X(N+L-1).

Next, for each frequency index k (k=0,1,…,N+L-1), the echo estimation unit 20 computes the frequency-domain echo estimate Y^(k)=H(k)X(k), where H(k) is the estimate of the transfer characteristic of the acoustic transfer path in the frequency domain.

The time domain transform unit 30 applies an inverse discrete Fourier transform to the frequency-domain echo estimate Y^(k) to obtain the time-domain echo estimates y^(n-N-L+1), y^(n-N-L+2), …, y^(n). It then extracts the last N samples y^(n-N+1), y^(n-N+2), …, y^(n) and outputs them to the subtraction unit 40.

The subtraction unit 40 obtains the residual signals e(n-N+1), e(n-N+2), …, e(n), which are the differences between the N samples of the collected sound signal y(n-N+1), y(n-N+2), …, y(n) and the time-domain echo estimates y^(n-N+1), y^(n-N+2), …, y^(n). These residual signals are also output to the transmitting end 5 as the echo-cancelled signal.

The frequency domain transform unit 50 prepends a zero sequence of L points to the residual sequence e(n-N+1), e(n-N+2), …, e(n) and applies a discrete Fourier transform to obtain the (N+L)-point frequency-domain residual signal E(k).

The transfer path characteristic update unit 60 updates the estimate H(k) of the transfer characteristic of the acoustic transfer path at each frequency index k according to the following equation.

Here, μ_k is a step size that adjusts the amount of update, P(k) is a normalization coefficient that depends on the power of the input signal, and ^* denotes complex conjugation.
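The update equation itself is not reproduced in this text. A form consistent with the symbols defined here (step size μ_k, normalization coefficient P(k), complex conjugate ^*) and with the signals X(k) and E(k) introduced earlier would be the normalized frequency-domain update below. It is offered as an assumption, not as the patent's verbatim equation.

```latex
H(k) \leftarrow H(k) + \mu_k \, \frac{X^{*}(k)\, E(k)}{P(k)}, \qquad k = 0, 1, \dots, N+L-1
```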

Once the discrete time corresponding to N samples has elapsed, the echo canceller 90 returns to the processing of the frequency domain transform unit 10, acquires the input signal x(n) and the collected sound signal y(n) anew, and repeats the processing.

C. Breining et al., "Acoustic echo control. An application of very-high-order adaptive filters", in IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 42-69, Jul 1999.

Many echo cancellation methods were devised for voice calls, where the processing delay of echo cancellation must be kept low for conversation to proceed smoothly. The echo estimate therefore has to be computed successively from a small number of samples of the collected sound signal, and the echo-cancelled signal output. In most cases, the echo is estimated by first estimating the impulse response of the acoustic transfer path between the speaker and the microphone and then applying it to the input signal x(n) to the speaker 3. The impulse response is several tens to several hundreds of milliseconds long, and all of its sample values must be estimated with reference to the few samples of the collected sound signal supplied at each step and the samples of the input signal to the speaker 3. Each sample of the impulse response is thus estimated by successively solving an indeterminate system of simultaneous equations in which the number of unknowns (the number of impulse response samples) exceeds the number of equations to be satisfied (the number of collected sound samples). As a result, conventional echo cancellation methods need a certain amount of time before the echo is actually canceled sufficiently. In the processing of FIG. 1 as well, signals must be input for a certain period before the estimate H(k) of the transfer characteristic at each frequency index k is estimated properly and the echo is canceled sufficiently.

An object of the present invention is to provide an echo canceller, a method, and a program that can, for a given sequence of collected sound samples, immediately generate a sequence of echo-cancelled signals in which the echo is canceled more appropriately than before.

To solve the above problem, according to one aspect of the present invention, an echo canceller, in which the number N of samples forming the processing unit of echo cancellation is larger than the number of samples in the assumed effective length of the impulse response, includes: a first frequency domain transform unit that, every N samples, transforms a time-domain input signal x into a frequency-domain input signal X; an echo estimation unit that, with p denoting the index of the iteration count, obtains a frequency-domain echo estimate Y^_p from the input signal X and an estimate H_p of the transfer characteristic of the acoustic transfer path in the frequency domain; a time domain transform unit that transforms the echo estimate Y^_p into a time-domain echo estimate y^_p; a subtraction unit that obtains a time-domain residual signal e_p, which is the difference between a time-domain collected sound signal y and the echo estimate y^_p; a second frequency domain transform unit that transforms the time-domain residual signal e_p into a frequency-domain residual signal E_p; a transfer path characteristic update unit that updates the estimate H_p using the input signal X and the residual signal E_p to obtain an updated estimate H_{p+1}; and a control unit that, using the input signal X and the collected sound signal y, causes the processing in the echo estimation unit, the time domain transform unit, the subtraction unit, the second frequency domain transform unit, and the transfer path characteristic update unit to be repeated until the updated estimate H_{p+1} converges.

To solve the above problem, according to another aspect of the present invention, an echo cancellation method, in which the number N of samples forming the processing unit of echo cancellation is larger than the number of samples in the assumed effective length of the impulse response, includes: a first frequency domain transform step of transforming, every N samples, a time-domain input signal x into a frequency-domain input signal X; an echo estimation step of obtaining, with p denoting the index of the iteration count, a frequency-domain echo estimate Y^_p from the input signal X and an estimate H_p of the transfer characteristic of the acoustic transfer path in the frequency domain; a time domain transform step of transforming the echo estimate Y^_p into a time-domain echo estimate y^_p; a subtraction step of obtaining a time-domain residual signal e_p, which is the difference between a time-domain collected sound signal y and the echo estimate y^_p; a second frequency domain transform step of transforming the time-domain residual signal e_p into a frequency-domain residual signal E_p; and a transfer path characteristic update step of updating the estimate H_p using the input signal X and the residual signal E_p to obtain an updated estimate H_{p+1}. Using the input signal X and the collected sound signal y, the processing in the echo estimation step, the time domain transform step, the subtraction step, the second frequency domain transform step, and the transfer path characteristic update step is repeated until the updated estimate H_{p+1} converges.

According to the present invention, for a given sequence of collected sound samples, a sequence of echo-cancelled signals in which the echo is canceled more appropriately than before can be generated immediately.

FIG. 1 is a functional block diagram of an echo canceller according to the prior art.
FIG. 2 is a functional block diagram of the echo canceller according to the first embodiment.
FIG. 3 is a diagram showing an example of the processing flow of the echo canceller according to the first embodiment.
FIG. 4 is a diagram for explaining a processing example of the transfer path characteristic update unit according to the first embodiment.
FIG. 5 is a diagram for explaining a processing example of the transfer path characteristic update unit according to a modification of the first embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted. In the following description, symbols such as "^" used in the text should properly be written directly above the preceding character, but because of the limitations of text notation they are written immediately after that character. In equations, these symbols are written in their proper positions. Processing performed on each element of a vector or matrix is applied to all elements of that vector or matrix unless otherwise noted.

<Key points of the first embodiment>
The echo cancellation method of this embodiment assumes that the number of equations to be satisfied (the number N of collected sound samples) is larger than the number of unknowns (the number L of impulse response samples). In other words, this embodiment considers the case where a small amount of additional delay is allowed in the output of the echo-cancelled signal. Consider, for example, not a voice call but a speech recognition system in which guidance speech or a notification tone prompting the speaker to talk is played from the speaker; the echo that leaks into the recording microphone is to be canceled so that only the talker's speech is passed to the speech recognition system. In some such uses, a slightly longer delay before the echo-cancelled signal is output is acceptable than in a voice call. Specifically, whereas in the prior art of FIG. 1 the number N of collected sound samples handled per processing unit corresponds to about 10 ms for voice calls, here N is assumed to be settable to about 100 ms or several hundred milliseconds.

The prior art of FIG. 1 can still be applied in this case, but if it is applied unchanged, the problem that the echo is not sufficiently canceled for a certain period remains. In a voice call, the number of unknowns (L) exceeds the number of equations to be satisfied (N); the task is to solve an underdetermined system of simultaneous equations with L>N, and the difficulty is that the estimate of the transfer characteristic cannot be determined uniquely. In this embodiment, by contrast, the task becomes solving an overdetermined system with L<N, so the estimate of the transfer characteristic can be determined uniquely, for example by the least squares method. The estimate obtained by the prior art of FIG. 1, however, is not the least squares solution, which is why the echo still goes insufficiently canceled for a certain period. Actually applying the least squares method requires inverting the correlation matrix of the input signal, and inverting a matrix of very high dimension in a numerically stable way is very difficult. This embodiment therefore does not compute the inverse directly, but brings the estimate of the transfer characteristic recursively closer to the least squares solution.
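The idea of approaching the least squares solution by re-evaluating the residual on the same samples can be illustrated with a toy example. The sketch below is not the frequency-domain procedure of this embodiment: it assumes a plain time-domain gradient iteration, and the function names (`echo`, `refine`), the step size `mu`, and the signal values are all hypothetical. With N = 6 equations and L = 2 unknowns, repeated passes over one block drive the estimate to the least squares solution without forming an inverse matrix.

```python
def echo(h, x, n):
    """Echo sample at time n: sum_j h[j] * x[n - j], with x taken as zero before time 0."""
    return sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)

def refine(x, y, L, mu, iterations):
    """Repeat a gradient step on the SAME block of samples (x, y)."""
    h = [0.0] * L
    for _ in range(iterations):
        # Residual of the current estimate on the whole block.
        e = [y[n] - echo(h, x, n) for n in range(len(y))]
        # Gradient of 0.5 * sum(e^2) with respect to each tap (sign flipped).
        grad = [sum(e[n] * x[n - j] for n in range(j, len(y))) for j in range(L)]
        h = [h[j] + mu * grad[j] for j in range(L)]
    return h

x = [1.0, -0.5, 0.25, 0.8, -1.0, 0.3]   # N = 6 input samples (illustrative values)
h_true = [0.6, -0.3]                     # L = 2 unknown impulse response samples
y = [echo(h_true, x, n) for n in range(len(x))]  # noise-free collected signal
h_est = refine(x, y, L=2, mu=0.2, iterations=200)
```

Because the toy signal is noise-free, the least squares solution coincides with `h_true`, so `h_est` ends up very close to `[0.6, -0.3]`. The step size 0.2 was chosen by hand to be small enough for this particular input.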

Most conventional echo cancellation methods, being intended for voice calls, estimate the transfer characteristic of the acoustic transfer path between the speaker and the microphone by solving an underdetermined system of simultaneous equations whose solution is not uniquely determined at that point in time, under the requirement that the delay of the output signal (the echo-cancelled signal) be small. Estimating the transfer characteristic therefore takes a certain amount of time, and there are intervals, at the start of processing and immediately after the transfer characteristic changes, during which the echo cannot be canceled sufficiently. This embodiment starts from the observation that, when echo cancellation is applied to a use in which the number of equations to be satisfied (the number N of collected sound samples) can be made larger than the number of unknowns (the number L of impulse response samples), for example a use other than voice calls, such as a speech recognition system, where the requirement on output delay is relaxed, estimation of the transfer characteristic can be treated as solving an overdetermined system whose solution is uniquely determined each time. Even under the changed requirement, however, simply applying a conventional echo cancellation method as is does not amount to solving the overdetermined system, and echo cancellation can still be insufficient; a new method of obtaining the least squares solution of the overdetermined system was therefore devised. To avoid the numerical instability of directly inverting the very high-dimensional correlation matrix of the input signal, which the least squares method normally requires, this method adds recursive evaluation of the residual signal using the same samples of the input signal and the collected sound signal, an operation that was meaningless when solving an underdetermined system. Under the overdetermined condition, the added recursive evaluation of the residual signal serves as a numerically stable substitute for directly computing the inverse matrix in the least squares method, and enables more accurate estimation of the transfer characteristic of the acoustic transfer path.

<Echo canceller 100>
FIG. 2 is a functional block diagram of the echo canceller 100 according to the first embodiment, and FIG. 3 shows its processing flow.

The echo canceller 100 includes a frequency domain transform unit 110, an echo estimation unit 120, a time domain transform unit 130, a subtraction unit 140, a frequency domain transform unit 150, a transfer path characteristic update unit 160, and a control unit 170.

The echo canceller 100 receives the input signal x(n) supplied via the receiving end 2 and the collected sound signal y(n) picked up by the microphone 4. The speaker 3 and the microphone 4 are placed in the same space. The input signal x(n) is reproduced by the speaker 3, and because the reproduced sound leaks into the microphone 4, an echo is mixed into the collected sound signal y(n) picked up by the microphone 4. The echo canceller 100 removes the estimated echo from the collected sound signal y(n) to obtain the echo-cancelled signal (the error signal e_P(n)) and outputs it to the transmitting end 5. The receiving end 2 and the transmitting end 5 are connected, for example, to a dialogue system that speaks while performing speech recognition and thereby carries on a dialogue with the user.

The processing of each unit is described below.

<Frequency domain transform unit 110>
The frequency domain transform unit 110 receives the input signal x(n). Each time N samples of x(n) have been acquired, it takes the (N+L) time-domain input signals x(n-N-L+1), x(n-N-L+2), …, x(n) to the speaker 3, covering N+L-1 samples in the past up to the current time n, transforms them into the frequency-domain input signals X(0), X(1), …, X(N+L-1) (S110), and outputs them to the echo estimation unit 120 and the transfer path characteristic update unit 160. The transform is, for example, a discrete Fourier transform. L is the number of samples of the assumed impulse response, and N is any integer larger than L. With k=0,1,…,N+L-1, X(0), X(1), …, X(N+L-1) are also written X(k). The past input signals x(n-N-L+1), x(n-N-L+2), …, x(n-1) are kept in a storage unit (not shown) and read from it each time N samples have been acquired. Other data (acquired signals, computed signals, and so on) may likewise be kept in the storage unit (not shown) and read as needed.
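The framing performed in S110 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `dft` and `input_spectrum`, the list `x_history` standing in for the storage unit, and the sample values are hypothetical, and a naive O(M^2) DFT is used only to keep the example free of external libraries.

```python
import cmath

def dft(seq):
    """Naive discrete Fourier transform; adequate for a small illustration."""
    M = len(seq)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * m / M) for m, v in enumerate(seq))
            for k in range(M)]

def input_spectrum(x_history, n, N, L):
    """Take x(n-N-L+1), ..., x(n) (N+L samples) and return X(0), ..., X(N+L-1)."""
    frame = x_history[n - N - L + 1 : n + 1]
    assert len(frame) == N + L, "not enough stored input samples"
    return dft(frame)

N, L = 6, 2                                      # N > L, as the embodiment requires
x_history = [float(i % 5) for i in range(20)]    # hypothetical stored input samples
X = input_spectrum(x_history, n=15, N=N, L=L)    # uses samples 8..15
```

The frame always spans N+L samples, so consecutive frames taken every N samples overlap by L samples.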

<Echo estimation unit 120>
The echo estimation unit 120 receives the input signal X(k) and the estimate H_p(k) of the transfer characteristic of the acoustic transfer path, uses them to obtain the frequency-domain echo estimate Y^_p(k) (S120), and outputs it to the time domain transform unit 130. For example, the echo estimate Y^_p(k) is obtained by the following equation.
Y^_p(k)=H_p(k)X(k) (11)
As noted above, this embodiment adds recursive evaluation of the residual signal using the same samples of the input signal and the collected sound signal. p is the index of the iteration count in this recursive evaluation, with p=1,2,…,P.

<Time domain transform unit 130>
The time domain transform unit 130 receives the frequency-domain echo estimate Y^_p(k), transforms Y^_p(0), Y^_p(1), …, Y^_p(N+L-1) into the time-domain echo estimates y^_p(n-N-L+1), y^_p(n-N-L+2), …, y^_p(n) (S130), and outputs them to the subtraction unit 140. It may, for example, extract only the last N samples y^_p(n-N+1), y^_p(n-N+2), …, y^_p(n) and output those to the subtraction unit 140. The time domain transform unit 130 uses the inverse of the transform used by the frequency domain transform unit 110, for example an inverse discrete Fourier transform.

<Subtraction unit 140>
The subtraction unit 140 receives the echo estimates y^_p(n-N-L+1), y^_p(n-N-L+2), …, y^_p(n) and the collected sound signal y(n). It obtains the residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) (S140), which are the differences between the last N samples of the echo estimates, y^_p(n-N+1), y^_p(n-N+2), …, y^_p(n), and the collected sound signals y(n-N+1), y(n-N+2), …, y(n) from N-1 samples in the past up to the current time n, and outputs them to the frequency domain transform unit 150. For example, e_p(i)=y(i)-y^_p(i) (i=n-N+1,n-N+2,…,n). When p=P, the residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) are also output to the transmitting end 5 as the echo-cancelled signal.

<Frequency domain transform unit 150>
The frequency domain transform unit 150 receives the time-domain residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n), transforms them into the frequency-domain residual signals E_p(0), E_p(1), …, E_p(N+L-1) (S150), and outputs them to the transfer path characteristic update unit 160. The frequency domain transform unit 150 uses the same kind of transform as the frequency domain transform unit 110. For example, it prepends a zero sequence of L points to the residual sequence e_p(n-N+1), e_p(n-N+2), …, e_p(n) and applies a discrete Fourier transform to obtain the (N+L)-point frequency-domain residual signal E_p(k).

<Transfer path characteristic update unit 160>
The transfer path characteristic update unit 160 receives the input signals X(0), X(1), …, X(N+L-1) and the residual signals E_p(0), E_p(1), …, E_p(N+L-1), uses them to update the estimate H_p(k) (S160), and outputs the updated estimate H_{p+1}(k) to the echo estimation unit 120. For example, the estimate H_p(k) is updated by the following equation.
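Equation (12) is not reproduced in this text. A form consistent with the inputs X(k) and E_p(k) and with the step size, normalization coefficient, and complex conjugate used in the update would be the normalized update below. It is offered as an assumption, not as the patent's verbatim equation.

```latex
H_{p+1}(k) = H_{p}(k) + \mu_k \, \frac{X^{*}(k)\, E_{p}(k)}{P(k)} \tag{12}
```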

Here, μ_k is a step size that adjusts the amount of update, P(k) is a normalization coefficient that depends on the power of the input signal, and ^* denotes complex conjugation. When the frequency-domain estimate H_{p+1}(k) is obtained, a time-domain constraint may be applied in updating the transfer characteristic so that the time-domain estimates h_{p+1}(0), h_{p+1}(1), …, h_{p+1}(N+L-1), obtained by transforming H_{p+1}(k) to the time domain, are zero in every element other than the first-half samples. For example, (i) the updated estimate H_{p+1}(k) may be transformed directly to the time domain, its elements other than the first-half samples replaced with zeros, and the result transformed back to the frequency domain; or (ii) only the update term, the second term on the right-hand side of equation (12), may be transformed to the time domain, its elements other than the first-half samples replaced with zeros, and the result transformed back to the frequency domain before the update of equation (12) is applied. Replacing these elements with zeros removes the noise that arises in the latter-half samples shown in FIG. 4A (see FIG. 4B) and enables more stable estimation of the transfer characteristic.
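Option (i) of the time-domain constraint can be sketched as below. The helper names `dft` and `constrain` and the parameter `keep` (how many leading samples count as the retained part) are hypothetical; the text only says that elements other than the first-half samples are set to zero, so the exact boundary used here is an assumption.

```python
import cmath

def dft(seq, inverse=False):
    """Naive DFT, or inverse DFT when inverse=True; adequate for a small illustration."""
    M = len(seq)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / M) for m, v in enumerate(seq))
           for k in range(M)]
    return [v / M for v in out] if inverse else out

def constrain(H, keep):
    """Transform H to the time domain, zero every tap at index >= keep, transform back."""
    h = dft(H, inverse=True)
    h = [t if i < keep else 0.0 for i, t in enumerate(h)]
    return dft(h)

# Hypothetical estimate whose later taps (indices 4..6) carry small noise.
H = dft([0.5, -0.2, 0.0, 0.0, 0.01, -0.02, 0.03, 0.0])
H_c = constrain(H, keep=4)
h_c = dft(H_c, inverse=True)   # later taps are now zero, leading taps unchanged
```

Option (ii) would apply the same zeroing to the update term alone before adding it to H_p(k); since the transform is linear, the two options give the same result whenever H_p(k) already satisfies the constraint.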

<Control unit 170>
The control unit 170 performs control so that the processing S120, S130, S140, S150, and S160 in the echo estimation unit 120, the time domain transform unit 130, the subtraction unit 140, the frequency domain transform unit 150, and the transfer path characteristic update unit 160 is repeated until a predetermined condition is satisfied (S170). Regardless of the iteration count p, the processing S120 and S160 in the echo estimation unit 120 and the transfer path characteristic update unit 160 uses the same input signals X(0), X(1), …, X(N+L-1), and the processing S140 in the subtraction unit 140 uses the same collected sound signals y(n), y(n-1), …, y(n-N+1).

The predetermined condition may be any condition by which it can be checked whether the estimated value H_{p+1}(k) has converged. For example, convergence can be checked by determining (i) whether the processing has been repeated a predetermined number of times (if so, H_{p+1}(k) is assumed to have converged), (ii) whether the processing has been repeated for a predetermined time (if so, H_{p+1}(k) is assumed to have converged), or (iii) whether the difference between the estimated value H_{p+1}(k) and the estimated value H_p(k) is smaller than a predetermined threshold. That is, the estimated value H_{p+1}(k) is determined to have converged (i) when the processing has been repeated the predetermined number of times, (ii) when the processing has been repeated for the predetermined time, or (iii) when the difference between H_{p+1}(k) and H_p(k) is smaller than the predetermined threshold.

In case (i), the control unit 170 counts the number of times p that the processes S120, S130, S140, S150, and S160 have been performed, and transmits a control signal to each unit so that the above processing is repeated until the predetermined number P is exceeded.

In case (ii), the control unit 170 measures the time elapsed since the first process (for example, the echo estimation process S120) was performed, and transmits a control signal to each unit so that the above processing is repeated until a predetermined time (for example, the discrete time corresponding to N samples) has elapsed.

In case (iii), the control unit 170 acquires the estimated value H_{p+1}(k) and the estimated value H_p(k), determines whether their difference is larger than a predetermined threshold, and, if it is larger, transmits a control signal to each unit so that the above processing is repeated.

These conditions may be combined. For example, the estimated value H_{p+1}(k) is determined to have converged when condition (i) holds, that is, the processing has been repeated the predetermined number of times, or when condition (iii) holds, that is, the difference between the estimated value H_{p+1}(k) and the estimated value H_p(k) is smaller than the predetermined threshold.
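The combined test of conditions (i) and (iii) can be sketched as follows. The function name `has_converged`, its parameters, and the choice of the summed squared magnitude as the measure of the "difference" between two estimates are assumptions for illustration; the specification does not fix how the difference is measured.

```python
def has_converged(H_next, H_prev, p, max_iters, tol):
    """Return True when condition (i) or condition (iii) holds.

    p: number of passes completed so far; max_iters plays the role of P.
    tol: hypothetical threshold on the change between successive estimates.
    """
    if p >= max_iters:  # condition (i): repetition count reached
        return True
    # Condition (iii): change between H_{p+1}(k) and H_p(k), summed over k.
    change = sum(abs(a - b) ** 2 for a, b in zip(H_next, H_prev))
    return change < tol
```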

The control unit 170 controls the subtraction unit 140 so that, until the condition is satisfied, the residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) are output only to the frequency domain conversion unit 150, and, once the condition is satisfied, they are output to both the frequency domain conversion unit 150 and the transmitting end 5. This embodiment shows an example in which only condition (i) is applied. Accordingly, with p = 1, 2, …, P, the residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) are output only to the frequency domain conversion unit 150 for p = 1, 2, …, P-1, and to the frequency domain conversion unit 150 and the transmitting end 5 for p = P.

When the discrete time corresponding to N samples has elapsed, the echo canceler 100 returns to process S110, reacquires the input signal x(n) and the collected signal y(n), and repeats processes S110 to S170. Each time the discrete time corresponding to N samples elapses, the initial value H_1(k) of the estimated value H_p(k) may either inherit a past estimated value as is (for example, the estimated value H_P(k) obtained last in the preceding frame) or be reset to zero.

The difference from the prior art of FIG. 1 is the addition of process S170: the updated frequency-domain estimate H_p(k) of the transfer characteristic is applied again to the same input signal and collected signal, the residual is evaluated, and the update is repeated recursively. This repetition brings the estimate H_p(k) of the transfer characteristic close to the least-squares solution without computing the inverse of the correlation matrix of the input signal. Therefore, according to this embodiment, when L<N, a signal from which the echo has been sufficiently canceled can be output immediately after the start of processing.
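The repeated pass over a single frame can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not taken from the specification: the update has the normalized form H_{p+1}(k) = H_p(k) + μ X*(k) E_p(k) / P, the normalization P is the largest bin power of X (a conservative choice that makes each pass a plain gradient step), the residual is zero-padded at the front before the second transform, the initial estimate is zero, and only condition (i) is used. The names `dft` and `iterate_frame` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(M^2) discrete Fourier transform; the inverse scales by 1/M."""
    m = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / m) for t in range(m))
           for k in range(m)]
    return [v / m for v in out] if inverse else out


def iterate_frame(x, y, L, num_passes, mu=1.0):
    """Repeat S120 to S160 on one frame.

    x: the N+L most recent input samples; y: the N most recent collected samples.
    Returns (H, energies), where energies[p] is the residual energy at pass p.
    """
    M, N = len(x), len(y)
    X = dft(x)                                    # S110
    P = max(abs(v) ** 2 for v in X)               # assumed normalization
    H = [0j] * M                                  # assumed initial value H_1(k) = 0
    energies = []
    for _ in range(num_passes):                   # S170, condition (i) only
        Y = [h * v for h, v in zip(H, X)]         # S120: echo estimate
        y_hat = [v.real for v in dft(Y, inverse=True)][M - N:]  # S130: last N samples
        e = [a - b for a, b in zip(y, y_hat)]     # S140: residual
        energies.append(sum(v * v for v in e))
        E = dft([0.0] * (M - N) + e)              # S150: zero-padded residual
        step = dft([mu * v.conjugate() * w / P for v, w in zip(X, E)], inverse=True)
        h = dft(H, inverse=True)
        # S160 with the time-domain constraint: keep only the first L taps.
        h = [(a + b).real if i < L else 0.0 for i, (a, b) in enumerate(zip(h, step))]
        H = dft(h)
    return H, energies
```

With this conservative normalization and μ = 1, the residual energy should not increase from one pass to the next, which mirrors the recursive approach to the least-squares solution described above; a per-bin P(k) would typically converge in fewer passes but is not covered by the same simple argument.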

<Effect>
With the above configuration, in an echo cancellation method that performs echo cancellation using, as one processing unit, a collected signal of N samples, where N is larger than the number of samples L of the assumed impulse response of the transfer characteristic between the speaker 3 and the microphone 4, a least-squares solution of the transfer characteristic can be obtained approximately without directly calculating the inverse of the correlation matrix of the input signal. As a result, estimating the transfer characteristic does not require a certain period of signal input, and a signal from which the echo has been sufficiently canceled can be output immediately after the start of processing. In this embodiment, the echo estimation (S120) and the transfer characteristic estimation (S160) are performed in the frequency domain, but these processes may instead be performed in the time domain, at the cost of a larger amount of computation. In that case, the frequency domain conversion unit 110, the time domain conversion unit 130, and the frequency domain conversion unit 150 may be omitted. The present invention can also be applied to echo cancellation when the input signal to the speaker or the collected signal from the microphone has multiple channels, and similar effects can be obtained.

<Modification>
If the impulse response length of the acoustic transfer path can be regarded as effectively shorter than N, the present invention can be applied, and its effect obtained, even when L>N. Specifically, in the sound reproduction or sound collection stage, signal buffering or the like may introduce a delay before the input signal x(n) is actually reproduced from the speaker 3, or before the collected signal y(n) picked up by the microphone 4 is actually acquired for echo cancellation processing. When this delay cannot be known in advance, L may have to be set to a large value that includes the delay time and may therefore have to exceed N. The present invention is applicable in that case as well. The application method is described below, focusing on the parts that differ from the first embodiment.

<Transfer path characteristic update unit 160>
The transfer path characteristic update unit 160 receives the input signals X(0), X(1), …, X(N+L-1) and the residual signals E_p(0), E_p(1), …, E_p(N+L-1), and, when updating the estimated value H_p(k) using these values (S160), applies a constraint so that, in the time domain, the updated estimated value H'_{p+1}(k) can take non-zero values only in L' samples, where L' is the effective length of the impulse response of the acoustic transfer path. For example, the updated estimated value H'_{p+1}(k) is first converted to the time-domain estimated values h'_{p+1}(0), h'_{p+1}(1), …, h'_{p+1}(N+L-1), and, within the first-half L samples, a section containing the maximum peak value is cut out with the length L'. That is, the maximum peak value of the time-domain estimated values h'_{p+1}(0), h'_{p+1}(1), …, h'_{p+1}(N+L-1) is detected, and L' samples are cut out starting from the sample at the peak or from a sample a few samples before the peak. The elements outside the cut-out section are replaced with zeros, and the resulting time-domain estimated values h_{p+1}(0), h_{p+1}(1), …, h_{p+1}(N+L-1) are converted back to the frequency-domain estimated value H_{p+1}(k), thereby constraining the transfer characteristic of the acoustic transfer path in the time domain. N is set so as to satisfy L>N>L'. In this modification, the problem effectively becomes one of solving an overdetermined system of simultaneous equations with L'<N, and, as in the first embodiment, the estimate of the transfer characteristic can be brought recursively closer to the least-squares solution. In addition, replacing with zeros the noise that arises outside the section containing the maximum peak value within the first-half L samples shown in FIG. 5A removes it (see FIG. 5B) and allows a more stable estimation of the transfer characteristic. Variations in the transfer characteristic can be handled more flexibly if this time-domain constraint, which narrows the effective range of the first-half L samples, is not applied in the initial stage of the repetition of S120 to S160. For example, if S120 to S160 are repeated ten times (P=10), the implementation may apply no constraint narrowing the effective range of the first-half L samples during the first five repetitions and, during the last five repetitions, identify the section in which the estimated impulse response of the acoustic transfer path takes its maximum peak value and apply the constraint that zeros everything outside that section.
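The peak-window constraint of this modification can be sketched as follows. The function name `constrain_to_peak_window`, the parameter `lead` (how many samples before the peak the window starts), and the decision to clip the window at the end of the first L samples are assumptions for illustration.

```python
def constrain_to_peak_window(h, L, L_eff, lead=0):
    """Zero every tap outside an L_eff-sample window around the largest peak.

    h: time-domain estimate of length N+L; only the first L taps are searched.
    L_eff: effective impulse response length (L' in the text).
    lead: hypothetical number of samples kept before the peak.
    """
    peak = max(range(L), key=lambda i: abs(h[i]))  # largest peak in first L taps
    start = max(0, peak - lead)
    stop = min(L, start + L_eff)                   # assumed: window stays within L
    return [v if start <= i < stop else 0.0 for i, v in enumerate(h)]
```

To follow the schedule described above, this function would be skipped during the early repetitions (for example, the first five of ten) and applied to the time-domain estimate before the final transform back to the frequency domain in the later ones.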

<Effect>
With this configuration, the same effect as in the first embodiment can be obtained even when L>N.

<Other Modifications>
The present invention is not limited to the above embodiment and modifications. For example, the various processes described above may be executed not only in time series as described, but also in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and Recording Medium>
The various processing functions of each apparatus described in the above embodiment and modifications may be realized by a computer. In that case, the processing content of the functions that each apparatus should have is described by a program. Executing this program on a computer realizes the various processing functions of each apparatus on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first, for example, temporarily stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own storage unit. When executing the processing, the computer reads the program stored in its own storage unit and executes processing in accordance with the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing in accordance with it. Further, each time a program is transferred from the server computer to the computer, the computer may sequentially execute processing in accordance with the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and result acquisition. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).

Although each apparatus is described as being configured by executing a predetermined program on a computer, at least part of the processing content may be realized in hardware.

Claims

An echo canceler comprising:
a first frequency domain conversion unit that, with the number of samples N constituting a processing unit of echo cancellation processing being larger than the number of samples of the effective length of an assumed impulse response, converts a time-domain input signal x into a frequency-domain input signal X every N samples;
an echo estimation unit that, with p being an index indicating a repetition count, obtains a frequency-domain echo estimate Ŷ_p using the input signal X and a frequency-domain estimated value H_p of the transfer characteristic of an acoustic transfer path;
a time domain conversion unit that converts the echo estimate Ŷ_p into a time-domain echo estimate ŷ_p;
a subtraction unit that obtains a time-domain residual signal e_p, which is the difference between a time-domain collected signal y and the echo estimate ŷ_p;
a second frequency domain conversion unit that converts the time-domain residual signal e_p into a frequency-domain residual signal E_p;
a transfer path characteristic update unit that updates the estimated value H_p using the input signal X and the residual signal E_p to obtain an updated estimated value H_{p+1}; and
a control unit that performs control so that the processing in the echo estimation unit, the time domain conversion unit, the subtraction unit, the second frequency domain conversion unit, and the transfer path characteristic update unit is repeated, using the input signal X and the collected signal y, until the updated estimated value H_{p+1} converges.

The echo canceler according to claim 1, wherein
L is the number of samples covering the assumed delay and the effective length of the impulse response, and N is any integer larger than L,
the first frequency domain conversion unit converts, every N samples, the (L+N) time-domain input signals x(n-N-L+1), x(n-N-L+2), …, x(n) into frequency-domain input signals X(0), X(1), …, X(N+L-1),
the echo estimation unit obtains frequency-domain echo estimates Ŷ_p(0), Ŷ_p(1), …, Ŷ_p(N+L-1) using the input signals X(0), X(1), …, X(N+L-1) and the frequency-domain estimated values H_p(0), H_p(1), …, H_p(N+L-1) of the transfer characteristic of the acoustic transfer path,
the time domain conversion unit converts the echo estimates Ŷ_p(0), Ŷ_p(1), …, Ŷ_p(N+L-1) into time-domain echo estimates ŷ_p(n-N-L+1), ŷ_p(n-N-L+2), …, ŷ_p(n),
the subtraction unit obtains time-domain residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n), which are the differences between the time-domain collected signals y(n-N+1), y(n-N+2), …, y(n) and the echo estimates ŷ_p(n-N+1), ŷ_p(n-N+2), …, ŷ_p(n),
the second frequency domain conversion unit converts the time-domain residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) into frequency-domain residual signals E_p(0), E_p(1), …, E_p(N+L-1),
the transfer path characteristic update unit updates the estimated values H_p(0), H_p(1), …, H_p(N+L-1) using the input signals X(0), X(1), …, X(N+L-1) and the residual signals E_p(0), E_p(1), …, E_p(N+L-1) to obtain updated estimated values H_{p+1}(0), H_{p+1}(1), …, H_{p+1}(N+L-1), and
the control unit performs control so that the processing in the echo estimation unit, the time domain conversion unit, the subtraction unit, the second frequency domain conversion unit, and the transfer path characteristic update unit is repeated, using the input signals X(0), X(1), …, X(N+L-1) and the collected signals y(n), y(n-1), …, y(n-N+1), until the updated estimated values H_{p+1}(0), H_{p+1}(1), …, H_{p+1}(N+L-1) converge.

The echo canceler according to claim 1, wherein
L is the number of samples covering the assumed delay and the effective length of the impulse response, L' is the number of samples of the effective length of the impulse response, and N is any integer larger than L' and smaller than L,
the first frequency domain conversion unit converts, every N samples, the (L+N) time-domain input signals x(n-N-L+1), x(n-N-L+2), …, x(n) into frequency-domain input signals X(0), X(1), …, X(N+L-1),
the echo estimation unit obtains frequency-domain echo estimates Ŷ_p(0), Ŷ_p(1), …, Ŷ_p(N+L-1) using the input signals X(0), X(1), …, X(N+L-1) and the frequency-domain estimated values H_p(0), H_p(1), …, H_p(N+L-1) of the transfer characteristic of the acoustic transfer path,
the time domain conversion unit converts the echo estimates Ŷ_p(0), Ŷ_p(1), …, Ŷ_p(N+L-1) into time-domain echo estimates ŷ_p(n-N-L+1), ŷ_p(n-N-L+2), …, ŷ_p(n),
the subtraction unit obtains time-domain residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n), which are the differences between the time-domain collected signals y(n-N+1), y(n-N+2), …, y(n) and the echo estimates ŷ_p(n-N+1), ŷ_p(n-N+2), …, ŷ_p(n),
the second frequency domain conversion unit converts the time-domain residual signals e_p(n-N+1), e_p(n-N+2), …, e_p(n) into frequency-domain residual signals E_p(0), E_p(1), …, E_p(N+L-1),
the transfer path characteristic update unit, when updating the estimated values H_p(0), H_p(1), …, H_p(N+L-1) using the input signals X(0), X(1), …, X(N+L-1) and the residual signals E_p(0), E_p(1), …, E_p(N+L-1), converts the updated estimated values H_{p+1}(0), H_{p+1}(1), …, H_{p+1}(N+L-1) into time-domain estimated values h_{p+1}(n-N-L+1), h_{p+1}(n-N-L+2), …, h_{p+1}(n), and constrains the calculation of the updated estimated values H_{p+1}(0), H_{p+1}(1), …, H_{p+1}(N+L-1) so that the converted estimated values h_{p+1}(n-N-L+1), h_{p+1}(n-N-L+2), …, h_{p+1}(n) are zero outside a section of L' samples that contains the peak value, and
the control unit performs control so that the processing in the echo estimation unit, the time domain conversion unit, the subtraction unit, the second frequency domain conversion unit, and the transfer path characteristic update unit is repeated, using the input signals X(0), X(1), …, X(N+L-1) and the collected signals y(n), y(n-1), …, y(n-N+1), until the updated estimated values H_{p+1}(0), H_{p+1}(1), …, H_{p+1}(N+L-1) converge.

An echo cancellation method comprising:
a first frequency domain conversion step of, with the number of samples N constituting a processing unit of echo cancellation processing being larger than the number of samples of the effective length of an assumed impulse response, converting a time-domain input signal x into a frequency-domain input signal X every N samples;
an echo estimation step of, with p being an index indicating a repetition count, obtaining a frequency-domain echo estimate Ŷ_p using the input signal X and a frequency-domain estimated value H_p of the transfer characteristic of an acoustic transfer path;
a time domain conversion step of converting the echo estimate Ŷ_p into a time-domain echo estimate ŷ_p;
a subtraction step of obtaining a time-domain residual signal e_p, which is the difference between a time-domain collected signal y and the echo estimate ŷ_p;
a second frequency domain conversion step of converting the time-domain residual signal e_p into a frequency-domain residual signal E_p;
a transfer path characteristic update step of updating the estimated value H_p using the input signal X and the residual signal E_p to obtain an updated estimated value H_{p+1}; and
a step of repeating the processing in the echo estimation step, the time domain conversion step, the subtraction step, the second frequency domain conversion step, and the transfer path characteristic update step, using the input signal X and the collected signal y, until the updated estimated value H_{p+1} converges.

A program for causing a computer to function as the echo canceler according to any one of claims 1 to 3.