JP4448423B2

JP4448423B2 - Echo suppression method, apparatus for implementing this method, program, and recording medium therefor

Info

Publication number: JP4448423B2
Application number: JP2004309638A
Authority: JP
Inventors: 暁江村; 末廣島内
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-10-25
Filing date: 2004-10-25
Publication date: 2010-04-07
Anticipated expiration: 2024-10-25
Also published as: JP2006121588A

Description

The present invention relates to a multi-channel echo suppression method, and to an apparatus, a program, and a recording medium therefor, which are applied to, for example, a teleconferencing system having a multi-channel sound reproduction system and which suppress the acoustic echo that causes howling and impairs listening.

With the recent increase in the capacity of digital networks, multi-channel loudspeaker teleconferencing systems, which let several people join easily and provide a more natural conversation environment, have been studied. In such a system, the received speech is reproduced from the speakers and picked up by the microphones, producing an acoustic echo; if it is transmitted as is, it disturbs the conversation and causes discomfort. When the signal power of the acoustic echo signal is larger than that of the speaker reproduction signal, the acoustic echo causes howling and makes conversation impossible. JP 2003-309493 A (Patent Document 1) describes a method of suppressing acoustic echo in such a multi-channel teleconferencing system.

A teleconferencing system consisting of an M (≥ 2) channel reproduction system and an N-channel sound collection system suppresses acoustic echo with the configuration shown in FIG. 1. The received signal from each receiving terminal 1_m (m = 1, ..., M) is sent as a reproduction signal to the corresponding speaker 2_m (m = 1, ..., M), reproduced as an acoustic signal, and reaches each microphone 3_n (n = 1, ..., N) through the M acoustic echo paths. For each collected sound signal from a microphone 3_n, the M-channel echo suppression unit 6_n (n = 1, ..., N) suppresses the echo component, and the result is sent as a transmission signal from the transmitting terminal 5_n (n = 1, ..., N).

FIG. 2 shows the internal configuration of the M-channel echo suppression unit 6. From the M-channel reproduction signals of the speakers 2_m and the one-channel collected sound signal from the microphone, the M-channel echo suppression unit 6 estimates, for each frequency component, the ratio of the echo component in the collected sound signal, and suppresses the acoustic echo by attenuating the amplitude of the collected sound signal, for each frequency component, by the amount corresponding to the acoustic echo. When the sound collection system has N channels, N M-channel echo suppression units 6 are arranged in parallel, as shown in FIG. 1.
In the echo suppression unit 6, the TF conversion units 61_m (m = 1, ..., M) frame the time-domain reproduction signals x_1(k), ..., x_M(k) (where k is a variable denoting time) every L samples with a frame length of 2L samples, and convert them to the frequency domain to obtain the spectra X_1(j, f), ..., X_M(j, f) (where j is a variable denoting the frame time). The TF conversion unit 62 converts the time-domain collected sound signal y(k) to the frequency domain to obtain the spectrum Y(j, f). FIG. 11 shows the relationship between the sample time k and the frame time j for a signal framed every L samples.

The echo component ratio estimation unit 63 in FIG. 2 obtains the ratio γ²(j, f) of the echo component in the collected sound signal from the spectra X_1(j, f), ..., X_M(j, f) of the reproduction signals and the spectrum Y(j, f) of the collected sound signal, and the attenuation ratio calculation unit 64 obtains the attenuation rate. The multiplication unit 65 attenuates the amplitude of the collected sound signal, for each frequency component, by the amount corresponding to the echo. The FT conversion unit 66 converts the frequency-domain result to the time domain to obtain a transmission signal in which the echo is suppressed. The attenuation rate can be obtained from the echo component ratio γ²(f) in several ways, which are described in detail in JP H11-331046 A (Patent Document 2).

FIG. 3 shows the detailed configuration of the echo component ratio estimation unit 63 in FIG. 2. The correlation removal unit 631 obtains, from the spectra X_1(j, f), ..., X_M(j, f) of the multi-channel reproduction signals, the mutually uncorrelated multi-channel spectra X_1(j, f), X_{2(1)}(j, f), ..., X_{M(M−1)}(j, f). The correlation removal unit 632 obtains the spectra Y_{(m−1)}(j, f) (m = 2, ..., M) by removing the correlated components of the first to (m−1)-th channel reproduction signals from the spectrum Y(j, f) of the collected sound signal. In the coherence calculation unit 633, the coherence calculation unit 633_1 obtains the coherence γ_{1y}²(j, f) between the first-channel reproduction signal X_1(j, f) and the collected sound signal Y(j, f), and each coherence calculation unit 633_m (m = 2, ..., M) obtains the coherence γ_{my(m−1)}²(j, f) between the m-th channel reproduction signal X_{m(m−1)}(j, f) and Y_{(m−1)}(j, f). The echo component ratio calculation unit 634 obtains the echo component ratio γ²(j, f) by the following equation.

[Equation omitted in source]

FIG. 4 shows the flow of echo component ratio estimation.

Patent Document 1: JP 2003-309493 A
Patent Document 2: JP H11-331046 A

In the conventional method described above, the collected sound signal is framed with a fixed frame length, converted to the frequency domain by FFT, passed through echo suppression processing, and transmitted. Because the transmitted speech signal is buffered for one frame length and processed before transmission, there is an algorithmic delay (processing delay) determined by the frame length, regardless of the processing power of the hardware. A large delay makes conversation over the system very difficult, so the frame length must be shortened to keep the processing delay low.
However, an echo component that is delayed by more than the frame length between reproduction from the speaker and pickup by the microphone is then treated as a non-echo component. Consequently, when the frame length is set much shorter than the reverberation time (about 300 ms in an ordinary room), the echo component ratio is underestimated or its estimate fluctuates, so the estimation of the echo component ratio degrades and, with it, the echo suppression performance.
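As a rough illustration of the trade-off described above, the following sketch computes the duration of a frame of 2L samples and compares it with the 300 ms reverberation time quoted in the text. The 16 kHz sampling rate and the candidate values of L are assumptions chosen for illustration; the source does not state them.

```python
# Illustrative arithmetic only: the sampling rate and the frame sizes
# are assumed values, not taken from the patent text.
SAMPLE_RATE_HZ = 16_000      # assumed sampling rate
REVERB_TIME_MS = 300.0       # "about 300 ms in an ordinary room" (from the text)


def frame_delay_ms(half_frame_len: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in milliseconds of one frame of 2L samples.

    The text says the transmitted signal is buffered for one frame length,
    so this is the frame-length-dependent part of the algorithmic delay.
    """
    frame_len = 2 * half_frame_len
    return 1000.0 * frame_len / sample_rate_hz


def frames_spanned_by_reverb(half_frame_len: int) -> float:
    """How many frame lengths the quoted reverberation time covers."""
    return REVERB_TIME_MS / frame_delay_ms(half_frame_len)


if __name__ == "__main__":
    for L in (128, 512, 2048):
        print(f"L={L:5d}  frame={frame_delay_ms(L):6.1f} ms  "
              f"reverb/frame={frames_spanned_by_reverb(L):5.2f}")
```

Under these assumptions, L = 128 gives a 16 ms frame, so the reverberation tail spans many frames; this is the case in which echo arriving after the current frame would be treated as non-echo.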

The present invention proposes a method of estimating the ratio of the echo component contained in the short-time spectrum Y(j, f) of the collected sound signal using not only the short-time spectra X_1(j, f), ..., X_M(j, f) obtained from the current multi-channel reproduction signal frame but also the short-time spectra obtained from past reproduction signal frames.
The invention further proposes a method in which the current and past frames of the multi-channel reproduction signals are divided into a main component, consisting of the first-channel reproduction signal of the current frame, and subcomponents, consisting of the other frames from which the correlation with the main component has been removed; the ratio of the main-component echo in the collected sound signal is obtained; the ratio of the subcomponent echo in the collected sound signal from which the correlation with the main component has been removed is obtained; and the echo component ratio of the multi-channel reproduction signals in the collected sound signal is estimated from these two ratios.

With this method, past signal frames can be taken into the estimation of the echo component ratio, so that even when the frame length is set much shorter than the reverberation time, degradation of the echo component ratio estimate is avoided and degradation of the echo suppression performance is prevented.

Embodiments of the present invention are described below with reference to the drawings. Corresponding parts in the drawings carry the same reference numerals, and duplicate description is omitted.
[First Embodiment]
The invention is described for a system consisting of an M (≥ 2) channel reproduction system and an N (≥ 1) channel sound collection system. The N channels of the sound collection system are handled by arranging N M-input, one-output transmission speech power estimation units in parallel. In this invention, the echo component ratio estimation unit 63 in the echo suppression unit 6 of FIG. 2, whose internal configuration is shown in FIG. 3, is replaced with the echo component ratio estimation unit 68, whose internal structure is shown in FIG. 5.

In the following, the frame length is 2L samples, the shift length is L samples, and the frame time is j. The signal frame at frame time j consists of the signal samples at sample times k = jL−2L+1 to jL. FIG. 11 shows the resulting relationship between the sample time k and the frame time j. The example described uses the short-time spectra X_1(j−1, f), ..., X_M(j−1, f) of the preceding frame as the spectra obtained from past reproduction signal frames.
In the TF conversion units 61_m (m = 1, ..., M) of FIG. 2, the time-domain reproduction signal x_m(k) of each channel is framed every L samples into a signal vector of length 2L and converted to a short-time spectrum by FFT.

[Equation omitted in source]

where m = 1, ..., M.
In this processing, each signal may be windowed, for example with a Hanning window, before the frequency conversion.
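The framing and frequency conversion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, a naive DFT stands in for the FFT, samples before the start of the signal are assumed to be zero, and the periodic Hann window is only one possible window choice.

```python
import cmath
import math


def frame_at(x, j, L):
    """Return the 2L samples of frame j, i.e. sample times k = jL-2L+1 .. jL.

    Sample times are 1-based as in the text, so x[0] holds x(1).
    Samples before the start of the signal are taken as zero (an assumption).
    """
    first_k = j * L - 2 * L + 1
    return [x[k - 1] if 1 <= k <= len(x) else 0.0
            for k in range(first_k, j * L + 1)]


def hann(n):
    """Periodic Hann window of length n (one possible window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(n^2) DFT, standing in for the FFT named in the text."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def short_time_spectrum(x, j, L, use_window=True):
    """Short-time spectrum X(j, f) of frame j, for f = 0 .. 2L-1."""
    frame = frame_at(x, j, L)
    if use_window:
        frame = [s * w for s, w in zip(frame, hann(len(frame)))]
    return dft(frame)
```

Because the shift is L and the frame length is 2L, consecutive frames overlap by half; frame j and frame j−1 share the samples k = jL−2L+1 to jL−L.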

Likewise, the TF conversion unit 62 converts the collected sound signal y(k) to the frequency domain to obtain its short-time spectrum.

[Equation omitted in source]

Here too, the signal may be windowed before the frequency conversion.
In the echo component ratio estimation unit 68, whose internal structure is shown in FIG. 5, the ratio of the echo component contained in the collected sound signal is obtained for each frequency component from the frequency-domain multi-channel reproduction signals X_m(j, f) and the frequency-domain collected sound signal Y(j, f) by the following steps F1 to F7. FIG. 6 shows the flow for estimating the echo component ratio.

Step F1
The short-time spectra X_1(j, f), ..., X_M(j, f) of the multi-channel reproduction signals obtained from the current frame are stored in the storage unit 681a1 in the correlation removal unit 681 of FIG. 5.
Step F2
The correlation removal unit 681b1 removes the component correlated with X_1(j, f) from the short-time spectra X_2(j, f), ..., X_M(j, f) of the multi-channel reproduction signals, for example by the following equation, to obtain the spectra X_{2(1)}(j, f), ..., X_{M(1)}(j, f), which form part of the subcomponents of the multi-channel reproduction signal spectrum.

[Equation omitted in source]

where m = 2, ..., M.
Here ε[ ] denotes averaging. One example of the averaging process is

[Equation omitted in source]

which uses the result for the preceding frame and a smoothing constant β that takes a value between 0 and 1.
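Since the equations of step F2 are not reproduced in this text, the following is only a sketch of one common way to carry out such a correlation removal for a single frequency bin. It assumes the projection form X_{m(1)} = X_m − (ε[X_m X_1*] / ε[|X_1|²]) X_1 and the recursive average ε[A](j) = β ε[A](j−1) + (1−β) A(j); both forms, the class name, and the default values are assumptions, not taken from the patent.

```python
class Decorrelator:
    """Removes from a spectrum the part correlated with a reference, for one bin.

    Assumed form: out = x - (E[x * conj(ref)] / E[|ref|^2]) * ref, where E[.]
    is the recursive average E(j) = beta * E(j-1) + (1 - beta) * value(j).
    """

    def __init__(self, beta=0.9, floor=1e-12):
        self.beta = beta          # smoothing constant, 0 <= beta <= 1
        self.floor = floor        # guards against division by zero
        self.cross = 0j           # running average of x * conj(ref)
        self.ref_power = 0.0      # running average of |ref|^2

    def update(self, x, ref):
        """Update the averages with the current frame and return x decorrelated from ref."""
        b = self.beta
        self.cross = b * self.cross + (1.0 - b) * x * ref.conjugate()
        self.ref_power = b * self.ref_power + (1.0 - b) * abs(ref) ** 2
        gain = self.cross / max(self.ref_power, self.floor)
        return x - gain * ref
```

One such object would be kept per channel and per frequency bin, with `x` set to X_m(j, f) and `ref` set to X_1(j, f). The same structure would also serve wherever the text removes the correlation with X_1(j, f) from another spectrum.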

Step F3
The correlation removal unit 681b2 removes the correlation with X_1(j, f) from the spectra X_1(j−1, f), ..., X_M(j−1, f) of the multi-channel reproduction signals of the preceding frame, stored in the storage unit 681a2, as follows, to obtain the spectra X_{1(1)}(j−1, f), ..., X_{M(1)}(j−1, f), which form part of the subcomponents of the multi-channel reproduction signal spectrum.

[Equation omitted in source]

where m = 1, ..., M.
When the short-time spectra X_1(j−n, f), ..., X_M(j−n, f) from n frames earlier are used for echo component ratio estimation, the results of the same calculation are likewise made part of the subcomponents of the multi-channel reproduction signal spectrum.

Step F4
The correlation removal unit 682 obtains the spectrum Y_{(1)}(j, f) by removing the component correlated with X_1(j, f) from the short-time spectrum Y(j, f) of the collected sound signal of the current frame.

[Equation omitted in source]

Step F5
The coherence calculation unit 683_1 obtains the following coherence from the short-time spectrum X_1(j, f) of the first-channel reproduction signal of the current frame, which is the main component of the multi-channel reproduction signal spectrum, and the spectrum Y(j, f) of the current collected sound signal.

[Equation omitted in source]
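The coherence equation itself is not reproduced in this text. The sketch below assumes the usual magnitude-squared coherence, γ_1² = |ε[X_1* Y]|² / (ε[|X_1|²] ε[|Y|²]), computed for one frequency bin with a recursive average; the averaging form, the class name, and the default values are assumptions.

```python
class CoherenceEstimator:
    """Magnitude-squared coherence between X_1 and Y for one frequency bin.

    Assumed form: |E[conj(x1) * y]|^2 / (E[|x1|^2] * E[|y|^2]), with E[.] the
    recursive average E(j) = beta * E(j-1) + (1 - beta) * value(j).
    """

    def __init__(self, beta=0.9, floor=1e-12):
        self.beta = beta
        self.floor = floor        # guards against division by zero
        self.cross = 0j           # running average of conj(x1) * y
        self.px = 0.0             # running average of |x1|^2
        self.py = 0.0             # running average of |y|^2

    def update(self, x1, y):
        """Update the averages with the current frame and return the coherence in [0, 1]."""
        b = self.beta
        self.cross = b * self.cross + (1.0 - b) * x1.conjugate() * y
        self.px = b * self.px + (1.0 - b) * abs(x1) ** 2
        self.py = b * self.py + (1.0 - b) * abs(y) ** 2
        denom = max(self.px * self.py, self.floor)
        return min(abs(self.cross) ** 2 / denom, 1.0)
```

When Y is a fixed multiple of X_1 in every frame, the estimate is 1 (the collected signal in that bin is entirely echo of the main component); when the two are unrelated across frames, it falls toward 0.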

Step F6
The subcomponent echo ratio calculation unit 683_2 first obtains the echo component Ŷ_{(1)}(j, f) contained in the decorrelated collected sound signal spectrum Y_{(1)}(j, f). The echo component Ŷ_{(1)}(j, f) is, among the linear sums

[Equation omitted in source]

of the subcomponents X_{2(1)}(j, f), ..., X_{M(1)}(j, f), X_{1(1)}(j−1, f), ..., X_{M(1)}(j−1, f) of the multi-channel reproduction signal short-time spectrum, the spectrum that minimizes the error
|Y_{(1)}(j, f) − Ŷ_{(1)}(j, f)|²
with respect to the decorrelated collected sound signal spectrum. The spectrum that minimizes this error is obtained, with

[Equation omitted in source]

by

[Equation omitted in source]

Further, the echo ratio of the subcomponents is obtained for each frequency component by the following equation.

[Equation omitted in source]
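The equations of step F6 are not reproduced in this text. One standard way to realize a minimum-error linear sum of this kind is the least-squares (Wiener) solution sketched below. The vector and matrix symbols x, w, R, and p are introduced here for illustration only, and the error is assumed to be minimized in the averaged sense ε[·]; the patent's own formulation may differ.

```latex
% Assumed notation: \mathbf{x}(j,f) stacks the 2M-1 subcomponents of steps F2 and F3,
% \mathbf{w}(f) holds the weights of the linear sum, and \varepsilon[\cdot] is the
% averaging operator of step F2.
\begin{align}
\mathbf{x}(j,f) &= \bigl[X_{2(1)}(j,f),\dots,X_{M(1)}(j,f),\,
                         X_{1(1)}(j-1,f),\dots,X_{M(1)}(j-1,f)\bigr]^{T} \\
\hat{Y}_{(1)}(j,f) &= \mathbf{w}^{H}(f)\,\mathbf{x}(j,f) \\
\mathbf{R}(f) &= \varepsilon\bigl[\mathbf{x}(j,f)\,\mathbf{x}^{H}(j,f)\bigr], \qquad
\mathbf{p}(f) = \varepsilon\bigl[\mathbf{x}(j,f)\,Y_{(1)}^{*}(j,f)\bigr] \\
\mathbf{w}(f) &= \mathbf{R}^{-1}(f)\,\mathbf{p}(f) \\
\gamma_{2}^{2}(j,f) &= \frac{\varepsilon\bigl[|\hat{Y}_{(1)}(j,f)|^{2}\bigr]}
                            {\varepsilon\bigl[|Y_{(1)}(j,f)|^{2}\bigr]}
                     = \frac{\mathbf{p}^{H}(f)\,\mathbf{R}^{-1}(f)\,\mathbf{p}(f)}
                            {\varepsilon\bigl[|Y_{(1)}(j,f)|^{2}\bigr]}
\end{align}
```

The weight vector follows from setting the gradient of ε[|Y_{(1)} − w^H x|²] with respect to w* to zero, which gives R w = p. The last equality uses ε[|w^H x|²] = w^H R w with R Hermitian.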

Step F7
The echo component ratio calculation unit 684 obtains the ratio of the echo component in the collected sound signal spectrum Y(j, f) from the ratios obtained in steps F5 and F6.

[Equation omitted in source]

The attenuation ratio calculation unit 64 of FIG. 2 first calculates the amplitude attenuation rate from the echo component ratio for each frequency component, and the multiplication unit 65 attenuates the amplitude of the collected sound signal to obtain the echo suppression result Z(j, f). One example of the attenuation method is given by the following equation.

[Equation omitted in source]

The echo is thereby suppressed.
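Neither the combining equation of step F7 nor the attenuation equation is reproduced in this text, so the following is a sketch under two stated assumptions. First, the two ratios are combined as γ² = 1 − (1 − γ_1²)(1 − γ_2²), which follows if the main-component echo takes the fraction γ_1² of the power of Y and the subcomponent echo takes the fraction γ_2² of the remaining (1 − γ_1²). Second, the amplitude gain is taken as sqrt(1 − γ²), which is only one of several possible attenuation rules. The function names are hypothetical.

```python
import math


def combined_echo_ratio(gamma1_sq, gamma2_sq):
    """Echo ratio in Y from the main-component and subcomponent ratios.

    Assumed form: 1 - (1 - gamma1^2) * (1 - gamma2^2), which equals
    gamma1^2 + (1 - gamma1^2) * gamma2^2.
    """
    return 1.0 - (1.0 - gamma1_sq) * (1.0 - gamma2_sq)


def suppress_bin(y, gamma_sq):
    """Attenuate one frequency bin of Y by an assumed gain sqrt(1 - gamma^2)."""
    g = min(max(gamma_sq, 0.0), 1.0)   # clamp the estimate to [0, 1]
    return math.sqrt(1.0 - g) * y
```

For example, with γ_1² = 0.5 and γ_2² = 0.5 the combined ratio under this assumption is 0.75, and a bin with Y = 2 would be scaled to Z = 1.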

The FT conversion unit 66 converts the frequency-domain echo suppression result Z(j, f) into a time-domain frame signal as in the following equation.

[Equation omitted in source]

From this frame signal, the transmission signal may be obtained, for example, by cutting out part of the frame, [z(jL−L+1) ... z(jL)], or by windowing several frames and combining their overlapping sections.
When processing of the current frame is finished, the reproduction signal information stored in the current-frame storage unit 681a1 is transferred to and stored in the past-frame storage unit 681a2.
Instead of distinguishing the current-frame storage unit 681a1 from the past-frame storage unit 681a2 within the storage unit 681a and transferring the reproduction signal information from 681a1 to 681a2 at the end of each processing cycle as above, the most recent information stored in a single storage unit 681a may be treated as the current information. Alternatively, as shown in FIG. 7, the spectrum of the current reproduction signal used in the processing may be taken directly from the input reproduction signal spectrum rather than read out of the storage unit.
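The first output option mentioned above, cutting [z(jL−L+1) ... z(jL)] out of the inverse-transformed frame, can be sketched as follows. A naive inverse DFT stands in for the inverse FFT, and the function names are hypothetical.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT returning the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def transmit_block(z_spectrum, L):
    """Return [z(jL-L+1) ... z(jL)], the last L samples of the 2L-sample frame.

    Frame j covers sample times jL-2L+1 .. jL, so its second half is exactly
    the L samples that were not yet covered by the second half of frame j-1.
    """
    frame = idft(z_spectrum)
    return frame[-L:]
```

The alternative named in the text, windowing several frames and combining their overlapping sections, would instead sum the overlapping halves of consecutive frames.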

[Second Embodiment]
This embodiment combines the echo suppression method with an acoustic echo cancellation method using an adaptive filter; FIG. 8 shows a configuration example.
The M-channel received signals are reproduced as acoustic signals by the speakers 2_m (m = 1, ..., M) and reach the microphone 3 through the acoustic echo paths. At the same time, they are input to the predicted echo generation unit 91 of the acoustic echo cancellation unit 9. The subtractor 92 subtracts the predicted echo signal from the collected sound signal y(k) from the microphone 3, and the residual signal is fed back to the echo path estimation unit 93 and also becomes the input signal to the echo suppression unit 6. The echo suppression unit 6 suppresses the echo component as in the first embodiment, and the transmission signal is sent from the transmitting terminal 5.
In this configuration, the signal after echo cancellation by the adaptive filter (the residual signal) is used as the input to the echo suppression unit 6. Therefore, even in a double-talk situation in which received speech and transmitted speech overlap, the received echo component contained in the collected sound signal can be greatly reduced and the quality of the loudspeaker call improved.
FIG. 8 illustrates the case of an M (≥ 2) channel reproduction system and a one-channel sound collection system, but a sound collection system with N (≥ 2) channels can be handled by arranging N of the same configuration in parallel.
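The text does not say which adaptive algorithm the echo path estimation unit 93 uses. As a minimal sketch of the predict, subtract, and feed-back loop of units 91 to 93, the following uses NLMS on a single reproduction channel; the choice of NLMS, the single channel, the class name, and all parameter values are assumptions made for illustration.

```python
class NlmsEchoCanceller:
    """Single-channel NLMS echo canceller (an assumed stand-in for unit 9)."""

    def __init__(self, taps=8, step=0.5, eps=1e-8):
        self.w = [0.0] * taps         # estimated echo path (role of unit 93)
        self.history = [0.0] * taps   # most recent reproduction samples, newest first
        self.step = step              # NLMS step size
        self.eps = eps                # regularization for the normalization

    def process(self, x, y):
        """Take one reproduction sample x and one microphone sample y; return the residual."""
        self.history = [x] + self.history[:-1]
        # Predicted echo (role of the predicted echo generation unit 91).
        predicted = sum(w * h for w, h in zip(self.w, self.history))
        # Residual after subtraction (role of the subtractor 92).
        residual = y - predicted
        # The residual is fed back to update the echo path estimate.
        norm = sum(h * h for h in self.history) + self.eps
        scale = self.step * residual / norm
        self.w = [w + scale * h for w, h in zip(self.w, self.history)]
        return residual
```

In the configuration of FIG. 8, the residual returned by `process` would take the place of the collected sound signal at the input of the echo suppression unit 6.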

[Third Embodiment]
This embodiment combines the echo suppression method with a voice switch method; FIG. 9 shows a configuration example.
From the M-channel reproduction signals of the speakers 2_m and the one-channel collected sound signal from the microphone, each echo suppression unit 6_n' (n = 1, ..., N) produces a transmission signal in which the echo is suppressed and also obtains the non-echo signal power that is input to the transmission determination unit. FIG. 9 corresponds to the case of an N-channel sound collection system, with N M-channel echo suppression units 6_n' arranged in parallel.

The collected sound signal is converted to the frequency domain by the TF conversion unit 62 of the echo suppression unit 6' and input to the echo component ratio estimation unit 68 and the signal power calculation unit 69. The echo component ratio estimation unit 68 estimates the ratio of the echo component in the collected sound signal for each frequency component, as in the first embodiment. The estimate is input to the attenuation ratio calculation unit 64 for echo suppression and, at the same time, to the signal power calculation unit 69. The signal power calculation unit 69 obtains the non-echo signal power P_YI from the echo component ratio estimate γ²(j, f) and the short-time spectrum Y(j, f) of the collected sound signal output by the TF conversion unit 62. The transmission determination unit 4 compares the non-echo signal power P_YI with a preset threshold P_th to determine whether transmitted speech is present. When transmitted speech is judged to be present, only the received signals are attenuated, by the receiving-side variable loss units 7_m (m = 1, ..., M), before being used as the reproduction signals for the speakers. When transmitted speech is judged to be absent, only the transmission signals sent from the transmitting terminals 5_n are attenuated, by the variable loss units 8_n (n = 1, ..., N).

One example of a method for obtaining the non-echo signal power P_YI is given by the following equation.

[Equation omitted in source]

Because the transmission signal in this configuration has passed through the variable loss unit after echo suppression, an improvement in loudspeaker-call quality can be expected.
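The equation for P_YI is not reproduced in this text. The sketch below assumes P_YI = Σ_f (1 − γ²(j, f)) |Y(j, f)|², that is, the power of Y left after the estimated echo share is removed in each bin, and then applies the threshold decision described for the transmission determination unit 4. The formula, the function names, and the loss value of 0.25 are assumptions.

```python
def non_echo_power(y_spectrum, gamma_sq):
    """Assumed non-echo power: sum over bins of (1 - gamma^2) * |Y|^2."""
    return sum((1.0 - min(max(g, 0.0), 1.0)) * abs(y) ** 2
               for y, g in zip(y_spectrum, gamma_sq))


def voice_switch_gains(p_yi, p_th, loss=0.25):
    """Return (receive_gain, send_gain) for the variable loss units.

    Transmitted speech present (p_yi > p_th): attenuate only the received side.
    Transmitted speech absent: attenuate only the transmitted side.
    The loss value is an illustrative assumption.
    """
    if p_yi > p_th:
        return loss, 1.0
    return 1.0, loss
```

For a two-bin spectrum Y = [1, 2] with γ² = [1.0, 0.5], the first bin is treated as pure echo and the second as half echo, giving P_YI = 2.0 under this assumption.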

[Fourth Embodiment]
This embodiment combines the echo suppression method, the voice switch method, and the acoustic echo cancellation method using an adaptive filter; FIG. 10 shows a configuration example.
The M-channel received signals are reproduced as acoustic signals by the speakers 2_m (m = 1, ..., M) and reach the microphones 3_n (n = 1, ..., N) through the acoustic echo paths. At the same time, they are input to the acoustic echo cancellation units 9_n. The predicted echo generation unit 91 predicts the echo component from the reproduction signals x_m(k), and the subtractor 92 subtracts the predicted echo signal from the collected sound signal y(k). The residual signal is fed back to the echo path estimation unit 93 and also becomes the input signal to the echo suppression unit 6'. FIG. 10 corresponds to the case of an N-channel sound collection system, with N M-channel echo suppression units 6_n' arranged in parallel.

The configuration and processing of the echo suppression unit 6', the transmission determination unit 4, the receiving-side variable loss units 7_m, and the transmitting-side variable loss units 8_n are the same as in the third embodiment.
Because the transmission signal in this configuration is obtained by applying echo suppression to the residual signal after echo cancellation and then passing it through the variable loss unit, a further improvement in loudspeaker-call quality can be expected.

FIG. 1 is a diagram showing the general configuration of an M-input, N-output echo suppression apparatus.
FIG. 2 is a diagram showing the configuration of the M-channel echo suppression unit.
FIG. 3 is a diagram showing the configuration of the conventional echo component ratio estimation unit.
FIG. 4 is a diagram showing the flow of conventional echo component ratio estimation.
FIG. 5 is a diagram showing the configuration of the echo component ratio estimation unit of the first embodiment.
FIG. 6 is a diagram showing the flow of echo component ratio estimation in the first embodiment.
FIG. 7 is a diagram showing the configuration of a modification of the echo component ratio estimation unit of the first embodiment.
FIG. 8 is a diagram showing the configuration of the M-channel echo suppression apparatus of the second embodiment.
FIG. 9 is a diagram showing the configuration of the M-input, N-output echo suppression apparatus of the third embodiment.
FIG. 10 is a diagram showing the configuration of the M-input, N-output echo suppression apparatus of the fourth embodiment.
FIG. 11 is a diagram showing the relationship between the sample time k and the frame time j of a signal.

Claims

1. An echo suppression method for predicting an echo component from reproduction signals of a plurality of channels (M channels) and a collected sound signal of at least one channel and suppressing the echo, characterized by:
taking the short-time spectrum of the first-channel reproduction signal of the current frame as a main component;
for the reproduction signals of the second to M-th channels of the current frame and the reproduction signals of the first to M-th channels of at least one past frame, removing the correlation with the short-time spectrum that is the main component from each of their short-time spectra to obtain a plurality of short-time spectra constituting subcomponents;
obtaining the ratio of the main-component echo in the short-time spectrum of the collected sound signal;
obtaining the ratio of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed;
estimating, for each frequency, the echo component ratio in the short-time spectrum of the collected sound signal from the two ratios; and
attenuating, based on the echo component ratio estimated for each frequency, the amplitude of the collected sound signal by the amount corresponding to the echo for each frequency component.

2. The method of claim 1, wherein the echo component ratio γ²(f) in the short-time spectrum of the collected sound signal is obtained from the ratio γ_1²(f) of the main-component echo in the short-time spectrum of the collected sound signal and the ratio γ_2²(f) of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed, by

[Equation omitted in source]

3. The method of claim 2, wherein the echo component Ŷ_{(1)}(f) contained in the short-time spectrum Y_{(1)}(f) of the collected sound signal from which the correlation with the main component has been removed is obtained as the linear sum that minimizes |Y_{(1)}(f) − Ŷ_{(1)}(f)|², and the ratio γ_2²(f) of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed is obtained by

[Equation omitted in source]

4. The method of any one of claims 1 to 3, wherein the echo suppression result Z(f) is obtained from the short-time spectrum Y(f) of the collected sound signal and the echo component ratio γ²(f) in the short-time spectrum of the collected sound signal by

[Equation omitted in source]

5. The method of any one of claims 1 to 4, wherein a residual signal between a predicted value of the echo, predicted from the reproduction signals, and the signal obtained from the sound collection unit is used as the collected sound signal.

6. An echo suppression apparatus comprising:
means for receiving reproduction signals of a plurality of channels (M channels) and a collected sound signal of at least one channel;
means for taking the short-time spectrum of the first-channel reproduction signal of the current frame as a main component;
means for obtaining a plurality of short-time spectra constituting subcomponents by removing, from each of the short-time spectra of the reproduction signals of the second to M-th channels of the current frame and of the reproduction signals of the first to M-th channels of at least one past frame, the correlation with the short-time spectrum that is the main component;
means for obtaining the ratio of the main-component echo in the short-time spectrum of the collected sound signal;
means for obtaining the ratio of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed;
means for estimating, for each frequency, the echo component ratio in the short-time spectrum of the collected sound signal from the two ratios; and
means for attenuating, based on the echo component ratio estimated for each frequency, the amplitude of the collected sound signal by the amount corresponding to the echo for each frequency component.

7. The apparatus of claim 6, comprising, as the means for estimating the echo component ratio γ²(f) in the short-time spectrum of the collected sound signal, means for obtaining it from the ratio γ_1²(f) of the main-component echo in the short-time spectrum of the collected sound signal and the ratio γ_2²(f) of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed, by

[Equation omitted in source]

8. The apparatus of claim 7, comprising, as the means for obtaining the ratio γ_2²(f) of the subcomponent echo in the short-time spectrum of the collected sound signal from which the correlation with the main component has been removed, means for obtaining the echo component Ŷ_{(1)}(f) contained in the short-time spectrum Y_{(1)}(f) of that collected sound signal as the linear sum that minimizes
|Y_{(1)}(f) − Ŷ_{(1)}(f)|²
and for obtaining the ratio γ_2²(f) by

[Equation omitted in source]

9. The apparatus of any one of claims 6 to 8, comprising means for obtaining the echo suppression result Z(f) from the short-time spectrum Y(f) of the collected sound signal and the echo component ratio γ²(f) in the short-time spectrum of the collected sound signal by

[Equation omitted in source]

10. The apparatus of any one of claims 6 to 9, comprising means for using, as the collected sound signal, a residual signal between a predicted value of the echo, predicted from the reproduction signals, and the signal obtained from the sound collection unit.

11. An echo suppression program for causing a computer to execute the echo suppression method of any one of claims 1 to 5.

12. A computer-readable recording medium on which the echo suppression program of claim 11 is recorded.