JP2015216492A

JP2015216492A - Echo suppression device

Info

Publication number: JP2015216492A
Application number: JP2014097864A
Authority: JP
Inventors: 拓人市川; Takuto Ichikawa; 純生佐藤; Sumio Sato; 永雄服部; Nagao Hattori; 健明末永; Takeaki Suenaga; 幹生瀬戸; Mikio Seto
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-05-09
Filing date: 2014-05-09
Publication date: 2015-12-03

Abstract

PROBLEM TO BE SOLVED: To perform echo suppression appropriate to the call state while keeping the processing load low.

SOLUTION: An echo suppression device includes: a first echo suppression processing unit that estimates an acoustic coupling amount, which is the ratio of the power spectrum of a received signal to the power spectrum of an echo signal, and uses the acoustic coupling amount to generate a first echo suppression signal in which at least part of the echo signal is removed from a collected sound signal; a call state determination unit that determines the call state using the power of the received signal and the power of the first echo suppression signal; an acoustic coupling amount storage processing unit that stores the acoustic coupling amount when the call state determination unit determines a reception-only state; and a second echo suppression processing unit that, when the call state determination unit determines a state of both reception and transmission, generates a second echo suppression signal by removing at least part of the echo signal from the collected sound signal using the acoustic coupling amount stored in the acoustic coupling amount storage processing unit.

Description

The present invention relates to the suppression of acoustic echo that arises in communication systems and the like.

A video conferencing system allows image data and audio data to be exchanged in both directions between multiple remote sites, enabling communication regardless of distance. Such systems are used, for example, for distance learning, remote medical care, and conferences.

Acoustic echo generally occurs in a video conferencing system. When the speaker and the microphone are in the same space, the microphone picks up the sound output from the speaker, a loop forms between the speaker and the microphone, and acoustic echo results. Acoustic echo degrades the quality of the communicated speech (for example, its intelligibility) and hinders good voice communication.

Methods for suppressing acoustic echo include methods that use an adaptive filter and methods based on short-time spectral amplitude (STSA) estimation (see Non-Patent Document 1 and Patent Document 1).

The method disclosed in Non-Patent Document 1 calculates a gain coefficient for suppressing the acoustic echo component by using the ratio of the power spectrum of the microphone's collected sound signal to the power spectrum of the signal received from the remote site. Compared with methods that use an adaptive filter, it has the advantage of requiring less computation.

The method described in Non-Patent Document 1 is explained below. In what follows, A1 to A48 and [Equation 1] to [Equation 8] are defined as shown in FIGS. 13 to 16.

Let the discrete time be A1 and the index of the frequency analysis interval (processing segment) be A2. Let the index of the frequency spectrum be A3 and the corresponding angular frequency be A4. Let the transmission signal be A5, its short-time spectrum in processing segment A2 be A6, and its power spectrum in processing segment A2 be A7. Let the received signal (speech received from the remote site) be A8, its short-time spectrum in processing segment A2 be A9, and its power spectrum in processing segment A2 be A10. Let the acoustic echo signal be A11, its short-time spectrum in processing segment A2 be A12, and its power spectrum in processing segment A2 be A13. Let the collected sound signal (sound picked up by the microphone) be A14, its short-time spectrum in processing segment A2 be A15, the amplitude component of that short-time spectrum be A16, and its power spectrum in processing segment A2 be A17. Let the acoustic coupling amount of the acoustic path in processing segment A2 (the ratio of the power spectrum of the acoustic echo signal to the power spectrum of the received signal) be A18.

The method described in Non-Patent Document 1 suppresses echo by multiplying the amplitude component A16 of the short-time spectrum of the collected sound signal by an echo suppression gain A20. The amplitude component A22 of the short-time spectrum A21 of the output signal after echo suppression is calculated as shown in [Equation 1].

The gain value A20 for suppressing the acoustic echo component is derived as shown in [Equation 2] by assuming that the acoustic echo signal A11 and the transmission signal A5 are uncorrelated and applying the Wiener filter method.

Here, A26 is the expected value of the power spectrum A7 of the transmission signal, A27 is the expected value of the power spectrum A13 of the acoustic echo signal, and A30 is the estimated value of the acoustic echo power spectrum.
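The gain step above can be sketched in code. This is a minimal illustration only: [Equation 1] and [Equation 2] are defined in the figures and are not reproduced in this text, so the sketch assumes the common Wiener form in which the transmission-signal power is approximated by the collected power minus the estimated echo power. The function names `wiener_echo_gain` and `apply_gain` and the `floor` parameter are hypothetical.

```python
def wiener_echo_gain(collected_power, echo_power_estimate, floor=0.0):
    """Per-bin gain, ASSUMED to be (|Y|^2 - |D^|^2) / |Y|^2 clipped to [floor, 1].

    collected_power     : power spectrum of the collected sound signal (A17)
    echo_power_estimate : estimated echo power spectrum (A30)
    """
    gains = []
    for y_pow, d_pow in zip(collected_power, echo_power_estimate):
        if y_pow <= 0.0:
            # No energy in this bin: nothing to pass through.
            gains.append(floor)
            continue
        g = (y_pow - d_pow) / y_pow
        gains.append(min(1.0, max(floor, g)))
    return gains


def apply_gain(collected_amplitude, gains):
    """Multiply the amplitude component (A16) by the gain (A20), bin by bin."""
    return [g * a for g, a in zip(gains, collected_amplitude)]
```

Clipping the gain to at most 1 keeps the suppressor from amplifying a bin when the echo estimate is zero or negative; whether the original method clips in this way is not stated in the text.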

The estimated value A30 of the power spectrum of the acoustic echo signal is calculated, as shown in [Equation 3], from the power spectrum A10 of the received signal, the estimated value A33 of the acoustic coupling amount, and the estimated value A34 of the acoustic echo power in the previous processing segment.

Here, β in the second term is a forgetting factor set to account for the reverberant component of the echo.
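The echo power estimate can be sketched as a one-line recursion. [Equation 3] itself appears only in the figures, so this assumes the form "coupling estimate times received power, plus β times the previous segment's echo power estimate", which matches the three inputs and the "second term" mentioned in the text. The function name and the default `beta=0.5` are hypothetical.

```python
def estimate_echo_power(received_power, coupling_estimate,
                        previous_echo_power, beta=0.5):
    """ASSUMED form of the echo power estimate (A30), per frequency bin:

        echo_power = coupling (A33) * received_power (A10)
                     + beta * previous_echo_power (A34)

    beta is the forgetting factor; 0.5 is an arbitrary illustrative value.
    """
    return [
        h * x + beta * d_prev
        for x, h, d_prev in zip(received_power, coupling_estimate,
                                previous_echo_power)
    ]
```

A larger β lets echo energy from earlier segments persist longer in the estimate, which is how a reverberant tail would be modeled under this assumption.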

The estimated value A33 of the acoustic coupling amount of the acoustic path is calculated from the ratio between the power spectrum A10 of the received signal and the power spectrum A17 of the collected sound signal. Specifically, a provisional acoustic coupling estimate A38 is calculated by [Equation 4] in processing segment A2 and compared with the provisional acoustic coupling estimate A40 obtained in the preceding segment A39; the smaller of the two is retained as the estimated value A33 of the acoustic coupling amount ([Equation 5]).

Here, min() is a function that selects the minimum value.
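The two-step coupling estimate can be sketched as follows. The exact [Equation 4] and [Equation 5] are in the figures; this sketch assumes the provisional value is the per-bin ratio of collected power to received power and that the final estimate is the per-bin minimum of the current and preceding provisional values, as the text describes. The function names and the `eps` guard against division by zero are hypothetical additions.

```python
def provisional_coupling(collected_power, received_power, eps=1e-12):
    """Provisional coupling (A38), ASSUMED to be |Y|^2 / |X|^2 per bin.

    eps is a hypothetical guard so a silent received bin does not divide by zero.
    """
    return [y / max(x, eps) for y, x in zip(collected_power, received_power)]


def coupling_estimate(provisional_now, provisional_prev):
    """Keep the smaller of the current (A38) and preceding (A40) provisional values."""
    return [min(a, b) for a, b in zip(provisional_now, provisional_prev)]
```

Taking the minimum biases the estimate downward, so a segment whose ratio is briefly inflated is less likely to raise the coupling estimate.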

Another method based on STSA estimation, different from that of Non-Patent Document 1, is disclosed in Patent Document 1. Like Non-Patent Document 1, the method of Patent Document 1 uses the power spectrum of the received signal (the signal output from the speaker) and the power spectrum of the collected sound signal to calculate a gain coefficient for suppressing the acoustic echo component. However, it calculates the estimated value A33 of the acoustic coupling amount of the acoustic path by [Equation 6] to [Equation 8] instead of [Equation 4] and [Equation 5].

Here, the left side A43 of [Equation 6] is the expected value of the cross spectrum of the short-time spectrum A9 of the received signal and the short-time spectrum A15 of the collected sound signal, and the left side A46 of [Equation 7] is the expected value of the power spectrum A10 of the received signal. A48, which appears in A43, is the complex conjugate of A9. L_1 and L_2 denote the range of frequency indices used in the calculation, and M_1 and M_2 denote the range of processing segments used in the calculation.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-54269 (published March 6, 2008)

Non-Patent Document 1: Sakauchi, S., Haneda, Y., Kataoka, A., "Gain emphasis method for echo suppression processing based on STSA estimation", IEICE Transactions (A), vol. J88-A, no. 6, Jun. 2005, pp. 695-703

As described above, the acoustic coupling amount is the ratio of the power spectrum of the acoustic echo signal to the power spectrum of the received signal.

The method described in Non-Patent Document 1, which obtains the acoustic coupling amount from the ratio of the power spectrum of the collected sound signal to the power spectrum of the received signal ([Equation 4]), estimates the acoustic coupling amount A18 of the acoustic path accurately when only the received signal is present (the "single talk reception" state). In the other states, the acoustic coupling amount cannot be estimated accurately. These are the state with neither a transmission signal nor a received signal (the "no utterance" state), the state with only a transmission signal (the "single talk transmission" state), and the state with both a transmission signal and a received signal (the "double talk" state).

Regarding this problem, in the "no utterance" state the amplitude component A16 of the short-time spectrum of the collected sound signal is very small, so the echo-suppressed signal A21 calculated by [Equation 1] is also very small. The signal sent to the remote site is therefore very quiet, and there is no adverse effect.

In the "single talk transmission" state, the estimated acoustic coupling amount becomes very large. This case can be handled by, for example, setting the gain value to 1.

In the "double talk" state, however, the microphone's collected sound signal contains both the transmission signal and the acoustic echo signal. The provisional acoustic coupling amount calculated by [Equation 4] is therefore the spectral ratio between the received signal and the sum of the acoustic echo signal and the transmission signal. In other words, the transmission signal introduces an error into the estimate of the acoustic coupling amount. This error propagates to the acoustic echo power A30 estimated by [Equation 3] and to the echo suppression gain A20 estimated by [Equation 2]. As a result of the estimation error in the echo suppression gain A20, if A20 becomes too large, even the transmission signal is excessively suppressed and distorted, and if A20 is too small, the echo component is not sufficiently suppressed.
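A small numeric example makes the double-talk bias concrete. It assumes, as the Wiener derivation in the text does, that the echo and the transmission signal are uncorrelated so that their powers add in the collected signal. The function name `ratio_estimate` and all numbers are illustrative, not taken from the patent.

```python
def ratio_estimate(echo_power, near_end_power, received_power):
    """Ratio-based coupling estimate for one frequency bin.

    Assumes echo and near-end (transmission) speech are uncorrelated,
    so the collected power is the sum of the two powers.
    """
    collected_power = echo_power + near_end_power
    return collected_power / received_power


true_coupling = 0.25                 # illustrative value
received = 4.0                       # received-signal power in this bin
echo = true_coupling * received      # echo power implied by the true coupling

single_talk = ratio_estimate(echo, 0.0, received)   # only the far end talks
double_talk = ratio_estimate(echo, 3.0, received)   # near end talks as well

print(single_talk, double_talk)
```

With only the received signal present the ratio recovers the true coupling, while near-end speech inflates it; here the estimate rises from 0.25 to 1.0.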

The method of Patent Document 1 improves the accuracy of the acoustic coupling estimate in the "double talk" state. However, [Equation 6] and [Equation 7] are computationally expensive, so its computational cost is far higher than that of Non-Patent Document 1. The problem is that this forfeits the advantage that STSA-based methods have over adaptive-filter methods, namely lower computational cost.

The present invention has been made in view of the above problems, and its object is to perform echo suppression appropriate to the call state while keeping the processing load low.

The present echo suppression device is used together with a sound generator, which produces sound according to a signal received from a communication path, and a sound collector. It generates a signal for transmission to the communication path by applying processing based on short-time spectral amplitude estimation to the collected sound signal obtained through the sound collector. The device is characterized by comprising: a first echo suppression processing unit that estimates, based on the power spectrum of the received signal and the power spectrum of the collected sound signal, an acoustic coupling amount that is the ratio between the power spectrum of the received signal and the power spectrum of the echo signal, and that uses the acoustic coupling amount to generate a first echo suppression signal in which at least part of the echo signal is removed from the collected sound signal; a call state determination unit that determines the call state using the power of the received signal and the power of the first echo suppression signal; an acoustic coupling amount storage processing unit that stores the acoustic coupling amount obtained when the call state determination unit determines a reception-only state; and a second echo suppression processing unit that, when the call state determination unit determines a state of both reception and transmission, generates a second echo suppression signal in which at least part of the echo signal is removed from the collected sound signal using the acoustic coupling amount stored in the acoustic coupling amount storage processing unit.

With the above configuration, in the state of both reception and transmission (the double talk state), the second echo suppression signal can be generated using not the acoustic coupling amount that the first echo suppression processing unit obtains during double talk but, for example, the stored acoustic coupling amount obtained in the most recent reception-only state (the single talk reception state). This makes it possible to perform echo suppression appropriate to the call state while keeping the processing load low.

In the present echo suppression device, the call state determination unit may be configured to determine the call state by comparing the power of the received signal with a first threshold and comparing the power of the first echo suppression signal with a second threshold.

Using the first and second thresholds to determine the call state in this way allows an appropriate determination to be made efficiently.
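The two-threshold decision can be sketched as follows. The text specifies only which powers are compared with which thresholds; the mapping of the four outcomes onto the four call states, the use of strict "greater than" comparisons, and the names `CallState` and `determine_call_state` are assumptions made for illustration.

```python
from enum import Enum


class CallState(Enum):
    NO_UTTERANCE = "no utterance"
    SINGLE_TALK_RECEPTION = "single talk reception"
    SINGLE_TALK_TRANSMISSION = "single talk transmission"
    DOUBLE_TALK = "double talk"


def determine_call_state(received_power, suppressed_power,
                         threshold1, threshold2):
    """Classify the call state from two power comparisons.

    received_power   : power of the received signal (compared with threshold1)
    suppressed_power : power of the first echo suppression signal
                       (compared with threshold2)
    ASSUMPTION: exceeding a threshold means that side is talking.
    """
    receiving = received_power > threshold1
    sending = suppressed_power > threshold2
    if receiving and sending:
        return CallState.DOUBLE_TALK
    if receiving:
        return CallState.SINGLE_TALK_RECEPTION
    if sending:
        return CallState.SINGLE_TALK_TRANSMISSION
    return CallState.NO_UTTERANCE
```

Using the echo-suppressed signal rather than the raw collected signal for the second comparison means that echo alone is less likely to be mistaken for near-end speech, and the whole decision costs two comparisons per segment.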

In the present echo suppression device, the call state determination unit may further be configured to determine the call state by also comparing, with a third threshold, the echo signal power obtained from the echo signal power spectrum estimated by the first or second echo suppression processing unit.

Using the echo signal power obtained from the echo signal power spectrum estimated by the first or second echo suppression processing unit in this way allows an appropriate determination even when the echo signal is large.

In this case, the device may also be configured so that the reception timing of the received signal underlying the echo signal power compared with the third threshold is set earlier than the reception timing of the received signal underlying the received signal power compared with the first threshold.

The present echo suppression device may also include an output processing unit that outputs the first echo suppression signal when the call state determination unit determines a reception-only state or a state with neither reception nor transmission, and outputs the second echo suppression signal when the call state determination unit determines a state of both reception and transmission.

Switching the output according to the call state in this way makes it possible to perform echo suppression appropriate to the call state.

Alternatively, the present echo suppression device may include an output processing unit that sets the output to zero when the call state determination unit determines a reception-only state or a state with neither reception nor transmission, and outputs the second echo suppression signal when the call state determination unit determines a state of both reception and transmission.

Switching the output according to the call state in this way, and setting the output to zero in the specified states, makes it possible to perform even more appropriate echo suppression according to the call state.

In the present echo suppression device, the output processing unit may be configured to output the first echo suppression signal when the call state determination unit determines a transmission-only state.
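The output-switching variants described above can be summarized in one small function. This is a sketch under stated assumptions: the state labels are plain strings chosen for illustration, and the `mute_when_not_sending` flag is a hypothetical way of selecting between the "output the first signal" variant and the "output zero" variant.

```python
def select_output(state, first_signal, second_signal,
                  mute_when_not_sending=False):
    """Choose the frequency-domain output for one processing segment.

    state is one of the hypothetical labels "no_utterance",
    "single_talk_reception", "single_talk_transmission", "double_talk".
    """
    if state == "double_talk":
        # Both sides talking: use the output based on the stored coupling.
        return list(second_signal)
    if state in ("single_talk_reception", "no_utterance"):
        if mute_when_not_sending:
            # Variant in which the output is set to zero in these states.
            return [0.0] * len(first_signal)
        return list(first_signal)
    if state == "single_talk_transmission":
        return list(first_signal)
    raise ValueError(f"unknown call state: {state!r}")
```

For example, `select_output("double_talk", [1.0, 2.0], [0.5, 0.25])` returns the second signal unchanged, while the muting variant returns zeros for a reception-only segment.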

According to the present invention, echo suppression appropriate to the call state can be performed while keeping the processing load low.

FIG. 1 is a block diagram showing a basic configuration example of the echo suppression device of Embodiment 1.
FIG. 2 is a block diagram showing the first echo suppression processing unit of FIG. 1 in detail.
FIG. 3 is a block diagram showing the second echo suppression processing unit of FIG. 1 in detail.
FIG. 4 is a flowchart showing the basic processing of the echo suppression device of FIG. 1.
FIG. 5 is a flowchart showing the basic processing of the first echo suppression processing unit of FIG. 1.
FIG. 6 is a flowchart showing the basic processing of the second echo suppression processing unit of FIG. 1.
FIG. 7 is a flowchart showing the basic processing of the call state determination unit of FIG. 1.
FIG. 8 is a block diagram showing a basic configuration example of the echo suppression device of Embodiment 2.
FIG. 9 is a flowchart showing the basic processing of the call state determination unit of FIG. 8.
FIG. 10 is a block diagram showing a basic configuration example of the echo suppression device of Embodiment 3.
FIGS. 11 and 12 show the definitions of the symbols and equations used in the embodiments.
FIGS. 13 to 16 show the definitions of the symbols and equations used in the description of the background art.

Embodiments of the present invention are described below with reference to FIGS. 1 to 12. In what follows, B1 to B47, D7 to D17, F4 to F27, and [Equation 3] to [Equation 5] are defined as shown in FIGS. 11 and 12.

[Embodiment 1]
FIG. 1 is a schematic diagram showing the configuration of the echo suppression device according to Embodiment 1. As shown in FIG. 1, the echo suppression device 100 is used in combination with a speaker 120 (sound generator) for reproducing the audio signal (received signal) B1 received from the communication path, a microphone 130 (sound collector) for acquiring the transmission signal B2, and the communication path. The audio signal that is reproduced by the speaker 120 and picked up by the microphone 130 (the echo signal) is denoted B10. The echo suppression device 100 includes a received signal DFT (Discrete Fourier Transform) unit 140 for converting the received signal B1 into a received signal spectrum B3, a collected sound signal DFT unit 150 for converting the collected sound signal B4 into a collected sound signal spectrum B5, and a transmission signal IDFT (Inverse Discrete Fourier Transform) unit 160 for converting the output signal B6 into time-domain data B7 to be sent to the communication path.

The echo suppression device 100 further includes: a received signal power spectrum calculation unit 170 for calculating the power spectrum B8 of the received signal; a collected sound signal power spectrum calculation unit 180 for calculating the power spectrum B9 of the collected sound signal; a first echo suppression processing unit 190 that suppresses the echo picked up in the collected sound signal using the received signal power spectrum B8 and the collected sound signal power spectrum B9; a received signal power calculation unit 110 for calculating the power B12 of the received signal; an echo suppression unit output signal power calculation unit 111 for calculating the power B13 of the audio signal output from the first echo suppression processing unit; a call state determination unit 112 that determines the call state ("no utterance", "single talk reception", "single talk transmission", or "double talk") from the received signal power B12 and the power B13 of the audio signal output from the first echo suppression processing unit; and a post-processing unit 113 that performs processing according to the determined call state.

The post-processing unit 113 includes: an output processing unit 114 that, according to the determination C of the call state determination unit 112, takes either the signal B16 output from the first echo suppression processing unit 190 or the signal B36 output from the second echo suppression processing unit 116 as the output signal (frequency-domain data signal) B6 to the transmission signal IDFT unit 160; an acoustic coupling amount storage processing unit 115 that stores the acoustic coupling amount B17 output from the first echo suppression processing unit as B18 when the call state determination unit 112 determines the "single talk reception" state; and a second echo suppression processing unit 116 that, when the call state determination unit 112 determines the "double talk" state, suppresses the echo picked up in the collected sound signal using the acoustic coupling amount B18 stored in the acoustic coupling amount storage unit, the received signal power spectrum B8, and the collected sound signal power spectrum B9.

FIG. 2 shows an example of the basic configuration of the first echo suppression processing unit. The first echo suppression processing unit 190 includes: an acoustic coupling amount estimation unit 210 that estimates the acoustic coupling amount B17 based on the received signal power spectrum B8 and the collected sound signal power spectrum B9; a first echo signal power spectrum estimation unit 220 that estimates the echo signal power spectrum B47 based on the acoustic coupling amount B17 and the received signal power spectrum B8; a first echo suppression gain calculation unit 230 that calculates the echo suppression gain 50 based on the echo signal power spectrum B47 and the collected sound signal power spectrum B9; and a first echo suppression gain multiplication unit 240 that generates, from the echo suppression gain 50 and the collected sound signal spectrum B5, a signal B16 in which the echo in the collected sound signal is suppressed.

FIG. 3 shows an example of the basic configuration of the acoustic coupling amount storage processing unit and the second echo suppression processing unit.

The acoustic coupling amount storage processing unit 115 includes an acoustic coupling amount storage unit 310 that stores, as B18, the acoustic coupling amount B17 estimated by the acoustic coupling amount estimation unit 210 in the first echo suppression processing unit 190.

The second echo suppression processing unit 116 includes: an acoustic coupling amount reading unit 320 that reads the acoustic coupling amount B18 stored in the acoustic coupling amount storage unit 310; a second echo signal power spectrum estimation unit 330 that estimates the echo signal power spectrum D4 based on the acoustic coupling amount B18 that was read and the received signal power spectrum B8; a second echo suppression gain calculation unit 340 that calculates the echo suppression gain D7 based on the echo signal power spectrum D4 estimated by the second echo signal power spectrum estimation unit 330 and the collected sound signal power spectrum B9; and a second echo suppression gain multiplication unit 350 that generates, from the echo suppression gain D7 and the collected sound signal spectrum B5, a signal B36 in which the echo in the collected sound signal is suppressed.
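The store, read, estimate, gain, and multiply chain of units 310 to 350 can be sketched as follows. The formulas used by units 330 and 340 are defined in the figures, so this sketch makes several assumptions: the echo power is taken as stored coupling times received power with no reverberation term, the gain has the clipped Wiener-style form, and the input is passed through unchanged when nothing has been stored yet. The names `CouplingStore` and `second_echo_suppression` are hypothetical.

```python
class CouplingStore:
    """Holds the last coupling estimate saved during single talk reception."""

    def __init__(self):
        self._saved = None

    def save(self, coupling):
        self._saved = list(coupling)

    def load(self):
        return None if self._saved is None else list(self._saved)


def second_echo_suppression(store, received_power, collected_power,
                            collected_spectrum):
    """Suppress echo using the stored coupling instead of a fresh estimate."""
    coupling = store.load()
    if coupling is None:
        # ASSUMPTION: with nothing stored yet, pass the input through.
        return list(collected_spectrum)
    output = []
    for h, x_pow, y_pow, y in zip(coupling, received_power,
                                  collected_power, collected_spectrum):
        echo_pow = h * x_pow                       # assumed echo power estimate
        if y_pow <= 0.0:
            gain = 0.0
        else:
            gain = min(1.0, max(0.0, (y_pow - echo_pow) / y_pow))
        output.append(gain * y)                    # scale the complex bin
    return output
```

Because the coupling is read from storage rather than re-estimated, the near-end speech present during double talk cannot inflate it, and no cross-spectrum averaging is required.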

Next, the flow of processing is described with reference to FIGS. 4 to 6.

FIG. 4 shows the overall flow of the echo suppression processing. First, in step S41, the received signal power spectrum calculation unit 170 calculates the power spectrum B20 from the DFT result of the received signal, the collected sound signal power spectrum calculation unit 180 calculates the power spectrum B21 from the DFT result of the collected sound signal, and both are sent to the first echo suppression processing unit 190.

Next, in step S42, the first echo suppression processing unit 190 performs echo suppression processing based on STSA estimation using the received signal power spectrum B20 and the collected sound signal power spectrum B21. The details of the echo suppression processing are described later.

Next, in step S43, the received signal power calculation unit 110 calculates the received signal power B12 from the power spectrum of the received signal, and the echo suppression processing unit output signal power calculation unit 111 calculates the power B13 of the output signal of the first echo suppression processing unit from that output signal B16.

Next, in step S44, the call state determination unit 112 uses the received signal power B12 and the power B13 of the output signal of the first echo suppression processing unit to determine the call state ("no utterance", "single talk reception", "single talk transmission", or "double talk") and sends the determination result to the post-processing unit 113. The determination method is described later.

When the call state determined in step S44 is the "single talk reception" state, the process proceeds to step S45. In step S45, the acoustic coupling amount storage processing unit 115 stores the acoustic coupling amount B17, estimated by the acoustic coupling amount estimation unit 210 in the first echo suppression processing unit 190, as B18.

Next, in step S46, the output processing unit 114 uses the output signal B16 of the first echo suppression processing unit 190 as the output signal B6 of the post-processing unit 113.

When the call state determined in step S44 is the "single talk transmission" state, the process proceeds to step S47. In step S47, the output processing unit 114 uses the output signal B16 of the first echo suppression processing unit 190 as the output signal B6 of the post-processing unit 113.

When the call state determined in step S44 is the "double talk" state, the process proceeds to step S48. In step S48, the second echo suppression processing unit 116 performs echo suppression processing based on STSA estimation using the received signal power spectrum B8, the collected sound signal power spectrum B9, and the acoustic coupling amount B18 stored in the acoustic coupling amount storage processing unit 115, and outputs B36. This echo suppression processing is described later.

Next, in step S49, the output processing unit 114 uses the output signal B36 of the second echo suppression processing unit 116 as the output signal B6 of the post-processing unit 113.

When the call state determined in step S44 is none of the above three states, the process proceeds to step S50. This is the "no utterance" state, in which neither a transmission signal nor a received signal is present. In step S50, the output processing unit 114 uses the output signal B16 of the first echo suppression processing unit 190 as the output signal B6 of the post-processing unit 113.

Finally, in step S51, the transmission signal IDFT unit 160 converts the signal B6 output from the output processing unit 114 into the time-domain data signal B7 by IDFT and sends it to the communication path.
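The branching in steps S44 to S50 can be sketched as follows. This is a minimal illustration, not the implementation described in the patent: the function name `select_output`, the string state labels, the dictionary used as the coupling store, and the fallback to the first-stage output when double talk occurs before any coupling amount has been stored are all assumptions made for this example.

```python
def select_output(state, first_output, coupling_estimate, store, second_stage):
    """Pick the post-processing output B6 for one processing segment.

    state             -- hypothetical label for the call state from step S44
    first_output      -- first echo suppression signal (B16)
    coupling_estimate -- coupling amount estimated in this segment (B17)
    store             -- dict that keeps the saved coupling amount (B18)
    second_stage      -- callable taking the saved coupling and returning B36
    """
    if state == "single_talk_receive":
        store["coupling"] = coupling_estimate      # S45: save B17 as B18
        return first_output                        # S46
    if state == "double_talk":
        if "coupling" not in store:
            # Assumption: the text does not say what happens before the
            # first save, so this sketch falls back to B16.
            return first_output
        return second_stage(store["coupling"])     # S48 and S49
    # "single talk transmission" (S47) and "no utterance" (S50)
    return first_output
```

Only the double-talk branch invokes the second stage, which is why the text describes the added computation as bounded.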

FIG. 5 shows the processing flow of the first echo suppression processing unit.

First, in step S151, the acoustic coupling amount estimation unit 210 estimates the acoustic coupling amount B17 using the received signal power spectrum B8 and the collected sound signal power spectrum B9. The estimation uses, for example, [Formula 4] and [Formula 5] in FIG. 12.

Next, in step S152, the first echo signal power spectrum estimation unit 220 estimates the power spectrum B47 of the echo contained in the collected sound signal, using the acoustic coupling amount B17 and the power spectrum B9 of the collected sound signal. The estimation uses, for example, [Formula 3] in FIG. 12.

Next, in step S153, the first echo suppression gain calculation unit 230 calculates the echo suppression gain B50, which is used to suppress the echo in the collected sound signal, from the echo power spectrum B47 estimated by the first echo signal power spectrum estimation unit 220 and the power spectrum B9 of the collected sound signal.

Finally, in step S154, the first echo suppression gain multiplication unit 240 multiplies the collected sound signal spectrum B5 by the echo suppression gain B50 to generate the signal B16 (the first echo suppression signal), in which the echo in the collected sound signal is suppressed.
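Steps S151 to S154 can be sketched per frequency bin as below. The formulas of FIG. 12 are not reproduced in this text, so the recursively smoothed ratio used for the coupling amount and the spectral-subtraction-style gain are stand-ins chosen for illustration, and all function names and the smoothing constant `alpha` are hypothetical. The echo power spectrum is formed as the coupling amount times the received signal power spectrum, following the description of the estimation unit 330 and the first technical means.

```python
def estimate_coupling(prev_coupling, rx_ps, mic_ps, alpha=0.9, floor=1e-12):
    """S151 stand-in: smooth the per-bin ratio of collected to received power."""
    return [alpha * c + (1.0 - alpha) * m / max(r, floor)
            for c, r, m in zip(prev_coupling, rx_ps, mic_ps)]

def estimate_echo_ps(coupling, rx_ps):
    """S152: echo power spectrum = coupling amount x received power spectrum."""
    return [c * r for c, r in zip(coupling, rx_ps)]

def suppression_gain(echo_ps, mic_ps, floor=1e-12):
    """S153 stand-in: power-domain spectral-subtraction gain, clipped at 0."""
    return [max(0.0, 1.0 - e / max(m, floor)) for e, m in zip(echo_ps, mic_ps)]

def apply_gain(gain, mic_spectrum):
    """S154: multiply the collected-sound spectrum (B5) by the gain."""
    return [g * x for g, x in zip(gain, mic_spectrum)]

def first_stage(prev_coupling, rx_ps, mic_ps, mic_spectrum, alpha=0.9):
    """Run S151 to S154 for one segment; returns (B17, B16)."""
    coupling = estimate_coupling(prev_coupling, rx_ps, mic_ps, alpha)
    echo_ps = estimate_echo_ps(coupling, rx_ps)
    gain = suppression_gain(echo_ps, mic_ps)
    return coupling, apply_gain(gain, mic_spectrum)
```

With this stand-in estimator, a segment containing near-end speech pulls the coupling estimate upward, which is the situation the stored coupling amount B18 is meant to avoid.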

FIG. 6 shows the processing flow of the second echo suppression processing unit.

First, in step S161, the acoustic coupling amount reading unit 320 reads the acoustic coupling amount B18, which was estimated in the most recent "single talk reception" state and is stored in the acoustic coupling amount storage unit 310.

Next, in step S162, the second echo signal power spectrum estimation unit 330 estimates the power spectrum D4 of the echo contained in the collected sound signal, using the acoustic coupling amount B18 and the power spectrum B9 of the collected sound signal. The estimation uses, for example, [Formula 3] in FIG. 12.

Next, in step S163, the second echo suppression gain calculation unit 340 calculates the echo suppression gain D7, which is used to suppress the echo in the collected sound signal, from the echo power spectrum D4 estimated by the second echo signal power spectrum estimation unit 330 and the power spectrum B9 of the collected sound signal.

Finally, in step S164, the second echo suppression gain multiplication unit 350 multiplies the collected sound signal spectrum B5 by the echo suppression gain D7 to generate the signal B36 (the second echo suppression signal), in which the echo in the collected sound signal is suppressed.

Here, the acoustic coupling amount B18 was not estimated in the "double talk" state; it is the value estimated and stored in the most recent "single talk reception" state, and is therefore close to the actual acoustic coupling amount. As a result, the power spectrum of the echo signal can be estimated accurately even in the "double talk" state.
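A small numerical example illustrates why the stored value helps. The numbers are invented for illustration, the echo and near-end speech powers are assumed to add, and the instantaneous ratio estimator and spectral-subtraction-style gain are assumptions rather than the formulas of FIG. 12.

```python
def echo_gain(coupling, rx_ps, mic_ps):
    """Gain for one bin: 1 - (estimated echo power / collected power)."""
    echo_ps = coupling * rx_ps
    return max(0.0, 1.0 - echo_ps / mic_ps)

# One frequency bin during double talk (illustrative values only).
true_coupling = 0.25                               # actual echo-path power ratio
rx_ps = 4.0                                        # received signal power
near_end_ps = 3.0                                  # near-end speech power
mic_ps = true_coupling * rx_ps + near_end_ps       # 1.0 echo + 3.0 speech

stored_coupling = true_coupling    # B18, saved during single talk reception
naive_coupling = mic_ps / rx_ps    # estimated during double talk: biased high

gain_with_stored = echo_gain(stored_coupling, rx_ps, mic_ps)   # keeps speech
gain_with_naive = echo_gain(naive_coupling, rx_ps, mic_ps)     # removes it too
```

In this example the gain computed from the stored coupling equals the near-end share of the collected power (3.0 / 4.0), while the coupling estimated during double talk attributes all collected power to echo and drives the gain to zero.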

FIG. 7 shows the method by which the call state determination unit determines the call state.

First, in step S71, the received signal power B12 is compared with the threshold D13. If it is below the threshold, no received signal is considered present; if it is above the threshold, a received signal is considered present.

If, in step S71, the received signal power B12 is below the threshold D13 and no received signal is considered present, then in step S72 the power B13 of the signal after the first echo suppression processing is compared with the threshold D17. If it is below the threshold, no transmission signal is considered present, and in step S73 a value indicating the "no utterance" state is stored in the call state C. If it is above the threshold, a transmission signal is considered present, and in step S74 a value indicating the "single talk transmission" state is stored in the call state C.

If, in step S71, the received signal power B12 is above the threshold D13 and a received signal is considered present, then in step S75 the power B13 of the signal after the first echo suppression processing is compared with the threshold D17. If it is below the threshold, no transmission signal is considered present, and in step S76 a value indicating the "single talk reception" state is stored in the call state C. If it is above the threshold, a transmission signal is considered present, and in step S77 a value indicating the "double talk" state is stored in the call state C.

Here, the thresholds D13 and D17 are set to the power $P_0$ [dB] below which the signal can be judged to be silent.
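The two-threshold decision of FIG. 7 can be sketched as follows. The enum and function names are hypothetical, and treating a power exactly equal to its threshold as "not present" is an assumption, since the text only distinguishes below and above.

```python
from enum import Enum

class CallState(Enum):
    NO_UTTERANCE = "no utterance"
    SINGLE_TALK_RECEPTION = "single talk reception"
    SINGLE_TALK_TRANSMISSION = "single talk transmission"
    DOUBLE_TALK = "double talk"

def classify_call_state(rx_power_db, suppressed_power_db,
                        rx_threshold_db, tx_threshold_db):
    """Classify one segment from B12 and B13 against D13 and D17."""
    rx_present = rx_power_db > rx_threshold_db            # S71
    tx_present = suppressed_power_db > tx_threshold_db    # S72 / S75
    if rx_present:
        # S77 or S76
        return (CallState.DOUBLE_TALK if tx_present
                else CallState.SINGLE_TALK_RECEPTION)
    # S74 or S73
    return (CallState.SINGLE_TALK_TRANSMISSION if tx_present
            else CallState.NO_UTTERANCE)
```

For example, with both thresholds at -60 dB, `classify_call_state(-20.0, -70.0, -60.0, -60.0)` returns `CallState.SINGLE_TALK_RECEPTION`.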

With the above configuration, conventional echo suppression processing is performed in the "no utterance", "single talk reception", and "single talk transmission" states, and in the "double talk" state the echo suppression processing is performed again using the acoustic coupling amount estimated in the "single talk reception" state. This provides an echo suppression device that suppresses the echo component appropriately regardless of the call state, while limiting the increase in computation to about twice that of the conventional approach.

[Embodiment 2]
Regardless of the magnitude of the received signal power, when the echo signal power is large the collected sound signal contains both the acoustic echo signal and the transmission signal, so this case should be treated in the same way as the state in which a received signal is present. This occurs, for example, when no received signal is present but reverberation remains as a loud sound.

The second embodiment therefore determines the call state using the received signal power, the power of the output signal of the first echo suppression processing unit, and the power of the estimated echo signal. This method is described with reference to FIGS. 8 and 9.

In the echo suppression device of FIG. 8, an echo suppression processing unit output signal power calculation unit 810 is used in place of the echo suppression processing unit output signal power calculation unit 111 of the first embodiment, and a call state determination unit 850 is used in place of the call state determination unit 112.

The echo suppression processing unit output signal power calculation unit 810 includes: a post-echo-suppression signal power calculation unit 820 that calculates the power D13 of the signal B16 output from the first echo suppression processing unit 190; a first estimated echo power calculation unit 830 that calculates the echo signal power F4 from the estimated echo signal power spectrum B47 estimated by the first echo signal power spectrum estimation unit 220 and outputs it to the call state determination unit 850; and a second estimated echo power calculation unit 840 that calculates the echo signal power F6 from the estimated echo signal power spectrum D4 estimated by the second echo signal power spectrum estimation unit 330 and outputs it to the call state determination unit 850.

The call state determination unit 850 consists of: an estimated echo power storage unit 860 that selects, according to the call state, which echo power to store from the echo signal power F4 output by the first estimated echo power calculation unit 830 and the echo signal power F6 output by the second estimated echo power calculation unit 840, and stores it as F9; and a call state determination processing unit 870 that determines the call state ("no utterance", "single talk reception", "single talk transmission", or "double talk") from the received signal power B12, the power B13 of the signal after the first echo suppression processing, and the estimated echo signal power F12 stored in the previous processing segment. The rest of the configuration is the same as in Embodiment 1.

The operation of each processing unit is described below. The first estimated echo power calculation unit 830 calculates the echo signal power F4 from the echo signal power spectrum B47 estimated by the first echo signal power spectrum estimation unit 220 and outputs it to the call state determination unit 850. The call state determination processing unit 870 determines the call state using the received signal power B12, the power B13 of the signal after the first echo suppression processing, and the estimated echo signal power F12 stored in the previous processing segment. The determination method is described later. When the call state determination unit 850 determines that the call state is "double talk", the second estimated echo power calculation unit 840 calculates the echo signal power F6 from the echo signal power spectrum D4 estimated by the second echo signal power spectrum estimation unit 330 and outputs it to the call state determination unit 850. When the call state determination unit 850 determines that the call state is "no utterance", "single talk reception", or "single talk transmission", it stores the echo signal power F4 output from the first estimated echo power calculation unit 830 in the estimated echo signal power storage unit 860 as F9. When it determines that the call state is "double talk", it stores the echo signal power F6 output from the second estimated echo power calculation unit 840 in the estimated echo signal power storage unit 860 as F9.

FIG. 9 shows the method by which the call state determination unit 850 determines the call state.

First, in step S91, the received signal power B12 is compared with the threshold D13, and the estimated echo signal power F12 is compared with the threshold F27. If both powers are below their thresholds, no received signal is considered present; if a threshold is exceeded, a received signal is considered present.

If, in step S91, the received signal power B12 and the estimated echo signal power F12 are both below their thresholds and no received signal is considered present, then in step S92 the power B13 of the signal after the first echo suppression processing is compared with the threshold D17. If it is below the threshold, no transmission signal is considered present, and in step S93 a value indicating the "no utterance" state is stored in the call state C. If it is above the threshold, a transmission signal is considered present, and in step S94 the "single talk transmission" state is stored in the call state C.

If, in step S91, either or both of the received signal power B12 and the estimated echo signal power F12 are above their thresholds and a received signal is considered present, then in step S95 the power B13 of the signal after the first echo suppression processing is compared with the threshold D17. If it is below the threshold, no transmission signal is considered present, and in step S96 a value indicating the "single talk reception" state is stored in the call state C. If it is above the threshold, a transmission signal is considered present, and in step S97 a value indicating the "double talk" state is stored in the call state C.
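The decision of FIG. 9 differs from a plain received-power test only in the first check, where the echo power stored from the previous segment can also mark the far end as active. A minimal sketch, with a hypothetical function name and with equality treated as "not exceeding" (an assumption):

```python
def classify_call_state_with_echo(rx_power, prev_echo_power, suppressed_power,
                                  rx_threshold, echo_threshold, tx_threshold):
    """Classify one segment from B12, F12 and B13 against D13, F27 and D17."""
    # S91: a received signal is considered present if either power
    # exceeds its threshold.
    rx_present = rx_power > rx_threshold or prev_echo_power > echo_threshold
    # S92 / S95: near-end speech test on the first-stage output power.
    tx_present = suppressed_power > tx_threshold
    if rx_present:
        return "double talk" if tx_present else "single talk reception"
    return "single talk transmission" if tx_present else "no utterance"
```

A segment with a quiet received signal but a loud residual echo is then classified as "double talk" or "single talk reception" rather than as a state without a received signal.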

Here, the threshold F27 is set taking the forgetting factor $\beta$ into account. The forgetting factor $\beta$ represents the attenuation rate of the power up to the next processing segment (time $T_0$). The time $T_1$ until the sound is picked up by the microphone is obtained from the speed of sound and the speaker-to-microphone distance. The power of an acoustic echo collected in one processing segment is attenuated by a factor of $\beta^{T_1/T_0}$ by the time it is collected in the next processing segment, becoming $\beta^{T_1/T_0} \times F12$, and it is this value that is tested for silence. Accordingly, with $P_0$ [dB] denoting the power that can be judged to be silent, the determination threshold F27 for the acoustic echo power F12 is set to $P_0 / \beta^{T_1/T_0}$, that is, $P_0$ divided by $\beta^{T_1/T_0}$.
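The threshold rule can be worked through numerically as below. This sketch treats $P_0$ as a linear power value, which is an assumption: the text gives $P_0$ in dB and does not say how the division is carried out in that domain. The function names and the default speed of sound of 343 m/s are also illustrative choices.

```python
def propagation_delay_s(distance_m, speed_of_sound_mps=343.0):
    """T1: time for sound to travel from the speaker to the microphone."""
    return distance_m / speed_of_sound_mps

def echo_power_threshold(p0_linear, beta, t1_s, t0_s):
    """F27 = P0 / beta**(T1/T0).

    Testing F12 against this value is equivalent to testing the
    attenuated echo power beta**(T1/T0) * F12 against P0.
    """
    return p0_linear / (beta ** (t1_s / t0_s))

# Illustrative values: beta = 0.25 per segment, T0 = 20 ms, T1 = 10 ms.
threshold = echo_power_threshold(p0_linear=1e-6, beta=0.25,
                                 t1_s=0.01, t0_s=0.02)
```

With these values $\beta^{T_1/T_0} = 0.25^{0.5} = 0.5$, so the threshold is twice $P_0$: an echo power of up to $2 P_0$ in the previous segment will have decayed to at most $P_0$ by the time it is collected.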

With the above configuration, even when the received signal power is small and the echo signal power is large, the call state is determined accurately. This provides an echo suppression device that suppresses the echo component appropriately regardless of the call state.

[Embodiment 3]
When the call state determination unit determines the "no utterance" state or the "single talk reception" state, this is equivalent to determining that no transmission signal is present. The collected signal then consists of noise and echo, which is unnecessary information.

The third embodiment therefore sets the signal transmitted to the remote site to zero when the call state is determined to be "no utterance" or "single talk reception". This method is described with reference to FIG. 10.

In the echo suppression device of FIG. 10, a post-processing unit 101 is used in place of the post-processing unit 113 of the first embodiment. In the post-processing unit 101, an echo suppression signal processing unit 102 is used in place of the output processing unit 114. The rest of the configuration is the same as in Embodiment 1.

The operation of each processing unit is described below. The call state determination unit 112 determines the call state in the processing segment. Either the method of Embodiment 1 or that of Embodiment 2 may be used for the determination.

When the call state determined by the call state determination unit 112 is "single talk reception", the acoustic coupling amount storage processing unit 115 stores the acoustic coupling amount B17 estimated by the first echo suppression processing unit 190 as B18. The echo suppression signal processing unit 102 then sets the output signal B6 of the post-processing unit 101 to zero.

When the call state determined by the call state determination unit 112 is "single talk transmission", the output processing unit 114 uses the output signal B16 of the first echo suppression processing unit 190 as the output signal B6 of the post-processing unit 101.

When the call state determined by the call state determination unit 112 is "double talk", the second echo suppression processing unit 116 performs echo suppression processing based on STSA estimation using the received signal power spectrum B8, the collected sound signal power spectrum B9, and the acoustic coupling amount B18 stored in the acoustic coupling amount storage processing unit 115, and outputs B36. The post-processing unit 101 uses the output signal B36 of the second echo suppression processing unit as its output signal B6.

When the call state determined by the call state determination unit 112 is none of the above three states, it is regarded as the "no utterance" state, in which neither a transmission signal nor a received signal is present, and the echo suppression signal processing unit 102 sets the output signal B6 of the post-processing unit 101 to zero.

With the above configuration, the signal transmitted to the remote site is zero whenever the "no utterance" or "single talk reception" state is determined. This provides an echo suppression device that does not transmit noise or echo signals to the remote site in those states.
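The output selection of this embodiment can be sketched as follows. The function name and state labels are hypothetical, and representing the zeroed output B6 as a list of complex zeros with the same length as the spectrum is an assumption about the data layout.

```python
def select_output_embodiment3(state, first_output, second_output):
    """Return the post-processing output B6 for one segment.

    first_output  -- first echo suppression signal spectrum (B16)
    second_output -- second echo suppression signal spectrum (B36)
    """
    if state in ("no_utterance", "single_talk_reception"):
        # No near-end speech: transmit nothing to the remote site.
        return [0j] * len(first_output)
    if state == "double_talk":
        return second_output
    # "single talk transmission"
    return first_output
```

Storing the coupling amount during "single talk reception" still takes place as in Embodiment 1; only the transmitted signal changes.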

[Example of software implementation]
Each unit of the echo suppression device may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the echo suppression device includes a CPU that executes the instructions of a program, which is software implementing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The functions of each unit are achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast waves). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The first technical means of the present invention is an echo suppression device based on short-time spectral amplitude estimation that estimates, from the power spectrum of a received signal and the power spectrum of a collected sound signal, the ratio between the power spectrum of the acoustic echo signal and the power spectrum of the received signal (hereinafter, the "acoustic coupling amount"); estimates the power spectrum of the echo signal by multiplying the power spectrum of the received signal by the acoustic coupling amount; calculates an echo suppression gain from the power spectrum of the collected sound signal and the power spectrum of the echo signal; and outputs the short-time spectrum of a signal from which the echo signal component has been removed by multiplying the short-time spectrum of the collected sound signal by the echo suppression gain. The device comprises: a first echo suppression processing unit that performs echo suppression processing based on short-time spectral amplitude estimation using the power spectrum of the received signal and the power spectrum of the collected sound signal; a received signal power calculation unit that calculates the power of the received signal from the power spectrum of the received signal; an echo suppression processing unit output signal power calculation unit that calculates the power of the output signal of the first echo suppression processing unit from the signal output by that unit; a call state determination unit that determines the call state from the power output by the received signal power calculation unit and the power output by the echo suppression processing unit output signal power calculation unit; and a post-processing unit that takes as input the signal output from the first echo suppression processing unit, the power spectrum of the collected sound signal, and the call state determined by the call state determination unit, and performs processing according to the call state. The post-processing unit comprises: a post-echo-suppression signal processing unit that, when the call state determination unit determines that only a received signal is present, that neither a transmission signal nor a received signal is present, or that only a transmission signal is present, operates on the signal output from the first echo suppression processing unit and uses the result as the output signal of the post-processing unit; an acoustic coupling amount storage processing unit that stores the acoustic coupling amount estimated by the first echo suppression processing unit when the call state determination unit determines that only a received signal is present; and a second echo suppression processing unit that, when the call state determination unit determines that both a transmission signal and a received signal are present, performs echo suppression processing based on short-time spectral amplitude estimation using the stored acoustic coupling amount, the power spectrum of the received signal, and the power spectrum of the collected sound signal, and uses the result as the output signal of the post-processing unit.

The second technical means is the first technical means, wherein the call state determination unit determines the call state by comparing, against thresholds, the power of the received signal output from the received signal power calculation unit and the power of the output signal of the first echo suppression processing unit output from the echo suppression processing unit output signal power calculation unit.

The third technical means is the first technical means, wherein the call state determination unit determines the call state by comparing, against thresholds, the power of the received signal output from the received signal power calculation unit, the power of the output signal of the first echo suppression processing unit output from the echo suppression processing unit output signal power calculation unit, and the power of the echo signal estimated by the first echo suppression processing unit or the second echo suppression processing unit.

The fourth technical means is the first technical means, wherein, when the call state determination unit determines that only a received signal is present or that neither a transmission signal nor a received signal is present, the post-processing unit causes the post-echo-suppression signal processing unit to use the audio signal output from the first echo suppression processing unit as the output signal of the post-processing unit.

The fifth technical means is the first technical means in which, when the call state determination unit determines that only a received signal is present or that neither a transmitted signal nor a received signal is present, the post-echo-suppression signal processing unit of the post-processing unit sets the output signal of the post-processing unit to zero.

The present invention is suitable for, for example, a TV conference system.

190 First echo suppression processing unit
116 Second echo suppression processing unit
112 Call state determination unit
115 Acoustic coupling amount storage processing unit
114 Output processing unit
120 Speaker (sound generation device)
130 Microphone (sound collecting device)

Claims

An echo suppression apparatus that is used together with a sound generation device, which generates sound in response to a reception signal received from a communication path, and a sound collecting device, and that generates a signal for transmission by applying processing based on short-time spectral amplitude estimation to a collected sound signal obtained via the sound collecting device, the echo suppression apparatus comprising:
a first echo suppression processing unit that estimates, based on the power spectrum of the reception signal and the power spectrum of the collected sound signal, an acoustic coupling amount that is a ratio between the power spectrum of the reception signal and the power spectrum of an echo signal, and that uses the acoustic coupling amount to generate a first echo suppression signal in which at least part of the echo signal is removed from the collected sound signal;
a call state determination unit that determines a call state using the power of the reception signal and the power of the first echo suppression signal;
an acoustic coupling amount storage processing unit that stores the acoustic coupling amount when the call state determination unit determines that the call state is a reception-only state; and
a second echo suppression processing unit that, when the call state determination unit determines that the call state is a reception and transmission state, generates a second echo suppression signal in which at least part of the echo signal is removed from the collected sound signal using the acoustic coupling amount stored in the acoustic coupling amount storage processing unit.
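The data flow among the units recited above can be sketched in Python as follows. This is only an illustrative reading: power spectra are plain lists of per-bin values, and the helper names (`estimate_coupling`, `suppress_echo`, `CouplingStore`), the recursive smoothing, and the power-subtraction gain are assumptions, since the claim specifies only that the acoustic coupling amount is a ratio of power spectra and that the processing is based on short-time spectral amplitude estimation.

```python
EPS = 1e-12  # guards against division by zero in silent bins


def estimate_coupling(rx_psd, mic_psd, previous=None, smoothing=0.9):
    """Per-bin acoustic coupling estimate: echo power / received power.

    Assumes the collected sound is dominated by echo, so mic_psd stands in
    for the echo power spectrum. The recursive smoothing is an assumption.
    """
    current = [m / (r + EPS) for r, m in zip(rx_psd, mic_psd)]
    if previous is None:
        return current
    return [smoothing * p + (1.0 - smoothing) * c
            for p, c in zip(previous, current)]


def suppress_echo(rx_psd, mic_psd, coupling, floor=0.0):
    """Return (output_psd, echo_psd) using an assumed power-subtraction gain."""
    echo_psd = [c * r for c, r in zip(coupling, rx_psd)]
    out_psd = []
    for m, e in zip(mic_psd, echo_psd):
        gain = max(floor, 1.0 - e / (m + EPS))  # clamp so the gain never goes negative
        out_psd.append(gain * m)
    return out_psd, echo_psd


class CouplingStore:
    """Keeps the last coupling estimate seen in the reception-only state."""

    def __init__(self):
        self.stored = None

    def update(self, coupling, reception_only):
        # Only overwrite the stored value while the far end alone is talking.
        if reception_only:
            self.stored = list(coupling)
        return self.stored
```

In this sketch the first stage would call `estimate_coupling` and `suppress_echo` on every frame, while the second stage would call `suppress_echo` with `CouplingStore.stored` during the reception and transmission state, so that near-end speech does not disturb the coupling estimate it relies on.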

The echo suppression apparatus according to claim 1, wherein the call state determination unit determines the call state by comparing the power of the reception signal with a first threshold value and comparing the power of the first echo suppression signal with a second threshold value.

The echo suppression apparatus according to claim 2, wherein the call state determination unit further determines the call state by comparing, with a third threshold value, the power of the echo signal obtained from the power spectrum of the echo signal estimated by the first or second echo suppression processing unit.

The echo suppression apparatus according to claim 1, further comprising an output processing unit that outputs the first echo suppression signal when the call state determination unit determines that the call state is a reception-only state or a state with neither reception nor transmission, and outputs the second echo suppression signal when the call state determination unit determines that the call state is a reception and transmission state.

The echo suppression apparatus according to claim 1, further comprising an output processing unit that sets its output to zero when the call state determination unit determines that the call state is a reception-only state or a state with neither reception nor transmission, and outputs the second echo suppression signal when the call state determination unit determines that the call state is a reception and transmission state.

The echo suppression apparatus according to claim 4 or 5, wherein the output processing unit outputs the first echo suppression signal when the call state determination unit determines that the call state is a transmission-only state.
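The output selection recited for the output processing unit can be summarized in a short Python sketch. The state constants, the function name `select_output`, and the `mute_without_near_end` flag (which switches between passing the first echo suppression signal and outputting zero) are hypothetical names introduced here for illustration.

```python
RECEPTION_ONLY = "reception_only"
TRANSMISSION_ONLY = "transmission_only"
DOUBLE_TALK = "reception_and_transmission"
SILENCE = "neither"


def select_output(state, first_signal, second_signal, mute_without_near_end=False):
    """Pick the frame to transmit for the given call state.

    first_signal / second_signal: frames from the first and second echo
    suppression processing units. With mute_without_near_end=True the
    output is zeroed when no near-end speech is present.
    """
    if state == DOUBLE_TALK:
        return list(second_signal)
    if state == TRANSMISSION_ONLY:
        return list(first_signal)
    if state in (RECEPTION_ONLY, SILENCE):
        if mute_without_near_end:
            return [0.0] * len(first_signal)
        return list(first_signal)
    raise ValueError(f"unknown call state: {state!r}")
```

For example, `select_output(RECEPTION_ONLY, first, second, mute_without_near_end=True)` returns a frame of zeros, whereas the same call with the default flag passes `first` through unchanged.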

The echo suppression apparatus according to claim 3, wherein the reception timing of the reception signal underlying the power of the echo signal compared with the third threshold value precedes the reception timing of the reception signal underlying the power of the reception signal compared with the first threshold value.
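The timing relationship can be illustrated with a small Python sketch in which the echo power is computed from an earlier received frame than the one used for the received power. The class `DelayedEchoPower`, the fixed delay in frames, and the rule in `reception_active` that treats reception as present when either comparison passes are all assumptions; the claims state only the ordering of the two reception timings and that each power is compared with its own threshold.

```python
from collections import deque


class DelayedEchoPower:
    """Pairs the newest received power with echo power from an older frame."""

    def __init__(self, delay_frames):
        # Holds the current frame plus `delay_frames` earlier ones.
        self.history = deque(maxlen=delay_frames + 1)

    def push(self, rx_psd, coupling):
        """Add the newest received power spectrum and return two powers.

        Returns (rx_power, echo_power): rx_power comes from the newest frame,
        echo_power from the oldest frame still held, which is an earlier
        reception timing once the buffer is full.
        """
        self.history.append(list(rx_psd))
        rx_power = sum(rx_psd)
        oldest = self.history[0]
        echo_power = sum(c * r for c, r in zip(coupling, oldest))
        return rx_power, echo_power


def reception_active(rx_power, echo_power, first_threshold, third_threshold):
    """Assumed rule: reception counts as present if either test passes."""
    return rx_power > first_threshold or echo_power > third_threshold
```

Under these assumptions, a frame that arrives just after far-end speech stops still yields a non-zero echo power, so residual echo lingering in the room would continue to be treated as reception rather than as near-end speech.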