JP5466581B2

JP5466581B2 - Echo canceling method, echo canceling apparatus, and echo canceling program

Info

Publication number: JP5466581B2
Application number: JP2010128725A
Authority: JP
Inventors: 翔一郎齊藤; 末廣島内; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-06-04
Filing date: 2010-06-04
Publication date: 2014-04-09
Anticipated expiration: 2030-06-04
Also published as: JP2011254420A

Description

The present invention relates to an echo canceling technique that suppresses, by multiplying by a gain at each frequency, the echo component that a received signal reproduced by a speaker causes in the signal collected by a microphone.

Some echo canceling apparatuses have a two-stage configuration: linear echo cancellation by an adaptive filter, followed by nonlinear echo suppression by amplitude spectrum control. The echo canceling apparatus 10 described in Non-Patent Document 1 is known prior art with this two-stage configuration. An outline of the echo canceling apparatus 10 is given below with reference to FIG. 1.

The received signal x(n) reproduced by the speaker 2 passes through the echo path 5 and leaks into the microphone 3. The echo canceling apparatus 10 suppresses the echo component caused by x(n) in the collected signal y(n) picked up by the microphone 3. Here n is an integer representing time.

In this configuration, the adaptive filter unit 11 uses the received signal x(n) input from the receiving end 1 to cancel the echo component from the collected signal y(n) by linear processing, yielding the residual echo signal d₁(n). The frequency domain transform unit 13 then takes the L samples d₁(n), d₁(n−1), ..., d₁(n−L+1) counted back from the current time n as one frame and converts them into the frequency domain signal D₁(f, k). D₁(f, k) is the Fourier transform of the residual echo signal d₁(n), f is the discrete angular frequency, and k is the frame time; when the Fourier transform length is F, f is an integer from 1 to F.

The noise suppression unit 15 suppresses the noise component contained in the residual echo signal D₁(f, k) and obtains the noise removal signal D₂(f, k). The frequency domain transform unit 17 converts the received signal x(n) into the frequency domain signal X(f, k). The residual echo suppression unit 18 then uses X(f, k) to suppress the residual echo component contained in the noise removal signal D₂(f, k) and obtains the transmission signal D₃(f, k). The time domain transform unit 19 converts the transmission signal D₃(f, k) into the time domain transmission signal d₃(n) and outputs it to the transmitting end 4.

We now focus on the echo suppression processing in the residual echo suppression unit 18. The unit obtains an echo suppression gain G(f, k) and suppresses the echo by multiplying its input signal D₂(f, k) by G(f, k) in the frequency domain. Specifically, the echo suppression gain G(f, k) is calculated as

$$G(f,k) = \frac{|D_2(f,k)|^2 - |\hat{Y}(f,k)|^2}{|D_2(f,k)|^2} \tag{1}$$

where |·| denotes the absolute value. The transmission signal D₃(f, k) is then calculated as

$$D_3(f,k) = G(f,k)\,D_2(f,k) \tag{2}$$

Y^(f, k) in equation (1) is the pseudo residual echo, which Non-Patent Document 1 obtains as

$$E[|\hat{Y}(f,k)|^2] = E[|H(f,k)|^2]\,|X(f,k)|^2 + \beta\,E[|\hat{Y}(f,k-1)|^2] \tag{3}$$

H(f, k) represents the pseudo residual echo path and is obtained using, for example, the minimum value of the ratio between E[|X(f, k)|²] and E[|D₂(f, k)|²]. E[·] denotes the ensemble average. β is a forgetting constant set to a value that matches the reverberation time.
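As a minimal sketch of equations (1) to (3), the following Python processes one frequency bin for one frame. The function names, the clipping of the gain to the range 0 to 1, and the small constant `eps` are assumptions added for illustration and do not come from the source; the power of the pseudo residual echo in equation (1) is represented by its recursive estimate from equation (3).

```python
def update_echo_power(h_pow, x, y_pow_prev, beta):
    """Equation (3): recursive estimate of the pseudo residual echo power.

    h_pow      -- E[|H(f,k)|^2], power gain of the pseudo residual echo path
    x          -- X(f,k), complex spectrum value of the received signal
    y_pow_prev -- E[|Y^(f,k-1)|^2] from the previous frame
    beta       -- forgetting constant matched to the reverberation time
    """
    return h_pow * abs(x) ** 2 + beta * y_pow_prev


def suppression_gain(d2, y_pow, eps=1e-12):
    """Equation (1). Clipping to [0, 1] and eps are assumptions, not from the source."""
    d2_pow = abs(d2) ** 2
    gain = (d2_pow - y_pow) / (d2_pow + eps)
    return min(1.0, max(0.0, gain))


def suppress(d2, gain):
    """Equation (2): scale the input bin D2(f,k) by the gain G(f,k)."""
    return gain * d2
```

When the estimated echo power exceeds the input power, the unclipped gain of equation (1) would be negative; the clipping shown here is one common way to handle that case.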

The amplitude spectrum control in the residual echo suppression unit 18 can remove the residual echo component that remains when the adaptive filter unit 11 fails to cancel the echo completely. Unlike the adaptive filter unit 11, however, it also suppresses part of the transmitted speech, which is unrelated to the echo, by an amount that depends on the degree of echo suppression. As a result, the transmitted speech is distorted and becomes hard to understand.

Non-Patent Document 1 therefore proposes setting an original sound addition rate 1−α as a way of reducing speech distortion. That is, instead of equation (2), the transmission signal is calculated as

$$D_3(f,k) = (1-\alpha)\,D_2(f,k) + \alpha\,G(f,k)\,D_2(f,k) \tag{4}$$

which reduces the influence of the echo suppression gain G(f, k). Here α is a real number from 0 to 1.
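The blend in equation (4) can be sketched as follows, assuming that the first term on the right-hand side is the unsuppressed signal D₂(f, k), by analogy with the original-sound addition used in the noise suppression unit. The function name and the range check are illustrative assumptions.

```python
def add_original_sound(d2, gain, alpha):
    """Blend the unsuppressed bin with the suppressed bin.

    Returns (1 - alpha) * d2 + alpha * gain * d2, which is algebraically
    the same as (1 - alpha * (1 - gain)) * d2.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * d2 + alpha * gain * d2
```

With alpha = 1 the result is the fully suppressed signal of equation (2); with alpha = 0 the input passes through unchanged.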

Non-Patent Document 1: Sumitaka Sakauchi, Yoichi Haneda, Masashi Tanaka, Junko Sasaki, and Akitoshi Kataoka, "Acoustic Echo Canceller with Noise Suppression and Echo Suppression Functions," IEICE Transactions A, 2004, Vol. J-87-A, No. 4, pp. 448-457.

Increasing the original sound addition rate, and thereby weakening the effect of the echo suppression gain, reduces speech distortion but degrades echo cancellation performance by the same amount; the two are in a trade-off. The optimal original sound addition rate depends on the signal being suppressed, but in the prior art the rate is fixed, so it cannot always be set to suit the situation and the optimal rate cannot be achieved.

In an echo canceling apparatus, if the original sound addition rate is set to the value that is optimal for vowel segments, then in consonant segments, whose amplitude is small to begin with, the suppression changes the shape of the frequency spectrum, so the consonant is likely to be misheard as a different consonant. This is explained with reference to FIG. 2. When the transmitted speech is a vowel, even if residual echo suppression removes part of the speech from the signal in which residual echo is superimposed on the speech (see FIG. 2A), the outline of the spectrum changes little from the original (see FIG. 2B). When the transmitted speech is a consonant and the same rate is used, removing part of the speech from the signal with superimposed residual echo (see FIG. 2C) changes the characteristics of the frequency spectrum on top of the amplitude being small (see FIG. 2D), so the result differs greatly from the original spectrum and problems such as mishearing it as another consonant arise.

Conversely, if the original sound addition rate is set to the value that is optimal for consonant segments, sufficient echo cancellation performance cannot be obtained in vowel segments.

To solve the above problem, the echo canceling technique according to the present invention converts a signal d(n) obtained from the collected signal and the received signal x(n), frame by frame, into the frequency domain signals D(f, k) and X(f, k), respectively; obtains an echo suppression gain Gb^(f, k) using D(f, k) and X(f, k); determines whether the signal to be suppressed is a vowel or a consonant using a signal D′(f, k) obtained by removing the echo component from D(f, k); sets the relaxation coefficient β(k) to γ₂ when the signal to be suppressed is determined to be a vowel and to γ₁ otherwise; obtains the second residual echo suppression signal D₃(f, k) by processing that yields the result of subtracting the product of D(f, k) and β(k) from the product of D(f, k), Gb^(f, k), and β(k) and adding the difference to D(f, k); and converts D₃(f, k) into the time domain signal d₃(n). Here n is the time, f = 1, 2, ..., F is the discrete angular frequency, k is the frame time, and γ₁ < γ₂.

The present invention changes the magnitude of the echo suppression gain according to the situation, and thereby suppresses the echo sufficiently while also reducing speech distortion.

FIG. 1 is a block diagram of the conventional echo canceling apparatus 10.
FIG. 2A shows a signal in which residual echo is superimposed on the transmitted speech when the speech is a vowel; FIG. 2B shows the signal of FIG. 2A after residual echo suppression; FIG. 2C shows a signal in which residual echo is superimposed on the transmitted speech when the speech is a consonant; FIG. 2D shows the signal of FIG. 2C after residual echo suppression.
FIG. 3 is a block diagram of the echo canceling apparatus 100 of Example 1.
FIG. 4 shows the processing flow of the echo canceling apparatus 100 of Example 1.
FIG. 5 is a block diagram of the adaptive filter unit 11 of the echo canceling apparatus 100 of Example 1.
FIG. 6 is a block diagram of the noise suppression unit 15 of the echo canceling apparatus 100 of Example 1.
FIG. 7 is a block diagram of the first residual echo suppression unit 130, the vowel/consonant determination unit 140, the relaxation coefficient determination unit 150, and the second residual echo suppression unit 160 of the echo canceling apparatus 100 of Example 1.
FIG. 8 shows the processing flow of the first residual echo suppression unit 130 of the echo canceling apparatus 100 of Example 1.
FIG. 9 shows the processing flow of the vowel/consonant determination unit 140, the relaxation coefficient determination unit 150, and the second residual echo suppression unit 160 of the echo canceling apparatus 100 of Example 1.
FIG. 10 is a block diagram of the relaxation coefficient determination unit 150 of the echo canceling apparatus 100 of Example 1.
FIG. 11A is a block diagram of the second residual echo suppression unit 160a, which calculates D₃(f, k) = {1 − β(k)(1 − Gb^(f, k))} D₂(f, k); FIG. 11B is a block diagram of the second residual echo suppression unit 160b, which calculates D₃(f, k) = (1 − β(k)) D₂(f, k) + β(k) D′₃(f, k).
FIG. 12 is a block diagram of the relaxation coefficient determination unit 250 of the echo canceling apparatus 200 of Example 2.
FIG. 13 shows the processing flow of the relaxation coefficient determination unit 250 of the echo canceling apparatus 200 of Example 2.
FIG. 14 is a diagram for explaining the relaxation coefficient determination unit 250 of the echo canceling apparatus 200 of Example 2.
FIG. 15 is a block diagram of the relaxation coefficient determination unit 350 of the echo canceling apparatus 300 of Example 3.
FIG. 16 shows the processing flow of the relaxation coefficient determination unit 350 of the echo canceling apparatus 300 of Example 3.

Embodiments of the present invention are described in detail below.

<Echo canceling apparatus 100>
The echo canceling apparatus 100 suppresses the echo component that the received signal x(n) reproduced by the speaker 2 causes in the collected signal y(n) picked up by the microphone 3, by multiplying by an echo suppression gain at each frequency.

As shown in FIG. 3, the echo canceling apparatus 100 includes, for example, an adaptive filter unit 11, frequency domain transform units 13 and 17, a noise suppression unit 15, a time domain transform unit 19, a first residual echo suppression unit 130, a vowel/consonant determination unit 140, a relaxation coefficient determination unit 150, and a second residual echo suppression unit 160. The echo canceling apparatus 100 according to Example 1 is described with reference to FIGS. 3 and 4. In FIG. 3, parts corresponding to those in FIG. 1 carry the same reference numerals and their description is omitted. The same applies to the subsequent figures.
<Adaptive filter unit 11>
The adaptive filter unit 11 uses the received signal x(n) input from the receiving end 1 to cancel, by linear processing, the echo component from the collected signal y(n) input from the microphone 3, obtains the residual echo signal d₁(n) (s11), and outputs it to the frequency domain transform unit 13. As shown in FIG. 5, for example, the adaptive filter unit 11 includes an echo prediction unit 11a, a subtraction unit 11b, and an echo path estimation unit 11c.

The echo prediction unit 11a receives the filter coefficient vector H′(n) and the received signal x(n), convolves them as in the following equation to obtain the pseudo echo signal y′(n), and sends y′(n) to the subtraction unit 11b.

$$y'(n) = H'^{T}(n)\,X(n)$$

where

$$H'(n) = [h'(n,0), \ldots, h'(n,L-1)]^{T}$$

$$X(n) = [x(n), \ldots, x(n-L+1)]^{T}$$

Here [·]^T denotes vector transposition, L is the filter length, and h′(n, l) is each filter coefficient.
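The inner product y′(n) = H′ᵀ(n) X(n) can be sketched in a few lines of Python. The function name and the list-based representation are illustrative assumptions.

```python
def predict_echo(h, x_hist):
    """Pseudo echo y'(n) = H'^T(n) X(n).

    h      -- filter coefficients [h'(n,0), ..., h'(n,L-1)]
    x_hist -- received samples [x(n), x(n-1), ..., x(n-L+1)], newest first
    """
    if len(h) != len(x_hist):
        raise ValueError("h and x_hist must both have length L")
    return sum(h_l * x_l for h_l, x_l in zip(h, x_hist))
```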

The subtraction unit 11b receives the collected signal y(n) and the pseudo echo signal y′(n), subtracts y′(n) from y(n) to obtain the residual echo signal d₁(n) = y(n) − y′(n), and sends it to the frequency domain transform unit 13 and the echo path estimation unit 11c.

The echo path estimation unit 11c receives the residual echo signal d₁(n) and the received signal x(n) and, based on them, updates the filter coefficient vector H′(n) of the echo prediction unit 11a so that the error between the collected signal y(n) and the pseudo echo signal y′(n) becomes small, and sends the updated vector to the echo prediction unit 11a. For example, the filter coefficient vector H′(n+1) is updated with the NLMS (Normalized Least Mean Square) algorithm as follows.

$$H'(n+1) = H'(n) + \frac{\mu\, d_1(n)\, X(n)}{X^{T}(n)\, X(n)}$$

Here μ is a step size set to keep the estimation stable.
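The NLMS update above, combined with the prediction and subtraction steps, can be sketched as follows. The function names, the default step size, the regularization constant `eps`, and the zero-padding before the start of the signal are assumptions made for this illustration.

```python
def nlms_step(h, x_hist, y, mu=0.5, eps=1e-8):
    """One NLMS iteration.

    Returns (d1, h_next): the residual echo d1(n) = y(n) - y'(n) and the
    updated coefficients H'(n+1). eps avoids division by zero when the
    received signal is silent (an assumption, not part of the source).
    """
    y_hat = sum(h_l * x_l for h_l, x_l in zip(h, x_hist))
    d1 = y - y_hat
    norm = sum(x_l * x_l for x_l in x_hist) + eps
    h_next = [h_l + mu * d1 * x_l / norm for h_l, x_l in zip(h, x_hist)]
    return d1, h_next


def run_adaptive_filter(x, y, length, mu=0.5):
    """Run the filter over whole signals; returns the residuals and the final H'."""
    h = [0.0] * length
    residual = []
    for n in range(len(x)):
        # X(n) = [x(n), ..., x(n-L+1)], zero-padded before the start of the signal
        x_hist = [x[n - l] if n - l >= 0 else 0.0 for l in range(length)]
        d1, h = nlms_step(h, x_hist, y[n], mu)
        residual.append(d1)
    return residual, h
```

With a noise-free echo and a sufficiently varied received signal, the coefficients approach the true echo path and the residual tends toward zero; in a real room, near-end speech and noise keep the residual from vanishing, which is why the later suppression stages exist.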
<Frequency domain transform units 13 and 17>
The frequency domain transform unit 13 receives, for example, the residual echo signal d₁(n), takes the L samples d₁(n), d₁(n−1), ..., d₁(n−L+1) counted back from the current time n as one frame, converts each frame into the frequency domain signal D₁(f, k) (s13), and sends it to the noise suppression unit 15. When the echo canceling apparatus 100 has no adaptive filter unit 11, the frequency domain transform unit 13 may instead receive the collected signal y(n). L is usually the number of samples corresponding to 10 ms or 20 ms.

The frequency domain transform unit 17 receives the received signal x(n), converts it frame by frame into the frequency domain signal X(f, k) (s17), and sends it to the first residual echo suppression unit 130. Available transform methods include the discrete Fourier transform (DFT) and the short-time Fourier transform (STFT).
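The frame-by-frame conversion performed by the frequency domain transform units can be sketched with a direct DFT. The function names, the hop size parameter, and the absence of an analysis window are assumptions for illustration; the Fourier transform length is taken equal to the frame length, and a practical implementation would normally use an FFT.

```python
import cmath


def split_frames(signal, length, hop):
    """Cut a sample sequence into frames of `length` samples, advancing `hop` samples per frame."""
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, hop)]


def dft(frame):
    """Direct O(F^2) DFT. Bins are indexed 0..F-1 here, whereas the text uses f = 1..F."""
    size = len(frame)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * f * t / size) for t, v in enumerate(frame))
        for f in range(size)
    ]


def to_frequency_domain(signal, length, hop):
    """Return spectra[k][f]: one DFT per frame k."""
    return [dft(frame) for frame in split_frames(signal, length, hop)]
```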
<Noise suppression unit 15>
The noise suppression unit 15 receives the frequency domain residual echo signal D₁(f, k), suppresses the noise component N(f, k) contained in it, obtains the noise removal signal D₂(f, k) (s15), and sends it to the first residual echo suppression unit 130 and the second residual echo suppression unit 160. As shown in FIG. 6, for example, the noise suppression unit 15 includes a noise level estimation unit 15a, a noise suppression gain calculation unit 15b, and a multiplication unit 15c.

The noise level estimation unit 15a receives the signal D₁(f, k) and obtains the ensemble average E[|N(f, k)|²] from the input signal D₁(f, k) in intervals where no speech is present. Here N(f, k) is the noise component contained in the residual echo signal D₁(f, k).

The noise suppression gain calculation unit 15b receives the signal D₁(f, k) and the ensemble average E[|N(f, k)|²] and computes the noise suppression gain Ga^(f, k) from them. [The equation for Ga^(f, k) is missing from this text.]

The multiplication unit 15c multiplies the residual echo signal D₁(f, k) by the noise suppression gain Ga^(f, k) to obtain the noise removal signal D₂(f, k). In doing so, it may add the residual echo signal D₁(f, k) (the original sound) to the noise removal signal at a suitable ratio 1−α, as in the following equation, to mask the speech distortion and limit the audible degradation of D₂(f, k).

$$D_2(f,k) = (1-\alpha)\,D_1(f,k) + \alpha\,\hat{G}_a(f,k)\,D_1(f,k)$$

<First residual echo suppression unit 130>
The first residual echo suppression unit 130 receives the noise removal signal D₂(f, k) and the received signal X(f, k), uses them to obtain the echo suppression gain Gb^(f, k), and multiplies D₂(f, k) by this gain to obtain the first residual echo suppression signal D′₃(f, k) (s130). It sends D′₃(f, k) to the vowel/consonant determination unit 140 and the echo suppression gain Gb^(f, k) to the second residual echo suppression unit 160.

As shown in FIG. 7, for example, the first residual echo suppression unit 130 includes an echo suppression gain calculation unit 131 and a multiplication unit 135. The echo suppression gain calculation unit 131 in turn includes an acoustic coupling estimation unit 132, an echo level estimation unit 133, and a gain calculation unit 134. The processing of each unit is described with reference to FIGS. 7 and 8.

The acoustic coupling estimation unit 132 receives the noise removal signal D₂(f, k) and the received signal X(f, k). It obtains their ensemble averages E[|D₂(f, k)|²] and E[|X(f, k)|²], obtains the frequency characteristic E[|H(f, k)|²] of the acoustic coupling by updating the minimum value of the ratio of E[|D₂(f, k)|²] to E[|X(f, k)|²] (s132), and sends it to the echo level estimation unit 133.
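One way to picture the minimum tracking performed by the acoustic coupling estimation unit 132 is the per-bin sketch below. The class name, the use of exponential smoothing to approximate the ensemble averages, the smoothing factor, and the rule for skipping frames with negligible received power are all assumptions; the source only states that the minimum of the ratio is updated.

```python
class CouplingEstimator:
    """Tracks E[|H(f,k)|^2] for one frequency bin as the running minimum of
    E[|D2|^2] / E[|X|^2]. Exponential smoothing stands in for the ensemble
    averages (an assumption)."""

    def __init__(self, smooth=0.9, eps=1e-12):
        self.smooth = smooth
        self.eps = eps
        self.d_pow = 0.0
        self.x_pow = 0.0
        self.h_pow = float("inf")  # no estimate until the first active frame

    def update(self, d2, x):
        a = self.smooth
        self.d_pow = a * self.d_pow + (1.0 - a) * abs(d2) ** 2
        self.x_pow = a * self.x_pow + (1.0 - a) * abs(x) ** 2
        if self.x_pow > self.eps:  # skip frames with no received signal
            self.h_pow = min(self.h_pow, self.d_pow / self.x_pow)
        return self.h_pow
```

Taking the minimum favors frames in which the microphone signal contains only echo, since near-end speech can only raise the ratio.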

The echo level estimation unit 133 receives the frequency characteristic E[|H(f, k)|²] of the acoustic coupling and the received signal X(f, k), obtains the ensemble average E[|Y^(f, k)|²] of the pseudo residual echo Y^(f, k) by equation (3) (s133), and sends it to the gain calculation unit 134.

$$E[|\hat{Y}(f,k)|^2] = E[|H(f,k)|^2]\,|X(f,k)|^2 + \beta\,E[|\hat{Y}(f,k-1)|^2] \tag{3}$$

The gain calculation unit 134 receives the pseudo residual echo Y^(f, k) and the noise removal signal D₂(f, k), obtains the echo suppression gain Gb^(f, k) by equation (1) (s131, s134), and sends it to the multiplication unit 135 and the second residual echo suppression unit 160.

$$\hat{G}_b(f,k) = \frac{|D_2(f,k)|^2 - |\hat{Y}(f,k)|^2}{|D_2(f,k)|^2} \tag{1}$$

The multiplication unit 135 multiplies the noise removal signal D₂(f, k) by the echo suppression gain Gb^(f, k) according to equation (2) to obtain the first residual echo suppression signal D′₃(f, k) (s135), and sends it to the vowel/consonant determination unit 140.

$$D'_3(f,k) = \hat{G}_b(f,k)\,D_2(f,k) \tag{2}$$

<Vowel/consonant determination unit 140>
The vowel/consonant determination unit 140 receives the first residual echo suppression signal D′₃(f, k) and uses it to determine whether the signal D₂(f, k) to be suppressed is a vowel or a consonant (s140). As shown in FIG. 7, for example, the vowel/consonant determination unit 140 includes a determination evaluation value calculation unit 141 and a determination unit 143. The processing of each unit is described with reference to FIGS. 7 and 9.

The determination evaluation value calculation unit 141 receives the first residual echo suppression signal D′₃(f, k), computes by equation (5) the value S(D′₃(k)) indicating the sparsity of its spectrum (s141), and sends it to the determination unit 143. [Equation (5) is missing from this text.]

Here D′₃(k) is the vector notation of D′₃(f, k), that is, D′₃(k) = {D′₃(0, k), D′₃(1, k), ..., D′₃(F, k)}; f_h is the highest frequency considered and f_l is the lowest frequency considered. For example, the 300 Hz to 3 kHz band used in voice telephony, or the audible range of 20 Hz to 20 kHz, is used to set the lowest and highest frequencies. A term in equation (5) [its expression is missing from this text] takes the value 1 when |D′₃(f, k)| over f_l ≤ f ≤ f_h is most sparse (only one frequency component is nonzero and all others are 0) and the value √(f_h − f_l + 1) when it is least sparse (all frequency components have the same value). Consequently 0 ≤ S(D′₃(k)) ≤ 1: S(D′₃(k)) is close to 1 when D′₃(f, k) is a vowel spectrum (see FIG. 2B) and close to 0 when it is a consonant spectrum (see FIG. 2D).
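Equation (5) itself is not available in this text. A measure with exactly the stated properties (a norm ratio equal to 1 for a single nonzero component and to √(f_h − f_l + 1) for a flat spectrum, normalized to the range 0 to 1) is the Hoyer-style sparseness built from the ratio of the L1 norm to the L2 norm. The sketch below assumes that form; it is an illustration consistent with the description, not a confirmed reproduction of the patent's equation. The function name and the handling of silent frames are also assumptions.

```python
import math


def sparseness(spectrum, f_low, f_high):
    """Hoyer-style sparseness of |spectrum[f]| for f_low <= f <= f_high.

    Returns 1.0 when a single bin is nonzero and 0.0 when all bins have
    equal magnitude. Assumed form; equation (5) is not available in the text.
    """
    mags = [abs(v) for v in spectrum[f_low:f_high + 1]]
    count = len(mags)
    l1 = sum(mags)
    l2 = math.sqrt(sum(m * m for m in mags))
    if count < 2 or l2 == 0.0:
        return 0.0  # degenerate band or silent frame: treated as not sparse (assumption)
    return (math.sqrt(count) - l1 / l2) / (math.sqrt(count) - 1.0)
```

A vowel spectrum, dominated by a few harmonic peaks, gives a value near 1 under this measure, while a noise-like consonant spectrum gives a value near 0.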

The determination unit 143 therefore receives the sparsity value S(D′₃(k)) and checks whether it is at least a predetermined threshold T; it determines a vowel when S(D′₃(k)) is T or more and a consonant when it is less than T (s143). The determination unit 143 sends the determination result j(k) to the relaxation coefficient determination unit 150. The threshold T satisfies 0 ≤ T ≤ 1 and is fixed in advance, by experiment or the like, so that vowels and consonants can be distinguished (for example, T = 0.5). The determination result j(k) may, for example, be set to 0 to indicate a consonant and to 1 to indicate a vowel.

The first residual echo suppression signal D′₃(f, k) is used for the vowel/consonant determination because, if an echo component derived from the received signal remains in the signal used for the determination, the nature of the signal to be suppressed is misjudged. Any signal from which the echo component has been removed can therefore be used for the determination. Such a signal is, for example, one that has undergone at least one of linear echo cancellation in the adaptive filter unit 11 and nonlinear echo suppression in the first residual echo suppression unit 130. Accordingly, as shown by the long dashed line in FIG. 7, the noise removal signal D₂(f, k) may be sent to the vowel/consonant determination unit instead. In that case, however, the determination is less accurate because the signal still contains the residual echo component.

<Relaxation coefficient determination unit 150>
The relaxation coefficient determination unit 150 sets the relaxation coefficient β(k) to 1 when the signal to be suppressed is determined to be a vowel, and to γ otherwise (s150). Here γ satisfies 0 ≤ γ < 1; a suitable value is found beforehand by experiment or the like and fixed in advance.
As shown in FIG. 10, for example, the relaxation coefficient determination unit 150 includes storage units 151 and 153 and a switching unit 155. The processing of each unit is described with reference to FIGS. 9 and 10. The relaxation coefficient determination unit 150 receives the determination result j(k). When j(k) indicates a vowel, the switching unit 155 connects to the storage unit 151; the relaxation coefficient determination unit 150 reads the value 1 from the storage unit 151, sets β(k) = 1, and outputs the relaxation coefficient β(k) (s150, s151). When j(k) indicates a consonant, the switching unit 155 connects to the storage unit 153; the relaxation coefficient determination unit 150 reads γ from the storage unit 153, sets β(k) = γ, and outputs the relaxation coefficient β(k) (s150, s153).

The processing of the determination unit 143 of the vowel/consonant determination unit 140 and of the relaxation coefficient determination unit 150 can be expressed by the following equation.
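The equation itself is not reproduced in this text. A reconstruction consistent with the description of unit 150 above and with the threshold rule of claim 2 (vowel when S(D′(k)) ≥ T, consonant otherwise) would be the following; it is an assumption, not necessarily the patent's exact notation.

```latex
\beta(k) =
\begin{cases}
1      & \text{if } S(D'_3(k)) \ge T \quad \text{(vowel)} \\
\gamma & \text{otherwise} \quad \text{(consonant)}, \qquad 0 \le \gamma < 1
\end{cases}
```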

<Second residual echo suppression unit 160>
The second residual echo suppression unit 160 receives, for example, D₂(f, k), Gb^(f, k), and β(k), obtains the second residual echo suppression signal D₃(f, k) by the following equation (s160), and sends it to the time domain conversion unit 19.

D₃(f, k) = {1 − β(k)(1 − Gb^(f, k))} D₂(f, k)   (7)
FIG. 11A shows a configuration example of the second residual echo suppression unit 160 for this case. The processing is briefly as follows. The subtraction unit 162a subtracts the received echo suppression gain Gb^(f, k) from the value 1 read from the storage unit 161a to obtain (1 − Gb^(f, k)). The multiplication unit 163a multiplies this value by the relaxation coefficient β(k) to obtain β(k)(1 − Gb^(f, k)). The subtraction unit 165a subtracts β(k)(1 − Gb^(f, k)) from the value 1 read from the storage unit 164a to obtain {1 − β(k)(1 − Gb^(f, k))}. The multiplication unit 166a multiplies the noise removal signal D₂(f, k) by this value to obtain and output the second residual echo suppression signal D₃(f, k).
With this configuration, when the transmitted speech is determined to be a consonant, the echo suppression gain is weakened, which mitigates the loss of the consonant's frequency components in the transmitted speech.
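Equation (7) can be sketched in a few lines of Python for a single frame. This is a minimal illustration, not the patented implementation: the function name, the list-based spectrum representation, and the sample values are all assumptions introduced here.

```python
def second_residual_echo_suppression(d2, gb, beta):
    """Apply equation (7) to one frame.

    d2   : complex spectrum D2(f, k), one entry per frequency bin (hypothetical layout)
    gb   : echo suppression gain Gb^(f, k), one real entry per frequency bin
    beta : relaxation coefficient beta(k) for this frame
    """
    if len(d2) != len(gb):
        raise ValueError("d2 and gb need one entry per frequency bin")
    # D3(f, k) = {1 - beta(k) * (1 - Gb^(f, k))} * D2(f, k)
    return [(1.0 - beta * (1.0 - g)) * d for d, g in zip(d2, gb)]


# Illustrative values only.
d2 = [1 + 1j, 2 + 0j, 0.5j]
gb = [0.2, 0.5, 1.0]

vowel = second_residual_echo_suppression(d2, gb, beta=1.0)      # full suppression
consonant = second_residual_echo_suppression(d2, gb, beta=0.5)  # relaxed suppression
```

With β(k) = 1 the applied gain equals Gb^(f, k), and with β(k) = 0 the input passes through unchanged; intermediate values blend the two.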

<Time domain conversion unit 19>
The time domain conversion unit 19 receives the second residual echo suppression signal D₃(f, k), converts it into the time domain signal d₃(n) (s19), and sends it to the transmitting end 4. Any conversion method that corresponds to the one used in the frequency domain conversion units 13 and 17 may be used, such as the inverse Fourier transform.
[Program and recording medium]
The echo canceling apparatus described above can also be realized by a computer. In that case, a program that causes the computer to function as the intended apparatus (an apparatus having the functional configuration shown in the figures of the embodiments), or a program that causes the computer to execute each step of the processing procedure (shown in the embodiments), may be loaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage device, or downloaded via a communication line, and then executed.
<Effect>
With this configuration, the relaxation coefficient (original sound addition rate) can be changed according to the situation, so that speech distortion is reduced while echo is still sufficiently suppressed. As a result, speech is easier to hear than with the prior art.

Because the apparatus determines whether the signal to be suppressed is a consonant or a vowel and changes the relaxation coefficient (original sound addition rate) according to the result, the echo suppression is relaxed when the signal is a consonant, which reduces speech distortion and helps prevent mishearing. When the signal is a vowel, the echo suppression is applied at full strength, so sufficient echo cancellation performance is obtained.

In other words, this embodiment can set an appropriate echo suppression gain at each time according to the nature of the speech, balancing the amount of echo cancellation against the ease of hearing the speech. As a result, speech in hands-free calls and similar settings becomes easier to hear.

This relaxation of the echo suppression gain is effective for nonlinear suppression processing. Introducing it on the adaptive filter unit 11 side would be counterproductive: linear processing causes no speech distortion in the first place, so the only result would be a reduced amount of echo cancellation. It could also be introduced into noise suppression, but noise often has a wideband spectrum close to that of speech consonants, so the noise would be determined to be a consonant and noise suppression performance would degrade; the effect of the present invention could not be obtained.

[Modification]
When the input signal and the collected sound signal supplied to the echo canceling apparatus 100 are analog signals, the echo canceling apparatus 100 may include an A/D conversion unit (not shown) that converts analog signals into digital signals. Likewise, when an analog signal is to be output to the transmitting end 4, the echo canceling apparatus 100 may include a D/A conversion unit (not shown) that converts digital signals into analog signals.

The adaptive filter unit 11 may cancel the echo component using the frequency domain received signal X(f, k) and collected sound signal Y(f, k). In that case, the frequency domain conversion unit 13 is placed before the adaptive filter unit 11, and the adaptive filter unit 11 receives the output signals X(f, k) and Y(f, k) of the frequency domain conversion units 13 and 17.

As indicated by the long broken line in FIG. 7, the second residual echo suppression unit 160 may receive D′₃(f, k) instead of Gb^(f, k) and obtain the second residual echo suppression signal D₃(f, k) by the following equation (8).

D₃(f, k) = (1 − β(k)) D₂(f, k) + β(k) D′₃(f, k)   (8)
From equation (2), D′₃(f, k) = Gb^(f, k) D₂(f, k). FIG. 11B shows the configuration of the second residual echo suppression unit 160 in this case. The subtraction unit 162b subtracts the received relaxation coefficient β(k) from the value 1 read from the storage unit 161b to obtain (1 − β(k)). The multiplication unit 163b multiplies the received noise removal signal D₂(f, k) by this value to obtain (1 − β(k)) D₂(f, k). The multiplication unit 164b multiplies the received D′₃(f, k) by the relaxation coefficient β(k) to obtain β(k) D′₃(f, k). The addition unit 165b adds (1 − β(k)) D₂(f, k) and β(k) D′₃(f, k) to obtain and output the second residual echo suppression signal D₃(f, k).
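Equations (7) and (8) describe the same operation. As a short check, substituting D′₃(f, k) = Gb^(f, k) D₂(f, k) from equation (2) into equation (8) recovers equation (7); in the block below, \hat{G}_b stands for Gb^.

```latex
\begin{aligned}
D_3(f,k) &= (1-\beta(k))\,D_2(f,k) + \beta(k)\,D'_3(f,k) \\
         &= (1-\beta(k))\,D_2(f,k) + \beta(k)\,\hat{G}_b(f,k)\,D_2(f,k) \\
         &= \{1 - \beta(k)\,(1 - \hat{G}_b(f,k))\}\,D_2(f,k)
\end{aligned}
```

The configurations of FIGS. 11A and 11B therefore differ only in where the multiplication by the echo suppression gain takes place.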

The configuration of the second residual echo suppression unit 160 is not limited to those of FIGS. 11A and 11B. Any processing may be used that yields the second residual echo suppression signal D₃(f, k) as the result of subtracting the product of D₂(f, k) and β(k) from the product of D₂(f, k), Gb^(f, k), and β(k), and adding the difference to D₂(f, k).

The point of the present invention is that the vowel/consonant determination unit 140 determines whether the signal to be suppressed is a vowel or a consonant, and that the relaxation coefficient β(k) is changed using the determination result. Therefore, as indicated by the broken lines in FIG. 4, the linear echo cancellation processing (s11) in the adaptive filter unit 11 and the noise suppression processing (s15) in the noise suppression unit 15 need not be performed, and the corresponding units need not be provided. Furthermore, when a signal other than the first residual echo suppression signal D′₃(f, k) is sent to the vowel/consonant determination unit 140, it suffices that, within the first residual echo suppression processing (s130) of the first residual echo suppression unit 130, at least the echo suppression gain calculation unit 131 obtains the echo suppression gain (s131); as indicated by the broken line in FIG. 8, the multiplication processing (s135) need not be performed and the multiplication unit 135 need not be provided. The processing in the adaptive filter unit 11, the noise suppression unit 15, the first residual echo suppression unit 130, and the vowel/consonant determination unit 140 is given only as an example, and other conventional techniques may be used.

For example, the determination evaluation value calculation unit 141 of the vowel/consonant determination unit 140 may obtain the sparsity of the spectrum of the first residual echo suppression signal D′₃(f, k) by the method described in Reference 1.
[Reference 1] Shoko Araki, Tomohiro Nakatani, and Hiroshi Sawada (荒木章子、中谷智広、澤田宏), "Sound source number estimation and sound source separation based on speech sparseness using a Dirichlet prior distribution", Acoustical Society of Japan 2009 Autumn Meeting, 2009
In Reference 1, when the value of φ is smaller than 1, the Dirichlet distribution takes a larger value the sparser the vector α is.

Alternatively, the vowel/consonant determination unit 140 may determine whether the signal to be suppressed D₂(f, k) is a vowel or a consonant without using a value indicating the sparsity of the spectrum, for example by the method described in Reference 2 or Reference 3.
[Reference 2] Hideyuki Sawada and Minoru Ohkado (澤田秀之、大加戸稔), "Sensing of a specific speaker for voice interface construction under noisy environment", IEEJ Transactions, 2006, Vol. 126, No. 11, pp. 1446-1453
[Reference 3] Katsuyuki Niyada and Masakatsu Hoshimi (二矢田勝行、星見昌克), "Consonant recognition method for unspecified speakers using time series of band power and LPC cepstrum coefficients", IEICE Transactions D, 1986, Vol. J69-D, No. 6, pp. 949-957
In this case, Reference 2 distinguishes vowels from consonants by the magnitude of the time average of the absolute value of the waveform, and Reference 3 extracts power dips (consonant portions) by observing fluctuations in power.

When the adaptive filter unit 11 and the like are not provided, the signal received by the frequency domain conversion unit 13 may be a signal other than the residual echo signal d₁(n) that is obtained from the collected sound signal y(n) (for example, the collected sound signal y(n) itself).

Similarly, the signal received by the first residual echo suppression unit 130 and the second residual echo suppression unit 160 may be one of the other frequency domain signals, Y(f, k) or D₁(f, k), instead of the noise removal signal D₂(f, k); it is chosen to suit the configuration of the echo canceling apparatus.

The relaxation coefficient determination unit 150 sets β(k) to 1 or γ, but this is not a limitation. The relaxation coefficient may be multiplied by a constant α (0 < α < 1), so that β(k) = γ₂ (= α) for a vowel and β(k) = γ₁ (= αγ) for a consonant. The values of α and αγ are determined in advance, for example by experiment, as relaxation coefficients suitable for vowels and for consonants, respectively (for example, α = 0.5 and γ = 0.5, giving γ₁ = 0.25 and γ₂ = 0.5).

γ₁, γ₂, and the relaxation coefficient β(k) may also take different values for each frequency. In that case, γ₁ = {γ₁(0), γ₁(1), …, γ₁(F)}, γ₂ = {γ₂(0), γ₂(1), …, γ₂(F)}, and β(k) = {β(0, k), β(1, k), …, β(F, k)}, where γ₁(f) ≤ γ₂(f) for every f and γ₁(f′) < γ₂(f′) for at least some discrete angular frequencies f′. This allows an appropriate relaxation coefficient to be set for each frequency. For example, because consonant components become more prevalent at higher frequencies, the relaxation coefficient may be set smaller as the frequency increases.

<Echo canceling apparatus 200>
The echo canceling apparatus 200 according to the second embodiment is described with reference to FIGS. 3, 4, 7, 12, and 13, covering only the parts that differ from the first embodiment. The configuration and processing of the relaxation coefficient determination unit 250 differ from the first embodiment.

In addition to the determination result j(k), the vowel/consonant determination unit 140 also outputs the value S(D′₃(k)) indicating sparsity, obtained by the determination evaluation value calculation unit 141, to the relaxation coefficient determination unit 250, as indicated by the dash-dot line in FIG. 7.
<Relaxation coefficient determination unit 250>
The relaxation coefficient determination unit 250 receives the determination result j(k) and the value S(D′₃(k)) indicating sparsity. When j(k) indicates a vowel, the switching unit 258 connects to the storage unit 251, and the relaxation coefficient determination unit 250 reads the value 1 from the storage unit 251, sets β(k) = 1, and outputs β(k) (s250, s251).

When j(k) indicates a consonant, the switching unit 258 connects to the addition unit 257. The relaxation coefficient determination unit 250 receives γ₁(k) = 1 − κ(T − S(D′(k))) from the addition unit 257, sets β(k) = γ₁(k), and outputs β(k) (s250, s257). Here 0 ≤ κ ≤ 1/T. FIG. 13 shows the relationship between S(D′(k)) and β(k).

The subtraction unit 254 subtracts the received S(D′(k)) from the threshold T read from the storage unit 254 to obtain (T − S(D′(k))). The multiplication unit 256 multiplies (T − S(D′(k))) by the value κ read from the storage unit 255 to obtain κ(T − S(D′(k))). The addition unit 257 subtracts κ(T − S(D′(k))) from the value 1 read from the storage unit 251 to obtain γ₁(k), and holds it.

The processing of the determination unit 143 of the vowel/consonant determination unit 140 and of the relaxation coefficient determination unit 250 can be expressed by the following equation.
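The equation is not reproduced in this text. Combining the vowel case (β(k) = 1) with the consonant case described above, and assuming the same threshold rule as in claim 2 (vowel when S(D′₃(k)) ≥ T), a consistent reconstruction would be:

```latex
\beta(k) =
\begin{cases}
1 & \text{if } S(D'_3(k)) \ge T \quad \text{(vowel)} \\
1 - \kappa\,\bigl(T - S(D'_3(k))\bigr) & \text{if } S(D'_3(k)) < T \quad \text{(consonant)}
\end{cases}
\qquad 0 \le \kappa \le \frac{1}{T}
```

Since 0 ≤ S(D′₃(k)) < T in the consonant branch, the bound on κ keeps β(k) between 0 and 1 there.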

<Effect>
This configuration provides the same effect as the first embodiment. In addition, it allows flexible settings within the range S(D′(k)) < T: suppression can be made weaker for signals with very low sparsity and stronger for signals with some degree of sparsity.

[Modification]
In the second embodiment, the relaxation coefficient β(k) is obtained by case analysis on the relationship between the threshold T and S(D′(k)). Alternatively, without any case analysis, β(k) may simply be a value that increases monotonically as S(D′(k)) increases.

As described above, 0 ≤ S(D′₃(k)) ≤ 1, so this configuration can be realized by setting the threshold T = 1. The processing of the vowel/consonant determination unit can then be omitted, which simplifies the apparatus. That is, in FIG. 7 the vowel/consonant determination unit 140 includes only the determination evaluation value calculation unit 141 and outputs only S(D′(k)). In FIG. 12 the storage unit 251 and the switching unit 258 are omitted, and the relaxation coefficient determination unit 250 calculates and outputs β(k) = 1 − κ(T − S(D′(k))) for each frame. With this configuration as well, the strength of the echo suppression can be changed according to the situation, and flexible settings are possible: suppression can be made weaker for signals with very low sparsity and stronger for signals with some degree of sparsity.
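The simplified variant with T = 1 reduces to a single expression per frame. The sketch below is illustrative only; the function name and the range checks are assumptions added here, and the sparsity value S(D′(k)) is taken as an already computed input.

```python
def relaxation_coefficient(sparsity, kappa):
    """beta(k) = 1 - kappa * (T - S(D'(k))) with the threshold fixed at T = 1.

    sparsity : S(D'(k)), assumed to lie in [0, 1]
    kappa    : slope, assumed to satisfy 0 <= kappa <= 1/T = 1
    """
    threshold = 1.0  # T = 1, so no vowel/consonant case split is needed
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    if not 0.0 <= kappa <= 1.0 / threshold:
        raise ValueError("kappa must lie in [0, 1/T]")
    return 1.0 - kappa * (threshold - sparsity)


# beta(k) rises monotonically with sparsity: weak suppression for
# consonant-like (non-sparse) frames, full suppression for sparse ones.
betas = [relaxation_coefficient(s, kappa=0.8) for s in (0.0, 0.5, 1.0)]
```

Larger κ widens the gap between the suppression applied to sparse and non-sparse frames; κ = 0 disables the relaxation entirely.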

κ may also take a different value for each frequency. In that case κ = {κ(0), κ(1), …, κ(F)}, and it suffices that 1 − κ(f′)(T − S(D′(k))) < γ₂(f′) for at least some discrete angular frequencies f′. This makes β(k) differ from frequency to frequency and permits finer control of the relaxation coefficient.

<Echo canceling apparatus 300>
The echo canceling apparatus 300 according to the third embodiment is described with reference to FIGS. 3, 4, 7, 15, and 16, covering only the parts that differ from the first embodiment. The configuration and processing of the relaxation coefficient determination unit 350 differ from the first embodiment.
<Relaxation coefficient determination unit 350>
The relaxation coefficient determination unit 350 receives the determination result j(k), the received signal X(k), and the first residual echo suppression signal D′₃(k). When j(k) indicates a vowel, the switching unit 356 connects to the storage unit 354.

The transmitted speech detection unit 351 and the determination unit 352 each receive the determination result j(k) and perform the following processing when j(k) indicates a consonant.

First, the transmitted speech detection unit 351 obtains ||D′₃(k)|| / ||X(k)||, where ||·|| denotes a norm and X(k) = {X(0, k), X(1, k), …, X(F, k)}.

The determination unit 352 receives the value ||D′₃(k)|| / ||X(k)||, determines whether it is smaller than the threshold T_r, and outputs the determination result j₂(k) to the switching unit 356. When j₂(k) indicates that the value is smaller than T_r, the switching unit 356 connects to the storage unit 354 regardless of the determination result j(k). The relaxation coefficient determination unit 350 reads the value 1 from the storage unit 354, sets β(k) = 1, and outputs β(k) (s350, s354). T_r is a predetermined positive real number used to make the relaxation coefficient 1 when the consonant portion of the transmitted speech is sufficiently small relative to the received signal; an appropriate value is found beforehand, for example by experiment, and fixed in advance. For example, T_r = 0.01.

In all other cases (that is, when j(k) indicates a consonant and j₂(k) indicates that the value is not smaller than T_r), the switching unit 356 connects to the storage unit 355. The relaxation coefficient determination unit 350 reads γ₁ (0 ≤ γ₁ < 1) from the storage unit 355, sets β(k) = γ₁, and outputs β(k) (s350, s355).

The processing of the determination unit 143 of the vowel/consonant determination unit 140 and of the relaxation coefficient determination unit 350 can be expressed by the following equation.
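The equation is not reproduced in this text, but the decision logic described for units 351, 352, and 356 can be sketched in Python: β(k) = 1 for a vowel or when ||D′₃(k)|| / ||X(k)|| < T_r, and β(k) = γ₁ otherwise. The function names, the choice of the Euclidean norm (the source says only "norm"), and the handling of a silent received signal are assumptions made for this illustration.

```python
import math


def norm(spectrum):
    """Euclidean norm over all frequency bins (assumed; the source says only 'norm')."""
    return math.sqrt(sum(abs(v) ** 2 for v in spectrum))


def relaxation_coefficient_ex3(is_vowel, d3_prime, x, gamma1, t_r=0.01):
    """Choose beta(k) from j(k) and the ratio ||D'3(k)|| / ||X(k)||.

    is_vowel : determination result j(k) (True for a vowel)
    d3_prime : first residual echo suppression spectrum D'3(k)
    x        : received-signal spectrum X(k)
    gamma1   : relaxed coefficient, 0 <= gamma1 < 1
    t_r      : threshold T_r (0.01 is the example value in the text)
    """
    if is_vowel:
        return 1.0
    # Compare ||D'3|| < T_r * ||X|| instead of dividing, so a silent received
    # signal (||X|| = 0) is treated as "ratio not below T_r" (an assumption).
    if norm(d3_prime) < t_r * norm(x):
        return 1.0  # little or no transmitted speech: do not relax suppression
    return gamma1


quiet = relaxation_coefficient_ex3(False, [0.001, 0.0], [1.0, 1.0], gamma1=0.5)
talking = relaxation_coefficient_ex3(False, [1.0, 0.0], [1.0, 1.0], gamma1=0.5)
```

In this sketch `quiet` falls back to full suppression because the residual is negligible next to the received signal, while `talking` receives the relaxed coefficient γ₁.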

<Effect>
This configuration provides the same effect as the first embodiment. In addition, when no transmitted speech is present or the transmitted speech is very quiet, the relaxation coefficient is set to 1 regardless of the result of the sparsity determination on the first residual echo suppression signal D′₃(f, k), so the echo is sufficiently canceled without relaxing the suppression gain. Using both the sparsity determination and the call state determination to relax the gain in this way allows the echo to be sufficiently suppressed in intervals where no relaxation is needed, such as intervals without transmitted speech.

The echo canceling method of the present invention can be used for hands-free calls, hands-free speech recognition, and the like.

100, 200, 300 Echo canceling apparatus
11 Adaptive filter unit
13, 17 Frequency domain conversion unit
15 Noise suppression unit
19 Time domain conversion unit
130 First residual echo suppression unit
140 Vowel/consonant determination unit
150, 250, 350 Relaxation coefficient determination unit
160 Second residual echo suppression unit

Claims

An echo canceling method in which n represents time, f = 1, 2, …, F represents discrete angular frequency, k represents frame time, and γ₁ < γ₂, the method comprising:
a frequency domain conversion step of converting a signal d(n) obtained based on a collected sound signal and a received signal x(n) into frequency domain signals D(f, k) and X(f, k), respectively, for each frame;
an echo suppression gain calculation step of obtaining an echo suppression gain Gb^(f, k) using the signals D(f, k) and X(f, k);
a vowel/consonant determination step of determining whether a signal to be suppressed is a vowel or a consonant using a signal D′(f, k) obtained by removing an echo component from the signal D(f, k);
a relaxation coefficient determination step of setting γ₂ as a relaxation coefficient β(k) when the signal to be suppressed is determined to be a vowel in the vowel/consonant determination step, and setting γ₁ as the relaxation coefficient β(k) otherwise;
a second residual echo suppression step of obtaining a second residual echo suppression signal D₃(f, k) by processing that yields the result of subtracting the product of the signal D(f, k) and the relaxation coefficient β(k) from the product of the signal D(f, k), the echo suppression gain Gb^(f, k), and the relaxation coefficient β(k), and adding the difference to D(f, k); and
a time domain conversion step of converting the second residual echo suppression signal D₃(f, k) into a time domain signal d₃(n).

The echo canceling method according to claim 1, wherein,
in the vowel/consonant determination step, a value S(D′(k)) indicating the sparsity of the spectrum of the signal D′(f, k) obtained by removing the echo component from the signal D(f, k) is obtained, and the signal to be suppressed is determined to be a vowel when S(D′(k)) is greater than or equal to a threshold T and a consonant when S(D′(k)) is less than the threshold T.

The echo canceling method according to claim 1 or 2, wherein,
in the relaxation coefficient determination step, γ₂ = 1 is set as the relaxation coefficient β(k) when the signal to be suppressed is determined to be a vowel, and γ₁ satisfying 0 ≤ γ₁ < 1 is set as the relaxation coefficient β(k) otherwise.

The echo canceling method according to claim 2, wherein 0 ≤ S(D′(k)) ≤ 1, 0 ≤ T ≤ 1, and 0 ≤ κ ≤ 1/T, and,
in the relaxation coefficient determination step, γ₂ = 1 is set as the relaxation coefficient β(k) when the signal to be suppressed is determined to be a vowel, and γ₁(k) = 1 − κ(T − S(D′(k))) is set as the relaxation coefficient β(k) otherwise.

The echo canceling method according to claim 1 or 2, wherein
T_r is a predetermined positive real number, and,
in the relaxation coefficient determination step, γ₂ = 1 is set as the relaxation coefficient β(k) when the signal to be suppressed is determined to be a vowel or when ||D′(k)|| / ||X(k)|| < T_r, and γ₁ satisfying 0 ≤ γ₁ < 1 is set as the relaxation coefficient β(k) otherwise.

An echo canceling method in which n represents time, f = 1, 2, …, F represents discrete angular frequency, and k represents frame time, the method comprising:
a frequency domain conversion step of converting a signal d(n) obtained based on a collected sound signal and a received signal x(n) into frequency domain signals D(f, k) and X(f, k), respectively, for each frame;
an echo suppression gain calculation step of obtaining an echo suppression gain Gb^(f, k) using the signals D(f, k) and X(f, k);
a vowel/consonant determination evaluation value calculation step of obtaining a value S(D′(k)) indicating the sparsity of the spectrum of a signal D′(f, k) obtained by removing an echo component from the signal D(f, k);
a relaxation coefficient determination step of increasing a relaxation coefficient β(k) as the value S(D′(k)) increases;
a second residual echo suppression step of obtaining a second residual echo suppression signal D₃(f, k) by processing that yields the result of subtracting the product of the signal D(f, k) and the relaxation coefficient β(k) from the product of the signal D(f, k), the echo suppression gain Gb^(f, k), and the relaxation coefficient β(k), and adding the difference to D(f, k); and
a time domain conversion step of converting the second residual echo suppression signal D₃(f, k) into a time domain signal d₃(n).

The echo canceling method according to any one of claims 1 to 6, wherein
γ₁, γ₂, and β(k) can take different values for each frequency, with γ₁ = {γ₁(0), γ₁(1), …, γ₁(F)}, γ₂ = {γ₂(0), γ₂(1), …, γ₂(F)}, and β(k) = {β(0, k), β(1, k), …, β(F, k)}, and γ₁(f′) < γ₂(f′) for at least some discrete angular frequencies f′.

An echo canceling apparatus in which n represents time, f = 1, 2, …, F represents discrete angular frequency, k represents frame time, and γ₁ < γ₂, the apparatus comprising:
a frequency domain conversion unit that converts a signal d(n) obtained based on a collected sound signal and a received signal x(n) into frequency domain signals D(f, k) and X(f, k), respectively, for each frame;
an echo suppression gain calculation unit that obtains an echo suppression gain Gb^(f, k) using the signals D(f, k) and X(f, k);
a vowel/consonant determination unit that determines whether a signal to be suppressed is a vowel or a consonant using a signal D′(f, k) obtained by removing an echo component from the signal D(f, k);
a relaxation coefficient determination unit that sets γ₂ as a relaxation coefficient β(k) when the vowel/consonant determination unit determines that the signal to be suppressed is a vowel, and sets γ₁ as the relaxation coefficient β(k) otherwise;
a second residual echo suppression unit that obtains a second residual echo suppression signal D₃(f, k) by processing that yields the result of subtracting the product of the signal D(f, k) and the relaxation coefficient β(k) from the product of the signal D(f, k), the echo suppression gain Gb^(f, k), and the relaxation coefficient β(k), and adding the difference to D(f, k); and
a time domain conversion unit that converts the second residual echo suppression signal D₃(f, k) into a time domain signal d₃(n).

An echo canceling program for causing a computer to execute each step constituting the echo canceling method according to any one of claims 1 to 7.