JP2001094480A

JP2001094480A - Method and device for suppressing echo

Info

Publication number: JP2001094480A
Application number: JP27007799A
Authority: JP
Inventors: Kiyotaka Sakauchi; 澄宇阪内; Masafumi Tanaka; 雅史田中; Yutaka Kaneda; 豊金田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-09-24
Filing date: 1999-09-24
Publication date: 2001-04-06
Anticipated expiration: 2019-09-24
Also published as: JP3579622B2

Abstract

PROBLEM TO BE SOLVED: To suppress an echo signal by deriving a desired echo suppression amount that is matched to various loudspeaker communication environments and universal across users. SOLUTION: A masking threshold for the echo signal is obtained from all masker signals that mask the echo signal, a desired echo suppression amount is derived from the masking threshold and the echo signal, and the echo signal is suppressed on that basis.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a method of deriving the desired echo suppression amount needed to suppress echo signals optimally in terms of speech quality. Such echo signals cause howling and audible impairment in, for example, two-wire/four-wire conversion systems and loudspeaker communication systems.

[0002]

[Prior art] FIG. 1 is a schematic diagram of a loudspeaker communication system. In FIG. 1, reference numerals 1 and 3 denote transmitting microphones, 2 and 4 denote receiving loudspeakers, 5 and 7 denote transmit signal amplifiers, 6 and 8 denote receive signal amplifiers, 9 denotes a transmission line, 10 and 11 denote talkers, and 12 denotes a listener. Speech uttered by talker 10 reaches talker 11 via transmitting microphone 1, transmit signal amplifier 5, transmission line 9, receive signal amplifier 8, and receiving loudspeaker 4. Unlike a conventional telephone system, this loudspeaker communication system does not require a handset to be held, so a call can be made while working and natural face-to-face conversation can be realized. It is therefore coming into wide use in teleconferencing, videophones, loudspeaker telephones, and the like.

[0003] A drawback of this communication system, on the other hand, is the presence of echo. In FIG. 1, speech delivered to the receiving side from receiving loudspeaker 4 is picked up by transmitting microphone 3 and reproduced on the transmitting side via transmit signal amplifier 7, transmission line 9, receive signal amplifier 6, and receiving loudspeaker 2. For talker 10 and listener 12, this is an echo phenomenon in which one's own speech is reproduced from receiving loudspeaker 2; it is known as acoustic echo. In a loudspeaker communication system this echo impedes conversation and causes discomfort. Furthermore, the sound reproduced from receiving loudspeaker 2 is picked up by transmitting microphone 1, forming a closed signal loop. If the loop gain exceeds 1, howling occurs and conversation becomes impossible.

[0004] Echo cancellers are used to overcome these problems. FIG. 2 is a schematic diagram of one. The echo canceller 21 consists broadly of an adaptive filter processing unit 22 and a loss insertion processing unit 23. The adaptive filter processing unit 22 estimates the impulse response of the acoustic echo path, generates a pseudo echo signal, and cancels the echo by subtracting the pseudo echo from the microphone output signal. To track variation of the acoustic echo path over time, the adaptive filter estimates the impulse response with an adaptive algorithm, that is, an algorithm that chooses the estimate so as to minimize the power of the error signal obtained by subtracting the pseudo echo signal from the microphone output signal.
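As an illustration of the adaptive filtering described in [0004], the following is a minimal sketch rather than the method of this publication. It assumes a normalized LMS (NLMS) update, which is one common adaptive algorithm that reduces the error power; the function name, the tap count, the step size, and the simulated echo path `h` are all hypothetical.

```python
import random


def nlms_echo_canceller(x, d, taps=8, mu=0.5, eps=1e-8):
    """Cancel the echo of x contained in d with an NLMS filter (assumed algorithm).

    x: loudspeaker (received) samples, d: microphone samples.
    Returns the error signal and the final estimate of the echo path.
    """
    w = [0.0] * taps      # estimated impulse response of the echo path
    buf = [0.0] * taps    # most recent loudspeaker samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        pseudo_echo = sum(wi * bi for wi, bi in zip(w, buf))
        en = dn - pseudo_echo                     # microphone minus pseudo echo
        norm = sum(b * b for b in buf) + eps      # input power (eps avoids /0)
        w = [wi + mu * en * bi / norm for wi, bi in zip(w, buf)]
        errors.append(en)
    return errors, w


# Hypothetical test signal: white noise through a short, noiseless echo path.
random.seed(0)
h = [0.6, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
e, w = nlms_echo_canceller(x, d)
```

In a real room the echo path is far longer than the filter and noise is present, so a residual echo remains; that residual is what the loss insertion processing unit has to deal with.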

[0005] The loss insertion processing unit 23, such as a voice switch or an echo suppressor, suppresses the residual echo that the adaptive filter cannot remove by inserting loss into the line. When designing an echo canceller made up of such an adaptive filter processing unit and loss insertion circuit, it is important to answer the question "To what level must the echo be reduced?", that is, to determine the desired echo suppression amount.

[0006] The reason is that, once the optimum desired echo suppression amount is determined, the number of taps in the adaptive filter processing unit can be optimized. This avoids both audible residual echo caused by too few taps and increased hardware scale caused by too many. In addition, the loss insertion processing unit can insert only as much loss as is needed for the residual echo left by the adaptive filter processing unit, which prevents excessive loss insertion from degrading the transmitted speech or making the background noise sound unnaturally intermittent.

[0007] The desired echo suppression amount over the whole band has so far been studied in, for example, ITU-T Recommendation G.167. Furthermore, because auditory characteristics depend on frequency, the desired echo suppression amount for each frequency band has also been clarified, for example in JP-A-11-122144 (Japanese Patent Application No. 9-278444).

[0008]

[Problem to be solved by the invention] All of the methods for determining the desired echo suppression amount described under the prior art rely on subjective evaluation, so the experimental values obtained lack objectivity. The first problem is that values obtained by subjective evaluation depend on the loudspeaker communication environment in which the experiment was conducted, for example its acoustic characteristics and transmission characteristics. Such values are adequate for the environment in which the echo canceller is assumed to be used, but they cannot be applied to an environment that differs in even one of these characteristics. To obtain values suited to various environments, subjective evaluation would have to be carried out in every one of them, which is not feasible in practice. Yet the environments in which loudspeaker communication is used are diversifying from the conventional conference room to homes, crowded streets, automobiles, and so on, and the desired echo suppression amount must be determined for each of them.

[0009] The second problem is that subjective evaluation determines the desired echo suppression amount from the judgments of raters. Because users are also becoming more diverse, an evaluation by a limited set of raters lacks generality, and a more accurate result would require a statistically sufficient, large number of raters. Subjective evaluation therefore makes it difficult to obtain the optimum desired echo suppression amount for various loudspeaker communication environments and users. If the desired echo suppression amount is determined from experimental values obtained by subjective evaluation, the adaptive filter operation and loss insertion described above cannot be optimized, leading to increased hardware scale and degraded speech quality.

[0010] An object of the present invention is therefore to provide an objective method that can derive a desired echo suppression amount that is matched to each individual loudspeaker communication environment and is also universal across users.

[0011]

[Means for solving the problem] To derive the desired echo suppression amount objectively, the acoustic characteristics and transmission characteristics of the loudspeaker communication environment and the auditory characteristics of the users must all be handled quantitatively. Because the acoustic and transmission characteristics differ from one usage environment to another, they are substituted as parameters at derivation time. For the auditory characteristics, publicly known auditory data are used so that a wide range of users is covered.

[0012] The principal feature of the present invention is that it derives the desired echo suppression amount objectively, using parameters that characterize the various loudspeaker communication environments together with publicly known auditory data. The prior art suffered from the problems that subjective evaluation cannot be carried out for every loudspeaker communication environment and that the results depend on the raters; the present invention differs in that it allows objective evaluation.

[0013]

[Operation] Because the present invention uses the acoustic and transmission characteristics of the loudspeaker communication environment as parameters, a desired echo suppression amount suited to each usage environment can be derived individually. Because it derives the desired echo suppression amount from publicly known auditory characteristics, a perceptually optimum value can be determined independently of the user. The optimum desired echo suppression amount can therefore be derived, which achieves the object of the invention: deriving the desired echo suppression amount individually for various communication environments and universally across users. Furthermore, controlling the loss insertion processing unit with the desired echo suppression amount obtained in this way helps make echo cancellers smaller and more economical and improves speech quality.

[0014]

[Embodiments of the invention] FIG. 3 shows the configuration of the echo canceller 21 of the present invention. The echo canceller 21 consists of an adaptive filter processing unit 22, a loss insertion processing unit 23, and a microphone 3 and loudspeaker 4 connected to the adaptive filter processing unit 22. The loss insertion processing unit 23 consists of masker and maskee discrimination means 30, frequency analysis means A 31, frequency analysis means B 32, a desired echo suppression amount (= insertion loss amount) determination unit 33, loss insertion means 34, and frequency synthesis means 35. (Here, the masker is every sound heard by the listener, which includes not only the received speech but also the transmitted speech and ambient noise; the maskee is the (residual) echo.) The adaptive filter processing unit 22 estimates the impulse response of the echo path, generates a pseudo echo signal, and cancels the echo by subtracting it from the microphone output signal.

[0015] The transmit input signal (transmitted speech s), the residual echo e, and the receive input signal (ambient noise n, received signal x) are input to the masker and maskee discrimination means 30, which discriminates the masker from the maskee. Frequency analysis means A 31 performs frequency analysis on the receive input signal (converts it to the frequency domain), and frequency analysis means B 32 does the same for the (residual) echo e. Based on the outputs of frequency analysis means A 31, frequency analysis means B 32, and the masker and maskee discrimination means 30, the masker calculation, echo calculation, masking threshold calculation, and desired echo suppression amount calculation are carried out for each frequency; the desired echo suppression amount is thereby determined and output to the loss insertion means 34. The loss insertion means 34 inserts the desired loss into the line, acting on the residual echo that the adaptive filter processing unit 22 cannot remove and suppressing it, frequency by frequency, to a level that is inaudible. The frequency synthesis means 35 then performs frequency synthesis to produce the transmit output signal.
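The per-frequency loss insertion performed by loss insertion means 34 can be pictured with the following minimal sketch. It is an assumption-laden illustration: the function name `insert_loss` is hypothetical, the loss is assumed to be given in dB per band, bin indices are 0-based, and a negative loss is clamped to zero so that no gain is ever applied; none of these details is stated in the text.

```python
def insert_loss(spectrum, wlow, whigh, loss_db):
    """Attenuate each band of a residual-echo spectrum by its own loss.

    spectrum: per-bin values (real magnitudes or complex FFT bins)
    wlow, whigh: first and last bin index of each band (0-based, assumed)
    loss_db: loss to insert in each band, in dB (assumed unit)
    """
    out = list(spectrum)
    for lo, hi, db in zip(wlow, whigh, loss_db):
        # Assumption: never amplify, so negative losses are treated as 0 dB.
        gain = 10.0 ** (-max(db, 0.0) / 20.0)
        for w in range(lo, hi + 1):
            out[w] = out[w] * gain
    return out


# Two hypothetical bands: bins 0-1 get 20 dB of loss, bin 2 is left untouched.
attenuated = insert_loss([1.0, 1.0, 2.0], wlow=[0, 2], whigh=[1, 2],
                         loss_db=[20.0, -5.0])
```

Applying the loss band by band, rather than as one broadband attenuation, is what lets the speech and background noise pass with as little degradation as possible.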

[0016] The method of deriving the desired echo suppression amount is described with reference to FIG. 4. It divides broadly into the masker calculation, the echo calculation, the masking threshold calculation, and the desired echo suppression amount calculation. The masker and echo calculations are described first.

(Masker calculation)
a-1: Signals that can act as maskers of the echo include the user's own speech (x), the other party's speech (s), and ambient noise (n); each of these signals is input.
a-2: The acoustic and transmission characteristics, which are the parameters of the loudspeaker communication environment, are applied to these maskers. Specifically, because the echo is perceived at the user's ear, the acoustic characteristic (impulse response) from the source of each masker to the ear of the user perceiving it is convolved with that masker.
a-3: For maskers that reach the user through the communication network, the transmission characteristics, namely transmission delay, transmission loss, and the frequency characteristic of the transmission line, are also applied. The characteristics of the various usage environments are thus introduced as parameters, and the masker calculation is carried out.

(Echo calculation)
b-1 to b-3: The (residual) echo (maskee) is calculated in the same way.

(Masking threshold calculation)
Next, the masking threshold is calculated. The calculation is described here for 32 kHz sampling.
c-1: First, every masker m_i is cut into frames of 1024 samples, where i denotes discrete time. A 1024-point Hamming window is then applied to the samples of each frame.
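Steps a-2 and a-3 of the masker calculation amount to filtering each source signal with the environment parameters. The following is a minimal sketch under stated assumptions: all function and parameter names are hypothetical, the transmission-line frequency characteristic is assumed to be representable as a short FIR filter, and the transmission loss is assumed to be given in dB.

```python
def convolve(signal, impulse_response):
    """Plain linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def apply_transmission(signal, delay_samples, loss_db, line_fir=(1.0,)):
    """Step a-3: transmission delay, transmission loss and line response."""
    gain = 10.0 ** (-loss_db / 20.0)          # loss in dB -> amplitude factor
    shaped = convolve(signal, list(line_fir))  # assumed FIR model of the line
    return [0.0] * delay_samples + [gain * v for v in shaped]


def masker_at_ear(source, room_ir, delay_samples=0, loss_db=0.0, line_fir=(1.0,)):
    """Steps a-2 and a-3: the masker as it arrives at the user's ear."""
    transmitted = apply_transmission(source, delay_samples, loss_db, line_fir)
    return convolve(transmitted, room_ir)      # step a-2: room impulse response


# A unit impulse sent over a line with 2 samples of delay and 20 dB of loss,
# then played into a hypothetical two-tap room response.
at_ear = masker_at_ear([1.0, 0.0, 0.0], room_ir=[1.0, 0.5],
                       delay_samples=2, loss_db=20.0)
```

A masker that does not pass through the network (ambient noise, for example) would use the defaults, so that only the room response is applied. The same routine can be reused for the echo in steps b-1 to b-3.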

[0017]

$$\mathit{mw}_i = m_i \times \left[\,0.5 - 0.5\cos\!\left(\frac{2\pi\,(i - 0.5)}{1024}\right)\right]$$

The windowed samples are transformed to the frequency domain with an FFT. The values obtained from the FFT are expressed in polar form, and the magnitude and phase components are denoted mr_w and mf_w.

c-2: The energy (per-band masker energy) is calculated in each partition used for the threshold calculation. The partition boundaries wlow_b (lowest frequency component in the partition) and whigh_b (highest frequency component in the partition) follow the values in Table D.3a of Psychoacoustic Model II of MPEG (ISO 11172-3); the table is reproduced in FIG. 5. (In FIG. 5, when speech with a 16 kHz bandwidth is sampled at 32 kHz and cut into 1024-sample frames, the 16 kHz band corresponds to 513 samples. FIG. 5 lists wlow, whigh, and the Bark value bval for each of the partitions 1 to 49 (partition index b) into which these are divided.)

$$m_b = \sum_{w = \mathit{wlow}_b}^{\mathit{whigh}_b} \mathit{mr}_w^{\,2}$$

c-3: The effect on other bands (the effect that a masker in one band (partition) has on a maskee in another band (partition)) is calculated.
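Steps c-1 and c-2 can be sketched as follows. This is a minimal illustration using only the standard library: the radix-2 FFT is a textbook routine, the window index i is assumed to run from 1 to N, bin indices are 0-based, and the three partitions in the usage example are hypothetical stand-ins for the 49 partitions of FIG. 5, which are not reproduced in the text. The window coefficients are those of the formula above, a raised cosine of the kind commonly called a Hann window.

```python
import cmath
import math


def window(frame):
    """Apply 0.5 - 0.5*cos(2*pi*(i - 0.5)/N), with i assumed to run 1..N."""
    n = len(frame)
    return [v * (0.5 - 0.5 * math.cos(2.0 * math.pi * (i - 0.5) / n))
            for i, v in enumerate(frame, start=1)]


def fft(a):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(frame, bands):
    """Sum of squared magnitudes mr_w over each (wlow, whigh) partition."""
    spectrum = fft(window(frame))
    mr = [abs(c) for c in spectrum[: len(frame) // 2 + 1]]  # N/2 + 1 bins
    return [sum(m * m for m in mr[lo:hi + 1]) for lo, hi in bands]


# Hypothetical 16-sample frame holding a tone at bin 3, with three made-up
# partitions; a real frame would have 1024 samples and the FIG. 5 partitions.
tone = [math.cos(2.0 * math.pi * 3 * n / 16) for n in range(16)]
energies = band_energies(tone, [(0, 1), (2, 4), (5, 8)])
```

The same routine applies unchanged to the echo in steps d-1 and d-2, with e_i in place of m_i.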

[0018] The per-partition energies are convolved with the spreading function:

$$\mathit{mcb}_b = \sum_{bb=1}^{\mathit{bmax}} m_{bb} \times \mathit{sprdngf}(\mathit{bval}_{bb}, \mathit{bval}_b)$$

bval_b is the Bark value at the centre of partition b; its values are given in FIG. 5. The spreading function is the following function defined in Section D.2.3 of Psychoacoustic Model II mentioned above.

$$\mathit{tmpx} = 1.05\,(j - i)$$

$$x = 8 \times \min\!\left((\mathit{tmpx} - 0.5)^2 - 2\,(\mathit{tmpx} - 0.5),\; 0\right)$$

$$\mathit{tmpy} = 15.811389 + 7.5\,(\mathit{tmpx} + 0.474) - 17.5\,\left(1.0 + (\mathit{tmpx} + 0.474)^2\right)^{0.5}$$

$$\mathit{sprdngf}(i, j) = \begin{cases} 0 & \text{if } \mathit{tmpy} < -100 \\ 10^{(x + \mathit{tmpy})/10} & \text{otherwise} \end{cases}$$

c-4: Calculation of the simultaneous masking threshold (the masking threshold at a given instant). The renormalized energy is calculated.

[0019]

$$\mathit{mn}_b = \mathit{mcb}_b \times \mathit{rnorm}_b$$

The normalization constant is calculated as

$$\mathit{rnorm}_b = \frac{1}{\sum_{bb=0}^{\mathit{bmax}} \mathit{sprdngf}(\mathit{bval}_{bb}, \mathit{bval}_b)}$$

The partition index b is then converted to the frequency index w:

$$\mathit{mb}_w = \frac{\mathit{mn}_b}{\mathit{whigh}_b - \mathit{wlow}_b + 1}$$

c-5: The temporal masking threshold (the masking that a sound at one instant exerts on the next instant) is calculated.
c-6: Comparison with the audible level and determination of the masking threshold. The simultaneous masking threshold mb_w is compared with the audible energy threshold absthr_w, and the larger of the two is taken as the masking threshold thr_w.
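Steps c-4 and c-6 can be combined in one short routine. The following is a minimal sketch under stated assumptions: the function name is hypothetical, `spread_gain[bb][b]` is assumed to hold precomputed values of sprdngf(bval_bb, bval_b), indices are 0-based, and the temporal masking of step c-5 is omitted because the text gives no formula for it.

```python
def simultaneous_threshold(mcb, spread_gain, wlow, whigh, absthr):
    """Return the per-bin masking threshold thr_w.

    mcb: spread energy per partition
    spread_gain: spread_gain[bb][b] = sprdngf(bval_bb, bval_b), precomputed
    wlow, whigh: first and last bin of each partition (0-based, assumed)
    absthr: audible energy threshold per bin
    """
    nb = len(mcb)
    thr = list(absthr)   # bins outside every partition keep absthr
    for b in range(nb):
        rnorm = 1.0 / sum(spread_gain[bb][b] for bb in range(nb))
        mn = mcb[b] * rnorm                    # renormalized energy
        mb = mn / (whigh[b] - wlow[b] + 1)     # spread evenly over the bins
        for w in range(wlow[b], whigh[b] + 1):
            thr[w] = max(mb, absthr[w])        # step c-6
    return thr


# Two hypothetical partitions of two bins each.
thr = simultaneous_threshold(
    mcb=[3.0, 6.0],
    spread_gain=[[1.0, 0.5], [0.5, 1.0]],
    wlow=[0, 2], whigh=[1, 3],
    absthr=[0.5, 2.0, 1.0, 1.0],
)
```

In bins where the masker is weak, the audible energy threshold dominates, so the threshold never falls below what a listener could hear in silence.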

[0020]

$$\mathit{thr}_w = \max(\mathit{mb}_w, \mathit{absthr}_w)$$

The values of the audible energy threshold absthr_w follow Table D.4a of Psychoacoustic Model II of MPEG (ISO 11172-3); the table is reproduced in FIG. 6. (FIG. 6 shows the absolute threshold (audible energy threshold absthr_w) for each index.)

(Echo energy calculation)
d-1: The energy of the echo, with the communication environment parameters applied, is calculated. The echo e_i is cut into frames of 1024 samples, and a 1024-point Hamming window is applied to the samples of each frame.

[0021]

$$\mathit{ew}_i = e_i \times \left[\,0.5 - 0.5\cos\!\left(\frac{2\pi\,(i - 0.5)}{1024}\right)\right]$$

The windowed samples are transformed to the frequency domain with an FFT. The values obtained from the FFT are expressed in polar form, and the magnitude and phase components are denoted er_w and ef_w.

d-2: The energy is calculated in each partition used for the threshold calculation.

[0022]

$$e_b = \sum_{w = \mathit{wlow}_b}^{\mathit{whigh}_b} \mathit{er}_w^{\,2}$$

(Desired echo suppression amount calculation)
e-1: The difference between the echo energy e_b and the masking threshold thr_w determined above is taken to give the echo detection amount edv.

$$\mathit{edv} = e_b - \mathit{thr}_w$$

e-2: Finally, the desired echo suppression amount dl for the current frame is determined from the echo detection amount.

[0023]

$$\mathit{dl} = 0.8 \times \mathit{edv} + 8.4$$

The above calculation is carried out for every frame, and the maximum value within the statistically reliable interval of the resulting data set is taken as the desired echo suppression amount for each frequency band. Because the calculation is performed in this way, the desired echo suppression amount for various environments can be obtained by substituting acoustic and transmission parameters that match the assumed loudspeaker communication environment. And because the calculation uses publicly known data on human hearing, the resulting desired echo suppression amount is general and appropriate for a wide range of users.
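Steps e-1 and e-2 and the per-band statistic can be sketched as follows. Several points are assumptions rather than statements of the text: the echo energy and the masking threshold are assumed to be compared in dB for the same band, so that edv and dl are in dB; the "maximum value within the statistically reliable interval" is assumed to be an upper percentile of the per-frame values; and all function names and the 95 percent default are hypothetical.

```python
import math


def to_db(energy, floor=1e-12):
    """Convert an energy to dB; the floor avoids log10(0)."""
    return 10.0 * math.log10(max(energy, floor))


def desired_suppression(echo_energy, threshold_energy):
    """dl = 0.8 * edv + 8.4, with edv assumed to be a level difference in dB."""
    edv = to_db(echo_energy) - to_db(threshold_energy)
    return 0.8 * edv + 8.4


def band_requirement(dl_per_frame, percent=95):
    """Assumed reading of the per-band statistic: the percent-th percentile."""
    ordered = sorted(dl_per_frame)
    rank = (percent * len(ordered) + 99) // 100   # integer ceiling of p*n/100
    return ordered[max(rank - 1, 0)]


# An echo 20 dB above the masking threshold in one frame of one band.
dl = desired_suppression(echo_energy=100.0, threshold_energy=1.0)
```

Taking an upper percentile instead of the absolute maximum keeps a single outlier frame from dictating the suppression required for the whole band; whether this matches the intended statistic cannot be confirmed from the text.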

[0024]

[Effects of the invention] As described above, the present invention obtains a masking threshold from all masker signals that mask the echo signal and derives the desired echo suppression amount from that masking threshold and the echo signal; in other words, the loudspeaker communication environment is substituted as parameters and the desired echo suppression amount is derived using auditory data. A desired echo suppression amount can therefore be derived that suits various loudspeaker communication environments and is general across users, and applying it to an echo suppression method and an echo suppression apparatus allows echo to be suppressed effectively.

[Brief description of the drawings]

FIG. 1 is a schematic diagram of a loudspeaker communication system.

FIG. 2 is a schematic diagram of an echo canceller.

FIG. 3 is a configuration diagram of the echo canceller of the present invention.

FIG. 4 is a diagram showing the flow of the method for deriving the desired echo suppression amount according to one embodiment of the present invention.

FIG. 5 is a table summarizing the partition values used for the threshold calculation.

FIG. 6 is a table summarizing the audible energy thresholds.

[Explanation of symbols]

1, 3: transmitting microphones
2, 4: receiving loudspeakers
5, 7: transmit signal amplifiers
6, 8: receive signal amplifiers
10, 11: talkers
12: listener
21: echo canceller
22: adaptive filter processing unit
23: loss insertion processing unit
30: masker and maskee discrimination means
31, 32: frequency analysis means A, B
33: desired echo suppression amount determination unit
34: loss insertion means
35: frequency synthesis means

Continued from the front page: (72) Inventor Yutaka Kaneda, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo. F-terms (reference): 5D020 AC01 5K027 BB03 DD10 5K046 AA01 BB00 CC29 HH01 HH79

Claims

1. An echo suppression method characterized in that a masking threshold for an echo signal is obtained from all masker signals that mask the echo signal, a desired echo suppression amount is derived from the masking threshold and the echo signal, and the echo signal is suppressed on the basis of the desired echo suppression amount.

2. The echo suppression method according to claim 1, wherein the echo signal used for deriving the desired echo suppression amount is a signal to which the acoustic characteristics of the room in which loudspeaker communication takes place and of the vicinity of the evaluator, and the transmission characteristics of the communication network, have been applied.

3. The echo suppression method according to claim 1, wherein all of the masker signals used for deriving the desired echo suppression amount are signals to which the acoustic characteristics of the room in which loudspeaker communication takes place and of the vicinity of the evaluator, and the transmission characteristics of the communication network, have been applied.

4. The echo suppression method according to claim 1, wherein the masking threshold for the echo signal is calculated from a masking threshold determined by the simultaneous masking and temporal masking produced by all maskers, and from an audible level determined by ambient noise and human auditory characteristics.

5. The echo suppression method according to claim 1, wherein the desired echo suppression amount is calculated from an echo detection level determined from the echo signal and the masking threshold.

6. An echo suppression apparatus comprising a desired echo suppression amount determination unit and loss insertion means, wherein the desired echo suppression amount determination unit obtains a masking threshold for an echo signal from all masker signals that mask the echo signal and derives a desired echo suppression amount from the masking threshold and the echo signal, and the loss insertion means suppresses the echo signal on the basis of the desired echo suppression amount.

7. The echo suppression apparatus according to claim 6, wherein the echo signal used for deriving the desired echo suppression amount is a signal to which the acoustic characteristics of the room in which loudspeaker communication takes place and of the vicinity of the evaluator, and the transmission characteristics of the communication network, have been applied.

8. The echo suppression apparatus according to claim 6, wherein all of the masker signals used for deriving the desired echo suppression amount are signals to which the acoustic characteristics of the room in which loudspeaker communication takes place and of the vicinity of the evaluator, and the transmission characteristics of the communication network, have been applied.

9. The echo suppression apparatus according to claim 6, wherein the masking threshold for the echo signal is calculated from a masking threshold determined by the simultaneous masking and temporal masking produced by all maskers, and from an audible level determined by ambient noise and human auditory characteristics.

10. The echo suppression apparatus according to claim 6, wherein the desired echo suppression amount is calculated from an echo detection level determined from the echo signal and the masking threshold.