JP5049256B2

JP5049256B2 - Echo canceller and echo canceling method

Info

Publication number: JP5049256B2
Application number: JP2008307912A
Authority: JP
Inventors: 俊輔菅沼
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-12-02
Filing date: 2008-12-02
Publication date: 2012-10-17
Anticipated expiration: 2028-12-02
Also published as: JP2010135936A

Description

The present invention relates to acoustic echo cancellation, and more particularly to an acoustic echo cancellation technique suitable for a conference system equipped with a microphone and a speaker.

Patent Document 1 discloses a technique for a teleconference or video conference system in which a microphone array having a plurality of microphone elements is used to determine, from the phase difference between microphones, the bands in which only speaker sound is present, and the acoustic echo canceller is adaptively controlled only in those bands. With this technique, the echo canceller can be controlled dynamically according to the conditions of the conference room or the like, which improves acoustic echo suppression performance.
Patent Document 1: JP 2008-141718 A

In the technique described in Patent Document 1, the audio signal input to the microphone array is divided into a plurality of frequency bands, and the acoustic path is learned and estimated for each frequency band.

In an actual use environment, however, perfect estimation is difficult. In practice, estimation errors occur and may leave a residual echo, that is, a residual component of the acoustic echo. For this reason, the technique of Patent Document 1 estimates, for each frame, the magnitude of the acoustic echo contained in the input audio signal, and a voice switch blocks transmission of any frame for which this magnitude is equal to or greater than a predetermined threshold (see, for example, paragraph 0034 of Patent Document 1). With this approach, however, transmission of the audio signal is interrupted, if only momentarily, and the full-duplex call state cannot be maintained.

The present invention has been made in view of the above circumstances, and its object is to provide a technique that can further enhance the echo suppression effect of an acoustic echo canceller.

To solve the above problem, the present invention suppresses the residual echo that remains after the acoustic echo canceller has been applied, without blocking frame transmission. Because the characteristics of the residual echo component differ from those of the conversational speech component in the audio signal after the acoustic echo canceller has been applied, the residual echo component is estimated from the sound pressure spectrum information of that signal and then suppressed.

For example, the echo canceller of the present invention is an echo canceller that removes, from an input audio signal (the audio signal input to a microphone), the acoustic echo of an output audio signal (the audio signal output from a speaker), and comprises:
pseudo echo generating means for estimating the acoustic echo of the output audio signal for each frequency band and generating a pseudo echo signal;
echo cancelling means for subtracting the pseudo echo signal from the input audio signal to cancel the acoustic echo of the output audio signal from the input audio signal; and
residual echo cancelling means for removing, for each frequency band of the input audio signal from which the acoustic echo has been cancelled, the residual echo (the residual component of the acoustic echo) from the signal component of that frequency band, based on the signal components in a range adjacent to that signal component in the time direction and the frequency direction.

According to the present invention, the echo suppression effect of the acoustic echo canceller can be further enhanced.

An embodiment of the present invention is described below, taking as an example the case where the invention is applied to the echo canceller of a microphone array device used in a conference system equipped with a microphone and a speaker. The microphone array device has a function for transmitting and receiving audio signals over an IP network and an acoustic echo canceller function.
[First embodiment]
FIG. 1 is a schematic configuration diagram of an echo canceller 1 according to the first embodiment of the present invention.

As illustrated, the echo canceller 1 includes a far-end audio signal input unit 101, a near-end audio signal input unit 102, FFT units 103 and 104, a pseudo echo generation unit 105, a linear echo canceller unit 106, a spectral subtraction unit 107, a residual echo canceling unit 108, an IFFT unit 109, and a near-end audio signal output unit 110.

The far-end audio signal input unit 101 is the input terminal for the digital audio signal received from the other party (hereinafter, the far-end audio signal). The far-end audio signal is output from the speaker after DA conversion.

The near-end audio signal input unit 102 is the input terminal for the digital audio signal that has been input to the microphone and AD-converted (hereinafter, the near-end audio signal). The near-end audio signal is transmitted to the other party after the echo canceller 1 has removed the acoustic echo from it.

The FFT unit 103 applies an FFT (Fast Fourier Transform) to the far-end audio signal input to the far-end audio signal input unit 101 and outputs frequency domain information of the far-end audio signal.

The FFT unit 104 applies an FFT to the near-end audio signal input to the near-end audio signal input unit 102 and outputs frequency domain information of the near-end audio signal.

Based on the frequency domain information of the far-end audio signal output from the FFT unit 103, the pseudo echo generation unit 105 uses an adaptive filter to estimate, for each predetermined frequency band, the acoustic echo component of the far-end audio signal, and generates frequency domain information of a pseudo echo signal.

The adaptive filter is constructed so that, when a reference signal (here, the far-end audio signal) is input to it, it outputs the signal obtained by applying to the reference signal a change that is, as far as possible, equivalent to the change the reference signal undergoes between being output from the speaker, propagating as a sound wave, and reaching the microphone. Such an adaptive filter is built as the result of acoustic environment learning carried out according to the call conditions. Various existing techniques, including the technique described in Patent Document 1, can be used for the pseudo echo generation unit 105 and the adaptive filter.

The linear echo canceller unit 106 removes the acoustic echo contained in the near-end audio signal by subtracting the frequency domain information of the pseudo echo signal from the frequency domain information of the near-end audio signal output from the FFT unit 104.
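The subtraction performed by the linear echo canceller unit 106 can be pictured with a minimal Python sketch. It assumes that each spectrum is a list of complex FFT coefficients with one entry per frequency band; the function name `cancel_linear_echo` is hypothetical and does not come from the patent.

```python
def cancel_linear_echo(near_spectrum, pseudo_echo_spectrum):
    """Subtract the pseudo echo from the near-end spectrum, band by band.

    Both arguments are assumed to be lists of complex FFT coefficients
    covering the same frequency bands (an assumption of this sketch).
    """
    if len(near_spectrum) != len(pseudo_echo_spectrum):
        raise ValueError("spectra must cover the same number of bands")
    return [n - e for n, e in zip(near_spectrum, pseudo_echo_spectrum)]


# Two bands; the second band carries an echo estimate of 1+0j.
print(cancel_linear_echo([1 + 2j, 3 + 0j], [0.5 + 1j, 1 + 0j]))
```

Any error in the pseudo echo estimate passes straight through this subtraction, which is why the later stages are needed.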

For the sound pressure spectrum information of the near-end audio signal from which the linear echo canceller unit 106 has removed the acoustic echo, the spectral subtraction unit 107 suppresses the sound pressure in each predetermined frequency band according to the sound pressure in the same frequency band of the sound pressure spectrum information of the pseudo echo signal. An existing spectral subtraction technique can be used for the spectral subtraction unit 107.
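The patent leaves the choice of spectral subtraction method open. As one possible illustration only, the sketch below uses a common textbook form in which a scaled echo level is subtracted from each band and the result is clamped at a floor. The over-subtraction factor `alpha`, the `floor` value, and the function name are assumptions of this sketch, not parameters given in the source.

```python
def spectral_subtract(near_levels, echo_levels, alpha=1.0, floor=0.0):
    """Reduce each band level according to the pseudo echo level in that band.

    near_levels: per-band sound pressure levels after linear cancellation.
    echo_levels: per-band sound pressure levels of the pseudo echo signal.
    alpha, floor: hypothetical tuning parameters (not from the patent).
    """
    if len(near_levels) != len(echo_levels):
        raise ValueError("level lists must cover the same number of bands")
    return [max(n - alpha * e, floor) for n, e in zip(near_levels, echo_levels)]


# Band 0 is reduced by the echo level; band 1 is clamped at the floor.
print(spectral_subtract([10.0, 5.0], [4.0, 8.0]))
```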

For each unit (for example, one frame shift unit) of the sound pressure spectrum information of the near-end audio signal output from the spectral subtraction unit 107, the residual echo canceling unit 108 removes, for each predetermined frequency band, the residual component of the acoustic echo (residual echo) from the component of the frequency band of interest, based on the components in the range adjacent to that component in the time direction and the frequency direction.

If the acoustic environment learning needed to build the adaptive filter of the pseudo echo generation unit 105 contained no errors at all, sufficient acoustic echo suppression could be expected. In actual use, however, the acoustic environment changes constantly. It is therefore difficult to learn the acoustic environment perfectly, and the learning result contains some error. As a result, the sound pressure spectrum information of the near-end audio signal output from the spectral subtraction unit 107 contains residual echo that could not be removed. In this embodiment, since the characteristics of the residual echo component differ from those of the conversational speech component in the sound pressure spectrum information of the near-end audio signal output from the spectral subtraction unit 107, the residual echo canceling unit 108 estimates the residual echo component from this sound pressure spectrum information and suppresses it. The residual echo canceling unit 108 is described in detail later.

The IFFT unit 109 applies an IFFT (Inverse FFT) to the frequency domain information of the near-end audio signal output from the residual echo canceling unit 108 and outputs the near-end audio signal.

The near-end audio signal output unit 110 is the output terminal for the near-end audio signal output from the IFFT unit 109.

Next, the residual echo canceling unit 108 is described in detail.

As shown in FIG. 1, the residual echo canceling unit 108 includes a residual echo estimation unit 1081 and a residual echo suppression unit 1082.

For each unit (for example, one frame shift unit) of the sound pressure spectrum information of the near-end audio signal output from the spectral subtraction unit 107, and for each predetermined frequency band, the residual echo estimation unit 1081 selects at least one component with a high sound pressure level from among the components in the range adjacent to the frequency band of interest in the time direction and the frequency direction. Based on a comparison between the sound pressure level of the selected component and that of the component of the frequency band of interest, it then estimates whether the component of the frequency band of interest contains residual echo.

For a frequency band component that the residual echo estimation unit 1081 has estimated to contain residual echo, the residual echo suppression unit 1082 selects at least one component with a low sound pressure level from among the components in the range adjacent to that frequency band in the time direction and the frequency direction. It then suppresses the component of that frequency band according to the difference between the sound pressure level of the selected component and the sound pressure level of the component of that frequency band.

FIG. 2 is a diagram for explaining a first example of the residual echo estimation and suppression processing. Here it is assumed that the far-end audio signal input to the far-end audio signal input unit 101 and the near-end audio signal input to the near-end audio signal input unit 102 are digital audio signals with a sampling frequency of 32 kHz, and that the FFT units 103 and 104 divide the input digital audio signal into 1024 frequency bands using an FFT with a frame length of 2048 points, with a frame shift unit of 1024 points (32 ms).
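The framing figures above follow from simple arithmetic, sketched here in Python. The function name `framing_parameters` is hypothetical, and the per-band width is a derived value (sampling frequency divided by FFT length) that the patent does not state explicitly.

```python
def framing_parameters(sample_rate_hz, fft_points, shift_points):
    """Derive band count, frame shift duration and band width."""
    return {
        # A real-input FFT of N points is treated here as N/2 bands.
        "bands": fft_points // 2,
        # Duration of one frame shift unit in milliseconds.
        "shift_ms": 1000.0 * shift_points / sample_rate_hz,
        # Derived, not stated in the source: spacing between FFT bins.
        "band_width_hz": sample_rate_hz / fft_points,
    }


# 32 kHz sampling, 2048-point frames, 1024-point shift:
# 1024 bands and a 32 ms frame shift unit, as assumed in the text.
print(framing_parameters(32000, 2048, 1024))
```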

In FIG. 2, reference numeral 20 denotes the sound pressure spectrum information of the near-end audio signal output from the spectral subtraction unit 107. The vertical axis 21 is frequency and the horizontal axis 22 is time. As illustrated, the sound pressure spectrum information 20 of the near-end audio signal is divided into 1024 frequency bands, and its frame shift unit 23 is 32 ms.

For each frame shift unit, for example, the residual echo estimation unit 1081 examines each of the 1024 frequency bands in turn and determines whether a residual echo component is present in the frequency band of interest, as follows.

Let the frequency band of interest be the target band A. First, the residual echo estimation unit 1081 determines a processing target block that contains the target band A and eight comparison candidate frequency bands B1 to B8. Specifically, from the frame shift unit immediately before the one to which the target band A belongs, it selects as comparison candidates the frequency band B4, which is the same frequency band as the target band A, and the frequency bands B1 and B6, which are adjacent above and below the target band A. From the frame shift unit to which the target band A belongs, it selects as comparison candidates the frequency bands B2 and B7, which are adjacent above and below the target band A. From the frame shift unit immediately after the one to which the target band A belongs, it selects as comparison candidates the frequency band B5, which is the same frequency band as the target band A, and the frequency bands B3 and B8, which are adjacent above and below the target band A.
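The layout of the processing target block can be written down as a table of offsets. The Python sketch below assumes the spectrum is stored as `levels[t][k]` (frame shift unit `t`, band `k`), that "above" means the next higher band index, and that the target band is not at an edge of the array. The names `FIRST_EXAMPLE_OFFSETS` and `gather_candidates` are hypothetical.

```python
# (frame offset, band offset) for each comparison candidate.
# Assumption: +1 in the band offset is the band adjacent above.
FIRST_EXAMPLE_OFFSETS = {
    "B1": (-1, +1), "B2": (0, +1), "B3": (+1, +1),
    "B4": (-1, 0),                 "B5": (+1, 0),
    "B6": (-1, -1), "B7": (0, -1), "B8": (+1, -1),
}


def gather_candidates(levels, t, k):
    """Collect the levels of B1..B8 around the target band at levels[t][k]."""
    return {
        name: levels[t + dt][k + dk]
        for name, (dt, dk) in FIRST_EXAMPLE_OFFSETS.items()
    }


# 3 frame shift units x 3 bands; the target band A is the centre cell.
levels = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(gather_candidates(levels, 1, 1))
```

Because B3, B5 and B8 come from the following frame shift unit, a layout like this needs one frame shift unit of look-ahead before the target band can be evaluated.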

Next, the residual echo estimation unit 1081 compares the sound pressure levels of the comparison candidate frequency bands B1 to B8 in the processing target block selected as above with the sound pressure level of the target band A, and determines whether the sound pressure level of the target band A deviates from those of the comparison candidate frequency bands B1 to B8. Specifically, as shown by condition 24 in FIG. 2, it determines whether the sound pressure level of the target band A is higher than at least one of the following: D1, the highest of the sound pressure levels of the comparison candidate frequency bands B1 to B4 and B6 to B8, and D2, the highest of the sound pressure levels of the comparison candidate frequency bands B1 to B3 and B5 to B8. If the sound pressure level of the target band A deviates from those of the comparison candidate frequency bands B1 to B8 (that is, if it is higher than at least one of D1 and D2), a residual echo component is estimated to be present in the target band A.

The residual echo suppression unit 1082 suppresses the sound pressure level of a target band A in which the residual echo estimation unit 1081 has estimated residual echo to be present, so that it no longer deviates from the sound pressure levels of the comparison candidate frequency bands B1 to B8. Specifically, as shown by condition 25 in FIG. 2, the value obtained by subtracting the lower of the sound pressure levels D1 and D2 from the sound pressure level of the target band A is taken as the suppression amount S for the target band A. The sound pressure level of the target band A is then reduced by the suppression amount S. This suppresses the residual echo component present in the target band A.

For example, suppose the sound pressure level of the target band A is 50 and the sound pressure levels of the comparison candidate frequency bands B1, B2, B3, B4, B5, B6, B7, and B8 are 10, 20, 10, 20, 60, 10, 10, and 10, respectively. In this case, the highest sound pressure level among B1 to B4 and B6 to B8 is D1 = 20, and the highest sound pressure level among B1 to B3 and B5 to B8 is D2 = 60. Since the sound pressure level 50 of the target band A is higher than one of them, D1, a residual echo component is estimated to be present in the target band A. The suppression amount S for the target band A is then the value 30 obtained by subtracting 20, the lower of D1 and D2, from the sound pressure level 50 of the target band A, and the sound pressure level of the target band A is reduced by this amount. As a result, the sound pressure level of the target band A becomes 20.
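The worked example above can be checked with a short Python sketch of conditions 24 and 25 as described in the text. The function name `estimate_and_suppress` and the list-based argument layout (B1 to B8 in order) are assumptions made for illustration.

```python
def estimate_and_suppress(a, b):
    """Apply the first example's estimation and suppression to one band.

    a: sound pressure level of the target band A.
    b: list of eight levels for B1..B8, in that order.
    Returns (new level of A, suppression amount S).
    """
    if len(b) != 8:
        raise ValueError("expected eight comparison candidates B1..B8")
    # D1: highest level among B1-B4 and B6-B8 (B5 excluded).
    d1 = max(level for i, level in enumerate(b) if i != 4)
    # D2: highest level among B1-B3 and B5-B8 (B4 excluded).
    d2 = max(level for i, level in enumerate(b) if i != 3)
    if a > d1 or a > d2:
        # Residual echo estimated: pull A down to the lower of D1 and D2.
        s = a - min(d1, d2)
        return a - s, s
    # No residual echo estimated: leave A unchanged.
    return a, 0


# Values from the text: A = 50, B1..B8 = 10, 20, 10, 20, 60, 10, 10, 10.
print(estimate_and_suppress(50, [10, 20, 10, 20, 60, 10, 10, 10]))  # (20, 30)
```

Note that "higher than at least one of D1 and D2" is the same as "higher than the lower of D1 and D2", so the suppressed level always ends up equal to min(D1, D2).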

FIG. 3 is a diagram for explaining a second example of the residual echo estimation and suppression processing. Here it is assumed that the far-end audio signal input to the far-end audio signal input unit 101 and the near-end audio signal input to the near-end audio signal input unit 102 are digital audio signals with a sampling frequency of 32 kHz, and that the FFT units 103 and 104 divide the input digital audio signal into 1024 frequency bands using an FFT with a frame length of 2048 points, with a frame shift unit of 512 points (16 ms).

In FIG. 3, elements that are the same as in FIG. 2 are given the same reference numerals. As illustrated, the sound pressure spectrum information 20 is divided into 1024 frequency bands, and its frame shift unit 23 is 16 ms.

For each frame shift unit, the residual echo estimation unit 1081 examines each of the 1024 frequency bands in turn and determines whether a residual echo component is present in the frequency band of interest (the target band), as follows.

Let the frequency band of interest be the target band A. First, the residual echo estimation unit 1081 determines a processing target block that contains the target band A and sixteen comparison candidate frequency bands C1 to C16. Specifically, from the frame shift unit three units before the one to which the target band A belongs, it selects as comparison candidates the frequency band C7, which is the same frequency band as the target band A, and the frequency bands C1 and C11, which are adjacent above and below the target band A. From the frame shift unit two units before, it selects the frequency band C8, which is the same frequency band as the target band A, and the adjacent frequency bands C2 and C12 above and below. From the frame shift unit immediately before, it selects the adjacent frequency bands C3 and C13 above and below the target band A. From the frame shift unit to which the target band A belongs, it selects the adjacent frequency bands C4 and C14 above and below the target band A. From the frame shift unit immediately after, it selects the frequency band C9, which is the same frequency band as the target band A, and the adjacent frequency bands C5 and C15 above and below. Finally, from the frame shift unit two units after the one to which the target band A belongs, it selects the frequency band C10, which is the same frequency band as the target band A, and the adjacent frequency bands C6 and C16 above and below.
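The sixteen candidates of this second example can again be expressed as (frame offset, band offset) pairs. The sketch below follows the selection exactly as worded in the text and assumes, as an illustration only, that a band offset of +1 means the band adjacent above. The function name `second_example_offsets` is hypothetical.

```python
def second_example_offsets():
    """Return (frame offset, band offset) for C1..C16 relative to band A."""
    offsets = {}
    frame_offsets = (-3, -2, -1, 0, 1, 2)
    for i, dt in enumerate(frame_offsets):
        offsets[f"C{i + 1}"] = (dt, +1)   # C1..C6: band above A
        offsets[f"C{i + 11}"] = (dt, -1)  # C11..C16: band below A
    # Same band as A: three and two units before, one and two units after.
    for name, dt in (("C7", -3), ("C8", -2), ("C9", 1), ("C10", 2)):
        offsets[name] = (dt, 0)
    return offsets


offsets = second_example_offsets()
print(len(offsets))   # number of comparison candidates
print(offsets["C3"], offsets["C13"])
```

As the text is worded, the same-frequency band in the immediately preceding frame shift unit is not listed among the candidates, so the offset (-1, 0) does not appear in this table.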

Next, the residual echo estimation unit 1081 compares the sound pressure levels of the comparison candidate frequency bands C1 to C16 in the processing target block selected as above with the sound pressure level of the target band A, and determines whether the sound pressure level of the target band A deviates from those of the comparison candidate frequency bands C1 to C16. Specifically, as shown by condition 26 in FIG. 3, let E1 be the lower of the sound pressure levels of C1 and C2, E2 the lower of C3 and C4, E3 the lower of C5 and C6, E4 the lower of C7 and C8, E5 the lower of C9 and C10, E6 the lower of C11 and C12, E7 the lower of C13 and C14, and E8 the lower of C15 and C16. The unit then determines whether the sound pressure level of the target band A is higher than at least one of the following: F1, the highest of the sound pressure levels E1 to E4 and E6 to E8, and F2, the highest of the sound pressure levels E1 to E3 and E5 to E8. If the sound pressure level of the target band A deviates from those of the comparison candidate frequency bands C1 to C16 (that is, if it is higher than at least one of F1 and F2), a residual echo component is estimated to be present in the target band A.

The residual echo suppression unit 1082 suppresses the sound pressure level of a target band A in which the residual echo estimation unit 1081 has estimated a residual echo component to be present, so that it no longer deviates from the sound pressure levels of the comparison candidate frequency bands C1 to C16. Specifically, as shown by condition 27 in FIG. 3, the value obtained by subtracting the lower of the sound pressure levels F1 and F2 from the sound pressure level of the target band A is taken as the suppression amount S for the target band A. The sound pressure level of the target band A is then reduced by the suppression amount S. This suppresses the residual echo component present in the target band A.
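Conditions 26 and 27 of the second example can be sketched in Python as follows. The function name `estimate_and_suppress_paired` and the list layout (C1 to C16 in order) are assumptions, and the numbers in the usage line are hypothetical values chosen for illustration; the patent gives no worked example for this case.

```python
def estimate_and_suppress_paired(a, c):
    """Apply the second example's estimation and suppression to one band.

    a: sound pressure level of the target band A.
    c: list of sixteen levels for C1..C16, in that order.
    Returns (new level of A, suppression amount S).
    """
    if len(c) != 16:
        raise ValueError("expected sixteen comparison candidates C1..C16")
    # E1..E8: the lower level of each consecutive pair (C1,C2), (C3,C4), ...
    e = [min(c[2 * i], c[2 * i + 1]) for i in range(8)]
    # F1: highest of E1-E4 and E6-E8 (E5 excluded).
    f1 = max(level for i, level in enumerate(e) if i != 4)
    # F2: highest of E1-E3 and E5-E8 (E4 excluded).
    f2 = max(level for i, level in enumerate(e) if i != 3)
    if a > f1 or a > f2:
        s = a - min(f1, f2)
        return a - s, s
    return a, 0


# Hypothetical levels: E1..E8 become 10, 20, 10, 15, 60, 10, 10, 10.
c = [10, 30, 20, 25, 10, 10, 40, 15, 60, 70, 10, 10, 10, 10, 10, 10]
print(estimate_and_suppress_paired(50, c))  # (20, 30)
```

Taking the lower level of each pair before taking the maximum means a single isolated high value in one frame shift unit does not raise F1 or F2 unless its pair partner is also high.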

Note that in FIGS. 2 and 3, when the target band A is the highest or the lowest frequency band, no frequency band higher or lower than the target band A exists, respectively. How these exceptional cases are handled is not particularly important for removing residual echo in actual use. For example, the suppression amount S may be determined by regarding the sound pressure level of the nonexistent adjacent frequency band as 0. Alternatively, the suppression amount S may unconditionally be set to 0. The same applies to the third example described later with reference to FIG. 5.
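The two edge-handling options mentioned above can be sketched as small Python helpers. Both function names are hypothetical; the first returns 0 for a band index outside the spectrum, and the second identifies edge bands so that a caller can simply set the suppression amount S to 0 for them.

```python
def level_or_zero(frame_levels, k):
    """Return the level of band k, or 0 if band k does not exist."""
    if 0 <= k < len(frame_levels):
        return frame_levels[k]
    return 0.0


def is_edge_band(k, num_bands):
    """True for the lowest and highest bands, where S may be forced to 0."""
    return k == 0 or k == num_bands - 1


frame = [5.0, 7.0]
print(level_or_zero(frame, -1), level_or_zero(frame, 1), level_or_zero(frame, 2))
print(is_edge_band(0, 1024), is_edge_band(512, 1024), is_edge_band(1023, 1024))
```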

The first embodiment of the present invention has been described above.

The residual echo estimation and suppression processing shown in FIGS. 2 and 3 exploits the fact that, in the near-end audio signal from which the linear echo canceller unit 106 and the spectral subtraction unit 107 have removed the acoustic echo, the characteristics of the residual echo component differ from those of the human conversational speech component. The characteristics of each component are described below, taking an arbitrary frequency band in the sound pressure spectrum information of the echo-cancelled near-end audio signal as the target band A.

When the target band A consists mainly of a human conversational speech component, the sound pressure level of the component in the target band A tends strongly to be close to the sound pressure levels of the components in the frequency bands adjacent to the target band A in the time direction and the frequency direction.

When the target band A consists mainly of a residual echo component, on the other hand, the sound pressure level of the component in the target band A tends strongly to deviate from the sound pressure levels of the components in the frequency bands adjacent to the target band A in the frequency direction. This is because acoustic environment learning and acoustic echo cancellation are performed separately for each frequency band, so the amount of acoustic echo removed by the linear echo canceller unit 106 and the spectral subtraction unit 107 differs from band to band, and the sound pressure levels of the residual echo components that could not be removed are not necessarily close to one another in neighboring bands.

In addition, residual echo components with high sound pressure tend in particular to occur discretely. This is because residual echo components with high sound pressure arise in the frequency bands and at the times where the acoustic echo estimation error has become large, whereas in an echo canceller whose acoustic environment learning and acoustic echo cancellation operate largely stably, the probability of their occurrence (as a proportion of the components of all frequency bands in a frame shift unit) is low.

In the time direction, acoustic echo cancellation for a given frequency band uses a common adaptive filter (when viewed over a very short time), so residual echo components with similar sound pressure levels may continue for about several frame shift times (for example, about 64 ms). However, in an audio signal obtained by applying acoustic echo cancellation to ordinary conversational speech, residual echo components rarely persist continuously for longer than that.

In view of the above differences between the characteristics of the residual echo component and those of the human conversational speech component in the near-end audio signal after per-band acoustic environment learning and echo cancellation, in the present embodiment the residual echo canceling unit 108 cancels, for each frequency band of the near-end audio signal output from the spectral subtraction unit 107, the residual echo from the component of the frequency band of interest, based on the components in the range adjacent to that frequency band in the time direction and the frequency direction.

Specifically, for the near-end audio signal output from the spectral subtraction unit 107, the residual echo estimation unit 1081 selects, for each frequency band, at least one component with a high sound pressure level from among the components in the range adjacent to the frequency band of interest in the time direction and the frequency direction, and estimates whether the component of the frequency band of interest contains residual echo based on the result of comparing the sound pressure level of the selected component with that of the component of the frequency band of interest. Then, for the component of a frequency band estimated to contain residual echo, the residual echo suppression unit 1082 selects a signal component with a low sound pressure level from among the components in the range adjacent to this frequency band in the time direction and the frequency direction, and suppresses the component of this frequency band according to the difference between the sound pressure level of the selected component and that of the component of this frequency band.
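The two-step idea described above (estimate from a high-level neighbour, then suppress toward a low-level neighbour) can be illustrated with a minimal sketch. The function name `estimate_and_suppress`, the use of the single highest and lowest neighbour levels, and the `strength` parameter are assumptions made for illustration; the embodiments define their own selection rules in the conditions of FIG. 2, FIG. 3, and FIG. 5.

```python
def estimate_and_suppress(target_level, neighbour_levels, strength=1.0):
    """Illustrative only: flag and suppress one time-frequency component.

    target_level     -- sound pressure level of the band of interest
    neighbour_levels -- levels of the components adjacent in time and frequency
    strength         -- hypothetical factor in (0, 1]; 1.0 pulls the target
                        all the way down to the low reference level
    Returns (new_level, contains_residual_echo).
    """
    # Estimation: compare the target with a high-level neighbouring component.
    high_reference = max(neighbour_levels)
    if target_level <= high_reference:
        return target_level, False  # consistent with neighbours: leave as is

    # Suppression: reduce the target according to its difference from a
    # low-level neighbouring component.
    low_reference = min(neighbour_levels)
    new_level = target_level - strength * (target_level - low_reference)
    return new_level, True


print(estimate_and_suppress(30.0, [10.0, 12.0, 8.0]))   # (8.0, True)
print(estimate_and_suppress(11.0, [10.0, 12.0, 8.0]))   # (11.0, False)
```

A component that stands out above all of its neighbours is treated as residual echo and pulled down, while a component that matches its surroundings, as conversational speech tends to, passes through unchanged.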

Therefore, according to the present embodiment, residual echo that the linear echo canceller unit 106 and the spectral subtraction unit 107 cannot fully cancel can be suppressed while keeping distortion (suppression) of conversational speech small, which enhances the echo suppression effect of the echo canceller 1.
[Second Embodiment]
FIG. 4 is a schematic configuration diagram of an echo canceller 1A according to the second embodiment of the present invention.

As illustrated, the echo canceller 1A according to the present embodiment differs from the echo canceller 1 according to the first embodiment shown in FIG. 1 in that it has a residual echo canceling unit 108A in place of the residual echo canceling unit 108.

The residual echo canceling unit 108A differs from the residual echo canceling unit 108 in that it has a plurality of residual echo estimation units 1081, each with a different range of frequency bands serving as comparison candidate frequency bands, and a selection unit 1083 that selects, from among the plurality of residual echo estimation units 1081, the one to be used for estimating residual echo.

The selection unit 1083 estimates the magnitude of the acoustic echo based on the sound pressure spectrum information of the near-end audio signal output from the FFT unit 104, the sound pressure spectrum information of the pseudo echo signal generated by the pseudo echo generation unit 105, and the sound pressure spectrum information of the echo-canceled near-end audio signal output from the linear echo canceller unit 106. It then selects a residual echo estimation unit 1081 such that the range of comparison candidate frequency bands becomes wider as the acoustic echo becomes larger.

For example, when the total sound pressure level of the pseudo echo signal over all frequency bands exceeds the total sound pressure level of the echo-canceled near-end audio signal over all frequency bands by at least a certain ratio (a predetermined reference value), that is, when the acoustic echo is at or above the reference value relative to the near-end audio signal, the selection unit 1083 selects the residual echo estimation unit 1081 with the wider range of comparison candidate frequency bands, that is, the one with the higher residual echo suppression effect. It then causes the selected residual echo estimation unit 1081 to perform residual echo estimation.
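The selection rule in this example can be sketched as follows. This is a minimal illustration, not the implementation of the selection unit 1083: the function name `choose_estimator`, the labels `"wide"` and `"narrow"`, the default `ratio_threshold`, and the assumption that the levels are linear (non-dB) magnitudes that can be summed directly are all hypothetical.

```python
import numpy as np


def choose_estimator(pseudo_echo_levels, cancelled_near_end_levels,
                     ratio_threshold=2.0):
    """Pick the comparison-candidate range from per-band levels of one frame.

    pseudo_echo_levels        -- per-band levels of the pseudo echo signal
    cancelled_near_end_levels -- per-band levels after linear echo cancellation
    ratio_threshold           -- hypothetical stand-in for the reference value
    """
    echo_total = float(np.sum(pseudo_echo_levels))
    near_end_total = float(np.sum(cancelled_near_end_levels))
    # Large echo relative to the echo-cancelled near-end signal:
    # use the estimator with the wider range of comparison candidates.
    if echo_total >= ratio_threshold * near_end_total:
        return "wide"
    return "narrow"


print(choose_estimator(np.array([4.0, 4.0]), np.array([1.0, 1.0])))  # wide
print(choose_estimator(np.array([1.0, 1.0]), np.array([1.0, 1.0])))  # narrow
```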

FIG. 5 is a diagram for explaining a third example of the residual echo estimation and suppression process. Here, it is assumed that the far-end audio signal input to the far-end audio signal input unit 101 and the near-end audio signal input to the near-end audio signal input unit 102 are digital audio signals with a sampling frequency of 32 kHz, and that the FFT units 103 and 104 divide the input digital audio signal into 1024 frequency bands by an FFT with a frame length of 2048 points, with a frame shift unit of 512 points (16 ms).
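The figures assumed here follow from simple arithmetic, shown below as a short check. The variable names are illustrative, and taking the band count as half the FFT length is an assumption made to match the 1024 bands stated in the text.

```python
sample_rate_hz = 32_000   # sampling frequency from the text
frame_length = 2048       # FFT frame length in points
frame_shift = 512         # frame shift unit in points

num_bands = frame_length // 2                            # 1024 bands
band_spacing_hz = sample_rate_hz / frame_length          # 15.625 Hz per band
frame_shift_ms = 1000 * frame_shift / sample_rate_hz     # 16.0 ms
frame_length_ms = 1000 * frame_length / sample_rate_hz   # 64.0 ms

print(num_bands, band_spacing_hz, frame_shift_ms, frame_length_ms)
# 1024 15.625 16.0 64.0
```

With these values, four frame shift units span 64 ms, which corresponds to the "several frame shift times (for example, about 64 ms)" mentioned for the first embodiment.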

In FIG. 5, elements that are the same as those in FIG. 3 are given the same reference numerals. As in the case shown in FIG. 3, the sound pressure spectrum information 20 is divided into 1024 frequency bands, and the frame shift unit 23 is 16 ms.

Let the frequency band of interest be the target band A. First, the residual echo estimation unit 1081 determines a processing target block containing the target band A and 28 comparison candidate frequency bands C1 to C28. Specifically, from the frame shift unit three before the frame shift unit to which the target band A belongs, it selects as comparison candidate frequency bands the frequency band C13, which is the same band as the target band A, the frequency bands C7 and C17 adjacent above and below the target band A, and the frequency bands C1 and C23 adjacent to the frequency bands C7 and C17. From the frame shift unit two before the frame shift unit to which the target band A belongs, it selects the frequency band C14, which is the same band as the target band A, the frequency bands C8 and C18 adjacent above and below the target band A, and the frequency bands C2 and C24 adjacent to the frequency bands C8 and C18. From the frame shift unit immediately before the frame shift unit to which the target band A belongs, it selects the frequency bands C9 and C19 adjacent above and below the target band A, and the frequency bands C3 and C25 adjacent to the frequency bands C9 and C19. From the frame shift unit to which the target band A belongs, it selects the frequency bands C10 and C20 adjacent above and below the target band A, and the frequency bands C4 and C26 adjacent to the frequency bands C10 and C20. From the frame shift unit immediately after the frame shift unit to which the target band A belongs, it selects the frequency band C15, which is the same band as the target band A, the frequency bands C11 and C21 adjacent above and below the target band A, and the frequency bands C5 and C27 adjacent to the frequency bands C11 and C21. Finally, from the frame shift unit two after the frame shift unit to which the target band A belongs, it selects the frequency band C16, which is the same band as the target band A, the frequency bands C12 and C22 adjacent above and below the target band A, and the frequency bands C6 and C28 adjacent to the frequency bands C12 and C22.

Next, the residual echo estimation unit 1081 compares the sound pressure levels of the comparison candidate frequency bands C1 to C28 in the processing target block selected as described above with the sound pressure level of the target band A, and determines whether the sound pressure level of the target band A deviates from the sound pressure levels of the comparison candidate frequency bands C1 to C28. Specifically, as shown in condition 28 in FIG. 5, let E1 be the lowest of the sound pressure levels of the comparison candidate frequency bands C1, C2, C7, and C8; E2 the lowest of those of C3, C4, C9, and C10; E3 the lowest of those of C5, C6, C11, and C12; E4 the lower of those of C13 and C14; E5 the lower of those of C15 and C16; E6 the lowest of those of C17, C18, C23, and C24; E7 the lowest of those of C19, C20, C25, and C26; and E8 the lowest of those of C21, C22, C27, and C28. The residual echo estimation unit 1081 then determines whether the sound pressure level of the target band A is higher than at least one of the sound pressure level F1, which is the highest of E1 to E4 and E6 to E8, and the sound pressure level F2, which is the highest of E1 to E3 and E5 to E8. If, as a result, the sound pressure level of the target band A deviates from the sound pressure levels of the comparison candidate frequency bands C1 to C28 (that is, if it is higher than at least one of the sound pressure levels F1 and F2), a residual echo component is estimated to exist in the target band A.
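The condition described for FIG. 5 can be written out as a minimal sketch. It assumes the sound pressure levels are held in a 2-D array indexed as `level[frame, band]`; the function name `residual_echo_suspected`, the array layout, and the choice that C1 to C12 lie at higher band indices than the target band are assumptions for illustration (the condition is symmetric in the frequency direction, so that choice does not affect the result).

```python
import numpy as np

# Frame offsets relative to the frame of the target band A.
PAST, NOW, FUTURE = (-3, -2), (-1, 0), (1, 2)
# Band offsets relative to the target band A.
UPPER, SAME, LOWER = (1, 2), (0,), (-1, -2)


def residual_echo_suspected(level, t, f):
    """Apply the E1..E8 / F1, F2 test to the component at frame t, band f."""
    level = np.asarray(level, dtype=float)
    frames, bands = level.shape
    if not (3 <= t < frames - 2 and 2 <= f < bands - 2):
        raise IndexError("target too close to the edge for the 6 x 5 block")

    def lowest(frame_offsets, band_offsets):
        return min(level[t + dt, f + df]
                   for dt in frame_offsets for df in band_offsets)

    e1, e2, e3 = lowest(PAST, UPPER), lowest(NOW, UPPER), lowest(FUTURE, UPPER)
    e4, e5 = lowest(PAST, SAME), lowest(FUTURE, SAME)   # C13, C14 / C15, C16
    e6, e7, e8 = lowest(PAST, LOWER), lowest(NOW, LOWER), lowest(FUTURE, LOWER)

    f1 = max(e1, e2, e3, e4, e6, e7, e8)   # excludes E5
    f2 = max(e1, e2, e3, e5, e6, e7, e8)   # excludes E4
    a = level[t, f]
    return bool(a > f1 or a > f2)


flat = np.full((8, 8), 10.0)
print(residual_echo_suspected(flat, 4, 4))   # False

spike = flat.copy()
spike[4, 4] = 30.0                           # isolated high component
print(residual_echo_suspected(spike, 4, 4))  # True
```

Because the block reaches two frame shift units past the target band, a sketch like this can only evaluate a component once the following two frames are available.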

The residual echo suppression unit 1082 suppresses the sound pressure level of the target band A in which the residual echo estimation unit 1081 has estimated that a residual echo component exists, so that the sound pressure level of the target band A no longer deviates from the sound pressure levels of the comparison candidate frequency bands C1 to C28. The specifics are the same as for condition 27 in FIG. 3.

According to the third example shown in FIG. 5, the range of comparison candidate frequency bands is wider than in the second example shown in FIG. 3, which raises the detection sensitivity for residual echo and further increases the residual echo suppression effect. However, distortion of conversational speech also increases. Therefore, for example, the selection unit 1083 selects the residual echo estimation unit 1081 that operates according to the second example shown in FIG. 3 in the normal case, that is, when the magnitude of the acoustic echo is below the reference value, and selects the residual echo estimation unit 1081 that operates according to the third example shown in FIG. 5 only when the magnitude of the acoustic echo is at or above the reference value. In this way, residual echo is suppressed while distortion of conversational speech is kept to a minimum.

The second embodiment of the present invention has been described above.

In the present embodiment, residual echo estimation and suppression processes with different residual echo suppression effects are used selectively according to the magnitude of the acoustic echo. Therefore, even under adverse conditions in which the technique described in Patent Document 1 would have had to cut off the audio completely with a voice switch, the full-duplex call state can be maintained and a more comfortable call environment can be provided. Other effects are the same as those of the first embodiment.

The present invention is not limited to the above embodiments, and various modifications are possible within the scope of its gist.

For example, in each of the above embodiments, the configuration of the echo cancellers 1 and 1A may be implemented in hardware by an integrated logic IC such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). Alternatively, it may be implemented in software by a DSP (Digital Signal Processor). It may also be realized on a general-purpose computer such as a PC (Personal Computer) equipped with a CPU, memory, an auxiliary storage device such as an HDD or DVD-ROM, a modem, and a NIC (Network Interface Card), with the CPU loading a predetermined program from the auxiliary storage device into memory and executing it.

FIG. 1 is a schematic configuration diagram of an echo canceller 1 according to the first embodiment of the present invention.
FIG. 2 is a diagram for explaining a first example of the residual echo estimation and suppression process.
FIG. 3 is a diagram for explaining a second example of the residual echo estimation and suppression process.
FIG. 4 is a schematic configuration diagram of an echo canceller 1A according to the second embodiment of the present invention.
FIG. 5 is a diagram for explaining a third example of the residual echo estimation and suppression process.

Explanation of symbols

1, 1A: echo canceller, 101: far-end audio signal input unit, 102: near-end audio signal input unit, 103, 104: FFT unit, 105: pseudo echo generation unit, 106: linear echo canceller unit, 107: spectral subtraction unit, 108, 108A: residual echo canceling unit, 109: IFFT unit, 110: near-end audio signal output unit, 1081: residual echo estimation unit, 1082: residual echo suppression unit, 1083: selection unit

Claims

An echo canceller that cancels, from an input audio signal that is an audio signal input to a microphone, an acoustic echo of an output audio signal that is an audio signal output from a speaker, the echo canceller comprising:
pseudo echo generating means for estimating the acoustic echo of the output audio signal for each frequency band and generating a pseudo echo signal;
echo canceling means for subtracting the pseudo echo signal from the input audio signal to cancel the acoustic echo of the output audio signal from the input audio signal; and
residual echo canceling means for canceling, for each frequency band of the input audio signal from which the acoustic echo has been canceled, a residual echo that is a remaining component of the acoustic echo from the signal component of the frequency band, based on signal components in a range adjacent to the frequency band in the time direction and the frequency direction.

The echo canceller according to claim 1,
wherein the residual echo canceling means includes:
residual echo estimation means for selecting, for each frequency band of the input audio signal from which the acoustic echo has been canceled, at least one signal component having a high sound pressure level from among signal components in a range adjacent to the signal component of the frequency band in the time direction and the frequency direction, and estimating whether the residual echo is included in the signal component of the frequency band based on a result of comparing the sound pressure level of the selected signal component with the sound pressure level of the signal component of the frequency band; and
residual echo suppression means for selecting, for the signal component of a frequency band estimated to include the residual echo, a signal component having a low sound pressure level from among the signal components in the range adjacent to the frequency band in the time direction and the frequency direction, and suppressing the signal component of the frequency band according to the difference between the sound pressure level of the selected signal component and the sound pressure level of the signal component of the frequency band.

The echo canceller according to claim 2,
wherein the residual echo estimation means and the residual echo suppression means
change the frequency range adjacent to the signal component of the frequency band based on a sound pressure level of the pseudo echo signal.

An echo cancellation method for canceling, from an input audio signal that is an audio signal input to a microphone, an acoustic echo of an output audio signal that is an audio signal output from a speaker, the method comprising:
estimating the acoustic echo of the output audio signal for each frequency band to generate a pseudo echo signal;
subtracting the pseudo echo signal from the input audio signal to cancel the acoustic echo of the output audio signal from the input audio signal; and
canceling, for each frequency band of the input audio signal from which the acoustic echo has been canceled, a residual echo that is a remaining component of the acoustic echo from the signal component of the frequency band, based on signal components in a range adjacent to the frequency band in the time direction and the frequency direction.