WO2012158164A1 - Using echo cancellation information to limit gain control adaptation - Google Patents

Using echo cancellation information to limit gain control adaptation

Info

Publication number
WO2012158164A1
Authority
WO
WIPO (PCT)
Prior art keywords
echo
signal
audio signal
end audio
echo canceller
Prior art date
Application number
PCT/US2011/036861
Other languages
French (fr)
Inventor
John Andrew MACDONALD
Original Assignee
Google Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Inc. filed Critical Google Inc.
Priority to PCT/US2011/036861 priority Critical patent/WO2012158164A1/en
Priority to EP11721216.7A priority patent/EP2710788A1/en
Publication of WO2012158164A1 publication Critical patent/WO2012158164A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • the present invention relates generally to a method and system for cancellation of echoes in telecommunication systems. It particularly relates to a method and system for using echo cancellation information to minimize gain control adaptation.
  • Speech quality is an important factor for telephony system suppliers. Customer demand makes it vital to strive for continuous improvements.
  • An echo, which is a delayed version of what was originally transmitted, is regarded as a severe distraction to the speaker if the delay is long. For short round trip delays of less than approximately 20 ms, the speaker will not be able to distinguish the echo from the side tone in the handset.
  • a remotely generated echo signal often has a substantial delay.
  • the speech and channel coding compulsory in digital radio communications systems and for telephony over the Internet protocol (IP telephony, for short) also result in significant delays which make the echoes generated a relatively short distance away clearly audible to the speaker. Hence, canceling the echo is a significant factor in maintaining speech quality.
  • An echo canceller typically includes a linear filtering part which essentially is an adaptive filter that tries to adapt to the echo path. In this way, a replica of the echo can be produced from the far-end signal and subtracted from the near-end signal, thereby canceling the echo.
  • the filter generating the echo replica may have a finite or infinite impulse response. Most commonly it is an adaptive, linear finite impulse response (FIR) filter with a number of delay lines and a corresponding number of coefficients, or filter delay taps. The coefficients are values, which when multiplied with delayed versions of the filter input signal, generate an estimate of the echo.
  • the filter is adapted, i.e. updated, so that the coefficients converge to optimum values.
  • a traditional way to cancel out the echo is to update a finite impulse response (FIR) filter using the normalized least mean square (NLMS) algorithm.
  • An automatic gain control (AGC) system attempts to bring an audio signal to an appropriate level prior to application of echo cancellation by an acoustic echo canceller (AEC).
  • an AGC could be presented with audio containing echo in addition to a target speech.
  • the AGC analyzes a near-end signal before it is processed by the AEC, it is exposed to echo which is very often at a different level than the target speech.
  • a method for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller includes receiving echo state information from the echo canceller and signal level information of the near-end audio signal received by the echo canceller; determining a gain adaptation for the near-end audio signal based on the signal level information; and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
  • the method includes computing a first coherence value by comparing correlations between a far-end signal and the near-end signal; computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and tracking the first and second coherence values to determine the echo state information.
  • the method includes performing echo cancellation based on echo cancellation information obtained from the echo canceller to generate an outgoing signal.
  • the method includes adding a comfort noise to the outgoing signal.
  • the signal level information of the near-end audio signal includes a moving average of the power of the near-end audio signal.
  • a system for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller includes an echo canceller that receives, as input, the near-end audio signal, the echo canceller comprising a non-linear processor, characterized in that the non-linear processor is configured to output echo state information of the echo canceller and an automatic gain control (AGC) analyzing unit operatively connected to the echo canceller, the AGC analyzing unit analyzing signal level information of the near-end audio signal received by the echo canceller.
  • the system also includes an AGC processing unit operatively connected to the echo canceller and the AGC analyzing unit, the AGC processing unit determining a gain adaptation for the near-end audio signal based on the signal level information and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
  • the non-linear processor computes a first coherence value by comparing correlations between a far-end signal and the near-end signal and a second coherence value by comparing correlations between the near- end signal and an error signal containing a residual echo output from a linear adaptive filter and tracks the first and second coherence values to determine the echo state information.
  • the echo canceller performs echo cancellation on the near-end signal based on echo cancellation information to generate an outgoing signal.
  • the system includes a comfort noise generator to generate a comfort noise to be added to the outgoing signal.
  • a computer-readable storage medium having stored thereon computer executable program for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller.
  • the computer program when executed causes a processor to execute the steps of: receiving echo state information from the echo canceller and signal level information of the near-end audio signal received by the echo canceller; determining a gain adaptation for the near-end audio signal based on the signal level information; and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
  • the computer program when executed causes the processor to further execute the steps of: computing a first coherence value by comparing correlations between a far-end signal and the near-end signal; computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and tracking the first and second coherence values to determine the echo state information.
  • the computer program when executed causes the processor to further execute the step of performing echo cancellation based on echo cancellation information obtained from the echo canceller to generate an outgoing signal.
  • the computer program when executed causes the processor to further execute the step of adding a comfort noise to the outgoing signal.
  • Fig. 1 is a block diagram of an acoustic echo canceller in accordance with an embodiment of the present invention.
  • Fig. 2 illustrates a more detailed block diagram describing the functions performed in the adaptive filter of Fig. 1 in accordance with an embodiment of the present invention.
  • FIG. 3 illustrates computational stages of the adaptive filter of Fig. 2 in accordance with an embodiment of the present invention.
  • FIG. 4 illustrates a more detailed block diagram describing block G_m in Fig. 3 in accordance with an embodiment of the present invention.
  • Fig. 5 illustrates a flow diagram describing computational stages of the nonlinear processor of Fig. 1 in accordance with an embodiment of the present invention.
  • Fig. 6 is a block diagram of an acoustic echo canceller and an automatic gain controller in accordance with an embodiment of the present invention.
  • Fig. 7 is a flow diagram illustrating operations performed by the acoustic echo canceller according to an embodiment of the present invention illustrated in Fig. 6.
  • Fig. 8 is a flow diagram illustrating interactions of the acoustic echo canceller and the automatic gain controller according to an embodiment of the present invention illustrated in Fig. 6.
  • FIG. 9 is a block diagram illustrating an exemplary computing device that is arranged for acoustic echo cancellation in accordance with an embodiment of the present invention.
  • Fig. 1 illustrates an acoustic echo canceller (AEC) 100 in accordance with an exemplary embodiment of the present invention.
  • the AEC 100 is designed as a high quality echo canceller for voice and audio communication over packet switched networks. More specifically, the AEC 100 is designed to cancel acoustic echo 130 that emerges due to the reflection of sound waves of a render device 10 from boundary surfaces and other objects back to a near-end capture device 20. The echo 130 may also exist due to the direct path from render device 10 to the capture device 20.
  • Render device 10 may be any of a variety of audio output devices, including a loudspeaker or group of loudspeakers configured to output sound from one or more channels.
  • Capture device 20 may be any of a variety of audio input devices, such as one or more microphones configured to capture sound and generate input signals.
  • render device 10 and capture device 20 may be hardware devices internal to a computer system, or external peripheral devices connected to a computer system via wired and/or wireless connections.
  • render device 10 and capture device 20 may be components of a single device, such as a microphone, telephone handset, etc.
  • one or both of render device 10 and capture device 20 may include analog-to-digital and/or digital-to-analog transformation functionalities.
  • the echo canceller 100 includes a linear filter 102, a nonlinear processor (NLP) 104, a far-end buffer 106, and a blocking buffer 108.
  • a far-end signal 110 generated at the far-end and transmitted to the near-end is input to the filter 102 via the far-end buffer (FEBuf) 106 and the blocking buffer 108.
  • the far-end signal 110 is also input to a play-out buffer 112 located near the render device 10.
  • the output signal 116 of the far-end buffer 106 is input to the blocking buffer 108 and the output signal 118 of the blocking buffer is input to the linear filter 102.
  • the far-end buffer 106 is configured to compensate for and synchronize to buffering at sound devices (not shown).
  • the blocking buffer 108 is configured to block the signal samples for a frequency-domain transformation to be performed by the linear filter 102 and the NLP 104.
  • the linear filter 102 is an adaptive filter.
  • Linear filter 102 operates in the frequency domain through, e.g., the Discrete Fourier Transform (DFT).
  • the DFT may be implemented as a Fast Fourier Transform (FFT).
  • the other input to the filter 102 is the near-end signal (Sin) 122 from the capture device 20 via a recording buffer 114.
  • the near-end signal 122 includes near-end speech 120 and the echo 130.
  • the NLP 104 receives three signals as input. It receives (1) the far-end signal via the far-end buffer 106 and blocking buffer 108, (2) the near-end signal via the recording buffer 114, and (3) the output signal 124 of the filter 102.
  • the output signal 124 is also referred to as an error signal. In a case when the NLP 104 attenuates the output signal 124, a comfort noise signal is generated which will be explained later.
  • each frame is divided into 64 sample blocks. Since this choice of block size does not produce an integer number of blocks per frame the signal needs to be buffered before the processing. This buffering is handled by the blocking buffer 108 as discussed above. Both the filter 102 and the NLP 104 operate in the frequency domain and utilize DFTs of 128 samples.
  • the performance of the AEC 100 is influenced by the operation of the play-out buffer 112 and the recording buffer 114 at the sound device.
  • the AEC 100 may not start unless the combined size of the play-out buffer 112 and the recording buffer 114 is reasonably stable within a predetermined limit. For example, if the combined size is stable within +/- 8 ms of the first started size, for four consecutive frames, the AEC 100 is started by filling up the internal far-end buffer 106.
  • FIG. 2 illustrates a more detailed block diagram describing the functions performed in the filter 102 of Fig. 1.
  • Fig. 3 illustrates computational stages of the filter 102 in accordance with an embodiment of the present invention.
  • the adaptive filter 102 includes a first transform section 200, an inverse transform section 202, a second transform section 204, and an impulse response section (H) 206.
  • the far-end signal x(n) 210 to be rendered at the render device 10 is input to the first transform section 200.
  • the output signal X(n, k) of the first transform section 200 is input to the impulse response section 206.
  • the output signal Y(n, k) is input to the inverse transform section 202 which outputs the signal y(n).
  • This signal y(n) is then subtracted from the near-end signal d(n) 220 captured by the capture device 20 to output an error signal e(n) 230 as the output of the linear stage of the filter 102.
  • the error signal 230 is also input to the second transform section 204, the output signal of which, E(n, k), is also input to the impulse response section 206.
  • the above-mentioned adaptive filtering approach relates to an implementation of a standard blocked time-domain Least Mean Square (LMS) algorithm.
  • the complexity reduction is due to the filtering and the correlations being performed in the frequency domain, where time-domain convolution is replaced by multiplication.
  • the error is formed in the time domain and is transformed to the frequency domain for updating the filter 102 as illustrated in Fig. 2.
  • FIG. 4 illustrates a more detailed block diagram describing block G_m in the FLMS method of Fig. 3 in accordance with an embodiment of the present invention.
  • I_N is an N × N identity matrix
  • 0_N is an N × N zero matrix. This means that the time domain vector is appended with N zeros before the Fourier transform.
  • the far-end samples, x(n) 310 are blocked into vectors of 2N samples, i.e. two blocks, at step S312,
  • x(k − m) = [x((k − m − 2)N) ... x((k − m)N − 1)]^T
  • the estimated echo signal is then obtained as the N last coefficients of the inverse transformed sum of the filter products performed at step S320 from which first block is discarded at step S322.
  • the estimated echo signal is represented as
  • N zeros are inserted at step S316 to the error vector, and the augmented vector is transformed at step S318 as
  • Fig.4 illustrates a more detailed block diagram describing block G m in Fig.3 in accordance with an embodiment of the present invention where the filter coefficient update can be expressed as
  • W_m(k + 1) = W_m(k) + µ_0 F [I_N 0_N ; 0_N 0_N] F^(-1) X*(k − m) B(k).
  • B(k) as shown in Fig.4, is a modified error vector.
  • the modification includes a power normalization followed by a magnitude limiter 410.
  • the normalized error vector as also shown in Fig.4, is
  • Q(k) = diag([1/p_0 1/p_1 ... 1/p_{2N−1}]) is a diagonal step size matrix controlling the adjustment of each frequency component using power estimates
  • the diagonal matrix X(k-m) is conjugated by the conjugate unit 420 which is then multiplied with vector B(k) prior to performing an inverse DFT transform by the Inverse Discrete Fourier Transform (IDFT) unit 430. Then the discard last block unit 440 discards the last block. After discarding the last block, a zero block is appended by the append zero block unit 450 prior to performing a DFT by the DFT unit 460. Then, a block delay is introduced by the delay unit 480 which outputs Wm(k).
  • Fig. 5 illustrates a flow diagram describing computational processes of the NLP 104 of Fig. 1 in accordance with an embodiment of the present invention.
  • the NLP 104 of the AEC 100 accepts three signals as input: i) the far-end signal x(n) 110 to be rendered by the render device 10, ii) the near-end signal d(n) 122 captured by the capture device 20, and iii) the output error signal e(n) 124 of the linear stage performed at the filter 102.
  • the error signal e(n) 124 typically contains residual echo that should be removed for good performance.
  • the objective of the NLP 104 is to remove this residual echo.
  • the first step is to transform all three input signals to the frequency domain.
  • the far-end signal 110 is transformed to the frequency domain.
  • the near-end signal 122 is transformed to the frequency domain and at step S501'', the error signal 124 is transformed to the frequency domain.
  • the NLP 104 is block-based and shares the block length N of the linear stage, but uses an overlap-add method rather than overlap-save: consecutive blocks are concatenated, windowed and transformed. By defining ∘ as the element-wise product operator, the kth transformed block is expressed as
  • F is the 2N DFT matrix as before
  • Xk is a length N time-domain sample column vector
  • the length 2N DFT vectors are retained.
  • the redundant N - 1 complex coefficients are discarded.
  • X_k, D_k and E_k refer to the frequency-domain representations of the kth far-end, near-end and error blocks, respectively.
  • echo suppression is achieved by multiplying each frequency band of the error signal e(n) 124 with a suppression factor between 0 and 1.
  • each band corresponds to an individual DFT coefficient. In general, however, each band may correspond to an arbitrary range of frequencies. Comfort noise is added and after undergoing an inverse FFT, the suppressed signal is windowed, and overlapped and added with the previous block to obtain the output.
  • the power spectral density (PSD) of each signal is obtained.
  • the PSD of the far-end signal x(n) 110 is computed.
  • the PSD of the near-end signal d(n) 122 is computed and at step S503'', the PSD of the error signal e(n) 124 is computed.
  • the PSDs of the far-end signal 110, near-end signal 122, and the error signal 124 are represented by S_x, S_d, and S_e, respectively.
  • the complex-valued cross-PSDs between i) the far-end signal x(n) 110 and near-end signal d(n) 122, and ii) the near-end signal d(n) 122 and error signal e(n) 124 are also obtained.
  • the complex-valued cross-PSD between the far-end signal 110 and the near-end signal 122 is computed and at step S504', the complex-valued cross-PSD between the near-end signal 122 and the error signal 124 is computed.
  • the complex-valued cross-PSD of the far-end signal 110 and near-end signal 122 is represented as S_xd.
  • the complex-valued cross-PSD of the near-end signal 122 and error signal 124 is represented as S_de.
  • the PSDs are exponentially smoothed to avoid sudden erroneous shifts in echo suppression.
  • the smoothed PSDs are computed recursively from the previous PSD estimate and the current block spectrum.
  • an old block is selected to best synchronize it with the corresponding echo in the near-end at step S505.
  • at times, the linear filter 102 diverges from a good echo path estimate. This tends to result in a highly distorted error signal, which, although still useful for analysis, should not be used for output. According to an embodiment of the invention, divergence may be detected fairly easily, as it usually adds rather than removes energy from the near-end signal d(n) 122.
  • the divergence state determined at step S511 is utilized to select (S512) either E_k or D_k for further processing.
  • the PSDs are used to compute the coherence measures for each frequency band between i) the far-end signal 110 and the near-end signal 122 at step S513, and ii) the near-end signal 122 and the error signal 124.
  • Coherence is a frequency-domain analog to time-domain correlation. It is a measure of similarity with 0 ≤ c(n) ≤ 1, where a higher coherence corresponds to more similarity.
  • under the assumption that the linear stage is working properly, c_de(n) ≈ 1 when no echo has been removed, allowing the error to pass through unchanged.
  • when echo has been removed, c_de(n) << 1, resulting in a suppression of the error, ideally removing any residual echo remaining after the linear filtering by the filter 102 at the linear stage.
  • the echo 130 is suppressed while allowing simultaneous near-end speech 120 to pass through.
  • the NLP 104 is configured to achieve this because the coherence is calculated independently for each frequency band. Thus, bands containing echo are fully or partially suppressed, while bands free of echo are not affected.
  • f_s is the sampling frequency.
  • the preferred bands were chosen from frequency regions most likely to be accurate across a range of scenarios.
  • at step S519, the system either selects c_de or c_xd.
  • c_xd is tracked over time to determine the broad state of the system at step S521. The purpose of this is to avoid suppression when the echo path is close to zero (e.g. during a call with a headset).
  • a thresholded minimum of c_xd is computed at step S519.
  • depending on this thresholded minimum, the system either may contain echo or does not contain echo.
  • the echo state is provided through an interface for potential use by other audio processing components.
  • suppression is limited by selecting suppression factors as follows at steps S520, S524 and S518:
  • the overdrive is set at step S531 such that applying it to the minimum will result in the target suppression level; the overdrive is smoothed and thresholded (with a smoothing factor of 0.9 in the slower direction) such that it will tend to move faster upwards than downwards.
  • s_t and the minimum overdrive are configurable to control the suppression aggressiveness; by default they are set to -11.5 and 2, respectively.
  • the s_h level is computed at step S533.
  • the final suppression factors s_v are produced according to the following algorithm.
  • s is first weighted towards s_h according to a weighting vector v_s of length N with components 0 ≤ v_s(n) ≤ 1:
  • v_t is another weighting vector fulfilling a similar purpose as v_s. Overdriving through raising to a power serves to accentuate valleys in s_v.
  • a minimum statistics method is utilized to generate the comfort noise. More specifically, at every block a modified minimum of the near-end PSD is computed for each band:
  • White noise may be produced by generating a random complex vector, u_k, on the unit circle. This is shaped to match the modified minimum of the near-end PSD and weighted by the suppression levels to give the following comfort noise:
  • N_k = u_k ∘ sqrt(N_{D,k}) ∘ sqrt(1 − s_v ∘ s_v), where N_{D,k} is the modified minimum of the near-end PSD.
  • Fig. 6 is a block diagram of the AEC 100 in conjunction with an automatic gain controller (AGC) 600 in accordance with an embodiment of the present invention.
  • the AGC controller 600 includes an AGC analysis unit 601 and an AGC processing unit 603.
  • the AEC 100 receives, as input, the far-end signal 110 and the near-end signal 122.
  • the AEC 100 determines the echo state of the NLP 104 included in the AEC 100 as shown in Fig. 1.
  • the sections above with reference to Fig. 5, as well as the sections below with reference to Fig. 7 describe the algorithms by which the echo state of the NLP 104 is determined.
  • the "no-echo" state is selected when the near-end signal 122 does not contain echo.
  • the "echo” state is entered when the near-end signal 122 might contain echo.
  • echo cancellation information is received by the AGC processing unit 603. This echo cancellation information is used to control the AGC processing unit 603.
  • the AGC processing unit 603 prevents upward adaptation to the near-end signal 122.
  • the AGC analysis unit 601 analyzes the level of the near-end signal 122 and extracts information about the signal level and outputs the level information to the AGC processing unit 603.
  • information about the signal level of the near-end signal 122 may include, but is not limited to, both a long-term and short-term moving average of the signal power.
  • the signal level information is then passed to the AGC processing unit 603.
  • the AGC processing unit 603 makes a decision about what to do with the information and, if available, how to adjust the analog level at the capture device 20.
  • the AGC processing unit 603 may also make digital changes to the near-end signal.
  • Fig. 7 shows a flow diagram illustrating operations performed by the acoustic echo canceller 100 according to the exemplary aspect of the present invention. More specifically, according to an embodiment of the invention, Fig. 7 further describes the algorithms on how echo state and suppression factors are determined in the NLP 104 of the AEC 100 as described above with respect to Figs. 5 and 6.
  • both the coherence c_xd between the far-end signal 110 and near-end signal 122 and the coherence c_de between the near-end signal 122 and error signal 124 are tracked over time to determine the state of the AEC 100. Based on the determination of a high or a low coherence, the NLP 104 decides whether to enter or leave the coherent state.
  • coherence is a frequency-domain analog to time-domain correlation. More specifically, as mentioned above with reference to Fig. 5, coherence is a measure of similarity with 0 ≤ c(n) ≤ 1, where a higher coherence corresponds to more similarity.
  • at step S713, if the NLP 104 determines that the AEC 100 is not in the coherent state, a corresponding suppression factor s is output by the NLP 104 at step S721.
  • the suppression factors may then be applied by the NLP 104 to the error signal 124 to substantially remove residual echo from the error signal 124.
  • Fig. 8 is a flow diagram illustrating interactions of the AEC 100 and the AGC 600 according to an embodiment of the present invention illustrated in Fig. 6.
  • echo state information from the AEC 100 and signal level information of the near-end signal 122 are received.
  • the AGC processing unit 603 determines a gain adaptation for the near-end signal 122 based on the signal level information.
  • the AGC processing unit 603 prevents upward gain adaptation to the received near-end signal when the echo state information indicates that the received near-end signal 122 contains an echo signal.
  • Fig. 9 is a block diagram illustrating an example computing device 900 that may be utilized to implement the AEC 100 including, but not limited to, the NLP 104, the filter 102, the far-end buffer 106, the blocking buffer 108, as well as the AGC analysis unit 601 and the AGC processing unit 603 in accordance with the present disclosure.
  • the computing device 900 may also be utilized to implement the processes illustrated in Figs. 3, 5, 7, and 8 in accordance with the present disclosure.
  • computing device 900 typically includes one or more processors 910 and system memory 920.
  • a memory bus 930 can be used for communicating between the processor 910 and the system memory 920.
  • processor 910 can be of any type including but not limited to a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof.
  • Processor 910 can include one or more levels of caching, such as a level one cache 911 and a level two cache 912, a processor core 913, and registers 914.
  • the processor core 913 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
  • a memory controller 915 can also be used with the processor 910, or in some implementations the memory controller 915 can be an internal part of the processor 910.
  • system memory 920 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
  • System memory 920 typically includes an operating system 921, one or more applications 922, and program data 924.
  • Application 922 includes an echo cancellation processing algorithm 923 that is arranged to limit gain control adaptation.
  • Program Data 924 includes echo cancellation routing data 925 that is useful for limiting gain control adaptation, as will be further described below.
  • application 922 can be arranged to operate with program data 924 on an operating system 921 such that gain control adaptation is limited. This described basic configuration is illustrated in FIG. 9 by those components within dashed line 901.
  • Computing device 900 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 901 and any required devices and interfaces.
  • a bus/interface controller 940 can be used to facilitate communications between the basic configuration 901 and one or more data storage devices 950 via a storage interface bus 941.
  • the data storage devices 950 can be removable storage devices 951, non-removable storage devices 952, or a combination thereof.
  • Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few.
  • Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • System memory 920, removable storage 951 and non-removable storage 952 are all examples of computer storage media.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 900. Any such computer storage media can be part of device 900.
  • Computing device 900 can also include an interface bus 942 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 901 via the bus/interface controller 940.
  • Example output devices 960 include a graphics processing unit 961 and an audio processing unit 962, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 963.
  • Example peripheral interfaces 970 include a serial interface controller 971 or a parallel interface controller 972, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 973.
  • An example communication device 990 includes a network controller 991, which can be arranged to facilitate communications with one or more other computing devices 990 over a network communication via one or more communication ports 992.
  • the communication connection is one example of a communication medium.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • a “modulated data signal” can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media.
  • computer readable media can include both storage media and communication media.
  • Computing device 900 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
  • Computing device 900 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
  • if speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
  • Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
  • a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.

Abstract

A method and system for limiting gain control adaptation to a near-end audio signal using echo cancellation information is disclosed. The system includes an echo canceller (100) and an automatic gain controller (AGC) (600). The echo canceller (100) includes a non-linear processor (104) which determines echo state information of the echo canceller. The AGC (600) includes an AGC analyzing unit (601) and an AGC processing unit (603). The AGC analyzing unit (601) analyzes signal level information of the near-end audio signal (122) received by the echo canceller (100). The AGC processing unit (603) determines a gain adaptation for the near-end audio signal (122) based on the signal level information and prevents upward gain adaptation to the received near-end audio signal (122) when said echo state information indicates that the received near-end audio signal (122) contains an echo signal.

Description

USING ECHO CANCELLATION INFORMATION TO LIMIT GAIN CONTROL
ADAPTATION
Technical Field of the Invention
[0001] The present invention relates generally to a method and system for cancellation of echoes in telecommunication systems. It particularly relates to a method and system for using echo cancellation information to minimize gain control adaptation.
Background of the Invention
[0002] Speech quality is an important factor for telephony system suppliers. Customer demand makes it vital to strive for continuous improvements. An echo, which is a delayed version of what was originally transmitted, is regarded as a severe distraction to the speaker if the delay is long. For short round trip delays of less than approximately 20 ms, the speaker will not be able to distinguish the echo from the side tone in the handset. However, for long-distance communications, such as satellite communications, a remotely generated echo signal often has a substantial delay. Moreover, the speech and channel coding compulsory in digital radio communications systems and for telephony over the Internet protocol (IP telephony, for short) also result in significant delays which make the echoes generated a relatively short distance away clearly audible to the speaker. Hence, canceling the echo is a significant factor in maintaining speech quality.
[0003] An echo canceller typically includes a linear filtering part which essentially is an adaptive filter that tries to adapt to the echo path. In this way, a replica of the echo can be produced from the far-end signal and subtracted from the near-end signal, thereby canceling the echo.
[0004] The filter generating the echo replica may have a finite or infinite impulse response. Most commonly it is an adaptive, linear finite impulse response (FIR) filter with a number of delay lines and a corresponding number of coefficients, or filter delay taps. The coefficients are values, which when multiplied with delayed versions of the filter input signal, generate an estimate of the echo. The filter is adapted, i.e. updated, so that the coefficients converge to optimum values. A traditional way to cancel out the echo is to update a finite impulse response (FIR) filter using the normalized least mean square (NLMS) algorithm.
[0005] An automatic gain control (AGC) system attempts to bring an audio signal to an appropriate level prior to application of echo cancellation by an acoustic echo canceller (AEC). In a real-time communications context, an AGC could be presented with audio containing echo in addition to a target speech. As the AGC analyzes a near-end signal before it is processed by the AEC, it is exposed to echo which is very often at a different level than the target speech.
[0006] A problem arises when the AGC erroneously adapts upwards to a low echo signal. This causes the level of the target speech to be inappropriately high.
Summary of the Invention
[0007] This Summary introduces a selection of concepts in a simplified form in order to provide a basic understanding of some aspects of the present disclosure. This Summary is not an extensive overview of the disclosure, and is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. This Summary merely presents some of the concepts of the disclosure as a prelude to the Detailed Description provided below.
[0008] According to an aspect of the present invention, a method for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller is disclosed. The method includes receiving echo state information from the echo canceller and signal level information of the near-end audio signal received by the echo canceller; determining a gain adaptation for the near-end audio signal based on the signal level information; and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
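By way of illustration only, a minimal Python sketch of the decision just described is shown below; the function name, the dB-domain level representation, and the target and step values are assumptions and are not taken from the disclosure.

    import numpy as np

    ECHO, NO_ECHO = "echo", "no-echo"   # echo state reported by the echo canceller

    def adapt_gain(current_gain_db, level_db, echo_state,
                   target_level_db=-20.0, max_step_db=1.0):
        # Determine the gain adaptation from the measured near-end signal level.
        step_db = float(np.clip(target_level_db - level_db, -max_step_db, max_step_db))
        # Prevent upward adaptation while the near-end signal contains echo.
        if echo_state == ECHO and step_db > 0.0:
            step_db = 0.0
        return current_gain_db + step_db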
[0009] According to a further aspect of the present invention, the method includes computing a first coherence value by comparing correlations between a far-end signal and the near-end signal; computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and tracking the first and second coherence values to determine the echo state information.
[0010] According to yet another aspect of the present invention, the method includes performing echo cancellation based on echo cancellation information obtained from the echo canceller to generate an outgoing signal.
[0011] According to a further aspect of the present invention, the method includes adding a comfort noise to the outgoing signal.
[0012] According to an aspect of the present invention, the signal level information of the near-end audio signal includes a moving average of the power of the near-end audio signal.
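A hedged sketch of how such level information might be maintained follows; the smoothing constants and the dictionary-based state are illustrative assumptions.

    import numpy as np

    def update_level(frame, state, alpha_short=0.3, alpha_long=0.01):
        # Mean power of the current near-end frame.
        power = float(np.mean(np.asarray(frame, dtype=np.float64) ** 2))
        # Short-term and long-term exponential moving averages of the power.
        state["short"] = (1 - alpha_short) * state.get("short", power) + alpha_short * power
        state["long"] = (1 - alpha_long) * state.get("long", power) + alpha_long * power
        return state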
[0013] According to another aspect of the present invention, a system for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller is disclosed. The system includes an echo canceller that receives, as input, the near-end audio signal, the echo canceller comprising a non-linear processor, characterized in that the non-linear processor is configured to output echo state information of the echo canceller and an automatic gain control (AGC) analyzing unit operatively connected to the echo canceller, the AGC analyzing unit analyzing signal level information of the near-end audio signal received by the echo canceller. The system also includes an AGC processing unit operatively connected to the echo canceller and the AGC analyzing unit, the AGC processing unit determining a gain adaptation for the near-end audio signal based on the signal level information and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
[0014] According to a further aspect of the present invention, the non-linear processor computes a first coherence value by comparing correlations between a far-end signal and the near-end signal and a second coherence value by comparing correlations between the near- end signal and an error signal containing a residual echo output from a linear adaptive filter and tracks the first and second coherence values to determine the echo state information.
[0015] According to yet another aspect of the present invention, the echo canceller performs echo cancellation on the near-end signal based on echo cancellation information to generate an outgoing signal. [0016] According to another aspect of the present invention, the system includes a comfort noise generator to generate a comfort noise to be added to the outgoing signal.
[0017] In accordance with a further aspect of the present invention, a computer-readable storage medium having stored thereon computer executable program for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller is disclosed. The computer program when executed causes a processor to execute the steps of: receiving echo state information from the echo canceller and signal level information of the near-end audio signal received by the echo canceller; determining a gain adaptation for the near-end audio signal based on the signal level information; and preventing upward gain adaptation to the received near-end audio signal when the echo state information indicates that the received near-end audio signal contains an echo signal.
[0018] According to an aspect of the present invention, the computer program when executed causes the processor to further execute the steps of: computing a first coherence value by comparing correlations between a far-end signal and the near-end signal; computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and tracking the first and second coherence values to determine the echo state information.
[0019] According to yet another aspect of the present invention, the computer program when executed causes the processor to further execute the step of performing echo cancellation based on echo cancellation information obtained from the echo canceller to generate an outgoing signal.
[0020] According to another aspect of the present invention, the computer program when executed causes the processor to further execute the step of adding a comfort noise to the outgoing signal.
Brief Description of the Drawings
[0021] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the invention and together with the description, serve to explain the principles of the invention. [0022] Fig. 1 is a block diagram of an acoustic echo canceller in accordance with an embodiment of the present invention.
[0023] Fig. 2 illustrates a more detailed block diagram describing the functions performed in the adaptive filter of Fig. 1 in accordance with an embodiment of the present invention.
[0024] Fig. 3 illustrates computational stages of the adaptive filter of Fig. 2 in accordance with an embodiment of the present invention.
[0025] Fig. 4 illustrates a more detailed block diagram describing block Gm in Fig. 3 in accordance with an embodiment of the present invention.
[0026] Fig. 5 illustrates a flow diagram describing computational stages of the nonlinear processor of Fig. 1 in accordance with an embodiment of the present invention.
[0027] Fig. 6 is a block diagram of an acoustic echo canceller and an automatic gain controller in accordance with an embodiment of the present invention.
[0028] Fig. 7 is a flow diagram illustrating operations performed by the acoustic echo canceller according to an embodiment of the present invention illustrated in Fig. 6.
[0029] Fig. 8 is a flow diagram illustrating interactions of the acoustic echo canceller and the automatic gain controller according to an embodiment of the present invention illustrated in Fig. 6.
[0030] Fig. 9 is a block diagram illustrating an exemplary computing device that is arranged for acoustic echo cancellation in accordance with an embodiment of the present invention.
Detailed Description
[0031] The following detailed description of the embodiments of the invention refers to the accompanying drawings. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.
[0032] Fig. 1 illustrates an acoustic echo canceller (AEC) 100 in accordance with an exemplary embodiment of the present invention. [0033] The AEC 100 is designed as a high quality echo canceller for voice and audio communication over packet switched networks. More specifically, the AEC 100 is designed to cancel acoustic echo 130 that emerges due to the reflection of sound waves of a render device 10 from boundary surfaces and other objects back to a near-end capture device 20. The echo 130 may also exist due to the direct path from render device 10 to the capture device 20.
[0034] Render device 10 may be any of a variety of audio output devices, including a loudspeaker or group of loudspeakers configured to output sound from one or more channels. Capture device 20 may be any of a variety of audio input devices, such as one or more microphones configured to capture sound and generate input signals. For example, render device 10 and capture device 20 may be hardware devices internal to a computer system, or external peripheral devices connected to a computer system via wired and/or wireless connections. In some arrangements, render device 10 and capture device 20 may be components of a single device, such as a microphone, telephone handset, etc. Additionally, one or both of render device 10 and capture device 20 may include analog-to-digital and/or digital-to-analog transformation functionalities.
[0035] With reference to Fig. 1, the echo canceller 100 includes a linear filter 102, a nonlinear processor (NLP) 104, a far-end buffer 106, and a blocking buffer 108. A far-end signal 110 generated at the far-end and transmitted to the near-end is input to the filter 102 via the far-end buffer (FEBuf) 106 and the blocking buffer 108. The far-end signal 110 is also input to a play-out buffer 112 located near the render device 10. The output signal 116 of the far-end buffer 106 is input to the blocking buffer 108 and the output signal 118 of the blocking buffer is input to the linear filter 102.
[0036] The far-end buffer 106 is configured to compensate for and synchronize to buffering at sound devices (not shown). The blocking buffer 108 is configured to block the signal samples for a frequency-domain transformation to be performed by the linear filter 102 and the NLP 104.
[0037] The linear filter 102 is an adaptive filter. Linear filter 102 operates in the frequency domain through, e.g., the Discrete Fourier Transform (DFT). The DFT may be implemented as a Fast Fourier Transform (FFT).
[0038] The other input to the filter 102 is the near-end signal (Sin) 122 from the capture device 20 via a recording buffer 114. The near-end signal 122 includes near-end speech 120 and the echo 130. The NLP 104 receives three signals as input. It receives (1) the far-end signal via the far-end buffer 106 and blocking buffer 108, (2) the near-end signal via the recording buffer 114, and (3) the output signal 124 of the filter 102. The output signal 124 is also referred to as an error signal. In a case when the NLP 104 attenuates the output signal 124, a comfort noise signal is generated which will be explained later.
[0039] According to an exemplary embodiment, each frame is divided into 64-sample blocks. Since this choice of block size does not produce an integer number of blocks per frame, the signal needs to be buffered before the processing. This buffering is handled by the blocking buffer 108 as discussed above. Both the filter 102 and the NLP 104 operate in the frequency domain and utilize DFTs of 128 samples.
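A minimal sketch of such a blocking buffer is shown below; only the 64-sample block size and the need to carry a remainder between frames are taken from the description, while the class and method names are assumptions.

    import numpy as np

    class BlockingBuffer:
        """Accumulates incoming frames and emits fixed 64-sample blocks."""
        def __init__(self, block_size=64):
            self.block_size = block_size
            self.residual = np.zeros(0)

        def push_frame(self, frame):
            # Append the new frame to whatever was left over from earlier frames.
            self.residual = np.concatenate([self.residual, np.asarray(frame, dtype=float)])
            blocks = []
            # Emit as many complete blocks as are available; keep the remainder.
            while self.residual.size >= self.block_size:
                blocks.append(self.residual[:self.block_size].copy())
                self.residual = self.residual[self.block_size:]
            return blocks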
[0040] The performance of the AEC 100 is influenced by the operation of the play-out buffer 112 and the recording buffer 114 at the sound device. The AEC 100 may not start unless the combined size of the play-out buffer 112 and the recording buffer 114 is reasonably stable within a predetermined limit. For example, if the combined size is stable within +/- 8 ms of the first started size, for four consecutive frames, the AEC 100 is started by filling up the internal far-end buffer 106.
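The start-up check could be sketched as follows, assuming the combined buffer size is reported once per frame in milliseconds; the function and parameter names are illustrative.

    def aec_may_start(combined_sizes_ms, tolerance_ms=8.0, required_frames=4):
        # Not enough observations yet to judge stability.
        if len(combined_sizes_ms) < required_frames:
            return False
        first = combined_sizes_ms[0]
        # The most recent sizes must stay within the tolerance of the first
        # observed size for the required number of consecutive frames.
        return all(abs(size - first) <= tolerance_ms
                   for size in combined_sizes_ms[-required_frames:])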
[0041] Fig. 2 illustrates a more detailed block diagram describing the functions performed in the filter 102 of Fig. 1. Fig. 3 illustrates computational stages of the filter 102 in accordance with an embodiment of the present invention.
[0042] With reference to Fig. 2, the adaptive filter 102 includes a first transform section 200, an inverse transform section 202, a second transform section 204, and an impulse response section (H) 206. The far-end signal x(n) 210 to be rendered at the render device 10 is input to the first transform section 200. The output signal X(n, k) of the first transform section 200 is input to the impulse response section 206. The output signal Y(n, k) is input to the inverse transform section 202 which outputs the signal y(n). This signal y(n) is then subtracted from the near-end signal d(n) 220 captured by the capture device 20 to output an error signal e(n) 230 as the output of the linear stage of the filter 102. The error signal 230 is also input to the second transform section 204, the output signal of which, E(n, k), is also input to the impulse response section 206. [0043] The above-mentioned adaptive filtering approach relates to an implementation of a standard blocked time-domain Least Mean Square (LMS) algorithm. According to an embodiment of the invention, the complexity reduction is due to the filtering and the correlations being performed in the frequency domain, where time-domain convolution is replaced by multiplication. The error is formed in the time domain and is transformed to the frequency domain for updating the filter 102 as illustrated in Fig. 2.
[0044] There is a signal delay in the system due to the transform blocking. To reduce delay, the filter 102 is partitioned into smaller segments, and by overlap-save processing the overall delay is kept to the segment length. This method is referred to as the partitioned block frequency domain method or multi-delay partitioned block frequency adaptive filter. For simplicity it is referred to as FLMS.
[0045] The operation of the FLMS method is illustrated in Fig. 3. Fig. 4 illustrates a more detailed block diagram describing block G_m in the FLMS method of Fig. 3 in accordance with an embodiment of the present invention.
[0046] With a total filter length L = M · N partitioned into blocks of N samples and with F the 2N × 2N Discrete Fourier Transform (DFT) matrix, the time domain impulse response of the filter 102, w(n), n = 0, 1, ..., L − 1, can be expressed in the frequency domain as a collection of partitioned filters

W_m(k) = F [I_N ; 0_N] w_m(k),    (1)

where w_m(k) = [w(mN) ... w((m+1)N − 1)]^T, I_N is an N × N identity matrix, and 0_N is an N × N zero matrix. This means that the time domain vector is appended with N zeros before the Fourier transform.
[0047] The time domain filter coefficients, w(n), are not utilized in the algorithm and equation (1) is presented to establish the relation between the time- and frequency-domain coefficients.
[0048] As illustrated in Fig. 3, the far-end samples, x(n) 310, are blocked into vectors of 2N samples, i.e. two blocks, at step S312,
x(k − m) = [x((k − m − 2)N) ... x((k − m)N − 1)]^T
and transformed into a sequence of DFT vectors at step S314,
X(k - m)= diag(Fx(k - m)).
[0049] This is implemented as a table of delayed DFT vectors, since the diagonal matrix also can be expressed as X(k - m)= DmX(k), where D is a delay operator. For each delayed block altering is performed as the multiplication of the diagonal matrix X(k - m) with a filter partition
Y_m(k) = X(k − m) W_m(k),   m = 0, 1, ..., M − 1
[0050] The estimated echo signal is then obtained as the N last coefficients of the inverse transformed sum of the filter products performed at step S320, from which the first block is discarded at step S322. The estimated echo signal is represented as

y(k) = [y((k − 1)N) ... y(kN − 1)]^T = [0_N I_N] F^(-1) Σ_{m=0}^{M−1} Y_m(k)
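A compact Python sketch of this partitioned frequency-domain filtering (the echo estimate only) is given below; the array layout and function name are assumptions, while the 2N transform length and the retention of the last N samples follow the equations above.

    import numpy as np

    def estimate_echo(x_blocks, W, N):
        # x_blocks[m] holds the 2N far-end samples delayed by m blocks,
        # W[m] is the m-th frequency-domain filter partition (length 2N).
        Y = np.zeros(2 * N, dtype=complex)
        for m in range(len(W)):
            X = np.fft.fft(x_blocks[m], 2 * N)  # diagonal of X(k - m)
            Y += X * W[m]                       # Y_m(k) = X(k - m) W_m(k), summed over m
        y = np.real(np.fft.ifft(Y))
        return y[N:]                            # keep only the last N samples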
[0051] The error is then formed in the time domain as
e(k) = d(k) - y(k)
and this is also the output of the filter 102 of the AEC 100 as shown in Fig. 1. To adjust the filter coefficients, N zeros are inserted at step S316 into the error vector, and the augmented vector is transformed at step S318 as

E(k) = F [0_{1×N} e(k)^T]^T
[0052] Fig.4 illustrates a more detailed block diagram describing block Gm in Fig.3 in accordance with an embodiment of the present invention where the filter coefficient update can be expressed as
Ix ON
Wm{k + 1) =Wm(k} F~^0X"(k-m^ (k).
Οχ 0Λ' with a stepsize μο = 0.5 and where B(k), as shown in Fig.4, is a modified error vector. The modification includes a power normalization followed by a magnitude limiter 410. The normalized error vector, as also shown in Fig.4, is
A(k) = Q(k) E(k),
where
Q(k) = diag([1/P_0(k) 1/P_1(k) ... 1/P_{2N-1}(k)])
is a diagonal step-size matrix controlling the adjustment of each frequency component using power estimates
P_j(k) = λ_p P_j(k - 1) + (1 - λ_p) |X_j(k)|^2,    j = 0, 1, ..., 2N - 1,
recursively calculated with a forgetting factor λ_p = 0.9 and individual DFT coefficients X_j(k) = [X(k)]_{jj}. A(k) is input to the magnitude limiter 410. The component magnitudes are then limited to a constant maximum magnitude, A_0 = 1.5 × 10^-6, into the vector B(k) with components
B_j(k) = A_0 · A_j(k) / |A_j(k)|    if |A_j(k)| > A_0
       = A_j(k)                     otherwise.
[0053] As illustrated in Fig. 4, the diagonal matrix X(k - m) is conjugated by the conjugate unit 420 and then multiplied with the vector B(k) prior to an inverse DFT performed by the Inverse Discrete Fourier Transform (IDFT) unit 430. Then the discard last block unit 440 discards the last block. After discarding the last block, a zero block is appended by the append zero block unit 450 prior to performing a DFT by the DFT unit 460. Then, a block delay is introduced by the delay unit 480 which outputs Wm(k).
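A corresponding sketch of the coefficient update of Fig. 4, including the power normalization, the magnitude limiter 410 and the gradient constraint, might look as follows; the small guard constants (1e-10, 1e-20) and the in-place update style are implementation assumptions, not taken from the document:

import numpy as np

def flms_update(W_freq, X_hist, E, x_pow, mu0=0.5, lam_p=0.9, A0=1.5e-6):
    # W_freq : (M, 2N) filter partitions, X_hist : (M, 2N) delayed far-end DFTs
    # E      : (2N,) transformed error, x_pow : (2N,) running per-bin power estimate
    M, two_N = W_freq.shape
    N = two_N // 2

    # Power normalization (the diagonal step-size matrix Q(k)).
    x_pow = lam_p * x_pow + (1.0 - lam_p) * np.abs(X_hist[0]) ** 2
    A = E / np.maximum(x_pow, 1e-10)

    # Magnitude limiter 410: clamp each component magnitude to A0, keeping the phase.
    mag = np.abs(A)
    B = np.where(mag > A0, A0 * A / np.maximum(mag, 1e-20), A)

    for m in range(M):
        # Correlate with the conjugated far-end block (conjugate unit 420).
        grad = np.conj(X_hist[m]) * B
        # Gradient constraint (units 430-460): inverse DFT, discard the last block,
        # append a zero block, and transform back.
        g = np.real(np.fft.ifft(grad))
        g[N:] = 0.0
        W_freq[m] = W_freq[m] + mu0 * np.fft.fft(g)
    return W_freq, x_pow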
[0054] Fig. 5 illustrates a flow diagram describing computational processes of the NLP 104 of Fig. 1 in accordance with an embodiment of the present invention.
[0055] The NLP 104 of the AEC 100 accepts three signals as input: i) the far-end signal x(n) 110 to be rendered by the render device 10, ii) the near-end signal d(n) 122 captured by the capture device 20, and iii) the output error signal e(n) 124 of the linear stage performed at the filter 102. The error signal e(n) 124 typically contains residual echo that should be removed for good performance. The objective of the NLP 104 is to remove this residual echo.
[0056] The first step is to transform all three input signals to the frequency domain. At step S501, the far-end signal 110 is transformed to the frequency domain. At step S501', the near-end signal 122 is transformed to the frequency domain and at step S501'', the error signal 124 is transformed to the frequency domain. The NLP 104 is block-based and shares the block length N of the linear stage, but uses an overlap-add method rather than overlap-save: consecutive blocks are concatenated, windowed and transformed. By defining ∘ as the element-wise product operator, the kth transformed block is expressed as
X_k = F (w_{2N} ∘ [x_{k-1}^T x_k^T]^T),
where F is the 2N DFT matrix as before, x_k is a length-N time-domain sample column vector and w_{2N} is a length-2N square-root Hanning window column vector with entries
w(n) = sqrt((1 - cos(πn/N)) / 2),    n = 0, 1, ..., 2N - 1.
[0057] The window is chosen such that the overlapping segments satisfy w²(n) + w²(n - N) = 1, n = N, N + 1, ..., 2N - 1, to provide perfect reconstruction. According to an embodiment of the invention, the length 2N DFT vectors are retained. Preferably, however, the redundant N - 1 complex coefficients are discarded.
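Assuming the square-root Hanning window described above (the exact sine/cosine form is the reconstruction given here, and the block length is illustrative), the following sketch builds the window, checks the perfect-reconstruction property and forms one windowed analysis block:

import numpy as np

N = 64
n = np.arange(2 * N)
w2N = np.sqrt((1.0 - np.cos(np.pi * n / N)) / 2.0)   # square-root Hanning window

# Perfect-reconstruction property of the two overlapping halves.
assert np.allclose(w2N[:N] ** 2 + w2N[N:] ** 2, 1.0)

# Analysis of the k-th block: concatenate the previous and current N-sample blocks,
# window them, and take the 2N-point DFT.
rng = np.random.default_rng(1)
x_prev, x_cur = rng.standard_normal(N), rng.standard_normal(N)
X_k = np.fft.fft(w2N * np.concatenate([x_prev, x_cur]))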
[0058] X_k, D_k and E_k refer to the frequency-domain representations of the kth far-end, near-end and error blocks, respectively.
[0059] According to a further embodiment of the invention, echo suppression is achieved by multiplying each frequency band of the error signal e(n) 124 with a suppression factor between 0 and 1. According to a preferred embodiment, each band corresponds to an individual DFT coefficient. In general, however, each band may correspond to an arbitrary range of frequencies. Comfort noise is added and after undergoing an inverse FFT, the suppressed signal is windowed, and overlapped and added with the previous block to obtain the output.
[0060] For analysis, the power spectral density (PSD) of each signal is obtained. At step S503, the PSD of the far-end signal x(n) 110 is computed. At step S503', the PSD of the near- end signal d(n) 122 is computed and at step S503", the PSD of the error signal e(n) 124 is computed. The PSDs of the far-end signal 110, near-end signal 122, and the error signal 124 are represented by Sx, Sd, and Se, respectively.
[0061] In addition, the complex-valued cross-PSDs between i) the far-end signal x(n) 110 and near-end signal d(n) 122, and ii) the near-end signal d(n) 122 and error signal e(n) 124 are also obtained. At step S504, the complex-valued cross-PSD between the far-end signal 110 and the near-end signal 122 is computed, and at step S504', the complex-valued cross-PSD between the near-end signal 122 and the error signal 124 is computed. The complex-valued cross-PSD of the far-end signal 110 and near-end signal 122 is represented as S_{X_k D_k}. The complex-valued cross-PSD of the near-end signal 122 and error signal 124 is represented as S_{D_k E_k}. The PSDs are exponentially smoothed to avoid sudden erroneous shifts in echo suppression. The smoothed PSDs are given by
S_{X_k Y_k} = γ S_{X_{k-1} Y_{k-1}} + (1 - γ) X_k ∘ Y_k^*,
where the "*" here represents the complex conjugate, X and Y stand for any pair of the three transformed signals, and γ is the exponential smoothing factor.
[0062] Note that X = Y for the "auto" PSDs, which are therefore real-valued, while the cross-PSDs are complex-valued.
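A minimal sketch of the exponential PSD smoothing, under the assumption that all five auto- and cross-PSDs are updated with the same smoothing factor (the value 0.9 below is illustrative; the document does not fix it here):

import numpy as np

def update_psds(psd, X, D, E, gamma=0.9):
    # psd     : dict with keys 'xx', 'dd', 'ee', 'xd', 'de' holding the running estimates
    # X, D, E : DFTs of the current far-end, near-end and error blocks
    pairs = {'xx': (X, X), 'dd': (D, D), 'ee': (E, E), 'xd': (X, D), 'de': (D, E)}
    for key, (A, B) in pairs.items():
        psd[key] = gamma * psd[key] + (1.0 - gamma) * A * np.conj(B)
    return psd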
[0063] Rather than using the current input far-end block, an old block is selected to best synchronize it with the corresponding echo in the near-end at step S505. The index of the partition, m, with maximum energy in the linear filter is chosen as follows:
d = arg max_m ||W_m||².
[0064] This estimated delay index is used to select the best block at step S507 for use in the far-end PSDs. Additionally, the far-end auto-PSD is thresholded at step S509 in order to avoid numerical instability as follows:
S'_{X_k X_k} = max(S_{X_k X_k}, S_0),    S_0 = 15.
[0065] It is sometimes the case that the linear filter 102 diverges from a good echo path estimate. This tends to result in a highly distorted error signal, which although still useful for analysis, should not be used for output. According to an embodiment of the invention, divergence may be detected fairly easily, as it usually adds rather than removes energy from the near-end signal d(n) 122. The divergence state determined at step S511 is utilized to either select (S512) E_k or D_k as follows. If
||S_{E_k E_k}||_1 > ||S_{D_k D_k}||_1,
then the "diverge" state is entered, in which the effect of the linear stage is reversed by setting E_k = D_k. The diverge state is left if
σ_0 ||S_{E_k E_k}||_1 < ||S_{D_k D_k}||_1,    σ_0 = 1.05.
Furthermore, if divergence is very high, such as
||S_{E_k E_k}||_1 > σ_1 ||S_{D_k D_k}||_1,    σ_1 = 19.95,
the linear filter 102 resets to its initial state
W_m(k) = 0,    m = 0, 1, ..., M - 1.
The PSDs are used to compute the coherence measures for each frequency band between i) the far-end signal 110 and near-end signal 122 at step S513 as follows:
c_xd = (S_{X_k D_k} ∘ S_{X_k D_k}^*) / (S'_{X_k X_k} ∘ S_{D_k D_k}),
and ii) the near-end signal 122 and error signal 124 at step S515 as follows:
c_de = (S_{D_k E_k} ∘ S_{D_k E_k}^*) / (S_{D_k D_k} ∘ S_{E_k E_k}),
where the "*" here again represents the complex conjugate and the division is performed element-wise.
[0066] Denote the entry in position n of a coherence vector c as c(n). Coherence is a frequency-domain analog to time-domain correlation. It is a measure of similarity with 0 ≤ c(n) ≤ 1, where a higher coherence corresponds to more similarity.
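Combining the thresholded far-end auto-PSD of step S509 with the smoothed PSDs, the two coherence measures can be sketched as below; the small epsilon guard in the denominators is an added assumption to keep the division well defined:

import numpy as np

def coherences(psd, s0=15.0):
    # Thresholded far-end auto-PSD guards against numerical instability (step S509).
    sxx = np.maximum(np.real(psd['xx']), s0)
    sdd = np.real(psd['dd'])
    see = np.real(psd['ee'])
    eps = 1e-10     # small guard term, an implementation assumption
    c_xd = np.abs(psd['xd']) ** 2 / (sxx * sdd + eps)
    c_de = np.abs(psd['de']) ** 2 / (sdd * see + eps)
    return c_xd, c_de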
[0067] The primary effect of the NLP 104 is achieved through directly suppressing the error signal 124 with the coherence measures. Generally speaking, the output is given by
Y_k = E_k ∘ c_de.
Under the assumption that the linear stage is working properly, c_de(n) ≈ 1 when no echo has been removed, allowing the error to pass through unchanged. In the opposite case of the linear stage having removed echo, 1 ≫ c_de(n) ≥ 0, resulting in a suppression of the error, ideally removing any residual echo remaining after the linear filtering by the filter 102 at the linear stage.
[0068] According to an embodiment of the invention, c_xd is considered to increase robustness, as described below, though c_de tends to be more useful in practice. Contrary to c_de, c_xd is relatively high when there is echo 130, and low otherwise. To have the two measures in the same "domain", a modified coherence is defined as follows: c'_xd = 1 - c_xd.
[0069] To achieve high AEC performance, it is preferred that the echo 130 be suppressed while simultaneous near-end speech 120 is allowed to pass through. The NLP 104 is configured to achieve this because the coherence is calculated independently for each frequency band. Thus, bands containing echo are fully or partially suppressed, while bands free of echo are not affected.
[0070] According to an embodiment of the invention, several data analysis methods are used to tweak the coherence before it is applied as a suppression factor, s. First, the average coherence across a set of preferred bands is computed at step S517 for c_de and at step S517' for c'_xd as
c̄ = (1 / (n_t - n_0 + 1)) Σ_{n=n_0}^{n_t} c(n),
where the preferred band limits n_0 and n_t depend on the sampling frequency f_s. The preferred bands were chosen from frequency regions most likely to be accurate across a range of scenarios.
[0071] At step S518, the system either selects c_de or c'_xd. According to an exemplary embodiment, c̄'_xd is tracked over time to determine the broad state of the system at step S521. The purpose of this is to avoid suppression when the echo path is close to zero (e.g. during a call with a headset). First, a thresholded minimum of c̄'_xd is computed at step S519 as follows:
c̄'_{xd,min,k} = c̄'_{xd,k}                  if c̄'_{xd,k} < c̄'_{xd,min,k-1}
              = c̄'_{xd,min,k-1} + μ_c      otherwise,
with a step-size μ_c = 0.0006 m_fs, where m_fs is a constant factor that takes one value when the sampling frequency f_s is 8000 Hz and another value otherwise.
[0072] This is used to construct two decision variables
u_{c,k} = 0           if c̄_{de,k} < 0.95 and c̄'_{xd,k} < 0.8
        = 1           if c̄_{de,k} > 0.98 and c̄'_{xd,k} > 0.9
        = u_{c,k-1}   otherwise,    (k > 0, u_{c,0} = 0),
and
u_{e,k} = 0    if c̄'_{xd,min,k} = 1 or u_{c,k} = 1
        = 1    otherwise,    (k > 0, u_{e,0} = 0).
[0073] The system is considered in the "coherent state" when u_c = 1 and in the "echo state" when u_e = 1. In the echo state, the system may contain echo; otherwise it does not contain echo. The echo state is provided through an interface for potential use by other audio processing components.
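The state tracking of paragraphs [0071] to [0073] can be summarized by the following sketch. The thresholds mirror the reconstructed equations above, while the clamping of the tracked minimum to 1 and the exact step-size handling are assumptions:

def update_states(cde_avg, cxd_avg, cxd_min, u_c, mu_c=0.0006):
    # cde_avg, cxd_avg : averaged c_de and modified c'_xd over the preferred bands
    # cxd_min          : running (thresholded) minimum of the averaged c'_xd
    # u_c              : previous coherent-state flag

    # Thresholded minimum of the averaged modified coherence.
    if cxd_avg < cxd_min:
        cxd_min = cxd_avg
    else:
        cxd_min = min(cxd_min + mu_c, 1.0)

    # Coherent-state decision with hysteresis.
    if cde_avg < 0.95 and cxd_avg < 0.8:
        u_c = 0
    elif cde_avg > 0.98 and cxd_avg > 0.9:
        u_c = 1

    # Echo state: possible echo unless the echo path looks absent or near-end dominates.
    u_e = 0 if (cxd_min == 1.0 or u_c == 1) else 1
    return u_c, u_e, cxd_min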
[0074] While in the echo state, the suppression factor s is computed at step S520 by selecting the minimum of c_de and c'_xd in each band as
s = min(c_de, c'_xd).
[0075] Two overall suppression factors are computed at step S533 and S527 from order statistics across the preferred bands:
{s_h, s_l} = {s(n_h), s(n_l)},    {n_h, n_l} = ⌊n_0 + {0.5, 0.75}(n_t - n_0 + 1)⌋.
[0076] This approach of selecting suppression factors is more robust to outliers than the average, and allows tuning through the exact selection of the order statistic position.
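A sketch of the order-statistic selection of the overall suppression levels follows; the exact index origin used when picking the 50% and 75% positions is an assumption, since only the general formula is given above:

import numpy as np

def order_statistic_levels(s, n0, nt):
    # Sort the suppression factors across the preferred bands [n0, nt].
    pref = np.sort(s[n0:nt + 1])
    count = nt - n0 + 1
    # Pick the 50% and 75% positions within the sorted preferred bands.
    i_h = min(int(np.floor(0.5 * count)), count - 1)
    i_l = min(int(np.floor(0.75 * count)), count - 1)
    return pref[i_h], pref[i_l]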
[0077] While in the "no echo state" (i.e. u_e = 0), suppression is limited by selecting suppression factors as follows at steps S520, S524 and S518:
{s, s_h, s_l} = {c_de, c̄_de, c̄_de}         if u_c = 1
              = {c'_xd, c̄'_xd, c̄'_xd}      otherwise.
[0078] Across most scenarios, there is a typical suppression level required to reasonably remove all residual echo. This is considered to be the target suppression, s_t. A scalar "overdrive" is applied to s to weight the bands towards s_t. This improves performance in more difficult cases where the coherence measures are not accurate enough by themselves. The minimum s_l level is computed at step S527 and tracked at step S529 over time as
s'_{l,k} = s_{l,k}                     if s_{l,k} < s'_{l,k-1} and s_{l,k} < 0.6
         = min(s'_{l,k-1} + μ_s, 1)    otherwise,
with a step-size μ_s = 0.0008 m_fs.
[0079] When the minimum s'_{l,k} is unchanged for two consecutive blocks, the overdrive γ is set at step S531 such that applying it to the minimum will result in the target suppression level:
γ_k = s_t / ln(s'_{l,k}).
γ is smoothed and thresholded as
γ̃_k = λ_γ γ̃_{k-1} + (1 - λ_γ) max(γ_k, γ_0),
λ_γ = 0.99    if γ_k < γ̃_{k-1}
    = 0.9     otherwise,
such that it will tend to move faster upwards than downwards. s_t and γ_0 are configurable to control the suppression aggressiveness; by default they are set to -11.5 and 2, respectively. Additionally, when c̄'_{xd,min,k} = 1, the smoothed overdrive is reset to the minimum,
γ̃_k = γ_0.
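The overdrive tracking of paragraphs [0078] and [0079] can be sketched as below. The formula relating the tracked minimum to the target suppression and the two-block stability counter follow the reconstruction above and should be read as assumptions rather than the disclosed implementation:

import math

def update_overdrive(s_l, state, s_t=-11.5, od_min=2.0, mu_s=0.0008):
    # state : dict with keys 's_l_min', 'stable_blocks', 'od', 'od_sm'
    # Track the minimum s_l level over time (steps S527/S529).
    if s_l < state['s_l_min'] and s_l < 0.6:
        state['s_l_min'] = s_l
        state['stable_blocks'] = 0
    else:
        state['s_l_min'] = min(state['s_l_min'] + mu_s, 1.0)
        state['stable_blocks'] += 1

    # Once the minimum has held for two consecutive blocks, refresh the raw
    # overdrive so that raising the minimum to this power approaches exp(s_t).
    if state['stable_blocks'] == 2:
        state['od'] = max(s_t / math.log(state['s_l_min'] + 1e-10), od_min)

    # Asymmetric smoothing: the overdrive moves up quickly and decays slowly.
    lam = 0.99 if state['od'] < state['od_sm'] else 0.9
    state['od_sm'] = lam * state['od_sm'] + (1.0 - lam) * max(state['od'], od_min)
    return state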
[0080] The s_h level is computed at step S533. Next, the final suppression factors s_v are produced according to the following algorithm. At step S525, s is first weighted towards s_l according to a weighting vector V_sN with components 0 < V_sN(n) < 1:
s(n) ← V_sN(n) s_l + [1 - V_sN(n)] s(n)    if s(n) > s_l
     ← s(n)                                 otherwise.
[0081] The weighting is selected to influence typically less accurate bands more heavily. Applying the overdriving at step S535, the following is derived:
s_v(n) = s(n)^(γ̃_k V_TN(n)),
where V_TN is another weighting vector fulfilling a similar purpose as V_sN. Overdriving through raising to a power serves to accentuate valleys in s_v. Finally, at step S536 the frequency-domain output block is given by
Y_k = s_v ∘ E_k + Ñ_k,
where Ñ_k is artificial noise and at step S537, an inverse transform is performed to obtain the output signal y(n). The suppression removes near-end noise as well as echo, resulting in an audible change in the noise level. This issue is mitigated by adding generated "comfort noise" to replace the lost noise. The generation of Ñ_k will be discussed in a later section below.
[0082] The overlap-add transformation is inverted to arrive at the length N time- domain output signal as
[y_k^T ỹ_{k+1}^T]^T = (F^{-1} Y_k) ∘ w_{2N} + [ỹ_k^T 0_{1×N}]^T,    k > 0,    ỹ_0 = 0_{N×1},
where y_k is the length-N output block and ỹ_{k+1} is the overlap carried into the next block.
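The final suppression, comfort-noise addition and overlap-add synthesis of steps S536 and S537 can be sketched as follows (the function and argument names are illustrative assumptions):

import numpy as np

def synthesize_output(E, s_v, cn, w2N, overlap):
    # E       : (2N,) transformed error block, s_v : (2N,) final suppression factors
    # cn      : (2N,) comfort-noise vector, w2N : (2N,) square-root Hanning window
    # overlap : (N,) second half of the previous windowed block
    N = len(overlap)
    Y = s_v * E + cn                          # frequency-domain output block (step S536)
    y2N = np.real(np.fft.ifft(Y)) * w2N       # windowed inverse transform (step S537)
    y_out = y2N[:N] + overlap                 # overlap-add with the previous block
    overlap = y2N[N:]                         # carried into the next block
    return y_out, overlap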
[0083] To generate comfort noise, a reliable estimate of the true near-end background noise is required. According to an embodiment of the invention, a minimum statistics method is utilized to generate the comfort noise. More specifically, at every block a modified minimum of the near-end PSD is computed for each band:
N_{D_k}(n) = λ_N (S̃_{D_k}(n) + μ_N (N_{D_{k-1}}(n) - S̃_{D_k}(n)))    if S̃_{D_k}(n) < N_{D_{k-1}}(n)
           = λ_N N_{D_{k-1}}(n)                                        otherwise,    k > 0,
with a step-size μ_N = 0.1 and ramp λ_N = 1.0002. N_{D_0}(n) is set such that it will be greater than a reasonable noise power. S̃_{D_k} is very similar to the near-end PSD discussed above, but is instead computed from the un-windowed DFT coefficients of the linear stage of the filter 102.
[0084] White noise may be produced by generating a random complex vector, u_k, on the unit circle. This is shaped to match N_{D_k} and weighted by the suppression levels to give the following comfort noise:
Ñ_k = u_k ∘ sqrt(N_{D_k}) ∘ sqrt(1 - s_v ∘ s_v).
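A sketch of the comfort-noise generation described in paragraphs [0083] and [0084] follows; a practical implementation would additionally enforce conjugate symmetry so that the synthesized noise is real-valued after the inverse transform, which this sketch omits:

import numpy as np

def comfort_noise(noise_psd, s_v, rng):
    # noise_psd : (2N,) per-band near-end noise power estimate
    # s_v       : (2N,) final suppression factors
    two_N = len(noise_psd)
    u = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, two_N))     # random vector on the unit circle
    weight = np.sqrt(np.maximum(1.0 - s_v * s_v, 0.0))        # more noise where more echo was removed
    return u * np.sqrt(noise_psd) * weight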
[0085] Fig. 6 is a block diagram of the AEC 100 in conjunction with an automatic gain controller (AGC) 600 in accordance with an embodiment of the present invention. The AGC 600 includes an AGC analysis unit 601 and an AGC processing unit 603.
[0086] The AEC 100 receives, as input, the far-end signal 110 and the near-end signal 122. The AEC 100 determines the echo state of the NLP 104 included in the AEC 100 as shown in Fig. 1. The sections above with reference to Fig. 5, as well as the sections below with reference to Fig. 7, describe the algorithms by which the echo state of the NLP 104 is determined.
[0087] In short, according to an embodiment of the invention with reference to Fig. 6, a determination is made whether the NLP 104 is in a "no-echo" state or an "echo" state. The "no-echo" state is selected when the near-end signal 122 does not contain echo. Conversely, the "echo" state is entered when the near-end signal 122 might contain echo. Based on this determination, echo cancellation information is received by the AGC processing unit 603. This echo cancellation information is used to control the AGC processing unit 603. When the NLP 104 has determined that the near-end signal 122 may contain echo, the AGC processing unit 603 prevents upward adaptation to the near-end signal 122. This has the effect of preventing the AGC processing unit 603 from erroneously adapting upwards to a low echo signal, which would otherwise cause the level of the target speech 120 to be inappropriately high. Downward adaptation to an echo signal is still permitted. This is because a saturated echo is difficult for the AEC 100 to handle and should be avoided.
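The gain-control gating described here can be sketched as follows. All numeric constants, the level measure and the function name are illustrative assumptions; only the gating rule (block upward adaptation while echo may be present, always allow downward adaptation) is taken from the text:

def adapt_gain(gain, long_term_power, echo_state,
               target_power=0.1, step=0.02, min_gain=0.25, max_gain=4.0):
    # Downward adaptation is always allowed, so a saturated echo is avoided.
    if long_term_power > target_power:
        return max(gain - step, min_gain)
    # Upward adaptation is blocked while the near-end signal may contain echo.
    if long_term_power < target_power and not echo_state:
        return min(gain + step, max_gain)
    return gain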
[0088] The AGC analysis unit 601 analyzes the level of the near-end signal 122, extracts information about the signal level, and outputs the level information to the AGC processing unit 603. According to an exemplary embodiment, information about the signal level of the near-end signal 122 may include, but is not limited to, both a long-term and short-term moving average of the signal power. The signal level information is then passed to the AGC processing unit 603. The AGC processing unit 603 makes a decision about what to do with the information and, if available, how to adjust the analog level at the capture device 20. The AGC processing unit 603 may also make digital changes to the near-end signal.
[0089] Fig. 7 shows a flow diagram illustrating operations performed by the acoustic echo canceller 100 according to the exemplary aspect of the present invention. More specifically, according to an embodiment of the invention, Fig. 7 further describes the algorithms on how echo state and suppression factors are determined in the NLP 104 of the AEC 100 as described above with respect to Figs. 5 and 6.
[0090] As described earlier, both the coherence cxd between the far-end signal 110 and near-end signal 122 and the coherence cde between the near-end signal 122 and error signal 124 are tracked over time to determine the state of the AEC 100. Based on the determination of a high or a low coherence, the NLP 104 decides whether to enter or leave the coherent state.
[0091] First, a determination is made by the NLP 104 at step S701 whether the coherence is high and at S705 whether the coherence is low, as described above with reference to Fig. 5. As mentioned earlier, coherence is a frequency-domain analog to time-domain correlation. More specifically, as mentioned above with reference to Fig. 5, coherence is a measure of similarity with 0 ≤ c(n) ≤ 1, where a higher coherence corresponds to more similarity.
[0092] Accordingly, if the NLP 104 determines that the coherence is high at S701, the AEC 100 enters into the coherent state at step S703. If the NLP 104 determines that the coherence is low at S705, the AEC 100 leaves the coherent state at step S707. As mentioned with reference to Fig. 5, the AEC 100 is considered in the "coherent state" when u_c = 1 and in the "echo state" when u_e = 1.
[0093] According to an exemplary aspect of the invention, a determination is made by the NLP 104 at step S709 whether c_xd = 1. If the NLP 104 determines that c_xd = 1, the AEC 100 leaves the echo state at step S711. Then, a further determination is made by the NLP 104 at step S713 whether the AEC 100 is in the coherent state. If the NLP 104 determines that the AEC 100 is still in the coherent state, the following suppression factor s is output by the NLP 104 at step S715:
s = c_de
s_h = c̄_de
s_l = c̄_de.
[0094] At step S713, if the NLP 104 determines that the AEC 100 is not in the coherent state, the following suppression factor s is output by the NLP 104 at step S721 :
s = c'_xd
s_h = c̄'_xd
s_l = c̄'_xd.
[0095] On the other hand, if at S709 the NLP 104 determines that cxd is not equal to 1, a further determination is made at S717 whether the AEC 100 is in the coherent state. As mentioned earlier, AEC 100 is considered in the "coherent state" when uc = 1. If the AEC 100 is in the coherent state, it leaves the echo state at step S719 and outputs the same suppression factor s as outputted at step S721.
[0096] However, at S717, if the NLP 104 determines that the AEC 100 is not in the coherent state, the AEC 100 enters into the echo state (u_e = 1) at step S723 and the following suppression factor s is output by the NLP 104 at step S725:
s = min(c_de, c'_xd)
s_h = s(n_h)
s_l = s(n_l).
[0097] According to an exemplary embodiment of the invention, the suppression factors may then be applied by the NLP 104 to the error signal 124 to substantially remove residual echo from the error signal 124.
[0098] Fig. 8 is a flow diagram illustrating interactions of the AEC 100 and the AGC 600 according to an embodiment of the present invention illustrated in Fig. 6. At step S801, echo state information from the AEC 100 and signal level information of the near-end signal 122 are received. At step S803, the AGC processing unit 603 determines a gain adaptation for the near-end signal 122 based on the signal level information. Finally, at step S805, the AGC processing unit 603 prevents upward gain adaptation to the received near-end signal when the echo state information indicates that the received near-end signal 122 contains an echo signal.
[0099] Fig. 9 is a block diagram illustrating an example computing device 900 that may be utilized to implement the AEC 100 including, but not limited to, the NLP 104, the filter 102, the far-end buffer 106, the blocking buffer 108, as well as the AGC analysis unit 601 and the AGC processing unit 603 in accordance with the present disclosure. The computing device 900 may also be utilized to implement the processes illustrated in Figs. 3, 5, 7, and 8 in accordance with the present disclosure. In a very basic configuration 901 , computing device 900 typically includes one or more processors 910 and system memory 920. A memory bus 930 can be used for communicating between the processor 910 and the system memory 920.
[00100] Depending on the desired configuration, processor 910 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 910 can include one or more levels of caching, such as a level one cache 911 and a level two cache 912, a processor core 913, and registers 914. The processor core 913 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. A memory controller 915 can also be used with the processor 910, or in some implementations the memory controller 915 can be an internal part of the processor 910.
[00101] Depending on the desired configuration, the system memory 920 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory 920 typically includes an operating system 921, one or more applications 922, and program data 924. Application 922 includes an echo cancellation processing algorithm 923 that is arranged to limit gain control adaptation. Program Data 924 includes echo cancellation routing data 925 that is useful for limiting gain control adaptation, as will be further described below. In some embodiments, application 922 can be arranged to operate with program data 924 on an operating system 921 such that gain control adaptation is limited. This described basic configuration is illustrated in FIG. 9 by those components within dashed line 901. [00102] Computing device 900 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 901 and any required devices and interfaces. For example, a bus/interface controller 940 can be used to facilitate communications between the basic configuration 901 and one or more data storage devices 950 via a storage interface bus 941. The data storage devices 950 can be removable storage devices 951 , non-removable storage devices 952, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives to name a few. Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
[00103] System memory 920, removable storage 951 and non-removable storage 952 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 900. Any such computer storage media can be part of device 900.
[00104] Computing device 900 can also include an interface bus 942 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 901 via the bus/interface controller 940. Example output devices 960 include a graphics processing unit 961 and an audio processing unit 962, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 963. Example peripheral interfaces 970 include a serial interface controller 971 or a parallel interface controller 972, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 973. An example communication device 990 includes a network controller 991, which can be arranged to facilitate communications with one or more other computing devices 990 over a network communication via one or more communication ports 992. The communication connection is one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. A "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media. The term computer readable media as used herein can include both storage media and communication media.
[00105] Computing device 900 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that include any of the above functions. Computing device 900 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
[00106] There is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost vs. efficiency tradeoffs. There are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and that the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
[00107] The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof.
[00108] In one embodiment, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and or firmware would be well within the skill of one of skill in the art in light of this disclosure.
[00109] In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of signal bearing medium used to actually carry out the distribution. Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
[00110] Those skilled in the art will recognize that it is common within the art to describe devices and/or processes in the fashion set forth herein, and thereafter use engineering practices to integrate such described devices and/or processes into data processing systems. That is, at least a portion of the devices and/or processes described herein can be integrated into a data processing system via a reasonable amount of experimentation.
[00111] Those having skill in the art will recognize that a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities). A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
[00112] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
[00113] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims

What is claimed is:
1. A method for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller, the method comprising: receiving echo state information from said echo canceller and signal level information of the near-end audio signal received by said echo canceller;
determining a gain adaptation for said near-end audio signal based on said signal level information; and
preventing upward gain adaptation to the received near-end audio signal when said echo state information indicates that the received near-end audio signal contains an echo signal.
2. The method according to claim 1, further comprising:
computing a first coherence value by comparing correlations between a far-end signal and the near-end signal;
computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and
tracking said first and second coherence values to determine said echo state information.
3. The method according to any of claims 1 to 2, further comprising: performing echo cancellation based on echo cancellation information obtained from said echo canceller to generate an outgoing signal.
4. The method according to any of claims 1 to 3, further comprising: adding a comfort noise to said outgoing signal.
5. The method according to any of claims 1 to 4, wherein said signal level information of the near-end audio signal includes a moving average of the power of said near-end audio signal.
6. A system for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller, the system comprising: an echo canceller that receives, as input, the near-end audio signal, the echo canceller comprising a non-linear processor, characterized in that said non-linear processor is configured to output echo state information of the echo canceller;
an automatic gain control (AGC) analyzing unit operatively connected to said echo canceller, said AGC analyzing unit analyzing signal level information of the near-end audio signal received by the echo canceller; and
an AGC processing unit operatively connected to said echo canceller and said AGC analyzing unit, said AGC processing unit determining a gain adaptation for said near-end audio signal based on said signal level information and preventing upward gain adaptation to the received near-end audio signal when said echo state information indicates that the received near-end audio signal contains an echo signal.
7. The system according to claim 6, wherein said non-linear processor computes a first coherence value by comparing correlations between a far-end signal and the near-end signal and a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter and tracks said first and second coherence values to determine said echo state information.
8. The system according to any of claims 6 to 7, wherein the echo canceller performs echo cancellation on the near-end signal based on echo cancellation information to generate an outgoing signal.
9. The system according to any of claims 6 to 8, further comprising a comfort noise generator to generate a comfort noise to be added to said outgoing signal.
10. The system according to any of claims 6 to 9, wherein said signal level information of the near-end audio signal includes a moving average of the power of said near-end audio signal.
11. A computer-readable storage medium having stored thereon computer executable program for limiting gain control adaptation to a near-end audio signal using echo state information obtained from an echo canceller, the computer program when executed causes a processor to execute the steps of:
receiving echo state information from said echo canceller and signal level information of the near-end audio signal received by said echo canceller;
determining a gain adaptation for said near-end audio signal based on said signal level information; and
preventing upward gain adaptation to the received near-end audio signal when said echo state information indicates that the received near-end audio signal contains an echo signal.
12. The computer-readable storage medium of claim 11, wherein the computer program when executed causes the processor to further execute the steps of:
computing a first coherence value by comparing correlations between a far-end signal and the near-end signal;
computing a second coherence value by comparing correlations between the near-end signal and an error signal containing a residual echo output from a linear adaptive filter; and
tracking said first and second coherence values to determine said echo state information.
13. The computer-readable storage medium of any of claims 11 to 12, wherein the computer program when executed causes the processor to further execute the step of: performing echo cancellation based on echo cancellation information obtained from said echo canceller to generate an outgoing signal.
14. The computer-readable storage medium of any of claims 11 to 13, wherein the computer program when executed causes the processor to further execute the step of: adding a comfort noise to said outgoing signal.
15. The computer-readable storage medium of any of claims 11 to 14, wherein said signal level information of the near-end audio signal includes a moving average of the power of said near-end audio signal.
PCT/US2011/036861 2011-05-17 2011-05-17 Using echo cancellation information to limit gain control adaptation WO2012158164A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US2011/036861 WO2012158164A1 (en) 2011-05-17 2011-05-17 Using echo cancellation information to limit gain control adaptation
EP11721216.7A EP2710788A1 (en) 2011-05-17 2011-05-17 Using echo cancellation information to limit gain control adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/036861 WO2012158164A1 (en) 2011-05-17 2011-05-17 Using echo cancellation information to limit gain control adaptation

Publications (1)

Publication Number Publication Date
WO2012158164A1 true WO2012158164A1 (en) 2012-11-22

Family

ID=44242737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/036861 WO2012158164A1 (en) 2011-05-17 2011-05-17 Using echo cancellation information to limit gain control adaptation

Country Status (2)

Country Link
EP (1) EP2710788A1 (en)
WO (1) WO2012158164A1 (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9565493B2 (en) 2015-04-30 2017-02-07 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US9554207B2 (en) 2015-04-30 2017-01-24 Shure Acquisition Holdings, Inc. Offset cartridge microphones
WO2019231632A1 (en) 2018-06-01 2019-12-05 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11297423B2 (en) 2018-06-15 2022-04-05 Shure Acquisition Holdings, Inc. Endfire linear array microphone
EP3854108A1 (en) 2018-09-20 2021-07-28 Shure Acquisition Holdings, Inc. Adjustable lobe shape for array microphones
JP2022526761A (en) 2019-03-21 2022-05-26 シュアー アクイジッション ホールディングス インコーポレイテッド Beam forming with blocking function Automatic focusing, intra-regional focusing, and automatic placement of microphone lobes
US11558693B2 (en) 2019-03-21 2023-01-17 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
CN113841419A (en) 2019-03-21 2021-12-24 舒尔获得控股公司 Housing and associated design features for ceiling array microphone
CN114051738A (en) 2019-05-23 2022-02-15 舒尔获得控股公司 Steerable speaker array, system and method thereof
EP3977449A1 (en) 2019-05-31 2022-04-06 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
JP2022545113A (en) 2019-08-23 2022-10-25 シュアー アクイジッション ホールディングス インコーポレイテッド One-dimensional array microphone with improved directivity
WO2021243368A2 (en) 2020-05-29 2021-12-02 Shure Acquisition Holdings, Inc. Transducer steering and configuration systems and methods using a local positioning system
WO2022165007A1 (en) 2021-01-28 2022-08-04 Shure Acquisition Holdings, Inc. Hybrid audio beamforming system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5852769A (en) * 1995-12-08 1998-12-22 Sharp Microelectronics Technology, Inc. Cellular telephone audio input compensation system and method
US6381224B1 (en) * 1999-03-31 2002-04-30 Motorola, Inc. Method and apparatus for controlling a full-duplex communication system
US20020076037A1 (en) * 2000-12-15 2002-06-20 Eiichi Nishimura Echo canceler with automatic gain control of echo cancellation signal
US20030117967A1 (en) * 2001-12-20 2003-06-26 Mansour Tahernezhaadi Method and apparatus for performing echo canceller specific automatic gain control
US20070127711A1 (en) * 1999-12-09 2007-06-07 Leblanc Wilfrid Adaptive gain control based on echo canceller performance information


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2930917A1 (en) 2014-04-08 2015-10-14 Luis Weruaga Method and apparatus for updating filter coefficients of an adaptive echo canceller
US10504501B2 (en) 2016-02-02 2019-12-10 Dolby Laboratories Licensing Corporation Adaptive suppression for removing nuisance audio
CN110148421A (en) * 2019-06-10 2019-08-20 浙江大华技术股份有限公司 A kind of residual echo detection method, terminal and device
CN110148421B (en) * 2019-06-10 2021-07-20 浙江大华技术股份有限公司 Residual echo detection method, terminal and device
CN111199748A (en) * 2020-03-12 2020-05-26 紫光展锐(重庆)科技有限公司 Echo cancellation method, device, equipment and storage medium

Also Published As

Publication number Publication date
EP2710788A1 (en) 2014-03-26

Similar Documents

Publication Publication Date Title
EP2710788A1 (en) Using echo cancellation information to limit gain control adaptation
WO2012158163A1 (en) Non-linear post-processing for acoustic echo cancellation
JP4975073B2 (en) Acoustic echo canceller using digital adaptive filter and same filter
CN105577961B (en) Automatic tuning of gain controller
JP5671147B2 (en) Echo suppression including modeling of late reverberation components
JP5049277B2 (en) Method and system for clear signal acquisition
US8306215B2 (en) Echo canceller for eliminating echo without being affected by noise
JP5284475B2 (en) Method for determining updated filter coefficients of an adaptive filter adapted by an LMS algorithm with pre-whitening
KR100721034B1 (en) A method for enhancing the acoustic echo cancellation system using residual echo filter
CN109273019B (en) Method for double-talk detection for echo suppression and echo suppression
JP2003158476A (en) Echo canceller
US10789933B1 (en) Frequency domain coefficient-based dynamic adaptation control of adaptive filter
WO2012158168A1 (en) Clock drift compensation method and apparatus
US8964967B2 (en) Subband domain echo masking for improved duplexity of spectral domain echo suppressors
KR19990076870A (en) Convergence Measurement of Adaptive Filters
US7177416B1 (en) Channel control and post filter for acoustic echo cancellation
EP2716023A1 (en) Control of adaptation step size and suppression gain in acoustic echo control
CN111355855B (en) Echo processing method, device, equipment and storage medium
JP2005051744A (en) Speech communication apparatus
EP1459510A1 (en) Echo canceller having spectral echo tail estimator
WO2012158165A1 (en) Non-linear post-processing for super-wideband acoustic echo cancellation
JP5057109B2 (en) Echo canceller
KR20220157475A (en) Echo Residual Suppression
JP6143702B2 (en) Echo canceling apparatus, method and program
KR100431965B1 (en) Apparatus and method for removing echo-audio signal using time-varying algorithm with time-varying step size

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11721216

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2011721216

Country of ref document: EP