CN1953060A - Echo cancellation device and method for microphone - Google Patents

Info

Publication number: CN1953060A (application CNA2006101440555A)
Other versions: CN100524466C (granted)
Authority: CN (China)
Inventor: 张晨
Original and current assignee: Vimicro Corp
Original language: Chinese (zh)
Events: application filed by Vimicro Corp; publication of CN1953060A; application granted; publication of CN100524466C
Legal status: Granted; Expired - Fee Related

Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

This invention discloses a microphone echo cancellation device and method for cancelling the echo generated by the acoustic loop between a loudspeaker and a microphone. The device comprises a frame length adjusting module which sets the data frame length below the time-domain coefficient length of the adaptive filter, so that several frames are combined into one large frame for frequency-domain adaptive filtering, increasing the coefficient update rate of the adaptive filter.

Description

Echo cancellation device and echo cancellation method for microphone
Technical Field
The present invention relates to the field of echo cancellation, and in particular, to a microphone echo cancellation device and method using a frequency domain adaptive filter, which are used for canceling an echo generated by an acoustic loop between a speaker and a microphone.
Background
An echo arises when an acoustic loop exists between the loudspeaker and the microphone. As shown in fig. 1, the sound signal from the far end, which reaches the near end through the communication connection and is recorded as signal u, is emitted through the near-end loudspeaker, passes through the acoustic loop g between the loudspeaker and the microphone, is collected by the microphone as the reference signal d, and is then transmitted back to the far end through the communication connection. The far-end talker thus hears an echo of his own voice, i.e. the far-end echo, which seriously degrades call quality.
Since the acoustic loop g from the loudspeaker to the microphone is unknown and time-varying, adaptive filtering is widely adopted in echo cancellation schemes. Fig. 1 shows the basic principle of echo cancellation by adaptive filtering. Taking minimization of the residual echo e as its target, the adaptive filter filters the far-end sound signal u, adaptively adjusting its filter coefficients to track the acoustic feedback loop g from loudspeaker to microphone and to generate a prediction y of the echo d received by the microphone. When the filter tracks g accurately, y is very close to d, so that e = d − y tends to 0, achieving echo cancellation.
In the adaptive filtering process, the adaptive filter must track an unknown feedback loop, i.e. model an unknown system. When the unknown feedback loop g has a large delay, i.e. the unknown system has a high order, the adaptive filter needs at least the same order to model it well. Since time-domain adaptive filtering is a convolution of the input signal with the adaptive filter, the complexity of the algorithm grows sharply with the filter order and becomes impractical when the feedback-loop delay is large. Subband adaptive filtering can reduce the computational complexity, but introduces signal aliasing problems.
Convolution in the time domain equals multiplication in the frequency domain; with the fast algorithm provided by the FFT, a frequency-domain adaptive filtering algorithm can reduce algorithmic complexity and improve computational efficiency when the filter order is high, making it a very practical filtering approach.
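As a quick illustration of the convolution theorem the algorithm relies on (a minimal numpy sketch, not part of the patent; the toy data are arbitrary): zero-padding two length-M sequences to N = 2M and multiplying their FFTs reproduces their linear convolution.

```python
import numpy as np

# Linear convolution of h (length M) with u (length M) has length 2M-1,
# so an FFT of length N >= 2M-1 avoids circular wrap-around (aliasing).
M = 4
h = np.array([1.0, 0.5, 0.25, 0.125])   # toy filter coefficients
u = np.array([1.0, -1.0, 2.0, 0.5])     # toy input frame
N = 2 * M                                # same N = 2M sizing as in the patent

direct = np.convolve(h, u)                                        # time domain
via_fft = np.fft.ifft(np.fft.fft(h, N) * np.fft.fft(u, N)).real[:2*M - 1]

assert np.allclose(direct, via_fft)      # identical up to rounding
```

The same N = 2M sizing is what the overlap-save steps below use to keep the circular convolution free of aliasing.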
The frequency domain adaptive filtering algorithm in the prior art is generally as follows.
Some of the signal notation used hereinafter is first explained. In frequency-domain adaptive filtering, the input signal is processed in units of frames. In this document, $\vec{x}(k)$ denotes the current, i.e. k-th, frame of a signal x. For example, $\vec{u}'(k)$ denotes the k-th frame of the sound signal coming from the far end and about to be output to the loudspeaker; $\vec{u}(k)$ denotes the combined sound signal of length 2M; and $\vec{d}(k)$ denotes the k-th frame echo signal acquired by the microphone. Further, w(k) denotes the time-domain filter coefficients and W(k) the corresponding frequency-domain filter coefficients. FFT denotes the fast Fourier transform and IFFT the inverse fast Fourier transform.
An echo cancellation device to which a frequency domain adaptive filter is applied generally includes the following components.
(1) A data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker. The length of the data frame acquired each time is M; the current, i.e. k-th, frame is recorded as $\vec{u}'(k)$ and is combined with the previous, i.e. (k−1)-th, frame $\vec{u}'(k-1)$ to form a large frame $\vec{u}(k)$ of length 2M.
(2) A frequency-domain adaptive filter. Assuming the order of the adaptive filter is M, its time-domain coefficients are denoted w(k). Because the overlap-save method is adopted, the order-M filter is extended with M zeros to avoid aliasing, giving a filter of N = 2M coefficients whose frequency-domain coefficients after FFT processing are

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix},$$

of length 2M. The frequency-domain adaptive filter performs FFT processing on $\vec{u}(k)$, converting it to the frequency domain to obtain $U(k) = \mathrm{FFT}[\vec{u}(k)]$; U(k) is filtered with the current filter coefficients W(k), and the filtering result is IFFT-processed to obtain a one-frame prediction of the echo $\vec{d}(k)$:

$$\vec{y}(k) = \mathrm{IFFT}[U(k) \ast W(k)],$$

of which the last M points are taken.
(3) A subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ collected by the microphone, obtaining the residual echo $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$; the collected $\vec{d}(k)$ also has length M.
(4) The frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation of the residual echo $\vec{e}(k)$ with the far-end sound signal $\vec{u}(k)$, obtaining the speech correlation parameter

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)],$$

where $U^{H}(k)$ is the conjugate of U(k) and $E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$; the first M points of the result are taken.
(5) The frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}.$$
The coefficients W(k) of the frequency-domain adaptive filter are updated once per adaptive filtering pass; at the next pass, the adaptive filter takes the updated W(k+1) as the current W(k) and performs frequency-domain filtering on the next combined large frame of data.
Fig. 2 is a schematic diagram of a prior-art method of echo cancellation by frequency-domain adaptive filtering, where thin arrows represent time-domain signal processing and thick arrows represent frequency-domain signal processing. Since the frequency-domain adaptive filtering method processes signals frame by frame, the u, y, d and e signals of fig. 1 correspond in fig. 2 to $\vec{u}'(k)$, $\vec{y}(k)$, $\vec{d}(k)$ and $\vec{e}(k)$, the k-th frame of each respective signal; in addition, $\vec{u}(k)$ denotes the large frame of length 2M obtained by combining the data of $\vec{u}'(k)$ with the previous frame. It is known that blockwise processing and recombination of a truncated long sequence must use the overlap-add or overlap-save method to avoid aliasing; the overlap-save method is described here.
First, assuming the order of the time-domain adaptive filter is M and its coefficients are denoted w(k): because the overlap-save method is adopted, the order-M filter is extended with M zeros to avoid aliasing, and the frequency-domain coefficient vector of the filter obtained after FFT processing is:

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix} \qquad (1.1)$$
As equation (1.1) shows, the length N of the frequency-domain adaptive filter coefficients W(k) is twice the length M of the time-domain coefficient vector. In a frequency-domain adaptive filtering algorithm, both the adaptive filtering and the filter coefficient update are done in the frequency domain, so the time-domain form of the filter never appears explicitly. Note that all FFT and IFFT processing mentioned below is over N points.
The steps of the frequency domain adaptive filtering processing are as follows:
1) Collect one frame of the sound signal from the far end, $\vec{u}'(k)$; the frame length is M.
2) Process the input signal $\vec{u}'(k)$ by concatenating two frames, i.e. merge $\vec{u}'(k)$ with the data of the previous, (k−1)-th, frame into one large frame:

$$\vec{u}(k) = [u(kM-M), \ldots, u(kM-1), u(kM), \ldots, u(kM+M-1)] \qquad (1.2)$$

where $\vec{u}(k)$ is the k-th merged large frame, of length N = 2M;
u(kM−M) is the 1st datum of the original (k−1)-th frame;
u(kM−1) is the M-th datum of the original (k−1)-th frame;
u(kM) is the 1st datum of the original k-th frame;
u(kM+M−1) is the M-th datum of the original k-th frame.
3) FFT-process $\vec{u}(k)$ and convert it to the frequency domain, obtaining:

$$U(k) = \mathrm{FFT}[\vec{u}(k)] \qquad (1.3)$$
4) Filter the input signal, i.e. multiply in the frequency domain, then IFFT-process and convert back to the time domain; the last M data of the result are the predicted value of the echo signal:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \mathrm{IFFT}[U(k) \ast W(k)] \qquad (1.4)$$
5) The collected echo signal is denoted $\vec{d}(k)$, i.e.:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)] \qquad (1.5)$$

The residual echo signal is the difference between the echo signal and its predicted value:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k) \qquad (1.6)$$

6) M zeros are prepended to the residual echo signal, and FFT processing yields the frequency-domain residual echo signal:

$$E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix} \qquad (1.7)$$
the update amount of the adaptive filter coefficients is calculated using e (k) and u (k). First, conjugate U (k) to obtain UH(k) In that respect In the frequency domain, the update amount of the adaptive filter coefficient vector is determined by calculating the correlation between the error signal and the input signal, since the linear correlation is equivalent in form to an inverse linear convolution, a fast algorithm having FFT in the frequency domain by means of convolution in the time domain has:
<math> <mrow> <mover> <mi>&phi;</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>IFFT</mi> <mo>&lsqb;</mo> <msup> <mi>U</mi> <mi>H</mi> </msup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>*</mo> <mi>E</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>&rsqb;</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1.8</mn> <mo>)</mo> </mrow> </mrow> </math>
according to the overlap-and-hold method, in the above formula, the frame after the result needs to be deleted, i.e. only the first M points of the IFFT result are taken.
7) Finally, $\vec{\varphi}(k)$ is used to update the adaptive filter coefficients. Note that the frequency-domain filter coefficients were generated by zero-padding the time-domain coefficients followed by FFT processing; correspondingly, $\vec{\varphi}(k)$ is appended with M zeros and FFT-processed, the result is multiplied by the adaptive step size μ, and the product is added to the pre-update filter coefficients W(k), giving the frequency-domain form of the filter coefficient update:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix} \qquad (1.9)$$
At the next adaptive filtering pass, the updated W(k+1) is adopted as the current filter coefficients W(k) for filtering.
8) Steps 1) to 7) are performed cyclically until the data processing ends (a sketch of this loop follows below).
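The following is a minimal numpy sketch of this prior-art loop, steps 1)–7) with equations (1.1)–(1.9). The function name, the fixed step size, and the absence of any power normalization are illustrative simplifications, not part of the patent.

```python
import numpy as np

def fdaf_overlap_save(u, d, M, mu=0.1):
    """Prior-art overlap-save frequency-domain adaptive filter.
    u: far-end signal, d: microphone signal, M: frame length
    (= time-domain filter order), mu: step size. Returns residual echo."""
    N = 2 * M
    W = np.zeros(N, dtype=complex)       # frequency-domain coefficients W(k)
    u_old = np.zeros(M)                  # previous frame u'(k-1)
    e_out = np.zeros(len(d))
    for k in range(min(len(u), len(d)) // M):
        u_new = u[k*M:(k+1)*M]                              # step 1): frame u'(k)
        U = np.fft.fft(np.concatenate([u_old, u_new]))      # steps 2)-3): eqs. (1.2)-(1.3)
        y = np.fft.ifft(U * W).real[M:]                     # step 4): last M points, eq. (1.4)
        e = d[k*M:(k+1)*M] - y                              # step 5): eq. (1.6)
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))    # step 6): eq. (1.7)
        phi = np.fft.ifft(np.conj(U) * E).real[:M]          # eq. (1.8), first M points
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 7): eq. (1.9)
        e_out[k*M:(k+1)*M] = e
        u_old = u_new
    return e_out
```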
As these steps show, the filter coefficients of the frequency-domain adaptive filter are updated only once per signal frame of length M, so the convergence rate is slow; especially when the characteristics of the feedback loop change quickly, the effect is not ideal.
Disclosure of Invention
In order to solve the above-mentioned drawbacks of the prior art, the present invention provides an echo cancellation device and an echo cancellation method, so that the coefficients of the frequency domain adaptive filter can work efficiently and stably, thereby achieving the purpose of effectively canceling echo.
In order to solve the above problem, the present invention provides a microphone echo cancellation device for canceling an echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker, wherein the length of the data frame acquired each time is M, the current, i.e. k-th, frame is recorded as $\vec{u}'(k)$, and it is combined with the previous, i.e. (k−1)-th, frame $\vec{u}'(k-1)$ to jointly form a large frame $\vec{u}(k)$ of length 2M;
a frequency-domain adaptive filter whose current frequency-domain filter coefficients are denoted $W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$, of length 2M, where w(k) are the time-domain coefficients of the filter, of length M; the frequency-domain adaptive filter performs FFT processing on $\vec{u}(k)$, converting it to the frequency domain to obtain $U(k) = \mathrm{FFT}[\vec{u}(k)]$; U(k) is filtered with the current filter coefficients W(k), and the filtering result is IFFT-processed to obtain a one-frame prediction of the echo $\vec{d}(k)$: $\vec{y}(k) = \mathrm{IFFT}[U(k) \ast W(k)]$, taking the last M points of the result;
a subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ of length M collected by the microphone, obtaining the residual echo $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$;
the frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation of the residual echo $\vec{e}(k)$ with the far-end sound signal $\vec{u}(k)$, obtaining the speech correlation parameter $\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)]$, where $U^{H}(k)$ is the conjugate of U(k), $E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$, and the first M points of the result are taken;
the frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the coefficients W(k) of the frequency-domain adaptive filter are updated once per adaptive filtering pass, and at the next pass the adaptive filter uses the updated coefficients W(k+1) to filter the next combined large frame of data;
the frame length adjusting module is used for setting the data frame length of the u to a value L smaller than M;
correspondingly, the data acquisition and combination module is used for combining L data of the current kth frame data and the immediately preceding 2M-L continuous data to form a large frame with the length of 2M;
accordingly, the frequency-domain adaptive filter adaptively filters the 2M large frame, and the frequency-domain filter coefficients are updated after each frame of data of length L has been filtered;
and correspondingly, a residual echo interception module is also included, for intercepting the first L signals of each frame of the residual echo $\vec{e}(k)$ to obtain the final residual echo e.
Preferably, the frame length adjusting module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u and the immediately preceding 2n−1 data frames into a large frame of length 2M.
Preferably, the system also comprises an acoustic detection module and a filtering control module,
the sound detection module comprises two sound detection units which are respectively used for detecting sound conditions of the microphone input end and the loudspeaker output end and outputting the detection results to the filtering control module;
the filtering control module is used for controlling the work of the frequency domain self-adaptive filter according to the output result of the sound detection module,
if the microphone input end sound detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;
if the microphone input end detection result is voiced, the loudspeaker output end result is then examined; if the loudspeaker output end is silent, adaptive filtering is performed normally but coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
if both the microphone input end and the loudspeaker output end detection results are voiced, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the coefficients are also updated, giving the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ and the updated filter coefficients W(k+1), completing the frame processing.
Preferably, the sound detection module determines whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input end and the loudspeaker output end with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is judged silent;

where $\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1} |\vec{d}(k)|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the sound signal of frame length M acquired by the microphone, M is the frame length, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is judged silent;

where $\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1} |\vec{u}(k)|$ is the short-time average amplitude of the signal input to the loudspeaker, $\vec{u}(k)$ is the signal input to the loudspeaker, and L is the frame length.
Preferably, the apparatus further comprises a step size adjusting module, configured to detect a coefficient update step size μ of the adaptive filter, and decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
Preferably, when it is detected that the update step size of the adaptive filter coefficient is restored to normal, the coefficient update step size is restored to the initial value.
Preferably, the adaptive filter further comprises a coefficient adjusting module, configured to decrease the filter coefficient w (k) when detecting that the coefficient w (k) of the adaptive filter is greater than a set coefficient threshold.
Preferably, the method further comprises the following steps: and the nonlinear processing module is used for suppressing nonlinear components in the echo.
Preferably, when E(e) > NLPfloor, the nonlinear processing module processes the signal, where e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, when E(e) ≤ NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises the following steps:
the loudspeaker sound detection module is used for detecting the sound condition of the output end of the loudspeaker;
the nonlinear processing control module is used for turning on or off the nonlinear processing module according to the output result of the loudspeaker sound detection module;
when the loudspeaker sound detection module detects that the loudspeaker output end is voiced, i.e. SpkSignal_avg > NoiseFloor,
and the signal at the loudspeaker output end is more than α times the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either of the two conditions is not met, the NLP processing is turned off;
where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
The invention also provides a microphone echo cancellation method, which uses a frequency-domain adaptive filtering method to cancel the echo d generated when the far-end sound signal u passes through the acoustic loop between a loudspeaker and a microphone, finally obtaining a residual echo e. The time-domain filter coefficients are w(k), of length M, and the corresponding frequency-domain filter coefficients are

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix},$$

of length 2M, the overlap-save method being adopted. The method comprises the following steps:
1) setting the data frame length L of the signal u acquired each time;
2) collecting one frame of signal $\vec{u}'(k)$ according to the set frame length L, where $\vec{u}'(k)$ denotes the k-th frame signal;
3) merging the current frame $\vec{u}'(k)$ with the previous 2M−L data into a large frame $\vec{u}(k)$ of length 2M;
4) converting $\vec{u}(k)$ to the frequency domain and, using the overlap-save method, filtering it with the frequency-domain filter coefficients W(k), then converting the result back to the time domain to obtain the time-domain echo prediction $\vec{y}(k)$;
5) collecting the echo $\vec{d}(k)$ and subtracting $\vec{y}(k)$ to obtain the k-th frame minimized residual echo signal $\vec{e}(k)$;
6) updating the filter coefficients W(k) according to $\vec{e}(k)$ and $\vec{u}(k)$ to obtain W(k+1);
7) returning to step 2): acquiring and merging the next frame of signal and performing frequency-domain adaptive filtering with the updated filter coefficients, until the data input ends.
Preferably, the frequency domain adaptive filtering algorithm includes the following steps:
1) frame length adjustment, namely adjusting the frame length of u from M to a positive integer value L smaller than M;
2) collecting the k-th frame signal of u, of frame length L, recorded as $\vec{u}'(k)$;
3) combining the L data of $\vec{u}'(k)$ with the immediately preceding 2M−L data to form one large frame of length 2M:

$$\vec{u}(k) = [u(kL-2M+L), \ldots, u(kL-2), u(kL-1), u(kL), \ldots, u(kL+L-1)]$$

where u(kL−2M+L) is the (2M−L)-th datum before the original k-th frame,
u(kL−2) is the 2nd datum before the original k-th frame,
u(kL−1) is the datum immediately before the original k-th frame,
u(kL) is the 1st datum of the original k-th frame,
u(kL+L−1) is the L-th datum of the original k-th frame;
4) FFT-processing $\vec{u}(k)$ and converting it to the frequency domain to obtain: $U(k) = \mathrm{FFT}[\vec{u}(k)]$;
5) filtering U(k) with the current filter coefficients W(k) by the overlap-save method, i.e. multiplying in the frequency domain; after IFFT processing, the last M data of the result are taken and recorded as $\vec{y}(k)$, namely:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \mathrm{IFFT}[U(k) \ast W(k)];$$
6) after u is played by the loudspeaker, passes through the acoustic loop between the loudspeaker and the microphone, and is collected by the microphone, an echo signal of length M is obtained, denoted $\vec{d}(k)$, i.e.:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)];$$

subtracting the $\vec{y}(k)$ of step 5) from $\vec{d}(k)$ gives the error signal $\vec{e}(k)$:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k);$$
7) intercepting the first L signals of the $\vec{e}(k)$ result and outputting them as the final residual echo;
8) prepending M zeros to the unintercepted, length-M $\vec{e}(k)$ and FFT-processing to obtain:

$$E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix};$$

meanwhile, conjugating the U(k) of step 4) to obtain $U^{H}(k)$, point-multiplying it with E(k), and IFFT-processing the result; according to the overlap-save method:

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)],$$

where the latter half of the result must be discarded, i.e. only the first M points of the IFFT result are taken;
9) appending M zeros to the $\vec{\varphi}(k)$ and FFT-processing, multiplying the result by the adaptive step size μ, and adding the product to the filter coefficients W(k), giving the updated filter coefficients in frequency-domain form:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the next adaptive filtering pass uses the updated filter coefficients W(k+1);
10) returning to step 2), until the input of the far-end sound signal ends (see the sketch after this list).
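A minimal numpy sketch of this improved loop follows. The FIFO handling mirrors steps 3)–9); the zero pre-padding of the microphone signal, used to align the M-long echo window with the hop of L samples, is my own bookkeeping assumption, since the patent's formulas keep the frame indexing of the baseline method. Names and normalization are illustrative.

```python
import numpy as np

def fdaf_short_frame(u, d, M, L, mu=0.1):
    """Improved method: far-end frames of length L < M are pushed into a
    2M-sample FIFO, the coefficients are updated once per L samples, and
    the first L samples of each M-long residual frame are emitted."""
    N = 2 * M
    W = np.zeros(N, dtype=complex)              # frequency-domain coefficients W(k)
    fifo = np.zeros(N)                          # latest 2M far-end samples
    dp = np.concatenate([np.zeros(M - L), d])   # assumed alignment of mic window
    e_out = []
    for k in range(len(u) // L):
        fifo = np.concatenate([fifo[L:], u[k*L:(k+1)*L]])  # step 3): 2M large frame
        U = np.fft.fft(fifo)                               # step 4)
        y = np.fft.ifft(U * W).real[M:]                    # step 5): last M points
        d_win = dp[k*L:k*L + M]                            # step 6): M-long echo frame
        if len(d_win) < M:
            break
        e = d_win - y                                      # residual, length M
        e_out.append(e[:L])                                # step 7): keep first L
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))   # step 8)
        phi = np.fft.ifft(np.conj(U) * E).real[:M]
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9)
    return np.concatenate(e_out) if e_out else np.zeros(0)
```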
Preferably, the value of L is M/n, and n is an integer greater than 1.
Preferably, before the step 1), the method further comprises a sound detection step and a filtering control step, and the method comprises the following steps:
a sound detection step, detecting sound conditions of the microphone input end and the loudspeaker output end;
a filtering control step of controlling the operation of a filter according to the result of the voiced sound detection step;
the method specifically comprises the following steps:
if the detection result of the microphone input end is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;
if the detection result of the microphone input end is voiced, the loudspeaker output end result is examined; if the loudspeaker output end is silent, adaptive filtering is performed normally but coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
if the detection results of both the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in its normal working state: adaptive filtering is performed and the coefficients are also updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
where $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's predicted value of it, and $\vec{e}(k)$ is the residual echo.
Preferably, the sound detection determines whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input end and the loudspeaker output end with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is judged silent;

where $\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1} |\vec{d}(k)|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is judged silent;

where $\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1} |\vec{u}(k)|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the signal output to the loudspeaker, and L is the length of one frame of speech signal.
Preferably, the method further comprises a step size adjusting step, configured to decrease the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than the set maximum coefficient update step size threshold.
Preferably, when it is detected that the update step size of the adaptive filter coefficient is restored to normal, the coefficient update step size is restored to the initial value.
Preferably, the method further comprises a coefficient adjusting step, configured to decrease the coefficient of the filter when the coefficient of the adaptive filter is detected to be greater than the set coefficient threshold.
Preferably, the method further comprises the following nonlinear processing steps:
firstly, calculating the short-time average amplitude E (e) of the minimized residual signal;
then, it is judged whether E(e) is greater than a preset nonlinear processing threshold NLPfloor; if so, the minimized residual noise e'(n) is calculated using the following formula:

[formula reproduced only as an image in the source; not recoverable here]

where e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, if E(e) ≤ NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises a nonlinear processing switch control step, specifically:
detecting the sound condition of the output end of the loudspeaker;
turning on or off the nonlinear processing step according to the detection result, specifically:
when the loudspeaker output end is detected to be voiced, i.e. SpkSignal_avg > NoiseFloor, and the signal at the loudspeaker output end is more than α times the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either of the two conditions is not met, the NLP processing is turned off; where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, E(e) is the short-time average amplitude of the residual signal, and α is a preset multiple.
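A sketch of this NLP switch logic under the conditions just stated. Because the patent's suppression formula for e'(n) survives only as an image, the attenuation branch below (a plain scaling) is a placeholder assumption; the on/off test and the comfort-noise branch follow the text, and all names are illustrative.

```python
import numpy as np

def nlp_process(e, spk_avg, noise_floor, alpha, nlp_floor):
    """NLP switch and decision logic for one residual frame e.
    spk_avg is SpkSignal_avg for the current frame."""
    Ee = np.mean(np.abs(e))                     # short-time average amplitude E(e)
    # Switch control: both conditions must hold, otherwise NLP stays off.
    if not (spk_avg > noise_floor and spk_avg > alpha * Ee):
        return e                                # NLP off: pass the residual through
    if Ee <= nlp_floor:
        # E(e) <= NLPfloor: replace the output with comfort noise.
        rng = np.random.default_rng()
        return nlp_floor * rng.uniform(-1.0, 1.0, size=len(e))
    # E(e) > NLPfloor: suppress the residual. The patent's formula for
    # e'(n) is not recoverable from the source; plain scaling stands in.
    return 0.5 * e
```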
The frame length adjusting module added to the frequency-domain adaptive filter makes the frame length of the far-end sound signal processed at one time smaller than the time-domain coefficient length of the adaptive filter, so that several frames of signal are combined into one large frame for adaptive filtering. On one hand, the adaptive filter keeps its original, sufficient length and can meet the delay requirement of the feedback loop; on the other hand, the update frequency of the adaptive filter coefficients is increased, so the adaptive filter works efficiently. In addition, the filtering control module of the invention prevents the adaptive filter from converging wrongly in the special case where the microphone input line or the loudspeaker output line is silent, ensuring its normal operation; the step size adjusting module and the coefficient adjusting module let the adaptive filter recover to a normal working state in case of divergence; and the nonlinear processing module cancels nonlinear distortion in the feedback loop. The echo cancellation device of the invention therefore makes the adaptive filter work efficiently and stably, achieving effective echo cancellation.
Drawings
FIG. 1 is a schematic diagram of a basic structure of an apparatus for performing echo cancellation by adaptive filtering;
FIG. 2 is a diagram illustrating a method for performing echo cancellation by frequency-domain adaptive filtering in the prior art;
FIG. 3 is a schematic diagram of the structure of the voice detection module and the filtering control module in the device according to the present invention;
FIG. 4 is a diagram of a data merge unit according to the present invention;
fig. 5 is a schematic diagram of the relationship between echo and decision level before and after the nonlinear processing by the nonlinear processing module according to the present invention.
Detailed Description
The echo cancellation device and method of the present invention will be described in detail below with reference to the accompanying drawings.
In order for the adaptive filter to track the feedback loop effectively, the coefficient length of the adaptive filter must be greater than the number of sampling points of the feedback delay. For example, for a signal at an 8 kHz sampling rate, if the time-domain adaptive filter coefficient length is M = 1024, the maximum feedback delay the filter can track and model is 1024/8000 s = 128 ms.
In the frequency-domain adaptive filtering method described in the background art, the length of the frequency-domain filter coefficients is 2M, the length of the corresponding time-domain coefficients is M, and the length of each new incoming data frame is also M. That is, the adaptive filter's time-domain coefficient length equals the new data frame length: if the coefficient length is 1024, the data frame processed at one time is also 1024 samples long. Thus only about 8 filtering and coefficient update passes are performed per second (8000/1024 ≈ 7.8). For environments where the feedback loop changes quickly, this update frequency is sometimes insufficient.
Therefore, as shown in fig. 3, the present invention adds a frame length adjusting module on top of the frequency-domain adaptive filtering, for adjusting the length of the data frame to L. Note that after one adjustment the frame length stays fixed; it is not re-adjusted every time a frame of data is acquired. For example: the frequency-domain filter coefficient length is 2M and the corresponding time-domain coefficient length is M, and the length L of each new incoming data frame can be half the filter's time-domain coefficient length, i.e. L = M/2 (M even). For the input signal $\vec{u}'(k)$, the original two-frame combination becomes a four-frame combination. With this improvement, on one hand the adaptive filter length is still 2M, long enough to meet the delay requirement of the feedback loop; on the other hand the adaptive filter coefficients are updated once per M/2 samples, so the coefficient update frequency is also taken care of. This, however, comes at the cost of increased algorithm complexity. Since each frame carries L data, a residual echo interception module is added to intercept the first L data of the obtained residual echo and output them as the final result.
In the above example L = M/2; in actual use it may be M/3, M/4, M/8, and so on, making the coefficient update frequency of the adaptive filter even higher. Correspondingly, only the length of the data intercepted by the residual echo interception module needs to change.
Beyond this, the length L of each data frame may also be any number less than M, for example: if M = 1024, then L can be 1000, 900, 650, or any other value less than 1024. It must only be ensured that the combined large frame has length 2M when the data frames are merged. This can be handled as follows: as shown in fig. 4, a FIFO buffer of length 2M stores the incoming data; each time a new frame $\vec{u}'(k)$ is received, it is combined with the previous 2M−L data into one large frame $\vec{u}(k)$, and one adaptive filtering pass is performed.
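A minimal sketch of such a FIFO combiner (class and method names are illustrative); it accepts frames of any length L < M and always yields a 2M large frame:

```python
import numpy as np

class FarEndFIFO:
    """Length-2M FIFO for incoming far-end data (cf. fig. 4): each new
    L-sample frame is shifted in, and the buffer contents form the
    2M-sample large frame handed to the adaptive filter."""
    def __init__(self, M):
        self.buf = np.zeros(2 * M)
    def push(self, frame):
        L = len(frame)                          # new frame length L < M
        self.buf = np.concatenate([self.buf[L:], frame])
        return self.buf.copy()                  # the combined 2M large frame
```

For example, with M = 1024, a new frame of L = 650 samples together with the 1398 samples already buffered forms the 2M = 2048 large frame.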
Adaptive filtering can automatically track the feedback loop, but for special cases, adaptive filters are prone to mis-tracking, such as the case where the microphone and speaker lines are silent at the same time. In this case, the input signal and the reference signal of the adaptive filter are small, and the adaptive filter is liable to misconvergence.
In order to prevent the filter from erroneously converging, the present invention proposes that a voiced sound detection module and a filtering control module may be added to the echo cancellation device, as shown in fig. 3.
The sound detection module, i.e. the VAD (Voice Activity Detector) module, may include two sound detection units, VAD1 and VAD2, located at the microphone input and the loudspeaker output. VAD detection may make its decision by comparing the short-time average amplitude of the signal with the noise level; the short-time average amplitude is obtained by averaging the amplitude of one frame of the signal.
For the microphone input:

$$\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{k=0}^{M-1} |\vec{d}(k)| \qquad (2.1)$$

where MicSignal_avg is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, and M is the length of one frame of speech signal.

If MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is silent. NoiseFloor is the estimated noise level.

Similarly, for the loudspeaker output:

$$\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{k=0}^{L-1} |\vec{u}'(k)| \qquad (2.2)$$

where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, $\vec{u}'(k)$ is the sound signal input to the loudspeaker, and L is the length of one frame of speech signal.

If SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is silent.
According to the output result of the sound detection unit, the filtering control module performs overall control on the work of the filter, and specifically comprises the following steps:
if VAD1 detects silence, neither adaptive filtering nor filter coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing; if VAD1 detects sound, the VAD2 result is examined: if VAD2 detects silence, adaptive filtering is performed normally but the filter coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing; if both VAD1 and VAD2 detect sound, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the filter coefficients are also updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing.
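A sketch of this control logic, assuming a hypothetical `adapt_filter(update)` callable that performs one frequency-domain filtering pass, returns the echo prediction $\vec{y}(k)$, and updates W(k) only when asked:

```python
import numpy as np

def vad(frame, noise_floor):
    """VAD decision per eqs. (2.1)-(2.2): compare the frame's short-time
    average amplitude with the estimated noise level."""
    return np.mean(np.abs(frame)) > noise_floor

def control_frame(d_frame, u_frame, noise_floor, adapt_filter):
    """Filtering control for one frame, covering the three cases above."""
    if not vad(d_frame, noise_floor):           # VAD1 silent: bypass the filter
        return d_frame                          # e(k) = d(k)
    if not vad(u_frame, noise_floor):           # VAD2 silent: filter, no update
        return d_frame - adapt_filter(update=False)
    return d_frame - adapt_filter(update=True)  # both voiced: filter and update
```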
Experiments show that after filtering control is added, the adaptive filter can not be converged wrongly under the special condition that a microphone input line or a loudspeaker output line is silent, and normal work of the adaptive filter is guaranteed.
In addition, for adaptive filtering, if the reference signal $\vec{d}(k)$ collected by the microphone is generated entirely by the sound emitted from the loudspeaker, the adaptive filter can easily track the feedback loop and operate stably. However, the signal collected by the microphone generally includes not only the sound emitted from the loudspeaker but also sound signals from the near end, and the near-end sound is sometimes the dominant component. Such a signal interferes with correct tracking of the feedback loop and may lead to erroneous tracking of the adaptive filter, or even coefficient divergence.
When the filter tracks incorrectly, its coefficients begin to diverge, and this shows up in the coefficient update: the update amount is usually abnormally large. Therefore, as shown in fig. 3, the present invention can add a step-size adjustment module. When the coefficient update amount is detected to be abnormally large, the adaptive filter is judged to be in an abnormal working state and the coefficient update step size is reduced; this effectively suppresses erroneous tracking and avoids coefficient divergence. When the coefficient update amount is detected to be normal, the adaptive filter is judged to be in a normal working state and the step size can be adjusted back, for example restored to its initial value, which increases the convergence speed of the adaptive filter.
In particular, for the NLMS algorithm among the frequency-domain adaptive algorithms, as previously described, the coefficient update is:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.3)$$

Let

$$\Phi(k) = \mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.4)$$

Then

$$W(k+1) = W(k) + \mu\,\Phi(k) \qquad (2.5)$$

where $W(k)$ is the frequency-domain adaptive filter coefficient vector, an N-dimensional complex vector; $\mu$ is the coefficient update step size; $\Phi(k)$ is also an N-dimensional complex vector; and N is the number of FFT points. That is:

$$\Phi(k) = [\Phi_0(k), \Phi_1(k), \ldots, \Phi_{N-1}(k)]^T \qquad (2.6)$$

The coefficient update amount is therefore:

$$\mu\,\Phi(k) = [\mu\,\Phi_0(k), \mu\,\Phi_1(k), \ldots, \mu\,\Phi_{N-1}(k)]^T \qquad (2.7)$$

The key to the step-size adjustment described above is detecting the magnitude of this update amount, which can be measured component-wise by the complex modulus:

$$[\mu\,\|\Phi_0(k)\|, \mu\,\|\Phi_1(k)\|, \ldots, \mu\,\|\Phi_{N-1}(k)\|]^T \qquad (2.8)$$
In the present invention, the step-size adjustment method may be: for each $\mu\,\|\Phi_i(k)\|$, $i = 0, 1, \ldots, N-1$, if $\mu\,\|\Phi_i(k)\| > \text{MaxStepSize}$, where MaxStepSize is the maximum step-size threshold, the adaptive filter is judged to be in an abnormal working state and the step size is adjusted, which may mean scaling it down, for example by a factor of 10, i.e. $\mu = 0.1\,\mu$.
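A minimal sketch of this check, assuming a frequency-domain update vector Phi as in Eq. (2.6); the restore-to-initial behavior follows the description above, while the function name is an assumption:

```python
import numpy as np

def adjust_step_size(mu: float, Phi: np.ndarray,
                     max_step: float, mu_init: float) -> float:
    """Shrink mu 10x if any component of the update amount mu*||Phi_i(k)||
    exceeds MaxStepSize; otherwise restore the initial step size."""
    update_magnitude = mu * np.abs(Phi)     # mu * ||Phi_i(k)||, i = 0..N-1
    if np.any(update_magnitude > max_step):
        return 0.1 * mu                     # abnormal state: reduce step size
    return mu_init                          # normal state: restore initial value
```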
Experiments show that after the step-size adjustment module is added, the coefficients become far less prone to divergence and the stability of the adaptive filter is greatly enhanced, at the cost of a somewhat reduced convergence speed.
The filtering control module and the step-size adjustment module ensure stable operation of the adaptive filter to a large extent. However, sudden events or unexpected situations may still cause the adaptive filter to diverge, and a diverged filter can cause the loudspeaker to emit loud noise. The present invention therefore proposes a strategy for handling such special situations: as shown in fig. 3, a coefficient adjustment module can be added as a last line of defense to guarantee stable operation of the adaptive filter.
The principle of the coefficient adjustment module is simple: when the adaptive filter diverges, its coefficients tend to grow large. The task of coefficient adjustment is therefore to check the magnitude of the coefficients after every coefficient update; if a coefficient exceeds a set threshold, the adaptive filter is considered to have diverged. Specifically, for the frequency-domain NLMS algorithm, as mentioned above, the coefficient update is:
$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.9)$$

where W(k) is the frequency-domain adaptive filter coefficient vector, an N-dimensional complex vector, and N is the number of FFT points. That is:

$$W(k) = [W_0(k), W_1(k), \ldots, W_{N-1}(k)]^T \qquad (2.10)$$

The magnitude of the coefficients is measured by the complex modulus:

$$[\|W_0(k)\|, \|W_1(k)\|, \ldots, \|W_{N-1}(k)\|]^T \qquad (2.11)$$
For each $\|W_i(k)\|$, $i = 0, 1, \ldots, N-1$: if $\|W_i(k)\| > \text{MaxParam}$, where MaxParam is the maximum coefficient threshold, the frequency-domain adaptive filter is judged to have diverged, and its coefficients are adjusted, which may mean reducing them, for example setting them to zero, i.e. $W(k) = 0$. After the coefficients are zeroed, the adaptive filter converges anew, rescuing it from the divergence state. The threshold MaxParam must be chosen carefully according to the gain of the feedback loop: too large a value makes the coefficient monitoring insensitive, so the divergence state cannot be identified effectively; too small a value invites misjudgment, so the adaptive filter is restarted frequently and cannot work normally.
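A hedged sketch of this coefficient watchdog (the function name and per-call usage are assumptions):

```python
import numpy as np

def check_coefficients(W: np.ndarray, max_param: float) -> np.ndarray:
    """After each update, zero the frequency-domain coefficients W(k)
    if any modulus ||W_i(k)|| exceeds MaxParam (divergence detected)."""
    if np.any(np.abs(W) > max_param):
        return np.zeros_like(W)   # reset: the filter re-converges from scratch
    return W
```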
In addition, a non-linear processing module, i.e. an NLP (Non-Linear Processor) module, can be added. Typical loudspeakers exhibit 5%-10% nonlinear distortion, while adaptive filtering can only track a linear system, so the nonlinear distortion introduced in the feedback loop can be neither predicted nor cancelled. An NLP processing module can therefore be added after the adaptive filtering to remove this nonlinear residue.
Because NLP processing targets only the loudspeaker's nonlinear distortion, the module can be turned off when it is not needed. This requires adding a nonlinear-processing control module and a loudspeaker voiced-sound detection module to control switching the nonlinear processing module on and off; the loudspeaker voiced-sound detection module can reuse VAD2 from the voiced-sound detection module.
The specific control principle is as follows: NLP processing is started when (1) SpkSignal_avg > NoiseFloor, i.e. VAD2 detects that the loudspeaker is voiced, and (2) SpkSignal_avg / E(e) > α, i.e. the loudspeaker signal is more than α times larger than the residual signal. If either condition (1) or (2) is not met, the NLP module is turned off.

Condition (1) reflects that when the loudspeaker is silent there can be no echo, so NLP processing is unnecessary. Condition (2) reflects that when the near end is talking, E(e) is large, so condition (2) fails, NLP processing is turned off, and the near-end signal is transmitted without distortion.

In the formulas, SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of the residual signal; in this embodiment α may take the value 2. The short-time average amplitude may be computed as the mean of the absolute values of the samples in one frame of the signal.
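A small sketch of this on/off gate (the function name is illustrative; α = 2 follows this embodiment):

```python
def nlp_enabled(spk_avg: float, residual_avg: float,
                noise_floor: float, alpha: float = 2.0) -> bool:
    """Enable NLP only when the loudspeaker is voiced (condition 1) and its
    level dominates the residual signal by a factor of alpha (condition 2)."""
    speaker_voiced = spk_avg > noise_floor             # condition (1)
    echo_dominant = spk_avg > alpha * residual_avg     # condition (2)
    return speaker_voiced and echo_dominant
```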
The NLP processing in this scheme can adopt a center-clipping method to suppress the residual echo; fig. 5 is a schematic view of center-clipping NLP. Its action can be expressed by the following formula, applied when E(e) > NLPfloor:

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases} \qquad (3.1)$$

where e and e' are the residual echo before and after the NLP module, E(e) is the short-time average amplitude of the residual echo, and NLPfloor is the decision level. The value of NLPfloor must be chosen carefully: too small and the residual echo is not suppressed effectively; too large and near-end sound quality suffers severely.

In addition, when E(e) ≤ NLPfloor, e' may be replaced with comfort noise. The reason for using comfort noise rather than simply setting e' to zero is that a hard zero introduces audible noise steps when the NLP switches on and off, giving the illusion of half-duplex operation. The comfort noise may be generated with a simulated Gaussian random signal.
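A minimal sketch of center clipping with comfort-noise fill (the comfort-noise level and function name are assumptions):

```python
import numpy as np

def center_clip(e: np.ndarray, nlp_floor: float,
                comfort_level: float = 1e-4) -> np.ndarray:
    """Center-clipping NLP per Eq. (3.1): samples inside the clipping band
    are zeroed and the band is removed from larger samples; when the whole
    frame sits below the decision level, emit Gaussian comfort noise."""
    if np.mean(np.abs(e)) <= nlp_floor:        # E(e) <= NLPfloor
        return comfort_level * np.random.randn(len(e))
    out = np.zeros_like(e)
    high, low = e > nlp_floor, e < -nlp_floor
    out[high] = e[high] - nlp_floor
    out[low] = e[low] + nlp_floor
    return out
```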
The following describes the method of the present invention for performing microphone echo cancellation using frequency-domain adaptive filtering.

First, some basic concepts used below are explained. The frequency-domain filter coefficients are

$$W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$$

of length 2M, where w(k) is the corresponding time-domain adaptive filter coefficient vector of length M; the overlap-save method is used.
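For concreteness, a sketch of how such frequency-domain coefficients can be formed (array names are illustrative):

```python
import numpy as np

M = 1024                         # time-domain filter length
w = np.zeros(M)                  # time-domain coefficients w(k)
# Zero-pad to 2M and transform: W(k) = FFT([w(k); 0]), length 2M
W = np.fft.fft(np.concatenate([w, np.zeros(M)]))
```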
On the basis of the echo cancellation method in the background art, the invention proposes adding a frame-length adjustment step for adjusting the length of the data frame. This step is explained in detail first. In the present invention the frame length is adjusted to any positive integer L smaller than M. For example: if the frequency-domain filter coefficients have length 2M and the corresponding time-domain coefficients have length M, the length of each newly arriving data frame can be set to half the time-domain filter length, i.e. L = M/2 (M even). Compared with the background art, the input signal $\vec{u}(k)$ is then assembled from four frames instead of two. With this improvement, on the one hand the adaptive filter remains long enough to meet the delay requirement of the feedback loop; on the other hand, the update frequency of the adaptive filter coefficients is also taken into account.

In the example above L = M/2; in actual use it may also be M/4, M/3, M/8 and so on, which raises the coefficient update frequency of the adaptive filter further; only the length of the data intercepted by the residual-echo interception module must be changed accordingly. In practice the frame length L may be any number smaller than M; for example, with M = 1024, L can be 1000, 900, 650 or any other value below 1024, at the cost of increased algorithmic complexity. Note that once adjusted, the frame length remains fixed until all data have been processed; the adjustment is not repeated for every collected frame. Finally, because each frame carries L samples, a residual-echo interception step is added at the output, which intercepts the first L samples of the computed residual echo and outputs them as the final result.
Taking M = 1024 as an example, the microphone echo cancellation method using frequency-domain adaptive filtering, with the frame-length adjustment step and the residual-echo interception step added, is now described in full.
1) Frame-length adjustment step: the frame length is adjusted to a positive integer value L smaller than M; in this embodiment, L = 800.
2) Collect one frame of the k-th far-end sound signal to be output to the loudspeaker; the frame length is 800.
3) The 800 samples of the current frame and the immediately preceding 2M - L = 2048 - 800 = 1248 samples are combined into one large frame $\vec{u}(k)$ of length 2M. As shown in fig. 4, the 800 newly acquired samples of the current frame and the preceding 1248 samples form the large frame of length 2048, where:

u(800k-1248) is the 1248th sample before the original k-th frame,

u(800k-2) is the 2nd sample before the original k-th frame,

u(800k-1) is the sample immediately before the original k-th frame,

u(800k) is the 1st sample of the original k-th frame data,

u(800k+799) is the 800th sample of the original k-th frame data.
When the first and second frames are collected at start-up, the system waits for the third frame of data to arrive and then combines it with the last 448 samples of the first frame and the 800 samples of the second frame to form a large frame of length 2048, on which one pass of adaptive filtering is performed. Thereafter, every newly arriving frame is data-combined in the same way and one pass of adaptive filtering is performed.
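A minimal sketch of this sliding data-combination buffer, assuming the M = 1024, L = 800 of this embodiment (names and structure are illustrative, not from the patent):

```python
import numpy as np

M, L = 1024, 800
history = np.zeros(2 * M - L)       # the 1248 samples preceding the current frame

def combine(new_frame: np.ndarray) -> np.ndarray:
    """Form the length-2M large frame from L new samples plus the
    2M - L samples that immediately precede them."""
    global history
    big = np.concatenate([history, new_frame])   # length 2M = 2048
    history = big[L:]                            # slide the window forward by L
    return big
```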
4) Apply FFT processing to $\vec{u}(k)$, converting it to the frequency domain:

$$U(k) = \mathrm{FFT}[\vec{u}(k)].$$
5) Filter U(k) with the current filter coefficients W(k) using the overlap-save method, i.e. multiply in the frequency domain, then apply IFFT processing to the product and take the last M = 1024 samples of the result, denoted $\vec{y}(k)$; that is:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \text{the last } M \text{ points of } \mathrm{IFFT}[U(k) \cdot W(k)].$$
6) After the far-end sound signal $\vec{u}(k)$ is played by the loudspeaker and passes through the acoustic loop between the loudspeaker and the microphone, the microphone collects an echo signal of length M, denoted $\vec{d}(k)$:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)].$$

Subtracting the $\vec{y}(k)$ of step 5) from this $\vec{d}(k)$ gives the error signal $\vec{e}(k)$:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k);$$
7) Intercept the first L samples of the resulting $\vec{e}(k)$ and output them as the final residual echo;
8) Prepend M zeros to the full-length (not intercepted) $\vec{e}(k)$ of length M and apply FFT processing to obtain:

$$E(k) = \mathrm{FFT}\!\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix};$$
At the same time, conjugate the U(k) of step 4) to obtain $U^H(k)$, multiply it point-wise with E(k), and apply the IFFT to the product; by the overlap-save method this gives:

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)],$$

where the trailing half of the result must be discarded: only the first M points of the IFFT result are kept;
9) Append M zeros to $\vec{\varphi}(k)$, apply FFT processing, multiply the result by the adaptive step size μ, and add the product to the filter coefficients W(k) to obtain the updated frequency-domain filter coefficients:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the next adaptive filtering pass uses the updated coefficients W(k+1) as the current W(k);
10) Return to step 2), and repeat until the input of the far-end sound signal ends, completing the whole process.
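Putting steps 2)-9) together, a compact sketch of one pass of this overlap-save frequency-domain NLMS loop under the stated M = 1024, L = 800 (a simplified illustration that omits the VAD, step-size, coefficient and NLP safeguards described above; the step-size value and names are assumptions):

```python
import numpy as np

M, L = 1024, 800
W = np.zeros(2 * M, dtype=complex)     # frequency-domain coefficients W(k)
u_hist = np.zeros(2 * M - L)           # samples preceding the current frame
mu = 0.05                              # adaptive step size (illustrative value)

def process_frame(u_new: np.ndarray, d: np.ndarray) -> np.ndarray:
    """One iteration: u_new = L new far-end samples, d = M microphone samples.
    Returns the first L samples of the residual echo (step 7)."""
    global W, u_hist
    u_big = np.concatenate([u_hist, u_new])            # step 3: length-2M frame
    u_hist = u_big[L:]
    U = np.fft.fft(u_big)                              # step 4
    y = np.fft.ifft(U * W).real[-M:]                   # step 5: last M points
    e = d - y                                          # step 6: error signal
    E = np.fft.fft(np.concatenate([np.zeros(M), e]))   # step 8: prepend M zeros
    phi = np.fft.ifft(np.conj(U) * E).real[:M]         # keep only first M points
    W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9
    return e[:L]
```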
In the embodiment above L = 800; in actual use L may also be any other integer smaller than M, such as 600 or 500. In addition, L can be taken as M/n, i.e. 1024/n, with n an integer greater than 1 such that 1024/n is also an integer. For instance, with L = 1024/2 only 4 data frames need to be combined to obtain a large frame of length 2048; the filter coefficients are then updated once every 1024/2 samples, which raises the convergence rate of the filter coefficients and improves efficiency.
Before step 1), the method may further include a voiced-sound detection step and a filtering control step for overall control of the filter's operation, comprising:

a voiced-sound detection step: detecting the sound conditions at the microphone input and the loudspeaker output;

a filtering control step: controlling the operation of the filter according to the result of the voiced-sound detection step, specifically:
if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed; the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if the loudspeaker output is detected as silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both the microphone input and the loudspeaker output are detected as voiced, the adaptive filter is in its normal working state: adaptive filtering is performed and the coefficients are updated, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing.
Here $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's output prediction of $\vec{d}(k)$, and $\vec{e}(k)$ is the residual echo.
The voiced-sound detection judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of the speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the loudspeaker output signal, and L is the length of one frame of the speech signal.
The method also includes a step-size adjustment step: when the coefficient update of the adaptive filter is detected to exceed the set maximum step-size threshold, the coefficient update step size is reduced, for example scaled down by a fixed proportion. When the coefficient update of the adaptive filter is detected to have returned to normal, the step size is restored to its initial value.
A coefficient adjustment step is also included: when the coefficients of the adaptive filter are detected to exceed the set coefficient threshold, the filter coefficients are reduced, effectively preventing the filter coefficients from diverging.
A nonlinear processing step is further included: first, the short-time average amplitude E(e) of the minimized residual signal is computed; then it is judged whether E(e) exceeds the preset nonlinear processing threshold NLPfloor, and if so, the output e' is computed with the following formula:

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.

If E(e) ≤ NLPfloor, e' is directly replaced with comfort noise.
Step 7) may be followed by a nonlinear-processing switch control step, specifically: detecting the sound condition at the loudspeaker output, and turning the nonlinear processing step on or off according to the detection result.
The on/off criterion is specifically: when the loudspeaker output is detected as voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α (for example, α = 6), the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off; where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of the residual signal.
With the technical scheme of the invention, the frequency-domain filter works efficiently and stably. The specific performance indexes obtained in experiments are:

echo suppression: 50-60 dB;

convergence time: less than 1 s;

supported feedback-loop delay: adjustable; for example, at an 8 kHz sample rate with filter length 1024, a delay of 128 ms can be supported.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalents and the like made within the spirit and principle of the present invention are included in its scope.

Claims (22)

1. A microphone echo cancellation device for canceling echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker, the length of each acquired data frame being M; the current, i.e. k-th, frame data is combined with the previous, i.e. (k-1)-th, frame data to jointly form one large frame $\vec{u}(k)$ of length 2M;

a frequency-domain adaptive filter whose current frequency-domain coefficients are $W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$ of length 2M, where w(k) is the filter's time-domain coefficient vector of length M; the frequency-domain adaptive filter applies FFT processing to $\vec{u}(k)$, converting it to the frequency domain to obtain U(k); it filters U(k) with the current filter coefficients W(k) and then applies IFFT processing to the filtering result, taking the last M points of the result, to obtain a one-frame predicted value $\vec{y}(k)$ of the echo;

a subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ of length M collected by the microphone, obtaining a residual echo $\vec{e}(k)$;

the frequency-domain adaptive filter further comprises a speech-correlation detection unit for computing, in the frequency domain, the correlation between the residual echo $\vec{e}(k)$ and the far-end sound signal, giving the speech-correlation parameter $\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)]$, where $U^H(k)$ is the conjugate of said U(k), E(k) is the frequency-domain form of the residual echo $\vec{e}(k)$, and only the first M points of the result are taken;

the frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, in combination with the adaptive step size μ of the adaptive filter, to obtain $W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}$; the coefficients W(k) are updated once each time the frequency-domain adaptive filter performs adaptive filtering, and at the next adaptive filtering pass the filter frequency-domain filters the next combined large data frame with the updated coefficients W(k+1);

the device being characterized in that it further comprises a frame-length adjustment module for setting the data frame length of u to a value L smaller than M;

correspondingly, the data acquisition and combination module combines the L data of the current k-th frame with the immediately preceding 2M-L consecutive data to form a large frame of length 2M;

correspondingly, the frequency-domain adaptive filter adaptively filters the 2M large frame, and after the filtering of each frame of data of length L is completed, the frequency-domain filter coefficients are updated;

and correspondingly, the device further comprises a residual-echo interception module for intercepting the first L signals of each frame's residual echo $\vec{e}(k)$, obtaining the final residual echo e.
2. The echo cancellation device according to claim 1, wherein the frame-length adjustment module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u with the immediately preceding 2n-1 data frames into a large frame of length 2M.
3. The echo cancellation device according to claim 1 or 2, further comprising a voiced-sound detection module and a filtering control module, wherein

the voiced-sound detection module comprises two detection units for detecting the sound conditions at the microphone input and the loudspeaker output respectively, and outputs the detection results to the filtering control module;

the filtering control module controls the operation of the frequency-domain adaptive filter according to the output of the voiced-sound detection module:

if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if the loudspeaker output is detected as silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both the microphone input and the loudspeaker output are detected as voiced, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the coefficients are updated, giving the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ and the updated filter coefficients W(k+1), completing the frame processing.
4. The echo cancellation device according to claim 3, wherein the voiced-sound detection module judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with a noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the sound signal of frame length M acquired by the microphone, M is the frame length, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the signal input to the loudspeaker, $\vec{u}(k)$ is the signal input to the loudspeaker, and L is the frame length.
5. The echo cancellation device of claim 1 or 2, further comprising a step size adjustment module configured to detect a coefficient update step size μ of the adaptive filter, and to decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
6. The echo cancellation device according to claim 5, wherein the coefficient update step is restored to the initial value when it is detected that the update step of the adaptive filter coefficients is restored to normal.
7. The echo cancellation device according to claim 1 or 2, further comprising a coefficient adjustment module configured to decrease the filter coefficients W(k) when detecting that the coefficients W(k) of the adaptive filter are greater than a set coefficient threshold.
8. The echo cancellation device according to claim 1 or 2, further comprising a nonlinear processing module for suppressing nonlinear components in the echo.
9. The echo cancellation device of claim 8, wherein when E(e) > NLPfloor the nonlinear processing module computes

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and also the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
10. The echo cancellation device of claim 8, wherein e' is directly replaced with comfort noise when E(e) ≤ NLPfloor.
11. The echo cancellation device according to claim 8, further comprising:
a loudspeaker voiced-sound detection module for detecting the sound condition at the loudspeaker output; and

a nonlinear-processing control module for turning the nonlinear processing module on or off according to the output of the loudspeaker voiced-sound detection module:

when the loudspeaker voiced-sound detection module detects that the loudspeaker output is voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off;

wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
12. A microphone echo cancellation method using frequency-domain adaptive filtering to cancel an echo d generated by a far-end sound signal u passing through the acoustic loop between a loudspeaker and a microphone, finally obtaining a residual echo e, wherein the time-domain filter coefficients are w(k) of length M and the corresponding frequency-domain filter coefficients are $W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$ of length 2M, the overlap-save method being adopted;

characterized in that:

1) the data frame length L of the signal u acquired each time is set;

2) one frame signal, denoting the k-th frame signal, is collected according to the set frame length L;

3) the current frame is merged with the preceding 2M-L data into a large frame $\vec{u}(k)$ of length 2M;

4) $\vec{u}(k)$ is converted to the frequency domain and, using overlap-save, filtered with the filter coefficients W(k); the result is converted back to the time domain to obtain the predicted value $\vec{y}(k)$ of the echo in the time domain;

5) the collected echo $\vec{d}(k)$ has $\vec{y}(k)$ subtracted from it, obtaining the k-th frame minimized residual echo signal $\vec{e}(k)$;

6) the filter coefficients W(k) are updated according to $\vec{e}(k)$ and $\vec{u}(k)$, obtaining W(k+1);

7) return to step 2): the next frame signal is acquired and merged, and frequency-domain adaptive filtering is performed with the updated filter coefficients, until the data input ends.
13. The method of claim 12, wherein the frequency-domain adaptive filtering algorithm comprises the steps of:

1) frame-length adjustment: the frame length of u is adjusted from M to a positive integer value L smaller than M;

2) the k-th frame signal of u is collected with frame length L and denoted $\vec{u}_k$;

3) the L data of $\vec{u}_k$ are combined with the immediately preceding 2M-L data to form a large frame $\vec{u}(k)$ of length 2M, where

u(kL-2M+L) is the (2M-L)-th datum before the original k-th frame,

u(kL-2) is the 2nd datum before the original k-th frame,

u(kL-1) is the datum immediately before the original k-th frame,

u(kL) is the 1st datum of the original k-th frame,

u(kL+L-1) is the L-th datum of the original k-th frame;

4) FFT processing is applied to $\vec{u}(k)$, converting it to the frequency domain: $U(k) = \mathrm{FFT}[\vec{u}(k)]$;

5) U(k) is filtered with the current filter coefficients W(k) using the overlap-save method, i.e. multiplied in the frequency domain; the last M data of the IFFT of the product are taken and denoted $\vec{y}(k)$, i.e. $\vec{y}(k)$ = the last M points of $\mathrm{IFFT}[U(k) \cdot W(k)]$;

6) after u is played by the loudspeaker and passes through the acoustic loop between the loudspeaker and the microphone, the microphone collects an echo signal of length M, denoted $\vec{d}(k)$; subtracting the $\vec{y}(k)$ of step 5) from $\vec{d}(k)$ gives the error signal $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$;

7) the first L signals of the resulting $\vec{e}(k)$ are intercepted and output as the final residual echo;

8) the full-length (not intercepted) $\vec{e}(k)$ of length M is prepended with M zeros and FFT processing is applied: $E(k) = \mathrm{FFT}\!\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$; at the same time, the U(k) of step 4) is conjugated to obtain $U^H(k)$ and multiplied point-wise with E(k); the IFFT of the product is taken and, per the overlap-save method, $\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)]$, where the trailing part of the result is discarded and only the first M points of the IFFT result are kept;

9) $\vec{\varphi}(k)$ is appended with M zeros, FFT processing is applied, the result is multiplied by the adaptive step size μ, and the product is added to the filter coefficients W(k), giving the updated frequency-domain filter coefficients $W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}$; the next adaptive filtering pass uses the updated filter coefficients W(k+1);

10) return to step 2) until the input of the far-end sound signal ends.
14. The method of claim 12 or 13, wherein the value of L is M/n, and n is an integer greater than 1.
15. The method according to claim 12 or 13, further comprising, before step 1), a voiced-sound detection step and a filtering control step, comprising:

a voiced-sound detection step: detecting the sound conditions at the microphone input and the loudspeaker output;

a filtering control step: controlling the operation of the filter according to the result of the voiced-sound detection step;

specifically:

if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if it is silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both detection results are voiced, the adaptive filter is in its normal working state: adaptive filtering is performed, the coefficients are updated, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

where $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's output prediction of $\vec{d}(k)$, and $\vec{e}(k)$ is the residual echo.
16. The method of claim 15, wherein the voiced-sound detection judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with a noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of the speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the loudspeaker output signal, and L is the length of one frame of the speech signal.
17. The method according to claim 12 or 13, further comprising a step size adjustment step of decreasing the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than a set maximum coefficient update step size threshold.
18. The method of claim 17, wherein the coefficient update step size is restored to the initial value upon detecting that the update step size of the adaptive filter coefficients is restored to normal.
19. The method according to claim 12 or 13, further comprising a coefficient adjusting step for reducing the coefficients of the filter when it is detected that the coefficients of the adaptive filter are greater than a set coefficient threshold.
20. The method according to claim 12 or 13, further comprising a nonlinear processing step of:

first computing the short-time average amplitude E(e) of the minimized residual signal;

then judging whether E(e) exceeds a preset nonlinear processing threshold NLPfloor, and if so, computing the output e' with the formula

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and also the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
21. The method of claim 20, wherein if E(e) ≤ NLPfloor, e' is directly replaced with comfort noise.
22. The method according to claim 20, further comprising a nonlinear-processing switch control step, specifically:

detecting the sound condition at the loudspeaker output;

turning the nonlinear processing step on or off according to the detection result, specifically:

when the loudspeaker output is detected as voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off; wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, E(e) is the short-time average amplitude of the residual signal, and α is a preset multiple.
CNB2006101440555A 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof Expired - Fee Related CN100524466C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101440555A CN100524466C (en) 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof


Publications (2)

Publication Number Publication Date
CN1953060A true CN1953060A (en) 2007-04-25
CN100524466C CN100524466C (en) 2009-08-05

Family

ID=38059354

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101440555A Expired - Fee Related CN100524466C (en) 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof

Country Status (1)

Country Link
CN (1) CN100524466C (en)


Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101192411B (en) * 2007-12-27 2010-06-02 北京中星微电子有限公司 Large distance microphone array noise cancellation method and noise cancellation system
CN101217039B (en) * 2008-01-08 2011-11-23 北京中星微电子有限公司 A method, system and device for echo elimination
CN102204276B (en) * 2008-11-05 2015-04-15 雅马哈株式会社 Sound emission and collection device, and sound emission and collection method
US8855327B2 (en) 2008-11-05 2014-10-07 Yamaha Corporation Sound emission and collection device and sound emission and collection method
CN102204276A (en) * 2008-11-05 2011-09-28 雅马哈株式会社 Sound emission and collection device, and sound emission and collection method
CN101771925B (en) * 2008-12-30 2013-07-31 Gn瑞声达A/S Hearing instrument with improved initialisation of parameters of digital feedback suppression circuitry
CN101771925A (en) * 2008-12-30 2010-07-07 Gn瑞声达A/S Hearing instrument with improved initialisation of parameters of digital feedback suppression circuitry
CN102131014A (en) * 2010-01-13 2011-07-20 歌尔声学股份有限公司 Device and method for eliminating echo by combining time domain and frequency domain
CN101888455B (en) * 2010-04-09 2013-07-03 熔点网讯(北京)科技有限公司 Self-adaptive echo counteracting method for frequency domain
CN101888455A (en) * 2010-04-09 2010-11-17 熔点网讯(北京)科技有限公司 Self-adaptive echo counteracting method for frequency domain
CN102956236A (en) * 2011-08-15 2013-03-06 索尼公司 Information processing device, information processing method and program
CN102387272A (en) * 2011-09-09 2012-03-21 南京大学 Restraining method for residual echo in echo cancellation system
CN102413384A (en) * 2011-11-16 2012-04-11 杭州艾力特音频技术有限公司 Echo cancellation two-way voice talk back equipment
CN103366757A (en) * 2012-04-09 2013-10-23 广达电脑股份有限公司 Communication system and method with echo cancellation mechanism
CN106664481B (en) * 2014-03-19 2019-06-07 思睿逻辑国际半导体有限公司 The nonlinear Control of loudspeaker
CN106664481A (en) * 2014-03-19 2017-05-10 思睿逻辑国际半导体有限公司 Non-linear control of loudspeakers
CN110225214A (en) * 2014-04-02 2019-09-10 想象技术有限公司 Control method, attenuation units, system and the medium fed back to sef-adapting filter
CN110225214B (en) * 2014-04-02 2021-05-28 想象技术有限公司 Method, attenuation unit, system and medium for attenuating a signal
CN106716527B (en) * 2014-07-31 2021-06-08 皇家Kpn公司 Noise suppression system and method
CN106716527A (en) * 2014-07-31 2017-05-24 皇家Kpn公司 Noise suppression system and method
CN106067301A (en) * 2016-05-26 2016-11-02 浪潮(苏州)金融技术服务有限公司 A kind of method using multidimensional technology to carry out echo noise reduction
CN106067301B (en) * 2016-05-26 2019-06-25 浪潮金融信息技术有限公司 A method of echo noise reduction is carried out using multidimensional technology
CN110024025B (en) * 2016-11-23 2023-05-23 哈曼国际工业有限公司 Dynamic stability control system based on coherence
CN110024025A (en) * 2016-11-23 2019-07-16 哈曼国际工业有限公司 Dynamic stability control system based on coherence
CN106713685A (en) * 2016-11-25 2017-05-24 东莞市嘉松电子科技有限公司 Hands-free communication control method
CN106910500A (en) * 2016-12-23 2017-06-30 北京第九实验室科技有限公司 The method and apparatus of Voice command is carried out to the equipment with microphone array
CN107123430A (en) * 2017-04-12 2017-09-01 广州视源电子科技股份有限公司 Echo cancellation method, device, conference tablet and computer storage medium
WO2018188282A1 (en) * 2017-04-12 2018-10-18 广州视源电子科技股份有限公司 Echo cancellation method and device, conference tablet computer, and computer storage medium
CN107123430B (en) * 2017-04-12 2019-06-04 广州视源电子科技股份有限公司 Echo cancellation method, device, conference tablet and computer storage medium
CN107071197B (en) * 2017-05-16 2020-04-24 中山大学花都产业科技研究院 Echo cancellation method and system based on full-phase multi-delay block frequency domain
CN107071197A (en) * 2017-05-16 2017-08-18 中山大学花都产业科技研究院 A kind of echo removing method and system based on the piecemeal frequency domain of delay more than all phase
CN107017004A (en) * 2017-05-24 2017-08-04 建荣半导体(深圳)有限公司 Noise suppressing method, audio processing chip, processing module and bluetooth equipment
CN109215672B (en) * 2017-07-05 2021-11-16 苏州谦问万答吧教育科技有限公司 Method, device and equipment for processing sound information
CN109215672A (en) * 2017-07-05 2019-01-15 上海谦问万答吧云计算科技有限公司 A kind of processing method of acoustic information, device and equipment
CN107393546A (en) * 2017-09-04 2017-11-24 恒玄科技(上海)有限公司 A kind of echo cancel method and speech recognition apparatus for speech recognition process
WO2019128402A1 (en) * 2017-12-26 2019-07-04 深圳Tcl新技术有限公司 Method, system and storage medium for solving echo cancellation failure
US11276416B2 (en) 2017-12-26 2022-03-15 Shenzhen Tcl New Technology Co., Ltd. Method, system and storage medium for solving echo cancellation failure
CN108986836A (en) * 2018-08-29 2018-12-11 质音通讯科技(深圳)有限公司 A kind of control method of echo suppressor, device, equipment and storage medium
CN109102821A (en) * 2018-09-10 2018-12-28 苏州思必驰信息科技有限公司 Delay time estimation method, system, storage medium and electronic equipment
CN110913310A (en) * 2018-09-14 2020-03-24 成都启英泰伦科技有限公司 Echo cancellation method for broadcast distortion correction
CN109346096A (en) * 2018-10-18 2019-02-15 深圳供电局有限公司 Echo cancellation method and device for voice recognition process
CN109346096B (en) * 2018-10-18 2021-07-06 深圳供电局有限公司 Echo cancellation method and device for voice recognition process
CN110838300A (en) * 2019-11-18 2020-02-25 紫光展锐(重庆)科技有限公司 Echo cancellation processing method and processing system
CN110838300B (en) * 2019-11-18 2022-03-25 紫光展锐(重庆)科技有限公司 Echo cancellation processing method and processing system
CN111091846B (en) * 2019-12-26 2022-07-26 江亨湖 Noise reduction method and echo cancellation system applying same
CN111091846A (en) * 2019-12-26 2020-05-01 江亨湖 Noise reduction method and echo cancellation system applying same
CN111341336A (en) * 2020-03-16 2020-06-26 北京字节跳动网络技术有限公司 Echo cancellation method, device, terminal equipment and medium
CN111341336B (en) * 2020-03-16 2023-08-08 北京字节跳动网络技术有限公司 Echo cancellation method, device, terminal equipment and medium

Also Published As

Publication number Publication date
CN100524466C (en) 2009-08-05

Similar Documents

Publication Publication Date Title
CN100524466C (en) Echo elimination device for microphone and method thereof
US7003099B1 (en) Small array microphone for acoustic echo cancellation and noise suppression
US6597787B1 (en) Echo cancellation device for cancelling echos in a transceiver unit
US7773759B2 (en) Dual microphone noise reduction for headset application
JP5049277B2 (en) Method and system for clear signal acquisition
US9264807B2 (en) Multichannel acoustic echo reduction
EP3080975B1 (en) Echo cancellation
EP0843934B1 (en) Arrangement for suppressing an interfering component of an input signal
EP1855457A1 (en) Multi channel echo compensation using a decorrelation stage
JP5148150B2 (en) Equalization in acoustic signal processing
US20040264610A1 (en) Interference cancelling method and system for multisensor antenna
EP1081985A2 (en) Microphone array processing system for noisly multipath environments
JPH09504668A (en) Variable block size adaptive algorithm for noise-resistant echo canceller
JP2002501337A (en) Method and apparatus for providing comfort noise in a communication system
US11189297B1 (en) Tunable residual echo suppressor
CN102185991A (en) Echo cancellation method, system and device
US20180308503A1 (en) Real-time single-channel speech enhancement in noisy and time-varying environments
Albu et al. The hybrid simplified Kalman filter for adaptive feedback cancellation
EP3692703A1 (en) Echo canceller and method therefor
JPH09307625A (en) Sub band acoustic noise suppression method, circuit and device
CN107005268B (en) Echo cancellation device and echo cancellation method
EP2930917B1 (en) Method and apparatus for updating filter coefficients of an adaptive echo canceller
Bulling et al. Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance.
Yang Multilayer adaptation based complex echo cancellation and voice enhancement
US6507623B1 (en) Signal noise reduction by time-domain spectral subtraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090805

Termination date: 20201124

CF01 Termination of patent right due to non-payment of annual fee