CN1953060A - Echo cancellation device and method for microphone - Google Patents

Info

Publication number: CN1953060A (application CNA2006101440555A)
Other versions: CN100524466C (granted)
Authority: CN (China)
Inventor: 张晨
Original and current assignee: Vimicro Corp
Original language: Chinese (zh)
Events: application filed by Vimicro Corp; publication of CN1953060A; application granted; publication of CN100524466C
Legal status: Granted; Expired - Fee Related

Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

This invention discloses a microphone echo cancellation device and method for cancelling the echo generated by the acoustic loop between a loudspeaker and a microphone. The device comprises a frame length adjusting module which sets the data frame length below the time-domain coefficient length of the adaptive filter, so that several frames are combined into one large frame for frequency-domain adaptive filtering, increasing the coefficient update rate of the adaptive filter.

Description

Echo cancellation device and echo cancellation method for microphone
Technical Field
The present invention relates to the field of echo cancellation, and in particular, to a microphone echo cancellation device and method using a frequency domain adaptive filter, which are used for canceling an echo generated by an acoustic loop between a speaker and a microphone.
Background
An echo arises when an acoustic loop exists between the loudspeaker and the microphone. As shown in fig. 1, the sound signal from the far end, which reaches the near end through the communication connection and is recorded as signal u, is emitted through the near-end loudspeaker, passes through the acoustic loop g between the loudspeaker and the microphone, is collected by the microphone as the reference signal d, and is then transmitted back to the far end through the communication connection. The far-end talker thus hears an echo of his own voice, i.e. the far-end echo, which seriously degrades call quality.
Since the acoustic loop g from the loudspeaker to the microphone is unknown and time-varying, adaptive filtering is widely adopted in echo cancellation schemes. Fig. 1 shows the basic principle of echo cancellation by adaptive filtering. Taking minimization of the residual echo e as its target, the adaptive filter filters the far-end sound signal u, adaptively adjusting its filter coefficients to track the acoustic feedback loop g from loudspeaker to microphone and to generate a prediction y of the echo d received by the microphone. When the filter tracks g accurately, y is very close to d, so that e = d − y tends to 0, achieving echo cancellation.
In the adaptive filtering process, the adaptive filter must track an unknown feedback loop, i.e. model an unknown system. When the unknown feedback loop g has a large delay, i.e. the unknown system has a high order, the adaptive filter needs at least the same order to model it well. Since time-domain adaptive filtering is a convolution of the input signal with the adaptive filter, the complexity of the algorithm grows sharply with the filter order and becomes impractical when the feedback-loop delay is large. Subband adaptive filtering can reduce the computational complexity, but introduces signal aliasing problems.
Convolution in the time domain equals multiplication in the frequency domain; with the fast algorithm provided by the FFT, a frequency-domain adaptive filtering algorithm can reduce algorithmic complexity and improve computational efficiency when the filter order is high, making it a very practical filtering approach.
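As a quick illustration of the convolution theorem the algorithm relies on (a minimal numpy sketch, not part of the patent; the toy data are arbitrary): zero-padding two length-M sequences to N = 2M and multiplying their FFTs reproduces their linear convolution.

```python
import numpy as np

# Linear convolution of h (length M) with u (length M) has length 2M-1,
# so an FFT of length N >= 2M-1 avoids circular wrap-around (aliasing).
M = 4
h = np.array([1.0, 0.5, 0.25, 0.125])   # toy filter coefficients
u = np.array([1.0, -1.0, 2.0, 0.5])     # toy input frame
N = 2 * M                                # same N = 2M sizing as in the patent

direct = np.convolve(h, u)                                        # time domain
via_fft = np.fft.ifft(np.fft.fft(h, N) * np.fft.fft(u, N)).real[:2*M - 1]

assert np.allclose(direct, via_fft)      # identical up to rounding
```

The same N = 2M sizing is what the overlap-save steps below use to keep the circular convolution free of aliasing.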
The frequency domain adaptive filtering algorithm in the prior art is generally as follows.
Some of the signal notation used hereinafter is first explained. In frequency-domain adaptive filtering, the input signal is processed in units of frames. In this document, $\vec{x}(k)$ denotes the current, i.e. k-th, frame of a signal x. For example, $\vec{u}'(k)$ denotes the k-th frame of the sound signal coming from the far end and about to be output to the loudspeaker; $\vec{u}(k)$ denotes the combined sound signal of length 2M; and $\vec{d}(k)$ denotes the k-th frame echo signal acquired by the microphone. Further, w(k) denotes the time-domain filter coefficients and W(k) the corresponding frequency-domain filter coefficients. FFT denotes the fast Fourier transform and IFFT the inverse fast Fourier transform.
An echo cancellation device to which a frequency domain adaptive filter is applied generally includes the following components.
(1) A data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker. The length of the data frame acquired each time is M; the current, i.e. k-th, frame is recorded as $\vec{u}'(k)$ and is combined with the previous, i.e. (k−1)-th, frame $\vec{u}'(k-1)$ to form a large frame $\vec{u}(k)$ of length 2M.
(2) A frequency-domain adaptive filter. Assuming the order of the adaptive filter is M, its time-domain coefficients are denoted w(k). Because the overlap-save method is adopted, the order-M filter is extended with M zeros to avoid aliasing, giving a filter of N = 2M coefficients whose frequency-domain coefficients after FFT processing are

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix},$$

of length 2M. The frequency-domain adaptive filter performs FFT processing on $\vec{u}(k)$, converting it to the frequency domain to obtain $U(k) = \mathrm{FFT}[\vec{u}(k)]$; U(k) is filtered with the current filter coefficients W(k), and the filtering result is IFFT-processed to obtain a one-frame prediction of the echo $\vec{d}(k)$:

$$\vec{y}(k) = \mathrm{IFFT}[U(k) \ast W(k)],$$

of which the last M points are taken.
(3) A subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ collected by the microphone, obtaining the residual echo $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$; the collected $\vec{d}(k)$ also has length M.
(4) The frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation of the residual echo $\vec{e}(k)$ with the far-end sound signal $\vec{u}(k)$, obtaining the speech correlation parameter

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)],$$

where $U^{H}(k)$ is the conjugate of U(k) and $E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$; the first M points of the result are taken.
(5) The frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}.$$
The coefficients W(k) of the frequency-domain adaptive filter are updated once per adaptive filtering pass; at the next pass, the adaptive filter takes the updated W(k+1) as the current W(k) and performs frequency-domain filtering on the next combined large frame of data.
Fig. 2 is a schematic diagram of a prior-art method of echo cancellation by frequency-domain adaptive filtering, where thin arrows represent time-domain signal processing and thick arrows represent frequency-domain signal processing. Since the frequency-domain adaptive filtering method processes signals frame by frame, the u, y, d and e signals of fig. 1 correspond in fig. 2 to $\vec{u}'(k)$, $\vec{y}(k)$, $\vec{d}(k)$ and $\vec{e}(k)$, the k-th frame of each respective signal; in addition, $\vec{u}(k)$ denotes the large frame of length 2M obtained by combining the data of $\vec{u}'(k)$ with the previous frame. It is known that blockwise processing and recombination of a truncated long sequence must use the overlap-add or overlap-save method to avoid aliasing; the overlap-save method is described here.
First, assuming the order of the time-domain adaptive filter is M and its coefficients are denoted w(k): because the overlap-save method is adopted, the order-M filter is extended with M zeros to avoid aliasing, and the frequency-domain coefficient vector of the filter obtained after FFT processing is:

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix} \qquad (1.1)$$
As equation (1.1) shows, the length N of the frequency-domain adaptive filter coefficients W(k) is twice the length M of the time-domain coefficient vector. In a frequency-domain adaptive filtering algorithm, both the adaptive filtering and the filter coefficient update are done in the frequency domain, so the time-domain form of the filter never appears explicitly. Note that all FFT and IFFT processing mentioned below is over N points.
The steps of the frequency domain adaptive filtering processing are as follows:
1) Collect one frame of the sound signal from the far end, $\vec{u}'(k)$; the frame length is M.
2) Process the input signal $\vec{u}'(k)$ by concatenating two frames, i.e. merge $\vec{u}'(k)$ with the data of the previous, (k−1)-th, frame into one large frame:

$$\vec{u}(k) = [u(kM-M), \ldots, u(kM-1), u(kM), \ldots, u(kM+M-1)] \qquad (1.2)$$

where $\vec{u}(k)$ is the k-th merged large frame, of length N = 2M;
u(kM−M) is the 1st datum of the original (k−1)-th frame;
u(kM−1) is the M-th datum of the original (k−1)-th frame;
u(kM) is the 1st datum of the original k-th frame;
u(kM+M−1) is the M-th datum of the original k-th frame.
3) FFT-process $\vec{u}(k)$ and convert it to the frequency domain, obtaining:

$$U(k) = \mathrm{FFT}[\vec{u}(k)] \qquad (1.3)$$
4) Filter the input signal, i.e. multiply in the frequency domain, then IFFT-process and convert back to the time domain; the last M data of the result are the predicted value of the echo signal:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \mathrm{IFFT}[U(k) \ast W(k)] \qquad (1.4)$$
5) The collected echo signal is denoted $\vec{d}(k)$, i.e.:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)] \qquad (1.5)$$

The residual echo signal is the difference between the echo signal and its predicted value:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k) \qquad (1.6)$$

6) M zeros are prepended to the residual echo signal, and FFT processing yields the frequency-domain residual echo signal:

$$E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix} \qquad (1.7)$$
the update amount of the adaptive filter coefficients is calculated using e (k) and u (k). First, conjugate U (k) to obtain UH(k) In that respect In the frequency domain, the update amount of the adaptive filter coefficient vector is determined by calculating the correlation between the error signal and the input signal, since the linear correlation is equivalent in form to an inverse linear convolution, a fast algorithm having FFT in the frequency domain by means of convolution in the time domain has:
<math> <mrow> <mover> <mi>&phi;</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>IFFT</mi> <mo>&lsqb;</mo> <msup> <mi>U</mi> <mi>H</mi> </msup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>*</mo> <mi>E</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>&rsqb;</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1.8</mn> <mo>)</mo> </mrow> </mrow> </math>
according to the overlap-and-hold method, in the above formula, the frame after the result needs to be deleted, i.e. only the first M points of the IFFT result are taken.
7) Finally, $\vec{\varphi}(k)$ is used to update the adaptive filter coefficients. Note that the frequency-domain filter coefficients were generated by zero-padding the time-domain coefficients followed by FFT processing; correspondingly, $\vec{\varphi}(k)$ is appended with M zeros and FFT-processed, the result is multiplied by the adaptive step size μ, and the product is added to the pre-update filter coefficients W(k), giving the frequency-domain form of the filter coefficient update:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix} \qquad (1.9)$$
At the next adaptive filtering pass, the updated W(k+1) is adopted as the current filter coefficients W(k) for filtering.
8) Steps 1) to 7) are performed cyclically until the data processing ends (a sketch of this loop follows below).
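The following is a minimal numpy sketch of this prior-art loop, steps 1)–7) with equations (1.1)–(1.9). The function name, the fixed step size, and the absence of any power normalization are illustrative simplifications, not part of the patent.

```python
import numpy as np

def fdaf_overlap_save(u, d, M, mu=0.1):
    """Prior-art overlap-save frequency-domain adaptive filter.
    u: far-end signal, d: microphone signal, M: frame length
    (= time-domain filter order), mu: step size. Returns residual echo."""
    N = 2 * M
    W = np.zeros(N, dtype=complex)       # frequency-domain coefficients W(k)
    u_old = np.zeros(M)                  # previous frame u'(k-1)
    e_out = np.zeros(len(d))
    for k in range(min(len(u), len(d)) // M):
        u_new = u[k*M:(k+1)*M]                              # step 1): frame u'(k)
        U = np.fft.fft(np.concatenate([u_old, u_new]))      # steps 2)-3): eqs. (1.2)-(1.3)
        y = np.fft.ifft(U * W).real[M:]                     # step 4): last M points, eq. (1.4)
        e = d[k*M:(k+1)*M] - y                              # step 5): eq. (1.6)
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))    # step 6): eq. (1.7)
        phi = np.fft.ifft(np.conj(U) * E).real[:M]          # eq. (1.8), first M points
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 7): eq. (1.9)
        e_out[k*M:(k+1)*M] = e
        u_old = u_new
    return e_out
```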
As these steps show, the filter coefficients of the frequency-domain adaptive filter are updated only once per signal frame of length M, so the convergence rate is slow; especially when the characteristics of the feedback loop change quickly, the effect is not ideal.
Disclosure of Invention
In order to solve the above-mentioned drawbacks of the prior art, the present invention provides an echo cancellation device and an echo cancellation method, so that the coefficients of the frequency domain adaptive filter can work efficiently and stably, thereby achieving the purpose of effectively canceling echo.
In order to solve the above problem, the present invention provides a microphone echo cancellation device for canceling an echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker, wherein the length of the data frame acquired each time is M, the current, i.e. k-th, frame is recorded as $\vec{u}'(k)$, and it is combined with the previous, i.e. (k−1)-th, frame $\vec{u}'(k-1)$ to jointly form a large frame $\vec{u}(k)$ of length 2M;
a frequency-domain adaptive filter whose current frequency-domain filter coefficients are denoted $W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$, of length 2M, where w(k) are the time-domain coefficients of the filter, of length M; the frequency-domain adaptive filter performs FFT processing on $\vec{u}(k)$, converting it to the frequency domain to obtain $U(k) = \mathrm{FFT}[\vec{u}(k)]$; U(k) is filtered with the current filter coefficients W(k), and the filtering result is IFFT-processed to obtain a one-frame prediction of the echo $\vec{d}(k)$: $\vec{y}(k) = \mathrm{IFFT}[U(k) \ast W(k)]$, taking the last M points of the result;
a subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ of length M collected by the microphone, obtaining the residual echo $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$;
the frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation of the residual echo $\vec{e}(k)$ with the far-end sound signal $\vec{u}(k)$, obtaining the speech correlation parameter $\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)]$, where $U^{H}(k)$ is the conjugate of U(k), $E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$, and the first M points of the result are taken;
the frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the coefficients W(k) of the frequency-domain adaptive filter are updated once per adaptive filtering pass, and at the next pass the adaptive filter uses the updated coefficients W(k+1) to filter the next combined large frame of data;
the frame length adjusting module is used for setting the data frame length of the u to a value L smaller than M;
correspondingly, the data acquisition and combination module is used for combining L data of the current kth frame data and the immediately preceding 2M-L continuous data to form a large frame with the length of 2M;
accordingly, the frequency-domain adaptive filter adaptively filters the 2M large frame, and the frequency-domain filter coefficients are updated after each frame of data of length L has been filtered;
and correspondingly, a residual echo interception module is also included, for intercepting the first L signals of each frame of the residual echo $\vec{e}(k)$ to obtain the final residual echo e.
Preferably, the frame length adjusting module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u and the immediately preceding 2n−1 data frames into a large frame of length 2M.
Preferably, the system also comprises an acoustic detection module and a filtering control module,
the sound detection module comprises two sound detection units which are respectively used for detecting sound conditions of the microphone input end and the loudspeaker output end and outputting the detection results to the filtering control module;
the filtering control module is used for controlling the work of the frequency domain self-adaptive filter according to the output result of the sound detection module,
if the microphone input end sound detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;
if the microphone input end detection result is voiced, the loudspeaker output end result is then examined; if the loudspeaker output end is silent, adaptive filtering is performed normally but coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
if both the microphone input end and the loudspeaker output end detection results are voiced, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the coefficients are also updated, giving the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ and the updated filter coefficients W(k+1), completing the frame processing.
Preferably, the sound detection module determines whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input end and the loudspeaker output end with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is judged silent;

where $\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1} |\vec{d}(k)|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the sound signal of frame length M acquired by the microphone, M is the frame length, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is judged silent;

where $\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1} |\vec{u}(k)|$ is the short-time average amplitude of the signal input to the loudspeaker, $\vec{u}(k)$ is the signal input to the loudspeaker, and L is the frame length.
Preferably, the apparatus further comprises a step size adjusting module, configured to detect a coefficient update step size μ of the adaptive filter, and decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
Preferably, when it is detected that the update step size of the adaptive filter coefficient is restored to normal, the coefficient update step size is restored to the initial value.
Preferably, the adaptive filter further comprises a coefficient adjusting module, configured to decrease the filter coefficient w (k) when detecting that the coefficient w (k) of the adaptive filter is greater than a set coefficient threshold.
Preferably, the method further comprises the following steps: and the nonlinear processing module is used for suppressing nonlinear components in the echo.
Preferably, when E(e) > NLPfloor, the nonlinear processing module processes the signal, where e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, when E(e) ≤ NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises the following steps:
the loudspeaker sound detection module is used for detecting the sound condition of the output end of the loudspeaker;
the nonlinear processing control module is used for turning on or off the nonlinear processing module according to the output result of the loudspeaker sound detection module;
when the loudspeaker sound detection module detects that the loudspeaker output end is voiced, i.e. SpkSignal_avg > NoiseFloor,
and the signal at the loudspeaker output end is more than α times the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either of the two conditions is not met, the NLP processing is turned off;
where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
The invention also provides a microphone echo cancellation method, which uses a frequency-domain adaptive filtering method to cancel the echo d generated when the far-end sound signal u passes through the acoustic loop between a loudspeaker and a microphone, finally obtaining a residual echo e. The time-domain filter coefficients are w(k), of length M, and the corresponding frequency-domain filter coefficients are

$$W(k) = \mathrm{FFT}\begin{bmatrix} w(k) \\ 0 \end{bmatrix},$$

of length 2M, the overlap-save method being adopted. The method comprises the following steps:
1) setting the data frame length L of the signal u acquired each time;
2) collecting one frame of signal $\vec{u}'(k)$ according to the set frame length L, where $\vec{u}'(k)$ denotes the k-th frame signal;
3) merging the current frame $\vec{u}'(k)$ with the previous 2M−L data into a large frame $\vec{u}(k)$ of length 2M;
4) converting $\vec{u}(k)$ to the frequency domain and, using the overlap-save method, filtering it with the frequency-domain filter coefficients W(k), then converting the result back to the time domain to obtain the time-domain echo prediction $\vec{y}(k)$;
5) collecting the echo $\vec{d}(k)$ and subtracting $\vec{y}(k)$ to obtain the k-th frame minimized residual echo signal $\vec{e}(k)$;
6) updating the filter coefficients W(k) according to $\vec{e}(k)$ and $\vec{u}(k)$ to obtain W(k+1);
7) returning to step 2): acquiring and merging the next frame of signal and performing frequency-domain adaptive filtering with the updated filter coefficients, until the data input ends.
Preferably, the frequency domain adaptive filtering algorithm includes the following steps:
1) frame length adjustment, namely adjusting the frame length of u from M to a positive integer value L smaller than M;
2) collecting the k-th frame signal of u, of frame length L, recorded as $\vec{u}'(k)$;
3) combining the L data of $\vec{u}'(k)$ with the immediately preceding 2M−L data to form one large frame of length 2M:

$$\vec{u}(k) = [u(kL-2M+L), \ldots, u(kL-2), u(kL-1), u(kL), \ldots, u(kL+L-1)]$$

where u(kL−2M+L) is the (2M−L)-th datum before the original k-th frame,
u(kL−2) is the 2nd datum before the original k-th frame,
u(kL−1) is the datum immediately before the original k-th frame,
u(kL) is the 1st datum of the original k-th frame,
u(kL+L−1) is the L-th datum of the original k-th frame;
4) FFT-processing $\vec{u}(k)$ and converting it to the frequency domain to obtain: $U(k) = \mathrm{FFT}[\vec{u}(k)]$;
5) filtering U(k) with the current filter coefficients W(k) by the overlap-save method, i.e. multiplying in the frequency domain; after IFFT processing, the last M data of the result are taken and recorded as $\vec{y}(k)$, namely:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \mathrm{IFFT}[U(k) \ast W(k)];$$
6) after u is played by the loudspeaker, passes through the acoustic loop between the loudspeaker and the microphone, and is collected by the microphone, an echo signal of length M is obtained, denoted $\vec{d}(k)$, i.e.:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)];$$

subtracting the $\vec{y}(k)$ of step 5) from $\vec{d}(k)$ gives the error signal $\vec{e}(k)$:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k);$$
7) intercepting the first L signals of the $\vec{e}(k)$ result and outputting them as the final residual echo;
8) prepending M zeros to the unintercepted, length-M $\vec{e}(k)$ and FFT-processing to obtain:

$$E(k) = \mathrm{FFT}\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix};$$

meanwhile, conjugating the U(k) of step 4) to obtain $U^{H}(k)$, point-multiplying it with E(k), and IFFT-processing the result; according to the overlap-save method:

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^{H}(k) \ast E(k)],$$

where the latter half of the result must be discarded, i.e. only the first M points of the IFFT result are taken;
9) appending M zeros to the $\vec{\varphi}(k)$ and FFT-processing, multiplying the result by the adaptive step size μ, and adding the product to the filter coefficients W(k), giving the updated filter coefficients in frequency-domain form:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the next adaptive filtering pass uses the updated filter coefficients W(k+1);
10) returning to step 2), until the input of the far-end sound signal ends (see the sketch after this list).
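A minimal numpy sketch of this improved loop follows. The FIFO handling mirrors steps 3)–9); the zero pre-padding of the microphone signal, used to align the M-long echo window with the hop of L samples, is my own bookkeeping assumption, since the patent's formulas keep the frame indexing of the baseline method. Names and normalization are illustrative.

```python
import numpy as np

def fdaf_short_frame(u, d, M, L, mu=0.1):
    """Improved method: far-end frames of length L < M are pushed into a
    2M-sample FIFO, the coefficients are updated once per L samples, and
    the first L samples of each M-long residual frame are emitted."""
    N = 2 * M
    W = np.zeros(N, dtype=complex)              # frequency-domain coefficients W(k)
    fifo = np.zeros(N)                          # latest 2M far-end samples
    dp = np.concatenate([np.zeros(M - L), d])   # assumed alignment of mic window
    e_out = []
    for k in range(len(u) // L):
        fifo = np.concatenate([fifo[L:], u[k*L:(k+1)*L]])  # step 3): 2M large frame
        U = np.fft.fft(fifo)                               # step 4)
        y = np.fft.ifft(U * W).real[M:]                    # step 5): last M points
        d_win = dp[k*L:k*L + M]                            # step 6): M-long echo frame
        if len(d_win) < M:
            break
        e = d_win - y                                      # residual, length M
        e_out.append(e[:L])                                # step 7): keep first L
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))   # step 8)
        phi = np.fft.ifft(np.conj(U) * E).real[:M]
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9)
    return np.concatenate(e_out) if e_out else np.zeros(0)
```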
Preferably, the value of L is M/n, and n is an integer greater than 1.
Preferably, before the step 1), the method further comprises a sound detection step and a filtering control step, and the method comprises the following steps:
a sound detection step, detecting sound conditions of the microphone input end and the loudspeaker output end;
a filtering control step of controlling the operation of a filter according to the result of the voiced sound detection step;
the method specifically comprises the following steps:
if the detection result of the microphone input end is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;
if the detection result of the microphone input end is voiced, the loudspeaker output end result is examined; if the loudspeaker output end is silent, adaptive filtering is performed normally but coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
if the detection results of both the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in its normal working state: adaptive filtering is performed and the coefficients are also updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing;
where $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's predicted value of it, and $\vec{e}(k)$ is the residual echo.
Preferably, the sound detection determines whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input end and the loudspeaker output end with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is judged silent;

where $\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1} |\vec{d}(k)|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is judged silent;

where $\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1} |\vec{u}(k)|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the signal output to the loudspeaker, and L is the length of one frame of speech signal.
Preferably, the method further comprises a step size adjusting step, configured to decrease the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than the set maximum coefficient update step size threshold.
Preferably, when it is detected that the update step size of the adaptive filter coefficient is restored to normal, the coefficient update step size is restored to the initial value.
Preferably, the method further comprises a coefficient adjusting step, configured to decrease the coefficient of the filter when the coefficient of the adaptive filter is detected to be greater than the set coefficient threshold.
Preferably, the method further comprises the following nonlinear processing steps:
firstly, calculating the short-time average amplitude E (e) of the minimized residual signal;
then, it is judged whether E(e) is greater than a preset nonlinear processing threshold NLPfloor; if so, the minimized residual noise e'(n) is calculated using the following formula:

[formula reproduced only as an image in the source; not recoverable here]

where e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, if E(e) ≤ NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises a nonlinear processing switch control step, specifically:
detecting the sound condition of the output end of the loudspeaker;
turning on or off the nonlinear processing step according to the detection result, specifically:
when the loudspeaker output end is detected to be voiced, i.e. SpkSignal_avg > NoiseFloor, and the signal at the loudspeaker output end is more than α times the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either of the two conditions is not met, the NLP processing is turned off; where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, E(e) is the short-time average amplitude of the residual signal, and α is a preset multiple.
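A sketch of this NLP switch logic under the conditions just stated. Because the patent's suppression formula for e'(n) survives only as an image, the attenuation branch below (a plain scaling) is a placeholder assumption; the on/off test and the comfort-noise branch follow the text, and all names are illustrative.

```python
import numpy as np

def nlp_process(e, spk_avg, noise_floor, alpha, nlp_floor):
    """NLP switch and decision logic for one residual frame e.
    spk_avg is SpkSignal_avg for the current frame."""
    Ee = np.mean(np.abs(e))                     # short-time average amplitude E(e)
    # Switch control: both conditions must hold, otherwise NLP stays off.
    if not (spk_avg > noise_floor and spk_avg > alpha * Ee):
        return e                                # NLP off: pass the residual through
    if Ee <= nlp_floor:
        # E(e) <= NLPfloor: replace the output with comfort noise.
        rng = np.random.default_rng()
        return nlp_floor * rng.uniform(-1.0, 1.0, size=len(e))
    # E(e) > NLPfloor: suppress the residual. The patent's formula for
    # e'(n) is not recoverable from the source; plain scaling stands in.
    return 0.5 * e
```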
The frame length adjusting module added to the frequency-domain adaptive filter makes the frame length of the far-end sound signal processed at one time smaller than the time-domain coefficient length of the adaptive filter, so that several frames of signal are combined into one large frame for adaptive filtering. On one hand, the adaptive filter keeps its original, sufficient length and can meet the delay requirement of the feedback loop; on the other hand, the update frequency of the adaptive filter coefficients is increased, so the adaptive filter works efficiently. In addition, the filtering control module of the invention prevents the adaptive filter from converging wrongly in the special case where the microphone input line or the loudspeaker output line is silent, ensuring its normal operation; the step size adjusting module and the coefficient adjusting module let the adaptive filter recover to a normal working state in case of divergence; and the nonlinear processing module cancels nonlinear distortion in the feedback loop. The echo cancellation device of the invention therefore makes the adaptive filter work efficiently and stably, achieving effective echo cancellation.
Drawings
FIG. 1 is a schematic diagram of a basic structure of an apparatus for performing echo cancellation by adaptive filtering;
FIG. 2 is a diagram illustrating a method for performing echo cancellation by frequency-domain adaptive filtering in the prior art;
FIG. 3 is a schematic diagram of the structure of the voice detection module and the filtering control module in the device according to the present invention;
FIG. 4 is a diagram of a data merge unit according to the present invention;
fig. 5 is a schematic diagram of the relationship between echo and decision level before and after the nonlinear processing by the nonlinear processing module according to the present invention.
Detailed Description
The echo cancellation device and method of the present invention will be described in detail below with reference to the accompanying drawings.
In order for the adaptive filter to track the feedback loop effectively, the coefficient length of the adaptive filter must be greater than the number of sampling points of the feedback delay. For example, for a signal at an 8 kHz sampling rate, if the time-domain adaptive filter coefficient length is M = 1024, the maximum feedback delay the filter can track and model is 1024/8000 s = 128 ms.
In the frequency-domain adaptive filtering method described in the background art, the length of the frequency-domain filter coefficients is 2M, the length of the corresponding time-domain coefficients is M, and the length of each new incoming data frame is also M. That is, the adaptive filter's time-domain coefficient length equals the new data frame length: if the coefficient length is 1024, the data frame processed at one time is also 1024 samples long. Thus only about 8 filtering and coefficient update passes are performed per second (8000/1024 ≈ 7.8). For environments where the feedback loop changes quickly, this update frequency is sometimes insufficient.
Therefore, as shown in fig. 3, the present invention adds a frame length adjusting module on top of the frequency-domain adaptive filtering, for adjusting the length of the data frame to L. Note that after one adjustment the frame length stays fixed; it is not re-adjusted every time a frame of data is acquired. For example: the frequency-domain filter coefficient length is 2M and the corresponding time-domain coefficient length is M, and the length L of each new incoming data frame can be half the filter's time-domain coefficient length, i.e. L = M/2 (M even). For the input signal $\vec{u}'(k)$, the original two-frame combination becomes a four-frame combination. With this improvement, on one hand the adaptive filter length is still 2M, long enough to meet the delay requirement of the feedback loop; on the other hand the adaptive filter coefficients are updated once per M/2 samples, so the coefficient update frequency is also taken care of. This, however, comes at the cost of increased algorithm complexity. Since each frame carries L data, a residual echo interception module is added to intercept the first L data of the obtained residual echo and output them as the final result.
In the above example L = M/2; in actual use it may be M/3, M/4, M/8, and so on, making the coefficient update frequency of the adaptive filter even higher. Correspondingly, only the length of the data intercepted by the residual echo interception module needs to change.
Beyond this, the length L of each data frame may also be any number less than M, for example: if M = 1024, then L can be 1000, 900, 650, or any other value less than 1024. It must only be ensured that the combined large frame has length 2M when the data frames are merged. This can be handled as follows: as shown in fig. 4, a FIFO buffer of length 2M stores the incoming data; each time a new frame $\vec{u}'(k)$ is received, it is combined with the previous 2M−L data into one large frame $\vec{u}(k)$, and one adaptive filtering pass is performed.
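A minimal sketch of such a FIFO combiner (class and method names are illustrative); it accepts frames of any length L < M and always yields a 2M large frame:

```python
import numpy as np

class FarEndFIFO:
    """Length-2M FIFO for incoming far-end data (cf. fig. 4): each new
    L-sample frame is shifted in, and the buffer contents form the
    2M-sample large frame handed to the adaptive filter."""
    def __init__(self, M):
        self.buf = np.zeros(2 * M)
    def push(self, frame):
        L = len(frame)                          # new frame length L < M
        self.buf = np.concatenate([self.buf[L:], frame])
        return self.buf.copy()                  # the combined 2M large frame
```

For example, with M = 1024, a new frame of L = 650 samples together with the 1398 samples already buffered forms the 2M = 2048 large frame.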
Adaptive filtering can automatically track the feedback loop, but for special cases, adaptive filters are prone to mis-tracking, such as the case where the microphone and speaker lines are silent at the same time. In this case, the input signal and the reference signal of the adaptive filter are small, and the adaptive filter is liable to misconvergence.
In order to prevent the filter from erroneously converging, the present invention proposes that a voiced sound detection module and a filtering control module may be added to the echo cancellation device, as shown in fig. 3.
The sound detection module, i.e. the VAD (Voice Activity Detector) module, may include two sound detection units, VAD1 and VAD2, located at the microphone input and the loudspeaker output. VAD detection may make its decision by comparing the short-time average amplitude of the signal with the noise level; the short-time average amplitude is obtained by averaging the amplitude of one frame of the signal.
For the microphone input:

$$\mathrm{MicSignal\_avg} = \frac{1}{M}\sum_{k=0}^{M-1} |\vec{d}(k)| \qquad (2.1)$$

where MicSignal_avg is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, and M is the length of one frame of speech signal.

If MicSignal_avg > NoiseFloor, the microphone line is judged voiced; otherwise it is silent. NoiseFloor is the estimated noise level.

Similarly, for the loudspeaker output:

$$\mathrm{SpkSignal\_avg} = \frac{1}{L}\sum_{k=0}^{L-1} |\vec{u}'(k)| \qquad (2.2)$$

where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, $\vec{u}'(k)$ is the sound signal input to the loudspeaker, and L is the length of one frame of speech signal.

If SpkSignal_avg > NoiseFloor, the loudspeaker line is judged voiced; otherwise it is silent.
According to the output result of the sound detection unit, the filtering control module performs overall control on the work of the filter, and specifically comprises the following steps:
if VAD1 detects silence, neither adaptive filtering nor filter coefficient updating is performed, and the output is directly $\vec{e}(k) = \vec{d}(k)$, completing the frame processing; if VAD1 detects sound, the VAD2 result is examined: if VAD2 detects silence, adaptive filtering is performed normally but the filter coefficients are not updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing; if both VAD1 and VAD2 detect sound, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the filter coefficients are also updated, and the output is $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$, completing the frame processing.
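A sketch of this control logic, assuming a hypothetical `adapt_filter(update)` callable that performs one frequency-domain filtering pass, returns the echo prediction $\vec{y}(k)$, and updates W(k) only when asked:

```python
import numpy as np

def vad(frame, noise_floor):
    """VAD decision per eqs. (2.1)-(2.2): compare the frame's short-time
    average amplitude with the estimated noise level."""
    return np.mean(np.abs(frame)) > noise_floor

def control_frame(d_frame, u_frame, noise_floor, adapt_filter):
    """Filtering control for one frame, covering the three cases above."""
    if not vad(d_frame, noise_floor):           # VAD1 silent: bypass the filter
        return d_frame                          # e(k) = d(k)
    if not vad(u_frame, noise_floor):           # VAD2 silent: filter, no update
        return d_frame - adapt_filter(update=False)
    return d_frame - adapt_filter(update=True)  # both voiced: filter and update
```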
Experiments show that after filtering control is added, the adaptive filter can not be converged wrongly under the special condition that a microphone input line or a loudspeaker output line is silent, and normal work of the adaptive filter is guaranteed.
In addition, for adaptive filtering, if the reference signal $\vec{d}(k)$ collected by the microphone is generated entirely by the sound emitted from the loudspeaker, the adaptive filter can easily track the feedback loop and operate stably. However, the signal collected by the microphone generally includes not only the sound emitted from the loudspeaker but also sound signals from the near end, and the near-end sound is sometimes the dominant component. Such a signal interferes with correct tracking of the feedback loop and may lead to erroneous tracking of the adaptive filter, or even coefficient divergence.
When the filter tracks incorrectly, its coefficients begin to diverge, and this shows up in the coefficient update: the update amount is usually abnormally large. Therefore, as shown in fig. 3, the present invention can add a step-size adjustment module. When the coefficient update amount is detected to be abnormally large, the adaptive filter is judged to be in an abnormal working state and the coefficient update step size is reduced; this effectively suppresses erroneous tracking and avoids coefficient divergence. When the coefficient update amount is detected to be normal, the adaptive filter is judged to be in a normal working state and the step size can be adjusted back, for example restored to its initial value, which increases the convergence speed of the adaptive filter.
In particular, for the NLMS algorithm among the frequency-domain adaptive algorithms, as previously described, the coefficient update is:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.3)$$

Let

$$\Phi(k) = \mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.4)$$

Then

$$W(k+1) = W(k) + \mu\,\Phi(k) \qquad (2.5)$$

where $W(k)$ is the frequency-domain adaptive filter coefficient vector, an N-dimensional complex vector; $\mu$ is the coefficient update step size; $\Phi(k)$ is also an N-dimensional complex vector; and N is the number of FFT points. That is:

$$\Phi(k) = [\Phi_0(k), \Phi_1(k), \ldots, \Phi_{N-1}(k)]^T \qquad (2.6)$$

The coefficient update amount is therefore:

$$\mu\,\Phi(k) = [\mu\,\Phi_0(k), \mu\,\Phi_1(k), \ldots, \mu\,\Phi_{N-1}(k)]^T \qquad (2.7)$$

The key to the step-size adjustment described above is detecting the magnitude of this update amount, which can be measured component-wise by the complex modulus:

$$[\mu\,\|\Phi_0(k)\|, \mu\,\|\Phi_1(k)\|, \ldots, \mu\,\|\Phi_{N-1}(k)\|]^T \qquad (2.8)$$
In the present invention, the step-size adjustment method may be: for each $\mu\,\|\Phi_i(k)\|$, $i = 0, 1, \ldots, N-1$, if $\mu\,\|\Phi_i(k)\| > \text{MaxStepSize}$, where MaxStepSize is the maximum step-size threshold, the adaptive filter is judged to be in an abnormal working state and the step size is adjusted, which may mean scaling it down, for example by a factor of 10, i.e. $\mu = 0.1\,\mu$.
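A minimal sketch of this check, assuming a frequency-domain update vector Phi as in Eq. (2.6); the restore-to-initial behavior follows the description above, while the function name is an assumption:

```python
import numpy as np

def adjust_step_size(mu: float, Phi: np.ndarray,
                     max_step: float, mu_init: float) -> float:
    """Shrink mu 10x if any component of the update amount mu*||Phi_i(k)||
    exceeds MaxStepSize; otherwise restore the initial step size."""
    update_magnitude = mu * np.abs(Phi)     # mu * ||Phi_i(k)||, i = 0..N-1
    if np.any(update_magnitude > max_step):
        return 0.1 * mu                     # abnormal state: reduce step size
    return mu_init                          # normal state: restore initial value
```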
Experiments show that after the step-size adjustment module is added, the coefficients become far less prone to divergence and the stability of the adaptive filter is greatly enhanced, at the cost of a somewhat reduced convergence speed.
The filtering control module and the step-size adjustment module ensure stable operation of the adaptive filter to a large extent. However, sudden events or unexpected situations may still cause the adaptive filter to diverge, and a diverged filter can cause the loudspeaker to emit loud noise. The present invention therefore proposes a strategy for handling such special situations: as shown in fig. 3, a coefficient adjustment module can be added as a last line of defense to guarantee stable operation of the adaptive filter.
The principle of the coefficient adjustment module is simple: when the adaptive filter diverges, its coefficients tend to grow large. The task of coefficient adjustment is therefore to check the magnitude of the coefficients after every coefficient update; if a coefficient exceeds a set threshold, the adaptive filter is considered to have diverged. Specifically, for the frequency-domain NLMS algorithm, as mentioned above, the coefficient update is:
$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \varphi(k) \\ 0 \end{bmatrix} \qquad (2.9)$$

where W(k) is the frequency-domain adaptive filter coefficient vector, an N-dimensional complex vector, and N is the number of FFT points. That is:

$$W(k) = [W_0(k), W_1(k), \ldots, W_{N-1}(k)]^T \qquad (2.10)$$

The magnitude of the coefficients is measured by the complex modulus:

$$[\|W_0(k)\|, \|W_1(k)\|, \ldots, \|W_{N-1}(k)\|]^T \qquad (2.11)$$
For each $\|W_i(k)\|$, $i = 0, 1, \ldots, N-1$: if $\|W_i(k)\| > \text{MaxParam}$, where MaxParam is the maximum coefficient threshold, the frequency-domain adaptive filter is judged to have diverged, and its coefficients are adjusted, which may mean reducing them, for example setting them to zero, i.e. $W(k) = 0$. After the coefficients are zeroed, the adaptive filter converges anew, rescuing it from the divergence state. The threshold MaxParam must be chosen carefully according to the gain of the feedback loop: too large a value makes the coefficient monitoring insensitive, so the divergence state cannot be identified effectively; too small a value invites misjudgment, so the adaptive filter is restarted frequently and cannot work normally.
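A hedged sketch of this coefficient watchdog (the function name and per-call usage are assumptions):

```python
import numpy as np

def check_coefficients(W: np.ndarray, max_param: float) -> np.ndarray:
    """After each update, zero the frequency-domain coefficients W(k)
    if any modulus ||W_i(k)|| exceeds MaxParam (divergence detected)."""
    if np.any(np.abs(W) > max_param):
        return np.zeros_like(W)   # reset: the filter re-converges from scratch
    return W
```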
In addition, a non-linear processing module, i.e. an NLP (Non-Linear Processor) module, can be added. Typical loudspeakers exhibit 5%-10% nonlinear distortion, while adaptive filtering can only track a linear system, so the nonlinear distortion introduced in the feedback loop can be neither predicted nor cancelled. An NLP processing module can therefore be added after the adaptive filtering to remove this nonlinear residue.
Because NLP processing targets only the loudspeaker's nonlinear distortion, the module can be turned off when it is not needed. This requires adding a nonlinear-processing control module and a loudspeaker voiced-sound detection module to control switching the nonlinear processing module on and off; the loudspeaker voiced-sound detection module can reuse VAD2 from the voiced-sound detection module.
The specific control principle is as follows: NLP processing is started when (1) SpkSignal_avg > NoiseFloor, i.e. VAD2 detects that the loudspeaker is voiced, and (2) SpkSignal_avg / E(e) > α, i.e. the loudspeaker signal is more than α times larger than the residual signal. If either condition (1) or (2) is not met, the NLP module is turned off.

Condition (1) reflects that when the loudspeaker is silent there can be no echo, so NLP processing is unnecessary. Condition (2) reflects that when the near end is talking, E(e) is large, so condition (2) fails, NLP processing is turned off, and the near-end signal is transmitted without distortion.

In the formulas, SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of the residual signal; in this embodiment α may take the value 2. The short-time average amplitude may be computed as the mean of the absolute values of the samples in one frame of the signal.
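A small sketch of this on/off gate (the function name is illustrative; α = 2 follows this embodiment):

```python
def nlp_enabled(spk_avg: float, residual_avg: float,
                noise_floor: float, alpha: float = 2.0) -> bool:
    """Enable NLP only when the loudspeaker is voiced (condition 1) and its
    level dominates the residual signal by a factor of alpha (condition 2)."""
    speaker_voiced = spk_avg > noise_floor             # condition (1)
    echo_dominant = spk_avg > alpha * residual_avg     # condition (2)
    return speaker_voiced and echo_dominant
```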
The NLP processing in this scheme can adopt a center-clipping method to suppress the residual echo; fig. 5 is a schematic view of center-clipping NLP. Its action can be expressed by the following formula, applied when E(e) > NLPfloor:

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases} \qquad (3.1)$$

where e and e' are the residual echo before and after the NLP module, E(e) is the short-time average amplitude of the residual echo, and NLPfloor is the decision level. The value of NLPfloor must be chosen carefully: too small and the residual echo is not suppressed effectively; too large and near-end sound quality suffers severely.

In addition, when E(e) ≤ NLPfloor, e' may be replaced with comfort noise. The reason for using comfort noise rather than simply setting e' to zero is that a hard zero introduces audible noise steps when the NLP switches on and off, giving the illusion of half-duplex operation. The comfort noise may be generated with a simulated Gaussian random signal.
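A minimal sketch of center clipping with comfort-noise fill (the comfort-noise level and function name are assumptions):

```python
import numpy as np

def center_clip(e: np.ndarray, nlp_floor: float,
                comfort_level: float = 1e-4) -> np.ndarray:
    """Center-clipping NLP per Eq. (3.1): samples inside the clipping band
    are zeroed and the band is removed from larger samples; when the whole
    frame sits below the decision level, emit Gaussian comfort noise."""
    if np.mean(np.abs(e)) <= nlp_floor:        # E(e) <= NLPfloor
        return comfort_level * np.random.randn(len(e))
    out = np.zeros_like(e)
    high, low = e > nlp_floor, e < -nlp_floor
    out[high] = e[high] - nlp_floor
    out[low] = e[low] + nlp_floor
    return out
```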
The following describes the method of the present invention for performing microphone echo cancellation using frequency-domain adaptive filtering.

First, some basic concepts used below are explained. The frequency-domain filter coefficients are

$$W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$$

of length 2M, where w(k) is the corresponding time-domain adaptive filter coefficient vector of length M; the overlap-save method is used.
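For concreteness, a sketch of how such frequency-domain coefficients can be formed (array names are illustrative):

```python
import numpy as np

M = 1024                         # time-domain filter length
w = np.zeros(M)                  # time-domain coefficients w(k)
# Zero-pad to 2M and transform: W(k) = FFT([w(k); 0]), length 2M
W = np.fft.fft(np.concatenate([w, np.zeros(M)]))
```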
On the basis of the echo cancellation method in the background art, the invention proposes adding a frame-length adjustment step for adjusting the length of the data frame. This step is explained in detail first. In the present invention the frame length is adjusted to any positive integer L smaller than M. For example: if the frequency-domain filter coefficients have length 2M and the corresponding time-domain coefficients have length M, the length of each newly arriving data frame can be set to half the time-domain filter length, i.e. L = M/2 (M even). Compared with the background art, the input signal $\vec{u}(k)$ is then assembled from four frames instead of two. With this improvement, on the one hand the adaptive filter remains long enough to meet the delay requirement of the feedback loop; on the other hand, the update frequency of the adaptive filter coefficients is also taken into account.

In the example above L = M/2; in actual use it may also be M/4, M/3, M/8 and so on, which raises the coefficient update frequency of the adaptive filter further; only the length of the data intercepted by the residual-echo interception module must be changed accordingly. In practice the frame length L may be any number smaller than M; for example, with M = 1024, L can be 1000, 900, 650 or any other value below 1024, at the cost of increased algorithmic complexity. Note that once adjusted, the frame length remains fixed until all data have been processed; the adjustment is not repeated for every collected frame. Finally, because each frame carries L samples, a residual-echo interception step is added at the output, which intercepts the first L samples of the computed residual echo and outputs them as the final result.
Taking M = 1024 as an example, the microphone echo cancellation method using frequency-domain adaptive filtering, with the frame-length adjustment step and the residual-echo interception step added, is now described in full.
1) Frame-length adjustment step: the frame length is adjusted to a positive integer value L smaller than M; in this embodiment, L = 800.
2) Collect one frame of the k-th far-end sound signal to be output to the loudspeaker; the frame length is 800.
3) The 800 samples of the current frame and the immediately preceding 2M - L = 2048 - 800 = 1248 samples are combined into one large frame $\vec{u}(k)$ of length 2M. As shown in fig. 4, the 800 newly acquired samples of the current frame and the preceding 1248 samples form the large frame of length 2048, where:

u(800k-1248) is the 1248th sample before the original k-th frame,

u(800k-2) is the 2nd sample before the original k-th frame,

u(800k-1) is the sample immediately before the original k-th frame,

u(800k) is the 1st sample of the original k-th frame data,

u(800k+799) is the 800th sample of the original k-th frame data.
When the first and second frames are collected at start-up, the system waits for the third frame of data to arrive and then combines it with the last 448 samples of the first frame and the 800 samples of the second frame to form a large frame of length 2048, on which one pass of adaptive filtering is performed. Thereafter, every newly arriving frame is data-combined in the same way and one pass of adaptive filtering is performed.
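A minimal sketch of this sliding data-combination buffer, assuming the M = 1024, L = 800 of this embodiment (names and structure are illustrative, not from the patent):

```python
import numpy as np

M, L = 1024, 800
history = np.zeros(2 * M - L)       # the 1248 samples preceding the current frame

def combine(new_frame: np.ndarray) -> np.ndarray:
    """Form the length-2M large frame from L new samples plus the
    2M - L samples that immediately precede them."""
    global history
    big = np.concatenate([history, new_frame])   # length 2M = 2048
    history = big[L:]                            # slide the window forward by L
    return big
```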
4) Apply FFT processing to $\vec{u}(k)$, converting it to the frequency domain:

$$U(k) = \mathrm{FFT}[\vec{u}(k)].$$
5) Filter U(k) with the current filter coefficients W(k) using the overlap-save method, i.e. multiply in the frequency domain, then apply IFFT processing to the product and take the last M = 1024 samples of the result, denoted $\vec{y}(k)$; that is:

$$\vec{y}(k) = [y(kM), y(kM+1), \ldots, y(kM+M-1)] = \text{the last } M \text{ points of } \mathrm{IFFT}[U(k) \cdot W(k)].$$
6) After the far-end sound signal $\vec{u}(k)$ is played by the loudspeaker and passes through the acoustic loop between the loudspeaker and the microphone, the microphone collects an echo signal of length M, denoted $\vec{d}(k)$:

$$\vec{d}(k) = [d(kM), d(kM+1), \ldots, d(kM+M-1)].$$

Subtracting the $\vec{y}(k)$ of step 5) from this $\vec{d}(k)$ gives the error signal $\vec{e}(k)$:

$$\vec{e}(k) = [e(kM), e(kM+1), \ldots, e(kM+M-1)] = \vec{d}(k) - \vec{y}(k);$$
7) Intercept the first L samples of the resulting $\vec{e}(k)$ and output them as the final residual echo;
8) Prepend M zeros to the full-length (not intercepted) $\vec{e}(k)$ of length M and apply FFT processing to obtain:

$$E(k) = \mathrm{FFT}\!\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix};$$
At the same time, conjugate the U(k) of step 4) to obtain $U^H(k)$, multiply it point-wise with E(k), and apply the IFFT to the product; by the overlap-save method this gives:

$$\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)],$$

where the trailing half of the result must be discarded: only the first M points of the IFFT result are kept;
9) Append M zeros to $\vec{\varphi}(k)$, apply FFT processing, multiply the result by the adaptive step size μ, and add the product to the filter coefficients W(k) to obtain the updated frequency-domain filter coefficients:

$$W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix};$$

the next adaptive filtering pass uses the updated coefficients W(k+1) as the current W(k);
10) Return to step 2), and repeat until the input of the far-end sound signal ends, completing the whole process.
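Putting steps 2)-9) together, a compact sketch of one pass of this overlap-save frequency-domain NLMS loop under the stated M = 1024, L = 800 (a simplified illustration that omits the VAD, step-size, coefficient and NLP safeguards described above; the step-size value and names are assumptions):

```python
import numpy as np

M, L = 1024, 800
W = np.zeros(2 * M, dtype=complex)     # frequency-domain coefficients W(k)
u_hist = np.zeros(2 * M - L)           # samples preceding the current frame
mu = 0.05                              # adaptive step size (illustrative value)

def process_frame(u_new: np.ndarray, d: np.ndarray) -> np.ndarray:
    """One iteration: u_new = L new far-end samples, d = M microphone samples.
    Returns the first L samples of the residual echo (step 7)."""
    global W, u_hist
    u_big = np.concatenate([u_hist, u_new])            # step 3: length-2M frame
    u_hist = u_big[L:]
    U = np.fft.fft(u_big)                              # step 4
    y = np.fft.ifft(U * W).real[-M:]                   # step 5: last M points
    e = d - y                                          # step 6: error signal
    E = np.fft.fft(np.concatenate([np.zeros(M), e]))   # step 8: prepend M zeros
    phi = np.fft.ifft(np.conj(U) * E).real[:M]         # keep only first M points
    W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9
    return e[:L]
```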
In the embodiment above L = 800; in actual use L may also be any other integer smaller than M, such as 600 or 500. In addition, L can be taken as M/n, i.e. 1024/n, with n an integer greater than 1 such that 1024/n is also an integer. For instance, with L = 1024/2 only 4 data frames need to be combined to obtain a large frame of length 2048; the filter coefficients are then updated once every 1024/2 samples, which raises the convergence rate of the filter coefficients and improves efficiency.
Before step 1), the method may further include a voiced-sound detection step and a filtering control step for overall control of the filter's operation, comprising:

a voiced-sound detection step: detecting the sound conditions at the microphone input and the loudspeaker output;

a filtering control step: controlling the operation of the filter according to the result of the voiced-sound detection step, specifically:
if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed; the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if the loudspeaker output is detected as silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both the microphone input and the loudspeaker output are detected as voiced, the adaptive filter is in its normal working state: adaptive filtering is performed and the coefficients are updated, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing.
Here $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's output prediction of $\vec{d}(k)$, and $\vec{e}(k)$ is the residual echo.
The voiced-sound detection judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with the noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of the speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the loudspeaker output signal, and L is the length of one frame of the speech signal.
The method also includes a step-size adjustment step: when the coefficient update of the adaptive filter is detected to exceed the set maximum step-size threshold, the coefficient update step size is reduced, for example scaled down by a fixed proportion. When the coefficient update of the adaptive filter is detected to have returned to normal, the step size is restored to its initial value.
A coefficient adjustment step is also included: when the coefficients of the adaptive filter are detected to exceed the set coefficient threshold, the filter coefficients are reduced, effectively preventing the filter coefficients from diverging.
A nonlinear processing step is further included: first, the short-time average amplitude E(e) of the minimized residual signal is computed; then it is judged whether E(e) exceeds the preset nonlinear processing threshold NLPfloor, and if so, the output e' is computed with the following formula:

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.

If E(e) ≤ NLPfloor, e' is directly replaced with comfort noise.
Step 7) may be followed by a nonlinear-processing switch control step, specifically: detecting the sound condition at the loudspeaker output, and turning the nonlinear processing step on or off according to the detection result.
The on/off criterion is specifically: when the loudspeaker output is detected as voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α (for example, α = 6), the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off; where SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of the residual signal.
With the technical scheme of the invention, the frequency-domain filter works efficiently and stably. The specific performance indexes obtained in experiments are:

echo suppression: 50-60 dB;

convergence time: less than 1 s;

supported feedback-loop delay: adjustable; for example, at an 8 kHz sample rate with filter length 1024, a delay of 128 ms can be supported.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalents and the like made within the spirit and principle of the present invention are included in its scope.

Claims (22)

1. A microphone echo cancellation device for canceling echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module for acquiring the sound signal u from the far end to be output to the loudspeaker, the length of each acquired data frame being M; the current, i.e. k-th, frame data is combined with the previous, i.e. (k-1)-th, frame data to jointly form one large frame $\vec{u}(k)$ of length 2M;

a frequency-domain adaptive filter whose current frequency-domain coefficients are $W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$ of length 2M, where w(k) is the filter's time-domain coefficient vector of length M; the frequency-domain adaptive filter applies FFT processing to $\vec{u}(k)$, converting it to the frequency domain to obtain U(k); it filters U(k) with the current filter coefficients W(k) and then applies IFFT processing to the filtering result, taking the last M points of the result, to obtain a one-frame predicted value $\vec{y}(k)$ of the echo;

a subtractor for subtracting the predicted value $\vec{y}(k)$ from the echo $\vec{d}(k)$ of length M collected by the microphone, obtaining a residual echo $\vec{e}(k)$;

the frequency-domain adaptive filter further comprises a speech-correlation detection unit for computing, in the frequency domain, the correlation between the residual echo $\vec{e}(k)$ and the far-end sound signal, giving the speech-correlation parameter $\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)]$, where $U^H(k)$ is the conjugate of said U(k), E(k) is the frequency-domain form of the residual echo $\vec{e}(k)$, and only the first M points of the result are taken;

the frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficients W(k) of the frequency-domain adaptive filter according to the speech correlation, in combination with the adaptive step size μ of the adaptive filter, to obtain $W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}$; the coefficients W(k) are updated once each time the frequency-domain adaptive filter performs adaptive filtering, and at the next adaptive filtering pass the filter frequency-domain filters the next combined large data frame with the updated coefficients W(k+1);

the device being characterized in that it further comprises a frame-length adjustment module for setting the data frame length of u to a value L smaller than M;

correspondingly, the data acquisition and combination module combines the L data of the current k-th frame with the immediately preceding 2M-L consecutive data to form a large frame of length 2M;

correspondingly, the frequency-domain adaptive filter adaptively filters the 2M large frame, and after the filtering of each frame of data of length L is completed, the frequency-domain filter coefficients are updated;

and correspondingly, the device further comprises a residual-echo interception module for intercepting the first L signals of each frame's residual echo $\vec{e}(k)$, obtaining the final residual echo e.
2. The echo cancellation device according to claim 1, wherein the frame-length adjustment module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u with the immediately preceding 2n-1 data frames into a large frame of length 2M.
3. The echo cancellation device according to claim 1 or 2, further comprising a voiced-sound detection module and a filtering control module, wherein

the voiced-sound detection module comprises two detection units for detecting the sound conditions at the microphone input and the loudspeaker output respectively, and outputs the detection results to the filtering control module;

the filtering control module controls the operation of the frequency-domain adaptive filter according to the output of the voiced-sound detection module:

if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if the loudspeaker output is detected as silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both the microphone input and the loudspeaker output are detected as voiced, the adaptive filter is in its normal working state, i.e. adaptive filtering is performed and the coefficients are updated, giving the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ and the updated filter coefficients W(k+1), completing the frame processing.
4. The echo cancellation device according to claim 3, wherein the voiced-sound detection module judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with a noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the sound signal of frame length M acquired by the microphone, M is the frame length, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the signal input to the loudspeaker, $\vec{u}(k)$ is the signal input to the loudspeaker, and L is the frame length.
5. The echo cancellation device of claim 1 or 2, further comprising a step size adjustment module configured to detect a coefficient update step size μ of the adaptive filter, and to decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
6. The echo cancellation device according to claim 5, wherein the coefficient update step is restored to the initial value when it is detected that the update step of the adaptive filter coefficients is restored to normal.
7. The echo cancellation device according to claim 1 or 2, further comprising a coefficient adjustment module configured to decrease the filter coefficients W(k) when detecting that the coefficients W(k) of the adaptive filter are greater than a set coefficient threshold.
8. The echo cancellation device according to claim 1 or 2, further comprising a nonlinear processing module for suppressing nonlinear components in the echo.
9. The echo cancellation device of claim 8, wherein when E(e) > NLPfloor the nonlinear processing module computes

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and also the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
10. The echo cancellation device of claim 8, wherein e' is directly replaced with comfort noise when E(e) ≤ NLPfloor.
11. The echo cancellation device according to claim 8, further comprising:
a loudspeaker voiced-sound detection module for detecting the sound condition at the loudspeaker output; and

a nonlinear-processing control module for turning the nonlinear processing module on or off according to the output of the loudspeaker voiced-sound detection module:

when the loudspeaker voiced-sound detection module detects that the loudspeaker output is voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off;

wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
12. A microphone echo cancellation method using frequency-domain adaptive filtering to cancel an echo d generated by a far-end sound signal u passing through the acoustic loop between a loudspeaker and a microphone, finally obtaining a residual echo e, wherein the time-domain filter coefficients are w(k) of length M and the corresponding frequency-domain filter coefficients are $W(k) = \mathrm{FFT}\!\begin{bmatrix} w(k) \\ 0 \end{bmatrix}$ of length 2M, the overlap-save method being adopted;

characterized in that:

1) the data frame length L of the signal u acquired each time is set;

2) one frame signal, denoting the k-th frame signal, is collected according to the set frame length L;

3) the current frame is merged with the preceding 2M-L data into a large frame $\vec{u}(k)$ of length 2M;

4) $\vec{u}(k)$ is converted to the frequency domain and, using overlap-save, filtered with the filter coefficients W(k); the result is converted back to the time domain to obtain the predicted value $\vec{y}(k)$ of the echo in the time domain;

5) the collected echo $\vec{d}(k)$ has $\vec{y}(k)$ subtracted from it, obtaining the k-th frame minimized residual echo signal $\vec{e}(k)$;

6) the filter coefficients W(k) are updated according to $\vec{e}(k)$ and $\vec{u}(k)$, obtaining W(k+1);

7) return to step 2): the next frame signal is acquired and merged, and frequency-domain adaptive filtering is performed with the updated filter coefficients, until the data input ends.
13. The method of claim 12, wherein the frequency-domain adaptive filtering algorithm comprises the steps of:

1) frame-length adjustment: the frame length of u is adjusted from M to a positive integer value L smaller than M;

2) the k-th frame signal of u is collected with frame length L and denoted $\vec{u}_k$;

3) the L data of $\vec{u}_k$ are combined with the immediately preceding 2M-L data to form a large frame $\vec{u}(k)$ of length 2M, where

u(kL-2M+L) is the (2M-L)-th datum before the original k-th frame,

u(kL-2) is the 2nd datum before the original k-th frame,

u(kL-1) is the datum immediately before the original k-th frame,

u(kL) is the 1st datum of the original k-th frame,

u(kL+L-1) is the L-th datum of the original k-th frame;

4) FFT processing is applied to $\vec{u}(k)$, converting it to the frequency domain: $U(k) = \mathrm{FFT}[\vec{u}(k)]$;

5) U(k) is filtered with the current filter coefficients W(k) using the overlap-save method, i.e. multiplied in the frequency domain; the last M data of the IFFT of the product are taken and denoted $\vec{y}(k)$, i.e. $\vec{y}(k)$ = the last M points of $\mathrm{IFFT}[U(k) \cdot W(k)]$;

6) after u is played by the loudspeaker and passes through the acoustic loop between the loudspeaker and the microphone, the microphone collects an echo signal of length M, denoted $\vec{d}(k)$; subtracting the $\vec{y}(k)$ of step 5) from $\vec{d}(k)$ gives the error signal $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$;

7) the first L signals of the resulting $\vec{e}(k)$ are intercepted and output as the final residual echo;

8) the full-length (not intercepted) $\vec{e}(k)$ of length M is prepended with M zeros and FFT processing is applied: $E(k) = \mathrm{FFT}\!\begin{bmatrix} 0 \\ \vec{e}(k) \end{bmatrix}$; at the same time, the U(k) of step 4) is conjugated to obtain $U^H(k)$ and multiplied point-wise with E(k); the IFFT of the product is taken and, per the overlap-save method, $\vec{\varphi}(k) = \mathrm{IFFT}[U^H(k) \cdot E(k)]$, where the trailing part of the result is discarded and only the first M points of the IFFT result are kept;

9) $\vec{\varphi}(k)$ is appended with M zeros, FFT processing is applied, the result is multiplied by the adaptive step size μ, and the product is added to the filter coefficients W(k), giving the updated frequency-domain filter coefficients $W(k+1) = W(k) + \mu\,\mathrm{FFT}\!\begin{bmatrix} \vec{\varphi}(k) \\ 0 \end{bmatrix}$; the next adaptive filtering pass uses the updated filter coefficients W(k+1);

10) return to step 2) until the input of the far-end sound signal ends.
14. The method of claim 12 or 13, wherein the value of L is M/n, and n is an integer greater than 1.
15. The method according to claim 12 or 13, further comprising, before step 1), a voiced-sound detection step and a filtering control step, comprising:

a voiced-sound detection step: detecting the sound conditions at the microphone input and the loudspeaker output;

a filtering control step: controlling the operation of the filter according to the result of the voiced-sound detection step;

specifically:

if the microphone-input detection result is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly set to $\vec{e}(k) = \vec{d}(k)$, completing the frame processing;

if the microphone-input detection result is voiced, the loudspeaker-output detection result is examined: if it is silent, adaptive filtering is performed normally but no coefficient update takes place, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

if both detection results are voiced, the adaptive filter is in its normal working state: adaptive filtering is performed, the coefficients are updated, and the output $\vec{e}(k) = \vec{d}(k) - \vec{y}(k)$ completes the frame processing;

where $\vec{d}(k)$ is the echo received by the microphone, $\vec{y}(k)$ is the adaptive filter's output prediction of $\vec{d}(k)$, and $\vec{e}(k)$ is the residual echo.
16. The method of claim 15, wherein the voiced-sound detection judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input and the loudspeaker output with a noise level, specifically:

if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{MicSignal\_avg} = \frac{1}{M}\sum_{0}^{M-1}\left|\vec{d}(k)\right|$ is the short-time average amplitude of the microphone input signal, $\vec{d}(k)$ is the microphone input signal, i.e. the received echo signal, M is the length of one frame of the speech signal, and NoiseFloor is the estimated noise level;

if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced;

where $\text{SpkSignal\_avg} = \frac{1}{L}\sum_{0}^{L-1}\left|\vec{u}(k)\right|$ is the short-time average amplitude of the loudspeaker output signal, $\vec{u}(k)$ is the loudspeaker output signal, and L is the length of one frame of the speech signal.
17. The method according to claim 12 or 13, further comprising a step size adjustment step of decreasing the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than a set maximum coefficient update step size threshold.
18. The method of claim 17, wherein the coefficient update step size is restored to the initial value upon detecting that the update step size of the adaptive filter coefficients is restored to normal.
19. The method according to claim 12 or 13, further comprising a coefficient adjusting step for reducing the coefficients of the filter when it is detected that the coefficients of the adaptive filter are greater than a set coefficient threshold.
20. The method according to claim 12 or 13, further comprising a nonlinear processing step of:

first computing the short-time average amplitude E(e) of the minimized residual signal;

then judging whether E(e) exceeds a preset nonlinear processing threshold NLPfloor, and if so, computing the output e' with the formula

$$e' = \begin{cases} e - \text{NLPfloor}, & e > \text{NLPfloor} \\ e + \text{NLPfloor}, & e < -\text{NLPfloor} \\ 0, & \text{otherwise} \end{cases}$$

where e is the residual signal and also the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
21. The method of claim 20, wherein if E(e) ≤ NLPfloor, e' is directly replaced with comfort noise.
22. The method according to claim 20, further comprising a nonlinear-processing switch control step, specifically:

detecting the sound condition at the loudspeaker output;

turning the nonlinear processing step on or off according to the detection result, specifically:

when the loudspeaker output is detected as voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal exceeds the residual signal by a factor of α, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;

if either of the two conditions is not met, NLP processing is turned off; wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, E(e) is the short-time average amplitude of the residual signal, and α is a preset multiple.
CNB2006101440555A 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof Expired - Fee Related CN100524466C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101440555A CN100524466C (en) 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof


Publications (2)

Publication Number Publication Date
CN1953060A true CN1953060A (en) 2007-04-25
CN100524466C CN100524466C (en) 2009-08-05

Family

ID=38059354

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101440555A Expired - Fee Related CN100524466C (en) 2006-11-24 2006-11-24 Echo elimination device for microphone and method thereof

Country Status (1)

Country Link
CN (1) CN100524466C (en)


Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101192411B (en) * 2007-12-27 2010-06-02 北京中星微电子有限公司 Large distance microphone array noise cancellation method and noise cancellation system
CN101217039B (en) * 2008-01-08 2011-11-23 北京中星微电子有限公司 A method, system and device for echo elimination
CN102204276B (en) * 2008-11-05 2015-04-15 雅马哈株式会社 Sound emission and collection device, and sound emission and collection method
US8855327B2 (en) 2008-11-05 2014-10-07 Yamaha Corporation Sound emission and collection device and sound emission and collection method
CN102204276A (en) * 2008-11-05 2011-09-28 雅马哈株式会社 Sound emission and collection device, and sound emission and collection method
CN101771925B (en) * 2008-12-30 2013-07-31 Gn瑞声达A/S Hearing instrument with improved initialisation of parameters of digital feedback suppression circuitry
CN101771925A (en) * 2008-12-30 2010-07-07 Gn瑞声达A/S Hearing instrument with improved initialisation of parameters of digital feedback suppression circuitry
CN102131014A (en) * 2010-01-13 2011-07-20 歌尔声学股份有限公司 Device and method for eliminating echo by combining time domain and frequency domain
CN101888455B (en) * 2010-04-09 2013-07-03 熔点网讯(北京)科技有限公司 Self-adaptive echo counteracting method for frequency domain
CN101888455A (en) * 2010-04-09 2010-11-17 熔点网讯(北京)科技有限公司 Self-adaptive echo counteracting method for frequency domain
CN102956236A (en) * 2011-08-15 2013-03-06 索尼公司 Information processing device, information processing method and program
CN102387272A (en) * 2011-09-09 2012-03-21 南京大学 Restraining method for residual echo in echo cancellation system
CN102413384A (en) * 2011-11-16 2012-04-11 杭州艾力特音频技术有限公司 Echo cancellation two-way voice talk back equipment
CN103366757A (en) * 2012-04-09 2013-10-23 广达电脑股份有限公司 Communication system and method with echo cancellation mechanism
CN106664481B (en) * 2014-03-19 2019-06-07 思睿逻辑国际半导体有限公司 The nonlinear Control of loudspeaker
CN106664481A (en) * 2014-03-19 2017-05-10 思睿逻辑国际半导体有限公司 Non-linear control of loudspeakers
CN110225214A (en) * 2014-04-02 2019-09-10 想象技术有限公司 Control method, attenuation units, system and the medium fed back to sef-adapting filter
CN110225214B (en) * 2014-04-02 2021-05-28 想象技术有限公司 Method, attenuation unit, system and medium for attenuating a signal
CN106716527B (en) * 2014-07-31 2021-06-08 皇家Kpn公司 Noise suppression system and method
CN106716527A (en) * 2014-07-31 2017-05-24 皇家Kpn公司 Noise suppression system and method
CN106067301A (en) * 2016-05-26 2016-11-02 浪潮(苏州)金融技术服务有限公司 A kind of method using multidimensional technology to carry out echo noise reduction
CN106067301B (en) * 2016-05-26 2019-06-25 浪潮金融信息技术有限公司 A method of echo noise reduction is carried out using multidimensional technology
CN110024025B (en) * 2016-11-23 2023-05-23 哈曼国际工业有限公司 Dynamic stability control system based on coherence
CN110024025A (en) * 2016-11-23 2019-07-16 哈曼国际工业有限公司 Dynamic stability control system based on coherence
CN106713685A (en) * 2016-11-25 2017-05-24 东莞市嘉松电子科技有限公司 Hands-free communication control method
CN106910500A (en) * 2016-12-23 2017-06-30 北京第九实验室科技有限公司 The method and apparatus of Voice command is carried out to the equipment with microphone array
CN107123430A (en) * 2017-04-12 2017-09-01 广州视源电子科技股份有限公司 Echo cancellation method, device, conference tablet and computer storage medium
WO2018188282A1 (en) * 2017-04-12 2018-10-18 广州视源电子科技股份有限公司 Echo cancellation method and device, conference tablet computer, and computer storage medium
CN107123430B (en) * 2017-04-12 2019-06-04 广州视源电子科技股份有限公司 Echo cancellation method, device, conference tablet and computer storage medium
CN107071197B (en) * 2017-05-16 2020-04-24 中山大学花都产业科技研究院 Echo cancellation method and system based on full-phase multi-delay block frequency domain
CN107071197A (en) * 2017-05-16 2017-08-18 中山大学花都产业科技研究院 A kind of echo removing method and system based on the piecemeal frequency domain of delay more than all phase
CN107017004A (en) * 2017-05-24 2017-08-04 建荣半导体(深圳)有限公司 Noise suppressing method, audio processing chip, processing module and bluetooth equipment
CN109215672B (en) * 2017-07-05 2021-11-16 苏州谦问万答吧教育科技有限公司 Method, device and equipment for processing sound information
CN109215672A (en) * 2017-07-05 2019-01-15 上海谦问万答吧云计算科技有限公司 A kind of processing method of acoustic information, device and equipment
CN107393546A (en) * 2017-09-04 2017-11-24 恒玄科技(上海)有限公司 A kind of echo cancel method and speech recognition apparatus for speech recognition process
WO2019128402A1 (en) * 2017-12-26 2019-07-04 深圳Tcl新技术有限公司 Method, system and storage medium for solving echo cancellation failure
US11276416B2 (en) 2017-12-26 2022-03-15 Shenzhen Tcl New Technology Co., Ltd. Method, system and storage medium for solving echo cancellation failure
CN108986836A (en) * 2018-08-29 2018-12-11 质音通讯科技(深圳)有限公司 A kind of control method of echo suppressor, device, equipment and storage medium
CN109102821A (en) * 2018-09-10 2018-12-28 苏州思必驰信息科技有限公司 Delay time estimation method, system, storage medium and electronic equipment
CN110913310A (en) * 2018-09-14 2020-03-24 成都启英泰伦科技有限公司 Echo cancellation method for broadcast distortion correction
CN109346096A (en) * 2018-10-18 2019-02-15 深圳供电局有限公司 Echo cancellation method and device for voice recognition process
CN109346096B (en) * 2018-10-18 2021-07-06 深圳供电局有限公司 Echo cancellation method and device for voice recognition process
CN110838300A (en) * 2019-11-18 2020-02-25 紫光展锐(重庆)科技有限公司 Echo cancellation processing method and processing system
CN110838300B (en) * 2019-11-18 2022-03-25 紫光展锐(重庆)科技有限公司 Echo cancellation processing method and processing system
CN111091846B (en) * 2019-12-26 2022-07-26 江亨湖 Noise reduction method and echo cancellation system applying same
CN111091846A (en) * 2019-12-26 2020-05-01 江亨湖 Noise reduction method and echo cancellation system applying same
CN111341336A (en) * 2020-03-16 2020-06-26 北京字节跳动网络技术有限公司 Echo cancellation method, device, terminal equipment and medium
CN111341336B (en) * 2020-03-16 2023-08-08 北京字节跳动网络技术有限公司 Echo cancellation method, device, terminal equipment and medium

Also Published As

Publication number Publication date
CN100524466C (en) 2009-08-05

Similar Documents

Publication Publication Date Title
CN100524466C (en) Echo elimination device for microphone and method thereof
US7003099B1 (en) Small array microphone for acoustic echo cancellation and noise suppression
US6597787B1 (en) Echo cancellation device for cancelling echos in a transceiver unit
US7773759B2 (en) Dual microphone noise reduction for headset application
JP5049277B2 (en) Method and system for clear signal acquisition
US9264807B2 (en) Multichannel acoustic echo reduction
EP3080975B1 (en) Echo cancellation
EP0843934B1 (en) Arrangement for suppressing an interfering component of an input signal
EP1855457A1 (en) Multi channel echo compensation using a decorrelation stage
JP5148150B2 (en) Equalization in acoustic signal processing
US20040264610A1 (en) Interference cancelling method and system for multisensor antenna
EP1081985A2 (en) Microphone array processing system for noisly multipath environments
JPH09504668A (en) Variable block size adaptive algorithm for noise-resistant echo canceller
JP2002501337A (en) Method and apparatus for providing comfort noise in a communication system
US11189297B1 (en) Tunable residual echo suppressor
CN102185991A (en) Echo cancellation method, system and device
US20180308503A1 (en) Real-time single-channel speech enhancement in noisy and time-varying environments
Albu et al. The hybrid simplified Kalman filter for adaptive feedback cancellation
EP3692703A1 (en) Echo canceller and method therefor
JPH09307625A (en) Sub band acoustic noise suppression method, circuit and device
CN107005268B (en) Echo cancellation device and echo cancellation method
EP2930917B1 (en) Method and apparatus for updating filter coefficients of an adaptive echo canceller
Bulling et al. Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance.
Yang Multilayer adaptation based complex echo cancellation and voice enhancement
US6507623B1 (en) Signal noise reduction by time-domain spectral subtraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090805

Termination date: 20201124

CF01 Termination of patent right due to non-payment of annual fee