CN1953060A - Echo elimination device for microphone and method thereof - Google Patents
- Publication number
- CN1953060A (application number CN200610144055)
- Authority
- CN
- China
- Prior art keywords
- frame
- coefficient
- signal
- loudspeaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
This invention discloses a microphone echo cancellation device and method for cancelling the echo generated by the acoustic loop between a loudspeaker and a microphone. The device comprises a frame length adjusting module that shortens the data frame combined into the adaptive filter's input to a length L smaller than M, so that the frequency-domain adaptive filter updates its coefficients more frequently and converges faster.
Description
Technical Field
The present invention relates to the field of echo cancellation, and in particular, to a microphone echo cancellation device and method using a frequency domain adaptive filter, which are used for canceling an echo generated by an acoustic loop between a speaker and a microphone.
Background
The echo is generated by the acoustic loop between the loudspeaker and the microphone. As shown in fig. 1, the sound signal from the far end reaches the near end through the communication connection and is recorded as the signal u. It is played out through the near-end loudspeaker, passes through the acoustic loop g between the loudspeaker and the microphone, is picked up by the microphone as the reference signal d, and is then transmitted back to the far end through the communication connection. The far-end talker therefore hears an echo of his own voice, which seriously degrades call quality.
Since the acoustic loop g from the loudspeaker to the microphone is unknown and time-varying, adaptive filtering is widely adopted in echo cancellation schemes. Fig. 1 shows a basic schematic diagram of echo cancellation by adaptive filtering. Taking minimization of the residual echo e as its target, the adaptive filter adaptively adjusts its filter coefficients to track the acoustic feedback loop g from loudspeaker to microphone, filters the sound signal u from the far end, and produces a prediction y of the echo d received by the microphone. When the filter tracks g accurately, y is very close to d, so that e = d − y tends to 0, thereby cancelling the echo.
In the adaptive filtering process, the adaptive filter must track an unknown feedback loop, i.e. model an unknown system. When the unknown feedback loop g has a large delay, i.e. the unknown system has a high order, the adaptive filter needs at least the same order to model it well. Since time-domain adaptive filtering convolves the input signal with the adaptive filter, the complexity of the algorithm rises sharply with the filter order and becomes impractical when the delay of the feedback loop is large. Subband adaptive filtering can reduce the computational complexity, but introduces signal aliasing problems.
Since convolution in the time domain equals multiplication in the frequency domain, a frequency-domain adaptive filtering algorithm can use fast FFT-based computation to reduce the complexity and improve efficiency when the filter order is high, making it a very practical filtering approach.
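This time-domain/frequency-domain equivalence can be checked numerically. The following sketch (illustrative, not part of the patent) verifies that multiplying zero-padded length-2M FFTs reproduces the linear convolution underlying overlap-save filtering:

```python
import numpy as np

# A length-M block convolved with M filter taps can be computed by pointwise
# multiplication of length-2M FFTs, the basis of overlap-save filtering.
M = 4
u = np.array([1.0, 2.0, 3.0, 4.0])       # input block
w = np.array([0.5, -0.25, 0.0, 0.125])   # filter taps

# Direct time-domain linear convolution (length 2M - 1)
y_time = np.convolve(u, w)

# Frequency domain: zero-pad both to N = 2M, multiply, inverse-transform
N = 2 * M
Y = np.fft.fft(u, N) * np.fft.fft(w, N)
y_freq = np.real(np.fft.ifft(Y))[:2 * M - 1]

assert np.allclose(y_time, y_freq)
```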
The frequency domain adaptive filtering algorithm in the prior art is generally as follows.
Some of the signal notation used hereinafter is first explained. In frequency-domain adaptive filtering the input signal is processed in units of frames; x(k) denotes the current, i.e. k-th, frame of a signal x. For example, u'(k) denotes the k-th frame of the sound signal coming from the far end and about to be output to the loudspeaker, u(k) denotes the combined frame of length 2M, and d(k) denotes the k-th frame of the echo signal picked up by the microphone. Further, w(k) denotes the time-domain filter coefficients and W(k) the corresponding frequency-domain filter coefficients. FFT denotes the fast Fourier transform, and IFFT the inverse fast Fourier transform.
An echo cancellation device to which a frequency domain adaptive filter is applied generally includes the following components.
(1) A data acquisition and combination module for acquiring the sound signal u from the far end that is to be output to the loudspeaker, where the length of each acquired data frame is M; the current, i.e. k-th, frame is recorded as u'(k) and is combined with the previous, i.e. (k−1)-th, frame u'(k−1) to form a large frame u(k) of length 2M.
(2) Assuming the order of the adaptive filter is M, the time-domain coefficients of the filter are denoted w(k). Because the overlap-save method is adopted, the M-order filter is padded with M zeros to avoid aliasing, forming a filter with N = 2M coefficients; the frequency-domain coefficients obtained after FFT processing, W(k) = FFT[w(k); 0_M], have length 2M.
The frequency-domain adaptive filter applies FFT processing to u(k), converting it to the frequency domain to obtain U(k) = FFT[u(k)]; it filters U(k) with the current filter coefficients W(k) and then applies IFFT processing to the filtering result to obtain a frame prediction of the echo, y(k) = IFFT[U(k)·W(k)], taking the last M points of the result.
(3) A subtractor for subtracting the predicted value y(k) from the echo d(k) collected by the microphone, obtaining the residual echo e(k) = d(k) − y(k); the collected d(k) also has length M.
(4) The frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation between the residual echo e(k) and the sound signal u(k) from the far end, obtaining the speech correlation parameter φ(k).
(5) The frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficient W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain the coefficient W(k+1).
The coefficient W(k) of the frequency-domain adaptive filter is updated once per adaptive filtering pass, and at the next pass the adaptive filter takes the updated coefficient W(k+1) as the current W(k) to perform frequency-domain filtering on the next combined large frame.
Fig. 2 is a schematic diagram of echo cancellation by the frequency-domain adaptive filtering method in the prior art, where a thin arrow represents time-domain signal processing and a thick arrow represents frequency-domain signal processing. Since the frequency-domain adaptive filtering method processes the signals frame by frame, the signals u, y, d and e of fig. 1 correspond in fig. 2 to u(k), y(k), d(k) and e(k), the k-th frames of the respective signals; in addition, u(k) denotes the large frame of length 2M obtained by combining the data of two frames. It is known that block processing of a long sequence and recombination after truncation must use the overlap-add or overlap-save method to avoid aliasing; the overlap-save method is described here.
First, assume the order of the time-domain adaptive filter is M and denote its coefficients w(k). Because the overlap-save method is adopted, the M-order filter is padded with M zeros to avoid aliasing, and the frequency-domain coefficient vector of the filter obtained after FFT processing is:

W(k) = FFT[w(k); 0_M]  (1.1)
as can be seen from the above equation (1.1), the length N of the frequency domain adaptive filter coefficients w (k) is 2 times the length M of the time domain coefficient vector. For frequency domain adaptive filtering algorithms, both adaptive filtering and filter coefficient updating are done in the frequency domain, so the form of time domain filters will not appear. It should be noted that the FFT or IFFT processing length mentioned later is N points.
The steps of the frequency domain adaptive filtering processing are as follows:
1) Collect one frame u'(k) of the sound signal from the far end; the frame length is M.
2) Process the input signal by connecting two frames, i.e. merge u'(k) with the previous frame u'(k−1) into one large frame:

u(k) = [u(kM−M), …, u(kM−1), u(kM), …, u(kM+M−1)]

where u(k), the k-th merged large frame, has length N = 2M;

u(kM−M) is the 1st sample of the original (k−1)-th frame;

u(kM−1) is the M-th sample of the original (k−1)-th frame;

u(kM) is the 1st sample of the original k-th frame;

u(kM+M−1) is the M-th sample of the original k-th frame.
3) Apply FFT processing to u(k), converting it to the frequency domain to obtain U(k) = FFT[u(k)].
4) Filter the input signal, i.e. multiply in the frequency domain, then apply IFFT processing to convert back to the time domain and take the last frame of the result, i.e. the last M samples, as the predicted value of the echo signal: y(k) = IFFT[U(k)·W(k)], last M points.
the residual echo signal is the difference between the echo signal and its predicted value:
6) Prepend M zeros to the residual echo signal and apply FFT processing to obtain the frequency-domain residual echo signal E(k) = FFT[0_M; e(k)].
the update amount of the adaptive filter coefficients is calculated using e (k) and u (k). First, conjugate U (k) to obtain UH(k) In that respect In the frequency domain, the update amount of the adaptive filter coefficient vector is determined by calculating the correlation between the error signal and the input signal, since the linear correlation is equivalent in form to an inverse linear convolution, a fast algorithm having FFT in the frequency domain by means of convolution in the time domain has:
according to the overlap-and-hold method, in the above formula, the frame after the result needs to be deleted, i.e. only the first M points of the IFFT result are taken.
7) Finally, φ(k) is used to update the adaptive filter coefficients. Note: the frequency-domain filter coefficients are generated by zero-padding the time-domain coefficients and then applying FFT processing, so here, correspondingly, φ(k) is appended with M zeros and FFT-processed; the result is multiplied by the adaptive step size μ and added to the filter coefficients W(k) before updating, giving the frequency-domain form of the coefficient update:

W(k+1) = W(k) + μ·FFT[φ(k); 0_M]

At the next adaptive filtering pass, the updated W(k+1) is adopted as the current filter coefficient W(k).
8) Repeat steps 1) to 7) cyclically until the data processing is finished.
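Steps 1) to 8) can be sketched as the following simulation (illustrative, not the patent's implementation: the echo path g, the step size mu, and the per-bin power normalization added here for numerical robustness are all assumptions; the patent's update uses a fixed step size μ):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64                      # frame length and filter order
N = 2 * M                   # FFT length
mu = 0.5                    # adaptive step size (assumed value)
g = rng.standard_normal(M) * np.exp(-np.arange(M) / 8.0)  # unknown loop g
g /= np.linalg.norm(g)
W = np.zeros(N, dtype=complex)   # frequency-domain coefficients W(k)
u_prev = np.zeros(M)             # previous frame u'(k-1)

far = rng.standard_normal(40 * M)    # far-end sound signal u
errs = []
for k in range(40):
    u_k = far[k * M:(k + 1) * M]                 # step 1): frame u'(k)
    big = np.concatenate([u_prev, u_k])          # step 2): 2M large frame
    u_prev = u_k
    U = np.fft.fft(big)                          # step 3): U(k)
    y = np.real(np.fft.ifft(U * W))[M:]          # step 4): last M points
    d = np.convolve(big, g)[M:2 * M]             # echo picked up by the mic
    e = d - y                                    # step 5): residual echo
    E = np.fft.fft(np.concatenate([np.zeros(M), e]))       # step 6)
    grad = np.conj(U) * E / (np.abs(U) ** 2 + 1e-6)        # correlation
    phi = np.real(np.fft.ifft(grad))[:M]         # keep first M points only
    W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 7)
    errs.append(float(np.mean(e ** 2)))

assert errs[-1] < errs[0]    # residual echo shrinks as W(k) tracks g
```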
These steps show that the filter coefficients of the frequency-domain adaptive filter are updated only once per signal frame of length M, so the convergence rate is slow; the result is especially unsatisfactory when the characteristics of the feedback loop change quickly.
Disclosure of Invention
In order to solve the above-mentioned drawbacks of the prior art, the present invention provides an echo cancellation device and an echo cancellation method, so that the coefficients of the frequency domain adaptive filter can work efficiently and stably, thereby achieving the purpose of effectively canceling echo.
In order to solve the above problem, the present invention provides a microphone echo cancellation device for canceling an echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module for acquiring the sound signal u from the far end that is to be output to the loudspeaker, where the length of each acquired data frame is M; the current, i.e. k-th, frame is recorded as u'(k) and is combined with the previous, i.e. (k−1)-th, frame u'(k−1) to form a large frame u(k) of length 2M;
a frequency-domain adaptive filter whose current frequency-domain filter coefficients are W(k) = FFT[w(k); 0_M], of length 2M, where w(k) is the time-domain coefficient vector of the filter, of length M; the frequency-domain adaptive filter applies FFT processing to u(k), converting it to the frequency domain to obtain U(k) = FFT[u(k)]; it filters U(k) with the current filter coefficients W(k) and then applies IFFT processing to the filtering result to obtain a frame prediction of the echo, y(k) = IFFT[U(k)·W(k)], taking the last M points of the result;
a subtractor for subtracting the predicted value y(k) from the echo d(k) of length M collected by the microphone, obtaining the residual echo e(k) = d(k) − y(k);
the frequency-domain adaptive filter further comprises a voice correlation detection unit for calculating, in the frequency domain, the correlation between the residual echo e(k) and the sound signal u(k) from the far end, obtaining the speech correlation parameter φ(k) = IFFT[U^H(k)·E(k)], where U^H(k) is the conjugate of U(k), E(k) = FFT[0_M; e(k)], and the first M points of the result are taken;
the frequency-domain adaptive filter further comprises a coefficient updating unit for updating the coefficient W(k) of the frequency-domain adaptive filter according to the speech correlation, combined with the adaptive step size μ of the adaptive filter, to obtain the coefficient W(k+1);
the coefficient W(k) of the frequency-domain adaptive filter is updated once each time the filter performs adaptive filtering, and at the next pass the adaptive filter performs frequency-domain filtering on the next combined large frame using the updated coefficient W(k+1);
the frame length adjusting module is used for setting the data frame length of the u to a value L smaller than M;
correspondingly, the data acquisition and combination module is used for combining L data of the current kth frame data and the immediately preceding 2M-L continuous data to form a large frame with the length of 2M;
accordingly, the frequency domain adaptive filter adaptively filters the 2M large frame; after the filtering processing of each frame of data with the length of L is finished, updating the frequency domain filtering coefficient of the filter;
and, correspondingly, a residual echo interception module for intercepting the first L samples of each frame of the residual echo e(k) to obtain the final residual echo e.
Preferably, the frame length adjusting module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u with the immediately preceding 2n−1 data frames into a large frame of length 2M.
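The sliding 2M window implied by this combination rule can be sketched as follows (an illustrative example; variable names are assumed):

```python
import numpy as np

# With L = M/n, each new L-sample frame is appended to the previous 2M - L
# samples, so the filter still sees a 2M window while its coefficients can
# update n times as often as with full M-sample frames.
M, n = 8, 4
L = M // n
buf = np.zeros(2 * M)                 # sliding 2M window
stream = np.arange(1.0, 4 * M + 1)    # far-end samples 1, 2, 3, ...

frames = []
for k in range(len(stream) // L):
    u_k = stream[k * L:(k + 1) * L]           # current L-sample frame
    buf = np.concatenate([buf[L:], u_k])      # drop oldest L, append newest L
    frames.append(buf.copy())

# After enough frames, the window holds exactly the most recent 2M samples
assert np.array_equal(frames[-1], stream[-2 * M:])
```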
Preferably, the device further comprises a sound detection module and a filtering control module:
the sound detection module comprises two sound detection units which are respectively used for detecting sound conditions of the microphone input end and the loudspeaker output end and outputting the detection results to the filtering control module;
the filtering control module is used for controlling the work of the frequency domain self-adaptive filter according to the output result of the sound detection module,
if the sound detection result at the microphone input is unvoiced, neither adaptive filtering nor coefficient updating is performed, and the output is directly e(k) = d(k), completing the frame processing;
if the detection result at the microphone input is voiced, the detection result at the loudspeaker output is then examined; if the loudspeaker output is unvoiced, adaptive filtering proceeds normally but no coefficient update is performed, and the output is e(k) = d(k) − y(k), completing the frame processing;
if the detection results at both the microphone input and the loudspeaker output are voiced, the adaptive filter is in its normal working state: adaptive filtering is performed and the coefficients are updated, giving the output e(k) = d(k) − y(k) and the updated filter coefficients W(k+1), completing the frame processing.
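This control logic can be sketched as a small decision function (hypothetical names; not the patent's implementation):

```python
import numpy as np

# The two detection results gate whether the filter runs and whether it
# adapts: (output frame, update-coefficients flag).
def control(mic_voiced, spk_voiced, d, y):
    if not mic_voiced:
        return d, False            # e = d: no filtering, no update
    if not spk_voiced:
        return d - y, False        # filter, but do not update coefficients
    return d - y, True             # filter and update coefficients

d = np.array([1.0, 2.0])
y = np.array([0.5, 0.5])
assert np.array_equal(control(False, True, d, y)[0], d)
e, upd = control(True, False, d, y)
assert np.array_equal(e, d - y) and upd is False
assert control(True, True, d, y)[1] is True
```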
Preferably, the sound detection module determines whether a line is voiced by comparing the short-time average amplitude of the sound signal at the microphone input or the loudspeaker output with the noise level, specifically:
if MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged unvoiced;
the above-mentioned <math> <mrow> <mi>MicSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>M</mi> <munderover> <mi>Σ</mi> <mn>0</mn> <mrow> <mi>M</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> </mrow> </math> Is the short-time average amplitude of the microphone input signal, whereinAcquiring a sound signal with a frame length of M for a microphone, wherein M is the frame length, and NoiseFlor is an estimated noise level;
if SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged unvoiced;
the above-mentioned <math> <mrow> <mi>SpkSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>L</mi> <munderover> <mi>Σ</mi> <mn>0</mn> <mrow> <mi>L</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <mover> <mi>u</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> <mo>,</mo> </mrow> </math> Is the short-time average amplitude of the signal input to the loudspeaker,for signals input to the speakers, L is the frame length.
Preferably, the apparatus further comprises a step size adjusting module, configured to detect a coefficient update step size μ of the adaptive filter, and decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
Preferably, when it is detected that the update step size of the adaptive filter coefficient is restored to normal, the coefficient update step size is restored to the initial value.
Preferably, the adaptive filter further comprises a coefficient adjusting module, configured to decrease the filter coefficient w (k) when detecting that the coefficient w (k) of the adaptive filter is greater than a set coefficient threshold.
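These two safeguards can be sketched as follows (the thresholds MU_MAX, MU_INIT, W_MAX and the halving/scaling strategy are assumptions; the patent only states that abnormal values are decreased and that the step size is later restored to its initial value):

```python
import numpy as np

MU_MAX = 0.8      # maximum coefficient-update step-size threshold (assumed)
MU_INIT = 0.4     # initial step size, restored once behaviour is normal
W_MAX = 10.0      # coefficient magnitude threshold (assumed)

def adjust_step(mu):
    # decrease mu when it exceeds the maximum threshold; restore the
    # initial value once it is back in the normal range
    return mu * 0.5 if mu > MU_MAX else MU_INIT

def adjust_coeffs(W):
    # scale the coefficient vector down when any coefficient exceeds W_MAX
    peak = np.max(np.abs(W))
    return W * (W_MAX / peak) if peak > W_MAX else W

assert adjust_step(1.2) == 0.6
assert adjust_step(0.7) == MU_INIT
W = np.array([0.0, 20.0, -5.0])
assert np.max(np.abs(adjust_coeffs(W))) == W_MAX
```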
Preferably, the method further comprises the following steps: and the nonlinear processing module is used for suppressing nonlinear components in the echo.
Preferably, the nonlinear processing module processes the signal when E(e) > NLPfloor,

where e is the residual signal and also the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, when E (e). ltoreq.NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises the following steps:
the loudspeaker sound detection module is used for detecting the sound condition of the output end of the loudspeaker;
the nonlinear processing control module is used for turning on or off the nonlinear processing module according to the output result of the loudspeaker sound detection module;
the nonlinear processing module is turned on when the loudspeaker sound detection module detects that the loudspeaker output is voiced, i.e. SpkSignal_avg > NoiseFloor,

and the signal at the loudspeaker output is more than α times the residual signal, i.e. SpkSignal_avg / E(e) > α;

if either of these two conditions is not met, NLP processing is turned off;
wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
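The NLP decision logic above can be sketched as follows (ALPHA, NLP_FLOOR, the suppression gain, and the comfort-noise generator are illustrative assumptions):

```python
import numpy as np

ALPHA = 4.0        # speaker-to-residual dominance factor (assumed)
NLP_FLOOR = 0.02   # decision level NLPfloor (assumed)
NOISE_FLOOR = 0.01 # estimated noise level (assumed)

def nlp(e, spk_avg, rng=np.random.default_rng(1)):
    e_avg = np.mean(np.abs(e))        # E(e): short-time average amplitude
    if spk_avg > NOISE_FLOOR and spk_avg / max(e_avg, 1e-12) > ALPHA:
        if e_avg <= NLP_FLOOR:
            # residual is essentially echo tail: emit comfort noise instead
            return 0.5 * NLP_FLOOR * rng.standard_normal(len(e))
        return 0.1 * e                # suppress nonlinear residual components
    return e                          # NLP off: pass the residual through

e_small = np.full(32, 0.005)                    # E(e) <= NLPfloor
out = nlp(e_small, spk_avg=0.5)
assert not np.array_equal(out, e_small)         # replaced by comfort noise

e_big = np.full(32, 0.5)                        # speaker does not dominate
assert np.array_equal(nlp(e_big, spk_avg=0.5), e_big)
```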
The invention also provides a microphone echo cancellation method which uses frequency-domain adaptive filtering to cancel the echo d generated when the sound signal u from the far end passes through the acoustic loop between the loudspeaker and the microphone, finally obtaining the residual echo e. The time-domain filter coefficients are w(k), of length M, and the corresponding frequency-domain filter coefficients are W(k) = FFT[w(k); 0_M], of length 2M. The method comprises the following steps:
1) setting the data frame length L of the signal u acquired each time;

2) acquiring the current, i.e. k-th, frame u'(k) of the signal u, of length L;

3) merging the current frame u'(k) with the preceding 2M−L samples into a large frame u(k) of length 2M;

4) converting u(k) to the frequency domain and, by the overlap-save method, filtering it with the frequency-domain filter coefficients W(k), then converting the result back to the time domain to obtain the time-domain echo prediction y(k);

5) subtracting y(k) from the echo d(k) collected by the microphone to obtain the residual echo e(k);

6) updating the filter coefficients to obtain W(k+1);

7) acquiring the next frame of the signal, merging as in step 3), and performing frequency-domain adaptive filtering with the updated filter coefficients, until the data input is finished.
Preferably, the frequency domain adaptive filtering algorithm includes the following steps:
1) frame length adjustment, namely adjusting the frame length of u from M to a positive integer value L smaller than M;
2) collecting the k-th frame signal of u, of frame length L, recorded as u'(k);
3) combining the L samples of u'(k) with the immediately preceding 2M−L samples to form one large frame of length 2M:

u(k) = [u(kL−2M+L), …, u(kL−2), u(kL−1), u(kL), …, u(kL+L−1)]

where u(kL−2M+L) is the (2M−L)-th sample before the original k-th frame,

u(kL−2) is the 2nd sample before the original k-th frame,

u(kL−1) is the sample immediately before the original k-th frame,

u(kL) is the 1st sample in the original k-th frame,

u(kL+L−1) is the L-th sample in the original k-th frame;
4) applying FFT processing to u(k), converting it to the frequency domain to obtain U(k) = FFT[u(k)];
5) filtering U(k) with the current filter coefficients W(k) by the overlap-save method, i.e. multiplying in the frequency domain, then taking the last M samples of the IFFT-processed result, recorded as y(k): y(k) = IFFT[U(k)·W(k)], last M points;
6) after u is played by the loudspeaker, it passes through the acoustic loop between the loudspeaker and the microphone and is then collected by the microphone as an echo signal of length M, denoted d(k);

7) subtracting the y(k) of step 5) from d(k) to obtain the error signal e(k) = d(k) − y(k);
8) prepending M zeros to the as-yet unintercepted e(k) of length M and applying FFT processing to obtain E(k) = FFT[0_M; e(k)];
at the same time, conjugating the U(k) of step 4) to obtain U^H(k), multiplying it pointwise with E(k), and applying an IFFT operation to the result; according to the overlap-save method, φ(k) = IFFT[U^H(k)·E(k)], where the trailing frame of the result is discarded and only the first M points of the IFFT result are kept;
9) appending M zeros to φ(k), applying FFT processing, multiplying the result by the adaptive step size μ, and adding the product to the filter coefficients W(k); the updated filter coefficients in frequency-domain form are W(k+1) = W(k) + μ·FFT[φ(k); 0_M]. The next adaptive filtering pass uses the updated filter coefficients W(k+1);
10) returning to step 2), until the input of the sound signal from the far end is finished.
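Steps 1) to 10) can be sketched as the following simulation (illustrative, not the patent's implementation: the echo path g, the step size mu, and the per-bin power normalization added for numerical robustness are all assumptions):

```python
import numpy as np

# Same overlap-save filter as the prior art, but the acquisition frame is
# shortened to L = M/n, so the 2M window slides by L and the coefficients
# update n times per M samples.
rng = np.random.default_rng(0)
M, n = 64, 4
L = M // n                   # step 1): shortened frame length
N = 2 * M
mu = 0.25                    # adaptive step size (assumed value)
g = rng.standard_normal(M) * np.exp(-np.arange(M) / 8.0)   # unknown loop g
g /= np.linalg.norm(g)
W = np.zeros(N, dtype=complex)
buf = np.zeros(N)            # sliding 2M window of the far-end signal

far = rng.standard_normal(40 * M)
errs = []
for k in range(len(far) // L):
    buf = np.concatenate([buf[L:], far[k * L:(k + 1) * L]])  # steps 2)-3)
    U = np.fft.fft(buf)                                      # step 4)
    y = np.real(np.fft.ifft(U * W))[M:]                      # step 5)
    d = np.convolve(buf, g)[M:2 * M]     # step 6): echo at the microphone
    e = d - y                            # step 7): error signal
    E = np.fft.fft(np.concatenate([np.zeros(M), e]))         # step 8)
    phi = np.real(np.fft.ifft(np.conj(U) * E / (np.abs(U) ** 2 + 1e-6)))[:M]
    W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9)
    errs.append(float(np.mean(e ** 2)))

assert errs[-1] < errs[0]    # the faster-updating W(k) still converges
```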
Preferably, the value of L is M/n, and n is an integer greater than 1.
Preferably, before the step 1), the method further comprises a sound detection step and a filtering control step, and the method comprises the following steps:
a sound detection step, detecting sound conditions of the microphone input end and the loudspeaker output end;
a filtering control step, controlling the operation of the filter according to the result of the sound detection step;
the method specifically comprises the following steps:
if the detection result at the microphone input is silent, neither adaptive filtering nor coefficient updating is performed, and the output is directly e(k) = d(k), completing the frame processing;
if the detection result at the microphone input is voiced, the detection result at the loudspeaker output is then examined; if the loudspeaker output is unvoiced, adaptive filtering proceeds normally but no coefficient update is performed, and the output is e(k) = d(k) − y(k), completing the frame processing;
if the detection results of the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in a normal working state, and not only is adaptive filtering carried out, but also coefficient updating is carried out, and the adaptive filter is output <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>y</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math> Completing the frame processing;
wherein d(k) is the echo received by the microphone, y(k) is the adaptive filter's predicted value of d(k), and e(k) is the residual echo.
Preferably, the sound detection is to determine whether there is sound by comparing the short-time average amplitude of the sound signals at the microphone input end and the speaker output end with the noise level, specifically:
if MicSignal _ avg is larger than NoiseFloor, judging that the microphone line is voiced, otherwise, judging that the microphone line is unvoiced;
the above-mentioned <math> <mrow> <mi>MicSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>M</mi> <munderover> <mi>Σ</mi> <mn>0</mn> <mrow> <mi>M</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> <mo>,</mo> </mrow> </math> is the short-time average amplitude of the microphone input signal d(k), i.e., the received echo signal; M is the length of a frame of speech signal, and NoiseFloor is the estimated noise level;
if the SpkSignal _ avg is larger than the NoiseFloor, judging that the loudspeaker line is voiced, otherwise, judging that the loudspeaker line is unvoiced;
the above-mentioned <math> <mrow> <mi>SpkSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>L</mi> <munderover> <mi>Σ</mi> <mn>0</mn> <mrow> <mi>L</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <mover> <mi>u</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> <mo>,</mo> </mrow> </math> is the short-time average amplitude of the loudspeaker output signal u(k), and L is the length of a frame of speech signal.
Preferably, the method further comprises a step size adjusting step, configured to decrease the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than the set maximum coefficient update step size threshold.
Preferably, when the coefficient update amount of the adaptive filter is detected to have returned to normal, the coefficient update step size is restored to the initial value.
Preferably, the method further comprises a coefficient adjusting step, configured to decrease the coefficient of the filter when the coefficient of the adaptive filter is detected to be greater than the set coefficient threshold.
Preferably, the method further comprises the following nonlinear processing steps:
firstly, calculating the short-time average amplitude E (e) of the minimized residual signal;
then, whether E(e) is larger than a preset nonlinear processing threshold NLPfloor is judged; if so, the minimized residual noise e'(n) is calculated by using the following formula:
wherein e is the residual signal and the input of the nonlinear processing module, e' is the output of the nonlinear processing module, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
Preferably, if E(e) ≤ NLPfloor, e' is directly replaced by comfort noise.
Preferably, the method further comprises a nonlinear processing switch control step, specifically:
detecting the sound condition of the output end of the loudspeaker;
turning on or off the nonlinear processing step according to the detection result, specifically:
when the output end of the loudspeaker is detected to be voiced, namely SpkSignal_avg is larger than NoiseFloor, and the signal at the loudspeaker output end is more than α times the residual signal, namely SpkSignal_avg/E[e] > α, the nonlinear processing module is started;
if either of the two conditions is not met, the NLP processing is closed; wherein: SpkSignal_avg is the short-time average amplitude of the speaker output signal, NoiseFloor is the estimated noise level, E[e] is the short-time average amplitude of the residual signal, and α is a preset multiple value.
The frame length adjusting module added to the adaptive frequency-domain filter makes the frame length of the far-end sound signal processed at one time smaller than the time-domain coefficient length of the adaptive filter, and then combines more than one frame of signal into one large frame for adaptive filtering. On one hand, the adaptive filter keeps its original, sufficient length, so the delay requirement of the feedback loop can still be met; on the other hand, the update frequency of the adaptive filter coefficients is increased, so the adaptive filter can work efficiently. In addition, the filtering control module of the invention ensures that the adaptive filter does not converge wrongly in the special case where the microphone input line or the loudspeaker output line is silent, guaranteeing its normal operation; the step size adjusting module and the coefficient adjusting module allow the adaptive filter to recover to a normal working state after divergence; and the nonlinear processing module cancels nonlinear distortion in the feedback loop. Therefore, the echo cancellation device of the invention makes the adaptive filter work efficiently and stably, thereby achieving the purpose of effectively canceling echo.
Drawings
FIG. 1 is a schematic diagram of a basic structure of an apparatus for performing echo cancellation by adaptive filtering;
FIG. 2 is a diagram illustrating a method for performing echo cancellation by frequency-domain adaptive filtering in the prior art;
FIG. 3 is a schematic diagram of the structure of the voice detection module and the filtering control module in the device according to the present invention;
FIG. 4 is a diagram of a data merge unit according to the present invention;
fig. 5 is a schematic diagram of the relationship between echo and decision level before and after the nonlinear processing by the nonlinear processing module according to the present invention.
Detailed Description
The echo cancellation device and method of the present invention will be described in detail below with reference to the accompanying drawings.
In order for the adaptive filter to effectively track the feedback loop, the coefficient length of the adaptive filter must be greater than the number of sampling points of the feedback delay. For example, for a signal with an 8K sampling rate, if the time-domain adaptive filter coefficient length M is 1024, the maximum feedback delay of the feedback loop that the filter can track and model is 1024/8000 = 128 ms.
In the frequency-domain adaptive filtering method described in the background art, the length of the frequency-domain filter coefficients is 2M, the length of the corresponding time-domain coefficients is M, and the length of each new incoming data frame is also M. That is, the adaptive filter time-domain coefficient length is the same as the length of a new data frame: if the adaptive filter coefficient length is 1024, then the data frame length processed at once is also 1024. Thus, only about 8 filtering and coefficient-update operations are performed per second. For environments where the feedback loop changes quickly, this update frequency is sometimes insufficient.
Therefore, as shown in fig. 3, on the basis of the frequency-domain adaptive filtering, the present invention adds a frame length adjusting module for adjusting the length of the data frame to L. Note that after one adjustment the frame length remains fixed, rather than being readjusted every time a frame of data is acquired. For example: the frequency-domain filter coefficient length is 2M and the corresponding time-domain filter coefficient length is M, and the length L of each new incoming data frame can be half the filter time-domain coefficient length, i.e., L = M/2 (M is an even number). For the input signal, the original two-frame combination then becomes a four-frame combination. Through this improvement, on one hand, the length of the adaptive filter is still 2M, long enough to meet the delay requirement of the feedback loop; on the other hand, the adaptive filter coefficients are updated once per M/2 samples, so the coefficient update frequency is also taken care of. However, this comes at the cost of increased algorithm complexity. Since the data amount of each frame is L, a residual echo intercepting module is added to intercept the first L data of the obtained residual echo and output them as the final result.
In the above example L = M/2; in actual use it may also be M/3, M/4, M/8, or the like, so that the coefficient update frequency of the adaptive filter becomes even higher. Correspondingly, only the length of the data intercepted by the residual echo intercepting module needs to be changed.
Besides this, the length L of each data frame may also be any number less than M, for example: if M is 1024, then L can be 1000, 900, 650, or any other value less than 1024. It is only necessary to ensure that, when the data frames are combined, the length of the combined large frame is 2M. This can be solved as follows: as shown in FIG. 4, a FIFO buffer of length 2M is used to store the incoming data; each newly received frame of data is combined with the previous 2M − L data into one large frame, and an adaptive filtering process is performed once.
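As a sketch of this data-merge unit (names are illustrative, not taken from the patent), a FIFO buffer of length 2M can be maintained as follows, shifting out the oldest L samples each time a new frame arrives:

```python
import numpy as np

class FrameMerger:
    """Sketch of the data-merge unit of Fig. 4: a FIFO buffer of length 2M
    holds the most recent far-end samples; each new frame of length L < M
    is appended, yielding a large frame of length 2M for one filtering pass."""
    def __init__(self, M: int):
        self.buf = np.zeros(2 * M)  # FIFO of length 2M, initially silence

    def push(self, frame: np.ndarray) -> np.ndarray:
        L = len(frame)
        # discard the oldest L samples, append the new frame of L samples
        self.buf = np.concatenate([self.buf[L:], frame])
        return self.buf.copy()  # large frame: previous 2M - L data + new L data
```

L need not divide 2M evenly; the buffer always yields a frame of exactly 2M samples.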
Adaptive filtering can automatically track the feedback loop, but for special cases, adaptive filters are prone to mis-tracking, such as the case where the microphone and speaker lines are silent at the same time. In this case, the input signal and the reference signal of the adaptive filter are small, and the adaptive filter is liable to misconvergence.
In order to prevent the filter from erroneously converging, the present invention proposes that a voiced sound detection module and a filtering control module may be added to the echo cancellation device, as shown in fig. 3.
The voiced detection module, i.e., the VAD (Voice Activity Detector) module, may include two detection units, VAD1 and VAD2, located at the microphone input and the speaker output respectively. VAD detection may make a decision by comparing the short-time average amplitude of the signal with the noise level. The short-time average amplitude can be obtained by averaging the amplitude of one frame of the signal.
For the microphone input: <math> <mrow> <mi>MicSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>M</mi> <munderover> <mi>Σ</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>M</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> </mrow> </math>
(2.1)
In the formula: MicSignal_avg is the short-time average amplitude of the microphone input signal d(k), and M is the length of a frame of speech signal.
If MicSignal_avg > NoiseFloor, the microphone line is judged to be voiced; otherwise it is judged to be unvoiced. Here NoiseFloor is the estimated noise level.
Similarly, for the speaker output: <math> <mrow> <mi>SpkSignal</mi> <mo>_</mo> <mi>avg</mi> <mo>=</mo> <mn>1</mn> <mo>/</mo> <mi>L</mi> <munderover> <mi>Σ</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>L</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mo>|</mo> <msup> <mover> <mi>u</mi> <mo>→</mo> </mover> <mo>'</mo> </msup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>|</mo> </mrow> </math>
(2.2)
In the formula: SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal u′(k), i.e., the sound signal input to the speaker, and L is the length of one frame of speech signal.
If SpkSignal_avg > NoiseFloor, the loudspeaker line is judged to be voiced; otherwise it is judged to be unvoiced.
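The two VAD decisions above can be sketched as follows (NOISE_FLOOR is an illustrative constant; in practice NoiseFloor is an estimated noise level):

```python
import numpy as np

NOISE_FLOOR = 0.01  # illustrative estimated noise level

def short_time_avg(frame: np.ndarray) -> float:
    # short-time average amplitude: mean of absolute sample values (Eqs. 2.1, 2.2)
    return float(np.mean(np.abs(frame)))

def vad(frame: np.ndarray, noise_floor: float = NOISE_FLOOR) -> bool:
    # voiced if the short-time average amplitude exceeds the noise level
    return short_time_avg(frame) > noise_floor
```

The same function serves as VAD1 on microphone frames (length M) and VAD2 on speaker frames (length L).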
According to the output result of the sound detection unit, the filtering control module performs overall control on the work of the filter, and specifically comprises the following steps:
If VAD1 detects silence, neither adaptive filtering nor filter coefficient updating is performed, and the output is directly <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math> completing the frame processing. If VAD1 detects sound, the VAD2 detection result is then examined: if VAD2 detects silence, adaptive filtering is performed normally but no filter coefficient updating is done, and the output is <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>y</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> </math> completing the frame processing. If both VAD1 and VAD2 detect voiced sound, the adaptive filter is in its normal working state, i.e., adaptive filtering is performed and filter coefficients are also updated, and the output is <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>y</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> </math> This frame processing is completed.
Experiments show that after filtering control is added, the adaptive filter can not be converged wrongly under the special condition that a microphone input line or a loudspeaker output line is silent, and normal work of the adaptive filter is guaranteed.
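The three control branches can be sketched as a small decision function (illustrative names; d is the microphone frame, y is the adaptive filter output, and the booleans are the VAD1/VAD2 results):

```python
import numpy as np  # d and y are numpy arrays in this sketch

def filter_control(mic_voiced: bool, spk_voiced: bool, d, y):
    """Filtering-control decision table (VAD1 = microphone, VAD2 = speaker).
    Returns (output e, do_filter, do_update)."""
    if not mic_voiced:
        return d, False, False       # e = d: no filtering, no coefficient update
    if not spk_voiced:
        return d - y, True, False    # e = d - y: filter, but freeze coefficients
    return d - y, True, True         # normal state: filter and update coefficients
```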
In addition, for adaptive filtering, if the signal collected by the microphone consists entirely of the sound emitted by the speaker, the adaptive filter can easily track the feedback loop and work stably. However, the signal collected by the microphone generally contains not only the sound emitted by the speaker but also sound signals from the near end, and such signals sometimes even form the major component. Such signals therefore interfere with the adaptive filter correctly tracking the feedback loop, possibly leading to erroneous tracking and even coefficient divergence.
When the filter tracks incorrectly, the coefficients begin to diverge, which shows up in the coefficient update: the coefficient update amount of the adaptive filter is usually larger at this time. Therefore, as shown in fig. 3, the present invention can add a step size adjustment module: when the coefficient update amount is detected to be relatively large, the adaptive filter is judged to be in an abnormal working state and the coefficient update step size is reduced, which effectively suppresses erroneous tracking of the filter and avoids coefficient divergence. When the coefficient update amount is detected to be normal, the adaptive filter is judged to be in a normal working state, and the coefficient update step size can then be adjusted back, for example restored to its initial value. This increases the convergence speed of the adaptive filter.
In particular, for the NLMS algorithm in the frequency domain adaptive algorithm,
as previously described, the coefficient update is shown as follows:
order to <math> <mrow> <mi>Φ</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>FFT</mi> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mi>φ</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> </mtable> </mfenced> </mrow> </math>
(2.4)
Then, W(k+1) = W(k) + μ·Φ(k)    (2.5)
where W(k) is the frequency-domain adaptive filter coefficient vector, an N-dimensional complex vector; μ is the coefficient update step size; Φ(k) is also an N-dimensional complex vector; and N is the number of FFT points. Namely:
Φ(k) = [Φ0(k), Φ1(k), ..., ΦN-1(k)]T
(2.6)
the coefficient update amount thus obtained is:
μ·Φ(k) = [μ·Φ0(k), μ·Φ1(k), ..., μ·ΦN-1(k)]T    (2.7)
The key to the step size adjustment mentioned above is detecting the magnitude of the coefficient update amount, which can be measured by the complex modulus. Namely:
[μ·‖Φ0(k)‖, μ·‖Φ1(k)‖, ..., μ·‖ΦN-1(k)‖]T    (2.8)
in the present invention, the step length adjustment method may be:
For μ·‖Φi(k)‖, i = 0, 1, ..., N-1:
If μ·‖Φi(k)‖ > MaxStepSize, where MaxStepSize is the maximum step-size threshold, the adaptive filter is judged to be in an abnormal working state at this time, and the step size is then adjusted, for example scaled down by a factor of 10, i.e., μ = 0.1μ.
Experiments show that after the step length adjusting module is added, although the convergence speed of the frequency domain adaptive filter is reduced to a certain extent, the coefficient is not easy to diverge, and the stability of the adaptive filter is greatly enhanced.
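A minimal sketch of the step-size check, assuming an illustrative MaxStepSize value and the tenfold reduction suggested above:

```python
import numpy as np

MAX_STEP_SIZE = 1.0  # MaxStepSize threshold (illustrative value)

def adjust_step(mu: float, phi: np.ndarray) -> float:
    """phi is the N-dimensional complex update vector Phi(k); the per-bin
    update magnitude is mu * ||Phi_i(k)||.  If any bin exceeds MaxStepSize,
    the filter is judged abnormal and the step size is scaled down by 10."""
    if np.any(mu * np.abs(phi) > MAX_STEP_SIZE):
        return 0.1 * mu
    return mu
```

Restoring μ to its initial value when the update amount returns to normal would be the symmetric operation.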
The filtering control module and the step length adjusting module ensure the stable work of the self-adaptive filter to a certain extent. However, some sudden events, or unexpected situations, may still cause the adaptive filter to diverge, and the diverging filter may cause the speaker to emit a loud noise. Therefore, the present invention proposes a strategy for dealing with special situations, and as shown in fig. 3, a coefficient adjustment module can be added as a last line of defense for ensuring stable operation of the adaptive filter.
The working principle of the coefficient adjusting module is simple: when the adaptive filter diverges, its coefficients tend to become large, so the task of coefficient adjustment is to check the size of the coefficients after each coefficient update; if a coefficient is larger than a set threshold, the adaptive filter is considered to have diverged. Specifically, for the frequency-domain NLMS algorithm, as mentioned above, the coefficient update is as follows:
W(k+1) = W(k) + μ·Φ(k)    (2.9)
where W(k) is the frequency-domain adaptive filter coefficient, an N-dimensional complex vector, and N is the number of FFT points. Namely: W(k) = [W0(k), W1(k), ..., WN-1(k)]T
(2.10)
The magnitude of the coefficients is measured by the complex modulus. Namely:
[‖W0(k)‖,‖W1(k)‖,...,‖WN-1(k)‖]T (2.11)
For ‖Wi(k)‖, i = 0, 1, ..., N-1:
If ‖Wi(k)‖ > MaxParam, where MaxParam is the maximum coefficient threshold, it is determined that the frequency-domain adaptive filter has diverged, and the coefficients of the adaptive filter are adjusted; they may be reduced, for example set to zero, i.e.: W(k) = 0. After the coefficients are set to zero, the adaptive filter resumes convergence, rescuing it from the divergence state. The threshold MaxParam must be chosen carefully according to the gain of the feedback loop: if the value is too large, coefficient monitoring becomes insensitive and the divergence state cannot be effectively identified; if it is too small, misjudgment occurs easily, so that the adaptive filter restarts frequently and cannot work normally.
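The coefficient check with zero-reset can be sketched as follows (MaxParam is an illustrative value that would in practice be tuned to the feedback-loop gain):

```python
import numpy as np

MAX_PARAM = 10.0  # MaxParam: maximum coefficient-magnitude threshold (illustrative)

def check_and_reset(W: np.ndarray) -> np.ndarray:
    """W is the frequency-domain coefficient vector W(k).  If any bin's
    modulus exceeds MaxParam, the filter is judged to have diverged and
    all coefficients are zeroed so that convergence can restart."""
    if np.any(np.abs(W) > MAX_PARAM):
        return np.zeros_like(W)
    return W
```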
In addition, a nonlinear processing module, i.e., an NLP (Non-Linear Processor) module, can be added. This is because typical loudspeakers exhibit 5%-10% nonlinear distortion, while adaptive filtering can only track linear systems, so the nonlinear distortion of the signal in the feedback loop cannot be predicted and eliminated by the filter. Therefore, an NLP module can be added after the adaptive filtering to eliminate the nonlinear distortion.
Because NLP processing is only performed for nonlinear distortion of the speaker, the module can be turned off when it is not needed. This requires adding a nonlinear processing control module and a speaker voiced detection module to control the turning on and off of the nonlinear processing module; the speaker detection module can reuse VAD2 of the voiced detection module.
The specific control principle is as follows: when (1) SpkSignal_avg > NoiseFloor, i.e., VAD2 detects that the speaker is voiced; and (2) SpkSignal_avg/E[e] > α, i.e., the loudspeaker signal is more than α times the residual signal; NLP processing is started. If either condition (1) or (2) is not met, the NLP module is closed.
Condition (1) states that when the speaker is silent there can be no echo, so NLP processing is unnecessary; condition (2) states that when the near end has sound, E[e] is larger, so condition (2) is not satisfied, NLP processing is closed, and the near-end signal is transmitted without distortion.
In the formula: SpkSignal_avg is the short-time average amplitude of the speaker output signal, NoiseFloor is the estimated noise level, E[e] is the short-time average amplitude of the residual signal, and α can take the value 2 in this embodiment. The short-time average amplitude may be computed as the average of the absolute values of the samples in one frame of signal.
The NLP processing in this scheme can adopt a center clipping method to suppress the residual echo, as shown in the schematic of fig. 5. Its action can be represented by the following formula, applied when E[e] > NLPfloor:
In the formula, e and e' are the residual echoes before and after the NLP module, and E[e] is the short-time average amplitude. NLPfloor is the decision level, whose value must be chosen carefully: too small and the residual echo cannot be effectively suppressed; too large and the near-end sound quality is seriously affected.
In addition, when E[e] ≤ NLPfloor, e' may be replaced with comfort noise. The reason for replacing e' with comfort noise rather than setting it directly to zero is that zeroing introduces audible noise when the NLP switches on and off, giving the illusion of half-duplex operation. Comfort noise may be generated with a simulated Gaussian random signal.
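The NLP switch and the pass/comfort-noise decision can be sketched as follows. The center-clipping formula itself is not reproduced in this text, so the sketch only implements the switching logic that is described: pass the residual when its short-time amplitude exceeds NLPfloor, otherwise substitute low-level Gaussian comfort noise. All thresholds are illustrative.

```python
import numpy as np

NLP_FLOOR = 0.02  # NLPfloor decision level (illustrative)
ALPHA = 2.0       # speaker-to-residual ratio threshold (value 2 from the text)

def nlp(e: np.ndarray, spk_avg: float, noise_floor: float = 0.01,
        rng=np.random.default_rng(0)) -> np.ndarray:
    """Sketch of the NLP switch plus suppressor.  NLP runs only when the
    speaker is voiced AND dominates the residual by factor ALPHA; otherwise
    the (near-end) residual passes through undistorted."""
    E_e = float(np.mean(np.abs(e)))  # short-time average amplitude E[e]
    if not (spk_avg > noise_floor and spk_avg / max(E_e, 1e-12) > ALPHA):
        return e                      # NLP off: transmit near-end signal as-is
    if E_e <= NLP_FLOOR:
        # replace with comfort noise instead of hard zeroing
        return 0.001 * rng.standard_normal(len(e))
    return e                          # above the decision level: keep residual
```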
The following describes a method for performing echo cancellation of a microphone by using a frequency-domain adaptive filtering method according to the present invention.
First, some basic concepts used below are explained: the frequency-domain filter coefficient W(k) has length 2M, the corresponding time-domain adaptive filter coefficient has length M, and the overlap-save method is used.
On the basis of the echo cancellation method in the background art, the invention proposes adding a frame length adjustment step for adjusting the length of the data frame. First, this step is explained in detail: in the present invention, the length is adjusted to any positive integer L smaller than M. For example: the frequency-domain filter coefficient length is 2M and the corresponding time-domain filter coefficient length is M, and the length of each new incoming data frame can be adjusted to half the time-domain filter coefficient length, i.e., L = M/2 (M is an even number). Compared with the background art, the original two-frame combination of the input signal then becomes a four-frame combination. Through this improvement, on one hand, the adaptive filter remains long enough to meet the delay requirement of the feedback loop; on the other hand, the update frequency of the adaptive filter coefficients is also taken care of.
In the above example L = M/2; in actual use it may also be M/4, M/3, M/8, or the like, making the coefficient update frequency of the adaptive filter even higher; correspondingly, only the length of the data intercepted by the residual echo interception step needs to be changed. In practice, the frame length L may be any number less than M, for example: if M is 1024, then L can be 1000, 900, 650, or any other value less than 1024. However, this comes at the cost of increased algorithm complexity. Note that after one adjustment the frame length remains fixed until all data are processed, rather than being readjusted every time a frame of data is collected. Finally, because the data amount of each frame is L, a residual echo interception step is added when the residual echo is output, intercepting the first L data of the obtained residual echo and outputting them as the final result.
The microphone echo cancellation method using the frequency-domain adaptive filtering method, with the frame length adjustment step and the residual echo interception step added, is described completely below, taking M = 1024 as an example.
1) A frame length adjusting step: the frame length is adjusted to a positive integer value L smaller than M; in this embodiment, let L = 800.
2) Collect one frame of the k-th far-end sound signal to be output to the loudspeaker, with frame length L = 800.
3) Combine the 800 data of the current frame with the previous 2M − L = 2048 − 800 = 1248 data to form a large frame of length 2M. As shown in fig. 4, the 800 data of the newly acquired current frame and the previous 1248 data form a large frame of length 2048, where:
u(800k − 1248) is the 1248th data before the original k-th frame,
u(800k − 2) is the 2nd data before the original k-th frame,
u(800k − 1) is the data immediately before the original k-th frame,
u(800k) is the 1st data in the original k-th frame,
u(800k + 799) is the 800th data in the original k-th frame.
At the start, after the first and second frames are collected, the system waits for the third frame to arrive and then combines it with the last 448 data of the first frame and the 800 data of the second frame to form a large frame of length 448 + 800 + 800 = 2048, on which one adaptive filtering pass is performed. Thereafter, every time a new frame of data arrives, data combination is carried out and one adaptive filtering pass is performed.
4) Perform FFT processing on the combined large frame to convert it into the frequency domain, obtaining U(k):
5) Filter U(k) with the current filter coefficient W(k) using the overlap-save method: multiply them in the frequency domain, perform IFFT processing on the result, and take the last M data of the result (i.e., the last 1024 data), recorded as y(k). Namely:
6) After the far-end sound signal is played by the loudspeaker, it passes through the acoustic loop between the loudspeaker and the microphone, and the echo signal of length M collected by the microphone is denoted d(k), i.e.:
7) Subtract the y(k) obtained in step 5) from the above-mentioned d(k) to obtain the error signal e(k):
8) Prepend M zeros to the above length-M, not-yet-intercepted e(k), and perform FFT processing to obtain E(k):
Simultaneously conjugate the U(k) of step 4) to obtain UH(k); then perform point-wise multiplication with E(k), perform an IFFT on the result, and obtain the following according to the overlap-save method:
In the above formula, the latter half of the result is discarded: only the first M points of the IFFT result are kept, giving φ(k);
9) Append M zeros to φ(k) and perform FFT processing; multiply the result by the adaptive step size μ and add the product to the filter coefficient W(k), obtaining the updated filter coefficient in frequency-domain form: <math> <mrow> <mi>W</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <mi>W</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>+</mo> <mi>μFFT</mi> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mover> <mi>φ</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> </mtable> </mfenced> <mo>,</mo> </mrow> </math> the next adaptive filtering pass uses the updated filter coefficient W(k+1) as the current W(k);
10) Return to step 2); when no further far-end sound signal is input, the whole process ends.
In the above embodiment L = 800; in practice L may be any other integer value smaller than M, such as 600 or 500. In addition, L can be M/n, i.e., 1024/n, where n is an integer greater than 1 and 1024/n is also an integer. For example, with L = 1024/2 = 512, only 4 data frames need to be combined to obtain a large frame of length 2048. In this case the filter coefficients are updated once every 1024/2 = 512 data, so the convergence rate of the filter coefficients is increased and efficiency is improved.
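The complete loop of steps 1)-10) can be sketched as follows. This is a sketch under stated assumptions, not the patent's exact implementation: it uses numpy's FFT conventions, all names are illustrative, and per-bin power normalization is added to the gradient for numerical stability (an NLMS-style choice; the text calls the algorithm NLMS but does not spell out its normalization).

```python
import numpy as np

def fdaf_echo_cancel(u_frames, d_frames, M, L, mu=0.5):
    """Overlap-save frequency-domain adaptive filtering with new-frame
    length L < M (steps 2-10 of the text).  u_frames: far-end frames of
    length L; d_frames: microphone echo frames of length M.  Returns the
    residual frames (the first L samples of each length-M error vector)."""
    N = 2 * M
    W = np.zeros(N, dtype=complex)   # frequency-domain coefficients W(k)
    buf = np.zeros(N)                # FIFO of the most recent 2M far-end samples
    out = []
    for u_new, d in zip(u_frames, d_frames):
        buf = np.concatenate([buf[L:], u_new])              # step 3): merge to 2M
        U = np.fft.fft(buf)                                 # step 4)
        y = np.fft.ifft(U * W).real[M:]                     # step 5): last M samples
        e = d - y                                           # step 7): error signal
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))    # step 8): prepend M zeros
        grad = np.conj(U) * E / (np.abs(U) ** 2 + 1e-6)     # U^H(k)*E(k), normalized
        phi = np.fft.ifft(grad).real[:M]                    # keep only first M points
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9)
        out.append(e[:L])                                   # residual interception
    return out
```

A quick way to exercise the sketch is to synthesize an echo with a short FIR "room" response and check that the residual shrinks as the coefficients converge.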
Before the step 1), the method may further include an active sound detection step and a filtering control step, and the active sound detection step and the filtering control step are used for integrally controlling the operation of the filter, and the method includes:
a sound detection step, detecting sound conditions of the microphone input end and the loudspeaker output end;
a filtering control step of controlling the operation of the filter according to the result of the voiced sound detection step, specifically:
if the detection result of the microphone input end is silent, then the self-adaptive filtering is not carried out, the coefficient updating is not carried out, and the output is directly made <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math> Completing the frame processing;
if the detection result of the microphone input end is voiced, then the detection result of the loudspeaker output end is examined; if the detection result of the loudspeaker output end is unvoiced, then the adaptive filtering is normally performed but the coefficient updating is not performed, and the output is <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>y</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math> Completing the frame processing;
if the detection results of the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in a normal working state, and not only is adaptive filtering carried out, but also coefficient updating is carried out, and the adaptive filter is output <math> <mrow> <mover> <mi>e</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>d</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>y</mi> <mo>→</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math> This frame processing is completed.
Wherein d(k) is the echo received by the microphone, y(k) is the adaptive filter's predicted value of d(k), and e(k) is the residual echo.
The sound detection judges whether sound is present by comparing the short-time average amplitude of the sound signals at the microphone input end and the loudspeaker output end with the noise level, specifically:
if MicSignal _ avg is larger than NoiseFloor, judging that the microphone line is voiced, otherwise, judging that the microphone line is unvoiced;
Here MicSignal_avg = (1/M) Σ_{n=0..M−1} |d_k(n)| is the short-time average amplitude of the microphone input signal d(k), i.e. the received echo signal; M is the length of one frame of the speech signal; NoiseFloor is the estimated noise level.
if the SpkSignal _ avg is larger than the NoiseFloor, judging that the loudspeaker line is voiced, otherwise, judging that the loudspeaker line is unvoiced;
Here SpkSignal_avg = (1/L) Σ_{n=0..L−1} |u_k(n)| is the short-time average amplitude of the loudspeaker output signal u(k), and L is the length of one frame of the speech signal.
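The short-time-average-amplitude decision above amounts to a one-line check; a sketch, with illustrative names:

```python
import numpy as np

def is_voiced(frame, noise_floor):
    """Short-time average-amplitude VAD, as described above.

    frame is one frame of samples (length M for the microphone path,
    L for the loudspeaker path); noise_floor is the estimated noise
    level. Returns True when the frame is judged voiced.
    """
    avg = np.mean(np.abs(frame))  # (1/M) * sum of |samples|
    return bool(avg > noise_floor)
```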
The method further comprises a step-size adjusting step: when the coefficient-update step size of the adaptive filter is detected to be larger than a set maximum step-size threshold, the step size is reduced, for example by a fixed proportion.
When the update step size of the adaptive filter coefficients is detected to have returned to normal, the step size is restored to its initial value.
In addition, a coefficient adjusting step is included: when a coefficient of the adaptive filter is detected to be larger than a set coefficient threshold, that coefficient is reduced, effectively preventing the filter coefficients from diverging.
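A possible sketch of the step-size and coefficient safeguards described above; the shrink proportion (0.5) is an assumed example, and the restore-to-initial policy follows the text:

```python
import numpy as np

def clamp_step_size(mu, mu_max, mu_init, shrink=0.5):
    """Reduce mu by a fixed proportion while it exceeds the maximum
    threshold; once back in the normal range, restore the initial value."""
    if mu > mu_max:
        return mu * shrink
    return mu_init

def clamp_coefficients(W, coeff_max, shrink=0.5):
    """Scale down any filter coefficient whose magnitude exceeds the
    threshold, to keep the adaptation from diverging."""
    W = W.copy()
    mask = np.abs(W) > coeff_max
    W[mask] *= shrink
    return W
```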
The method further comprises a nonlinear processing step: first, the short-time average amplitude E(e) of the minimized residual signal is calculated; then it is judged whether E(e) is larger than a preset nonlinear processing threshold NLPfloor, and if so, the minimized residual noise e'(n) is calculated with the following formula:
Here e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
If E (e) is ≦ NLPfloor, e' is directly replaced with comfort noise.
The step 7) may be followed by a nonlinear processing switch control step, specifically: detecting the sound condition of the output end of the loudspeaker; and turning on or off the nonlinear processing step according to the detection result.
The turning on or off is specifically: when the loudspeaker output end is detected to be voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal is more than α times larger than the residual signal, i.e. SpkSignal_avg / E(e) > α (for example α = 6), the nonlinear processing module is turned on;
if either condition is not met, the NLP processing is turned off; here SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of the residual signal.
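The two-condition switch can be sketched as follows (names illustrative; α = 6 is the example value given above):

```python
def nlp_enabled(spk_avg, noise_floor, resid_avg, alpha=6.0):
    """Switch for the nonlinear processing (NLP) step.

    Enabled only when the loudspeaker end is voiced AND the
    loudspeaker signal exceeds the residual by a factor alpha."""
    voiced = spk_avg > noise_floor
    dominant = resid_avg > 0 and spk_avg / resid_avg > alpha
    return voiced and dominant
```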
With the technical scheme of the invention, the frequency-domain filter works efficiently and stably; the specific performance indexes obtained in experiments are:
echo compression: 50-60 dB;
convergence time: less than 1 s;
supported feedback-loop delay: adjustable; for example, at an 8 kHz sampling rate with a filter length of 1024, a delay of 128 ms can be supported.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and the like that are within the spirit and principle of the present invention are included in the present invention.
Claims (22)
1. A microphone echo cancellation device for canceling echo generated by an acoustic loop between a speaker and a microphone, comprising:
a data acquisition and combination module, for acquiring the sound signal u coming from the far end that is to be output to the loudspeaker, the length of each acquired data frame being M; the current frame, i.e. the k-th frame of data, is recorded as u(k) and is combined with the previous frame, i.e. the (k−1)-th frame of data u(k−1), to jointly form a large frame of length 2M;
a frequency-domain adaptive filter, whose current frequency-domain filter coefficients are denoted W(k), of length 2M, where w(k) is the filter's time-domain coefficient vector of length M; the frequency-domain adaptive filter performs FFT processing on the combined large frame, converting it to the frequency domain to obtain U(k); it filters U(k) with the current filter coefficient W(k) and then performs IFFT processing on the filtering result to obtain a one-frame prediction value y(k) of the echo, taking the last M points of the result;
a subtractor, for subtracting the predicted value y(k) from the echo d(k) of length M collected by the microphone, obtaining a residual echo e(k);
the frequency-domain adaptive filter further comprises a speech correlation detection unit, for calculating in the frequency domain the correlation between the residual echo e(k) and the far-end sound signal, obtaining a speech correlation parameter taken as the first M points of the IFFT of the product U^H(k)·E(k), where U^H(k) is the conjugate of U(k);
the frequency domain adaptive filter further comprises a coefficient updating unit, which is used for updating the coefficient W (k) of the frequency domain adaptive filter according to the voice correlation and by combining the adaptive step size mu of the adaptive filter to obtain the coefficient
The coefficient W (k) of the frequency domain adaptive filter is updated once each time the frequency domain adaptive filter performs adaptive filtering, and the adaptive filter performs frequency domain filtering on next combined big frame data by using the updated coefficient W (k +1) when performing the next adaptive filtering;
the device is characterized by also comprising a frame length adjusting module, a frame length adjusting module and a frame length adjusting module, wherein the frame length adjusting module is used for setting the data frame length of the u to be a value L smaller than M;
correspondingly, the data acquisition and combination module is used for combining L data of the current kth frame data and the immediately preceding 2M-L continuous data to form a large frame with the length of 2M;
accordingly, the frequency-domain adaptive filter adaptively filters the 2M large frame, and after the filtering of each frame of data of length L is finished, the frequency-domain filter coefficient is updated;
and correspondingly, the device further comprises a residual echo intercepting module, for intercepting the first L signals of each frame result of the residual echo, obtaining the final residual echo e.
2. The echo cancellation device according to claim 1, wherein the frame length adjusting module adjusts the frame length from M to L = M/n, where n is an integer greater than 1; correspondingly, the data acquisition and combination module combines the current frame of u and the immediately preceding 2n−1 data frames into a large frame of length 2M.
3. The echo cancellation device according to claim 1 or 2, further comprising a sound detection module and a filtering control module,
the sound detection module comprises two sound detection units which are respectively used for detecting sound conditions of the microphone input end and the loudspeaker output end and outputting the detection results to the filtering control module;
the filtering control module is used for controlling the work of the frequency domain self-adaptive filter according to the output result of the sound detection module,
if the sound detection result at the microphone input end is silence, then neither adaptive filtering nor coefficient updating is carried out, and the output is directly set to e(k) = d(k), completing the frame processing;
if the detection result at the microphone input end is voiced, the detection result at the loudspeaker output end is then examined; if the loudspeaker output end is detected to be silent, adaptive filtering is carried out normally but the coefficients are not updated, and the output is e(k) = d(k) − y(k), completing the frame processing;
if the detection results at both the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in its normal working state, i.e. adaptive filtering is carried out and the coefficients are updated, obtaining the output e(k) = d(k) − y(k) and the updated filter coefficient W(k+1), completing the frame processing.
4. The echo cancellation device according to claim 3, wherein the sound detection module determines whether there is sound by comparing the short-time average amplitude of the sound signals at the microphone input and the speaker output with a noise level, and specifically:
if MicSignal _ avg is larger than NoiseFloor, judging that the microphone line is voiced, otherwise, judging that the microphone line is unvoiced;
here MicSignal_avg is the short-time average amplitude of the microphone input signal d(k), the sound signal of frame length M collected by the microphone; M is the frame length; NoiseFloor is the estimated noise level;
if the SpkSignal _ avg is larger than the NoiseFloor, judging that the loudspeaker line is voiced, otherwise, judging that the loudspeaker line is unvoiced;
here SpkSignal_avg is the short-time average amplitude of the signal u(k) output to the loudspeaker, and L is the frame length.
5. The echo cancellation device of claim 1 or 2, further comprising a step size adjustment module configured to detect a coefficient update step size μ of the adaptive filter, and to decrease the value of μ when μ is greater than a set maximum coefficient update step size threshold.
6. The echo cancellation device according to claim 5, wherein the coefficient update step is restored to the initial value when it is detected that the update step of the adaptive filter coefficients is restored to normal.
7. The echo cancellation device according to claim 1 or 2, further comprising a coefficient adjustment module configured to decrease the filter coefficient w (k) when detecting that the coefficient w (k) of the adaptive filter is greater than a set coefficient threshold.
8. The echo cancellation device according to claim 1 or 2, further comprising: and the nonlinear processing module is used for suppressing nonlinear components in the echo.
9. The echo cancellation device of claim 8, wherein when E(e) > NLPfloor the nonlinear processing module calculates its output with the following formula:
Here e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
10. The echo cancellation device of claim 8, wherein e' is directly replaced with comfort noise when E(e) ≤ NLPfloor.
11. The echo cancellation device according to claim 8, further comprising:
the loudspeaker sound detection module is used for detecting the sound condition of the output end of the loudspeaker;
the nonlinear processing control module is used for turning on or off the nonlinear processing module according to the output result of the loudspeaker sound detection module;
when the loudspeaker sound detection module detects that the loudspeaker output end is voiced, i.e. SpkSignal_avg > NoiseFloor,
and the loudspeaker output signal is more than α times larger than the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either condition is not met, the NLP processing is turned off;
wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, and E(e) is the short-time average amplitude of e.
12. A microphone echo cancellation method, using frequency-domain adaptive filtering to cancel the echo d generated when the far-end sound signal u passes through the acoustic loop between the loudspeaker and the microphone, finally obtaining a residual echo e; the time-domain filter coefficient is w(k), of length M, and the corresponding frequency-domain filter coefficient W(k) has length 2M; an overlap-save method is adopted;
it is characterized in that the preparation method is characterized in that,
1) setting the data frame length L of the signal u acquired each time;
3) merging the current frame u(k) with the preceding 2M−L data into a large frame of length 2M;
4) converting the large frame to the frequency domain by overlap-save, filtering it with the frequency-domain filter coefficient W(k), and converting the result back to the time domain to obtain the echo time-domain prediction value y(k);
6) updating the filter coefficient W(k) according to the residual echo and the far-end signal, obtaining W(k+1);
7) returning to step 2): acquiring the next frame of signals, merging them, and performing frequency-domain adaptive filtering with the updated filter coefficient, until the data input is finished.
13. The method of claim 12,
the frequency domain adaptive filtering algorithm comprises the following steps:
1) frame length adjustment, namely adjusting the frame length of u from M to a positive integer value L smaller than M;
3) combining the L data of the current frame u(k) with the immediately preceding 2M−L data into a large frame of length 2M, where:
u(kL−2M+L) is the (2M−L)-th data before the original k-th frame,
u(kL−2) is the 2nd data before the original k-th frame,
u(kL−1) is the data immediately before the original k-th frame,
u(kL) is the 1st data in the original k-th frame,
u(kL+L−1) is the L-th data in the original k-th frame;
4) performing FFT processing on the large frame and converting it to the frequency domain, obtaining U(k):
5) filtering U(k) with the current filter coefficient W(k) by the overlap-save method, i.e. multiplying in the frequency domain, then performing IFFT processing and taking the last M data of the result, recorded as y(k), i.e.:
6) after u is played by the loudspeaker and passes through the acoustic loop between the loudspeaker and the microphone, the microphone collects an echo signal of length M, represented by d(k), i.e.:
subtracting the y(k) of step 5) from d(k) gives the error signal e(k):
8) prepending M zeros to the non-intercepted e(k) of length M and performing FFT processing, obtaining:
meanwhile conjugating the U(k) of step 4) to obtain U^H(k), dot-multiplying it with E(k), and performing an IFFT on the result; according to the overlap-save method:
in the above formula, the part corresponding to the next frame must be discarded, keeping only the first M points of the IFFT result;
9) appending M zeros to the above result and performing FFT processing; the result is multiplied by the adaptive step size μ and the product is added to the filter coefficient W(k), giving the updated frequency-domain filter coefficient W(k+1); the next adaptive filtering uses the updated coefficient W(k+1);
10) returning to step 2) until the input of the far-end sound signal ends.
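Steps 3)-9) above describe one iteration of a constrained overlap-save frequency-domain LMS filter. A compact sketch, assuming for simplicity a hop of M samples (i.e. L = M) and using our own illustrative names:

```python
import numpy as np

def fdaf_step(W, u_hist, d, mu):
    """One iteration of the overlap-save frequency-domain adaptive filter.

    W      -- frequency-domain coefficients, length 2M (complex)
    u_hist -- last 2M far-end samples, newest M at the end
    d      -- microphone frame of length M
    mu     -- adaptation step size
    Returns (e, W_next): the residual frame and the updated coefficients.
    """
    M = len(d)
    U = np.fft.fft(u_hist)                              # step 4): to frequency domain
    y = np.real(np.fft.ifft(W * U))[M:]                 # step 5): keep the last M points
    e = d - y                                           # step 6): residual echo
    E = np.fft.fft(np.concatenate([np.zeros(M), e]))    # step 8): prepend M zeros
    phi = np.real(np.fft.ifft(np.conj(U) * E))[:M]      # keep only the first M points
    grad = np.fft.fft(np.concatenate([phi, np.zeros(M)]))  # step 9): append M zeros
    W_next = W + mu * grad
    return e, W_next
```

With μ = 0 and W initialized to the FFT of the zero-padded true echo path, the residual e comes out numerically zero, which is a quick way to sanity-check the overlap-save bookkeeping.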
14. The method of claim 12 or 13, wherein the value of L is M/n, and n is an integer greater than 1.
15. The method according to claim 12 or 13, characterized by further comprising a step of detecting presence of sound and a step of controlling filtering before the step 1), comprising:
a sound detection step, detecting sound conditions of the microphone input end and the loudspeaker output end;
a filtering control step of controlling the operation of a filter according to the result of the voiced sound detection step;
the method specifically comprises the following steps:
if the detection result at the microphone input end is silence, then neither adaptive filtering nor coefficient updating is carried out, and the output is directly set to e(k) = d(k), completing the frame processing;
if the detection result at the microphone input end is voiced, the detection result at the loudspeaker output end is then examined; if the loudspeaker output end is detected to be silent, adaptive filtering is carried out normally but the coefficients are not updated, and the output is e(k) = d(k) − y(k), completing the frame processing;
if the detection results at both the microphone input end and the loudspeaker output end are voiced, the adaptive filter is in its normal working state: adaptive filtering is carried out and the coefficients are updated, and the output is e(k) = d(k) − y(k), completing the frame processing.
16. The method of claim 15, wherein the sound detection is performed by comparing the short-time average amplitude of the sound signals at the microphone input and the speaker output with a noise level to determine whether sound is present, specifically:
if MicSignal _ avg is larger than NoiseFloor, judging that the microphone line is voiced, otherwise, judging that the microphone line is unvoiced;
here MicSignal_avg is the short-time average amplitude of the microphone input signal d(k), i.e. the received echo signal; M is the length of one frame of the speech signal; NoiseFloor is the estimated noise level;
if the SpkSignal _ avg is larger than the NoiseFloor, judging that the loudspeaker line is voiced, otherwise, judging that the loudspeaker line is unvoiced;
17. The method according to claim 12 or 13, further comprising a step size adjustment step of decreasing the coefficient update step size of the adaptive filter when it is detected that the coefficient update step size of the adaptive filter is greater than a set maximum coefficient update step size threshold.
18. The method of claim 17, wherein the coefficient update step size is restored to the initial value upon detecting that the update step size of the adaptive filter coefficients is restored to normal.
19. The method according to claim 12 or 13, further comprising a coefficient adjusting step for reducing the coefficients of the filter when it is detected that the coefficients of the adaptive filter are greater than a set coefficient threshold.
20. The method according to claim 12 or 13, further comprising a non-linear processing step of:
firstly, calculating the short-time average amplitude E (e) of the minimized residual signal;
then, whether E (e) is larger than a preset nonlinear processing threshold NLPfloor is judged, and if yes, the minimized residual noise e' (n) is calculated by using the following formula:
here e is the residual signal and the input of the nonlinear processing module, e' is the module's output, E(e) is the short-time average amplitude of the residual signal, and NLPfloor is the decision level.
21. The method of claim 20, wherein if E(e) ≤ NLPfloor, e' is directly replaced with comfort noise.
22. The method according to claim 20, further comprising a nonlinear-processing switch control step, specifically:
detecting the sound condition of the output end of the loudspeaker;
turning on or off the nonlinear processing step according to the detection result, specifically:
when the loudspeaker output end is detected to be voiced, i.e. SpkSignal_avg > NoiseFloor, and the loudspeaker output signal is more than α times larger than the residual signal, i.e. SpkSignal_avg / E(e) > α, the nonlinear processing module is turned on;
if either condition is not met, the NLP processing is turned off; wherein SpkSignal_avg is the short-time average amplitude of the loudspeaker output signal, NoiseFloor is the estimated noise level, E(e) is the short-time average amplitude of the residual signal, and α is a preset multiple.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006101440555A CN100524466C (en) | 2006-11-24 | 2006-11-24 | Echo elimination device for microphone and method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1953060A true CN1953060A (en) | 2007-04-25 |
CN100524466C CN100524466C (en) | 2009-08-05 |
Family
ID=38059354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2006101440555A Expired - Fee Related CN100524466C (en) | 2006-11-24 | 2006-11-24 | Echo elimination device for microphone and method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100524466C (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101192411B (en) * | 2007-12-27 | 2010-06-02 | 北京中星微电子有限公司 | Large distance microphone array noise cancellation method and noise cancellation system |
CN101771925A (en) * | 2008-12-30 | 2010-07-07 | Gn瑞声达A/S | Hearing instrument with improved initialisation of parameters of digital feedback suppression circuitry |
CN101888455A (en) * | 2010-04-09 | 2010-11-17 | 熔点网讯(北京)科技有限公司 | Self-adaptive echo counteracting method for frequency domain |
CN102131014A (en) * | 2010-01-13 | 2011-07-20 | 歌尔声学股份有限公司 | Device and method for eliminating echo by combining time domain and frequency domain |
CN102204276A (en) * | 2008-11-05 | 2011-09-28 | 雅马哈株式会社 | Sound emission and collection device, and sound emission and collection method |
CN101217039B (en) * | 2008-01-08 | 2011-11-23 | 北京中星微电子有限公司 | A method, system and device for echo elimination |
CN102387272A (en) * | 2011-09-09 | 2012-03-21 | 南京大学 | Restraining method for residual echo in echo cancellation system |
CN102413384A (en) * | 2011-11-16 | 2012-04-11 | 杭州艾力特音频技术有限公司 | Echo cancellation two-way voice talk back equipment |
CN102956236A (en) * | 2011-08-15 | 2013-03-06 | 索尼公司 | Information processing device, information processing method and program |
CN103366757A (en) * | 2012-04-09 | 2013-10-23 | 广达电脑股份有限公司 | Communication system and method with echo cancellation mechanism |
CN106067301A (en) * | 2016-05-26 | 2016-11-02 | 浪潮(苏州)金融技术服务有限公司 | A kind of method using multidimensional technology to carry out echo noise reduction |
CN106664481A (en) * | 2014-03-19 | 2017-05-10 | 思睿逻辑国际半导体有限公司 | Non-linear control of loudspeakers |
CN106716527A (en) * | 2014-07-31 | 2017-05-24 | 皇家Kpn公司 | Noise suppression system and method |
CN106713685A (en) * | 2016-11-25 | 2017-05-24 | 东莞市嘉松电子科技有限公司 | Hands-free communication control method |
CN106910500A (en) * | 2016-12-23 | 2017-06-30 | 北京第九实验室科技有限公司 | The method and apparatus of Voice command is carried out to the equipment with microphone array |
CN107017004A (en) * | 2017-05-24 | 2017-08-04 | 建荣半导体(深圳)有限公司 | Noise suppressing method, audio processing chip, processing module and bluetooth equipment |
CN107071197A (en) * | 2017-05-16 | 2017-08-18 | 中山大学花都产业科技研究院 | A kind of echo removing method and system based on the piecemeal frequency domain of delay more than all phase |
CN107123430A (en) * | 2017-04-12 | 2017-09-01 | 广州视源电子科技股份有限公司 | Echo cancellation method, device, conference tablet and computer storage medium |
CN107393546A (en) * | 2017-09-04 | 2017-11-24 | 恒玄科技(上海)有限公司 | A kind of echo cancel method and speech recognition apparatus for speech recognition process |
CN108986836A (en) * | 2018-08-29 | 2018-12-11 | 质音通讯科技(深圳)有限公司 | A kind of control method of echo suppressor, device, equipment and storage medium |
CN109102821A (en) * | 2018-09-10 | 2018-12-28 | 苏州思必驰信息科技有限公司 | Delay time estimation method, system, storage medium and electronic equipment |
CN109215672A (en) * | 2017-07-05 | 2019-01-15 | 上海谦问万答吧云计算科技有限公司 | A kind of processing method of acoustic information, device and equipment |
CN109346096A (en) * | 2018-10-18 | 2019-02-15 | 深圳供电局有限公司 | Echo cancellation method and device for voice recognition process |
WO2019128402A1 (en) * | 2017-12-26 | 2019-07-04 | 深圳Tcl新技术有限公司 | Method, system and storage medium for solving echo cancellation failure |
CN110024025A (en) * | 2016-11-23 | 2019-07-16 | 哈曼国际工业有限公司 | Dynamic stability control system based on coherence |
CN110225214A (en) * | 2014-04-02 | 2019-09-10 | 想象技术有限公司 | Control method, attenuation units, system and the medium fed back to sef-adapting filter |
CN110838300A (en) * | 2019-11-18 | 2020-02-25 | 紫光展锐(重庆)科技有限公司 | Echo cancellation processing method and processing system |
CN110913310A (en) * | 2018-09-14 | 2020-03-24 | 成都启英泰伦科技有限公司 | Echo cancellation method for broadcast distortion correction |
CN111091846A (en) * | 2019-12-26 | 2020-05-01 | 江亨湖 | Noise reduction method and echo cancellation system applying same |
CN111341336A (en) * | 2020-03-16 | 2020-06-26 | 北京字节跳动网络技术有限公司 | Echo cancellation method, device, terminal equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN100524466C (en) | 2009-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100524466C (en) | 2009-08-05 | Echo elimination device for microphone and method thereof |
US7003099B1 (en) | | Small array microphone for acoustic echo cancellation and noise suppression |
US6597787B1 (en) | | Echo cancellation device for cancelling echos in a transceiver unit |
US7773759B2 (en) | | Dual microphone noise reduction for headset application |
JP5049277B2 (en) | | Method and system for clear signal acquisition |
US9264807B2 (en) | | Multichannel acoustic echo reduction |
EP3080975B1 (en) | | Echo cancellation |
EP0843934B1 (en) | | Arrangement for suppressing an interfering component of an input signal |
EP1855457A1 (en) | | Multi channel echo compensation using a decorrelation stage |
JP5148150B2 (en) | | Equalization in acoustic signal processing |
US20040264610A1 (en) | | Interference cancelling method and system for multisensor antenna |
EP1081985A2 (en) | | Microphone array processing system for noisy multipath environments |
JPH09504668A (en) | | Variable block size adaptive algorithm for noise-resistant echo canceller |
JP2002501337A (en) | | Method and apparatus for providing comfort noise in a communication system |
US11189297B1 (en) | | Tunable residual echo suppressor |
CN102185991A (en) | | Echo cancellation method, system and device |
US20180308503A1 (en) | | Real-time single-channel speech enhancement in noisy and time-varying environments |
Albu et al. | | The hybrid simplified Kalman filter for adaptive feedback cancellation |
EP3692703A1 (en) | | Echo canceller and method therefor |
JPH09307625A (en) | | Sub band acoustic noise suppression method, circuit and device |
CN107005268B (en) | | Echo cancellation device and echo cancellation method |
EP2930917B1 (en) | | Method and apparatus for updating filter coefficients of an adaptive echo canceller |
Bulling et al. | | Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. |
Yang | | Multilayer adaptation based complex echo cancellation and voice enhancement |
US6507623B1 (en) | | Signal noise reduction by time-domain spectral subtraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20090805; Termination date: 20201124 |