Background art
Echo canceller
An echo canceller may be a piece of equipment or a piece of software; it cancels the echo signal by using a reference signal. The reference signal is also called the received signal (Rx signal). The echo signal, also called the echo return signal, is mixed into the transmitted signal (Tx signal). There are two main types of echo canceller: the acoustic echo canceller (AEC) and the line echo canceller (LEC). As the names indicate, the AEC cancels acoustic echo and the LEC cancels line echo.
Line echo arises from imbalance in the hybrid circuit and from impedance mismatch at the two-wire to four-wire conversion. Acoustic echo appears in telecommunication equipment: on a full-duplex connection, sound from the loudspeaker is picked up by the microphone and sent back to the far end. The problem also arises in teleconferencing, where two or more parties converse over a conference bridge and the noise carried in from the other parties must be removed. In speech communication some echo is acceptable, but users generally do not want to hear their own voice returned, that is, an echo delayed by the round-trip time of the system.
An echo canceller is normally an adaptive filter combined with a least mean squares (LMS) algorithm, which produces an echo replica signal that approximates the echo signal. The echo replica signal is then subtracted from the echo return signal in order to cancel it. For various reasons the replica cannot reproduce the echo return signal exactly, so some residual echo remains in the transmitted signal. An echo suppressor is a device that effectively reduces or removes echo, and it is particularly suited to reducing or removing the residual echo in a signal that an echo canceller has already processed. Because of this inherent limitation of echo cancellers, most solutions rely heavily on an added echo suppressor.
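The adaptive filtering and subtraction described above can be illustrated with a minimal sketch. It assumes a plain sample-by-sample LMS update with a short filter; the function name, tap count and step size are hypothetical choices for illustration and are not taken from the invention.

```python
import random

def lms_echo_canceller(rx, tx, taps=4, mu=0.1):
    """Subtract an adaptively estimated echo replica from the Tx signal.

    rx: reference (received) samples; tx: samples that contain the echo.
    Returns the residual signal e[n] = tx[n] - replica[n].
    """
    w = [0.0] * taps        # adaptive filter coefficients (echo path estimate)
    x = [0.0] * taps        # most recent Rx samples, newest first
    residual = []
    for r, t in zip(rx, tx):
        x = [r] + x[:-1]
        replica = sum(wi * xi for wi, xi in zip(w, x))   # echo replica signal
        e = t - replica                                  # subtraction stage
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]   # LMS coefficient update
        residual.append(e)
    return residual

# Toy signals (assumed): the "echo" is the Rx signal delayed by two samples and halved.
rng = random.Random(0)
rx = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
tx = [0.0, 0.0] + [0.5 * v for v in rx[:-2]]
res = lms_echo_canceller(rx, tx)
echo_energy = sum(v * v for v in tx[-200:])
residual_energy = sum(v * v for v in res[-200:])
```

In this idealized case the echo path is short, fixed and noise-free, so the residual becomes very small. In a real system the echo path is longer and time-varying and near-end speech disturbs the adaptation, which is why residual echo remains.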
Echo suppressor
An echo suppressor may be a piece of equipment or a piece of software; it effectively reduces (residual) echo energy without noticeably distorting the non-echo speech signal. Although an echo suppressor can work independently of an echo canceller, it is normally used in combination with one. An echo suppressor can not only suppress residual echo energy effectively but can also reduce background noise energy by means of some useful parameters. Echo suppression may be regarded either as an independent function or as merely one part of an echo cancellation system.
Existing echo suppression methods are mostly implemented in one of the following ways:
Spectral echo suppression is one of the most commonly used methods. It is fairly simple: the time-domain signal is converted to the frequency domain by an FFT. In practice, all parameters must be tuned carefully before the modified spectrum is converted back to the time domain, in order to avoid the "musical noise" that may otherwise appear.
Another commonly used method is the nonlinear processor (NLP), which replaces the residual echo signal with random noise, or cuts off the return transmission of the signal when only one party is speaking. The NLP method is simple but requires fairly accurate detection of double talk and of residual echo. Because an accurate double-talk detector or a reliable residual echo detector is difficult to obtain, this method usually produces "discontinuities" in noisy environments.
Summary of the invention
The object of the invention is to overcome the shortcomings of the existing echo suppression methods described above by providing an efficient echo suppressor that is easy to implement, that suppresses both residual echo and background noise, that works even in double-talk conditions, that introduces no "discontinuity" in practical use, and that produces no audible speech distortion.
The efficient echo suppressor of the invention comprises two adaptive gain factors G_r(RSR) and G_n(NSR), an adaptive zero filter unit A_1(z) and an adaptive pole filter unit A_2(z), and is characterized in that:
The gain factor G_r(RSR) is controlled by RSR;
The gain factor G_n(NSR) is controlled by NSR;
The filter unit A_1(z) is converted from LSF_1, where LSF_1 is a first modification of the initial set LSF_Tx;
The filter unit A_2(z) is converted from LSF_2, where LSF_2 is a second modification of the initial set LSF_Tx.
RSR is the ratio of the residual echo level to the signal level. The signal level is the Tx signal level of the current frame or subframe. The residual echo level is calculated as the product of RRR and the current received signal (Rx) level.
RRR is the average ratio of the residual echo level to the received signal (Rx) level.
NSR is the ratio of the background noise level to the current signal (Tx) level.
LSF denotes the line spectral frequencies.
The initial set LSF_Tx, before modification, is obtained by LPC analysis of the Tx signal.
The first and second modifications of the initial set LSF_Tx are controlled by RSR, NSR, LSF_echo and LSF_nois.
LSF_echo can be estimated from the relation between LSF_Rx and LSF_Tx in the pure residual echo regions of the Tx signal, where LSF_Rx is obtained by LPC analysis of the Rx signal.
The background noise set LSF_nois is obtained by LPC analysis of the background noise regions of the Tx signal.
The invention has the advantages that it suppresses both residual echo and background noise, works even in double-talk conditions, and introduces no "discontinuity" in practical use, so that no obvious speech distortion is audible. It is an efficient echo suppressor that is easy to implement.
Embodiments
The invention is further described below with reference to the accompanying drawings:
The following describes information specific to this echo suppressor. Those skilled in the art will understand, however, that the invention can be combined with different algorithms and applied in a variety of settings. Details that are common knowledge to practitioners in the field are not discussed here, so as not to obscure the focus of the invention.
The drawings and the accompanying description present only some specific example applications of the invention. For brevity, other examples that apply the same principles are not illustrated or described in detail here.
Fig. 1 shows an example of an acoustic echo canceller (AEC) system 112. The echo signal 111 travels from the loudspeaker 110 back to the microphone 101. The signal sent to the loudspeaker is called the received signal 109; it is the reference signal of the acoustic echo canceller 103 and the echo suppressor 105. The signal that leaves the microphone (MIC) 101 and enters the AEC is called the transmitted signal 102 (Tx signal); it contains the echo return signal originating from the loudspeaker. The AEC 103 and the echo suppressor 105 are responsible for cancelling or suppressing the echo signal. The AEC uses the reference signal Rx 109 to produce a replica signal that approximates the echo return signal, and thereby cancels or suppresses the echo return signal. The residual echo signal 104 is further suppressed by a post-processing module 105, which may be an echo suppressor or a nonlinear processor (NLP).
Fig. 2 shows an example of a line echo canceller (LEC) system 209. This is a typical example in which the echo signal 211 is reflected back from the telephone hybrid 210; the transmitted signal Tx1 201 contains the echo return signal, and this echo return signal 211 is then cancelled or suppressed by the echo canceller 202 and the echo suppressor 204 of the LEC system 209. The LEC operates on a principle similar to that of the AEC. One of the main differences is the echo path; the range of echo delays may also differ. The echo canceller uses the reference signal Rx 208 to produce a replica signal that approximates the echo return signal, and thereby cancels or suppresses the echo return signal. The residual echo signal Tx2 203 is further suppressed by a post-processing module 204, which may be an echo suppressor or a nonlinear processor (NLP).
Fig. 3 is a schematic diagram of the signal spectral envelope in a double-talk example, in which the residual echo signal is mixed with the non-echo speech signal. 302 shows the spectral envelope of the mixed signal; 301 is the spectral envelope the speech signal would have without echo. In most cases the residual echo formants 303 are smaller than the speech formants 304. Residual echo that is mixed with speech in a double-talk region is harder to suppress than residual echo on its own, because the residual echo must be suppressed while the speech signal is kept free of distortion.
Figs. 4 to 6 show three example signals: Fig. 4 shows the Rx signal (reference signal) 401, Fig. 5 shows the Tx signal 402 before it enters the (residual) echo suppressor (202 in Fig. 2), and Fig. 6 shows the Tx signal 406 output by the (residual) echo suppressor (204 in Fig. 2). In Fig. 5, 403 is a double-talk region; an effective (residual) echo suppressor operating normally removes both the residual echo signal 404 and the background noise 405.
The invention proposes an echo suppression method that is efficient and easy to implement. It is realized by gain-controlled filtering and can be described by the following filtering model:
H(z) = G_r(RSR) · G_n(NSR) · A_1(z) / A_2(z)    (1)
In this model, G_n(·) is a gain that is a function of NSR (or SNR); NSR is defined as the ratio of the background noise level to the signal level. The value of NSR can be obtained by measuring the Tx signal before the echo suppressor, combined with VAD (voice activity detection) information. G_r(·) is also a gain, and is a function of RSR; RSR is defined as the ratio of the residual echo level to the signal level in the Tx signal. Estimating RSR is more involved and is explained in detail later. A_1(z) and A_2(z) are linear predictors composed of LPC coefficients, which are converted from LSF_1 and LSF_2 respectively, where LSF denotes the line spectral frequencies. LPC coefficients and LSFs are well-known parameters in speech signal processing and are often used to represent the spectral envelope. LSF_1 is obtained by a first modification of LSF_Tx, where LSF_Tx is calculated by LPC analysis of the Tx signal; LSF_2 is obtained by a second modification of LSF_Tx. The modification of the LSFs is controlled by the parameters NSR and RSR and by another set, LSF_Rx, which is calculated by LPC analysis of the Rx signal.
In equation (1) the gain does not normally drop to 0, but in pure echo regions it is small enough that the pure echo becomes inaudible. The main contribution of the gain is to reduce echo or noise energy significantly in non-double-talk regions. In normal speech regions or double-talk regions the gain factor is usually less than 1 and depends on the parameters NSR and RSR. Because NSR and RSR vary smoothly and slowly, and the gain factors vary accordingly, "discontinuities" are avoided.
The LPC filter units A_1(z) and A_2(z) in equation (1) mainly serve to suppress the residual echo formants in double-talk regions (see Fig. 3), or to reduce the noise or the spectral amplitude in low-SNR speech regions. Because the parameters of A_1(z) and A_2(z) vary smoothly and slowly, "discontinuities" are avoided and no obvious speech distortion is audible.
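As a rough illustration of gain-controlled filtering, the sketch below applies a combined gain and a pole-zero filter to one frame. It assumes that the model has the form gain · A_1(z)/A_2(z), with A_1(z) acting as the zero (numerator) filter and A_2(z) as the pole (denominator) filter; that form, the function name and the frame-local state are assumptions made for illustration.

```python
def pole_zero_filter(x, a1, a2, gain):
    """Apply y = gain * A1(z) / A2(z) to the samples in x.

    a1, a2: coefficient lists [1, c1, ..., cM] of A1(z) and A2(z) in powers of z^-1.
    The filter state starts at zero for each call (a simplification).
    """
    y = []
    for n in range(len(x)):
        # zero (numerator) part, scaled by the combined gain
        acc = gain * sum(a1[k] * x[n - k] for k in range(len(a1)) if n - k >= 0)
        # pole (denominator) part: feedback of past output samples
        acc -= sum(a2[k] * y[n - k] for k in range(1, len(a2)) if n - k >= 0)
        y.append(acc)
    return y

# When A1(z) equals A2(z) the filters cancel and only the gain remains.
g_total = 0.5                      # hypothetical product of G_n(NSR) and G_r(RSR)
out = pole_zero_filter([1.0, 0.0, 0.0, 0.0], [1.0, -0.5], [1.0, -0.5], g_total)
```

A practical implementation would carry the filter memory from frame to frame and interpolate the coefficients, so that the slow parameter changes described above translate into a smooth output.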
The above sets out the basic principle of the invention. The invention is explained in detail below.
Estimation of NSR (or SNR). NSR is defined as the ratio of the background noise level to the current Tx signal level. This parameter can be determined by a commonly used method. The background noise level is the recent average level measured while only background noise is present in the Tx signal. The signal level is the level of the current frame or subframe of the Tx signal. When only background noise is present, NSR is approximately 1, that is, approximately 0 dB in the dB domain. In speech regions, NSR is less than 1.
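One way to obtain such an NSR value is sketched below. It assumes frame energy as the level measure, an exponential moving average for the background noise level, and an external noise-only decision (for example from a VAD); the class name, the smoothing constant and the clamping of NSR to 1 are illustrative assumptions.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

class NsrEstimator:
    """Tracks the background noise level and returns NSR for each Tx frame."""

    def __init__(self, smoothing=0.9, floor=1e-12):
        self.smoothing = smoothing   # hypothetical moving-average constant
        self.floor = floor           # avoids division by zero on silent frames
        self.noise_level = None

    def update(self, frame, is_noise_only):
        level = max(frame_energy(frame), self.floor)
        if is_noise_only:
            if self.noise_level is None:
                self.noise_level = level
            else:
                self.noise_level = (self.smoothing * self.noise_level
                                    + (1.0 - self.smoothing) * level)
        if self.noise_level is None:
            return 0.0               # no noise estimate yet
        return min(1.0, self.noise_level / level)

nsr_est = NsrEstimator()
nsr_noise = nsr_est.update([0.01] * 80, is_noise_only=True)    # noise-only frame
nsr_speech = nsr_est.update([1.0] * 80, is_noise_only=False)   # loud speech frame
```

With these toy frames the noise-only frame yields an NSR close to 1 and the loud frame yields an NSR far below 1, matching the behaviour described above.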
Residual echo detection means detecting most of the regions in which only the residual echo signal and noise are present. This detector need not be accurate, because it is used only to estimate the average loss of echo energy relative to the Rx signal energy. Once the delay between the Rx signal and the echo return signal has been detected, the Rx signal and the residual echo return signal are synchronized in the echo canceller. In the absence of double talk, the energy of the residual echo signal after the primary echo canceller is significantly lower than the energy of the original Rx signal. This information can be used to detect most of the residual echo signal.
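A coarse detector of this kind might look like the following sketch. It assumes that the Rx and Tx levels are already time-aligned and that two fixed thresholds are adequate; the function name and both threshold values are hypothetical.

```python
def is_residual_echo_frame(tx_level, rx_level,
                           rx_active_threshold=1e-4, max_ratio=0.1):
    """Return True when a frame most likely contains only residual echo (and noise).

    tx_level: energy of the Tx frame after the primary echo canceller.
    rx_level: energy of the time-aligned Rx frame.
    """
    if rx_level < rx_active_threshold:
        return False      # far end is silent, so no echo is expected
    # Without double talk the residual is much weaker than the Rx signal.
    return tx_level < max_ratio * rx_level

flag_echo_only = is_residual_echo_frame(0.001, 1.0)    # strong Rx, weak Tx
flag_double_talk = is_residual_echo_frame(0.5, 1.0)    # Tx too strong for echo alone
flag_rx_silent = is_residual_echo_frame(0.0, 0.0)      # nothing played out
```

Because the result only feeds a long-term average, occasional misclassified frames have little effect, which is consistent with the remark that the detector need not be accurate.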
Estimation of RSR. RSR is defined as the ratio of the residual echo level to the current Tx signal level. The signal level is again the level of the current frame or subframe of the Tx signal. Calculating the residual echo level is more involved. If there is no residual echo, the residual echo level is 0. First, the average energy loss (energy reduction) of the residual echo is estimated in regions where only residual echo is present; it is defined as the ratio of the residual echo level to the corresponding received signal level (Rx signal level). Energy levels can be expressed directly or in the dB domain. The average ratio (or moving average) is calculated as follows:
RRR = average of [(residual echo energy level) / (Rx signal energy level)] over pure residual echo regions    (2)
The current residual echo energy level can therefore be estimated as:
Current residual echo energy level = RRR · (current Rx signal energy level)    (3)
With this formula, the current residual echo energy level can be estimated even in double-talk regions. RSR can therefore be calculated as follows:
RSR = (current residual echo energy level) / (current Tx signal energy level)    (4)
According to this formula, RSR is approximately 1 in pure residual echo regions and less than 1 in double-talk regions.
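The RRR and RSR estimates can be sketched as follows. The sketch assumes energy levels on a linear scale, an exponential moving average for RRR, and a pure-residual-echo flag supplied by some detector; the class name, the smoothing constant and the clamping of RSR to 1 are illustrative assumptions.

```python
class RsrEstimator:
    """Tracks RRR in pure residual echo regions and returns RSR for each frame."""

    def __init__(self, smoothing=0.9, floor=1e-12):
        self.smoothing = smoothing   # hypothetical moving-average constant
        self.floor = floor
        self.rrr = 0.0               # average residual-echo level / Rx level
        self._seeded = False

    def update(self, tx_level, rx_level, in_pure_residual_echo):
        if in_pure_residual_echo and rx_level > self.floor:
            ratio = tx_level / rx_level      # Tx is almost only residual echo here
            if self._seeded:
                self.rrr = (self.smoothing * self.rrr
                            + (1.0 - self.smoothing) * ratio)
            else:
                self.rrr, self._seeded = ratio, True
        residual_echo_level = self.rrr * rx_level        # equation (3)
        rsr = residual_echo_level / max(tx_level, self.floor)
        return min(1.0, rsr)

rsr_est = RsrEstimator()
rsr_echo_only = rsr_est.update(0.01, 1.0, in_pure_residual_echo=True)
rsr_double_talk = rsr_est.update(1.0, 1.0, in_pure_residual_echo=False)
rsr_rx_silent = rsr_est.update(0.5, 0.0, in_pure_residual_echo=False)
```

Since RRR is only updated in pure residual echo regions, the same average can be reused during double talk, where the residual echo cannot be measured directly.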
The gain G_n(NSR) can be a linear or a nonlinear function of the parameter NSR. The following is an example of a linear function:
G_n(NSR) = 1 - C_n · NSR    (5)
where C_n is a constant with 0 < C_n < 1.
The gain G_r(RSR) can be a linear or a nonlinear function of the parameter RSR. The following is an example of a linear function:
G_r(RSR) = 1 - C_r · RSR    (6)
where C_r is a constant with 0 < C_r < 1.
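The two linear gain functions of equations (5) and (6) translate directly into code. In the sketch below the default constants 0.5 and 0.9 are arbitrary illustrative values inside the stated range 0 < C < 1, not values given by the invention.

```python
def gain_n(nsr, c_n=0.5):
    """Noise gain of equation (5): G_n(NSR) = 1 - C_n * NSR."""
    if not 0.0 < c_n < 1.0:
        raise ValueError("C_n must satisfy 0 < C_n < 1")
    return 1.0 - c_n * nsr

def gain_r(rsr, c_r=0.9):
    """Residual echo gain of equation (6): G_r(RSR) = 1 - C_r * RSR."""
    if not 0.0 < c_r < 1.0:
        raise ValueError("C_r must satisfy 0 < C_r < 1")
    return 1.0 - c_r * rsr

# Pure residual echo with background noise: both ratios are close to 1.
g_echo_region = gain_n(1.0) * gain_r(1.0)
# Clean near-end speech: both ratios are close to 0, so the gain stays near 1.
g_speech_region = gain_n(0.0) * gain_r(0.0)
```

Because both constants are strictly below 1, neither gain reaches 0 when its ratio equals 1, in line with the statement that the gain normally does not drop to 0.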
The LSFs of the Tx signal are denoted LSF_Tx(i), i = 0, 1, ..., M-1; they are estimated by LPC analysis of the Tx signal. For a narrowband signal sampled at 8 kHz, a typical value of the order M is about 10.
The LSFs of the noise signal are denoted LSF_nois(i), i = 0, 1, ..., M-1; they are estimated as the average (or moving average) of LSF_Tx(i) over the background noise regions of the Tx signal.
The LSFs of the Rx signal are denoted LSF_Rx(i), i = 0, 1, ..., M-1; they are estimated by LPC analysis of the Rx signal.
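The text does not say which LPC analysis method is used. As one common possibility, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion to obtain the predictor polynomial A(z) = 1 + a_1 z^-1 + ... + a_M z^-M for one frame. Windowing, bandwidth expansion and the conversion from LPC coefficients to LSFs are omitted, and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[k] * frame[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns ([1, a_1, ..., a_M], error energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break        # silent or degenerate frame: leave the rest at zero
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                       # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m                 # prediction error update
    return a, err

# Autocorrelation of an idealized first-order process with correlation 0.9.
lpc, pred_err = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

For a real Tx or Rx frame one would call `levinson_durbin(autocorrelation(frame, 10), 10)` to obtain a tenth-order predictor, matching the typical order mentioned above for 8 kHz narrowband signals.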
The LSFs of the (residual) echo signal are denoted LSF_echo(i), i = 0, 1, ..., M-1. When the residual echo signal is mixed with the speech signal, estimating LSF_echo(i) in double-talk regions is difficult. As one example, LSF_echo(i) can be calculated from LSF_Rx(i). First, a factor P(i) is estimated; it is the recent average ratio (or moving average ratio) between LSF_Rx(i) and LSF_Tx(i) in pure residual echo regions:
P(i) = average of [LSF_Tx(i) / LSF_Rx(i)] over pure residual echo regions,  i = 0, 1, ..., M-1    (7)
Then the current LSF_echo(i) of the residual echo is estimated as:
LSF_echo(i) = P(i) · LSF_Rx(i),  i = 0, 1, ..., M-1    (8)
where LSF_Rx(i) are the current line spectral frequencies of the Rx signal.
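The two-step estimate of LSF_echo(i) can be sketched as below. The direction of the ratio, P(i) = LSF_Tx(i) / LSF_Rx(i), is inferred from equation (8) together with the fact that the Tx signal is essentially residual echo in pure residual echo regions. The class name, the moving-average constant and the initial value P(i) = 1 are assumptions.

```python
class EchoLsfEstimator:
    """Tracks P(i) in pure residual echo regions and predicts LSF_echo(i)."""

    def __init__(self, order=10, smoothing=0.9):
        self.p = [1.0] * order       # start by assuming the echo envelope equals Rx
        self.smoothing = smoothing   # hypothetical moving-average constant

    def update_ratio(self, lsf_tx, lsf_rx):
        """Call only for frames classified as pure residual echo."""
        for i, (t, r) in enumerate(zip(lsf_tx, lsf_rx)):
            if r > 0.0:
                self.p[i] = (self.smoothing * self.p[i]
                             + (1.0 - self.smoothing) * (t / r))

    def estimate(self, lsf_rx):
        """Equation (8): LSF_echo(i) = P(i) * LSF_Rx(i)."""
        return [p_i * r for p_i, r in zip(self.p, lsf_rx)]

# Toy second-order example with no smoothing, so P(i) equals the last ratio.
echo_est = EchoLsfEstimator(order=2, smoothing=0.0)
echo_est.update_ratio(lsf_tx=[0.2, 0.6], lsf_rx=[0.1, 0.5])
lsf_echo = echo_est.estimate([0.1, 0.5])
```

Because P(i) is learned only where the echo is observed alone, `estimate` can still be called in double-talk regions, where LSF_echo(i) cannot be measured directly.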
The LPC predictors A_1(z) and A_2(z) are converted from LSF_1(i) and LSF_2(i) respectively, i = 0, 1, ..., M-1. LSF_1(i) and LSF_2(i) are both estimated by modifying LSF_Tx(i). The modification is mainly governed by LSF_echo(i), LSF_nois(i), NSR and RSR. For example, LSF_1(i) and LSF_2(i) can be formed as follows:
LSF_1(i) = λ_1 · LSF_Tx(i) + β · LSF_echo(i) + α · LSF_nois(i),  i = 0, 1, ..., M-1    (9)
LSF_2(i) = λ_2 · [LSF_Tx(i) - β · LSF_echo(i) - α · LSF_nois(i)],  i = 0, 1, ..., M-1    (10)
where:
β = C_β · RSR    (11)
α = C_α · NSR    (12)
C_α and C_β are constants; both are greater than 0 but much smaller than 1. λ_1 and λ_2 are determined as follows:
λ_1 = 1 - β - α    (13)
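Equations (9) to (13) can be collected into one small function, sketched below. The expression for λ_2 is not reproduced here, so the sketch takes `lambda_2` as an explicit argument; the function name and the default constants are likewise illustrative assumptions.

```python
def modified_lsfs(lsf_tx, lsf_echo, lsf_nois, rsr, nsr, lambda_2,
                  c_beta=0.05, c_alpha=0.05):
    """Return (LSF_1, LSF_2) following equations (9) to (13).

    lambda_2 must be supplied by the caller; c_beta and c_alpha are
    hypothetical values that are positive and much smaller than 1.
    """
    beta = c_beta * rsr                  # equation (11)
    alpha = c_alpha * nsr                # equation (12)
    lambda_1 = 1.0 - beta - alpha        # equation (13)
    lsf_1 = [lambda_1 * t + beta * e + alpha * n          # equation (9)
             for t, e, n in zip(lsf_tx, lsf_echo, lsf_nois)]
    lsf_2 = [lambda_2 * (t - beta * e - alpha * n)        # equation (10)
             for t, e, n in zip(lsf_tx, lsf_echo, lsf_nois)]
    return lsf_1, lsf_2

# With no residual echo and no noise (RSR = NSR = 0) and lambda_2 = 1,
# both sets reduce to LSF_Tx, so A_1(z)/A_2(z) leaves the spectrum unchanged.
same_1, same_2 = modified_lsfs([0.3, 0.9], [0.4, 1.0], [0.5, 1.1],
                               rsr=0.0, nsr=0.0, lambda_2=1.0)
mod_1, mod_2 = modified_lsfs([1.0], [2.0], [3.0], rsr=1.0, nsr=1.0,
                             lambda_2=1.0, c_beta=0.1, c_alpha=0.1)
```

A practical implementation would probably also check that the modified LSFs remain ordered and within the valid range before converting them back to LPC coefficients; that safeguard is not part of the equations above.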
EC filter divergence protection means preventing the Tx signal level after the echo canceller and echo suppressor from exceeding the level before the echo canceller (the signal denoted Tx1). This can be ensured by simply adjusting the two gain factors of equation (1), through C_n and C_r, so that the output energy of the echo suppressor is less than or equal to the energy of the Tx1 signal.
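A minimal sketch of such a protection is given below. Instead of retuning C_n and C_r, it applies one extra scale factor to the output frame, which has the same effect as lowering the overall gain for that frame; this substitution and the function name are assumptions made for illustration.

```python
def limit_output_energy(output_frame, tx1_frame):
    """Scale the suppressor output so that its energy does not exceed that of Tx1."""
    e_out = sum(v * v for v in output_frame)
    e_in = sum(v * v for v in tx1_frame)
    if e_out <= e_in or e_out == 0.0:
        return list(output_frame)        # already within the limit
    scale = (e_in / e_out) ** 0.5        # amplitude scale that matches the energies
    return [v * scale for v in output_frame]

limited = limit_output_energy([2.0, 2.0], [1.0, 1.0])      # output too strong
untouched = limit_output_energy([0.5, 0.5], [1.0, 1.0])    # output already weaker
```

In a running system the scale factor would itself be smoothed over time, so that the protection does not introduce the kind of discontinuity the method is designed to avoid.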
The invention may be extended to and applied in other specific forms without departing from its essence. The specific examples herein serve only to illustrate the invention, which is not limited to them. The above is merely a description of the invention; its substance is set out in the claims that follow. The claims cover all interpretations and changes that fall within their scope of equivalents.