EP1055318A2

EP1055318A2 - Method for improving acoustic noise attenuation in hands-free devices

Info

Publication number: EP1055318A2
Application number: EP99907267A
Authority: EP
Inventors: Gerhard Schmidt
Original assignee: Infineon Technologies AG
Current assignee: Infineon Technologies AG
Priority date: 1998-02-13
Filing date: 1999-01-21
Publication date: 2000-11-29
Also published as: WO1999041897A2; US6834108B1; WO1999041897A3; JP2002503923A; DE19806015C2; DE19806015A1

Abstract

The present invention relates to a method for improving acoustic noise attenuation in hands-free devices, wherein the method uses a combination of an adaptation control for the partial-band echo compensation process and a global-band post-filtration for suppressing residual echo. The method also uses a level balance (22) and a controllable frequency-selective echo compensation (28) with partial-band processing. After the frequency-selective echo compensation (28), the output signal is post-filtered in a further frequency-selective filter (30) using a Wiener-equation adjustment algorithm (Wiener filtration). A single control value (increment vector) is used for controlling both the frequency-selective echo compensation and the further filter. The method can thus be implemented with greatly reduced computational effort, so that it can also be used in simple consumer-grade processors.

Description


Method for improving the acoustic sidetone attenuation in hands-free devices

The present invention relates to a method for improving the acoustic sidetone attenuation in hands-free devices having a level balance, a frequency-selective, controllable echo compensation with subband processing, and residual-error post-filtering.

In hands-free systems, it is essential to suppress the signals of the remote subscriber that are emitted by the loudspeaker and picked up again by the microphone, since otherwise unpleasant echoes disturb the connection. Up to now, these echoes have usually been suppressed, i.e., the acoustic sidetone attenuation has been provided, by a level balance that strongly attenuates the transmit or receive path depending on the conversation situation. However, this makes two-way communication (full-duplex operation) practically impossible.

In the prior art, attempts have already been made to provide sufficient attenuation while retaining acceptable double-talk characteristics. For this purpose, a frequency-selective, controllable echo compensation was provided in addition to the level balance; in this regard, reference is made to the applicant's unpublished patent application DE 197 14 966. Other methods are known, for example, from the NEC brochure "Reflexion™ Acoustic Echo Canceller on the μPD7701x Family", 1996, or from the description of the Motorola DSP5600x digital processor (M. Knox, P. Abbot, Cyox: "A Highly Integrated H.320 Audio Subsystem Using the Motorola DSP5600x Digital Processor"). However, especially with the long signal delays of video conference or GSM connections, these methods cannot provide sufficient echo suppression if double talk is to remain possible at the same time.

It has therefore already been proposed to provide additional post-filtering after the frequency-selective echo cancellation with subband processing. Such post-filtering is described, for example, in the article by V. Turbin, A. Gilloire, P. Scalart, "Comparison of Three Post-Filtering Algorithms for Residual Acoustic Echo Reduction", ICASSP 97, International Conference on Acoustics, Speech and Signal Processing, Munich, 1997, or in the article by R. Martin, "An Improved Echo-Shape Algorithm for Acoustic Echo Control", EUSIPCO 96, 8th European Signal Processing Conference, Trieste, Italy, 1996. These concepts have so far been difficult to implement, since digital signal processing is needed both for the echo compensation with subband processing and for the post-filtering, and the computing power required for this cannot be provided with reasonable effort by the processors currently available.

It is therefore an object of the invention to provide a method for improving the acoustic sidetone attenuation in hands-free devices in which the computational effort is minimized, so that both frequency-selective echo cancellation with subband processing and the required post-filtering can be realized on conventional "consumer processors". This object is achieved by a method having the features of claim 1. Advantageous refinements of this method are specified in the subclaims.

According to the invention, therefore, only a single control variable, namely the step-size vector, is used both for controlling the frequency-selective echo compensation and for controlling the additional filter. Several different sampling rates can preferably be used, which further reduces the computing effort.

It is also preferred to use adaptive filters both for echo compensation and for the further filter.

The echo cancellation is preferably implemented in frequency subbands by means of a filter bank.

Both power-based estimates and correlation-based analyses are preferably used for the adaptation or step-size control.

It is also preferred to estimate power transmission factors in subbands for determining the step size.

It is also preferred that both the echo cancellers and the residual-error post-filtering provide estimates of the echo attenuation they introduce, since these estimates can preferably be used to control the attenuation of the level balance. As a result, the attenuation to be introduced by the level balance can be further reduced and the conversation quality during two-way communication can be further improved.

In addition, it is preferred to detect simultaneous activity of both conversation participants (double talk). It is then possible, for example, to reduce the total attenuation of the level balance during double talk in order to further improve the two-way communication capability (full-duplex operation) of the hands-free device.

The present invention is described in more detail below with reference to the exemplary embodiment shown in the accompanying drawings, in which:

FIG. 1 shows a simplified model of a hands-free device connected to a digital line;

FIG. 2 is a block diagram of the hands-free device according to the invention;

FIG. 3 shows curves of the attenuation requirements for the hands-free device as a function of the echo delay;

FIG. 4 shows an overview of the method according to the invention;

FIG. 5 shows the structure of the adaptation of the subband echo cancellers;

FIG. 6 shows a model for the power transmission factors;

FIG. 7 shows the signals of the distant and the local subscriber on the basis of which the method according to the invention is explained below;

FIG. 8 shows the resulting excitation and the disturbed error in band 1;

FIG. 9 shows the estimated power transmission factor in band 1 under the conditions of FIGS. 7 and 8;

FIG. 10 shows the step size selected by the step-size control in band 1 under the conditions of FIG. 7;

FIG. 11 shows the smoothing of the attenuation reduction according to the invention;

FIG. 12 shows a detailed illustration of the post-filtering of the error signal;

FIG. 13 shows the smoothing of the step sizes according to the invention (part A for equal time constants, part B for different time constants);

FIG. 14 shows a further example of the signals of the remote and the local subscriber, which form the basis for the processing in the further figures;

FIG. 15 shows the adjustment curve and the attenuation by the further filter in band 1;

FIG. 16 shows the attenuation by the further filter in band 1;

FIG. 17 shows the transfer of the attenuation values to the level balance; and

FIG. 18 shows the excitation and error power in the entire band (in each case for the input signals of FIG. 14).

FIG. 1 shows a simplified model of a hands-free device 10 connected to a digital line 12. The A-law encoding and decoding used in the European ISDN network are shown in the two left-hand blocks 14, 16. The loudspeaker-room-microphone system 18 (LRM system) with the local subscriber 20, the user of the hands-free device, is sketched on the right-hand side.

The acoustic coupling between loudspeaker and microphone leads to crosstalk via the LRM system, which the distant subscriber perceives as an echo. Acoustic waves emerge from the loudspeaker and propagate in the room. Reflections from the walls and other objects in the room create several propagation paths with different delays of the loudspeaker signal. The echo signal at the microphone therefore consists of the superposition of a large number of echo components and, possibly, the useful signal $n(t)$ of the local speaker.

Echoes can also arise in the connection between the subscribers at transitions between different transmission systems. However, network operators counter such echo sources with dedicated measures directly at the critical points, so these echoes can be disregarded here. Hybrid echoes, which arise in telephones with an analog interface because the balancing network is mismatched to the line impedance, can be disregarded because digital connections are used.

FIG. 2 shows an overview of the hands-free device according to the invention. The central element is a level balance 22, shown in the left part of FIG. 2. Optionally, two automatic gain controls 24, 26 (AGC) can be inserted in the transmit and receive paths. The level balance 22 guarantees the minimum attenuations prescribed by the ITU or ETSI recommendations by inserting attenuation into the transmit and/or receive path depending on the conversation situation. When the remote subscriber is active, the receive path is enabled and the signal of the remote subscriber is output unattenuated through the loudspeaker. The echoes that occur when the cancellers are switched off or poorly adjusted are greatly reduced by the attenuation inserted into the transmit path. When the local speaker is active, the situation is reversed: the receive path is strongly attenuated, the level balance 22 inserts no attenuation into the transmit path, and the signal of the local speaker is transmitted unattenuated. Controlling the level balance is more difficult in the double-talk case. Here, both paths (and thus both subscriber signals) each receive half of the attenuation to be inserted or, if the control is not optimal, at least one of the two signal paths is attenuated. Double talk is therefore impossible or possible only to a limited extent.

This is remedied by the adaptive echo cancellers 28 shown in the right part of FIG. 2. They attempt to model the LRM system digitally in order to then remove the echo component of the distant subscriber from the microphone signal. Depending on how well the cancellers succeed, the total attenuation to be introduced by the level balance can be reduced.

The echo compensation was implemented in frequency subbands, the width of the individual bands preferably being between 250 Hz and 500 Hz at an 8 kHz sampling rate or between 500 Hz and 1000 Hz at a 16 kHz sampling rate. Frequency-selective echo cancellation has several advantages. First, by using downsampling and upsampling, the system can be operated as a multirate system, which reduces the computational effort. Second, the subband decomposition allows the "compensation power" to be distributed differently over the individual frequency ranges and thus matched effectively to speech signals. Compared with full-band processing, subband processing also has a decorrelating effect in the individual subband systems; for speech signals, this increases the convergence speed of the adaptive filters. Besides these advantages, the disadvantage of subband processing must not be ignored: decomposing a signal into individual frequency ranges always introduces a delay, which in the present preferred method is 32 ms at an 8 kHz sampling rate and 16 ms at a 16 kHz sampling rate. However, since the method is used for video conferencing or in GSM mobile phones, such delays are permissible.

In video conferencing systems, the delay is mainly determined by the image processing. Since it is generally attempted to output the image and sound of the remote subscriber lip-synchronously to the local subscriber, the delay of the acoustic echoes can increase to several hundred milliseconds. FIG. 3 shows the results of a study that investigated which echo attenuation is required, depending on the echo delay, for 90, 70 and 50 percent of the respondents to be satisfied with the conversation quality.

According to this study, a pure audio delay of 30 to 40 ms (at an 8 kHz sampling rate) requires only 35 dB of echo attenuation. With lip-synchronous transmission of image and sound and an associated delay of, for example, 300 ms, the requirement increases to 53 dB. In GSM connections, too, the delay can exceed 100 ms. The requirements placed on echo cancellation in video conferencing and GSM systems are therefore higher than those placed on conventional hands-free telephones.

Since the echo cancellers are limited in their performance and cannot achieve such high echo attenuation with the available hardware, a so-called post filter 30 was introduced. It evaluates the step sizes of the individual subbands together with the other detector results and filters the output signal of the synthesis filter bank again in a frequency-selective manner. Since the adjustment algorithm of filter 30 was designed according to a Wiener approach, this post-filtering is also referred to below as Wiener filtering. The echo cancellers are controlled in several stages. All power-based control units 32 operate autonomously for each canceller, that is, independently of the other frequency ranges; a separate adaptation and control unit 32 is therefore sketched in FIG. 2 for each canceller. The control stage based on correlation analyses of the loudspeaker and microphone signals is used for double-talk detection and is therefore evaluated identically in all frequency ranges. A further stage takes into account the limited accuracy of the fixed-point arithmetic and controls the adaptation depending on the signal level.

The final double-talk detection is also carried out separately, in a dedicated unit that relies on both the level balance detectors and the echo cancellers. In double-talk situations, this unit causes the level balance to reduce the total attenuation to be inserted (in accordance with ITU recommendation G.167).

FIG. 4 shows an overview of the relationships described above. The central element is the calculation of the step-size vector $\boldsymbol{\alpha}(k)$. It is used both to control the subband echo cancellers and to calculate the coefficients of the post filter. The two sub-methods each calculate the echo attenuation they achieve and pass this information to the level balance 22. The level balance 22 then reduces the total attenuation set by the user and inserts only the remaining attenuation into the transmit or receive path.
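The attenuation bookkeeping of the level balance described here, combined with the path split described for FIG. 2, can be sketched as follows. This is a minimal illustration, assuming all values are in dB and that the attenuations reported by the cancellers and the post filter simply subtract from the user-set total; the function name `level_balance`, the handling of the idle state, and the way the 15 dB double-talk relaxation of ITU-T G.167 is folded in are hypothetical choices, not the patented control.

```python
def level_balance(total_db, ec_db, pf_db, far_active, near_active,
                  double_talk_reduction_db=15.0):
    """Return (transmit_db, receive_db): attenuation the level balance inserts.

    total_db -- total attenuation set by the user
    ec_db    -- echo attenuation reported by the subband echo cancellers
    pf_db    -- echo attenuation reported by the post filter
    """
    double_talk = far_active and near_active
    remaining = total_db - ec_db - pf_db
    if double_talk:
        # assumption: the G.167 relaxation is applied as a plain subtraction
        remaining -= double_talk_reduction_db
    remaining = max(0.0, remaining)

    if double_talk:
        return remaining / 2.0, remaining / 2.0   # each path gets half
    if far_active:
        return remaining, 0.0                      # attenuate the transmit path
    if near_active:
        return 0.0, remaining                      # attenuate the receive path
    return remaining / 2.0, remaining / 2.0        # idle: hypothetical even split
```

For example, with a 45 dB total and 25 dB plus 10 dB reported by the cancellers and the post filter, only 10 dB remain to be inserted into the transmit path while the far end talks alone.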

Since the present invention relates to the combination of the above-mentioned Wiener filtering and the adaptation control of the subband echo cancellers, both methods are described in detail in separate chapters. What is new in the presented approach is the use of a single control variable, the step-size vector $\boldsymbol{\alpha}(k)$, for both methods. The resulting reduction in computational cost (fewer than 100 cycles per sample for the post-filtering) makes it possible to implement both methods on inexpensive "consumer" signal processors and thus to improve the quality of the hands-free device.

Previous approaches to residual-error post-filtering first use a (complex) FFT analysis or other computationally intensive methods and always treat the control of the post-filtering separately from the control of the echo compensation.

The frequency band analysis and synthesis required for subband processing is implemented as a polyphase filter bank.
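The text states only that analysis and synthesis are realized as a polyphase filter bank. The following NumPy sketch shows one common variant, a DFT-modulated analysis bank in polyphase (weighted overlap-add) form; the number of bands `M`, the decimation factor `R`, the prototype length and the windowed-sinc prototype `prototype_lowpass` are hypothetical choices. A direct evaluation of the same filter bank is included as a reference so that the two can be compared.

```python
import numpy as np


def prototype_lowpass(M, L):
    """Hypothetical windowed-sinc prototype lowpass (cutoff pi/M), length L."""
    n = np.arange(L) - (L - 1) / 2.0
    return np.sinc(n / M) * np.hamming(L) / M


def analysis_polyphase(x, h, M, R):
    """Subband signals X[k, mu] of a DFT-modulated analysis bank.

    For each decimated instant k: window the newest len(h) input samples
    with h, fold the windowed block into M samples (polyphase summation),
    then apply an M-point FFT.
    """
    L = len(h)
    assert L % M == 0, "prototype length must be a multiple of M"
    K = len(x) // R
    xp = np.concatenate([np.zeros(L - 1), x])     # zero history before the start
    out = np.empty((K, M), dtype=complex)
    for k in range(K):
        t = k * R + L - 1                         # position of x[k*R] in xp
        seg = xp[t - np.arange(L)]                # x[kR - n], n = 0..L-1
        v = (h * seg).reshape(L // M, M).sum(axis=0)   # polyphase fold
        out[k] = np.fft.fft(v)
    return out


def analysis_direct(x, h, M, R):
    """Reference: X[k, mu] = sum_n h[n] x[kR - n] exp(-j 2 pi mu n / M)."""
    L = len(h)
    K = len(x) // R
    n = np.arange(L)
    out = np.empty((K, M), dtype=complex)
    for k in range(K):
        idx = k * R - n
        seg = np.where(idx >= 0, x[np.clip(idx, 0, None)], 0.0)
        for mu in range(M):
            out[k, mu] = np.sum(h * seg * np.exp(-2j * np.pi * mu * n / M))
    return out
```

For example, `M = 16` at 8 kHz gives a band spacing of 500 Hz, at the upper end of the range mentioned above. Folding the windowed block before the FFT is what makes the structure polyphase: each output frame costs one length-`L` multiply, one fold and one `M`-point FFT instead of `M` separate filters. The synthesis bank is omitted here.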

First, independently of its later use within the Wiener filtering, a step-size control is described that ensures fast and stable adaptation of the subband echo cancellers. In addition, methods are presented that estimate the echo attenuation achieved, so that the level balance 22 can reduce the total attenuation on the basis of these estimates. For the attenuation estimate, it is irrelevant whether the attenuation is achieved by well-adjusted echo cancellers, by the acoustic arrangement of loudspeaker and microphone, or by a suitable choice of the analog gains.

The subband echo cancellers are adapted by means of an NLMS method tailored to the signal processor used. To explain the notation used below, FIG. 5 shows a structural representation of the adaptation process.

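The estimated microphone signals $\hat{y}_\mu^{(r)}(k_r)$ are formed by convolving the estimated subband impulse responses $\hat{\mathbf{h}}_\mu(k_r)$ with the subband excitation signals $\mathbf{x}_\mu^{(r)}(k_r)$ of the remote subscriber:

$$\hat{y}_\mu^{(r)}(k_r) = \hat{\mathbf{h}}_\mu^{\mathrm{T}}(k_r)\,\mathbf{x}_\mu^{(r)}(k_r) \tag{3.1}$$

The index $\mu$ denotes the subband number. The adaptation error $e_\mu^{(r)}(k_r)$ is calculated as the difference between the measured and the estimated microphone signal:

$$e_\mu^{(r)}(k_r) = y_\mu^{(r)}(k_r) - \hat{y}_\mu^{(r)}(k_r) \tag{3.2}$$

This error is composed of the so-called undisturbed error $e_{u,\mu}^{(r)}(k_r)$ and the component $n_\mu^{(r)}(k_r)$ caused by the local speaker:

$$e_\mu^{(r)}(k_r) = e_{u,\mu}^{(r)}(k_r) + n_\mu^{(r)}(k_r) \tag{3.3}$$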

The coefficients are updated from $k_r$ to $k_r + 1$ using an approximation of the NLMS algorithm (3.4), where $F(\cdot)$ denotes the approximation function already mentioned.
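For orientation, the following sketch shows a plain complex NLMS iteration for one subband, controlled by the per-subband step size $\alpha_\mu$. It does not reproduce the processor-specific approximation $F(\cdot)$ of (3.4); the convention $\hat{y} = \hat{\mathbf{h}}^{\mathrm{T}}\mathbf{x}$ (no conjugation), the newest-first ordering of `x_vec`, the regularization `eps` and the function name `nlms_update` are assumptions.

```python
import numpy as np


def nlms_update(h_hat, x_vec, y, alpha, eps=1e-10):
    """One complex NLMS iteration for a single subband canceller.

    h_hat -- current coefficient vector (complex)
    x_vec -- newest excitation samples, x_vec[i] = x(k_r - i)
    y     -- measured microphone subband sample y(k_r)
    alpha -- step size of this subband (0 freezes the adaptation)
    """
    y_hat = np.dot(h_hat, x_vec)                   # convolution sum_i h_i x(k_r - i)
    e = y - y_hat                                  # adaptation (compensated) error
    norm = np.real(np.vdot(x_vec, x_vec)) + eps    # regularized ||x||^2
    h_new = h_hat + alpha * e * np.conj(x_vec) / norm
    return h_new, e
```

With this update the a-posteriori error on the same input vector is $(1-\alpha)$ times the a-priori error, which is why a small $\alpha$ during local speech keeps the coefficients almost unchanged.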

During operation of the hands-free device, the adaptation method continuously adjusts the coefficients of the subband echo cancellers to the subband impulse responses of the LRM system, so that acoustic echoes are reduced even after changes in the system. The adaptation criterion is the minimization of the mean square error. According to the NLMS update rule, the coefficients change strongly when the samples of the compensated signal $e_\mu^{(r)}(k_r)$ of the $\mu$-th subband are large. Persistently large values of $e_\mu^{(r)}(k_r)$ can have two causes:

1. After changes in the LRM system, the adaptive filters are poorly matched to the room impulse response. The acoustic echoes are then reduced only slightly or not at all, and the uncompensated echo components cause the signals $e_\mu^{(r)}(k_r)$ to increase. In such situations, the cancellers should be readjusted as quickly as possible.

2. An increase in the local component $n(k)$, for example when the local speaker is active, also causes the signals $e_\mu^{(r)}(k_r)$ to increase. For the hands-free device this component is the useful signal to be transmitted; for the adaptive filters, however, it is a disturbance that can lead to an incorrect setting of the coefficients. In such situations, the cancellers should not be adjusted, or only slightly, so that the adjustment already achieved does not deteriorate.

A step-size control has already been presented that takes into account the two conversation situations or canceller states described above and meets the demands placed on the adaptation control. The step size in the $\mu$-th subband should be set according to

$$\alpha_\mu(k_r) = \frac{E\{|e_{u,\mu}^{(r)}(k_r)|^2\}}{E\{|e_\mu^{(r)}(k_r)|^2\}} \tag{3.5}$$

The disturbed error signal $e_\mu^{(r)}(k_r)$ in the denominator of equation 3.5 can be measured directly; its expected value can be estimated by

$$E\{|e_\mu^{(r)}(k_r)|^2\} \approx \overline{|e_\mu^{(r)}(k_r)|^2} \tag{3.6}$$

The right-hand side of approximation 3.6 denotes first-order recursive smoothing:

$$\overline{|e_\mu^{(r)}(k_r)|^2} = \beta_e\,|e_\mu^{(r)}(k_r)|^2 + (1-\beta_e)\,\overline{|e_\mu^{(r)}(k_r-1)|^2} \tag{3.7}$$

To estimate the numerator, a power transmission factor $p_\mu^{(r)}(k_r)$ is introduced. To a first approximation, the parallel connection of the LRM system and the echo canceller, including the subtraction point, is modeled as a simple attenuator.

This attenuation (ratio of excitation power to error power) is characterized by its reciprocal, the power transmission factor in the subband, which is estimated as

$$p_\mu^{(r)}(k_r) = \frac{\overline{|e_\mu^{(r)}(k_r)|^2}}{\overline{|x_\mu^{(r)}(k_r)|^2}}, \qquad k_r \in K_{\mathrm{ES,FT}} \tag{3.8}$$

The model assumes that no additional interference signals, e.g., local speaker activity, are present in the LRM system. For this reason, the set $K_{\mathrm{ES,FT}}$ was introduced in equation 3.8; it is to contain the times at which the hands-free device is in the state of single talk by the remote subscriber.

The smoothed squared excitation signal used in equation 3.8 is determined analogously to the estimated error power:

$$\overline{|x_\mu^{(r)}(k_r)|^2} = \beta_x\,|x_\mu^{(r)}(k_r)|^2 + (1-\beta_x)\,\overline{|x_\mu^{(r)}(k_r-1)|^2} \tag{3.9}$$

In states without room changes, the power transmission factor changes only very slowly compared with the (short-term) excitation powers. Recursive smoothing with large time constants can therefore be used to reduce the variance of the above estimate; "large" is meant relative to the time constants of the power estimates.
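The estimation of the power transmission factor from (3.7) to (3.9), with slow smoothing of the ratio and a hold whenever the remote subscriber is not talking alone, can be sketched per subband as follows. The smoothing constants, the start value of $p$, the guard `eps` and the class name `PowerTransferEstimator` are hypothetical; the single-talk flag stands in for membership of $k_r$ in $K_{\mathrm{ES,FT}}$, and floating point replaces the fixed-point arithmetic of the original.

```python
def smooth(prev, value, beta):
    """First-order recursive smoothing: beta * value + (1 - beta) * prev."""
    return beta * value + (1.0 - beta) * prev


class PowerTransferEstimator:
    """Power transfer factor p = E|e|^2 / E|x|^2 for one subband (sketch)."""

    def __init__(self, beta_short=0.1, beta_p=0.01, eps=1e-12):
        self.beta_short = beta_short  # short-term power smoothing (assumed value)
        self.beta_p = beta_p          # much slower smoothing of p (assumed value)
        self.eps = eps                # guards against division by zero
        self.x_pow = 0.0              # smoothed |x|^2, as in (3.9)
        self.e_pow = 0.0              # smoothed |e|^2, as in (3.7)
        self.p = 1.0                  # hypothetical start value: no attenuation yet

    def update(self, x, e, far_end_single_talk):
        """Feed one excitation/error pair; return the current estimate of p."""
        self.x_pow = smooth(self.x_pow, abs(x) ** 2, self.beta_short)
        self.e_pow = smooth(self.e_pow, abs(e) ** 2, self.beta_short)
        if far_end_single_talk:       # k_r in K_ES,FT: refresh the estimate
            ratio = self.e_pow / (self.x_pow + self.eps)
            self.p = smooth(self.p, ratio, self.beta_p)
        # otherwise the last p is held (the local speaker may be active)
        return self.p

    def undisturbed_error_power(self):
        """Estimate of E|e_u|^2 as p * E|x|^2, i.e. the numerator of (3.5)."""
        return self.p * self.x_pow
```

The slow `beta_p` is what makes short wrong single-talk decisions harmless: a few misclassified samples move $p$ only slightly.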

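If the local subscriber is active, the estimation of the residual echo is severely disturbed. In such cases, the power transmission factor estimate should not be updated; the most recently calculated values $p_\mu^{(r)}(k_r)$ are retained. As a consequence, room changes cannot be detected while the local speaker is active; the power transmission factors are only adjusted once the single-talk state of the remote subscriber has been reached again. The determination equation for the smoothed power transmission factors can thus be specified as

$$p_\mu^{(r)}(k_r) = \begin{cases} \beta_p\,\dfrac{\overline{|e_\mu^{(r)}(k_r)|^2}}{\overline{|x_\mu^{(r)}(k_r)|^2}} + (1-\beta_p)\,p_\mu^{(r)}(k_r-1), & k_r \in K_{\mathrm{ES,FT}},\\[6pt] p_\mu^{(r)}(k_r-1), & \text{otherwise.} \end{cases} \tag{3.10}$$

The step sizes $\alpha_\mu(k_r)$ can thus be approximated as follows:

$$\alpha_\mu(k_r) \approx \frac{p_\mu^{(r)}(k_r)\,\overline{|x_\mu^{(r)}(k_r)|^2}}{\overline{|e_\mu^{(r)}(k_r)|^2}} \tag{3.11}$$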

The previous considerations show that the determination of the power transmission factors can be divided into two subproblems. First, an efficient way of calculating the two power estimates and their quotient must be found for the available hardware. Second, the times contained in the set $K_{\mathrm{ES,FT}}$ must be detected.

Nonlinear recursive smoothing was used for the first subproblem. The sum of the magnitudes of the real and imaginary parts of the subband signals was chosen as the input signal of these filters. To avoid divisions, the power transmission factors were calculated logarithmically, so that the division can be replaced by a subtraction.

For the second subproblem, a so-called correlation measure $\xi(k_r)$ was used. It is based on a normalized cross-correlation analysis of the excitation signal of the distant subscriber and the microphone signal. When the distant subscriber speaks alone, the two signals are strongly correlated and the correlation measure yields values $\xi(k_r) \approx 1$. When the local subscriber is active, the correlation decreases and values $\xi(k_r) < 1$ are detected.

To illustrate the following considerations, the control was tested with the input signals of the distant and the local subscriber shown in FIG. 7. White Gaussian noise was chosen for both signals during their activity phases. The sequence begins with single talk by the distant subscriber (phase $A_1$). The adaptive echo cancellers can converge in this phase and reach their final adjustment after about 3 to 4 seconds. After 7.5 seconds, the local subscriber begins to interrupt the distant one (double talk, area $B_1$) and then becomes the sole speaker (area $C$). The situation reverses after 10.75 seconds: the distant subscriber interrupts the local one (double talk, area $B_2$) and finally talks alone (phase $A_2$).

The microphone signal is formed by convolving the excitation signal with the previously presented impulse response of an office room (length 2044 coefficients at an 8 kHz sampling rate) and then adding the signal of the local speaker.

FIG. 8 shows the mean powers of the excitation and error signals. The adaptation was carried out with the step-size control described below, assuming that the correlation evaluations enable adaptation only in areas $A_1$ and $A_2$. The figure clearly shows that the echo reduction of about 25 dB achieved during phase $A_1$ is maintained through the double-talk phases and the single-talk phase of the local subscriber.

To determine the power transmission factor in the $\mu$-th subband, the average powers of the excitation signal and of the undisturbed error signal must be estimated according to equation 3.8. To avoid limit cycles, a calculation with double-word precision (32 bits) would be necessary if the smoothing were carried out directly as in equation 3.7 or equation 3.9. To reduce the associated memory requirement and computing power, only magnitude smoothing is carried out:

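$$\overline{|x_\mu^{(r)}(k_r)|} = \beta_x\,|x_\mu^{(r)}(k_r)| + (1-\beta_x)\,\overline{|x_\mu^{(r)}(k_r-1)|} \tag{3.12}$$

$$\overline{|e_\mu^{(r)}(k_r)|} = \beta_e\,|e_\mu^{(r)}(k_r)| + (1-\beta_e)\,\overline{|e_\mu^{(r)}(k_r-1)|} \tag{3.13}$$

So that the critical case of local-subscriber activity during double talk is recognized as quickly as possible, two different time constants ($\beta_{er}$ and $\beta_{ef}$) were introduced for rising and falling edges when smoothing the error signal. The time constant $\beta_e$ is chosen according to

$$\beta_e = \begin{cases} \beta_{er}, & |e_\mu^{(r)}(k_r)| > \overline{|e_\mu^{(r)}(k_r-1)|},\\ \beta_{ef}, & \text{otherwise,} \end{cases} \qquad 0 < \beta_{er}, \beta_{ef} < 1 \tag{3.14}$$

Because two different time constants are used, the estimate obtained in this way is no longer unbiased. In the prior art, correction factors are therefore introduced. A different approach is taken here: the excitation power is estimated with the same time constants as the error power:

$$\beta_x = \begin{cases} \beta_{xr}, & |x_\mu^{(r)}(k_r)| > \overline{|x_\mu^{(r)}(k_r-1)|},\\ \beta_{xf}, & \text{otherwise,} \end{cases} \qquad \beta_{xr} = \beta_{er},\ \beta_{xf} = \beta_{ef} \tag{3.15}$$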

Since the two quantities are subsequently divided, the correction factor can be dispensed with. The magnitudes were approximated by the more cost-effective estimates

$$|x_\mu^{(r)}(k_r)| \approx |\mathrm{Re}\{x_\mu^{(r)}(k_r)\}| + |\mathrm{Im}\{x_\mu^{(r)}(k_r)\}| \tag{3.16}$$

$$|e_\mu^{(r)}(k_r)| \approx |\mathrm{Re}\{e_\mu^{(r)}(k_r)\}| + |\mathrm{Im}\{e_\mu^{(r)}(k_r)\}| \tag{3.17}$$

Here, too, a correction term can be omitted because of the division. As already mentioned in the previous section, the power transmission factors are determined only logarithmically, so that the division reduces to two logarithms and one subtraction. The power transmission factors are thus estimated according to

$$\tilde{P}_\mu(k_r) = \mathrm{LOG}\{\overline{|e_\mu^{(r)}(k_r)|}\} - \mathrm{LOG}\{\overline{|x_\mu^{(r)}(k_r)|}\} \tag{3.18}$$

and

$$P_\mu(k_r) = \begin{cases} \beta_p\,\tilde{P}_\mu(k_r) + (1-\beta_p)\,P_\mu(k_r-1), & k_r \in K_{\mathrm{ES,FT}},\\ P_\mu(k_r-1), & \text{otherwise,} \end{cases} \tag{3.19}$$

where LOG{...} denotes the logarithm. The time constant $\beta_p$ was also chosen differently for rising and falling edges. This accounts for the part of the system delay that cannot be compensated (artificial delay of the microphone signal): because of this delay, the power of the excitation signal drops earlier than that of the error signal, and without a correction the estimated value would be lowered after each excitation phase. In addition, the time constants are increased when double talk is detected; the double-talk detector used is described below. The time constant $\beta_p$ is given by

$$\beta_p = \begin{cases} \beta_{pr,\mathrm{GS}}, & \text{rising edge},\ k_r \in K_{\mathrm{GS}},\\ \beta_{pf,\mathrm{GS}}, & \text{falling edge},\ k_r \in K_{\mathrm{GS}},\\ \beta_{pr,\mathrm{ES}}, & \text{rising edge},\ k_r \notin K_{\mathrm{GS}},\\ \beta_{pf,\mathrm{ES}}, & \text{falling edge},\ k_r \notin K_{\mathrm{GS}}, \end{cases} \tag{3.20}$$

with $0 < \beta_{pr,\mathrm{GS}} < \beta_{pf,\mathrm{GS}} < 1$ and $0 < \beta_{pr,\mathrm{ES}} < \beta_{pf,\mathrm{ES}} < 1$.

$K_{\mathrm{GS}}$ denotes the times at which the double-talk detector detects double talk. The set $K_{\mathrm{ES,FT}}$ denotes the times at which the correlation measure recognizes single talk by the distant subscriber.

Comparisons between these approximations and the exact calculation according to equation 3.10 showed deviations of less than 2 dB for speech excitation. This is sufficient for use within the step-size control, so this estimation method was used for the power transmission factor.
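The cost-saving chain of (3.14) to (3.18), namely the |Re| + |Im| magnitude, asymmetric smoothing with shared rise/fall constants, and a logarithmic difference in place of a division, can be sketched as follows. The rise/fall values, the dB scaling (20 log10) and the function names are assumptions. When the error is a positive real scaled copy of the excitation, both smoothers make identical rise/fall decisions, so the bias of the smoothing and the scale error of the magnitude approximation cancel in the difference.

```python
import math


def mag_approx(z):
    """Cheap magnitude estimate |Re z| + |Im z|, as in (3.16)/(3.17)."""
    return abs(z.real) + abs(z.imag)


def smooth_asym(prev, value, beta_rise, beta_fall):
    """Nonlinear first-order smoothing with separate rise/fall constants."""
    beta = beta_rise if value > prev else beta_fall
    return beta * value + (1.0 - beta) * prev


def log_transfer_db(x_seq, e_seq, beta_rise=0.2, beta_fall=0.02):
    """LOG{smoothed |e|} - LOG{smoothed |x|} in dB after the last sample.

    Excitation and error use the same rise/fall constants, as in (3.15).
    """
    xs = es = 0.0
    for x, e in zip(x_seq, e_seq):
        xs = smooth_asym(xs, mag_approx(x), beta_rise, beta_fall)
        es = smooth_asym(es, mag_approx(e), beta_rise, beta_fall)
    return 20.0 * math.log10(es) - 20.0 * math.log10(xs)
```

The approximation `mag_approx` overestimates the true magnitude by at most a factor of √2, which is harmless here because that factor largely drops out of the difference.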

FIG. 9 shows the estimated power transmission factor $p_1(k_r)$ in the first band. Its estimate is not updated in areas $B_1$, $C$ and $B_2$, since the correlation measure does not enable it there. Comparison with FIG. 8 shows good agreement between the target and the estimated value; the target value is the power difference between excitation and error. Both the course and the final value of approximately 26 to 30 dB visible in FIG. 8 are well reproduced by the estimate.

The step size $\alpha_\mu(k_r)$ in each band can be determined from the previously calculated quantities according to (3.21) with (3.22), where LIN{...} denotes the linearization (the inverse of LOG{...}). If the excitation power falls below a limit, it is assumed that the excitation consists only of background noise, and the adaptation is stopped.
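A log-domain evaluation of the step size can be sketched as follows. This assumes that (3.21)/(3.22) amount to evaluating (3.5) with the numerator $p \cdot E\{|x|^2\}$ in the LOG domain and linearizing at the end; the exact equations may differ. All quantities are taken as powers in dB, and the noise threshold `x_min_db`, the cap `alpha_max` and the function name `step_size` are hypothetical.

```python
def step_size(p_db, x_pow_db, e_pow_db, x_min_db=-60.0, alpha_max=1.0):
    """Step size alpha = p * E|x|^2 / E|e|^2, computed in the log domain.

    p_db     -- power transfer factor in dB (e.g. -26 dB after convergence)
    x_pow_db -- smoothed excitation power in dB
    e_pow_db -- smoothed (disturbed) error power in dB
    Returns 0 when the excitation is below x_min_db (background noise only).
    """
    if x_pow_db < x_min_db:
        return 0.0                                  # adaptation stopped
    log_alpha = p_db + x_pow_db - e_pow_db          # sums replace mult/div
    return min(alpha_max, 10.0 ** (log_alpha / 10.0))   # LIN{...}, capped
```

With a converged canceller ($p$ = -26 dB) and no local speech, the error power is also about 26 dB below the excitation, so $\alpha \approx 1$; if local speech raises the error power to the excitation level, $\alpha$ falls to about -26 dB, in line with the behavior described for FIG. 10.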

FIG. 10 shows the step size in the first subband on a logarithmic scale. In phases of single talk by the distant subscriber ($A_1$ and $A_2$), the step size is approximately 1. In the double-talk phases ($B_1$ and $B_2$), FIG. 8 shows a difference of approximately 26 to 30 dB between the disturbed and the undisturbed error power. Accordingly, the step size in the double-talk phases also lies in the expected range (approx. -27 dB).

The step-size control presented above requires an estimate of the power transmission factor, which should only be updated when the distant subscriber is speaking alone. For this reason, the set $K_{\mathrm{ES,FT}}$ was introduced in equation 3.19; it is to contain the times at which this single talk is present. Owing to the strong recursive smoothing, short-term wrong decisions in selecting these times do not lead to major misestimates of the transmission factors. The desired detector should distinguish between single talk and double talk independently of room changes and of the power of the input signals. A correlation measure, which meets these requirements, is used: the cross-correlation between the loudspeaker signal and the microphone signal is evaluated in normalized form.

For the evaluation, the two signals are multiplied by estimation windows (rectangular functions) of length $L_1$. The finite signal sequences thus obtained are evaluated according to (3.23).

evaluated. In the case of strongly correlated signals, a maximum of the evaluation described above is achieved when the estimation windows are just shifted from one another by the running time of the LRM system. Since this runtime is unknown and also changeable (eg by moving the loudspeaker or the microphone), the maximum from a sequence of L ₂ evaluations is processed further. The individual evaluations then use an excitation signal x (kl) delayed by 1 cycle. The equation of determination extends to:

with le {θ ... L ₂ - l}.

(3.24)

The numerators and denominators of the above equation must be evaluated in double-word precision (32 bits). To reduce the computational effort, the individual correlation measures $\xi(k,l)$ are calculated recursively:

$$\xi(k,l) = \frac{Z(k,l)}{N(k,l)} \tag{3.25}$$
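Since (3.23) and (3.24) are not given explicitly here, the following sketch assumes one plausible normalized form, $\xi(k,l) = |Z(k,l)|/N(k,l)$ with $Z(k,l) = \sum_{i=0}^{L_1-1} x(k-l-i)\,y(k-i)$ and $N(k,l) = \sum_{i=0}^{L_1-1} |x(k-l-i)\,y(k-i)|$. This form lies between 0 and 1 and equals 1 when the microphone signal is a scaled, delayed copy of the excitation. Numerator and denominator are updated recursively over a sliding window of length $L_1$ for the lags $l = 0, \dots, L_2-1$. Integer arithmetic mimics the drift-free fixed-point accumulation; the class name `CorrelationMeasure` is hypothetical, and the division is shown only for illustration.

```python
from collections import deque


class CorrelationMeasure:
    """Sliding-window correlation measure over lags l = 0..L2-1 (sketch).

    Z[l] and N[l] are updated recursively: add the newest product and drop
    the one leaving the window of length L1. Integer inputs keep it exact.
    """

    def __init__(self, L1, L2):
        self.L1, self.L2 = L1, L2
        self.x_hist = deque([0] * (L1 + L2), maxlen=L1 + L2)  # x(k), x(k-1), ...
        self.y_hist = deque([0] * (L1 + 1), maxlen=L1 + 1)    # y(k), ..., y(k-L1)
        self.Z = [0] * L2
        self.N = [0] * L2

    def update(self, x, y):
        """Feed x(k), y(k); return (max_l xi(k, l), best lag l)."""
        self.x_hist.appendleft(x)
        self.y_hist.appendleft(y)
        for l in range(self.L2):
            new = self.x_hist[l] * self.y_hist[0]                  # x(k-l) y(k)
            old = self.x_hist[l + self.L1] * self.y_hist[self.L1]  # x(k-l-L1) y(k-L1)
            self.Z[l] += new - old
            self.N[l] += abs(new) - abs(old)
        xis = [abs(z) / n if n else 0.0 for z, n in zip(self.Z, self.N)]
        best = max(range(self.L2), key=xis.__getitem__)
        return xis[best], best
```

For a microphone signal that is three times the excitation delayed by two samples, the measure reaches exactly 1 at lag 2.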

A release (adaptation enable) is set when the maximum of the correlation measures exceeds a limit value $\xi_0$. To avoid dividing two 32-bit values, the limit $\xi_0$ is approximated by a finite sum of non-positive powers of two:

$$\xi_0 \approx \sum_{n=0}^{N_\xi} a_n\,2^{-n}, \qquad a_n \in \{0, 1\} \tag{3.26}$$

The threshold comparison then reduces to a summation of right-shifted denominator values and a comparison:

$$\sum_{n=0}^{N_\xi} a_n\,2^{-n}\,N(k,l) \lessgtr Z(k,l) \tag{3.27}$$
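The division-free comparison of (3.26) and (3.27) can be sketched on integers as follows. The greedy choice of the digits $a_n$ (which approximates $\xi_0$ from below), the use of $|Z|$ in the comparison, and the function names are assumptions. Integer right shifts truncate, so the effective threshold is slightly below $\xi_0 N$ and the release becomes marginally more permissive.

```python
def threshold_digits(xi0, n_terms):
    """Greedy binary digits a_n in {0, 1}, n = 0..n_terms-1, with sum a_n 2^-n <= xi0."""
    digits, acc = [], 0.0
    for n in range(n_terms):
        step = 2.0 ** -n
        if acc + step <= xi0:
            digits.append(1)
            acc += step
        else:
            digits.append(0)
    return digits


def release(Z, N, digits):
    """Division-free test |Z| > xi0 * N using right shifts of the integer N."""
    threshold = sum(N >> n for n, a in enumerate(digits) if a)
    return abs(Z) > threshold
```

For $\xi_0 = 0.75$ the digits are $a_1 = a_2 = 1$, so the comparison needs only the two shifted values `N >> 1` and `N >> 2` and one addition.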

To reduce the computing effort further, the evaluations are carried out only in the first subband, which carries the most power, and there only on the real parts of the complex signals. In this band, the highest signal-to-noise ratio can be expected for speech excitation, which should improve the reliability of the detector results. Owing to the subsampling, the calculations need only be carried out every $r$ sampling cycles. The time $k_r$ is included in the set $K_{\mathrm{ES,FT}}$ if one of the $L_2$ comparisons yields a correlation measure greater than $\xi_0$.

According to ITU recommendation G.167, the echo attenuation to be provided by the hands-free device may be reduced by 15 dB in double-talk situations. For this reason, a double-talk detector was developed based on the following considerations. This detector can also be used to update the estimates in the step-size control "more carefully" when double talk occurs.

Intercom is detected in two stages. The first stage checks whether the distant speaker is active. For this purpose, the magnitude-smoothed excitation signal of the distant subscriber is compared with a threshold, and it is also checked whether the level balance algorithm has detected excitation from the remote subscriber. The second check is needed whenever the level balance introduces large attenuation values (e.g. after changes in the room). In such situations the reception path can be strongly attenuated, and the comparison with the smoothed input signal would not provide a reliable result. Excitation by the remote participant (A_fe = 1) is therefore always assumed if either the power comparison or the level balance detector (variable SR = 1) detects it:

$A_{fe}(k) = \begin{cases} 1, & \text{if } \overline{|x|}(k) > \overline{|x|}_0 \text{ or } SR = 1 \\ 0, & \text{otherwise} \end{cases}$ (3.28)

The magnitude-smoothed excitation signal is calculated analogously to the recursive, nonlinear smoothing described in the step size control. Note, however, that the higher sampling rate requires larger time constants, which can cause limit cycles. A double-word precision calculation (32 bits) is therefore required:

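$\overline{|x|}(k) = (1 - \beta_{xg})\,|x(k-N)| + \beta_{xg}\,\overline{|x|}(k-1)$ (3.29)

The time constant β_xg is chosen as follows:

$\beta_{xg} = \begin{cases} \beta_{xgr}, & \text{if } |x(k-N)| > \overline{|x|}(k-1) \\ \beta_{xgf}, & \text{otherwise} \end{cases}$ (3.30)

with $0 < \beta_{xgr} < \beta_{xgf} < 1$.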
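A floating-point Python sketch of this recursive, nonlinear magnitude smoothing, including the delay of N clocks, is given below. It assumes the form in which β weights the previous smoothed value, so the smaller rise constant lets the estimate follow speech onsets quickly. The real implementation uses 32-bit fixed-point arithmetic, which this sketch does not model; the function name and parameter values are illustrative.

```python
def smooth_magnitude(x: list[float], beta_rise: float, beta_fall: float,
                     delay: int = 0) -> list[float]:
    """Nonlinear first-order smoothing of |x(k - delay)| with two time constants."""
    if not 0.0 < beta_rise < beta_fall < 1.0:
        raise ValueError("expected 0 < beta_rise < beta_fall < 1")
    out = []
    prev = 0.0  # smoothed value at k - 1
    for k in range(len(x)):
        idx = k - delay
        mag = abs(x[idx]) if idx >= 0 else 0.0  # samples before k = delay count as zero
        beta = beta_rise if mag > prev else beta_fall
        prev = beta * prev + (1.0 - beta) * mag
        out.append(prev)
    return out


y = smooth_magnitude([1.0] * 10 + [0.0] * 10, beta_rise=0.5, beta_fall=0.99)
# y rises to about 0.999 within 10 samples, then decays only to about 0.90
```

The same routine with the error signal in place of the excitation (and no delay) corresponds to the error smoothing described next.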

The delay of N clocks was introduced to compensate for the propagation delay of the analysis-synthesis system in the comparisons of the second detector stage. No additional memory is needed for this, since the analysis filter stores the last N values of the input signal anyway. The second stage determines whether the local call participant is also active. To this end, the power of the estimated undisturbed error is compared with the power of the measurable, disturbed error. The power estimates are based on magnitude smoothing and on the determination of a power transfer factor. The error signal is smoothed according to

$\overline{|e|}(k) = (1 - \beta_{eg})\,|e(k)| + \beta_{eg}\,\overline{|e|}(k-1)$ (3.31)

The time constant β_eg is chosen as follows:

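$\beta_{eg} = \begin{cases} \beta_{egr}, & \text{if } |e(k)| > \overline{|e|}(k-1) \\ \beta_{egf}, & \text{otherwise} \end{cases}$ (3.32)

with $\beta_{egr} = \beta_{xgr}$ and $\beta_{egf} = \beta_{xgf}$.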

A (full-band) power transmission factor p_EK(k) is determined to estimate the undisturbed error power:

$p_{EK}(k) = \ldots$ (3.33)

To reduce the variance of the estimate, this quantity is also smoothed recursively. Since the transmission factor is determined only from smoothed quantities, this is done only at the subsampled rate, with $0 < \beta_{re} < 1$. (3.34)

To detect excitation by the local participant (A_lo = 1), the difference between the measured and the estimated error power is determined. To avoid wrong decisions, an additional safety threshold p_GS was introduced. The detector signals excitation of the local subscriber when the measured error power exceeds the error power estimated from the excitation power and the power transmission factor by at least p_GS dB. This comparison is also carried out at the subsampled rate. (3.35)

The detector signals two-way communication when the AND combination of the variables A_fe and A_lo yields one. In these cases, the residual attenuation introduced by the level balance can be reduced by p_GS,max = 15 dB. This reduction of the attenuation requirement is applied through a low-pass filter. The time constant for the rising edge β_GSr should be as small as possible so as not to cut off the beginning of a speech passage. The time constant for the falling edge β_GSf should be chosen greater than the rise constant, so that the attenuation reduction is not completely withdrawn during short speech pauses. This relationship is shown in FIG. The smoothed attenuation reduction is determined as follows:
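The two-stage intercom decision and the low-passed attenuation reduction can be sketched in Python as below. The threshold `x_threshold`, the margin `safety_db` (playing the role of p_GS) and the time constants are illustrative assumptions. The decision logic (OR for A_fe, a dB margin over the estimated error power for A_lo, AND of both) follows the text; the smoothing form is assumed to match the other recursive smoothers in this section, since the original equation did not survive.

```python
def far_end_active(x_smoothed: float, x_threshold: float, sr: bool) -> bool:
    """A_fe: smoothed excitation above threshold OR level balance detector (SR = 1)."""
    return x_smoothed > x_threshold or sr


def near_end_active(err_power: float, exc_power: float, p_ek: float,
                    safety_db: float) -> bool:
    """A_lo: measured error power at least safety_db above p_EK * excitation power.

    The dB margin is applied as a multiplication, so no logarithm or division is needed.
    """
    estimated = p_ek * exc_power
    return err_power > 0.0 and err_power >= estimated * 10.0 ** (safety_db / 10.0)


def smooth_reduction(double_talk: list[bool], p_max_db: float = 15.0,
                     beta_rise: float = 0.3, beta_fall: float = 0.98) -> list[float]:
    """Low-pass the attenuation reduction: fast onset, slow release."""
    out, prev = [], 0.0
    for dt in double_talk:
        target = p_max_db if dt else 0.0
        beta = beta_rise if target > prev else beta_fall
        prev = beta * prev + (1.0 - beta) * target
        out.append(prev)
    return out


dt = far_end_active(0.3, 0.1, False) and near_end_active(1.0, 1.0, 0.01, 6.0)
reduction = smooth_reduction([True] * 10 + [False] * 5)
```

With these values the reduction reaches almost 15 dB within ten steps and still exceeds 13 dB five steps after double-talk ends, which is the bridging of short pauses the text asks for.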

The time k_r is included in the set K_GS if the attenuation reduction is above a predetermined value. An exemplary course of the attenuation reduction is shown in FIG. 11. The total attenuation of the level balance prescribed by ITU-T Recommendation G.167 can be reduced by the attenuation of the overall system consisting of room and echo canceller. Even when echo compensation is switched off, the control described above estimates the transmission factor of the acoustic path from the loudspeaker to the microphone, including the analog gains. This makes it possible to react to different loudspeakers or different (analog) microphone gains and to adjust the (digital) total attenuation to the required values. In the intercom (double-talk) case, the total attenuation can also be set to a lower value in accordance with ITU-T Recommendation G.167; for this, too, a detector and a corresponding transfer variable were presented. The total level balance attenuation D_PW(k) is thus controlled (initially without taking post-filtering into account) as follows:

$D_{PW}(k) = D_0 - D_{EK}(k) - D_{GS}(k)$ (3.37)

All quantities in the above equation are available in logarithmic form, as required by the ARCOFI level balance method. D₀ is the required maximum attenuation (e.g. 45 dB). The attenuation of the echo canceller D_EK(k) is updated at the subsampled instants and held in between:

$D_{EK}(k) = \begin{cases} D_{EK}^{(r)}(k), & \text{if } k = ir \quad (3.38) \\ D_{EK}(k-1), & \text{otherwise} \quad (3.39) \end{cases}$ with $i \in \mathbb{Z}$.

Similarly, the intercom reduction D_GS(k) can be specified as

$D_{GS}(k) = \begin{cases} \overline{p}_{GS}(k), & \text{if } k = ir \quad (3.40) \\ D_{GS}(k-1), & \text{otherwise} \quad (3.41) \end{cases}$ with $i \in \mathbb{Z}$.
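The hold structure of (3.38) to (3.41) and the combination (3.37) can be sketched in Python as follows. The subsampled values are passed in as plain lists of dB values; the function names and the example numbers are illustrative, and the sketch ignores the fixed-point format of the ARCOFI interface.

```python
def sample_and_hold(sub_values: list[float], r: int, n_samples: int) -> list[float]:
    """Full-rate sequence: take a new value at k = i*r, otherwise repeat the one at k - 1."""
    out = []
    current = 0.0  # used only if no subsampled value is available yet
    for k in range(n_samples):
        i, rem = divmod(k, r)
        if rem == 0 and i < len(sub_values):
            current = sub_values[i]
        out.append(current)
    return out


def level_balance_attenuation(d0_db: float, d_ek_db: list[float],
                              d_gs_db: list[float]) -> list[float]:
    """D_PW(k) = D_0 - D_EK(k) - D_GS(k), all quantities in dB."""
    return [d0_db - ek - gs for ek, gs in zip(d_ek_db, d_gs_db)]


r = 13
d_ek = sample_and_hold([10.0, 20.0], r, 2 * r)
d_gs = sample_and_hold([0.0, 15.0], r, 2 * r)
d_pw = level_balance_attenuation(45.0, d_ek, d_gs)  # 35 dB for 13 samples, then 10 dB
```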

The real-time implementation of the echo cancellation method shows that the adaptive filters can never completely remove the distant speaker's component from the microphone signal. This can have many causes, three of which are listed here as examples:

a) The room impulse responses are generally longer than the echo cancellers, leaving a residual error.

b) The fixed-point arithmetic of the DSP used limits the final adjustment of the filters.

c) After changes in the room, the NLMS algorithm tracks the adaptive filters only at a finite speed, so echoes are more noticeable until the final adjustment is reached again.

The error signal e(k) thus contains, in addition to the local speaker's component n(k), the uncompensated component of the distant speaker, which was referred to in the previous parts of this description as the "undisturbed" error ε(k). For the distant subscriber, n(k) is the useful component of e(k); from this point of view, ε(k) is the disturbance.

The following shows how a post-filtering of the signal e(k) that attenuates the "disturbance" ε(k), based on a Wiener filter approach, can be linked with the step size control of the subband echo cancellers. For this purpose, a transversal filter of order M − 1 is inserted after the synthesis filtering; M is also the number of bands in the filter bank. The coefficients are determined at the subband level and transformed into the time domain with an inverse DFT. The coefficient determination involves several smoothing stages, which introduce inertia and thus a delay. This delay can be at least partially compensated for by the maximum-phase synthesis filter that lies between the determination and the use of the coefficients. The post-filtering is performed in the time domain and is frequency-selective.

The derivation yields simple control variables with which the "influence" of the Wiener filter can be controlled depending on the compensation performance of the adaptive filters. The attenuation introduced by this measure can also be estimated with little effort and "reported" to the level balance.

It will be shown below that determining the coefficients of the Wiener filter reduces to computing M/2 + 1 subtractions, a (simplified) inverse Fourier transform of length M, and some recursive smoothing. The subtractions, the inverse FFT and the smoothing only need to be carried out every r samples, so the computational effort is very low compared to the other components of the hands-free system.

As shown in FIG. 12, the filter g(k) 30 is placed after the synthesis. The filter has order M − 1, so M coefficients must be set. According to the Wiener approach, the filter 30 should optimally free the "disturbed" signal e(k) from the "disturbance" ε(k). The frequency response of such a filter is:

$G_{opt}(\Omega) = \frac{S_{en}(\Omega)}{S_{ee}(\Omega)}$

The following applies to the signal e (k):

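$e(k) = \varepsilon(k) + n(k)$

The filter frequency response can therefore be rewritten as

$G_{opt}(\Omega) = \frac{S_{\varepsilon n}(\Omega) + S_{nn}(\Omega)}{S_{\varepsilon\varepsilon}(\Omega) + S_{\varepsilon n}(\Omega) + S_{n\varepsilon}(\Omega) + S_{nn}(\Omega)}$

The signals of the distant and the local subscriber (ε(k) and n(k), respectively) are assumed to be uncorrelated.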

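Because the line input and the microphone signal are high-pass filtered, the signals n(k) and ε(k) are also assumed to be zero-mean. The cross power spectral densities then vanish, and the frequency response simplifies to:

$G_{opt}(\Omega) = \frac{S_{nn}(\Omega)}{S_{\varepsilon\varepsilon}(\Omega) + S_{nn}(\Omega)} = 1 - \frac{S_{\varepsilon\varepsilon}(\Omega)}{S_{\varepsilon\varepsilon}(\Omega) + S_{nn}(\Omega)} = 1 - \frac{S_{\varepsilon\varepsilon}(\Omega)}{S_{ee}(\Omega)}$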
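A small numerical check of this simplification, written in Python: for independent, zero-mean white signals ε(k) and n(k) (an assumption that makes every PSD flat, so a single band suffices), the quantity 1 − P_ε/P_e measured from samples matches S_nn/(S_εε + S_nn). In the hands-free system ε(k) cannot be observed on its own, which is why the text goes on to estimate the quotient via the step sizes instead. The function name and variances are illustrative.

```python
import math
import random


def wiener_gain_estimate(var_eps: float, var_n: float, n_samples: int = 100_000,
                         seed: int = 1) -> tuple[float, float]:
    """Compare G = 1 - P_eps / P_e (measured) with S_nn / (S_eps_eps + S_nn) (theory)."""
    rng = random.Random(seed)
    p_eps = 0.0
    p_e = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, math.sqrt(var_eps))  # residual echo (the disturbance)
        n = rng.gauss(0.0, math.sqrt(var_n))      # local speaker (the useful signal)
        e = eps + n                               # error signal e(k) = eps(k) + n(k)
        p_eps += eps * eps
        p_e += e * e
    measured = 1.0 - p_eps / p_e
    theory = var_n / (var_eps + var_n)
    return measured, theory


measured, theory = wiener_gain_estimate(var_eps=1.0, var_n=4.0)
# theory is 0.8; the measured value agrees to within the sampling error
```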

Since the filter g(k) has order M − 1 and is to be determined from the frequency response G_opt(Ω) by an inverse Fourier transform, M support points of the frequency response must be determined. For the frequencies

$\Omega_\mu = \mu\,\frac{2\pi}{M}$ with $\mu \in \{0, \ldots, M-1\}$

this yields:

$G_{opt}(\Omega_\mu) = 1 - \frac{S_{\varepsilon\varepsilon}(\Omega_\mu)}{S_{ee}(\Omega_\mu)}$

Besides being the support points in the frequency domain, the frequencies Ω_μ are also the band centers of the bandpass filters of the subband decomposition described above. When estimating the quotient S_εε(Ω_μ)/S_ee(Ω_μ), the corresponding quantities in the individual subbands can therefore be used, and G_opt(Ω_μ) can be approximated from subband quantities. Since the derivation of the Wiener filter assumed stationary input signals, which holds for speech only over short passages, the power density spectra should be replaced by corresponding short-term power estimates in the respective frequency range. The estimation of the quotients is therefore subject to the same requirements as the estimation of the step sizes in the respective bands. The DFT of the filter g(k) could therefore be determined according to

$\mathbf{G}^{(r)}(k) = \mathbf{1} - \mathbf{a}^{(r)}(k)$

The superscript "(r)" indicates the subsampling level: G^(r)(k) and a^(r)(k) change only every r sampling steps. In the preferred embodiment, r = 13 was chosen. It was shown that the complex bands need only be calculated for μ = 0, …, M/2; the bands μ = M/2 + 1, …, M − 1 can be obtained by complex conjugation. Since the step sizes are real, the vector a^(r)(k) can be formed as follows:

Since the subband decomposition filters out the range of the last subband (3750 Hz to 4000 Hz at an 8 kHz sampling rate), this range should also be blocked by the Wiener filter, which leads to the choice $G_8^{(r)}(k) = 0$, i.e. $\alpha_8^{(r)}(k) = 1$.
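How the real step sizes of bands 0 to M/2 could be mirrored into a full length-M vector and turned into support points G_μ = 1 − α_μ is sketched in Python below. M = 16 is an assumption inferred from the band limits quoted in this description (band 1 spans 250 Hz to 750 Hz and the last band 3750 Hz to 4000 Hz at 8 kHz, i.e. 500 Hz spacing); the function names are hypothetical.

```python
M = 16           # assumed number of bands (500 Hz spacing at 8 kHz)
FS_HZ = 8000.0


def band_center_hz(mu: int, m: int = M, fs: float = FS_HZ) -> float:
    """Center of band mu, i.e. Omega_mu = 2*pi*mu/M expressed in Hz."""
    return mu * fs / m


def full_step_vector(alpha_half: list[float], m: int = M) -> list[float]:
    """Extend real step sizes of bands 0..M/2 to all M bands by symmetry."""
    if len(alpha_half) != m // 2 + 1:
        raise ValueError("expected M/2 + 1 band values")
    alpha = list(alpha_half)
    alpha[m // 2] = 1.0  # last band is blocked, so G_{M/2} = 1 - 1 = 0
    # bands M/2+1 .. M-1 mirror bands M/2-1 .. 1 (conjugate symmetry of real values)
    return alpha + alpha[m // 2 - 1:0:-1]


def gains_from_steps(alpha_full: list[float]) -> list[float]:
    """Support points G_mu = 1 - alpha_mu for every band."""
    return [1.0 - a for a in alpha_full]


g = gains_from_steps(full_step_vector([0.1 * mu for mu in range(M // 2 + 1)]))
```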

In practice, a slightly modified approach leads to better results. Analogous to known noise reduction methods, the estimated support points of the filter frequency response are smoothed over time and provided with a so-called overestimation factor β and a maximum attenuation G_min(k). The temporal smoothing is applied to the step sizes and carried out with a first-order IIR filter with two different time constants for rising (γ_r) and falling (γ_f) edges:

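$\bar{\alpha}_\mu^{(r)}(k) = \begin{cases} (1-\gamma_r)\,\alpha_\mu^{(r)}(k) + \gamma_r\,\bar{\alpha}_\mu^{(r)}(k-1), & \text{if } \alpha_\mu^{(r)}(k) > \bar{\alpha}_\mu^{(r)}(k-1) \\ (1-\gamma_f)\,\alpha_\mu^{(r)}(k) + \gamma_f\,\bar{\alpha}_\mu^{(r)}(k-1), & \text{otherwise} \end{cases}$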

With linear smoothing (γ_r = γ_f), the attenuation at the beginning of a speech passage of the distant subscriber would first be introduced slowly and then increasingly quickly; at the end of the speech passage, it would be reduced first quickly and then slowly. To clarify this relationship, FIG. 13 shows an exemplary course of the term (1 − α(k)) in one of the subbands. At the beginning there is a speech pause of the distant speaker, so the term (1 − α(k)) equals one. With the onset of the speech passage, the step size α(k) is set to a value close to one; for simplicity, it remains at this value until the end of the speech sequence and is then reset to zero. To illustrate the size of the inserted attenuation (assuming for simplicity that all bands have the same course), the points at which the smoothed curve reaches the values 1/2, 1/4 and 1/8 are marked; they correspond to attenuations of 6 dB, 12 dB and 18 dB. The lower part of FIG. 13 shows the term α(k) smoothed with two different time constants. Here the attenuation is inserted quickly at the beginning of the speech passage and reduced more slowly at the end.
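The effect described for FIG. 13 can be reproduced with a short Python simulation: a step-size burst (pause, speech, pause) is smoothed once with a single time constant and once with a fast rise and slow fall. The constants 0.9, 0.5 and 0.95 and the burst shape are illustrative assumptions; the attenuation is expressed as −20·log10(1 − ᾱ), which gives the 6, 12 and 18 dB values for 1/2, 1/4 and 1/8.

```python
import math
from typing import Optional


def smooth_steps(alpha: list[float], gamma_rise: float, gamma_fall: float) -> list[float]:
    """First-order IIR smoothing of a step-size sequence with rise/fall constants."""
    out, prev = [], 0.0
    for a in alpha:
        gamma = gamma_rise if a > prev else gamma_fall
        prev = (1.0 - gamma) * a + gamma * prev
        out.append(prev)
    return out


def attenuation_db(alpha_smoothed: float) -> float:
    """Attenuation of the gain (1 - alpha) in dB (amplitude convention)."""
    return -20.0 * math.log10(1.0 - alpha_smoothed)


def first_reaching(values: list[float], gain: float) -> Optional[int]:
    """First index at which 1 - smoothed alpha has dropped to `gain` or below."""
    for i, v in enumerate(values):
        if 1.0 - v <= gain:
            return i
    return None


alpha = [0.0] * 5 + [0.99] * 60 + [0.0] * 60   # pause, speech, pause
sym = smooth_steps(alpha, 0.9, 0.9)             # linear smoothing (gamma_r = gamma_f)
asym = smooth_steps(alpha, 0.5, 0.95)           # fast onset, slow release
```

With these numbers, the asymmetric version reaches 6 dB of attenuation after two speech samples instead of seven, and after the speech ends it keeps more attenuation for longer than the symmetric one.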

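The vector $\bar{\mathbf{a}}^{(r)}(k)$ used in the implementation is thus composed of the smoothed step sizes:

$\bar{\mathbf{a}}^{(r)}(k) = \left(\bar{\alpha}_0^{(r)}(k), \bar{\alpha}_1^{(r)}(k), \ldots, \bar{\alpha}_{M/2}^{(r)}(k), \bar{\alpha}_{M/2-1}^{(r)}(k), \ldots, \bar{\alpha}_1^{(r)}(k)\right)^T$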

The filter frequency response is then estimated according to

$\mathbf{G}^{(r)}(k) = \max\left\{\mathbf{1} - \beta\,\bar{\mathbf{a}}^{(r)}(k),\; G_{min}(k)\right\}$ (4.1)

If chosen greater than one, the overestimation factor β accelerates the introduction of the attenuation and increases it. A value between 1.0 and 3.0 is preferably chosen for β. The parameter G_min(k) limits the spectral estimates of the filter. If it is chosen to be zero, for example, the filter could set the output signal to zero; if G_min(k) = 1, the output signal is not changed. The parameter G_min(k) thus controls the "influence" of the Wiener filter. Real-time tests showed that it is advisable to link the control of this parameter to the adjustment state of the echo cancellers. At the beginning of an adjustment process, the attenuation achieved by the echo cancellers is still very low, so the Wiener filter should intervene strongly and be able to introduce large attenuations (e.g. up to 45 dB according to the ITU recommendations). If there is strong background noise in the room in which the hands-free system is located, the echo is suppressed by the Wiener filter, but the distant participant then perceives a kind of modulation of the background noise: during his speech pauses the noise is transmitted unattenuated, while he is speaking it is attenuated (e.g. by 45 dB).

At the beginning of an adjustment process, such "effects" are tolerable, especially since "conventional" methods such as the level balance have a similar effect. However, this effect should diminish as the compensators converge. Here, too, the step size control provides a suitable control variable: the echo canceller attenuation D_EK(k) derived from the estimated power transmission factor. The parameter G_min(k) is therefore set according to:

$G_{min}(k) = \mathrm{LIN}\left\{\max\left\{0,\; G_{max,log} - D_{EK}(k) - D_{GS}(k)\right\}\right\}$ (4.2)

"LIN" denotes the linearization of logarithmic variables already used in the step size control. The maximum insertion loss (for example 45 dB) is set with the parameter G_max,log. This fixed value is reduced by the attenuation D_EK(k) that the echo cancellers provide on average and by the intercom reduction D_GS(k). The quantities D_EK(k) and D_GS(k) are in the same logarithmic form as the constant G_max,log. Limiting the computed value to 0 dB adapts it to the input range of the linearization.

All control variables for setting the Wiener filter and the filter coefficients in the subband domain are thus determined. The spectral estimates of the filter obtained in this way are transformed into the time domain with an inverse DFT such that a linear-phase filter results. Since the system function is both real and symmetric, the effort of the IDFT can be reduced to about a quarter.
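The coefficient path from smoothed step sizes to time-domain taps can be sketched in Python: a single G_min for all bands following (4.2), per-band support points max{1 − β·ᾱ_μ, G_min}, and an inverse DFT of the real, even spectrum followed by a circular shift of M/2. Here `lin` is a hypothetical stand-in that maps an attenuation in dB to a linear amplitude factor; the real LIN operation works on the ARCOFI logarithmic format, which is not modelled. The IDFT is written in its plain O(M²) cosine form rather than the reduced-effort version mentioned above.

```python
import math


def lin(att_db: float) -> float:
    """Hypothetical stand-in for LIN: attenuation in dB -> linear amplitude factor."""
    return 10.0 ** (-att_db / 20.0)


def g_min(g_max_log: float, d_ek: float, d_gs: float) -> float:
    """G_min = LIN{max{0, G_max,log - D_EK - D_GS}}, one value for all bands."""
    return lin(max(0.0, g_max_log - d_ek - d_gs))


def support_points(alpha_smoothed: list[float], beta: float, gmin: float) -> list[float]:
    """Per-band gains max{1 - beta * alpha_mu, G_min} with overestimation factor beta."""
    return [max(1.0 - beta * a, gmin) for a in alpha_smoothed]


def linear_phase_taps(g: list[float]) -> list[float]:
    """M real FIR taps from real, even-symmetric support points (g[mu] == g[M - mu]).

    The inverse DFT of a real, even spectrum is real and even, so only cosines are
    needed. A circular shift by M/2 makes the response causal with a delay of M/2
    samples; taps 1..M-1 are then symmetric about n = M/2.
    """
    m = len(g)
    zero_phase = [sum(g[mu] * math.cos(2.0 * math.pi * mu * n / m) for mu in range(m)) / m
                  for n in range(m)]
    return [zero_phase[(n - m // 2) % m] for n in range(m)]


# Example with M = 16: smoothed step sizes for bands 0..8, mirrored to bands 9..15.
half = [0.1, 0.3, 0.5, 0.9, 0.9, 0.5, 0.3, 0.1, 1.0]
alpha = half + half[7:0:-1]
gains = support_points(alpha, beta=2.0, gmin=g_min(60.0, 30.0, 0.0))
taps = linear_phase_taps(gains)
```

Evaluating `g_min` once per subsampling step with full-band quantities, rather than once per band, mirrors the uniform upper limit that the final implementation adopts to avoid eight linearizations.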

The attenuation D_W(k) of the signal e(k) by the Wiener filter is communicated to the level balance via an interface, analogously to the attenuation of the echo cancellers and the attenuation reduction in the case of two-way communication. The attenuation is approximated by the mean over all frequency ranges to be transmitted:

"LOG" denotes the normalization and logarithmization already used in the step size control; it ensures the interface-specific communication with the level balance. The division by 8 is implemented as a right shift by 3 bits. Before the attenuation is finally passed to the level balance, it undergoes a recursive, nonlinear smoothing:

$\bar{D}_W(k) = \beta_{rf}\,\bar{D}_W(k-1) + (1 - \beta_{rf})\,D_W(k)$

where β_rf takes one value for rising and another for falling edges. Using different time constants for rising and falling edges makes the estimate "more careful". When the Wiener filter adds attenuation, the level balance reduces its own attenuation only slowly, so for a short time the total attenuation of the error signal exceeds the required 45 dB. Conversely, when the Wiener filter reduces its attenuation, the level balance adds the remaining attenuation very quickly; owing to the delay of the synthesis filtering, the total attenuation can then also briefly exceed the set upper limit (e.g. 45 dB).
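The mean-by-shift and the "careful" smoothing can be sketched in Python as follows. The per-band attenuations are assumed to be nonnegative integers in the interface's logarithmic scale, and which eight bands count as "to be transmitted" is not specified here, so the sketch simply takes any eight values. Because the reported Wiener attenuation should rise slowly and fall quickly, the rise constant is larger than the fall constant, the opposite ordering from the detector smoothers above; the constants themselves are illustrative.

```python
def mean_attenuation(band_att: list[int]) -> int:
    """Mean of 8 fixed-point per-band attenuations: the division by 8 is a right shift by 3."""
    if len(band_att) != 8:
        raise ValueError("expected 8 band values")
    return sum(band_att) >> 3


def cautious_smoothing(d: list[float], beta_rise: float, beta_fall: float) -> list[float]:
    """Smoothed estimate that rises slowly and falls quickly (beta_rise > beta_fall)."""
    out, prev = [], 0.0
    for value in d:
        beta = beta_rise if value > prev else beta_fall
        prev = beta * prev + (1.0 - beta) * value
        out.append(prev)
    return out


d_w = cautious_smoothing([30.0] * 50 + [0.0] * 5, beta_rise=0.9, beta_fall=0.2)
# the reported attenuation lags when the Wiener filter adds attenuation
# and follows almost immediately when it withdraws it
```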

To illustrate the previous considerations, the simulation described in the section on step size control was repeated, this time extended by the Wiener filter presented above. The measured impulse response of an office room with a reverberation time of about 300 ms was used as the room impulse response. As excitation, white noise according to FIG. 14 was fed in on both the distant and the local subscriber side.

To show the influence of the Wiener filter clearly, the maximum attenuation G_max,log was chosen to be 60 dB. The initial adjustment of the compensators takes place in area A₁. At the beginning of this area the compensators are not yet adjusted; by its end, the final adjustment state has been reached in all bands. Since there is no intercom in this phase, the Wiener filter should insert the difference between 60 dB and the attenuation achieved by the echo canceller. The corresponding coefficient in area A₁,

$G_1^{(r)}(k) = \max\left\{1 - \beta\,\bar{\alpha}_1^{(r)}(k),\; G_{min}(k)\right\}$,

is shown in FIG. 15 for subband 1 (250 Hz to 750 Hz at an 8 kHz sampling rate), together with the excitation and the error signal before the Wiener filter. First of all, a settling process of the Wiener filter can be seen. Owing to the inertia of the low-pass smoothing, the attenuation is not inserted immediately; this effect is partially compensated for by the transformation into the time domain and the synthesis filter in between. Already at the onset of the distant speaker's activity, at least 25 dB of attenuation is thus inserted in the full-band signal (see FIG. 18). After about 200 ms, the attenuation has risen to its final value of 60 dB. As the compensator converges, the attenuation by the Wiener filter in band 1 decreases and, as expected, reaches a final value of about 30 dB (60 dB maximum limit minus 30 dB echo attenuation by the compensator). Since the Wiener filter is inserted only after the synthesis, the courses of the excitation, the error, the step size and the power transmission factor in band 1 can be seen in FIGS. 9 and 10.

In the phases in which only the distant participant speaks (areas A₁ and A₂), the limit G_min(k) on the attenuation to be inserted is the determining variable. According to the filter approach, the total signal e(k) should be freed from its disturbance ε(k). However, since the local participant (the useful signal in e(k)) is not active, the total signal consists only of the disturbance. Without the limitation in the coefficient determination, the coefficients would therefore be set to zero, eliminating the disturbance entirely.

To illustrate this relationship, FIG. 16 shows the attenuation inserted by the Wiener filter in band 1. The initial value of about 60 dB is determined by the set maximum attenuation G_max,log. The compensators, initialized with zero vectors at the beginning of the simulation, converge during phase A₁ and thereby reduce the upper limit of the inserted attenuation to about 30 dB. In the following intercom phase B₁, the intercom detector lowers this upper limit by a further 15 dB, to about 15 dB. However, since the power of the local speaker is considerably higher than that of the residual echo, this limit is not reached: with the chosen adjustment algorithm, almost no attenuation is inserted in the two-way phase B₁. In the intercom phase, the determining factor is the power ratio between the signal of the local speaker and the residual echo of the distant speaker. The power of the residual echo depends on the excitation power of the distant participant and on the adjustment state of the compensators; the better these are adjusted, the smaller the influence of the Wiener filter in these passages.

In the following conversation situation C, only the local participant speaks. In these situations the step sizes are set to zero, so the Wiener filter simply passes the signal through. Passages B₂ and A₂ are to be interpreted analogously to the phases just described.

Since the attenuation inserted by the Wiener filter is estimated with different time constants, the estimate is "careful" in certain phases. To clarify this, the estimated attenuations of the echo canceller and of the Wiener filter, together with the reduction in the case of two-way communication, are plotted in FIG. The sum of these three quantities is passed to the level balance and is shown in the lower part of FIG. 17. This estimate can be compared with the actual full-band excitation and error signals in FIG. 18. In areas B₁ and B₂, the intercom detector detects activity on both subscriber sides and raises the transferred attenuation by 15 dB. This increase is inserted with a short time constant and removed slowly at the end of the intercom phase, a measure introduced to bridge short speech pauses. At the same time, with the onset of intercom, the step size is reduced and the Wiener filter reduces its attenuation. In the passages without excitation by the distant participant (area C), the step size is set to zero, so the Wiener filter acts only as a delay element.

However, the procedure presented so far was slightly modified for the final implementation, which reduced the computational effort further without any noticeable loss of quality.

After the step-size-dependent determination of the filter coefficients in the subband domain, an upper limit of the attenuation was determined according to equation 4.1. This upper limit was determined as a function of the attenuation already achieved, which is given by the power transmission factors in the respective band and by the intercom attenuation. In the step size calculation, both quantities were calculated and stored only in logarithmic representation, so eight linearizations would be needed to use them in the limitation function. Determining the maximum values would therefore require more computing power than the entire remaining coefficient calculation. For this reason, a uniform upper limit was introduced for all bands. It is also calculated according to equation 4.1, but with the full-band quantities. The resulting post-filtering requires well below 1 MIPS on 16-bit fixed-point signal processors. When the Wiener filter 30 is switched on, the total attenuation can additionally be reduced by the attenuation of the Wiener filter 30. The maximum attenuation range of the level balance can thus be specified as

$D_{PW}(k) = D_0 - D_{EK}(k) - D_{GS}(k) - D_W(k)$ (4.3)

The quantity D_W(k) is determined according to

$D_W(k) = \begin{cases} \bar{D}_W(k), & \text{if } k = ir \\ D_W(k-1), & \text{otherwise} \end{cases}$ with $i \in \mathbb{Z}$. (4.4)

Claims


1. A method for improving the acoustic attenuation in hands-free devices with a level balance (22) and a frequency-selective, controllable echo cancellation (28) with subband processing, the outgoing signal being subjected, after the frequency-selective echo cancellation (28), to post-filtering in a further frequency-selective filter (30) with an adjustment algorithm based on a Wiener approach (Wiener filtering), characterized in that a single control variable (step size vector α(k)) is used both for controlling the frequency-selective echo compensation and for controlling the further filter (30).

2. The method according to claim 1, characterized in that several different sampling rates are used.

3. The method according to claim 1 or claim 2, characterized in that adaptive filters are used both in the echo cancellation (28) and for the further filter (30).

4. The method according to any one of claims 1 to 3, characterized in that the echo cancellation (28) is implemented by means of a filter bank in frequency subbands.

5. The method according to any one of claims 1 to 4, characterized in that both power-based estimates and correlation-based analyses are used to control the adaptation and the step size.

6. The method according to any one of claims 1 to 5, characterized in that power transmission factors in subbands are estimated for determining the step size.

7. The method according to any one of claims 1 to 6, characterized in that both the echo cancellation (28) and the further filter (30) provide estimates of the echo attenuation they introduce.

8. The method according to claim 7, characterized in that the estimated attenuation values are used to control the attenuation of the level balance (22).

9. The method according to any one of claims 1 to 8, characterized in that the simultaneous activity of both participants (intercom) is detected.

10. The method according to claim 9, characterized in that the total attenuation of the level balance is reduced during intercom.