DE10120231A1

DE10120231A1 - Single-channel noise reduction of speech signals whose noise changes more slowly than speech signals, by estimating non-steady noise using power calculation and time-delay stages

Info

Publication number: DE10120231A1
Application number: DE2001120231
Authority: DE
Inventors: Rainer Zelinski
Original assignee: Deutsche Telekom AG
Current assignee: Deutsche Telekom AG
Priority date: 2001-04-19
Filing date: 2001-04-19
Publication date: 2002-10-24

Abstract

The method involves a non-steady noise estimation (GSI) which, in combination with a selection stage (AS), allows effective noise reduction. The selection stage checks whether the estimate is acceptable. A stage for calculating the power and power maximum of the input signal and a time delay stage for the non-steady noise component are used in this check. A time delay stage (TX) extends the analysis range of the GSI stage into the future time domain. A bypass controller (BS) limits the influence of unwanted speech components on the determination of the noise filter (GF). An independent claim is included for a single-channel noise reduction apparatus.

Description

The invention relates to the field of reducing extraneous noise in voice communication and is directed to a method and an arrangement for single-channel noise reduction.

In many voice communication applications, the speech signal picked up by the microphone can be disturbed by an acoustically superimposed noise signal from the speaker's surroundings. This is particularly the case when a very loud noise source is present or when hands-free equipment is used. Voice communication can then be considerably impaired.

The literature contains a number of proposals for noise reduction methods; an overview can be found, for example, in "Signalverarbeitungsverfahren zur Verbesserung der Sprachkommunikation über Freisprecheinrichtungen - Teil 3: Verfahren zur Geräuschreduktion" by R. Wehrmann, R. Poltmann, H. Schütze and R. Zelinski, Der Fernmelde-Ingenieur, Heft 2, 1995. The methods can be roughly divided into single-channel methods (only one microphone) and multi-channel methods (two or more microphones). For reasons of technical complexity, only single-channel methods are considered below.

Fig. 1 shows the typical structure of a single-channel noise reduction method. The microphone M picks up the disturbed input signal x = s + n, which consists of the speech signal s and the superimposed interference signal n. Since the amplitude and phase of the current interference signal n are not known and cannot be estimated directly, the stage for stationary noise estimation GSS estimates the stationary noise component during speech pauses as a statistical description of the noise signal, for example in the form of the frequency-dependent short-term power density spectrum of the stationary noise component P_GS(f). This noise component could equivalently be described by the short-term autocorrelation function; the term "noise component" is therefore intended to cover both representations in the following. By evaluating the input signal x in the form of its short-term power density spectrum P_X(f) or its short-term autocorrelation function, together with the estimated short-term power density spectrum of the stationary noise component P_GS(f), the adaptive noise filter GF is calculated and continuously adapted to the variable short-term power density spectrum of the speech signal. The noise filter GF can be realized, for example, as a Wiener estimation filter or according to the principle of spectral subtraction. Filtering the input signal x with the adaptive noise filter GF yields the noise-reduced, reconstructed speech signal, which is transmitted to the far-end subscriber.
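The text names the Wiener estimation filter and spectral subtraction as possible realizations of GF but gives no formula. As a minimal sketch, the following uses the common textbook gain G(f) = 1 - P_G(f)/P_X(f), which follows from the Wiener gain when speech and noise are assumed uncorrelated. The function name `wiener_gain` and the lower limit `floor` are assumptions of this sketch, not part of the patent.

```python
def wiener_gain(p_x, p_g, floor=0.1):
    """Per-bin gain G(f) = max(1 - P_G(f) / P_X(f), floor).

    p_x   -- short-term power spectrum of the input signal, one value per bin
    p_g   -- estimated noise power spectrum, same length as p_x
    floor -- hypothetical lower limit that keeps the gain non-negative
    """
    gains = []
    for px, pg in zip(p_x, p_g):
        if px <= 0.0:
            # No input power in this bin: fall back to the floor.
            gains.append(floor)
        else:
            gains.append(max(1.0 - pg / px, floor))
    return gains


if __name__ == "__main__":
    print(wiener_gain([4.0, 1.0, 0.0], [1.0, 2.0, 1.0]))
```

Bins in which the noise estimate exceeds the input power are clamped to the floor rather than driven negative; this clamping is one common design choice and is not taken from the source.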

The time-varying noise filter GF can occasionally produce additional, conspicuous side noises (artifacts) that the far-end subscriber perceives as disturbing. Numerous remedies have been proposed; DE 198 18 609 C2, for example, contains a noise filter combined with an adaptively controlled bypass. This bypass function reduces the filter effect but at the same time reduces the strength of such disturbing artifacts. Particularly in time segments with high speech signal energy, the reduced filter effect caused by the bypass can be accepted, because the high-energy speech signal partially masks the noise component.

A disadvantage of the single-channel noise reduction methods described here, however, is that they can generally be used only when the ambient noise is stationary or changes very slowly (e.g. fan noise or road traffic noise). This restriction is caused by the estimation of the noise component, which is usually activated during speech pauses. Since such a speech pause may lie up to one second in the past, considerable deviations between the current noise component and the estimated noise component can occur with time-varying noise. These deviations can mean not only that the disturbing ambient noise is not attenuated, but also that the incorrectly adjusted noise filter GF introduces new interfering noise into the signal path to the far-end subscriber.

One remedy would be to use noise reduction methods with several microphones (multi-channel methods); however, such methods often cannot be used because of the considerably higher technical complexity.

The literature contains hardly any proposals for realizing a single-channel noise reduction method that can also be used for less stationary noise signals. If the disturbing noise signal is a second speech signal from the surroundings of the desired speaker (e.g. a background speaker), methods for solving the "co-channel speech" problem can be used, as described for example in "Techniques for suppression of an interfering talker in co-channel speech" by J. A. Naylor and S. F. Boll, Proceedings of 1987 International Conf On Acoustics, Speech and Signal Processing, Dallas, pp. 205-208. The disadvantage, however, is that these methods can be used only for this one problem (a single background speaker as the disturbance) and work reliably only under very ideal boundary conditions.

The technical object is directed to an arrangement and a single-channel noise reduction method that derives from the disturbed speech signal a speech signal containing considerably fewer noise components than the speech signals produced by the known single-channel noise reduction methods.

The solution according to the invention for single-channel noise reduction can be used if two conditions are met:

A) The spectrum of the noise must not change as quickly as the spectrum of the speech signal. Over a period of at least 60 ms it should therefore be indistinguishable from the spectrum of a stationary noise.
B) The signal-to-noise ratio at the input of the microphone M should be at least 6 dB, so that the speech signal slightly dominates the noise signal.

The basic idea of the new method is to estimate the noise component from the disturbed input signal x no longer in the speech pauses (which may lie far in the past), but in the immediate temporal vicinity of the signal segment currently being filtered. Compared with conventional methods, the deviation between the noise estimate and the current noise spectrum then becomes much smaller when the noise spectrum changes over time. Because the noise estimation is no longer carried out in the speech pauses, however, the estimate additionally contains unwanted components of the speech signal. This adverse influence can be reduced by supplementary measures. First, under the control of a selection stage AS, only those noise estimates are accepted whose signal level lies significantly below the maximum speech level. Second, an additional time delay TX ahead of the input of the noise filter GF makes it possible to use "future" signal segments for estimating the noise component as well. This considerably increases the chance of finding a segment of the input signal x with only little speech influence for estimating the noise component. Finally, the influence of speech components still remaining in the noise estimate is reduced by an adaptive bypass function for the noise filter GF, implemented in the bypass control BS. The bypass function reduces the filter effect for high-energy segments of the input signal x. This function is, however, not controlled in the way described in the known solutions.

Fig. 2 shows a block diagram of the arrangement. Its core is formed by the elements already known from Fig. 1: the microphone M, the noise filter GF and the noise estimation GSS for the stationary noise component. In addition, a bypass control BS known per se, time delay stages known per se and a stage known per se for calculating the power maximum of the input signal LM are used. The time delay stage for the input signal TX is arranged between the input of the near-end subscriber E and the noise filter GF. The output of the time delay stage TX is connected to an input of the bypass control BS, an input of the noise filter GF and an input of the filter adaptation of the noise filter GF. The input of the near-end subscriber E is connected, on the one hand, via the stage for calculating the power maximum of the input signal LM to the bypass control BS and a selection stage AS and, on the other hand, to the known stage for stationary noise estimation GSS and an additional stage for non-stationary noise estimation GSI. The stage for stationary noise estimation GSS is connected to the bypass control BS and the selection stage AS. The stage for non-stationary noise estimation GSI is connected to the selection stage AS both directly and via a time delay unit TG. The selection stage AS is connected via a multiplier MU to the filter adaptation unit of the noise filter GF. The multiplier MU is connected to the output of the bypass control BS, the output of the stage for stationary noise estimation GSS, the output of the stage for non-stationary noise estimation GSI and the output of the time delay stage for the non-stationary noise component TG.

In the stage for calculating the power maximum LM, the power L_x of the input signal x is continuously computed as a short-term mean square over about 30 ms according to

$$L_x = \overline{x^2} \qquad (1)$$

and from it the sliding power maximum L_XM is determined according to

$$L_{XM,\mathrm{new}} = \max\{L_x;\ \mu\,L_{XM,\mathrm{old}}\}, \quad \mu < 1 \qquad (2)$$

The decay factor µ ensures that a high power value does not persist indefinitely, so that the power maximum L_XM can adapt to a gradually falling signal level. The power maximum of the input signal L_XM calculated according to (2) serves as a control variable for the selection stage AS and the bypass control BS.
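Relations (1) and (2) can be sketched in a few lines. This is an illustrative sketch only: the function names and the default value of `mu` are assumptions, since the text states only that µ < 1.

```python
def block_power(block):
    """Relation (1): short-term mean square of one signal block."""
    return sum(v * v for v in block) / len(block)


def update_power_max(l_x, l_xm_old, mu=0.99):
    """Relation (2): sliding power maximum with decay factor mu < 1.

    mu = 0.99 is a hypothetical default; the source gives no value.
    """
    return max(l_x, mu * l_xm_old)


if __name__ == "__main__":
    l_xm = 0.0
    for block in ([0.1, -0.1], [1.0, -1.0], [0.2, 0.2], [0.1, 0.0]):
        l_x = block_power(block)
        l_xm = update_power_max(l_x, l_xm)
        print(l_x, l_xm)
```

After the loud second block, the tracked maximum decays by the factor `mu` per block instead of staying at its peak, which is the behaviour the decay factor is meant to provide.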

The input signal x is evaluated in the already known stage for stationary noise estimation GSS. The estimated short-term power density spectrum of the stationary noise component is then available at the output of this stage. The additional stage for non-stationary noise estimation GSI supplies the current value of the estimated short-term power density spectrum of the non-stationary noise component P_GI(f).

Fig. 3 illustrates how the stage for stationary noise estimation GSS operates in conjunction with the stage for non-stationary noise estimation GSI. Signal evaluation and noise filtering are carried out block by block with a block length of about 30 ms. The signal block currently being filtered is a segment of the input signal x and has the index n. The stage for stationary noise estimation GSS searches for a speech pause in a search range SBS = {n, n - 1, ..., n - K}, which typically has a duration of about 1 s and comprises the current block and preceding blocks (Fig. 3a). The signal block with the smallest block power within the search range SBS indicates a speech pause, and the short-term power density spectrum P_GS(f) assigned to this block is therefore used as the estimate of the stationary noise component.
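The minimum search over SBS can be sketched with a bounded history of blocks. The class name, the `(power, spectrum)` tuple layout and the choice of K are assumptions of this sketch; with blocks of about 30 ms and a search range of about 1 s, K would be on the order of 33.

```python
from collections import deque


class StationaryNoiseEstimator:
    """Keeps blocks n-K .. n and returns the one with the smallest power."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest block automatically.
        self.history = deque(maxlen=k + 1)

    def update(self, power, spectrum):
        """Add the current block and return (power, spectrum) of the minimum."""
        self.history.append((power, spectrum))
        return min(self.history, key=lambda entry: entry[0])


if __name__ == "__main__":
    est = StationaryNoiseEstimator(k=2)
    for power, name in [(5.0, "a"), (1.0, "b"), (3.0, "c"), (4.0, "d"), (6.0, "e")]:
        print(name, est.update(power, name))
```

Once the quiet block "b" leaves the window, the estimate moves on to the next-quietest block that is still inside the search range.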

In the additional new stage GSI, only the search range SBI = {n - 1, n, n + 1} is evaluated, which comprises the current signal block, the preceding signal block and also the future signal block with the index n + 1 (Fig. 3b). Evaluating the future signal block is made possible by the additional time delay stage for the input signal TX, which is arranged between the input of the near-end subscriber E and the adaptive noise filter GF. The delay of the time delay stage TX corresponds exactly to the length of one signal block. On the assumption that the signal block in the search range SBI with the smallest block power presumably also contains the smallest speech component, the short-term power density spectrum assigned to this block is used as the estimate of the non-stationary noise component P_GI(f).
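The three-block search over SBI can be sketched as follows. The function name and the `(power, spectrum)` tuple layout are assumptions, and so is the handling of the first and last block, where the search range is simply shortened; the text does not address these edge cases.

```python
def nonstationary_noise_estimate(blocks, n):
    """Return (power, spectrum) of the minimum-power block in {n-1, n, n+1}.

    blocks -- list of (power, spectrum) tuples, one per signal block
    n      -- index of the block currently being filtered; block n+1 is
              available because the filter input is delayed by one block (TX)
    """
    lo = max(n - 1, 0)                 # clamp at the start (assumption)
    hi = min(n + 1, len(blocks) - 1)   # clamp at the end (assumption)
    return min(blocks[lo:hi + 1], key=lambda entry: entry[0])


if __name__ == "__main__":
    blocks = [(4.0, "a"), (2.0, "b"), (9.0, "c"), (1.0, "d")]
    for n in range(len(blocks)):
        print(n, nonstationary_noise_estimate(blocks, n))
```

For n = 2 the loud current block "c" is not used; the quieter future block "d" supplies the estimate, which illustrates the benefit of the one-block look-ahead.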

Since the estimate of the non-stationary noise component P_GI(f) can still contain considerable speech components, particularly in vocalic speech segments, the selection stage AS excludes from further processing those values of P_GI(f) whose high signal power indicates a possible speech component. For this purpose, several power comparisons are carried out in the selection stage AS. First, a power threshold

$$L_{SW} = c \cdot L_{XM}, \quad c < 1 \qquad (3)$$

is determined from the power maximum of the input signal L_XM according to (2); this threshold lies well below the power of the current vocalic speech segments. The quantity tested is the estimated power of the non-stationary noise component L_GI, which is calculated from the estimated short-term power density spectrum of the non-stationary noise component P_GI(f). The following tests are carried out:

$$L_{GI} < L_{SW}\,? \qquad (4a)$$

If (4a) is satisfied, the estimated short-term power density spectrum of the non-stationary noise component P_GI(f) is used as the noise estimate P_G(f), and the selection is complete. Otherwise, earlier values of P_GI(f) are used (delayed by means of the stage TG). In this case, the estimated power of the earlier non-stationary noise signal L_GI,TG is used for the comparison. If, within the search range SBS, the condition

$$L_{GI,TG} < L_{SW}\,? \qquad (4b)$$

is satisfied, the time-delayed estimated short-term power density spectrum of the non-stationary noise component P_GI,TG(f) is used as the noise estimate P_G(f), and the selection is complete. If condition (4b) is not satisfied either, the estimated short-term power density spectrum of the stationary noise component P_GS(f) is used as the final noise estimate P_G(f). The switch S of the multiplier MU shown in Fig. 2 is then set according to the decision made in the selection stage AS in order to forward the noise estimate P_G(f).
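The three-way decision of the selection stage AS can be sketched as a small function. The function name, the argument layout and the string labels are assumptions of this sketch. The text also does not say in which order several delayed values are tested; here they are checked most recent first.

```python
def select_noise_estimate(l_xm, c, gi, gi_delayed, p_gs):
    """Return (spectrum, source) following tests (4a) and (4b).

    l_xm       -- sliding power maximum L_XM from relation (2)
    c          -- threshold factor, c < 1, as in relation (3)
    gi         -- (L_GI, P_GI): current non-stationary estimate
    gi_delayed -- list of (L_GI_TG, P_GI_TG) delayed estimates,
                  most recent first (ordering is an assumption)
    p_gs       -- stationary estimate P_GS, used as the fallback
    """
    l_sw = c * l_xm                      # threshold (3)
    l_gi, p_gi = gi
    if l_gi < l_sw:                      # test (4a)
        return p_gi, "GI"
    for l_gi_tg, p_gi_tg in gi_delayed:  # test (4b) on delayed values
        if l_gi_tg < l_sw:
            return p_gi_tg, "GI,TG"
    return p_gs, "GS"                    # neither test passed


if __name__ == "__main__":
    print(select_noise_estimate(10.0, 0.1, (3.0, "now"), [(0.2, "old")], "stationary"))
```

The returned label corresponds to the position of the switch S in Fig. 2; returning it alongside the spectrum is a convenience of this sketch.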

The short-term power density spectrum of the noise estimate P_G(f) can still contain slight speech components. To reduce this residual influence, the bypass control BS adaptively sets a weighting factor b (0 < b < 1) as a multiplier for P_G(f). When the power of the input signal L_x is high, small values of b are set (resulting in a weakened noise estimate), so that a noise filter GF with a reduced filter effect is calculated. The weighting factor b thus acts like an additional bypass for the optimal noise filter. Since a high power of the input signal L_x presumably indicates a high-energy speech sound, and the estimated short-term power density spectrum of the non-stationary noise component P_GI(f) could therefore still contain speech components, the influence of such speech components on the filter adaptation is reduced in this way. When the power of the input signal L_x is low, the noise component dominates; high values of the weighting factor b are then set (b → 1) and the adaptive noise filter GF takes full effect.

An expedient form of the bypass function b(L_x) is sketched in Fig. 4. In addition to the power of the input signal L_x, the bypass control BS evaluates the power maximum of the input signal L_XM according to (2) and the estimated power of the stationary noise signal L_GS. The estimated power of the stationary noise signal L_GS can be determined directly from the estimated short-term power density spectrum of the stationary noise component P_GS(f). The dynamic corner points L_y and L_z of the characteristic b(L_x) in Fig. 4 are determined by

where y, z and c are constant factors.

The design of the characteristic curve b(L_x) according to Fig. 4 ensures that in the speech-free case, that is, for small values of L_x, a uniformly strong noise attenuation is always achieved with the bypass factor b = b_max. For large values of L_x, when speech is likely to be present, the corner point L_z with b = b_min is always adapted to the current power maximum L_XM and thus to the power range of the high-energy vowel components.
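A characteristic curve of this kind might be sketched as follows. The formula for the corner points is not reproduced in this text, so the sketch assumes L_y = y·L_GS and L_z = z·L_XM, and it assumes linear interpolation on a logarithmic power scale between the corner points. The values of y, z, b_max and b_min are hypothetical, and the shape of the curve in Fig. 4 may differ.

```python
import math


def bypass_factor(l_x, l_xm, l_gs, y=2.0, z=0.5, b_max=0.95, b_min=0.2):
    """Bypass weighting factor b as a function of the input power L_x.

    Assumed corner points (not taken from the source):
      L_y = y * L_GS   below this power, b = b_max
      L_z = z * L_XM   above this power, b = b_min
    """
    if l_gs <= 0.0:
        raise ValueError("l_gs must be positive")
    l_y = y * l_gs
    l_z = z * l_xm
    if l_x <= l_y:
        return b_max  # speech-free case: full filter effect
    if l_x >= l_z:
        return b_min  # high-energy speech: strongly reduced effect
    # Here l_y < l_x < l_z, so both logarithms are positive.
    t = math.log10(l_x / l_y) / math.log10(l_z / l_y)
    return b_max + t * (b_min - b_max)


# Hypothetical powers: noise floor 1.0, power maximum 100.0.
b_quiet = bypass_factor(1.0, 100.0, 1.0)
b_mid = bypass_factor(10.0, 100.0, 1.0)
b_loud = bypass_factor(80.0, 100.0, 1.0)
```

Because L_z follows L_XM, the point at which b reaches b_min moves with the loudness of the current speaker, which is the adaptation described above.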

For the sake of clarity, all frequency-dependent quantities have been described exclusively in the frequency domain, both in Figs. 1 and 2 and in the text. It is of course possible to implement parts of the method, or the entire method, in the time domain. Instead of the short-term power density spectra P_GS(f), P_GI(f) and P_G(f), the corresponding short-term autocorrelation functions can be determined. The adaptive noise filter GF can then be calculated, for example, as a Wiener filter from these processed correlation functions. The filtering of the input signal x with the adaptive noise filter GF can also take place in the time domain. For this purpose the noise filter is calculated as a transversal filter with a finite number of coefficients, so that the filtering is implemented as a convolution with the input signal x.
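The time-domain filtering step can be illustrated with a minimal sketch of a transversal (FIR) filter applied by direct convolution. The coefficient list h stands in for the filter coefficients; how they would be derived from the correlation functions is not shown, and the values used here are purely illustrative.

```python
def transversal_filter(x, h):
    """Filter x with the finite coefficient set h by direct convolution.

    Computes y[n] = sum over k of h[k] * x[n - k]; samples before the
    start of x are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


# Two-tap averaging filter as a stand-in for real noise filter coefficients.
y_out = transversal_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```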

Extending the single-channel noise reduction method by the stage for non-stationary noise estimation GSI in combination with the selection stage AS enables effective noise reduction even in a non-stationary noise environment. The essential condition is that the noise spectrum changes somewhat more slowly than the speech spectrum. Beyond that, no further conditions need to be met, for example regarding the number or structure of the noise sources.

The additional time delay stage for the input signal TX, arranged before the input of the noise filter GF, extends the analysis range of the non-stationary noise estimation to the "future" signal range and thus reduces the unwanted influence of speech components on the noise estimate. The time delay stage TX increases the total signal delay only slightly. In combination with video codecs this additional delay even becomes insignificant, because video codecs generally have a considerably higher signal delay.
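The look-ahead effect of the delay stage can be sketched with a simple block delay line. The class name, the delay of one block and the block length are assumptions made for illustration. While the noise estimation sees each block as it arrives, the filter receives the block from one step earlier, so from the filter's point of view the estimator has already analysed the "future" block.

```python
from collections import deque


class BlockDelay:
    """Delays a stream of signal blocks by a fixed number of blocks."""

    def __init__(self, n_blocks=1, block_len=4):
        # Pre-fill with silence so the first outputs are well defined.
        self._queue = deque([0.0] * block_len for _ in range(n_blocks))

    def push(self, block):
        """Store the newest block and return the delayed one."""
        self._queue.append(list(block))
        return self._queue.popleft()


tx = BlockDelay(n_blocks=1, block_len=2)
first = tx.push([1.0, 2.0])   # filter input: silence
second = tx.push([3.0, 4.0])  # filter input: the previous block
```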

The inserted bypass control BS reduces the influence of unwanted speech components from the non-stationary noise estimate on the calculation of the noise filter GF. This prevents such components from causing artifacts in the noise-filtered speech signal.

List of reference symbols

E Input of the near-end participant
A Output to the far-end participant
x Input signal
s Speech signal
n Interference signal
reconstructed speech signal
M Microphone
GSS Stage for stationary noise estimation
GF Adaptive noise filter
LM Stage for calculating the power and the power maximum of the input signal
AS Selection stage
BS Bypass control
GSI Stage for non-stationary noise estimation
MU Multiplier
TX Time delay stage for the input signal
TG Time delay stage for the non-stationary noise component
P_G(f) Estimated short-term power density spectrum for the current noise component
P_GS(f) Estimated short-term power density spectrum for the stationary noise component
P_GI(f) Estimated short-term power density spectrum for the non-stationary noise component
P_GI,TG(f) Estimated time-delayed short-term power density spectrum for the non-stationary noise component
L_x Power of the input signal
L_XM Power maximum of the input signal
L_SW Power threshold
L_GS Estimated power of the stationary noise signal
L_GI Estimated power of the non-stationary noise signal
L_GI,TG Estimated power of the past non-stationary noise signal
L_y; L_z Dynamic corner points of the characteristic curve b(L_x)
µ Decay factor
SBS Search range for the stationary noise component (for stage GSS)
SBI Search range for the non-stationary noise component (for stage GSI)
b Weighting factor
y, z, c Constant factors

Claims

1. Method for single-channel noise reduction of disturbed speech signals whose noise changes more slowly than the speech signal, the signal-to-noise ratio at the input of the microphone (M) being at least 6 dB, in which the estimated short-term power density spectrum for the stationary noise component (P_GS(f)) is determined, characterized in that
in addition to the estimated short-term power density spectrum for the stationary noise component (P_GS(f)), the estimated short-term power density spectrum for the current non-stationary noise component (P_GI(f)) is determined, that
a permanently active selection function determines a power threshold (L_SW) from the sliding power maximum of the input signal (L_XM), which threshold is compared successively with the power of the current estimated short-term power density spectrum for the non-stationary noise component (P_GI(f)) and with the power of the time-delayed estimated short-term power density spectrum for the non-stationary noise component (P_GI,TG(f)), wherein, starting from the most recent estimate of the short-term power density spectrum of the noise component, the estimate selected for further processing is always the one whose associated power first satisfies the comparison criterion power < power threshold (L_SW) and whose power is therefore significantly smaller than the power maximum of the input signal (L_XM), and that, if no associated power value satisfies the comparison criterion, the estimate of the short-term power density spectrum of the stationary noise component (P_GS(f)) is selected for further processing, and
that the time-delayed power of the input signal (L_x), the power maximum of the input signal (L_XM) and the estimated power of the stationary noise signal (L_GS) are evaluated by means of a bypass function b(L), the estimated power of the stationary noise signal (L_GS) being derived directly from the estimated short-term power density spectrum for the stationary noise component (P_GS(f)), and that

a) at a level of the input signal power (L_x) that is equal to, or in the range of 3-6 dB above, the level of the estimated stationary noise component (L_GS), the maximum value of the bypass factor b_max is set and thus the maximum effect of the noise filter (GF) is achieved, and that
b) as the power of the input signal (L_x) increases, the bypass factor (b) and thus the effect of the noise filter (GF) is reduced, the degree of reduction being additionally determined by the magnitude of the power maximum of the input signal (L_XM).

2. The method according to claim 1, characterized in that, to determine the sliding power maximum (L_XM), the power (L_x) of the input signal (x) is determined continuously as a short-term mean square over a period corresponding to the block length of a signal block, and that the sliding power maximum (L_XM) is determined continuously from the relationship L_XM,new = max{L_x; µ·L_XM,old}; µ < 1.

3. The method according to claim 1, characterized in that the power threshold (L_SW) is determined according to the relationship L_SW = c·L_XM; c < 1, so that it lies significantly below the power of the current vowel speech segments.

4. The method according to claim 1, characterized in that the time delay of the power of the input signal via the time delay stage for the input signal (TX) is set so that it corresponds to the block length of a signal block, and that the previous signal block as well as the current and the future signal block are provided for evaluating the search range for the non-stationary noise component (SBI).

5. Arrangement for single-channel noise reduction of disturbed speech signals, comprising a known stage for calculating the power and the power maximum of the input signal (LM), a stage for stationary noise estimation (GSS), a known bypass control (BS) and a noise filter (GF) with filter adaptation, characterized in that a time delay stage for the input signal (TX) is arranged between the input (E) of the near-end participant and the adaptive noise filter (GF), that the output of the time delay stage for the input signal (TX) is connected to an input of the bypass control (BS), an input of the adaptive noise filter (GF) and an input of the filter adaptation of the adaptive noise filter (GF), that the input (E) of the near-end participant is additionally connected to a stage for non-stationary noise estimation (GSI), which is connected to a selection stage (AS) both directly and via a time delay stage for the non-stationary noise component (TG), that the selection stage (AS) is connected via a multiplier (MU) to the filter adaptation of the adaptive noise filter (GF), and that the multiplier (MU) is connected to the output of the bypass control (BS), the output of the stationary noise estimation stage (GSS), the output of the non-stationary noise estimation stage (GSI) and the output of the time delay stage for the non-stationary noise component (TG).
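The sliding power maximum of claim 2, the threshold of claim 3 and the selection rule of claim 1 can be illustrated with a short sketch. The function names, the values chosen for µ and c and the use of strings in place of spectra are assumptions made for illustration only; this is not the implementation described in the patent.

```python
def block_power(block):
    """Short-term mean square of one signal block (L_x)."""
    return sum(v * v for v in block) / len(block)


def update_power_maximum(l_x, l_xm_old, mu=0.99):
    """Sliding power maximum: L_XM,new = max{L_x; mu * L_XM,old}, mu < 1."""
    return max(l_x, mu * l_xm_old)


def select_noise_estimate(l_xm, candidates, p_gs, c=0.1):
    """Pick the noise spectrum estimate for further processing.

    candidates: (power, spectrum) pairs ordered from the most recent
                estimate to the time-delayed one, for example
                [(L_GI, P_GI), (L_GI_TG, P_GI_TG)]
    p_gs:       stationary estimate, used when no candidate qualifies
    c:          threshold factor, c < 1 (hypothetical default)
    """
    l_sw = c * l_xm  # power threshold L_SW = c * L_XM
    for power, spectrum in candidates:
        if power < l_sw:
            return spectrum
    return p_gs


l_x = block_power([1.0, -1.0, 2.0, -2.0])
l_xm = update_power_maximum(l_x, 10.0, mu=0.9)
choice = select_noise_estimate(
    10.0, [(2.0, "P_GI"), (0.5, "P_GI_TG")], "P_GS", c=0.1
)
```

In the example the current non-stationary estimate exceeds the threshold of 1.0 and is presumed to contain speech, so the time-delayed estimate is chosen instead.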