DE112007003674T5

DE112007003674T5 - Method and apparatus for single-channel speech enhancement based on a latency-reduced auditory model

Info

Publication number: DE112007003674T5
Application number: DE112007003674T
Authority: DE
Inventors: Martin Opitz; Robert Höldrich; Franz Zotter; Markus Noisternig
Original assignee: AKG Acoustics GmbH
Current assignee: AKG Acoustics GmbH
Priority date: 2007-10-02
Filing date: 2007-10-02
Publication date: 2010-08-12
Also published as: AT509570B1; GB201004090D0; GB2465910A; AT509570A5; WO2009043066A1; GB2465910B

Abstract

A method for noise suppression for an input audio signal (y[n]) that contains a desired signal (x[n]) and a noise signal component, the method comprising the following steps:
- splitting the input audio signal (y[n]) into a plurality of frequency subbands (y_k[n]) by a band-splitting analysis,
- noise suppression in each subband (y_k[n]) by a plurality of noise suppression processors,
- combining the plurality of subbands (y_k[n]) into an output signal (x̂[n]) by a synthesis filter,
wherein all steps are carried out in the time domain.

Description

Field of the invention

The present invention relates to a method for enhancing a wideband audio signal in the presence of background noise, and more particularly to a noise suppression system, a noise suppression method and a noise suppression program. In particular, the present invention relates to latency-reduced single-channel noise suppression using subband processing based on masking properties of the human auditory system.

Background of the invention

Additive background noise in speech communication systems reduces the subjective quality and intelligibility of the perceived voice. Speech processing systems therefore require noise reduction methods, namely methods that process a noisy signal to eliminate or attenuate the noise level and to improve the signal-to-noise ratio (SNR) without impairing the speech or its characteristics. Noise reduction is generally also called noise suppression or speech enhancement.

For example, mobile phones are often used in environments with high background noise, such as public places. The use of mobile phones and of voice-controlled devices and communication systems in cars has created a strong demand for hands-free installations that increase safety and comfort in the car. In many countries and regions, for example, the law prohibits hand-held telephony in the car. Noise reduction is important for these applications because they must operate in acoustically adverse environments, in particular at low SNR and with highly time-varying noise characteristics, such as the rolling noise of cars.

In (hands-free) applications for teleconferencing, such as video conferencing, or for speech recognition and query systems, the background noise originates from the fans of computers, printers or fax machines and can be regarded as (long-term) stationary. Conversational noise from (telephone) conversations of colleagues sharing the room is often called babble noise; it consists of harmonic components and is therefore more difficult for a noise reduction unit to attenuate.

Applications in hearing aids and in-car speech communication systems, however, require noise suppression methods that can be executed in real time.

Nevertheless, the rapid development of the underlying hardware in terms of computing power and memory capacity supports the progress of software implementations.

One of the most widely used noise suppression methods in practical applications is known in the art as spectral subtraction (cf. S. F. Boll, "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. Acoust. Speech and Sig. Proc., vol. ASSP-27, pp. 113–120, Apr. 1979). In general, the spectral subtraction approach estimates the short-time spectral amplitude (STSA) of the clean speech from a noisy speech signal, i.e. the desired speech contaminated by noise, by subtracting an estimated noise signal. Based on the assumption that the human ear is insensitive to phase distortions, the estimated magnitude of the speech signal is combined with the phase of the noisy signal (cf. C. L. Wang et al., "The unimportance of phase in speech enhancement," IEEE Trans. Acoust. Speech and Sig. Proc., vol. ASSP-30, pp. 679–681, Aug. 1982). In practice, spectral subtraction is implemented by multiplying the input signal spectrum by a weighting function so as to suppress frequency components with low SNR. This SNR-based weighting function is formed from estimates of the noise spectrum and of the noisy speech spectrum, assuming wide-sense stationary, zero-mean random signals and assuming that the speech and noise signals are uncorrelated. These conventional spectral subtraction methods provide significant noise suppression, with the main drawback of reduced signal quality, perceived as musical tones or musical noise. The musical tones originate from spectral estimation errors. In recent years, many improvements to the basic spectral subtraction approach have been developed.
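The weighting-function view of spectral subtraction can be illustrated with a minimal sketch. It assumes power-domain subtraction with clamping at zero; the function and parameter names (`spectral_subtraction_gain`, `noisy_power`, `noise_power`) are hypothetical and not taken from the cited work.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power):
    """Per-bin gain for power spectral subtraction (illustrative sketch).

    noisy_power: power of the noisy spectrum in each frequency bin.
    noise_power: estimated noise power in each bin.
    Returns gains in [0, 1]; the enhanced magnitude is gain * noisy magnitude,
    and the noisy phase is reused unchanged.
    """
    gains = []
    for y, d in zip(noisy_power, noise_power):
        if y <= 0.0:
            gains.append(0.0)
            continue
        # Subtract the noise estimate and clamp negative results to zero.
        clean = max(y - d, 0.0)
        gains.append(math.sqrt(clean / y))
    return gains

gains = spectral_subtraction_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins whose power does not exceed the noise estimate are zeroed. Random fluctuations around that threshold are one way to picture the isolated spectral peaks heard as musical tones.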

A frequently used method for reducing musical tones is to subtract an overestimate of the noise spectrum, which reduces the fluctuations in the DFT coefficients, and to prevent the spectral components from falling below a spectral floor (cf. M. Berouti et al., "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. on Acoust., Speech and Sig. Proc. (ICASSP'79), vol. 4, pp. 208–211, Washington D. C., Apr. 1979). This approach successfully reduces musical tones at low SNR and during noise-only periods. Its main drawback is distortion of the speech signal during speech activity. In practice, a compromise is struck between speech quality and the residual noise level. Other methods address this problem by introducing optimal and adaptive over-subtraction factors for low SNR conditions and propose under-subtraction of the noise spectrum for high SNR conditions (cf. W. M. Kushner et al., "The effects of subtractive-type speech enhancement/noise reduction algorithms on parameter estimation for improved recognition and coding in high noise environments," in Proc. IEEE Int. Conf. Acoustics, Speech and Sig. Proc. (ICASSP'89), vol. 1, pp. 211–214, 1989).
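Over-subtraction with a spectral floor can be sketched as follows. The over-subtraction factor `alpha` and floor factor `beta` are hypothetical parameter names, and their default values are illustrative assumptions only.

```python
def oversubtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    alpha > 1 subtracts an overestimate of the noise power;
    beta sets the floor as a fraction of the noise power.
    Both defaults are assumed values chosen for illustration.
    """
    enhanced = []
    for y, d in zip(noisy_power, noise_power):
        # Never let a bin fall below the spectral floor beta * d.
        enhanced.append(max(y - alpha * d, beta * d))
    return enhanced

out = oversubtract([10.0, 2.0], [1.0, 1.0], alpha=4.0, beta=0.01)
```

A larger `alpha` removes more residual noise but also more speech energy, which is the trade-off between speech quality and residual noise described above.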

Applying a soft-decision-based modification to the spectral weighting function (cf. R. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," in IEEE Trans. Acoust., Speech and Sig. Proc., vol. 28, no. 2, pp. 137–145, 1980) has been shown to improve the noise suppression properties of the enhancement system with respect to the suppression of musical tones. These soft-decision approaches depend mainly on the a priori probability of speech absence in each spectral component of the noisy speech.

The minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA, cf. Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time amplitude estimator," IEEE Trans. Acoust. Speech and Sig. Proc., vol. 32, no. 6, pp. 1109–1121, 1984) and the minimum mean-square error log-spectral amplitude estimator (MMSE-LSA, cf. Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log spectral amplitude estimator," IEEE Trans. Acoust. Speech and Sig. Proc., vol. 33, no. 2, pp. 443–445, 1985) minimize the mean-square error of the estimated short-time spectral amplitude and log-spectral amplitude, respectively. It has been found that the nonlinear smoothing procedure of the MMSE-SP/LSA methods (the so-called decision-directed approach) yields a consistent estimate of the SNR, which provides good noise suppression without annoying musical tones (cf. O. Cappé, "Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor," IEEE Trans. Speech and Audio Proc., vol. 2, no. 2, pp. 345–349, 1994). Both Cappé and Malah (cf. E. Malah et al., "Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments," in Proc. IEEE Int. Conf. Acoust., Speech and Sig. Proc. (ICASSP'99), vol. 2, pp. 789–792, 1999) propose limiting the a priori SNR estimate in order to overcome the problem of perceptible low-level musical noise during speech pauses. The so-called a priori SNR represents the information about the unknown spectral magnitude that is gathered from previous frames and evaluated in the decision-directed approach (DDA). Because the smoothing performed by the DDA is irregular, low-level musical noise can occur. A simple solution to this problem is to constrain the a priori SNR by a lower bound.
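A minimal sketch of the decision-directed a priori SNR estimate with a lower bound follows, using the commonly quoted form of the Ephraim-Malah recursion. The names `a` (smoothing weight) and `xi_min` (lower bound) and their default values are assumptions made for illustration.

```python
def decision_directed_snr(post_snr, prev_clean_power, noise_power,
                          a=0.98, xi_min=0.003):
    """Decision-directed a priori SNR estimate for one bin and one frame.

    post_snr: a posteriori SNR of the current frame (noisy power / noise power).
    prev_clean_power: estimated clean-speech power from the previous frame.
    noise_power: current noise power estimate (must be > 0).
    a, xi_min: assumed smoothing weight and assumed lower bound.
    """
    # Instantaneous (maximum-likelihood) term, clamped at zero.
    ml_term = max(post_snr - 1.0, 0.0)
    # Weighted mix of the previous-frame estimate and the instantaneous term.
    xi = a * prev_clean_power / noise_power + (1.0 - a) * ml_term
    # The lower bound limits low-level musical noise during speech pauses.
    return max(xi, xi_min)

xi_speech = decision_directed_snr(5.0, 2.0, 1.0, a=0.5, xi_min=0.1)
xi_pause = decision_directed_snr(0.5, 0.0, 1.0)
```

During a speech pause both terms vanish, so the estimate rests on `xi_min` instead of fluctuating around zero.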

In single-channel spectral subtraction, the noise spectrum is usually estimated during speech pauses, which requires voice activity detection (VAD) methods (cf. R. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," in IEEE Trans. Acoust., Speech and Sig. Proc., vol. 28, no. 2, pp. 137–145, 1980; and W. J. Hess, "A pitch-synchronous digital feature extraction system for phonemic recognition of speech", in IEEE Trans. Acoust., Speech and Sig. Proc., vol. 24, no. 1, pp. 14–25, 1976). This approach implies stationary noise characteristics during periods of speech. Arslan et al. developed a robust noise estimation method that requires no voice activity detection, because it uses recursive averaging with level-dependent time constants in each subband (cf. L. Arslan et al., "New methods for adaptive noise suppression", in Proc. Int. Conf. on Acoustics, Speech and Sig. Proc. (ICASSP-95), Detroit, May 1995). Martin proposes a noise estimation method based on minimum statistics and optimal smoothing of the power spectral density (PSD, cf. R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," in IEEE Trans. Speech and Audio Proc., vol. 9, no. 5, pp. 512, July 2001). Furthermore, Ealey et al. present a method for estimating non-stationary noise during speech by exploiting the harmonic structure of the voiced speech spectrum, also known as harmonic tunnelling (cf. D. Ealey et al., "Harmonic tunnelling: tracking non-stationary noises during speech," in Proc. Eurospeech Aalborg, 2001). In addition, Sohn and Sung propose using soft-decision information so that the noise spectrum is adapted continuously, whether or not speech is present (cf. J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE Int. Conf. Acoustics, Speech and Sig. Proc. (ICASSP'98), vol. 1, pp. 365–368, 1998).

Ephraim and Van Trees propose another important noise suppression method, based on signal subspace decomposition (cf. Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement", in IEEE Trans. Speech and Audio Proc., vol. 3, pp. 251–266, July 1995). The noisy signal is decomposed into a signal-plus-noise subspace and a noise subspace, the two subspaces being orthogonal to each other. This makes it possible to estimate the clean speech signal from the noisy signal. The resulting linear estimator is a general Wiener filter with an adjustable noise level that sets the trade-off between signal distortion and residual noise, since the two cannot be minimized simultaneously.

Skoglund and Kleijn show the importance of temporal masking properties in connection with the excitation of voiced speech (cf. J. Skoglund and W. B. Kleijn, "On Time-Frequency Masking in Voiced Speech", in IEEE Trans. Speech and Audio Proc., vol. 8, no. 4, pp. 361–369, July 2000). They show that noise between two excitation impulses is perceived more strongly than noise close to the impulses, and that this is especially the case for low-pitched speech, for which the excitation impulses are temporally sparse. Temporal masking is not exploited by conventional noise reduction methods that use a frequency-domain estimator. WO 2006 114100 discloses a signal subspace approach that takes temporal masking properties into account.

Object and summary of the invention

The object of the present invention is to provide a noise suppression method based on a single-channel auditory model, with latency-reduced processing of a wideband speech signal in the presence of background noise. In particular, the present invention is based on a spectral subtraction method using a modified decision-directed approach that comprises over-subtraction and an adjustable noise floor to avoid perceptible musical tones. Furthermore, the present invention uses subband processing with pre- and post-filtering to take into account the temporal and simultaneous masking inherent in human perception, in particular to minimize perceptible signal distortion during speech periods.

Frequency-domain processing is carried out in the proposed system by means of a non-uniform gammatone filter bank (GTF) that is divided into critical bands, often also called Bark bands. This analysis filter bank splits the noisy signal into a plurality of overlapping narrowband signals, taking into account the spectral (simultaneous) masking properties of human hearing.
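For orientation, the impulse response of a single gammatone filter can be sketched as below. The text gives no filter parameters, so this sketch swaps in the widely used Glasberg-Moore ERB bandwidth formula instead of a Bark-band design; the filter order of 4, the factor 1.019, the duration and all function names are assumptions.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Unnormalized gammatone impulse response sampled at fs_hz.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with b = 1.019 * ERB(fc) as an assumed bandwidth parameter.
    """
    b = 1.019 * erb_bandwidth(fc_hz)
    n_samples = int(round(duration_s * fs_hz))
    ir = []
    for i in range(n_samples):
        t = i / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    return ir

ir = gammatone_ir(1000.0, 16000.0)
```

Convolving the input with one such response per center frequency yields overlapping narrowband signals in the time domain, without any block transform.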

A pre-processing unit that models the transfer behavior of the human outer and middle ear is applied to the discrete-time noisy input signal (i.e. the desired speech contaminated by noise and interference).

In each subband, the level of the noisy signal is detected and smoothed. These narrowband level detectors are applied in a plurality of subbands in order to exploit the phase of the simple filter sections and to achieve the shortest possible signal processing delay.
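One simple way to detect and smooth a subband level sample by sample is rectification followed by a one-pole recursive average. This is only an assumed detector structure; `smoothed_level` and the smoothing coefficient `alpha` are hypothetical names.

```python
def smoothed_level(subband, alpha=0.9):
    """Rectify a subband signal and smooth it with a one-pole low-pass.

    alpha close to 1 gives slow, smooth tracking; alpha close to 0 follows
    the rectified signal almost directly. The default is an assumed value.
    """
    level = 0.0
    levels = []
    for sample in subband:
        # Recursive average of the rectified sample; no frame buffering needed.
        level = alpha * level + (1.0 - alpha) * abs(sample)
        levels.append(level)
    return levels

levels = smoothed_level([1.0, 1.0], alpha=0.5)
```

Because each output depends only on the current sample and the previous state, the detector itself adds no block delay.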

From the smoothed envelope of the subband signals, the noise level is estimated for each subband using a heuristic approach based on recursive minimum statistics.
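The heuristic itself is not specified here. As one assumed example of a recursive minimum tracker, the sketch below lets the noise floor drop immediately to any lower level and otherwise rise slowly by a factor `rise`; the function and parameter names are hypothetical.

```python
def track_noise_floor(levels, rise=1.001):
    """Recursive minimum tracking of a smoothed subband level (non-empty list).

    The floor follows downward steps at once and creeps upward by the
    assumed factor `rise` per sample, never exceeding the current level.
    """
    floor = levels[0]
    floors = []
    for level in levels:
        if level < floor:
            floor = level                      # new minimum: follow at once
        else:
            floor = min(floor * rise, level)   # slow upward drift
        floors.append(floor)
    return floors

floors = track_noise_floor([1.0, 4.0, 0.5, 4.0], rise=2.0)
```

The slow upward drift lets the estimate recover when the noise level increases, while short speech bursts barely move it.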

The instantaneous signal-to-noise ratio (SNR) is estimated for each subband from the envelope of the noisy signal and the noise level.
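A possible form of this estimate, assuming uncorrelated additive noise so that the powers add, is sketched below. The exact expression is not given in the text; `instantaneous_snr` and `eps` are hypothetical names.

```python
def instantaneous_snr(level, noise_level, eps=1e-12):
    """Instantaneous SNR (power ratio) from envelope and noise level.

    Assumes noisy power = speech power + noise power, hence the "- 1".
    eps guards against division by zero; negative results clamp to zero.
    """
    noise_power = max(noise_level * noise_level, eps)
    return max(level * level / noise_power - 1.0, 0.0)

snr_speech = instantaneous_snr(2.0, 1.0)
snr_noise_only = instantaneous_snr(0.5, 1.0)
```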

The a priori SNR is estimated from the instantaneous SNR using the Ephraim-Malah spectral subtraction rule (EMSR). To minimize the influence of estimation errors, an improved decision-directed approach (DDA) is proposed that introduces an underestimation parameter and a noise floor parameter.

Temporal masking based on human hearing is taken into account by appropriate filtering of the subband signals. These nonlinear auditory post-masking filters apply recursive averaging to the falling edges of the signal level detected in each subband, with the following effects: (a) overestimation of the variance of impulsive (burst-like) noise, (b) the noise suppression algorithm has no effect on signals below the temporal masking threshold, and (c) no additional signal delay is introduced for transient signals, which are important for speech perception.
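Recursive averaging restricted to falling edges can be sketched as follows. The decay coefficient and the names `post_masking` and `decay` are assumptions; rising edges are passed through unchanged so that onsets are not delayed.

```python
def post_masking(levels, decay=0.9):
    """Smooth only the falling edges of a detected subband level.

    Rising levels are taken over immediately (no added delay for transients);
    falling levels decay through a recursive average with the assumed
    coefficient `decay`, imitating temporal post-masking.
    """
    state = 0.0
    masked = []
    for level in levels:
        if level >= state:
            state = level                                   # rising edge
        else:
            state = decay * state + (1.0 - decay) * level   # falling edge
        masked.append(state)
    return masked

masked = post_masking([1.0, 0.0, 2.0], decay=0.5)
```

After a loud sample the output stays elevated for a while, so weaker components that follow are treated as masked.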

A nonlinear weighting function is estimated for each subband from the a priori SNR; it includes over-subtraction of the estimated noise signal.

The noisy signal in each subband is multiplied by the corresponding weighting factor to suppress the noise signal components.
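The weighting and multiplication steps can be pictured with a parametric Wiener-type gain. This form is an assumption chosen for illustration, not the weighting function of the invention; `oversub` (over-subtraction factor) and `g_min` (gain floor) are hypothetical parameters with assumed defaults.

```python
def subband_gain(xi, oversub=2.0, g_min=0.1):
    """Gain from the a priori SNR xi (parametric Wiener form, assumed).

    oversub > 1 weights the noise more heavily, mimicking over-subtraction;
    g_min keeps a residual noise floor to avoid musical tones.
    """
    return max(xi / (xi + oversub), g_min)

def apply_gain(subband, gain):
    """Scale the time-domain subband samples by the weighting factor."""
    return [gain * sample for sample in subband]

gain = subband_gain(2.0, oversub=2.0, g_min=0.1)
weighted = apply_gain([1.0, -2.0], gain)
```

Since the gain is a real scalar applied to time-domain samples, the multiplication changes the subband level without adding delay.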

An optimized, near-perfect reconstruction filter bank uses a decision criterion for signed summation to reconstruct the enhanced full-band speech signal.

Finally, a post-filter is applied to the enhanced full-band signal to compensate for the effect of the pre-filter.

Remarks: The noise suppression methods cited at the outset operate in the frequency domain and use the discrete-time Fourier transform (DTFT), which relies on block processing of the discrete-time input signals. This block processing adds a signal delay that depends on the frame size.
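As a worked example of this frame-size dependence: a block-based scheme has to collect a full frame before it can transform it, so buffering alone contributes roughly N/f_s seconds for a frame of N samples. The frame length and sampling rate below are illustrative assumptions, not values taken from the source, and the function name is hypothetical.

```python
def frame_buffering_delay_ms(frame_length, sample_rate_hz):
    """Delay in milliseconds caused by collecting one full frame of samples."""
    return 1000.0 * frame_length / sample_rate_hz


# Hypothetical example: 256-sample frames at a 16 kHz sampling rate.
print(frame_buffering_delay_ms(256, 16000))  # 16.0
```

Overlap, windowing and synthesis add further delay in practice; the sketch only covers the buffering term.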

Subtractive-type single-channel speech enhancement systems reduce background noise efficiently; however, they leave perceptible, annoying residual noise. To overcome this problem, properties of the auditory system are incorporated into the enhancement process. The masking phenomenon is modeled by computing the noise masking threshold in the frequency domain, below which all components are inaudible (cf. N. Virag, "Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System", IEEE Trans. on Speech and Audio Proc., vol. 7, no. 2, pp. 126–137, March 1999).

To model auditory masking in subtractive-type speech enhancement systems, filter bank implementations are particularly attractive because they can be adapted to the spectral and temporal resolution of the human ear. The authors propose a noise suppression method based on spectral subtraction combined with a critical-band decomposition using gammatone filter banks (GTF). The concept of critical bands, which describes the resolution of the human auditory system, leads to a nonlinear frequency scale, the so-called Bark scale (cf. J. O. Smith III and J. S. Abel, "Bark and ERB Bilinear Transforms", IEEE Trans. on Speech and Audio Proc., vol. 7, no. 6, pp. 697–708, Nov. 1999).

The gammatone filter bank outperforms DTFT-based approaches in terms of computational complexity and overall system delay. The GTF approaches allow low-latency implementations and analysis-synthesis schemes with low computational complexity and nearly perfect reconstruction. The proposed synthesis filter builds the wideband output signal by simple summation of the subband signals, introducing a criterion that indicates whether the sign has to be changed before summation. This approach outperforms the channel-vocoder-based approaches proposed by McAulay and Malpass (cf. R. J. McAulay and M. L. Malpass, "Speech Enhancement Using a Soft-Decision Noise Suppression Filter", IEEE Trans. on Acoust., Speech and Sig. Proc., vol. ASSP-28, no. 2, pp. 137–145, April 1980). In that approach, the full-band output signal is reconstructed by summing alternately out-of-phase subband signals without regard to the actual phase relationship between the subbands, which introduces large distortions in the output signal.

Important remark: Subband signals without downsampling, as often used in hearing aid systems, do not require a synthesis filter bank. This approach is therefore applicable to low-latency speech enhancement systems, but it is computationally highly inefficient. The method proposed by the authors allows the output signal to be computed from the subband signals by simple summation while taking the phase differences into account.

It is worth mentioning that there are many applications, such as hearing aids or hands-free car kits, in which computational complexity and signal delay are of utmost importance.

The main advantages of the present invention over conventional noise suppression methods are the significant improvements in overall signal delay and computational efficiency.

The invention is not limited by the following embodiment, which merely serves to explain the inventive concept and to illustrate one possible application.

According to the invention, the method for low-latency single-channel noise suppression and reduction based on an auditory model operates as an independent module and is intended for installation in digital signal processing chains, in which an algorithm specified in software is implemented on a commercially available digital signal processor (DSP), in particular a DSP for audio applications.

Remarks: The amplitude of the clean speech signal is estimated with the Ephraim-Malah spectral subtraction rule (EMSR) from the given amplitude of the noisy signal and the estimated noise variance. To avoid artifacts such as musical noise, modified decision-directed approaches (DDA) are used that introduce oversubtraction (underestimation) of the noise variance together with a lower noise floor parameter.

Brief description of the drawing figures

Fig. 1 is a schematic representation of a single-channel subband speech enhancement unit of the present invention.

Fig. 2 is a schematic representation of the nonlinear calculation of the noise suppression gain factor that is applied in each subband.

Figs. 3 and 4 show the roof-shaped MMSE-SP attenuation surface as a function of the a posteriori SNR ($\gamma_k$) and the a priori SNR ($\xi_k$). To cover all values $0 < \gamma_k < \infty$, the x-axis refers to $\gamma_k$ and not, as in the literature, to $(\gamma_k - 1)$. The dash-dotted line in Fig. 3 marks the transition between the regions $\sqrt{G_w/\gamma_k}$ and $G_w$; the dashed line shows the spectral power subtraction contour. The contours of the DDA estimate are drawn over the MMSE-SP attenuation surface in Fig. 4. The dashed lines in Fig. 4 show the average of the dynamic relationships between $\gamma_k$ and $\xi_k$. The solid lines show the static relationships.

Figs. 5 and 6 illustrate the behavior of the combined (modified) DDA and MMSE-SP estimation. The dashed lines in Fig. 5 show the average of the dynamic relationship between $\gamma_k$ and $\xi_k$. The solid lines show the static relationships. Two fictitious hysteresis loops in Fig. 6 agree with observations from informal experiments.

Fig. 7 shows a block diagram of the complete system.

Fig. 8 shows the complete system, which comprises an auditory frequency analysis and a resynthesis as input and output stages, with a dedicated low-latency, low-complexity speech enhancement stage in between. Combining a sophisticated noise suppression rule with a model of human hearing enables high-quality performance.

Fig. 9 shows an outer-ear and middle-ear filter composed of three second-order sections (SOS).

Fig. 10 shows an example: a three-zero gammatone filter of order 3. The common zero at z = 1 is not shown in this figure.

Fig. 11 shows a known type of level detection. When the signal power is used, the square of the amplitude is detected.

Fig. 12 shows the low-latency FIR level detector.

Fig. 13 shows a nonlinear recursive auditory post-masking filter that responds to falling edges.

Fig. 14 shows a recursive noise level estimator that uses three time constants and a counter threshold.

Detailed description

This description presents new aspects of the Ephraim-Malah noise suppression rule (EMSR) and the decision-directed approach (DDA) for a priori SNR estimation. After partitioning the domain of the amplitude estimator, it becomes clear that the combined DDA estimate follows an unconfigured hysteresis cycle. Introducing a hysteresis width parameter improves the shape of the hysteresis and reduces musical noise. The result is a more flexible noise suppressor that depends less on the sampling rate of the system.

I. INTRODUCTION

The Ephraim-Malah amplitude estimator and the decision-directed Ephraim-Malah a priori SNR estimate (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109–1121, Dec. 1984, and Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 2, pp. 443–445, Apr. 1985) are powerful tools for noise suppression in speech signal processing. A considerable number of recent publications address both topics: on the one hand, the combined algorithm is a powerful tool (O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 345–349, Apr. 1994); on the other hand, simplifications (P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496–499, 6–8 Aug. 2001) as well as further developments (I. Cohen and B. Berdugo, "Speech Enhancement for non-stationary noise environments", Signal Processing, no. 11, pp. 2403–2418, Elsevier, Nov. 2001; I. Cohen, "Speech Enhancement Using a Noncausal A Priori SNR estimator", IEEE Signal Processing Letters, no. 9, pp. 725–728, Sep. 2004; I. Cohen, "Relaxed Statistical Model for Speech Enhancement and A Priori SNR Estimation", Center for Communication and Information Technologies, Israel Institute of Technology, Oct. 2003, CCIT Report no. 443; M. K. Hasan, S. Salahuddin, M. R. Khan, "A Modified A Priori SNR for Speech Enhancement Using Spectral Subtraction Rules", IEEE Signal Processing Letters, vol. 11, no. 4, pp. 450–453, April 2004) are desirable.

In the amplitude estimation part of the algorithm, a signal model is used in which a noisy signal y[n] consists of speech x[n] and additive noise d[n] at time index n. The signals x[n] and d[n] are assumed to be statistically independent Gaussian random variables. Owing to certain properties of the Fourier transform, the same statistical model can be assumed for the corresponding short-time spectral amplitudes $\underline{X}_k[m]$ and $\underline{D}_k[m]$ in each frequency bin k at analysis time m. (Underlined variables denote complex-valued quantities here; $\underline{X}_k[m]$ is therefore a complex variable in our notation. To simplify the notation, $X_k[m]$ denotes the magnitude $|\underline{X}_k[m]|$.) Given the speech and noise variances $\sigma_{x,k}^2$ and $\sigma_{d,k}^2$, the speech amplitude $X_k[m]$ can be estimated from the noisy speech $Y_k[m]$. A suitable estimator $\hat{X}_k[m]$ for the clean speech amplitude is described in Section I-A.

The unknown variance of the clean speech $\sigma_{x,k}^2$ is determined implicitly in the a priori SNR estimation part of the algorithm, whereas the noise variance $\sigma_{d,k}^2$ has to be determined in advance, e.g. by using minimum statistics (P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496–499, 6–8 Aug. 2001), MCRA (I. Cohen and B. Berdugo, "Speech Enhancement for non-stationary noise environments", Signal Processing, no. 11, pp. 2403–2418, Elsevier, Nov. 2001) or harmonic tunneling (D. Ealey, H. Kelleher, D. Pearce, "Harmonic Tunneling: Tracking Non-Stationary Noises During Speech", Proc. Eurospeech, 2001).

The decision-directed estimation described in Section I-B determines the a priori SNR $\xi_k = \sigma_{x,k}^2 / \sigma_{d,k}^2$ in each frequency bin k. In addition, the noise suppressor uses an instantaneous estimate, the so-called a posteriori SNR estimator, which relates the squared magnitude of the current noisy observation to the noise variance: $\gamma_k[m] = Y_k^2[m] / \sigma_{d,k}^2$.

Section II gives an overview of the combined estimation and presents the shape of the hysteresis. Section III then shows how a small modification can reduce undesired estimation behavior and yield a smoother hysteresis.

A. The Ephraim-Malah Suppression Rule (EMSR)

As described above, the EMSR reconstructs the magnitude of the clean speech signal $\hat{X}_k[m]$ from the noisy observation $Y_k[m]$. Because the magnitudes at different time instants m are assumed to be statistically independent, the time index m can be omitted to simplify the notation.

The MMSE-SA estimator of Ephraim and Malah (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109–1121, Dec. 1984) solves the Bayesian formula $\hat{X}_k = E\{X_k \mid Y_k\}$ to estimate the clean speech amplitude $X_k$. If different warpings are applied to the amplitude, other estimators are derived in a similar way, e.g. the MMSE-LSA estimator (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 2, pp. 443–445, Apr. 1985)

$$\hat{X}_k = \exp\left(E\{\ln X_k \mid Y_k\}\right)$$

and the MMSE-SP estimator of Wolfe and Godsill (P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496–499, 6–8 Aug. 2001)

$$\hat{X}_k = \sqrt{E\{X_k^2 \mid Y_k\}}. \qquad (3)$$

For a more detailed description, see Cohen (I. Cohen, "Relaxed Statistical Model for Speech Enhancement and A Priori SNR Estimation", Center for Communication and Information Technologies, Israel Institute of Technology, Oct. 2003, CCIT Report no. 443).

According to Ephraim and Malah, the noisy phase is an optimal estimate of the clean phase. The reconstruction operator is therefore a real-valued spectral weight G[m]:

$$\underline{\hat{X}}_k[m] = G[m]\,\underline{Y}_k[m].$$

Because of its simplicity, we have chosen the MMSE-SP estimator (3) of Wolfe and Godsill as the basis for our considerations. The corresponding weighting rule can be stated as

$$G_{\text{MMSE-SP}} = \sqrt{G_w \left( \frac{1}{\gamma_k} + G_w \right)}$$

using the Wiener filter equation

$$G_w = \frac{\xi_k}{1 + \xi_k}.$$

To simplify the application, we split the reconstruction operator into several regions:

• $(\gamma_k - 1) \ll 1/\xi_k$: $G_{\text{MMSE-SP}} \approx \sqrt{G_w / \gamma_k}$
• $(\gamma_k - 1) \gg 1/\xi_k$: $G_{\text{MMSE-SP}} \approx G_w$
• $(\gamma_k - 1) = 1/\xi_k$: $G_{\text{MMSE-SP}} = \sqrt{2}\, G_w$
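The region boundaries can be checked with a short derivation. It assumes the MMSE-SP gain in the form commonly attributed to Wolfe and Godsill, $G_{\text{MMSE-SP}}^2 = G_w (1/\gamma_k + G_w)$ with $G_w = \xi_k / (1 + \xi_k)$; this form is an assumption here, chosen because it reproduces the boundary $(\gamma_k - 1) = 1/\xi_k$ given in the text.

```latex
\begin{align*}
G_{\text{MMSE-SP}}^2 &= \frac{G_w}{\gamma_k} + G_w^2,
  \qquad G_w = \frac{\xi_k}{1 + \xi_k} \\[4pt]
\frac{1}{\gamma_k} \gg G_w
  \;&\Leftrightarrow\; \gamma_k \ll 1 + \frac{1}{\xi_k}
  \;\Rightarrow\; G_{\text{MMSE-SP}} \approx \sqrt{G_w / \gamma_k} \\[4pt]
\frac{1}{\gamma_k} \ll G_w
  \;&\Leftrightarrow\; \gamma_k \gg 1 + \frac{1}{\xi_k}
  \;\Rightarrow\; G_{\text{MMSE-SP}} \approx G_w \\[4pt]
\frac{1}{\gamma_k} = G_w
  \;&\Leftrightarrow\; \gamma_k - 1 = \frac{1}{\xi_k}
  \;\Rightarrow\; G_{\text{MMSE-SP}} = \sqrt{2}\, G_w
\end{align*}
```

The comparison is really between $\gamma_k$ and $1 + 1/\xi_k$; the text writes this as $(\gamma_k - 1)$ versus $1/\xi_k$, which also covers values $\gamma_k < 1$.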

In addition, the Wiener filter can be approximated by

• $\xi_k \ll 1$: $G_w \approx \xi_k$
• $\xi_k \gg 1$: $G_w \approx 1$

Combining both, we can split the MMSE-SP surface into parts that are planar on a logarithmic scale (see also Fig. 3):

1) $(\gamma_k - 1) \ll 1/\xi_k$, $\xi_k \ll 1$ ⇒ $G_{\text{MMSE-SP}} \approx \sqrt{\xi_k / \gamma_k}$
2) $(\gamma_k - 1) \ll 1/\xi_k$, $\xi_k \gg 1$ ⇒ $G_{\text{MMSE-SP}} \approx \sqrt{1 / \gamma_k}$
3) $(\gamma_k - 1) \gg 1/\xi_k$, $\xi_k \ll 1$ ⇒ $G_{\text{MMSE-SP}} \approx \xi_k$
4) $(\gamma_k - 1) \gg 1/\xi_k$, $\xi_k \gg 1$ ⇒ $G_{\text{MMSE-SP}} \approx 1$
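The four-part decomposition can be compared numerically with the full gain. The sketch below is illustrative only: it assumes the MMSE-SP gain in the form $\sqrt{G_w (1/\gamma_k + G_w)}$, takes $\sqrt{\xi_k/\gamma_k}$ and $\sqrt{1/\gamma_k}$ as the approximations for parts 1 and 2 (they follow from that assumed form), and uses hypothetical function names and sample points.

```python
import math


def mmse_sp_gain(xi, gamma):
    """Assumed MMSE-SP gain: sqrt(G_w * (1/gamma + G_w)), G_w = xi / (1 + xi)."""
    g_w = xi / (1.0 + xi)
    return math.sqrt(g_w * (1.0 / gamma + g_w))


def flat_region_gain(xi, gamma):
    """Piecewise approximation following the four parts of the decomposition."""
    low_gamma = (gamma - 1.0) < 1.0 / xi  # parts 1 and 2
    low_xi = xi < 1.0                     # parts 1 and 3
    if low_gamma:
        return math.sqrt(xi / gamma) if low_xi else math.sqrt(1.0 / gamma)
    return xi if low_xi else 1.0


# Hypothetical sample points (xi, gamma), one deep inside each part.
for xi, gamma in [(1e-4, 1.5), (1e3, 1e-3), (1e-3, 1e7), (1e3, 1e3)]:
    exact = mmse_sp_gain(xi, gamma)
    approx = flat_region_gain(xi, gamma)
    print(f"xi={xi:g} gamma={gamma:g} exact={exact:.6g} approx={approx:.6g}")
```

The hard switch at the region boundaries is a simplification for the comparison; near the boundaries the exact surface bends smoothly and the approximation is least accurate.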

In the following sections, we use the short form G when referring to $G_{\text{MMSE-SP}}$.

B. The Decision-Directed Approach (DDA)

The DDA combines two simple SNR estimators into a new estimator for the a priori SNR $\xi_k$.

The first estimator is the instantaneous SNR $(\gamma_k - 1) = Y_k^2 / \sigma_{d,k}^2 - 1 = (Y_k^2 - \sigma_{d,k}^2) / \sigma_{d,k}^2$. Restricting it to non-negative SNR values gives

$$\mathrm{SNR}_{inst} = \max(\gamma_k - 1,\, 0), \qquad (5)$$

which can be computed before noise suppression. This instantaneous SNR differs from the true SNR in the following cases:

• if the analysis time window is too short with respect to the stationarity of the signals x[n] and d[n],
• if a nonstationary noise cannot be identified in detail, or
• if noise and speech signal are strongly correlated.
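The a posteriori SNR and the instantaneous SNR of equation (5) translate directly into code. The function and argument names below are hypothetical, and the noise variance is assumed to be known from a separate estimator.

```python
def a_posteriori_snr(noisy_magnitude, noise_variance):
    """gamma_k = Y_k^2 / sigma_{d,k}^2 for one frequency bin."""
    return noisy_magnitude ** 2 / noise_variance


def instantaneous_snr(gamma):
    """SNR_inst = max(gamma_k - 1, 0), equation (5)."""
    return max(gamma - 1.0, 0.0)


# Hypothetical bin: noisy magnitude 2.0 with unit noise variance.
gamma = a_posteriori_snr(2.0, 1.0)
print(gamma, instantaneous_snr(gamma))
```

The clamp at zero matters whenever the observed power falls below the noise variance, where $(\gamma_k - 1)$ would otherwise be negative.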

The second estimator describes the reconstructed SNR, which is computed after noise suppression as

$$\mathrm{SNR}_{rec} = \frac{\hat{X}_k^2}{\sigma_{d,k}^2} = G^2\, \gamma_k. \qquad (6)$$

Under poor SNR conditions, e.g. $0 < \gamma_k < 2$, the a posteriori SNR $\gamma_k$ exhibits relative variations over time that are smaller than those of $(\gamma_k - 1)$. (Relative variations, e.g. $10 \cdot \log(\gamma_k[m]) - 10 \cdot \log(\gamma_k[m-1])$, are more significant than linear variations with regard to human auditory perception.) Ideally, G provides a consistently high attenuation under poor SNR conditions. The reconstructed $\mathrm{SNR}_{rec}$ therefore yields more stable values than $\mathrm{SNR}_{inst}$ in poor SNR cases.

Finally, the DDA combines $\mathrm{SNR}_{inst}$ and $\mathrm{SNR}_{rec}$ to estimate the a priori SNR:

$$\xi_k[m] = (1 - \alpha) \cdot \mathrm{SNR}_{inst}[m] + \alpha \cdot \mathrm{SNR}_{rec}[m-1]. \qquad (7)$$
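A minimal sketch of the recursion in equation (7) for a single frequency bin is given below. Two points are assumptions rather than statements from the source: the reconstructed SNR is taken as $\mathrm{SNR}_{rec} = G^2 \gamma_k$, and the gain is the MMSE-SP rule in the form $\sqrt{G_w (1/\gamma_k + G_w)}$. Function names and the input sequence are hypothetical.

```python
import math


def mmse_sp_gain(xi, gamma):
    """Assumed MMSE-SP gain: sqrt(G_w * (1/gamma + G_w)), G_w = xi / (1 + xi)."""
    g_w = xi / (1.0 + xi)
    return math.sqrt(g_w * (1.0 / gamma + g_w))


def dda_a_priori_snr(gammas, alpha):
    """Track xi_k[m] over a sequence of a posteriori SNR values gamma_k[m] > 0."""
    xi_track = []
    snr_rec_prev = 0.0  # assumed initial value of SNR_rec[m - 1]
    for gamma in gammas:
        snr_inst = max(gamma - 1.0, 0.0)                     # equation (5)
        xi = (1.0 - alpha) * snr_inst + alpha * snr_rec_prev  # equation (7)
        gain = mmse_sp_gain(xi, gamma)
        snr_rec_prev = gain * gain * gamma  # assumed form of SNR_rec
        xi_track.append(xi)
    return xi_track


# Hypothetical input: one strong frame followed by a noise-only frame.
print(dda_a_priori_snr([3.0, 1.0], alpha=0.5))
```

In the second frame the instantaneous SNR is zero, yet the estimate stays positive because of the fed-back reconstructed SNR; this memory is what produces the hysteresis-like behavior discussed in the text.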

The specific properties of the estimator can be observed when the suppression gain is substituted into the DDA.

II. COMBINATION OF DDA AND EMSR

Substituting the parts of the reconstruction operator $G_{MMSE\text{-}SP}$ by Wolfe and Godsill from Section I-A into the DDA equation (7) by Ephraim and Malah yields the following operating regions for the combined a priori SNR estimate:

The characteristic of the combined approach can be seen in FIG. 4. Considering the amplitude of the speech signal and a constant noise level, e.g. a time-varying a posteriori SNR $\gamma_k$ as the input sequence, one can imagine a kind of hysteresis loop evolving on the MMSE-SP surface. Besides obvious discontinuities in this loop, further properties are shown (O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions on Speech and Audio Processing, no. 2, vol. 2, pp. 345–349, Apr. 1994).

A. Recursive averaging

1) Expectations of recursive averaging: The enumeration above shows that the a priori SNR estimate in part 1 corresponds to the recursive average (8) of the instantaneous $\mathrm{SNR}_{inst}$ (5). The averaging process can be generalized by introducing a time constant $\tau_{avg}$, which determines the averaging parameter $\alpha = \exp[-1/(\tau_{avg}\cdot f_s)]$. Here the sampling rate $f_s = 1/T$ denotes the number of time-frequency transforms per second.
2) The constant-ξ effect: If the a priori SNR $\xi_k$ takes a constant value in part 1, e.g. for large time constants $\tau_{avg}$ or at the edges of the $\xi_k$ value range, the estimator can behave unexpectedly. For small and constant $\xi_k$, the system holds the output at a constant level. This happens when the input is small enough, $(Y_k^2[m]/\sigma_{d,k}^2 - 1) \ll 1/\xi_k \Rightarrow Y_k^2[m] \ll \sigma_{d,k}^2/G_w \approx \sigma_{d,k}^2/\xi_k$ (using (8) and its conditions):
Under certain circumstances this can lead to disturbing additional broadband noise, which can be worse than the constant output that results from limiting $\xi_k$ to a minimum $\zeta$ for $Y_k^2[m] < \sigma_{d,k}^2/\zeta$.
3) Unstable recursive averaging: According to (12), the a priori SNR estimate in part 5 may result from an unstable recursive averaging of $\mathrm{SNR}_{inst}$ if $\alpha > 1/2$, e.g. $\xi_k$ can rise suddenly in this part.
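The relation $\alpha = \exp[-1/(\tau_{avg}\cdot f_s)]$ from item 1 can be checked numerically. The sketch below is illustrative only; the function name `averaging_parameter` and the two frame rates are hypothetical.

```python
import math


def averaging_parameter(tau_avg, f_s):
    """alpha = exp(-1 / (tau_avg * f_s)).

    tau_avg: averaging time constant in seconds.
    f_s: number of time-frequency transforms (frames) per second.
    """
    return math.exp(-1.0 / (tau_avg * f_s))


# Hypothetical frame rates with the same time constant of 100 ms.
alpha_slow = averaging_parameter(0.1, 100.0)   # 100 frames per second
alpha_fast = averaging_parameter(0.1, 1000.0)  # 1000 frames per second
```

Ten steps at the faster rate decay by the same factor as one step at the slower rate (`alpha_fast ** 10` equals `alpha_slow` up to rounding), which is why defining α through a time constant makes the averaging in (8) independent of the analysis rate.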

B. Parts without recursive averaging

In parts 2, 3 and 4 the interpretation as recursive averaging is not applicable. In (9) the a priori SNR estimate $\xi_k$ takes a constant value, and in (10) $\xi_k$ is determined by a simple delay. It seems odd that the SNR $\xi_k$ in (10) is a reduced version of $\mathrm{SNR}_{inst}$.

C. Summary of the properties

In fact, every part except 1 and 4 (Eqs. (8) and (11)) shows unexpected behavior. Defining $\alpha$ through a time constant yields generalized averaging properties for (8), whereas the estimate defined by Eqs. (9)–(12) introduces behavior that depends on the sampling rate. This dependence on the sampling rate rules out a generally suitable parameter set for different analysis time steps and transform sizes.

Unfavorable estimation behavior, e.g. the "constant-ξ effect", and the discontinuities in the hysteresis loop (FIG. 4) motivate a modification of the DDA and a re-examination of the time constants and of the minimum a priori SNR values.

III. A MODIFIED, FAST-RESPONDING DDA

To minimize the influence of unexpected estimator behavior, the decision-directed approach is modified:

$$\xi_k[m] = (1 - \alpha)\cdot(\rho\cdot \mathrm{SNR}_{inst}[m] + \zeta) + \alpha\cdot \mathrm{SNR}_{rec}[m-1], \qquad (14)$$

with $\zeta$ as the noise floor parameter (O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions on Speech and Audio Processing, no. 2, vol. 2, pp. 345–349, Apr. 1994) and $\rho$ as the underestimation parameter of the instantaneous SNR. Similar to the parts in Section II, one finds the following:

Regarding the partitions of the new estimator, the scheme of the overall estimator can be seen in FIG. 5. Instead of a time constant in the quasi-stationary range of speech, $\tau_{avg} = 2$ ms is now used. $\rho = 10^{-15/10}$ guarantees that the scaling factor in (17) is approximated by $\rho(1 - \alpha) \approx \rho$, which removes the discontinuities in the estimation hysteresis. The noise floor $\zeta = 10^{-25/10}$ can be chosen small enough that the maximum attenuation $\zeta$ lies at the lower end of the dynamic range of the frequency bin. These measures largely reduce the sampling-rate dependence described in Section II-C and the "constant-ξ effect" from Section II-A.2.
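As a rough numerical illustration of (14) with the parameter values named above, consider the following Python sketch. The analysis rate of 100 frames per second and the function name `modified_dda` are assumptions made only for this example.

```python
import math

RHO = 10.0 ** (-15.0 / 10.0)   # underestimation parameter rho from the text
ZETA = 10.0 ** (-25.0 / 10.0)  # noise floor parameter zeta from the text


def modified_dda(snr_inst, snr_rec_prev, alpha, rho=RHO, zeta=ZETA):
    """A priori SNR according to equation (14)."""
    return (1.0 - alpha) * (rho * snr_inst + zeta) + alpha * snr_rec_prev


# Assumption: a hypothetical analysis rate of 100 frames per second.
alpha = math.exp(-1.0 / (0.002 * 100.0))    # tau_avg = 2 ms

xi_floor = modified_dda(0.0, 0.0, alpha)    # no input, no history
xi_burst = modified_dda(100.0, 0.0, alpha)  # isolated 20 dB instantaneous SNR
```

Under these assumptions α is below 0.01, so $\rho(1-\alpha)$ differs from $\rho$ by less than one percent, and an isolated instantaneous SNR of 20 dB enters the estimate roughly 15 dB lower.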

It becomes clear that rising instantaneous SNRs are now attenuated more strongly in FIG. 5 than in FIG. 4. A strong attenuation can therefore be provided for musical noise, e.g. inconsistent high instantaneous SNR, while a signal with consistently high SNR can pass through the noise suppressor. The two curled loops in FIG. 6 give an example of an approximated hysteresis loop during system operation.

The parameter $\rho$ can directly control the width of the suppression hysteresis and the suppression of musical noise. Our modifications allow separate control of the averaging time constant and of the noise suppression.

IV. CONCLUSION

We have found a comprehensible way to describe graphically the properties of the spectral amplitude estimation by Wolfe and Godsill and of the decision-directed a priori SNR estimation by Ephraim and Malah. This description can be used in a similar way for other amplitude estimation rules and offers new insight into the noise suppressor of Ephraim and Malah.

Until now, the suppression of musical noise has been a trade-off between musical noise suppression and transient distortion. Small modifications to the decision-directed estimation rule allow more flexible handling of musical noise suppression while reducing both the dependence on the analysis time step and the "constant-ξ effect". An informal listening test with the modified algorithm and adjustable analysis time/frequency resolution (filter bank approach) has already shown useful improvements in the overall system.

Our future work will apply our descriptive methods to more sophisticated estimation approaches by Cohen (I. Cohen, "Speech Enhancement Using a Noncausal A Priori SNR estimator", IEEE Signal Processing Letters, no. 9, pp. 725–728, Sep. 2004) or Hasan (M. K. Hasan, S. Salahuddin, M. R. Khan, "A Modified A Priori SNR for Speech Enhancement Using Spectral Subtraction Rules", IEEE Signal Processing Letters, vol. 11, no. 4, pp. 450–453, April 2004).

Apparatus for latency-reduced single-channel speech enhancement

A preferred embodiment is described below; however, the invention is not limited to this embodiment.

The reduction of musical noise in noise suppression algorithms is still a key issue in noise reduction. Although the Ephraim-Malah suppression rule (EMSR) and the decision-directed approach (DDA) perform well, additional measures must be applied. Moreover, the processing delay introduced by the signal analysis (fast Fourier transform, FFT) is a problem for real-time applications. Decisive improvements in both respects can be achieved by implementing the signal analysis and filtering approaches with models of human auditory perception and with reduced latency.

V. INTRODUCTION

The main part of this description is devoted to the conditioning and analysis of the audio signal using efficient algorithms with short delay times. Our system combines an auditory gammatone filter bank (R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerp 1996; L. Lin, E. Ambikairajah, W. H. Holmes, "Auditory Filterbank Design Using Masking Curves", Proc. EUROSPEECH Scandinavia, 7th European Conference on Speech Communication and Technology, 2001; L. Lin, E. Ambikairajah, W. H. Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the third International Symposion DSPCS 2002, Sydney, Jan. 28–31, 2002) with the Ephraim-Malah noise suppression rule (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109–1121, Dec. 1984; Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443–445, Apr. 1985; P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496–499, Aug. 6–8, 2001). This combination was recently presented by the authors, whereas the combination of an auditory gammatone filter bank with a Wiener noise suppressor is known from (L. Lin, E. Ambikairajah, "Speech Denoising Based on an Auditory Filterbank", 6th ICSP, International Conference on Signal Processing, pp. 552–555, Aug. 26–30, 2002) and a frequency-domain solution is known from WO 00/30264 (International application No. PCT/SG99/00119). Furthermore, the integration of an outer and middle ear filter in the time domain and the integration of a nonlinear temporal post-masking filter (G. Stoll, J. G. Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, W. C. Treurniet, "PEAQ – der neue ITU-Standard zur objektiven Messung der wahrgenommenen Audioqualität", RTM, Rundfunktechnische Mitteilungen, vol. 43, ISSN 0035-9890, pp. 81–120, Firma Mensing GmbH + Co. KG, Sept. 1999; L. Lin, E. Ambikairajah, W. H. Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the third International Symposion DSPCS 2002, Sydney, Jan. 28–31, 2002) into a noise suppression system are new. In addition, a narrow-band level detector with low latency, which exploits the phase of a simple first-order filter, is presented for the first time. Finally, we present a simple scheme for signal reconstruction that avoids signal cancellation at the band edges.

• Combination of an auditory gammatone filter bank and an EMSR noise suppressor in a time-domain approach
• Integration of an outer and middle ear filter into the suppression system in a time-domain approach
• Integration of an auditory post-masking filter
• Narrow-band level detector with low latency
• Low-complexity signal reconstruction according to Wolfe and Godsill
• Upsampling with low latency
• Low-latency reconstruction while preventing destructive interference

VI. SYSTEM OVERVIEW

The overall system is shown as a block diagram in FIG. 7 and can be implemented as an analog or digital effects processor or as part of a software algorithm. The overall system contains several subsystems (FIG. 8):

• an outer and middle ear filter ($H_{OME}$),
• a gammatone filter bank analysis section (GFB),
• the low-latency level detector (LD),
• the auditory post-masking filter (PM),
• a recursive noise spectrum estimator (NE),
• the spectral subtraction weight (EMSR),
• low-latency upsampling (L ↑),
• the vocoder state and
• the inverse outer and middle ear filter ($H_{IOME}$).

VII. OUTER AND MIDDLE EAR FILTER

An outer and middle ear filter comprises three second-order sections (SOS) that represent the physiological part of the human ear (E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999; E. Terhardt, "Akustische Kommunikation", Springer, Berlin Heidelberg, 1998):

1) the high-pass attenuation curve below 1 kHz, which models the 100-phon curve and represents the acoustic impedance of the outer ear and the mechanical impedance of the ossicles in the middle ear,
2) the resonance of the ear canal, and
3) the low-pass attenuation curve above 1 kHz, which models the threshold of hearing.

The last two filters are optional, whereas the high-pass component is mandatory; it reduces the influence of low-frequency noise on the noise suppressor.

A filter structure with an adequate magnitude transfer function could ultimately look like the one in FIG. 9. All three filter sections must be second-order sections to ensure suitable slopes. The outer filter edges can be modeled as second-order low-pass and high-pass shelving filters, while the resonance can be modeled as a parametric peak filter (P. Dutilleux, U. Zölzer, "DAFX", Wiley & Sons, 2002).

Filter inversion is straightforward. If, however, the filter has zeros at e.g. z = 1 in the z-domain, the inverse filter cannot realize them, because it would need a pole on the unit circle. A value of z = 0.99 may be a suitable starting point for inverting a zero at z = 1.
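The effect of moving the zero from z = 1 to z = 0.99 can be illustrated with a first-order section. The sketch below is a simplified stand-in, not the second-order outer and middle ear filter itself; the helper `first_order`, the pole value 0.9 and the test signal are hypothetical.

```python
def first_order(x, zero, pole):
    """Apply H(z) = (1 - zero * z^-1) / (1 - pole * z^-1) to the sequence x."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = xn - zero * x_prev + pole * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


ZERO = 0.99  # zero placed just inside the unit circle instead of at z = 1
POLE = 0.9   # hypothetical pole of the forward filter

x = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.1, 0.0]
forward = first_order(x, ZERO, POLE)
# The inverse swaps zero and pole; its pole at 0.99 lies inside the unit
# circle, so the inverse filter is stable.
restored = first_order(forward, POLE, ZERO)
max_err = max(abs(a - b) for a, b in zip(x, restored))
```

With the zero at 0.99 the inverse has a pole inside the unit circle and the cascade reproduces the input up to rounding; with a zero exactly at z = 1 the inverse would contain an undamped integrator.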

VIII. CRITICAL BANDS / AUDITORY BANDWIDTHS

Critical-band grouping is an important effect in human loudness perception. The perceived loudness comprises specific loudnesses for different frequency ranges. An auditory frequency scale can be used to model the critical-band effects; its units can be regarded as the frequency resolution of human loudness perception (E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999). We denote an arbitrary auditory frequency transformation by $\mathcal{T}\{\cdot\}$ and the associated inverse frequency transformation by $\mathcal{T}^{-1}\{\cdot\}$. A reasonable frequency scale uses a small number of critical bands according to the formula of Traunmüller (E. Terhardt, "Akustische Kommunikation", Springer, Berlin Heidelberg, 1998)

Accordingly, the inverse transformation $\mathcal{T}^{-1}\{\cdot\}$ is

The center frequencies $f_k$ of the auditory filter bank can be calculated by applying the inverse transformation $f_k = \mathcal{T}^{-1}\{v_k\}$ to an equidistant scale $v_k$ (with spacing $d_v$, e.g. $d_v = 1$ [Bark]) in the Bark domain. Similarly, the bandwidths $B_k$ can be calculated from $B_k = \mathcal{T}^{-1}\{v_k + d_v/2\} - \mathcal{T}^{-1}\{v_k - d_v/2\}$. Other Bark scales (e.g. E. Zwicker, H. Fastl, "Psychoacoustics, facts and models", Springer, Berlin Heidelberg, 1999) use smaller bandwidths and yield auditory filters with larger group delay; the above spacing is therefore preferred.

To avoid confusion with the variable z of the z-domain, v is used instead of z for the Bark frequencies.
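The calculation of center frequencies and bandwidths on an equidistant Bark grid can be sketched as follows. The sketch assumes the Traunmüller approximation in its commonly cited form, v = 26.81·f/(1960 + f) − 0.53, together with its algebraic inverse; whether this description uses exactly these constants is an assumption, and the function names and the grid from 1 to 20 Bark are hypothetical.

```python
def hz_to_bark(f):
    """Traunmueller approximation in its commonly cited form (assumed here)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(v):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (v + 0.53) / (26.28 - v)


def bark_bands(v_min=1.0, v_max=20.0, dv=1.0):
    """Center frequencies f_k and bandwidths B_k on an equidistant Bark grid."""
    count = int(round((v_max - v_min) / dv)) + 1
    bands = []
    for k in range(count):
        v_k = v_min + k * dv
        f_k = bark_to_hz(v_k)
        b_k = bark_to_hz(v_k + dv / 2.0) - bark_to_hz(v_k - dv / 2.0)
        bands.append((f_k, b_k))
    return bands


bands = bark_bands()  # hypothetical grid: 1 to 20 Bark in steps of 1 Bark
```

Each tuple in `bands` holds one channel's center frequency and bandwidth in Hz; a smaller `dv` gives narrower bands and therefore filters with longer group delay, which matches the preference for the 1 Bark spacing stated above.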

IX. AUDITORY GAMMATONE FILTERS

Auditory gammatone filters (R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerp 1996) can be implemented efficiently in the time domain and allow a broadband audio signal to be separated into auditory band signals. The magnitude response of the gammatone filter corresponds to the simultaneous masking properties of the human ear. Plotted over the auditory frequency scale, the magnitude response of this filter keeps the same shape regardless of the center frequency for which the filter was designed. The general form represents a family of gammatone filters of order m and is given below, where k is the filter bank channel index. A corresponding z-transform, where *GF denotes any gammatone filter (e.g. GF, APGF, OZGF, TZGF):

The digital center frequencies $\theta_k$ and pole radii $r_k$ are determined by the continuous-time quantities center frequency $f_k$, bandwidth $B_k$, band-edge attenuation $C_{dB}$ (e.g. $C_{dB} = -5$ [dB]) and the sampling rate $f_s$:

An auditory gammatone filter bank is a set of overlapping gammatone filters that divides the auditory frequency scale into equidistant frequency bands. The order m = 4 is frequently used in the literature, while the order m = 3 has been proposed to minimize the computational load. The term $g_{*GF}$ is adjusted such that unity gain is achieved at the center frequency $f_k$. For a specific form of the gammatone filter, the system $H_{num,k}(z)$ must be adapted appropriately, as shown in the following subsections.

A. Basic gammatone filter

The basic gammatone filter (GF; R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerp 1996) must be derived from the continuous-time impulse response using the Laplace transform and the impulse-invariance transformation (A. V. Oppenheim, R. W. Schafer, J. R. Buck, "Discrete-Time Signal Processing", Prentice Hall, 1999):

which determines the unknown polynomial $H_{num,k}(z)$ in (21). Because of its shape and computational cost, its use is not recommended.

B. All-pole gammatone filter

An all-pole gammatone filter (APGF) is obtained when the polynomial in (21) reduces to $H_{num,k}(z) = 1$. It is the most efficient gammatone filter (R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerp 1996).
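A time-domain all-pole gammatone channel can be sketched as a cascade of m identical two-pole resonators with a gain that normalizes the response to unity at the center frequency. The sketch below rests on assumptions: the pole radius uses the common textbook approximation r = exp(−π·B/f_s) and ignores the band-edge attenuation $C_{dB}$, and all function names and parameter values are hypothetical.

```python
import cmath
import math


def apgf_coeffs(f_k, b_k, f_s, m=4):
    """Gain and denominator coefficients of one all-pole gammatone channel."""
    theta = 2.0 * math.pi * f_k / f_s
    r = math.exp(-math.pi * b_k / f_s)  # assumption: textbook bandwidth mapping
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    z_inv = cmath.exp(-1j * theta)
    # Gain chosen so that the cascade has unity magnitude at the center frequency.
    g = abs(1.0 + a1 * z_inv + a2 * z_inv ** 2) ** m
    return g, a1, a2


def apgf_filter(x, g, a1, a2, m=4):
    """Filter x with m cascaded two-pole sections 1 / (1 + a1 z^-1 + a2 z^-2)."""
    y = [g * v for v in x]
    for _ in range(m):
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            yn = v - a1 * y1 - a2 * y2
            out.append(yn)
            y2, y1 = y1, yn
        y = out
    return y


def magnitude(g, a1, a2, m, f, f_s):
    """Magnitude response of the cascade at frequency f."""
    z_inv = cmath.exp(-2j * math.pi * f / f_s)
    return g / abs(1.0 + a1 * z_inv + a2 * z_inv ** 2) ** m


# Hypothetical channel: 1 kHz center, 160 Hz bandwidth, 16 kHz sampling rate.
g, a1, a2 = apgf_coeffs(1000.0, 160.0, 16000.0)
h = apgf_filter([1.0] + [0.0] * 63, g, a1, a2)  # first 64 impulse response samples
```

Because every section shares the same two coefficients, each channel costs two multiplications per section and sample plus one gain multiplication, which illustrates why the all-pole form is attractive for a time-domain filter bank.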

C. One-Zero Gammatone Filter

Setting H_num,k(z) = (1 – z^–1) in (21) leads to a so-called one-zero gammatone filter (R. F. Lyon, "The All-Pole Gammatone Filter and Auditory Models", Proc. Forum Acusticum, Antwerp 1996). The one-zero gammatone filter (OZGF) can be built efficiently from a single "one-zero" shared by all channels k, applied before the signal is split into the k all-pole gammatone filters.

D. Three-Zero Gammatone Filter

If a pair of complex-conjugate zeros

$$z = r_z \, e^{\pm j\theta_{z,k}}$$

with the digital frequency θ_z,k at 1 Bark above the center frequency θ_k and a radius r_z ≈ 0.98, together with an additional zero at z = 1, is added, one obtains

$$H_{num,k}(z) = \left(1 - 2 r_z \cos(\theta_{z,k})\, z^{-1} + r_z^2\, z^{-2}\right)\left(1 - z^{-1}\right)$$

for the three-zero gammatone filter (TZGF) with an improved shape (L. Lin, E. Ambikairajah, W. H. Holmes, "Auditory Filterbank Design Using Masking Curves", Proc. EUROSPEECH Scandinavia, 7th European Conference on Speech Communication and Technology, 2001). The computational cost of the one-zero gammatone filter of order m + 1 equals that of the three-zero gammatone filter of order m, provided that a single "one-zero" is again shared by all channels k. Suitable transformations and the computation of the digital frequencies θ_z,k follow from (19), (20) and (22).
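The three-zero numerator can be expanded into direct-form coefficients by multiplying the two factors. The following is a minimal sketch, assuming θ_z,k is already available in radians per sample; the function name `tzgf_numerator` is illustrative and does not come from the source.

```python
import math

def tzgf_numerator(theta_z, r_z=0.98):
    """Coefficients of (1 - 2 r cos(theta) z^-1 + r^2 z^-2)(1 - z^-1),
    ordered by increasing power of z^-1."""
    c = 2.0 * r_z * math.cos(theta_z)
    pair = [1.0, -c, r_z * r_z]   # complex-conjugate zero pair
    one_zero = [1.0, -1.0]        # additional zero at z = 1
    out = [0.0] * (len(pair) + len(one_zero) - 1)
    for i, p in enumerate(pair):
        for j, q in enumerate(one_zero):
            out[i + j] += p * q   # polynomial multiplication (convolution)
    return out
```

Because of the zero at z = 1, the coefficients sum to zero, which is a quick sanity check on any implementation.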

X. RECOMBINATION

The recombination of a wideband signal from the auditory band signals can be implemented as the sum of all signal bands. Unfortunately, this can cause destructive signal cancellation in the overlap regions of adjacent signal channels. We therefore derive a simple criterion that shows the need for a sign change in every second channel before the summation:

When this formula is used, the frequency response of the superposition of all signals lies in the range between C_dB + 3 [dB] and 0 [dB]. Omitting a necessary sign change can lead to destructive signal cancellation at the band edges of adjacent filters.
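The summation with alternating signs can be sketched as follows. This is an illustration only: the criterion that decides whether the sign change is needed is not reproduced in this text, so the sketch simply assumes that every odd-indexed channel is negated. The name `recombine` is hypothetical.

```python
def recombine(bands):
    """Sum equal-length band signals, negating every second channel
    (assumption: odd-indexed channels need the sign change)."""
    n = len(bands[0])
    out = [0.0] * n
    for k, band in enumerate(bands):
        sign = -1.0 if k % 2 else 1.0
        for i in range(n):
            out[i] += sign * band[i]
    return out
```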

XI. (LATENCY-REDUCED) LEVEL DETECTION

Masking effects modeled by the auditory filter bank cannot be exploited as long as the amplitude in each filter bank channel has not been determined. Suitable ways of level detection are proposed in the following subsections.

We propose the first, simple approach for the high-frequency channels and the latency-reduced approach for the low-frequency bands.

A. Simple Level Detection with Pre-Masking

Nonlinearities such as the absolute value, squaring, or half-wave rectification are normally used to shift the signal amplitude into the baseband around 0 Hz. A smoothing filter then removes the higher-frequency components, which finally yields the desired amplitude signal. Fig. 11 shows an example that also takes the form factor F into account.

Commonly used approaches to amplitude detection are computationally efficient, but their smoothing filters introduce group delays in the signal path that have to be compensated. We recommend describing the recursive smoothing parameter α by a time constant τ_avg in [s]:

Suitable time constants match the auditory pre-masking time constant, which is approximately τ_avg ≈ 2 [ms] (G. Stoll, J. G. Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, W. C. Treurniet, "PEAQ – der neue ITU-Standard zur objektiven Messung der wahrgenommenen Audioqualität", RTM – Rundfunktechnische Mitteilungen, die Fachzeitschrift für Hörfunk und Fernsehtechnik, vol. 43, ISSN 0035-9890 (81–120), Firma Mensing GmbH + Co. KG, Abteilung Verlag, Sept. 1999).
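The simple level detector (rectification followed by recursive smoothing) might be sketched as below. The mapping from τ_avg to α is not reproduced in this text; the sketch assumes the common one-pole convention α = exp(–1/(τ_avg·f_s)), which may differ from the formula in the original. The form factor F is omitted, and both function names are hypothetical.

```python
import math

def smoothing_coefficient(tau_s, fs_hz):
    """Assumed mapping (not from the source): alpha = exp(-1 / (tau * fs))."""
    return math.exp(-1.0 / (tau_s * fs_hz))

def detect_level(x, alpha):
    """Rectify each sample, then smooth with a one-pole recursive average."""
    level, out = 0.0, []
    for sample in x:
        level = alpha * level + (1.0 - alpha) * abs(sample)
        out.append(level)
    return out
```

With τ_avg = 2 ms and f_s = 16 kHz this convention gives α ≈ 0.97, so the detector settles within a few milliseconds.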

B. Latency-Reduced Level Detection

Our new method exploits the phase of a simple filter section. This method of level detection can also be applied in other technical fields and is not limited to noise suppression.

With the Hilbert transform, a broadband signal can be phase-shifted consistently by 90°. When the squares of the original and the shifted signal are summed, the squared amplitudes (e.g. the signal power) remain and the sinusoidal components cancel each other. However, a causal implementation of the Hilbert transform does not exist.

In contrast to the ideal Hilbert transformer, we need the 90° phase shift only in the frequency interval under consideration, e.g. in the corresponding auditory critical band.

We propose using the following filter types to obtain a 90° phase shift at a frequency θ_k:

• a simple first-order FIR section,
• a simple first-order IIR all-pass (AP), and
• a simple delay line with a λ/4 delay at θ_k.

Each of the above methods provides a 90° phase shift at virtually any frequency θ_k and is therefore suitable.

The methods differ in the following properties:

• FIR: numerically unstable at θ_k = [0, π/2, π]; offers the widest band with a 90° phase shift.
• AP: numerically unstable at θ_k = [0, π/2, π]; the band with a 90° phase shift is narrower and the computational cost is higher.
• λ/4 delay: numerically stable; the narrowest band with a 90° phase shift; low computational cost, but a large amount of memory is required.

Fig. 12 shows an example of the FIR level detection method. A suitable parameter can be found from the phase equation of the corresponding system, see e.g. A. V. Oppenheim, R. W. Schafer, J. R. Buck, "Discrete-Time Signal Processing", Prentice Hall, 1999.
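The FIR variant can be illustrated with a short sketch. The coefficient choice below is one solution of the phase equation worked out for this example, not a parameter given in the source: the section H(z) = (1 – z^–1 / cos θ_k) / tan θ_k evaluates to j at θ_k, i.e. unit gain and a 90° phase shift. Its coefficients grow without bound near θ_k = 0, π/2 and π, which is consistent with the numerical issues listed above. The function names are hypothetical.

```python
import math

def quadrature_fir(x, theta_k):
    """First-order FIR section with unit gain and a 90 degree phase shift
    at theta_k (assumed coefficient choice; ill-conditioned near 0, pi/2, pi)."""
    a = 1.0 / math.cos(theta_k)
    g = 1.0 / math.tan(theta_k)
    prev, out = 0.0, []
    for sample in x:
        out.append(g * (sample - a * prev))
        prev = sample
    return out

def instantaneous_power(x, theta_k):
    """Sum of squares of the signal and its phase-shifted copy."""
    q = quadrature_fir(x, theta_k)
    return [xi * xi + qi * qi for xi, qi in zip(x, q)]
```

For a sinusoid of amplitude A at exactly θ_k, the result equals A² from the second sample onward, without any smoothing filter and hence without its group delay.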

XII. AUDITORY POST-MASKING

Using nonlinear post-masking filters (e.g. recursive averaging that responds to falling edges) has several advantages:

• The variance of impulsive noise is slightly overestimated because of the post-masking (over-subtraction).
• Noise suppression algorithms cannot attenuate signals until the auditory post-masking time has elapsed.
• Aliasing effects after downsampling and ripple in the amplitude signal are reduced by the smoothing effect of the post-masking.
• Smoothing takes place, yet the amplitudes of the important transient signals do not experience any additional group delay.

We propose a structure that operates on the signal power in each channel (cf. Fig. 13; L. Lin, E. Ambikairajah, W. H. Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the Third International Symposium DSPCS 2002, Sydney, Jan. 28–31, 2002).

The averaging parameter α_k in channel k has to correspond to the human auditory post-masking time constant at the corresponding frequency f_k. We therefore use the following equation to derive the averaging parameter α:

A parameter G can be used to scale the post-masking time constants.

The time constant for 1 [Bark] is approximately 40 [ms], and for 20 [Bark] approximately 4 [ms] (G. Stoll, J. G. Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, W. C. Treurniet, "PEAQ – der neue ITU-Standard zur objektiven Messung der wahrgenommenen Audioqualität", RTM – Rundfunktechnische Mitteilungen, die Fachzeitschrift für Hörfunk und Fernsehtechnik, vol. 43, ISSN 0035-9890 (81–120), Firma Mensing GmbH + Co. KG, Abteilung Verlag, Sept. 1999). The following equation can be used to derive τ_k:

Alternatively, the equation in the cited reference can be used, but our formula provides a suitable interpolation with longer time constants.
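A nonlinear post-masking filter of the kind described in this section could look like the sketch below. It is an assumed minimal form, not the structure of the referenced figure: rising power is followed immediately, and falling power decays with the recursive averaging parameter α_k, which is taken as given. The name `post_mask` is hypothetical.

```python
def post_mask(power, alpha_k):
    """Follow rising input immediately; decay recursively on falling input."""
    state, out = 0.0, []
    for p in power:
        if p >= state:
            state = p  # onset: no additional group delay
        else:
            state = alpha_k * state + (1.0 - alpha_k) * p  # post-masking decay
        out.append(state)
    return out
```

For example, `post_mask([1.0, 0.0, 0.0, 2.0], 0.5)` returns `[1.0, 0.5, 0.25, 2.0]`: the onsets pass unchanged while the gap is filled by the decay.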

XIII. RECURSIVE MINIMUM STATISTICS

We can use the structure in Fig. 14 to estimate the noise level in each frequency band. Similar approaches can be found in R. Martin, "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, no. 5, vol. 9, pp. 504–512, Jul. 2001, or in WO 00/30264 (International Application No. PCT/SG99/00119).

This method essentially uses three time constants for averaging the signal level. Falling edges are averaged lightly, whereas during rising input edges the output is held constant for a period of N_w sampling intervals (infinitely large time constant). Once N_w sampling intervals have elapsed, the rising edge is averaged with a third time constant. The time constants can be converted into recursive averaging parameters in a similar way to (25) and (26).

A suitable counter limit N_w can be calculated from a continuous time interval T_w:

$$N_w = \mathrm{round}(T_w \cdot f_s). \qquad (28)$$

For utterances or words of human speech, this time interval can be chosen appropriately, e.g. T_w ≈ 1.5 s. The time constant for the falling edge can be a scaled version of the post-masking time constant τ_k or, for example, a constant 200 [ms].

The time constant β that defines the rising edge can be approximately 700 [ms], which corresponds to a rate of about 6 [dB]/[s]. In contrast to all other time constants, it is proposed to be the same for all channels k.
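The three-time-constant behaviour can be sketched as a small state machine. This is one reading of the description, not the exact structure of the referenced figure: the saturation stage is omitted, the recursive parameters `a_fall` and `a_rise` are assumed to have been derived from the respective time constants beforehand, and all names are hypothetical. Only `counter_limit` follows equation (28) directly.

```python
def counter_limit(t_w_s, fs_hz):
    """Equation (28): N_w = round(T_w * f_s)."""
    return round(t_w_s * fs_hz)

def track_noise(levels, a_fall, a_rise, n_w):
    """Follow falling levels quickly, hold on rising levels for n_w
    samples, then rise slowly (assumed reading of the description)."""
    est, count, out = levels[0], 0, []
    for v in levels:
        if v < est:              # falling edge: light averaging
            est = a_fall * est + (1.0 - a_fall) * v
            count = 0
        else:                    # rising edge
            count += 1
            if count > n_w:      # hold time elapsed: slow rise
                est = a_rise * est + (1.0 - a_rise) * v
            # otherwise the estimate is held (infinite time constant)
        out.append(est)
    return out
```

With T_w = 1.5 s and f_s = 16 kHz, `counter_limit` gives N_w = 24000 samples at the full rate; at a downsampled level-signal rate the limit would shrink accordingly.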

The saturation effect in Fig. 14 can be specified as follows:

XIV. EPHRAIM-MALAH NOISE SUPPRESSION RULE (EMSR)

With the EMSR (Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109–1121, Dec. 1984; Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443–445, Apr. 1985) we can estimate the clean speech amplitude from the given noisy speech amplitude and the noise variance. We can, for example, use the definition of the spectral weights by Wolfe and Godsill (P. J. Wolfe and S. J. Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496–499, 6–8 Aug. 2001) and a modified decision-directed approach (F. Zotter, M. Noisternig, R. Höldrich, "Speech Enhancement Using the Ephraim and Malah Suppression Rule and Decision Directed Approach: A Hysteretic Process", to appear in IEEE Signal Processing Letters, 2005; first manuscript submitted Jan. 24, 2005):

The following relationships are involved in the above equation:

The noise variance σ²_d,k[m] is given by the noise estimation algorithm; m and n are time indices, f_s is the system sampling rate, and L is a downsampling factor.

According to Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109–1121, Dec. 1984, γ_k[m] is the a posteriori SNR and ξ_k[m] is the a priori SNR. G_w,k[m] is the spectral weight of the Wiener filter, and α is the averaging parameter, defined by an averaging time constant τ_snr,k that is either approximately 2 [ms] (F. Zotter, M. Noisternig, R. Höldrich, "Speech Enhancement Using the Ephraim and Malah Suppression Rule and Decision Directed Approach: A Hysteretic Process", to appear in IEEE Signal Processing Letters, 2005; first manuscript submitted Jan. 24, 2005) or derived from the auditory masking time constants.

The "over-subtraction factor" ρ (cf. Zotter et al.) can be chosen as ρ = 10^(–15/10), and the lower noise-floor parameter ζ as ζ = 10^(–40/10).
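The quantities named in this section can be illustrated with their textbook definitions. The sketch below uses the standard decision-directed estimate of Ephraim and Malah and the standard Wiener gain; it is not the modified decision-directed approach or the Wolfe-Godsill weight of the source, whose equations are not reproduced in this text. How ρ and ζ enter those equations is likewise not shown, so they are only evaluated numerically. All function names are hypothetical.

```python
RHO = 10 ** (-15 / 10)   # over-subtraction factor from the text, about 0.0316
ZETA = 10 ** (-40 / 10)  # lower noise-floor parameter from the text, 1e-4

def a_posteriori_snr(noisy_power, noise_var):
    """gamma_k[m]: noisy signal power relative to the noise variance."""
    return noisy_power / noise_var

def a_priori_snr(prev_clean_power, noise_var, gamma, alpha):
    """xi_k[m]: standard decision-directed estimate (not the modified variant)."""
    return (alpha * prev_clean_power / noise_var
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))

def wiener_gain(xi):
    """G_w,k[m]: spectral weight of the Wiener filter."""
    return xi / (1.0 + xi)
```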

XV. LATENCY-REDUCED UPSAMPLING

Conventional upsampling requires either a processing delay or a group delay because of the interpolation operation involved. With an upsampling factor L, such delays are approximately L samples long.

We propose a special upsampling method that introduces no additional delay. This can be achieved by dividing the signal into buffers (preferably with the buffer size of the ADCs and DACs).

If the last sample of the preceding block is available in each signal block, the following samples can be interpolated linearly. The last sample in each block therefore has to coincide with a sampling instant of the lower sampling rate.
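The block-wise linear interpolation might be sketched as follows, assuming that each low-rate sample is aligned with the last of its L high-rate positions and that the caller carries the last value from one block to the next. The name `upsample_block` and its signature are illustrative.

```python
def upsample_block(low_rate_block, prev_last, factor):
    """Linearly interpolate from the previous low-rate sample to each new one.

    Returns the high-rate block and the value to carry into the next block.
    """
    out = []
    for s in low_rate_block:
        step = (s - prev_last) / factor
        for j in range(1, factor + 1):
            out.append(prev_last + j * step)  # j == factor lands exactly on s
        prev_last = s
    return out, prev_last
```

For example, `upsample_block([4.0, 0.0], 0.0, 4)` returns `([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0], 0.0)`; the last output of the block equals the last low-rate sample, so no samples from a future block are needed.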

XVI. CONCLUSIONS

Frequency-domain solutions that use equivalent auditory models require delays in the range of 10 milliseconds. The implementation of our system with 20 frequency bands and a third-order TZGF has an average latency of 3.5 to 4 milliseconds. The required computational effort is about 8.9 MIPS at f_s = 16 [kHz], which is slightly more than DFT solutions require (7 MIPS). We have also applied a slightly modified Ephraim-Malah suppression rule (EMSR) with the simplified Wolfe-Godsill formula and the modified decision-directed approach.

The disclosure of all cited publications is incorporated in its entirety in this description.

Summary

A method for noise suppression for an input audio signal (y[n]) that comprises a desired signal (x[n]) and a noise signal component, the method comprising the following steps:

– splitting the input audio signal (y[n]) into a plurality of frequency subbands (y_k[n]) by a band-splitting analysis,
– noise suppression in each subband (y_k[n]) by a plurality of noise suppression processors,
– combining the plurality of subbands (y_k[n]) into an output signal (x̂[n]) by a synthesis filter,

wherein all steps are performed in the time domain.

ZITATE ENTHALTEN IN DER BESCHREIBUNGQUOTES INCLUDE IN THE DESCRIPTION

Diese Liste der vom Anmelder aufgeführten Dokumente wurde automatisiert erzeugt und ist ausschließlich zur besseren Information des Lesers aufgenommen. Die Liste ist nicht Bestandteil der deutschen Patent- bzw. Gebrauchsmusteranmeldung. Das DPMA übernimmt keinerlei Haftung für etwaige Fehler oder Auslassungen.This list The documents listed by the applicant have been automated generated and is solely for better information recorded by the reader. The list is not part of the German Patent or utility model application. The DPMA takes over no liability for any errors or omissions.

Zitierte PatentliteraturCited patent literature

WO 2006114100 [0013]
WO 00/30264 [0081, 0118]
SG 99/00119 [0081, 0118]

Zitierte Nicht-PatentliteraturCited non-patent literature

- SF Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoust. Speech and Sig. Proc., Vol. ASSP-27, pp. 113-120, Apr. 1979 [0007]
- CL Wang et al., "The unimportance of phase in speech enhancement," IEEE Trans. Acoust. Speech and Sig. Proc., Vol. ASSP-30, pp. 679-681, Aug. 1982 [0007]
M. Berouti et al., "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE Int. Conf. on Acoust., Speech and Sig. Proc. (ICASSP'79), vol. 4, pp. 208-211, Washington DC, Apr. 1979 [0008]
- WM Kushner et al., "The effects of subtractive-type speech enhancement / noise reduction algorithms on parameter estimation for improved recognition and coding in high-noise environments," in Proc. IEEE Int. Conf. Acoustics, Speech and Sig. Proc. (ICASSP'89), vol. 1, pp. 211-214, 1989 [0008]
R. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," in IEEE Trans. Acoust, Speech and Sig. Proc., Vol. 28, no. 2, pp. 137-145, 1980 [0009]
Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time estimator," IEEE Trans. Acoust. Speech and Sig. Proc, vol. 32, no. 6, pp. 1109-1121, 1984 [0010]
Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log spectral amplitude estimator," IEEE Trans. Acoust. Speech and Sig. Proc., Vol. 33, no. 2, pp. 443-445, 1985 [0010]
- O. Capp, "Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor," IEEE Trans. Speech and Audio Proc., Vol. 2, no. 2, pp. 345-349, 1994 [0010]
- E. Malah et al., "Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments," in Proc. IEEE Int. Conf. Acoust., Speech and Sig. Proc. (ICASSP'99), vol. 2, pp. 789-792, 1999 [0010]
R. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," in IEEE Trans. Acoust., Speech and Sig. Proc., Vol. 28, no. 2, pp. 137-145, 1980 [0011]
WJ Hess, "A pitch-synchronous digital feature extraction system for phonemic recognition of speech", in IEEE Trans. Acoust., Speech and Sig. Proc., Vol. 24, no. 1, pp. 14-25, 1976 [0011]
- Arslan et al. [0011]
- L. Arslan et al. "New methods for adaptive noise suppression", in Proc. Int. Conf. on Acoustics, Speech and Sig. Proc. (ICASSP-95), Detroit, May 1995 [0011]
R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," in IEEE Trans. Speech and Audio Proc., Vol. 9, no. 5, pp. 512, July 2001 [0011]
- Ealey et al. [0011]
D. Ealey et al., "Harmonic tunneling: tracking non-stationary noises during speech," in Proc. Eurospeech Aalborg, 2001 [0011]
J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE Int. Conf. Acoustics, Speech and Sig. Proc. (ICASSP'98), vol. 1, pp-365-368, 1998 [0011]
Y. Ephraim and HL Van Trees, "A signal subspace approach for speech enhancement", in IEEE Trans. Speech and Audio Proc., Vol. 3, pp. 251-266, July 1995 [0012]
J. Skoglund and WB Kleijn, "On Time-Frequency Masking in Voiced Speech", in IEEE Trans. Speech and Audio Proc, vol. 8, no. 4, pp. 361-369, July 2000 [0013]
N. Virag, "Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System", IEEE Trans. On Speech and Audio Proc., Vol. 7, no. 2, pp. 126-137, March 1999 [0027]
JO Smith III and JS Abel, "Bark and ERB Eilinear Transforms," IEEE Trans. On Speech and Audio Proc., Vol. 7, no. 6, pp. 697-708, Nov. 1999 [0028]
RJ McAulay and ML Malpass, Speech Enhancement Using a Soft-Decision Noise Suppression Filter, IEEE Trans. On Acoust., Speech and Sig. Proc., Vol. ASSP-28, no. 2, pp. 137-145, April 1980 [0029]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. AS SP-32, pp. 1109-1121, Dec. 1984 [0049]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443-445, Apr. 1985 [0049]
- O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions to Speech and Audio Processing, no. 2, vol. 2, pp. 345-349, Apr. 1994 [0049]
- PJ Wolfe and SJ Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8. Aug 2001 [0049]
- Cohen and B. Berdugo, Speech Enhancement for non-stationary noise environments, Signal Processing, no. 11, pp. 2403-2418, Elsevier, Nov. 2001 [0049]
I. Cohen, "Speech Enhancement Using a Noncausal A Priori SNR Estimator", IEEE Signal Processing Letters, no. 9, pp. 725-728, Sep. 2004 [0049]
- I. Cohen, "Relaxed Statistical Model for Speech Enhancement and A Priori SNR Estimation", Center for Communication and Information Technologies, Israel Institute of Technology, Oct. 2003, CCIT Report no. 443 [0049]
MK Hasan, S. Salahuddin, MR Khan, "A Modified A Priori SNR for Speech Enhancement Using Spectral Subtraction Rules", IEEE Signal Processing Letters, vol. 11, no. 4, pp. 450-453, April 2004 [0049]
- PJ Wolfe and SJ Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8 Aug. 2001 [0051]
- I. Cohen and B. Berdugo, "Speech Enhancement for Non-Stationary Noise Environments", Signal Processing, no. 11, pp. 2403-2418, Elsevier, Nov. 2001 [0051]
D. Ealey, H. Kelleher, D. Pearce, "Harmonic Tunneling: Tracking Non-Stationary Noises During Speech", Proc. Eurospeech, 2001 [0051]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109-1121, Dec. 1984 [0055]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443-445, Apr. 1985 [0055]
- PJ Wolfe and SJ Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8 Aug. 2001 [0055]
- I. Cohen, "Relaxed Statistical Model for Speech Enhancement and A Priori SNR Estimation", Center for Communication and Information Technologies, Israel Institute of Technology, Oct. 2003, CCIT Report no. 443 [0055]
- O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions on Speech and Audio Processing, no. 2, vol. 2, pp. 345-349, Apr. 1994 [0068]
- O. Cappé, "Elimination of the Musical Noise Phenomenon with the Ephraim and Malah Noise Suppressor", IEEE Transactions on Speech and Audio Processing, no. 2, vol. 2, pp. 345-349, Apr. 1994 [0072]
I. Cohen, "Speech Enhancement Using a Noncausal A Priori SNR Estimator", IEEE Signal Processing Letters, no. 9, pp. 725-728, Sep. 2004 [0078]
MK Hasan, S. Salahuddin, MR Khan, "A Modified A Priori SNR for Speech Enhancement Using Spectral Subtraction Rules", IEEE Signal Processing Letters, vol. 11, no. 4, pp. 450-453, April 2004 [0078]
- RF Lyon, "The All-Pole Gammatone Filters and Auditory Models", Proc. Forum Acusticum, Antwerp 1996 [0081]
L. Lin, E. Ambikairajah, WH Holmes, "Auditory Filterbank Design Using Masking Curves", Proc. EUROSPEECH Scandinavia, 7th European Conference on Speech Communication and Technology, 2001 [0081]
L. Lin, E. Ambikairajah, WH Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the third International Symposium DSPCS 2002, Sydney, Jan. 28-31, 2002 [0081]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109-1121, Dec. 1984 [0081]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443-445, Apr. 1985 [0081]
- PJ Wolfe and SJ Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8 Aug. 2001 [0081]
L. Lin, E. Ambikairajah, "Speech Denoising Based on an Auditory Filterbank", 6th ICSP, International Conference on Signal Processing, (552-555), 26-30 Aug. 2002 [0081]
- G. Stoll, JG Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, WC Treurniet, "PEAQ - the new ITU Standard for the objective measurement of perceived audio quality", RTM - Rundfunktechnische Mitteilungen, the trade journal for radio and television technology, 43rd year, ISSN 0035-9890 (81-120), Mensing GmbH + Co. KG, Publishing Department, Sept 1999 [0081]
L. Lin, E. Ambikairajah, WH Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the third International Symposium DSPCS 2002, Sydney, Jan. 28-31, 2002 [0081]
E. Zwicker, H. Fastl, "Psychoacoustics, Facts and Models", Springer, Berlin Heidelberg, 1999 [0083]
- E. Terhardt, "Acoustic Communications", Springer, Berlin Heidelberg, 1998 [0083]
P. Dutilleux, U. Zölzer, "DAFX", Wiley & Sons, 2002 [0085]
E. Zwicker, H. Fastl, "Psychoacoustics, Facts and Models", Springer, Berlin Heidelberg, 1999 [0087]
- E. Terhardt, "Acoustic Communications", Springer, Berlin Heidelberg, 1998 [0087]
E. Zwicker, H. Fastl, "Psychoacoustics, Facts and Models", Springer, Berlin Heidelberg, 1999 [0089]
- RF Lyon, "The All-Pole Gammatone Filters and Auditory Models", Proc. Forum Acusticum, Antwerp 1996 [0091]
- RF Lyon, "The All-Pole Gammatone Filters and Auditory Models", Proc. Forum Acusticum, Antwerp 1996 [0094]
AV Oppenheim, RW Schafer, JR Buck, "Discrete Time Signal Processing", Prentice Hall, 1999 [0094]
- RF Lyon, "The All-Pole Gammatone Filters and Auditory Models", Proc. Forum Acusticum, Antwerp 1996 [0095]
- RF Lyon, "The All-Pole Gammatone Filters and Auditory Models", Proc. Forum Acusticum, Antwerp 1996 [0096]
L. Lin, E. Ambikairajah, WH Holmes, "Auditory Filterbank Design Using Masking Curves", Proc. EUROSPEECH Scandinavia, 7th European Conference on Speech Communication and Technology, 2001 [0097]
- G. Stoll, JG Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, WC Treurniet, "PEAQ - the new ITU Standard for the objective measurement of perceived audio quality", RTM - Rundfunktechnische Mitteilungen, the trade journal for radio and television technology, 43rd year, ISSN 0035-9890 (81-120), Mensing GmbH + Co. KG, Publishing Department, Sept 1999 [0104]
AV Oppenheim, RW Schafer, JR Buck, "Discrete Time Signal Processing", Prentice Hall, 1999 [0111]
L. Lin, E. Ambikairajah, WH Holmes, "Perceptual Domain Based Speech and Audio Coder", Proc. of the third International Symposium DSPCS 2002, Sydney, Jan. 28-31, 2002 [0113]
- G. Stoll, JG Beerends, R. Bitto, K. Brandenburg, C. Colomes, B. Feiten, M. Keyhl, C. Schmidmer, T. Sporer, T. Thiede, WC Treurniet, "PEAQ - the new ITU Standard for the objective measurement of perceived audio quality", RTM - Rundfunktechnische Mitteilungen, the trade journal for radio and television technology, 43rd year, ISSN 0035-9890 (81-120), Mensing GmbH + Co. KG, Publishing Department, Sept 1999 [0116]
- R. Martin, "Noise Power Spectral Estimation Based on Optimum Smoothing and Minimum Statistics", IEEE Transactions on Speech and Audio Processing, no. 5, vol. 9, pp. 504-512, Jul. 2001 [0118]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109-1121, Dec. 1984 [0124]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 2, vol. ASSP-33, pp. 443-445, Apr. 1985 [0124]
- PJ Wolfe and SJ Godsill, "Simple Alternatives to the Ephraim and Malah Suppression Rule for Speech Enhancement", Proc. 11th IEEE Signal Processing Workshop, pp. 496-499, 6-8 Aug. 2001 [0124]
- F. Zotter, M. Noisternig, R. Höldrich, "Speech Enhancement Using the Ephraim and Malah Suppression Rule and Decision Directed Approach: A Hysteretic Process", to appear in IEEE Signal Processing Letters, 2005. First manuscript submitted Jan 24, 2005 [0124]
- Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, no. 6, vol. ASSP-32, pp. 1109-1121, Dec. 1984 [0127]
- F. Zotter, M. Noisternig, R. Höldrich, "Speech Enhancement Using the Ephraim and Malah Suppression Rule and Decision Directed Approach: A Hysteretic Process", to appear in IEEE Signal Processing Letters, 2005. First manuscript submitted Jan 24, 2005 [0127]
- Zotter et al [0128]

Claims

A method for noise suppression for an input audio signal (y[n]) having a desired signal (x[n]) and a noise signal component, the method comprising the steps of: splitting the input audio signal (y[n]) into a plurality of frequency subbands (y_k[n]) by a band-splitting analysis; noise suppression in each subband (y_k[n]) by a plurality of noise suppression processors; combining the plurality of subbands (y_k[n]) into an output signal (x̂[n]) by a synthesis filter; wherein all steps are performed in the time domain.
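
As a rough illustration of the three claimed steps (analysis, per-subband processing, synthesis) carried out purely on time-domain samples, the following Python sketch uses a deliberately trivial two-band split rather than the filter bank named in the dependent claims. The function names `split_two_bands` and `enhance`, the two-band filters and the fixed per-band gains are assumptions made for this example and are not taken from the patent.

```python
def split_two_bands(y):
    """Split y[n] into a low band and a high band in the time domain.

    Illustrative stand-in for the band-splitting analysis:
    low[n]  = 0.5 * (y[n] + y[n-1])
    high[n] = 0.5 * (y[n] - y[n-1])
    The two bands sum back to y[n] sample by sample.
    """
    low, high = [], []
    prev = 0.0
    for sample in y:
        low.append(0.5 * (sample + prev))
        high.append(0.5 * (sample - prev))
        prev = sample
    return [low, high]


def enhance(y, band_gains):
    """Apply one (here fixed, hypothetical) gain per subband and sum the bands."""
    bands = split_two_bands(y)
    out = [0.0] * len(y)
    for band, gain in zip(bands, band_gains):
        for n, value in enumerate(band):
            out[n] += gain * value  # synthesis by plain summation
    return out
```

With unit gains the sketch reproduces its input exactly, because the two bands are complementary. A real suppressor would replace the fixed gains with time-varying gains computed per subband.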

Method according to claim 1, characterized in that the splitting of the input audio signal into a plurality of subbands by the band-splitting analysis is carried out in accordance with the human perception of loudness.

Method according to claim 1, characterized in that the band-splitting analysis is a gammatone filter bank (GFB), preferably a non-uniform gammatone filter bank.

Method according to one of claims 1 to 3, characterized in that a pre-processor (H_OME) and a post-processor (H_IOME) perform non-linear filtering of the input audio signal, namely: a. a pre-processing filter that emulates the transfer behavior of the human outer and middle ear and is applied to the discrete-time noisy input audio signal; and b. a post-processing filter applied to the amplified full-band signal to compensate for the effect of the pre-processing filter.

Method according to one of claims 1 to 4, characterized in that each noise suppression processor includes a signal level detection (LD), a noise estimator (NE), an auditory masking filter (PM) and a subtraction processor.

Method according to claim 5, characterized in that said signal level detection (LD) evaluates the phase of the low-order filter section to generate a quadrature signal and an in-phase signal from the subband signal (y_k[n]), and sums the squared amplitudes of these signals.

Method according to claim 5, characterized in that said noise estimator generates a subband noise value by smoothing based on minimum statistics, wherein in particular a weighted averaging of the previous noise value and the current input value with three different time constants is applied.

Method according to claim 5 or 6, wherein said masking filter uses the detected signal power in each subband to generate a temporal masking based on human auditory perception, wherein in particular a nonlinear weighted average of the preceding subband input value and the current input value is applied only on falling edges, depending on the detected level in each subband.
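
The falling-edge smoothing described in this claim can be sketched in a few lines of Python. This is only one possible reading: the function name `temporal_masking`, the two smoothing constants, the reference level and the rule that picks the constant from the previous output level are all hypothetical and are not given in the claims.

```python
def temporal_masking(power, beta_soft=0.9, beta_loud=0.8, level_ref=0.5):
    """Smooth a subband power sequence on falling edges only.

    Rising edges are followed immediately. On a falling edge the output is a
    weighted average of the previous output and the current input, with a
    weight that depends on the previous level (assumed rule, not from the
    patent).
    """
    masked = []
    prev = 0.0
    for p in power:
        if p >= prev:
            cur = p  # rising edge or constant level: no smoothing
        else:
            beta = beta_loud if prev > level_ref else beta_soft
            cur = beta * prev + (1.0 - beta) * p  # decaying masking level
        masked.append(cur)
        prev = cur
    return masked
```

After a burst followed by silence, the output decays gradually instead of dropping to zero at once, which mimics the way a loud sound keeps masking quieter sounds for a short time afterwards.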

Method according to one of claims 1 to 8, wherein the update of the noise estimator depends on the current input value compared with time-dependent, level-dependent thresholds, e.g. if the current input value is larger than a predetermined threshold, the current input value is not taken into account and said estimator is not updated.
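
A minimal sketch of such a gated noise update, assuming a recursive weighted average of the previous noise value and the current input value. The claims do not state how one of the three time constants is chosen or how the thresholds vary, so the selection rule, the fixed threshold factor, all numeric values and the name `update_noise` are assumptions for illustration only.

```python
def update_noise(prev_noise, power,
                 alpha_fast=0.5, alpha_medium=0.9, alpha_slow=0.99,
                 medium_factor=2.0, freeze_factor=4.0):
    """Return the next subband noise estimate.

    Hypothetical rule: the estimator is frozen when the input exceeds
    freeze_factor * prev_noise; otherwise one of three smoothing constants
    is chosen from the ratio of input power to the previous estimate.
    """
    if power > freeze_factor * prev_noise:
        return prev_noise  # input too large: estimator is not updated
    if power < prev_noise:
        alpha = alpha_fast      # track falling noise levels quickly
    elif power < medium_factor * prev_noise:
        alpha = alpha_medium
    else:
        alpha = alpha_slow      # rise only slowly
    return alpha * prev_noise + (1.0 - alpha) * power
```

The claim speaks of time-dependent, level-dependent thresholds. The single constant `freeze_factor` used here is a simplification.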

Method according to one of claims 1 to 9, wherein the noise suppression in each subband is performed by the Ephraim-Malah noise suppression rule (EMSR).

Method according to one of claims 1 to 10, wherein the noise suppression in each subband is performed by a decision-directed approach (DDA).
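
For orientation, the a priori SNR recursion introduced by Ephraim and Malah (cited above) can be sketched per subband as follows. To keep the example free of special functions, a Wiener gain is used in place of the Ephraim-Malah suppression rule, so this is not the EMSR itself. The function name, the smoothing factor `alpha` and the floor `xi_min` are assumptions chosen for illustration.

```python
def decision_directed_gains(subband_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Compute one gain per time step for a single subband.

    gamma[n] = subband_power[n] / noise_power[n]           (a posteriori SNR)
    xi[n]    = alpha * G[n-1]^2 * gamma[n-1]
               + (1 - alpha) * max(gamma[n] - 1, 0)        (a priori SNR)
    G[n]     = xi[n] / (1 + xi[n])                         (Wiener gain stand-in)
    noise_power values must be positive.
    """
    gains = []
    prev_term = 0.0  # G[n-1]^2 * gamma[n-1], zero before the first sample
    for p, d in zip(subband_power, noise_power):
        gamma = p / d
        xi = alpha * prev_term + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)  # floor avoids a gain of exactly zero
        gain = xi / (1.0 + xi)
        gains.append(gain)
        prev_term = gain * gain * gamma
    return gains
```

When the subband power equals the noise power, the gain stays at the floor value. When the signal is well above the noise, the gain rises toward one within a few steps.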

A noise suppression apparatus for an input audio signal (y[n]) having a desired signal (x[n]) and a noise signal component, the apparatus comprising: a band-splitting analyzer for splitting the input audio signal (y[n]) into a plurality of frequency subbands (y_k[n]); a plurality of noise suppression processors for noise suppression in each subband (y_k[n]); a synthesis filter for combining the plurality of subbands (y_k[n]) into an output signal (x̂[n]); wherein the band-splitting analyzer, the noise suppression processors and the synthesis filter operate in the time domain.

Apparatus according to claim 12, characterized in that a level detector (LD) is provided for each subband.

Apparatus according to claim 13, characterized in that said level detector (LD) evaluates the phase of the low-order filter section to generate a quadrature signal and an in-phase signal from the subband signal (y_k[n]), and sums the squared amplitudes of these signals.

Apparatus according to claim 14, characterized in that said quadrature signal is generated by a first-order FIR section provided in the level detector (LD).

Apparatus according to claim 14, characterized in that said quadrature signal is generated by a first-order FIR all-pass (AP) provided in the level detector (LD).

Apparatus according to claim 14, characterized in that said quadrature signal is generated by a delay line providing a λ/4 delay at a digital frequency (θ_k).
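
The delay-line variant of the level detector lends itself to a short Python sketch: a quarter-period delay at the subband frequency θ_k turns a cosine into a sine, so the delayed sample and the current sample form a pair shifted by 90 degrees whose squared amplitudes sum to the squared envelope. The function names and the rounding of the delay to whole samples are assumptions of this example.

```python
import math


def quarter_period_delay(theta_k):
    """Whole-sample delay closest to a quarter period at theta_k (rad/sample)."""
    return max(1, round(math.pi / (2.0 * theta_k)))


def level_detect(y_k, theta_k):
    """Estimate instantaneous subband power from two 90-degree-shifted samples."""
    d = quarter_period_delay(theta_k)
    power = []
    for n, in_phase in enumerate(y_k):
        # Delayed copy; zero until the delay line has filled.
        shifted = y_k[n - d] if n >= d else 0.0
        power.append(in_phase * in_phase + shifted * shifted)
    return power
```

For a sinusoid of amplitude A whose frequency matches θ_k exactly, the detected power settles at A² once the delay line has filled. For other frequencies inside the subband the 90-degree shift is only approximate, so some ripple remains.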

Apparatus according to any one of claims 12 to 17, characterized in that each noise suppression processor has a signal level detector (LD), a noise estimator (NE), an auditory masking filter (PM) and a subtraction processor.

Apparatus according to any one of claims 12 to 18, characterized in that the band-splitting analyzer has a gammatone filter bank (GFB), preferably a non-uniform gammatone filter bank.

Apparatus according to one of claims 12 to 19, characterized in that a pre-processor (H_OME) and a post-processor (H_IOME) are provided for the non-linear filtering of the input audio signal, namely: a. a pre-processing filter which emulates the transfer behavior of the human outer and middle ear and is applied to the discrete-time noisy input audio signal; and b. a post-processing filter applied to the improved full-band signal to compensate for the effect of the pre-processing filter.